Unconditional latent diffusion models memorize patient imaging data.

Authors

Dar SUH, Seyfarth M, Ayx I, Papavassiliu T, Schoenberg SO, Siepmann RM, Laqua FC, Kahmann J, Frey N, Baeßler B, Foersch S, Truhn D, Kather JN, Engelhardt S

Affiliations (16)

  • Department of Internal Medicine III, Heidelberg University Hospital, Heidelberg, Germany.
  • Heidelberg Faculty of Medicine, Heidelberg University, Heidelberg, Germany.
  • AI Health Innovation Cluster (AIH), Heidelberg, Germany.
  • German Centre for Cardiovascular Research (DZHK), Partner Site Heidelberg/Mannheim, Heidelberg, Germany.
  • Department of Internal Medicine III, Heidelberg University Hospital, Heidelberg, Germany.
  • Heidelberg Faculty of Medicine, Heidelberg University, Heidelberg, Germany.
  • German Centre for Cardiovascular Research (DZHK), Partner Site Heidelberg/Mannheim, Heidelberg, Germany.
  • Department of Radiology and Nuclear Medicine, University Medical Center Mannheim, Mannheim, Germany.
  • AI Health Innovation Cluster (AIH), Heidelberg, Germany.
  • Department of Cardiology, Angiology, Hemostasis, and Medical Intensive Care, University Medical Centre Mannheim, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany.
  • Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany.
  • Institute for Diagnostic and Interventional Radiology, University Hospital Würzburg, Würzburg, Germany.
  • Institute of Pathology, University Medical Center Mainz, Mainz, Germany.
  • Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany.
  • Department of Medicine I, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany.
  • Medical Oncology, National Center for Tumor Diseases (NCT), University Hospital Heidelberg, Heidelberg, Germany.

Abstract

Generative artificial intelligence models facilitate open-data sharing by proposing synthetic data as surrogates of real patient data. Despite the promise for healthcare, some of these models are susceptible to patient data memorization, where models generate patient data copies instead of novel synthetic samples, resulting in patient re-identification. Here we assess memorization in unconditional latent diffusion models by training them on a variety of datasets for synthetic data generation and detecting memorization with a self-supervised copy detection approach. We show a high degree of patient data memorization across all datasets, with approximately 37.2% of patient data detected as memorized and 68.7% of synthetic samples identified as patient data copies. Latent diffusion models are more susceptible to memorization than autoencoders and generative adversarial networks, and they outperform non-diffusion models in synthesis quality. Augmentation strategies during training, smaller architectures and larger datasets can reduce memorization, while overtraining the models can increase it. These results emphasize the importance of carefully training generative models on private medical imaging datasets and examining the synthetic data to ensure patient privacy.
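
The two percentages in the abstract measure different things: the share of training (patient) samples that have at least one synthetic copy, and the share of synthetic samples that are copies of some training sample. The sketch below illustrates how such a pair of rates could be computed from embedding vectors. It is not the authors' pipeline: the function name memorization_rates, the use of cosine similarity, and the threshold of 0.9 are assumptions made for illustration, and the embeddings are assumed to come from some self-supervised encoder that the abstract does not specify.

```python
import numpy as np


def memorization_rates(train_emb, synth_emb, threshold=0.9):
    """Return (memorized_train_fraction, copied_synth_fraction).

    Hypothetical helper: a synthetic sample counts as a copy when its cosine
    similarity to any training embedding is >= threshold (assumed value).
    """
    train_emb = np.asarray(train_emb, dtype=float)
    synth_emb = np.asarray(synth_emb, dtype=float)

    # L2-normalize rows so that a dot product equals cosine similarity.
    train = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    synth = synth_emb / np.linalg.norm(synth_emb, axis=1, keepdims=True)

    # Similarity matrix of shape (n_synth, n_train).
    sim = synth @ train.T
    is_copy = sim >= threshold

    # Fraction of synthetic samples that match at least one training sample.
    copied_synth = float(is_copy.any(axis=1).mean())
    # Fraction of training samples matched by at least one synthetic sample.
    memorized_train = float(is_copy.any(axis=0).mean())
    return memorized_train, copied_synth
```

Because several synthetic samples can copy the same training sample, the two rates need not agree, which is consistent with the abstract reporting a higher copy rate among synthetic samples than the memorized share of patient data.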

Topics

Journal Article
