Evaluating the utility of AI-generated mammography images in breast cancer analysis.
Authors
Affiliations (10)
- Department of Pathology and Laboratory Medicine, Indiana University School of Medicine, Indianapolis, IN, USA.
- Indiana University Simon Comprehensive Cancer Center, Indianapolis, IN, USA.
- Department of Computer Science, Kingston University London, London, UK.
- Mallinckrodt Institute of Radiology, Washington University in St. Louis, St. Louis, MO, USA.
- Department of Pathology and Laboratory Medicine, Indiana University School of Medicine, Indianapolis, IN, USA.
- Indiana University Simon Comprehensive Cancer Center, Indianapolis, IN, USA.
- Department of Computer Science, Kingston University London, London, UK.
- Department of Computer Science, Luddy School of Informatics, Computing, and Engineering, Indiana University, Indianapolis, IN, USA.
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA.
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN, USA.
Abstract
Generative AI models are increasingly used to address data scarcity and class imbalance in biomedical image analysis, yet their practical value depends on improvements in downstream clinical tasks rather than on visual fidelity alone. To gain insight into this downstream utility, we conduct a systematic evaluation of class-conditional AI-generated images and their impact when integrated into a breast cancer classification pipeline. We develop a classifier-free guided denoising diffusion probabilistic model (DDPM) to generate benign and malignant full-image mammograms under varying inference configurations. The model is trained on digital breast tomosynthesis and 2D digital mammograms from the Emory Breast Imaging Dataset. DDPM guidance scales for inference are chosen based on image fidelity and class separability, as evaluated by Fréchet Inception Distance and a pre-trained Oracle classifier, respectively. We also introduce a two-phase training strategy, in which models are pre-trained on real data augmented with AI-generated data and then fine-tuned on real data only. The impact of guidance strength, AI-generated-to-real data proportion, and training strategy is evaluated quantitatively using Balanced Accuracy, Sensitivity, and Specificity. Our findings indicate a lack of consistent performance gains when AI-generated images are used to mitigate data scarcity. Any gains are sensitive to the guidance scale and are liable to degrade when AI-generated data replaces real data beyond a certain proportion. However, our two-phase training strategy yields consistent improvements compared with a real-data-only baseline and fixed guidance scales. This suggests that AI-generated data cannot replace real clinical data, but can serve as a useful transfer learning signal when carefully integrated into training pipelines.
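For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how Sensitivity, Specificity, and Balanced Accuracy are conventionally computed for a binary benign/malignant classifier. It is not the authors' evaluation code; the function name `binary_metrics` and the label encoding (1 = malignant, 0 = benign) are assumptions made for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Return (sensitivity, specificity, balanced_accuracy).

    Assumed encoding (not from the paper): 1 = malignant, 0 = benign.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malignant, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malignant, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, flagged

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    # Balanced accuracy averages the two rates, so a majority class
    # cannot dominate the score the way it does with plain accuracy.
    balanced_accuracy = (sensitivity + specificity) / 2
    return sensitivity, specificity, balanced_accuracy


# Hypothetical toy labels, for illustration only.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
sens, spec, bal_acc = binary_metrics(y_true, y_pred)
```

Because Balanced Accuracy weights each class equally, it is a natural choice when benign and malignant cases are imbalanced, which is the setting the abstract describes.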