Generating Brain MRI with StyleGAN2-ADA: The Effect of the Training Set Size on the Quality of Synthetic Images.
Authors
Affiliations (5)
- Department of Electrical, Electronic, and Information Engineering "Guglielmo Marconi" - DEI, University of Bologna, 47522, Cesena, Italy.
- Experimental and Clinical Biomedical Sciences "Mario Serio", University of Florence, 50139, Florence, Italy.
- Unit of Radiology, Azienda USL Toscana Nord Ovest, Apuane Hospital, 54100, Massa, Italy.
- Department of Electrical, Electronic, and Information Engineering "Guglielmo Marconi" - DEI, University of Bologna, 47522, Cesena, Italy.
- Alma Mater Research Institute for Human-Centered Artificial Intelligence, University of Bologna, 40121, Bologna, Italy.
Abstract
The potential of deep learning for medical imaging is often constrained by limited data availability. Generative models can unlock this potential by generating synthetic data that reproduce the statistical properties of real data while being easier to share. In this study, we investigated the influence of training set size on the performance of a state-of-the-art generative adversarial network, StyleGAN2-ADA, trained on a cohort of 3,227 subjects from the OpenBHB dataset to generate 2D slices of brain MR images from healthy subjects. The quality of the synthetic images was assessed through qualitative evaluations and state-of-the-art quantitative metrics, which are provided in a publicly accessible repository. Our results demonstrate that StyleGAN2-ADA generates realistic, high-quality images that deceive even expert radiologists, while preserving privacy, as it did not memorize training images. Notably, increasing the training set size led to slight improvements in fidelity metrics. However, training set size had no noticeable impact on diversity metrics, highlighting the persistent limitation of mode collapse. Furthermore, we observed that diversity metrics, such as coverage and β-recall, are highly sensitive to the number of synthetic images used in their computation, yielding inflated values when synthetic images significantly outnumber real ones. These findings underscore the need to interpret diversity metrics carefully and the importance of employing complementary evaluation strategies for robust assessment. Overall, while StyleGAN2-ADA shows promise as a tool for generating privacy-preserving synthetic medical images, overcoming its diversity limitations will require exploring alternative generative architectures or incorporating additional regularization techniques.
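The sensitivity of coverage to the number of synthetic samples can be illustrated with a minimal sketch. It assumes the common k-nearest-neighbour definition of coverage (the fraction of real samples whose k-NN ball contains at least one synthetic sample), which may differ in detail from the implementation used in the study. The Gaussian feature vectors, the narrowed "collapsed" generator, and the names `coverage`, `pairwise_distances`, `fake_small`, and `fake_large` are hypothetical stand-ins, not the study's data or code.

```python
import numpy as np


def pairwise_distances(a, b):
    """Euclidean distances between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def coverage(real, fake, k=5):
    """Fraction of real samples with at least one fake sample inside
    their k-NN ball (assumed definition, see lead-in)."""
    d_rr = pairwise_distances(real, real)
    # Column 0 of the sorted distances is the self-distance (0),
    # so column k is the distance to the k-th nearest real neighbour.
    radii = np.sort(d_rr, axis=1)[:, k]
    nearest_fake = pairwise_distances(real, fake).min(axis=1)
    return float((nearest_fake < radii).mean())


rng = np.random.default_rng(0)

# Toy "real" features: 200 samples in 8 dimensions (illustrative only).
real = rng.normal(size=(200, 8))

# Toy generator with reduced diversity: same centre, half the spread.
fake_large = 0.5 * rng.normal(size=(2000, 8))
# The small set is a subset of the large one, so any difference in
# coverage is due purely to the number of synthetic samples.
fake_small = fake_large[:200]

cov_small = coverage(real, fake_small)
cov_large = coverage(real, fake_large)

print(f"coverage with  200 synthetic samples: {cov_small:.3f}")
print(f"coverage with 2000 synthetic samples: {cov_large:.3f}")
```

Because the nearest synthetic neighbour of a real sample can only get closer as more synthetic samples are added, coverage computed on a superset is never lower than on its subset, even though the generator itself is unchanged. Fixing the ratio of synthetic to real samples is one way to keep such comparisons meaningful.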