Synthetic Versus Classic Data Augmentation: Impacts on Breast Ultrasound Image Classification.
Authors
Abstract
The effectiveness of deep neural networks (DNNs) for ultrasound image analysis depends on the availability and accuracy of the training data. However, large-scale data collection and annotation, particularly in medical fields, are often costly and time-consuming, especially when healthcare professionals are already burdened with clinical responsibilities. Ensuring that a model remains robust across different imaging conditions, such as variations in ultrasound devices and manual transducer operation, is crucial in ultrasound image analysis. Data augmentation is a widely used solution, as it increases both the size and diversity of datasets, thereby enhancing the generalization performance of DNNs. With the advent of generative networks such as generative adversarial networks (GANs) and diffusion-based models, synthetic data generation has emerged as a promising augmentation technique. However, comprehensive studies comparing classic and generative network-based augmentation methods are lacking, particularly in ultrasound-based breast cancer imaging, where variability in breast density, tumor morphology, and operator skill poses significant challenges. This study aims to compare the effectiveness of classic and generative network-based data augmentation techniques in improving the performance and robustness of breast ultrasound image classification models. Specifically, we seek to determine whether the computational cost of generative networks is justified in data augmentation. This analysis will provide valuable insights into the role and benefits of each technique in enhancing the diagnostic accuracy of DNNs for breast cancer diagnosis. The code for this work will be available at: https://github.com/yasamin-med/SCDA.git.