
Evaluating data heterogeneity's impact on convolutional neural network performance in medical imaging.

April 21, 2026

Authors

Valen J, Yang L, Levman J, Jafarpisheh N, Wang C, Tyrrell PN

Affiliations (6)

  • Department of Medical Imaging, University of Toronto, Toronto, ON, M5T 1W7, Canada.
  • Department of Computer Science, St. Francis Xavier University, Antigonish, NS, Canada.
  • Nova Scotia Health Authority, Halifax, NS, Canada.
  • Department of Medical Imaging, University of Toronto, Toronto, ON, M5T 1W7, Canada.
  • Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada.
  • Institute of Medical Science, University of Toronto, Toronto, ON, Canada.

Abstract

Machine learning in medical imaging (MIML) is critical to computer-aided diagnostics. However, data heterogeneity (variation in medical data across sources and conditions) remains underexplored, despite its impact on model generalizability. This study investigates how data heterogeneity influences the performance of convolutional neural networks (CNNs), providing novel insights into optimizing model reliability and clinical applicability across diverse imaging datasets. Four medical imaging datasets representing different pathologies were used to evaluate heterogeneity's effect, with a fifth neuroimaging dataset included for exploratory analysis. CNN-extracted features were clustered to identify internal data groupings. Model performance on these clusters was evaluated using k-fold cross-validation. Inter-cluster distances were measured to approximate heterogeneity and compared against random clusters. Accuracy, F1 score, and the coefficient of variation of accuracy (CVA) were used as primary performance indicators across training set sizes. Increased inter-cluster distance generally corresponded with lower model performance and higher variability. As training set size increased, inter-cluster distance decreased, and accuracy and F1 score improved. Data augmentation was associated with reduced inter-cluster distance in some settings but did not significantly improve performance. Clusters based on CNN-derived features showed differences in performance variability relative to random clusters, suggesting structured variance related to feature-space organization. Our findings highlight the critical role of addressing data heterogeneity in medical imaging. Larger training sets and feature-driven clustering improve model robustness and consistency. The study emphasizes that explicitly modeling heterogeneity can lead to more generalizable and clinically reliable MIML systems. Future work should focus on scalable approaches to heterogeneity quantification and mitigation in real-world clinical settings.
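
The two quantities the abstract relies on, inter-cluster distance and the coefficient of variation of accuracy (CVA), can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say which distance metric or which standard deviation was used, so the sketch assumes Euclidean distance between cluster centroids and the sample standard deviation across folds. The function names are hypothetical.

```python
import math
import statistics
from itertools import combinations


def coefficient_of_variation(accuracies):
    """CVA as std / mean of per-fold accuracies.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    mean = statistics.mean(accuracies)
    if mean == 0:
        raise ValueError("mean accuracy is zero; CVA is undefined")
    return statistics.stdev(accuracies) / mean


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [statistics.mean(dim) for dim in zip(*points)]


def mean_inter_cluster_distance(clusters):
    """Mean Euclidean distance over all pairs of cluster centroids.

    Assumption: centroid-to-centroid distance stands in for the
    paper's inter-cluster distance, which the abstract does not define.
    """
    centroids = [centroid(c) for c in clusters]
    pairs = list(combinations(centroids, 2))
    if not pairs:
        raise ValueError("need at least two clusters")
    return statistics.mean(math.dist(a, b) for a, b in pairs)
```

In this sketch each cluster is a list of feature vectors (for example, CNN embeddings), and `accuracies` holds one accuracy value per cross-validation fold. A larger `mean_inter_cluster_distance` would correspond to the more heterogeneous settings the abstract describes, and a larger CVA to less consistent performance across folds.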

Topics

Journal Article
