Back to all papers

Mammo-Bench: A Large-scale Benchmark Dataset of Mammography Images

October 14, 2025medrxiv logopreprint

Authors

Bhole, G.,Suseela, S.,Parekh, N.

Affiliations (1)

  • International Institute of Information Technology Hyderabad

Abstract

Breast cancer remains a significant global health concern, and machine learning algorithms and computer-aided detection systems have shown great promise in enhancing the accuracy and efficiency of mammography image analysis. However, there is a critical need for large, benchmark datasets for training deep learning models for breast cancer detection. In this work we developed Mammo-Bench, a large-scale benchmark dataset of mammography images, by collating data from six well-curated resources, viz., DDSM, INbreast, KAU-BCMD, CMMD, CDD-CESM and DMID. To ensure consistency across images from diverse sources while preserving clinically relevant features, a preprocessing pipeline that includes breast segmentation, pectoral muscle removal, and intelligent cropping is proposed. The dataset consists of 19,731 high-quality mammographic images from 6,500 patients across 6 countries and is one of the largest open-source mammography databases to the best of our knowledge. To show the efficacy of training on the large dataset, performance of ResNet101 architecture was evaluated on Mammo-Bench and the results compared by training independently on a few member datasets and an external dataset, VinDr-Mammo. An accuracy of 78.8% (with data augmentation of the minority classes) and 77.8% (without data augmentation) was achieved on the proposed benchmark dataset, compared to the other datasets for which accuracy varied from 25 - 69%. Noticeably, improved prediction of the minority classes is observed with the Mammo-Bench dataset. These results establish baseline performance and demonstrate Mammo-Benchs utility as a comprehensive resource for developing and evaluating mammography analysis systems.

Topics

radiology and imaging

Ready to Sharpen Your Edge?

Subscribe to join 7,600+ peers who rely on RadAI Slice. Get the essential weekly briefing that empowers you to navigate the future of radiology.

We respect your privacy. Unsubscribe at any time.