Visualizing Radiologic Connections: An Explainable Coarse-to-Fine Foundation Model with Multiview Mammograms and Associated Reports.
Affiliations (9)
- GROW School for Oncology and Developmental Biology, Maastricht University Medical Centre, P. Debyelaan 25, 6202 AZ, Maastricht, the Netherlands.
- Department of Radiology, the Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, the Netherlands.
- Department of Radiology and Nuclear Medicine, Radboud University Medical Centre, Geert Grooteplein 10, 6525 GA, Nijmegen, the Netherlands.
- Department of Biomedical Informatics, Harvard Medical School, Boston, USA.
- Department of Diagnostic Imaging, Oncological Radiotherapy and Hematology, Fondazione Policlinico Universitario "A. Gemelli" IRCCS, Rome, Italy.
- Department of Diagnostic and Interventional Radiology, University of Tübingen, Tübingen, Germany.
- Department of Computer Science, The University of Hong Kong, Pokfulam, Hong Kong.
- Department of Radiation Oncology, the Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, the Netherlands.
- Faculty of Applied Sciences, Macao Polytechnic University, 999078, Macao, China.
Abstract
Purpose To develop a foundational pretraining method for digital mammography that extracts fine-grained visual-language representations from images and reports in label-limited settings.

Materials and Methods A multiview mammogram-report pretraining framework for automated breast cancer analysis was developed using retrospectively collected data from January 2010 to December 2020. The framework provides visual explanations of the model's learning, allowing researchers to "visualize what you learn." An abnormality-aware technique was tailored to the mammographic characteristics of dense fibroglandular tissue. The framework was evaluated on downstream tasks from four external medical centers, involving label-efficient abnormality recognition in mammograms, including malignancy classification, segmentation, and localization. Statistical analyses were performed using DeLong's test for the area under the receiver operating characteristic curve and the paired <i>t</i> test for Dice scores.

Results The visualization results, including abnormality-enhanced mammograms and abnormality-awareness maps, showed that the developed model successfully captures the relationships between multiview mammograms and their corresponding reports. This reduced false positives for breast cancer by 37% and enabled zero-shot abnormality segmentation. Furthermore, the developed model consistently outperformed existing approaches after fine-tuning in both malignancy classification (area under the receiver operating characteristic curve, INbreast: 0.90 vs 0.78 [<i>P</i> < .001]; CBIS-DDSM: 0.85 vs 0.79 [<i>P</i> < .01]; CMMD: 0.85 vs 0.78 [<i>P</i> < .001]; and CSAW-CC: 0.86 vs 0.77 [<i>P</i> < .001]) and segmentation/localization (Dice score, INbreast: 0.75 vs 0.63 [<i>P</i> < .001]; CBIS-DDSM: 0.76 vs 0.61 [<i>P</i> < .001]).

Conclusion The proposed framework enhances interpretability and fine-grained multimodal foundational learning for multiview mammograms and reports.
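The two evaluation metrics reported above are standard; as a minimal illustrative sketch (not the authors' evaluation code), the Dice score between binary segmentation masks and the area under the receiver operating characteristic curve via its rank-sum (Mann-Whitney) formulation can be computed as follows:

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention: two empty masks are treated as a perfect match.
    return 2.0 * intersection / total if total else 1.0


def auroc(scores, labels):
    """AUROC via the rank-sum formulation: the fraction of positive-negative
    pairs in which the positive case receives the higher score (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In practice such metrics would come from a library such as scikit-learn; the pure-Python versions here only make the definitions behind the reported numbers explicit.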
©RSNA, 2025.