Improving discriminative ability in mammographic microcalcification classification using deep learning: a novel double transfer learning approach validated with an explainable artificial intelligence technique
Authors
Affiliations (1)
Affiliations (1)
- University of Helsinki and Helsinki University Hospital
Abstract
BackgroundBreast microcalcification diagnostics are challenging due to their subtle presentation, overlapping with benign findings, and high inter-reader variability, often leading to unnecessary biopsies. While deep learning (DL) models - particularly deep convolutional neural networks (DCNNs) - have shown potential to improve diagnostic accuracy, their clinical application remains limited by the need for large annotated datasets and the "black box" nature of their decision-making. PurposeTo develop and validate a deep learning model (DCNN) using a double transfer learning (d-TL) strategy for classifying suspected mammographic microcalcifications, with explainable AI (XAI) techniques to support model interpretability. Material and methodsA retrospective dataset of 396 annotated regions of interest (ROIs) from full-field digital mammography (FFDM) images of 194 patients who underwent stereotactic vacuum-assisted biopsy at the Womens Hospital radiological department, Helsinki University Hospital, was collected. The dataset was randomly split into training and test sets (24% test set, balanced for benign and malignant cases). A ResNeXt-based DCNN was developed using a d-TL approach: first pretrained on ImageNet, then adapted using an intermediate mammography dataset before fine-tuning on the target microcalcification data. Saliency maps were generated using Gradient-weighted Class Activation Mapping (Grad-CAM) to evaluate the visual relevance of model predictions. Diagnostic performance was compared to a radiologists BI-RADS-based assessment, using final histopathology as the reference standard. ResultsThe ensemble DCNN achieved an area under the ROC curve (AUC) of 0.76, with 65% sensitivity, 83% specificity, 79% positive predictive value (PPV), and 70% accuracy. The radiologist achieved an AUC of 0.65 with 100% sensitivity but lower specificity (30%) and PPV (59%). Grad-CAM visualizations showed consistent activation of the correct ROIs, even in misclassified cases where confidence scores fell below the threshold. ConclusionThe DCNN model utilizing d-TL achieved performance comparable to radiologists, with higher specificity and PPV than BI-RADS. The approach addresses data limitation issues and may help reduce additional imaging and unnecessary biopsies.