Counterfactual Reasoning for Mammogram Classification via Semantic Texture Masking.
Affiliations (4)
- Department of Computer Science and Engineering, Manipal University Jaipur, Jaipur, Rajasthan, 303007, India.
- Department of Radiology, University of Pittsburgh, Pittsburgh, PA, 15213, USA.
- Department of Radiology, University of Pittsburgh, Pittsburgh, PA, 15213, USA.
- Department of Industrial Engineering, University of Pittsburgh, Pittsburgh, PA, 15213, USA.
Abstract
Artificial intelligence-based computer-aided diagnosis (CADx) systems are increasingly adopted in mammography, yet the limited interpretability of their decision-making remains a barrier to clinical trust. This study used counterfactual reasoning, based on semantic masking of mammogram texture, to investigate whether deep learning classifiers rely primarily on the characteristics of lesions or on the surrounding breast tissue. We modified the mammograms by selectively removing texture information from either the lesion (foreground, FG) or the non-lesion (background, BG) region and replacing it with the mean image intensity, yielding four scenarios: FG or BG alteration in benign or malignant cases. MobileNet, ResNet50, and ResNet50v2 were trained and evaluated on the CBIS-DDSM dataset, and classification performance was assessed with the area under the ROC curve (AUC). On the original, unaltered test set, all models performed similarly (AUCs = 0.74, 0.72, and 0.78; pairwise p-values > 0.05). Performance differed markedly across the four masking scenarios. When background texture was removed from malignant cases, ResNet50 failed completely, falling well below chance level (AUC = 0.20, p < 0.0001), which indicates a strong dependence on background context and difficulty focusing on subtle lesion features. ResNet50v2 was more robust to the same alteration (AUC = 0.53, p < 0.0001), although its performance was still severely degraded, suggesting better preservation of lesion-level information. MobileNet remained relatively stable across all masking scenarios, indicating robustness to region-specific changes. Understanding such region-specific dependencies can enhance model interpretability and support the development of more robust and reliable CADx systems for clinical use.
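The masking step and the AUC evaluation described in the abstract can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical; the mean is assumed to be computed over the whole original image; "background" is taken to be every pixel outside the lesion mask (a real pipeline might restrict it to breast tissue); and in each scenario only images of the targeted class are assumed to be altered. A rank-based (Mann-Whitney) AUC is included to show why a value such as 0.20 lies below the 0.5 chance level: benign cases outscore malignant ones in most benign/malignant pairs.

```python
import numpy as np


def semantic_texture_mask(image, lesion_mask, region):
    """Replace texture in the FG (lesion) or BG (non-lesion) region with the mean intensity.

    Assumptions (hypothetical): the mean is taken over the whole original image,
    and BG is every pixel outside the lesion mask.
    """
    image = np.asarray(image, dtype=np.float64)
    fg = np.asarray(lesion_mask, dtype=bool)
    if image.shape != fg.shape:
        raise ValueError("image and lesion_mask must have the same shape")
    if region not in ("FG", "BG"):
        raise ValueError("region must be 'FG' or 'BG'")
    target = fg if region == "FG" else ~fg
    out = image.copy()  # never modify the caller's array
    out[target] = image.mean()  # a constant fill removes all texture in the region
    return out


# Four scenarios as (region to mask, class whose images are altered); 0 = benign, 1 = malignant.
SCENARIOS = [("FG", 0), ("BG", 0), ("FG", 1), ("BG", 1)]


def build_scenario(images, masks, labels, region, target_label):
    """Alter only images of `target_label`; other images pass through unchanged (assumption)."""
    return [
        semantic_texture_mask(img, m, region) if y == target_label
        else np.asarray(img, dtype=np.float64)
        for img, m, y in zip(images, masks, labels)
    ]


def rank_auc(scores, labels):
    """AUC as the fraction of (malignant, benign) pairs ranked correctly; ties count 0.5."""
    scores = np.asarray(scores, dtype=np.float64)
    labels = np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]  # every malignant score minus every benign score
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


# Usage: a perfect ranking gives 1.0; a fully inverted ranking gives 0.0.
print(rank_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0
```

In practice the altered test images would be passed through each trained classifier and the resulting scores compared with `rank_auc` (or an equivalent library routine) against the scores on the unaltered set.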