Contrast-enhanced mammography-based interpretable machine learning model for the prediction of the molecular subtype breast cancers.
Authors
Affiliations (3)
- Department of Radiology, Nanfang Hospital, Southern Medical University, Guangzhou, Guangdong, 510515, China.
- Department of Radiology, Nanfang Hospital, Southern Medical University, Guangzhou, Guangdong, 510515, China.
- Department of Radiology, Nanfang Hospital, Southern Medical University, Guangzhou, Guangdong, 510515, China.
Abstract
This study aims to establish a machine learning prediction model to explore the correlation between contrast-enhanced mammography (CEM) imaging features and the molecular subtypes of mass-type breast cancer. This retrospective study included women with breast cancer who underwent CEM preoperatively between 2018 and 2021. We included 241 patients, who were randomly assigned to either a training or a test set in a 7:3 ratio. Twenty-one features were visually described: four clinical features and seventeen radiological features extracted from the CEM images. Three binary classifications of subtypes were performed: Luminal vs. non-Luminal, HER2-enriched vs. non-HER2-enriched, and triple-negative (TNBC) vs. non-triple-negative. A multinomial naive Bayes (MNB) machine learning scheme was employed for the classification, and the least absolute shrinkage and selection operator (LASSO) method was used to select the most predictive features for the classifiers. Classification performance was evaluated using the area under the receiver operating characteristic curve (AUC). We also used SHapley Additive exPlanation (SHAP) values to explain the prediction model. The model that combined low-energy (LE) and dual-energy subtraction (DES) images performed better than models using either image type alone, yielding an AUC of 0.798 for Luminal vs. non-Luminal subtypes, 0.695 for TNBC vs. non-TNBC, and 0.773 for HER2-enriched vs. non-HER2-enriched. The SHAP analysis shows that "LE_mass_margin_spiculated," "DES_mass_enhanced_margin_spiculated," and "DES_mass_internal_enhancement_homogeneous" have the greatest impact on the model's performance in predicting Luminal and non-Luminal breast cancer, while "mass_calcification_relationship_no," "calcification_type_no," and "LE_mass_margin_spiculated" have a considerable impact on the model's performance in predicting HER2-enriched and non-HER2-enriched breast cancer. The radiological characteristics of breast tumors extracted from CEM were found to be associated with breast cancer subtypes in our study. Future research is needed to validate these findings.