Radiological and biological dictionary of radiomics features: addressing understandable AI issues in personalized breast cancer; dictionary version BM1.0.
Authors
Affiliations (7)
- Department of Neuroscience, Hamadan University of Medical Sciences, QFQQ+W7C Daneshgah-e-Bu Ali Sina, Hamedan, 6517838678, Iran (the Islamic Republic of).
- Department of Neuroscience, Hamadan University of Medical Sciences, QFQQ+W7C Daneshgah-e-Bu Ali Sina, Hamadan, Hamadan Province, 6517838678, Iran (the Islamic Republic of).
- Department of Computer Engineering, Amirkabir University of Technology, Hafez Ave, Valiasr Square, Tehran, Iran, Tehran, 1591634311, Iran (the Islamic Republic of).
- Department of Integrative Oncology, Motamed Cancer Institute, No. 146, South Gandi St., Vanak Sq., Tehran, 1517964311, Iran (the Islamic Republic of).
- Department of Radiology, The University of British Columbia, 6200 University Blvd, Vancouver, British Columbia, V6T 1Z4, CANADA.
- Radiology and Physics, The University of British Columbia, 6200 University Blvd, Vancouver, British Columbia, V6T 1Z4, CANADA.
- Department of Radiology, The University of British Columbia - Vancouver Campus, 6200 University Blvd, Vancouver, British Columbia, V6T 1Z4, CANADA.
Abstract
Radiomics-based AI models show potential in breast cancer diagnosis but lack interpretability. This study bridges the gap between radiomic features (RFs) and BI-RADS descriptors through a clinically interpretable framework.
Methods: We developed a dual-dictionary approach. First, a Clinical Mapping Dictionary (CMD) was constructed by mapping 56 RFs to BI-RADS descriptors (shape, margin, internal enhancement) based on literature and expert review. Second, we applied this framework to a classification task predicting triple-negative breast cancer (TNBC) versus non-TNBC subtypes, using dynamic contrast-enhanced MRI (DCE-MRI) data from a multi-institutional cohort of 1,549 patients. We trained 27 machine learning classifiers, each combined with 27 feature selection methods. Using SHapley Additive exPlanations (SHAP), we interpreted the model's predictions and developed a Statistical Mapping Dictionary (SMD) for 51 RFs that were not included in the CMD.
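As a rough illustration of the kind of pipeline named in the Methods, the sketch below pairs a variance-inflation-factor (VIF) filter with an Extra Trees classifier under stratified cross-validation. It is not the authors' implementation: the synthetic data, the VIF threshold of 10, the helper names `vif` and `vif_select`, and all hyperparameters are assumptions made for this example, and the SHAP step is omitted.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def vif(X):
    """Variance inflation factor of each column: 1 / (1 - R^2), where R^2 comes
    from regressing that column on all the other columns."""
    Xc = X - X.mean(axis=0)  # centering stands in for an intercept term
    out = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        target = Xc[:, j]
        others = np.delete(Xc, j, axis=1)
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid.dot(resid) / target.dot(target)
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


def vif_select(X, threshold=10.0):
    """Drop the column with the largest VIF until every VIF is <= threshold.
    The threshold is an assumed value; the source does not state one."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        scores = vif(X[:, keep])
        worst = int(np.argmax(scores))
        if scores[worst] <= threshold:
            break
        keep.pop(worst)
    return keep


# Synthetic stand-in for a radiomics feature matrix (hypothetical data).
rng = np.random.default_rng(0)
n = 300
base = rng.normal(size=(n, 4))
near_copy = base[:, [0]] + 0.01 * rng.normal(size=(n, 1))  # almost collinear with column 0
X = np.hstack([base, near_copy])
y = (base[:, 0] + 0.5 * base[:, 1] + 0.3 * rng.normal(size=n) > 0).astype(int)

keep = vif_select(X)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
clf = ExtraTreesClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X[:, keep], y, cv=cv, scoring="accuracy")
print(f"kept columns: {keep}")
print(f"CV accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```

The VIF filter ignores the labels, so running it once before cross-validation keeps the sketch short. A supervised selector would need to be fitted inside each fold to avoid leakage.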
Results: The best-performing model (Variance Inflation Factor feature selector + Extra Trees Classifier) achieved an average cross-validation accuracy of 0.83 ± 0.02. Our dual-dictionary approach successfully translated predictive RFs into understandable clinical concepts. For example, higher values of 'Sphericity', corresponding to a round/oval shape, were predictive of TNBC. Similarly, lower values of 'Busyness', indicating more homogeneous internal enhancement, were also associated with TNBC, aligning with existing clinical observations. This framework confirmed known imaging biomarkers and identified novel, data-driven quantitative features.
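The translation step described above can be pictured as a lookup from a feature name and the direction of its effect to a clinical reading. The following is a minimal sketch, not the published BM1.0 dictionary: it contains only the two examples stated in the Results, and the structure `DICTIONARY` and the function `interpret` are hypothetical names introduced for illustration. Directions the abstract does not report are left as `None` rather than guessed.

```python
from typing import Dict, Optional

# Two illustrative entries taken from the Results text; every other
# reading is left as None because the abstract does not state it.
DICTIONARY: Dict[str, Dict[str, Optional[str]]] = {
    "Sphericity": {
        "family": "shape",
        "high": "round/oval shape",
        "low": None,
    },
    "Busyness": {
        "family": "internal enhancement",
        "high": None,
        "low": "more homogeneous internal enhancement",
    },
}


def interpret(feature: str, direction: str) -> str:
    """Translate a feature and its value direction ('high' or 'low')
    into a BI-RADS-style reading, if the sketch dictionary has one."""
    if direction not in ("high", "low"):
        raise ValueError("direction must be 'high' or 'low'")
    entry = DICTIONARY.get(feature)
    if entry is None:
        return f"{feature}: no dictionary entry"
    reading = entry[direction]
    if reading is None:
        return f"{direction} {feature}: no reading recorded in this sketch"
    return f"{direction} {feature} -> {reading} (BI-RADS {entry['family']})"


print(interpret("Sphericity", "high"))
print(interpret("Busyness", "low"))
```

In practice the direction would come from the sign of a feature's SHAP contribution, which this sketch takes as given.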
Conclusion: This study introduces a novel dual-dictionary framework (BM1.0) that bridges RFs and the BI-RADS clinical lexicon. By enhancing the interpretability and transparency of AI models, the framework supports greater clinical trust and paves the way for integrating RFs into breast cancer diagnosis and personalized care.