Machine learning-based classification of histological subtypes of invasive breast cancer using MRI contralateral breast texture features.
Authors
Affiliations (3)
- Department of Radiography/Radiotherapy, Faculty of Allied Health Sciences, University of Peradeniya, Peradeniya, 20400, Sri Lanka.
- Department of Radiology, Faculty of Medicine, University of Peradeniya, Peradeniya, 20400, Sri Lanka.
- Department of Radiography/Radiotherapy, Faculty of Allied Health Sciences, University of Peradeniya, Peradeniya, 20400, Sri Lanka.
Abstract
Invasive Breast Cancer (IBC), encompassing Invasive Ductal Carcinoma (IDC) and Invasive Lobular Carcinoma (ILC), is the most prevalent cancer in women. This study aimed to develop a machine learning (ML) model that distinguishes between these two histological subtypes by analyzing glandular texture features from the contralateral breast. T1-weighted pre-contrast MRI images were sourced from The Cancer Imaging Archive and segmented in 3D Slicer, yielding a dataset of 2444 slices (1890 IDC, 554 ILC). First-order and gray-level co-occurrence matrix (GLCM) texture features were extracted using MATLAB. Feature selection via the ANOVA F-test identified correlation (0.1233) and mean (0.5335) as the least significant features. Nevertheless, the initial model trained on all features achieved an accuracy of 0.9038, suggesting that all extracted features are important. To address the class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was applied, creating balanced training (80%) and testing (20%) subsets. Several ML algorithms were tested, and the Random Forest classifier achieved the highest cross-validation scores on both the SMOTE (0.8723 ± 0.0209) and original (0.8989 ± 0.0224) datasets. The final model achieved an accuracy of 91% on the original dataset and 87% on the SMOTE dataset. These findings may support early diagnosis and contribute a new approach, texture analysis of the contralateral breast, to the literature.
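The feature-extraction and feature-ranking steps described in the abstract can be illustrated with a minimal sketch. The study itself used MATLAB; the Python below is only an illustration written with the standard library, and the function names (`glcm`, `glcm_features`, `first_order_mean`, `anova_f`), the single horizontal pixel offset, the symmetric normalisation, and the 4-level toy image are all assumptions made for the example, not details taken from the paper.

```python
def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Normalised gray-level co-occurrence matrix for one pixel offset.

    `image` is a list of rows of integer gray levels in [0, levels).
    The offset (dx, dy) and the symmetric counting are assumptions.
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:
                    counts[j][i] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Contrast, correlation, energy and homogeneity of a normalised GLCM."""
    idx = [(i, j) for i in range(len(p)) for j in range(len(p))]
    mu_i = sum(i * p[i][j] for i, j in idx)
    mu_j = sum(j * p[i][j] for i, j in idx)
    var_i = sum((i - mu_i) ** 2 * p[i][j] for i, j in idx)
    var_j = sum((j - mu_j) ** 2 * p[i][j] for i, j in idx)
    cov = sum((i - mu_i) * (j - mu_j) * p[i][j] for i, j in idx)
    # Correlation is undefined for a constant image (zero variance).
    if var_i > 0 and var_j > 0:
        correlation = cov / (var_i * var_j) ** 0.5
    else:
        correlation = float("nan")
    return {
        "contrast": sum((i - j) ** 2 * p[i][j] for i, j in idx),
        "correlation": correlation,
        "energy": sum(p[i][j] ** 2 for i, j in idx),
        "homogeneity": sum(p[i][j] / (1 + abs(i - j)) for i, j in idx),
    }


def first_order_mean(image):
    """First-order mean intensity of the region."""
    pixels = [v for row in image for v in row]
    return sum(pixels) / len(pixels)


def anova_f(groups):
    """One-way ANOVA F-statistic of one feature across class groups."""
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        return float("inf")  # classes perfectly separated by this feature
    return (ss_between / (len(groups) - 1)) / (ss_within / (n_total - len(groups)))


# Toy 4-level image, illustrative only (not study data).
DEMO_IMAGE = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

if __name__ == "__main__":
    print(glcm_features(glcm(DEMO_IMAGE, levels=4)))
    print(first_order_mean(DEMO_IMAGE))
    # Hypothetical feature values for two classes (e.g. IDC vs ILC slices).
    print(anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]))
```

In a pipeline like the one described, each segmented slice would yield one such feature vector, and a feature with a low F-statistic (or a high p-value) across the IDC and ILC groups would be ranked as less discriminative. The abstract does not state which of these quantities the reported values 0.1233 and 0.5335 represent, so this sketch makes no claim about them.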