Developing ultrasound-based machine learning models for accurate differentiation between sclerosing adenosis and invasive ductal carcinoma.
Authors
Affiliations (10)
- Center of Scientific Research, Maoming People's Hospital, Maoming, China.
- Department of Medical Imaging, Affiliated Hospital of Jilin Medical University, Jilin, China.
- The Guangxi Engineering Research Center of Digital Medicine and Clinical Translation, The Affiliated Hospital of Guilin Medical University, Guilin, China.
- Department of Medical Ultrasound, Tongji Hospital of Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.
- Department of General Surgery, Maoming People's Hospital, Maoming, China.
- Department of General Surgery, The First Affiliated Hospital of Jiamusi University, Jiamusi, China.
- School of Medical Imaging, Mudanjiang Medical University, Mudanjiang, China.
- Department of Ultrasound, The Affiliated Hospital of Guilin Medical University, Guilin, China.
- Department of Anesthesiology, Second Affiliated Hospital of Guangdong Medical University, Zhanjiang, China. [email protected].
- Center of Scientific Research, Maoming People's Hospital, Maoming, China. [email protected].
Abstract
This study aimed to develop a machine learning model using breast ultrasound images to improve the non-invasive differential diagnosis between sclerosing adenosis (SA) and invasive ductal carcinoma (IDC). A total of 2046 ultrasound images from 772 patients with SA or IDC were collected, regions of interest (ROIs) were delineated, and features were extracted. The dataset was split into training and test cohorts, and feature selection was performed using correlation coefficients and Recursive Feature Elimination. Ten classifiers were trained with grid search and 5-fold cross-validation. Receiver operating characteristic (ROC) curves and the Youden index were used for model evaluation. SHapley Additive exPlanations (SHAP) was employed for model interpretation. Another 224 ROIs from 84 patients at other hospitals were used for external validation. For the ROI-level model, XGBoost with 18 features achieved an area under the curve (AUC) of 0.9758 (0.9654-0.9847) in the test cohort and 0.9906 (0.9805-0.9973) in the validation cohort. For the patient-level model, logistic regression with 9 features achieved an AUC of 0.9653 (0.9402-0.9859) in the test cohort and 0.9846 (0.9615-0.9978) in the validation cohort. The feature "Original shape Major Axis Length" was identified as the most important, with higher values associated with a higher likelihood of the sample being IDC. Feature contributions for specific ROIs were also visualized. We developed explainable, ultrasound-based machine learning models with high performance for differentiating SA and IDC, offering a potential non-invasive tool for improved differential diagnosis.
Question Accurately and non-invasively distinguishing between sclerosing adenosis (SA) and invasive ductal carcinoma (IDC) has been a diagnostic challenge.
Findings Explainable, ultrasound-based machine learning models with high performance were developed for differentiating SA and IDC, and they performed well in an external validation cohort.
Critical relevance These models provide non-invasive tools to reduce misdiagnoses of SA and improve early detection for IDC.
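The evaluation step named in the abstract (ROC analysis with the Youden index) can be illustrated with a minimal, dependency-free sketch. This is not the authors' code: the function names, the label coding (1 = IDC, 0 = SA), and the toy scores are assumptions made only for illustration, and the numbers below are not study data.

```python
# Illustrative sketch only: the labels and scores are made-up toy values,
# with 1 assumed to mean IDC and 0 assumed to mean SA.

def roc_points(y_true, scores):
    """Return (threshold, sensitivity, specificity) for each distinct score,
    classifying a sample as positive when its score is >= the threshold."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((t, tp / n_pos, 1 - fp / n_neg))
    return points

def youden_threshold(y_true, scores):
    """Pick the threshold maximizing J = sensitivity + specificity - 1."""
    t, sens, spec = max(roc_points(y_true, scores),
                        key=lambda p: p[1] + p[2] - 1)
    return t, sens + spec - 1

def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative (ties count one half); equivalent to the area under the ROC curve."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 5 hypothetical SA samples (0) and 4 hypothetical IDC samples (1).
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

threshold, j_stat = youden_threshold(y_true, scores)
print(f"AUC = {auc(y_true, scores):.2f}")
print(f"Youden-optimal threshold = {threshold}, J = {j_stat:.2f}")
```

On this toy input the Youden-optimal cutoff is 0.4 (sensitivity 1.0, specificity 0.8), and the AUC is 0.95. In practice a library routine such as scikit-learn's `roc_curve` would typically replace the hand-written loop; the pure-Python form is used here only to keep the arithmetic visible.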