Machine learning model for preoperative classification of stromal subtypes in salivary gland pleomorphic adenoma based on ultrasound histogram analysis.
Authors
Affiliations (7)
- Department of Ultrasound, The First Affiliated Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen, 361003, China.
- The First Affiliated Hospital of Xiamen University, 55, Zhenhai Road, Siming District, Xiamen, 361003, China.
- Department of Ultrasound, Zhongshan Hospital (Xiamen), Fudan University, Xiamen, 361015, China.
- Department of Ultrasound, The First Affiliated Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen, 361003, China.
- Department of Pathology, The First Affiliated Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen, 361003, China.
- Department of Ultrasound, The First Affiliated Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen, 361003, China.
- The First Affiliated Hospital of Xiamen University, 55, Zhenhai Road, Siming District, Xiamen, 361003, China.
Abstract
Accurate preoperative discrimination of stromal subtypes in salivary gland pleomorphic adenoma (SPA) is essential for therapeutic planning. We aimed to establish and test machine learning (ML) models for classifying SPA stromal subtypes based on ultrasound histogram analysis. A total of 256 patients with SPA were enrolled and categorized into two groups: stroma-low and stroma-high. The dataset was split into a training cohort of 177 patients and a validation cohort of 79 patients. Least absolute shrinkage and selection operator (LASSO) regression identified the optimal features, which were then used to build predictive models with logistic regression (LR) and eight ML algorithms. Model performance was evaluated with a range of metrics, with a particular focus on the area under the receiver operating characteristic curve (AUC). LASSO regression selected six key features (lesion size, shape, cystic areas, vascularity, mean, and skewness) for model development. The AUCs of the nine models ranged from 0.575 to 0.827. The support vector machine (SVM) algorithm performed best, with an AUC of 0.827, an accuracy of 0.798, a precision of 0.792, a recall of 0.862, and an F1 score of 0.826. The LR algorithm also performed well, with an AUC of 0.818, only slightly below that of the SVM. Decision curve analysis indicated that the SVM-based model provided greater clinical utility than the other models. The ML model based on ultrasound histogram analysis offers a precise and non-invasive approach for preoperative categorization of stromal subtypes in SPA.
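As a rough illustration of two quantities mentioned in the abstract, the sketch below computes the mean and skewness of a gray-level distribution and recomputes the F1 score from the reported precision and recall. It assumes that "mean" and "skewness" refer to first-order statistics of the gray levels inside a lesion region of interest, which the abstract does not spell out. The function names and the sample pixel values are hypothetical, and this is not the authors' implementation.

```python
import math
from typing import Dict, Sequence


def histogram_features(gray_levels: Sequence[float]) -> Dict[str, float]:
    """Return the mean and (population) skewness of a set of gray levels.

    Assumption: the input is a flat list of pixel intensities taken from
    a lesion region of interest; the study's exact definition may differ.
    """
    n = len(gray_levels)
    if n == 0:
        raise ValueError("gray_levels must not be empty")
    mean = sum(gray_levels) / n
    variance = sum((g - mean) ** 2 for g in gray_levels) / n
    std = math.sqrt(variance)
    # Skewness is undefined for a constant region; report 0.0 in that case.
    if std == 0:
        skewness = 0.0
    else:
        skewness = sum((g - mean) ** 3 for g in gray_levels) / (n * std ** 3)
    return {"mean": mean, "skewness": skewness}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical gray levels with a bright tail (not data from the study).
roi = [40, 42, 45, 47, 50, 52, 55, 90, 120]
features = histogram_features(roi)

# Precision and recall as reported for the SVM model in the abstract.
reported_f1 = f1_score(0.792, 0.862)
print(features)
print(round(reported_f1, 3))  # 0.826, consistent with the reported F1 score
```

The F1 value recomputed from the reported precision (0.792) and recall (0.862) rounds to 0.826, which agrees with the figure given for the SVM model.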