Progression risk of adolescent idiopathic scoliosis based on SHAP-Explained machine learning models: a multicenter retrospective study.
Authors
Affiliations (10)
- The Fourth School of Clinical Medicine, Zhejiang Chinese Medical University, Hangzhou First People's Hospital, Hangzhou, China.
- The First People's Hospital of Yuhang District, Hangzhou, China.
- Hangzhou Children's Hospital, Hangzhou, China.
- The First Affiliated Hospital of Ningbo University, Ningbo, China.
- Department of Radiology, Tongde Hospital of Zhejiang Province, Hangzhou, China.
- Children's Hospital, Zhejiang University School of Medicine, Hangzhou, China.
- Department of Radiology, Affiliated Hangzhou First People's Hospital, School of Medicine, Westlake University, Hangzhou, China.
- Department of Radiology, The First Affiliated Hospital of Xiamen University, Xiamen, Fujian Province, People's Republic of China.
- The Fourth School of Clinical Medicine, Zhejiang Chinese Medical University, Hangzhou First People's Hospital, Hangzhou, China.
- Children's Hospital, Zhejiang University School of Medicine, Hangzhou, China.
Abstract
This study aimed to develop an interpretable machine learning model, explained using SHAP (SHapley Additive exPlanations), that predicts the risk of curve progression in adolescent idiopathic scoliosis from imaging features extracted by convolutional neural networks (CNNs), and to identify the most accurate predictive model.

The study included 233 patients with adolescent idiopathic scoliosis from three medical centers. CNNs were used to extract features from full-spine coronal X-ray images taken at three follow-up points for each patient. Imaging and clinical features from center 1 were analyzed using the Boruta algorithm to identify independent predictors. Data from center 1 were split into training (80%) and testing (20%) sets, and data from centers 2 and 3 served as external validation sets. Six machine learning models were constructed. Receiver operating characteristic (ROC) curves were plotted, and model performance was assessed by the area under the curve (AUC), accuracy, sensitivity, and specificity in the training, testing, and external validation sets. SHAP was used to interpret the best-performing model.

Across the six models, AUCs ranged from 0.565 to 0.989, accuracies from 0.600 to 0.968, sensitivities from 0.625 to 1.0, and specificities from 0.571 to 0.974. The XGBoost model performed best, with an AUC of 0.896 in the external validation set. SHAP analysis identified the change in the main Cobb angle between the second and first follow-ups [Cobb1(2−1)] as the most important predictor, followed by the main Cobb angle at the second follow-up (Cobb1-2) and the change in the secondary Cobb angle [Cobb2(2−1)].

The XGBoost model showed the best predictive performance in the external validation cohort, providing preliminary evidence of its stability and generalizability. SHAP analysis indicated that Cobb1(2−1) was the most important feature for predicting scoliosis progression. The model offers a tool for clinical decision-making, enabling early identification of high-risk patients and supporting early intervention through automated feature extraction and interpretable analysis.

Supplementary material is available in the online version at 10.1186/s12891-025-08841-3.
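As an illustration of the four performance measures reported above (AUC, accuracy, sensitivity, specificity), the following is a minimal Python sketch and not the authors' code. The function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for this example. The AUC is computed as the probability that a randomly chosen progressing case receives a higher score than a randomly chosen non-progressing case, which is equivalent to the area under the ROC curve.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # A tie between a positive and a negative score counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity at a fixed decision threshold.

    The 0.5 default is an assumption for this sketch, not a value from the study.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


if __name__ == "__main__":
    # Toy data: 1 = curve progressed, 0 = no progression (made up for illustration).
    labels = [1, 1, 1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
    print(roc_auc(labels, scores))
    print(threshold_metrics(labels, scores))
```

The AUC does not depend on a threshold, whereas accuracy, sensitivity, and specificity do, so a sketch like this would be run separately on the training, testing, and external validation sets with the same threshold to make the values comparable.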