Interpretable machine learning for Parkinson's disease diagnosis, staging, and biological mechanism exploration: a multicenter analysis.
Affiliations (3)
- Department of Medical Imaging Center, The Second Affiliated Hospital of Xinjiang Medical University, Urumqi, 830011, China.
- Department of Neurology, The Second Affiliated Hospital of Xinjiang Medical University, Urumqi, 830011, China.
- Department of Neurology, The Affiliated Renji Hospital of Shanghai Jiao Tong University, Shanghai, 200000, China.
Abstract
To develop and validate an interpretable machine learning model, based on multicenter T1-weighted MRI radiomics data, for three-way classification of Parkinson's disease (PD) status (normal [NM], early-stage, and mid-to-late-stage); to clarify the diagnostic value of key subcortical nuclear features; and to elucidate their molecular basis through proteomic association analysis.

A total of 200 participants from multiple centers were included and divided into a training set (n = 76), an internal validation set (n = 33), and an external validation set (n = 91). Six core nuclei, namely the caudate nucleus (CN), putamen (PUT), globus pallidus (GP), red nucleus (RN), substantia nigra (SN), and nucleus accumbens (NAc), were segmented using the DKT template, and 107 radiomics features were extracted per nucleus. Key features were identified through a three-step dimensionality reduction (variance thresholding, univariate selection, and multi-class LASSO). Five machine learning models were constructed, and their performance was evaluated using micro- and macro-averaged AUC (micro-AUC, macro-AUC) and accuracy (ACC). SHAP analysis was used to quantify the contribution of individual features to each diagnostic prediction. In addition, proteomic analysis was performed on a subset of participants from Center 1 to screen for differentially expressed proteins (DEPs) among groups, and correlations between DEPs and key radiomic features were analyzed to explore the underlying biological mechanisms.

Nine key features were selected, predominantly gray-level non-uniformity features of the PUT. The GBT model performed best, with macro-AUC/micro-AUC of 0.890/0.902 in the internal validation set and 0.876/0.869 in the external validation set. SHAP analysis indicated that predictions for the NM group relied primarily on texture features of the PUT and NAc, that early-stage PD was characterized by structural changes in the GP and PUT, and that mid-to-late-stage PD was associated with features of the SN and RN.
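The three-step feature selection and the macro/micro-AUC evaluation described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, an ANOVA F-test stands in for the unspecified univariate test, an L1-penalized multinomial logistic regression stands in for "multi-class LASSO", scikit-learn's `GradientBoostingClassifier` is assumed for GBT, and the thresholds (`threshold=0.01`, `k_best=20`, `l1_c=0.5`) and the helper names `build_pipeline` and `macro_micro_auc` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import (
    SelectFromModel, SelectKBest, VarianceThreshold, f_classif,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, label_binarize


def build_pipeline(k_best=20, l1_c=0.5, seed=0):
    """Hypothetical three-step selection followed by a gradient boosting classifier."""
    # Assumed stand-in for "multi-class LASSO": L1-penalized multinomial logistic regression.
    lasso = LogisticRegression(penalty="l1", solver="saga", C=l1_c,
                               max_iter=5000, random_state=seed)
    return Pipeline([
        ("variance", VarianceThreshold(threshold=0.01)),   # step 1: drop near-constant features
        ("scale", StandardScaler()),                       # common scale before L1 penalty
        ("univariate", SelectKBest(f_classif, k=k_best)),  # step 2: ANOVA F-test ranking
        ("lasso", SelectFromModel(lasso)),                 # step 3: keep non-zero L1 coefficients
        ("gbt", GradientBoostingClassifier(random_state=seed)),
    ])


def macro_micro_auc(y_true, proba, classes):
    """One-vs-rest macro-AUC and pooled micro-AUC for a multi-class problem."""
    macro = roc_auc_score(y_true, proba, multi_class="ovr",
                          average="macro", labels=classes)
    y_bin = label_binarize(y_true, classes=classes)  # one indicator column per class
    micro = roc_auc_score(y_bin, proba, average="micro")
    return macro, micro


# Synthetic stand-in for the radiomics matrix: 200 subjects, 107 features,
# 3 classes (0 = NM, 1 = early PD, 2 = mid-to-late PD). Illustrative data only.
X, y = make_classification(n_samples=200, n_features=107, n_informative=9,
                           n_redundant=10, n_classes=3, n_clusters_per_class=1,
                           class_sep=1.5, random_state=0)

# Random split mirroring the reported set sizes (76 / 33 / 91); a real external
# set would come from separate centers rather than a random permutation.
idx = np.random.default_rng(0).permutation(len(y))
train, internal, external = idx[:76], idx[76:109], idx[109:]

model = build_pipeline().fit(X[train], y[train])

results = {}
for name, part in (("internal", internal), ("external", external)):
    proba = model.predict_proba(X[part])
    macro, micro = macro_micro_auc(y[part], proba, model.classes_)
    acc = accuracy_score(y[part], model.predict(X[part]))
    results[name] = {"macro_auc": macro, "micro_auc": micro, "acc": acc}
    print(f"{name}: macro-AUC={macro:.3f} micro-AUC={micro:.3f} ACC={acc:.3f}")
```

Wrapping the selection steps and the classifier in a single `Pipeline` means every selection step is fitted on the training subjects only, so the validation AUCs are not inflated by information leaking from held-out data. Macro-AUC averages the three one-vs-rest AUCs with equal weight, whereas micro-AUC pools all one-vs-rest decisions before computing a single curve.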
Proteomic analysis identified 514 DEPs between early-stage PD and controls (411 upregulated, 103 downregulated), enriched in cytoskeletal and immune pathways (FDR < 0.05), and 123 DEPs between mid-to-late-stage and early-stage PD (26 upregulated, 97 downregulated), enriched in proteasome and neurodegeneration pathways (FDR < 0.05). Multiple significant correlations between key radiomic features and DEPs were identified (all FDR-corrected P < 0.01), including original_gldm_DependenceNonUniformity_PUT with HGFAC (r = 0.510), original_ngtdm_Contrast_GP with PLAT (r = 0.453), and original_glrlm_RunEntropy_GP_QSM with both ASPN (r = 0.62) and TFPI (r = -0.58).

Interpretable machine learning models based on multicenter T1-weighted nuclear radiomics features can accurately diagnose and stage PD. Integrating radiomics with proteomics enhances model interpretability by linking imaging features to underlying biological mechanisms. The approach provides an objective imaging basis for clinical practice and shows strong generalizability and potential for clinical translation.
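The feature-protein correlation step with FDR control can be sketched as follows. The abstract does not state which correlation coefficient was used or how the FDR family was defined, so this sketch assumes Pearson's r (`scipy.stats.pearsonr`) and Benjamini-Hochberg correction across all feature-protein pairs. The names `feature_1`, `protein_A`, `protein_B`, `benjamini_hochberg`, and `correlate_features_with_proteins` are hypothetical, the values are synthetic, and the cut-off of 0.01 simply mirrors the reported threshold.

```python
import numpy as np
from scipy.stats import pearsonr


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # p_(i) * n / i for the ascending-sorted p-values
    ranked = p[order] * n / np.arange(1, n + 1)
    # step-up: each adjusted value is the minimum over itself and all larger ranks
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def correlate_features_with_proteins(features, proteins, alpha=0.01):
    """Return (feature, protein, r, q) for every pair with BH-adjusted q < alpha."""
    pairs, r_values, p_values = [], [], []
    for f_name, f_vals in features.items():
        for p_name, p_vals in proteins.items():
            r, p = pearsonr(f_vals, p_vals)
            pairs.append((f_name, p_name))
            r_values.append(r)
            p_values.append(p)
    q_values = benjamini_hochberg(p_values)
    return [(f_name, p_name, r, q)
            for (f_name, p_name), r, q in zip(pairs, r_values, q_values)
            if q < alpha]


# Synthetic example: 30 subjects, one radiomic feature, two protein abundances.
rng = np.random.default_rng(0)
feature = rng.normal(size=30)
features = {"feature_1": feature}
proteins = {
    "protein_A": 0.8 * feature + 0.3 * rng.normal(size=30),  # constructed to correlate
    "protein_B": rng.normal(size=30),                         # independent noise
}
hits = correlate_features_with_proteins(features, proteins)
for f_name, p_name, r, q in hits:
    print(f"{f_name} ~ {p_name}: r = {r:.2f}, FDR-adjusted P = {q:.2g}")
```

Correcting across every feature-protein pair is one reasonable choice of FDR family; correcting per feature or per protein would be less conservative, and the abstract does not say which was applied.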