Development of an explainable machine learning model for 3-year cardiovascular risk prediction in new-onset type 2 diabetes using the TyG index and ultrasound features.
Authors
Affiliations (2)
- Department of Ultrasound, Shaoxing People's Hospital (The First Affiliated Hospital, Shaoxing University), 568 N Zhongxing Rd, Shaoxing, Zhejiang, 312000, China.
- Department of Ultrasound, Shaoxing People's Hospital (The First Affiliated Hospital, Shaoxing University), 568 N Zhongxing Rd, Shaoxing, Zhejiang, 312000, China. [email protected].
Abstract
New-onset type 2 diabetes (T2D) is associated with increased cardiovascular risk and requires tailored prevention strategies. Traditional risk factors and assessment tools may not accurately predict cardiovascular disease (CVD) in this population. In this study, we compared different machine learning (ML) methods for predicting the 3-year risk of developing CVD in patients with new-onset T2D and developed models that combine clinical data with ultrasound features for better risk evaluation. A total of 3,358 hospitalized T2D patients were screened. ML models were developed and evaluated. Feature selection was conducted via SHapley Additive exPlanations (SHAP) and recursive feature elimination to improve both the model’s performance and its interpretability. The optimal model was then compared with the Framingham Risk Score (FRS) and, finally, used for risk stratification. Of the ML models developed, LightGBM, which incorporated six features (hypertension, age, the triglyceride-glucose (TyG) index, plaque burden, maximum plaque thickness, and intima-media thickness), achieved robust performance (AUC 0.845 in the training cohort and 0.772 in the validation cohort). The model outperformed the traditional FRS (AUC 0.672 in the training cohort and 0.608 in the validation cohort, P < 0.05). SHAP analysis enabled individualized interpretability and clinical insights. A web-based tool was deployed to facilitate clinical application. By integrating clinical and imaging data, with a focus on the TyG index and ultrasound features, the predictive model developed in this study demonstrated enhanced predictive capability for CVD incidence in individuals with new-onset T2D. It also allows easy risk classification and is available as a web tool for real-time use, helping to improve early detection and personalized care. The online version contains supplementary material available at 10.1186/s12911-025-03247-6.
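The abstract names the TyG index as a model feature but does not define it. As a minimal sketch, assuming the commonly used definition TyG = ln(fasting triglycerides [mg/dL] × fasting glucose [mg/dL] / 2), the index could be computed as follows. The function names and the mmol/L conversion factors (88.57 for triglycerides, 18.016 for glucose) are illustrative assumptions and are not taken from the study.

```python
import math

# Approximate conversion factors from mmol/L to mg/dL (assumed, not from the study).
TG_MMOL_TO_MG_DL = 88.57
GLUCOSE_MMOL_TO_MG_DL = 18.016


def tyg_index(triglycerides_mg_dl: float, glucose_mg_dl: float) -> float:
    """Return the TyG index under the commonly used definition.

    TyG = ln(fasting triglycerides [mg/dL] * fasting glucose [mg/dL] / 2)
    """
    if triglycerides_mg_dl <= 0 or glucose_mg_dl <= 0:
        raise ValueError("Both measurements must be positive.")
    return math.log(triglycerides_mg_dl * glucose_mg_dl / 2.0)


def tyg_index_from_mmol(triglycerides_mmol_l: float, glucose_mmol_l: float) -> float:
    """Convert mmol/L inputs to mg/dL, then compute the TyG index."""
    return tyg_index(
        triglycerides_mmol_l * TG_MMOL_TO_MG_DL,
        glucose_mmol_l * GLUCOSE_MMOL_TO_MG_DL,
    )


if __name__ == "__main__":
    # Hypothetical patient: triglycerides 150 mg/dL, fasting glucose 100 mg/dL.
    print(round(tyg_index(150.0, 100.0), 2))  # ln(7500), about 8.92
```

Because the index is a logarithm of a product, it is sensitive to the units of both inputs, so laboratory values reported in mmol/L would need converting before use.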