Development and interpretation of machine learning-based prognostic models for predicting high-risk prognostic pathological components in pulmonary nodules: integrating clinical features, serum tumor marker and imaging features.
Authors
Affiliations (4)
Affiliations (4)
- Department of Thoracic Surgery, Qilu Hospital of Shandong University, Jinan, Shandong, 250012, China.
- Department of Thoracic Surgery, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Jinan, Shandong, China.
- Department of Thoracic Surgery, Qilu Hospital of Shandong University, Jinan, Shandong, 250012, China. [email protected].
- Department of Thoracic Surgery, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Jinan, Shandong, China. [email protected].
Abstract
With the improvement of imaging, the screening rate of Pulmonary nodules (PNs) has further increased, but their identification of High-Risk Prognostic Pathological Components (HRPPC) is still a major challenge. In this study, we aimed to build a multi-parameter machine learning predictive model to improve the discrimination accuracy of HRPPC. This study included 816 patients with ≤ 3 cm pulmonary nodules with clear pathology and underwent pulmonary resection. High-resolution chest CT images, clinicopathological characteristics were collected from patients. Lasso regression was utilized in order to identify key features, and a machine learning prediction model was constructed based on the screened key features. The recognition ability of the prediction model was evaluated using (ROC) curves and confusion matrices. Model calibration ability was evaluated using calibration curves. Decision curve analysis (DCA) was used to evaluate the value of the model for clinical applications. Use SHAP values for interpreting predictive models. A total of 816 patients were included in this study, of which 112 (13.79%) had HRPPC of pulmonary nodules. By selecting key variables through Lasso recursive feature elimination, we finally identified 13 key relevant features. The XGB model performed the best, with an area under the ROC curve (AUC) of 0.930 (95% CI: 0.906-0.954) in the training cohort and 0.835 (95% CI: 0.774-0.895) in the validation cohort, indicating that the XGB model had excellent predictive performance. In addition, the calibration curves of the XGB model showed good calibration in both cohorts. DCA demonstrated that the predictive model had a positive benefit in general clinical decision-making. The SHAP values identified the top 3 predictors affecting the HRPPC of PNs as CT Value, Nodule Long Diameter, and PRO-GRP. Our prediction model for identifying HRPPC in PNs has excellent discrimination, calibration and clinical utility. Thoracic surgeons could make relatively reliable predictions of HRPPC in PNs without the possibility of invasive testing.