Development and Validation of Time-to-Event Machine Learning Models for Predicting Disease-Free Survival in Patients with Locally Advanced Colorectal Cancer: A Multicenter Cohort Study.
Authors
Affiliations (8)
- Department of Radiology, Jiangxi Cancer Hospital & Institute, Jiangxi Clinical Research Center for Cancer, The Second Affiliated Hospital of Nanchang Medical College, Nanchang, China.
- Department of Radiology, Xiangtan Central Hospital, Xiangtan, China.
- Department of Radiology, The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, China.
- Department of Radiology, The First Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, China.
- School of Mathematics and Computational Science, Xiangtan University, Xiangtan, China.
- Department of Pathology, Jiangxi Cancer Hospital, Nanchang, China.
- Department of Radiology, Jiangxi Cancer Hospital & Institute, Jiangxi Clinical Research Center for Cancer, The Second Affiliated Hospital of Nanchang Medical College, Nanchang, China. [email protected].
- Department of Radiology, Jiangxi Cancer Hospital & Institute, Jiangxi Clinical Research Center for Cancer, The Second Affiliated Hospital of Nanchang Medical College, Nanchang, China. [email protected].
Abstract
The postoperative prognosis of locally advanced colorectal cancer (LACRC) is highly heterogeneous, and conventional models for predicting disease-free survival (DFS) often lack sufficient precision. We therefore aimed to develop and validate time-to-event machine learning (ML) models for predicting DFS in patients with LACRC, with the goal of improving prognostic accuracy. This multicenter cohort study enrolled 456 patients with LACRC from three medical centers. The training cohort comprised 350 patients from centers 1 and 2, and the external validation cohort comprised 106 patients from center 3. Preoperative computed tomography (CT) images were segmented to extract radiomics features, and a radiomics score (radscore) was calculated through feature engineering. In addition, intratumor heterogeneity (ITH) scores were derived by integrating clustered mask regions with global pixel distribution patterns. Five time-to-event ML models were trained to predict DFS: Cox proportional hazards, FastKernelSurvivalSVM, GradientBoostingSurvival (GB-Survival), RandomSurvivalForest, and ExtraSurvivalTrees. Model performance was assessed using the concordance index (C-index), and Survival SHapley Additive exPlanations over time (SurvSHAP(t)) analysis was used for model interpretation. Among the models tested, GB-Survival achieved the highest predictive performance for DFS, with a C-index of 0.7823. SurvSHAP(t) analysis identified the ITH score, pathological TNM stage, lymphovascular invasion, radscore, and the prognostic nutritional index as the key prognostic factors. The GB-Survival model, which integrates multimodal data, outperformed the other time-to-event ML models in predicting DFS for LACRC. This approach may facilitate the development of data-driven treatment strategies and personalized risk stratification for patients with LACRC.
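To illustrate the evaluation metric named in the abstract, the following is a minimal sketch of Harrell's concordance index for right-censored data, written in plain Python. It is not the authors' code: the function name `harrell_c_index`, the toy follow-up times, event flags, and risk scores are all hypothetical, and the handling of tied follow-up times is deliberately simplified (such pairs are skipped).

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (simplified sketch).

    times:       observed follow-up time per patient
    events:      True if the event (e.g. recurrence) was observed, False if censored
    risk_scores: model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        if times[a] == times[b]:
            continue  # assumption: tied times are skipped in this sketch
        if not events[a]:
            continue  # earlier patient was censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier event had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: five patients, two of them censored.
toy_times = [5, 8, 12, 20, 30]
toy_events = [True, False, True, True, False]
toy_risk = [0.9, 0.4, 0.3, 0.6, 0.1]

toy_c_index = harrell_c_index(toy_times, toy_events, toy_risk)
print(f"C-index on toy data: {toy_c_index:.4f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk, which is the scale on which the reported C-index of 0.7823 should be read.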