Feature Selection and Machine Learning Strategies for CT Radiomics-Based Survival Prediction in Non-Small Cell Lung Cancer: A Comparative Study.

June 7, 2026


DOI: 10.3390/diagnostics16121761 PMID: 42351421

Authors

Huang M, Hui A, Leung CW, Li CL, Leung TL, Tang FH, Tam SY

Affiliations

School of Medical and Health Sciences, Tung Wah College, Homantin, Hong Kong SAR, China.

Abstract

Background/Objectives: Computed tomography (CT)-based radiomics shows promise for prognosis prediction in non-small cell lung cancer (NSCLC), but model performance varies widely with the choice of feature selection method and machine learning algorithm, and the optimal combination remains unclear. This study aims to systematically compare feature selection methods and machine learning algorithms for predicting 12-month overall survival from CT radiomics in patients with NSCLC. Methods: We analyzed 385 patients from The Cancer Imaging Archive (TCIA) NSCLC-Radiomics dataset. Radiomic features extracted from primary tumor volumes were combined with clinical variables. Three feature selection methods, namely sequential forward selection (SFS), maximum relevance minimum redundancy (mRMR), and least absolute shrinkage and selection operator (LASSO), were compared across five classifiers: k-nearest neighbors (KNN), support vector machine (SVM), random forest (RF), logistic regression (LR), and gradient boosting classifier (GBC). Performance was assessed using the area under the receiver operating characteristic curve (AUC) and accuracy on independent test sets. Cox regression and Kaplan-Meier analyses were used to evaluate survival risk stratification. Results: Logistic regression showed the most stable classification performance across feature selection strategies (test AUC 0.60-0.65, accuracy 0.72-0.73). The mRMR-LR model achieved the highest AUC (0.65), and the LASSO-LR model achieved the highest accuracy (0.73). For survival analysis, LASSO-based Cox modeling demonstrated superior risk stratification, with significant separation between high- and low-risk groups in both the training and testing sets (p = 0.0095). Conclusions: Simpler models such as logistic regression provide robust performance for CT radiomics, while LASSO performs best for survival risk stratification. Because validation was performed within a single public dataset, clinical applicability remains limited.
Nevertheless, the findings provide methodological insights for choosing feature selection and machine learning strategies in CT radiomics-based prognostic modeling of NSCLC.
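mRMR is one of the three feature selection methods compared above. As an illustration only, the following sketch implements greedy mRMR in plain Python; it is not the authors' implementation. The classic mRMR formulation scores relevance and redundancy with mutual information, whereas this sketch swaps in absolute Pearson correlation to stay dependency-free. The names `pearson` and `mrmr_select` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mrmr_select(columns: List[Sequence[float]], target: Sequence[float], k: int) -> List[int]:
    """Greedy mRMR (difference criterion), correlation-based variant (assumption):
    relevance  = |corr(feature, target)|
    redundancy = mean |corr(feature, each already-selected feature)|
    Each step adds the feature that maximizes relevance - redundancy.
    columns[j] holds feature j for every patient; returns indices in selection order.
    """
    relevance = [abs(pearson(col, target)) for col in columns]
    selected: List[int] = []
    remaining = list(range(len(columns)))  # kept in ascending index order
    while remaining and len(selected) < k:
        def score(j: int) -> float:
            if not selected:
                return relevance[j]
            redundancy = sum(abs(pearson(columns[j], columns[s])) for s in selected) / len(selected)
            return relevance[j] - redundancy
        best = max(remaining, key=score)  # exact ties resolve to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 6 patients, 3 candidate features, binary outcome.
y = [0, 0, 0, 1, 1, 1]
f0 = [0.0, 0.0, 0.1, 0.9, 1.0, 1.0]  # strongly related to y
f1 = [2 * v for v in f0]             # redundant, rescaled copy of f0
f2 = [0, 1, 0, 1, 0, 1]              # weakly relevant, only partly correlated with f0
print(mrmr_select([f0, f1, f2], y, k=2))
```

On this toy data the selector first takes one of the two collinear features (f0 or its rescaled copy f1, which tie on relevance) and then f2, because the remaining copy is penalized by its near-perfect correlation with the feature already chosen.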
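Test AUC and accuracy are the two classification metrics reported in the Results. A minimal sketch of both follows, assuming binary labels in which 1 marks the positive class (for example, death within 12 months; the exact label coding is not given in the abstract) and model outputs that are predicted probabilities. AUC is computed through its rank interpretation rather than by tracing the ROC curve; `roc_auc`, `accuracy`, and the 0.5 threshold are hypothetical choices.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via its rank interpretation: the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case,
    counting ties as one half. O(n_pos * n_neg), fine for a few hundred patients."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of cases whose thresholded prediction matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


# Hypothetical held-out test labels and predicted probabilities.
y_test = [0, 0, 1, 1]
p_test = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_test, p_test))   # 0.75
print(accuracy(y_test, p_test))  # 0.75
```

Accuracy depends on the chosen threshold and on the class balance of the test set, whereas AUC does not depend on any threshold, so the two metrics can rank models differently, as with mRMR-LR and LASSO-LR in the Results.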
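The survival result above reports significant Kaplan-Meier separation between high- and low-risk groups derived from LASSO-based Cox modeling. The abstract names neither the test nor the cut-off, so the following is a sketch under two assumptions: patients are dichotomized at the median of a precomputed risk score (for example, the linear predictor of a fitted Cox model, which is not fitted here), and the two groups are compared with the standard two-group log-rank test. `median_split`, `logrank_test`, and the example data are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def median_split(risk_scores: Sequence[float],
                 reference_scores: Optional[Sequence[float]] = None) -> List[int]:
    """Label a patient 1 (high risk) if its score exceeds the median of
    reference_scores (default: risk_scores itself), otherwise 0 (low risk)."""
    ref = sorted(risk_scores if reference_scores is None else reference_scores)
    m = len(ref)
    cut = ref[m // 2] if m % 2 else (ref[m // 2 - 1] + ref[m // 2]) / 2
    return [1 if r > cut else 0 for r in risk_scores]


def logrank_test(times: Sequence[float], events: Sequence[int],
                 groups: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test.
    times: follow-up time; events: 1 = death observed, 0 = censored;
    groups: 1 = high risk, 0 = low risk.
    Returns (chi-square statistic, p-value on 1 degree of freedom)."""
    o_minus_e, var = 0.0, 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        n = sum(1 for ti in times if ti >= t)  # number at risk just before t
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)  # deaths at t
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e == 1 and g == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths in group 1
        if n > 1:  # hypergeometric variance of d1
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / var
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical example: months of follow-up, death indicator, Cox risk scores.
times = [2, 4, 5, 7, 9, 14, 18, 20, 24, 30]
events = [1, 1, 1, 1, 0, 1, 0, 1, 0, 0]
risk = [2.1, 1.8, 1.5, 1.7, 0.9, 0.4, 0.2, 0.6, -0.3, -0.5]
groups = median_split(risk)
print(groups)  # [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(logrank_test(times, events, groups))
```

With separate training and test sets, the `reference_scores` argument lets the median be fixed on the training scores and then applied unchanged to the test scores, so that test-set groups are not defined using test-set information.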


Topics

Journal Article
