Machine learning-assisted classification of lung cancer: the role of sarcopenia, inflammatory biomarkers, and PET/CT anatomical-metabolic parameters.
Authors
Affiliations (7)
- Radiotherapy Program, Vocational School of Health Sciences, Altınbaş University, Kartaltepe Dist., No.11, Bakirkoy, 34147, Istanbul, Turkey.
- Department of Nuclear Medicine, Istanbul Training and Research Hospital, Cerrahpasa, Org. Abdurrahman Nafiz Gurman Cd. No:24, Fatih, 34098, Istanbul, Turkey.
- Department of Nuclear Physics, Faculty of Science, Istanbul University, Vezneciler, Fatih, 34134, Istanbul, Turkey.
- Department of Nuclear Medicine, Yedikule Chest Diseases Hospital, Kazlıcesme, Zeytinburnu, 34020, Istanbul, Turkey.
- Physiotherapy Program, Vocational School, Istanbul Galata University, Evliya Celebi Dist., Mesrutiyet St., No:62, Beyoglu, 34430, Istanbul, Turkey.
- Department of Chest Diseases, Faculty of Medicine, Altınbaş University, Bahcelievler MedicalPark Hospital, E-5 Highway, Kultur St., No:1, Bahçelievler, 34147, Istanbul, Turkey.
- Physiotherapy and Rehabilitation Master Program, Institute of Graduate Programs, Istanbul Bilgi University, Eyüpsultan, 34060, Istanbul, Turkey.
Abstract
Accurately differentiating non-cancerous, benign, and malignant cases in suspected lung cancer remains a diagnostic challenge because clinical and imaging characteristics overlap. This study proposes a multimodal machine learning (ML) framework that integrates positron emission tomography/computed tomography (PET/CT) anatomic-metabolic parameters, sarcopenia markers, and inflammatory biomarkers to improve classification performance in lung cancer. A retrospective dataset of 222 patients was analyzed, including demographic variables, functional and morphometric sarcopenia indices, hematological inflammation markers, and PET/CT-derived parameters such as maximum and mean standardized uptake value (SUVmax, SUVmean), metabolic tumor volume (MTV), and total lesion glycolysis (TLG). Five ML algorithms (Logistic Regression, Multi-Layer Perceptron, Support Vector Machine, Extreme Gradient Boosting, and Random Forest) were evaluated using standardized performance metrics. The Synthetic Minority Oversampling Technique (SMOTE) was applied to balance class distributions. Feature importance analysis was conducted using the best-performing model, and classification was repeated using the top 15 features. Among the models, Random Forest performed best, with a test accuracy of 96%; precision, recall, and F1-score of 0.96 each; and an average AUC of 0.99. Feature importance analysis identified SUVmax, SUVmean, TLG, and skeletal muscle index as the leading predictors. A secondary classification using only the top 15 features yielded a slightly higher test accuracy (97%). These findings underscore the potential of integrating metabolic imaging, physical function, and biochemical inflammation markers in a non-invasive ML-based diagnostic pipeline. The proposed framework demonstrates high accuracy and generalizability and may serve as an effective clinical decision support tool in early lung cancer diagnosis and risk stratification.
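The abstract names the Synthetic Minority Oversampling Technique but gives no details of how it was configured. As a rough illustration of the idea only, the sketch below balances classes by creating synthetic rows on the line segment between a minority-class sample and one of its nearest same-class neighbours. The function name `smote_oversample`, the parameters `k` and `seed`, and the two-feature toy rows (loosely labelled as SUVmax and skeletal muscle index) are all assumptions made for this example and are not taken from the study. A real pipeline would more likely rely on an established implementation such as the `SMOTE` class in imbalanced-learn.

```python
import math
import random
from collections import Counter


def smote_oversample(X, y, k=5, seed=0):
    """Balance classes by interpolating between same-class neighbours.

    X: list of feature vectors (lists of floats); y: list of class labels.
    Returns new (X, y) lists in which every class has as many rows as the
    largest class. Illustrative sketch, not the study's implementation.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(row) for row in X], list(y)

    for label, n in counts.items():
        rows = [row for row, lab in zip(X, y) if lab == label]
        if n == target or len(rows) < 2:
            continue  # already balanced, or too few rows to interpolate
        for _ in range(target - n):
            base = rng.choice(rows)
            # Up to k nearest same-class neighbours of the base row.
            neighbours = sorted(
                (r for r in rows if r is not base),
                key=lambda r: math.dist(base, r),
            )[:k]
            mate = rng.choice(neighbours)
            gap = rng.random()  # position along the segment base -> mate
            X_out.append([b + gap * (m - b) for b, m in zip(base, mate)])
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy rows: [SUVmax, skeletal muscle index]; values are made up.
X = [[2.1, 48.0], [2.5, 50.0], [3.0, 47.5], [2.8, 49.0],
     [9.5, 40.0], [11.2, 38.5],
     [4.0, 45.0], [4.4, 44.0], [5.1, 46.0]]
y = ["non-cancerous"] * 4 + ["malignant"] * 2 + ["benign"] * 3

X_bal, y_bal = smote_oversample(X, y, k=3)
print(Counter(y_bal))
```

Synthetic rows are appended after the original rows, so the original data are left untouched. Oversampling of this kind is normally applied to the training split only, so that the test set contains no synthetic samples; the abstract does not state how this was handled.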