Machine learning-based prediction of invasiveness in lung adenocarcinoma presenting as ground-glass nodules using radiomics and clinical CT features.
Affiliations (3)
- Department of Thoracic Surgery, The Second Hospital & Clinical Medical School, Lanzhou University, 82 Cuiyingmen, Chengguan District, Lanzhou, 730030, China.
- Department of Thoracic Surgery, The Second Hospital & Clinical Medical School, Lanzhou University, 82 Cuiyingmen, Chengguan District, Lanzhou, 730030, China.
- The Affiliated Yantai Yuhuangding Hospital of Qingdao University, Yantai, Shandong, China.
Abstract
Lung adenocarcinoma (LA), the predominant histological subtype of lung cancer, frequently manifests as ground-glass nodules (GGNs) on computed tomography (CT). Preoperative discrimination of invasiveness, which is critical for guiding surgical and therapeutic decisions, remains challenging because radiological assessment is subjective and conventional methods have limited sensitivity. This multicenter study aimed to develop a robust, non-invasive predictive framework that integrates radiomics and clinical CT features using machine learning (ML) to stratify the invasiveness of GGN-associated LA.

A retrospective dual-cohort analysis was conducted on 357 patients with pathologically confirmed LA. The primary cohort (n = 312) was randomly divided at an 8:2 ratio into a training cohort (n = 249) and a test cohort (n = 63); the external validation cohort comprised 45 patients. Radiomics features (n = 1129) were extracted from high-resolution CT (HRCT) images, and clinical CT features (n = 16) were evaluated by blinded radiologists. Principal component analysis (PCA) and least absolute shrinkage and selection operator (LASSO) were each applied to reduce the dimensionality of the radiomics features. Five ML algorithms (XGBoost, SVM, Random Forest, Logistic Regression, and LightGBM) were then trained to predict invasiveness, dichotomized as low (minimally invasive adenocarcinoma or Grade 1 invasive adenocarcinoma) or high (Grade 2/3 invasive adenocarcinoma). Model performance was assessed using the area under the curve (AUC), sensitivity, specificity, and decision curve analysis. Calibration curves were plotted, and SHapley Additive exPlanations (SHAP) methods were used to interpret the predictive models.

The Random Forest (RF) model built on the combined clinical CT features and PCA-derived radiomics performed best, with AUCs of 0.854 in the training cohort, 0.769 in the test cohort, and 0.778 in the external validation cohort. Key predictive features included PCA-derived radiomic components and clinical CT features.
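The cohort split and the binary outcome described above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the pathology codes (`MIA`, `IAC_G1`, `IAC_G2`, `IAC_G3`), the helper names, and the seed are hypothetical, and because the abstract does not say whether the split was stratified, a simple random split is shown. Rounding the test size up reproduces the reported 249/63 partition of 312 patients; scikit-learn's `train_test_split` applies the same rounding to a fractional `test_size`.

```python
import math
import random

# Hypothetical pathology codes, grouped as in the abstract:
# low = MIA or Grade 1 IAC, high = Grade 2/3 IAC.
LOW = {"MIA", "IAC_G1"}
HIGH = {"IAC_G2", "IAC_G3"}


def invasiveness_label(pathology: str) -> int:
    """Return 0 for low invasiveness, 1 for high invasiveness."""
    if pathology in LOW:
        return 0
    if pathology in HIGH:
        return 1
    raise ValueError(f"unknown pathology code: {pathology!r}")


def random_split(ids, test_fraction=0.2, seed=42):
    """Shuffle patient ids and split them into (train, test).

    The test size is rounded up, so 312 patients give 249 train / 63 test.
    """
    ids = list(ids)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(ids)
    n_test = math.ceil(test_fraction * len(ids))
    return ids[n_test:], ids[:n_test]
```

Calling `random_split(range(312))` returns lists of 249 and 63 ids that together cover every patient exactly once.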
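PCA-based reduction of the 1129 radiomics features can be sketched with NumPy. The z-score standardization, the 95% cumulative-variance cutoff, and the function name `pca_reduce` are assumptions for illustration; the abstract does not report how many components were retained. The main design point is that the scaling and principal axes are fitted on the training cohort only and then applied unchanged to held-out patients, which avoids information leakage into the test and external cohorts.

```python
import numpy as np


def pca_reduce(X_train, X_test, variance_kept=0.95):
    """Standardize, fit PCA on the training cohort only, and project both cohorts.

    variance_kept is an assumed cutoff, not a value reported in the study.
    """
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled instead of dividing by zero
    Z_train = (X_train - mean) / std  # already centered, as PCA requires
    Z_test = (X_test - mean) / std    # reuse training statistics

    # Rows of Vt are the principal axes; S**2 is proportional to explained variance.
    _, S, Vt = np.linalg.svd(Z_train, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    # Smallest number of components whose cumulative share reaches the cutoff.
    k = int(np.searchsorted(np.cumsum(explained), variance_kept)) + 1
    k = min(k, len(S))
    components = Vt[:k]
    return Z_train @ components.T, Z_test @ components.T, explained[:k]
```

The returned component scores would then serve as the radiomics inputs to a classifier alongside the clinical CT features.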
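The discrimination metrics reported above can be computed with a dependency-free sketch. AUC is computed with the Mann-Whitney formulation, that is, the probability that a randomly chosen high-invasiveness case receives a higher predicted score than a randomly chosen low-invasiveness case, with ties counted as one half. The 0.5 operating threshold for sensitivity and specificity is an assumption, since the abstract does not state how the cutoff was chosen.

```python
def auc(labels, scores):
    """ROC AUC as the fraction of correctly ordered positive-negative pairs (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold are called high invasiveness."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)
```

For labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four positive-negative pairs are correctly ordered, giving an AUC of 0.75.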
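Decision curve analysis compares a model's net benefit with the default strategies of treating all patients or none across a range of threshold probabilities pt. A minimal sketch using the standard net-benefit definition, NB = TP/n - (FP/n) * pt / (1 - pt), follows; the function names and example values are hypothetical, and the treat-none strategy always has a net benefit of 0.

```python
def net_benefit(labels, probs, pt):
    """Net benefit of acting on model predictions at threshold probability pt."""
    if not 0.0 < pt < 1.0:
        raise ValueError("pt must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= pt and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * pt / (1 - pt)


def net_benefit_treat_all(labels, pt):
    """Net benefit of treating every patient as high invasiveness."""
    if not 0.0 < pt < 1.0:
        raise ValueError("pt must lie strictly between 0 and 1")
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * pt / (1 - pt)
```

Evaluating both functions over a grid of thresholds (for example 0.05 to 0.95) and plotting them against pt gives the decision curve; a model is useful at thresholds where its curve lies above both the treat-all curve and zero.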
The clinical CT features-PCA radiomics RF model significantly outperformed both the clinical-only models and the clinical CT features-LASSO radiomics model. Integrating radiomics and clinical CT features via ML, particularly RF, enables accurate preoperative prediction of LA invasiveness in GGNs. This approach is more objective than conventional radiological assessment and may help optimize personalized treatment strategies. Further validation in larger, prospective cohorts is warranted to confirm its clinical utility. The online version contains supplementary material available at 10.1186/s12885-025-14983-3.