Machine learning-based quantitative prediction of spread through air spaces in primary lung adenocarcinoma using intratumoural heterogeneity scores.
Authors
Affiliations (8)
- Department of Radiology, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China.
- Department of Radiology, Harbin Medical University, Harbin Medical University Cancer Hospital, Harbin, Heilongjiang, China.
- Department of Radiology, Beijing Jishuitan Hospital, Capital Medical University, Beijing, China.
- Department of Radiology, The Second Affiliated Hospital of Mudanjiang Medical University, Mudanjiang City, Heilongjiang Province, China.
- Department of Ultrasound, First Affiliated Hospital of Harbin Medical University, Harbin, China.
- Department of Radiology, Beidahuang Industry Group General Hospital, Harbin, Heilongjiang Province, China.
- Department of Medical Oncology, Harbin Medical University Cancer Hospital, Harbin, Heilongjiang, China.
- Department of Thoracic Surgery, Harbin Medical University, Harbin Medical University Cancer Hospital, Harbin, Heilongjiang, China.
Abstract
Spread through air spaces (STAS) is a distinct, aggressive pattern of primary lung adenocarcinoma (LUAD) that affects both prognosis and treatment strategy. This study aimed to quantify intratumoural heterogeneity (ITH), to integrate the resulting metrics with intratumoural-peritumoural habitat features and clinical-radiologic characteristics in order to predict the STAS status of primary LUAD preoperatively, and to explore the potential biological basis of the prediction model. Conventional radiomics features and habitat features were extracted from intratumoural and peritumoural regions on preoperative computed tomography (CT) images. A new index, the ITH score, was developed to quantify ITH levels. Univariable and multivariable logistic regression analyses were conducted to identify clinical-radiologic characteristics associated with STAS. Various machine learning algorithms were used to build the prediction models. Intratumoural-peritumoural habitat features, the ITH score, and clinical-radiologic characteristics were then integrated into a combined model. Finally, 24 patients with RNA sequencing data were included in a gene expression analysis. A total of 1268 patients (median age, 60 years; IQR, 53.8-66.0 years; 850 female) were divided into a training set (n = 943), a validation set (n = 236), and an external test set (n = 89). Using the Light Gradient Boosting Machine classifier, the combined model demonstrated the highest predictive performance for STAS, with AUCs of 0.97, 0.98, and 0.91 in the training, validation, and external test sets, respectively. Differentially expressed genes in the group with high combined-model probability were associated with monocarboxylic acid transport and metabolism. Among the models evaluated, the combined model performed best in predicting STAS in patients with primary LUAD.
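For readers less familiar with the AUC values reported above, the following is a minimal sketch of how an area under the ROC curve can be computed from predicted probabilities and binary STAS labels. It is not the authors' code: the function name `roc_auc` and the toy labels and scores are hypothetical and serve only to illustrate the metric.

```python
def roc_auc(labels, scores):
    """Return the AUC as the probability that a randomly chosen positive
    case receives a higher score than a randomly chosen negative case.
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = STAS-positive, 0 = STAS-negative.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of STAS-positive from STAS-negative cases. The pairwise form shown here is quadratic in the number of cases; for large cohorts a rank-based implementation is usually preferred.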