Preoperative Prediction of STAS Risk in Primary Lung Adenocarcinoma Using Machine Learning: An Interpretable Model with SHAP Analysis.

Authors

Wang P, Cui J, Du H, Qian Z, Zhan H, Zhang H, Ye W, Meng W, Bai R

Affiliations (4)

  • Department of Radiology, Beijing Jishuitan Hospital, Capital Medical University, Beijing 100035, China (P.W., J.C., Z.Q., H.Z., H.Z., W.Y., R.B.).
  • Department of Orthopaedic Surgery, Beijing Jishuitan Hospital, Capital Medical University, Beijing 100035, China (H.D.).
  • Radiology Department, Harbin Medical University, Harbin Medical University Cancer Hospital, 150 Haping Road, Harbin, Heilongjiang 150081, China (W.M.).
  • Department of Radiology, Beijing Jishuitan Hospital, Capital Medical University, Beijing 100035, China (P.W., J.C., Z.Q., H.Z., H.Z., W.Y., R.B.). Electronic address: [email protected].

Abstract

Accurate preoperative prediction of spread through air spaces (STAS) in primary lung adenocarcinoma (LUAD) is critical for optimizing surgical strategies and improving patient outcomes. This study aimed to develop a machine learning (ML)-based model to predict STAS from preoperative CT imaging features and clinicopathological data, and to enhance its interpretability through Shapley additive explanations (SHAP) analysis. This multicenter retrospective study included 1237 patients with pathologically confirmed primary LUAD from three hospitals. Patients from Center 1 (n=932) were divided into a training set (n=652) and an internal test set (n=280). Patients from Center 2 (n=165) and Center 3 (n=140) formed two external validation sets. CT imaging features and clinical variables were selected using Boruta and least absolute shrinkage and selection operator (LASSO) regression. Seven ML models were developed and evaluated using five-fold cross-validation. Performance was assessed using F1 score, precision, specificity, sensitivity (recall), and area under the receiver operating characteristic curve (AUC). The Extreme Gradient Boosting (XGB) model achieved AUCs of 0.973 in the training set, 0.862 in the internal test set, and 0.842 and 0.810 in the two external validation sets. SHAP analysis identified nodule type, carcinoembryonic antigen, maximum nodule diameter, and lobulated sign as key features for predicting STAS. Logistic regression analysis confirmed these as independent risk factors. The XGB model demonstrated high predictive accuracy and interpretability for STAS. By integrating widely available clinical and imaging features, this model offers a practical and effective tool for preoperative risk stratification, supporting personalized surgical planning in primary LUAD management.
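The evaluation metrics named in the abstract can be illustrated with a minimal sketch in plain Python. The function names (`binary_metrics`, `roc_auc`), the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration only; they are not the study's code or data. The sketch also shows why recall and sensitivity are the same quantity.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    sensitivity: float  # same quantity as recall: TP / (TP + FN)
    specificity: float  # TN / (TN + FP)
    precision: float    # TP / (TP + FP)
    f1: float           # harmonic mean of precision and recall


def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics for 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Guard each ratio against an empty denominator.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return BinaryMetrics(sensitivity, specificity, precision, f1)


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = STAS-positive, 0 = STAS-negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]  # made-up model outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed 0.5 threshold

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics)
print(f"AUC = {auc:.4f}")
```

Unlike the threshold-dependent metrics, the AUC is computed from the raw scores, so it does not change when the decision threshold is moved.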

Topics

Machine Learning; Lung Neoplasms; Tomography, X-Ray Computed; Adenocarcinoma of Lung; Journal Article; Multicenter Study
