Auxiliary Diagnosis of Pulmonary Nodules' Benignancy and Malignancy Based on Machine Learning: A Retrospective Study.
Wang W, Yang B, Wu H, Che H, Tong Y, Zhang B, Liu H, Chen Y
•papers•Jan 1 2025
Lung cancer, one of the most lethal malignancies globally, often presents insidiously as pulmonary nodules. Its nonspecific clinical presentation and heterogeneous imaging characteristics hinder accurate differentiation between benign and malignant lesions, while biopsy's invasiveness and procedural constraints underscore the critical need for non-invasive early diagnostic approaches. In this retrospective study, we analyzed outpatient and inpatient records from the First Medical Center of Chinese PLA General Hospital between 2011 and 2021, focusing on pulmonary nodules measuring 5-30 mm on CT scans without overt signs of malignancy. Pathological examination served as the reference standard. Comparative experiments evaluated SVM, RF, XGBoost, FNN, and Atten_FNN using five-fold cross-validation to assess AUC, sensitivity, and specificity. The dataset was split 70%/30%, and stratified five-fold cross-validation was applied to the training set. The optimal model was interpreted with SHAP to identify the most influential predictive features. This study enrolled 3355 patients, including 1156 with benign and 2199 with malignant pulmonary nodules. The Atten_FNN model demonstrated superior performance in five-fold cross-validation, achieving an AUC of 0.82, accuracy of 0.75, sensitivity of 0.77, and F1 score of 0.80. SHAP analysis revealed key predictive factors: demographic variables (age, sex, BMI), CT-derived features (maximum nodule diameter, morphology, density, calcification, ground-glass opacity), and laboratory biomarkers (neuroendocrine markers, carcinoembryonic antigen). This study integrates electronic medical records and pathology data to predict pulmonary nodule malignancy using machine/deep learning models. SHAP-based interpretability analysis uncovered key clinical determinants.
Acknowledging limitations in cross-center generalizability, we propose the development of a multimodal diagnostic system that combines CT imaging and radiomics, to be validated in multi-center prospective cohorts to facilitate clinical translation. This framework establishes a novel paradigm for early precision diagnosis of lung cancer.
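
The evaluation protocol described in the abstract (a 70%/30% split, stratified five-fold cross-validation on the training portion, and AUC, sensitivity, and related metrics) can be illustrated with a minimal Python sketch. This is not the authors' code: the abstract gives no implementation details, so the function names (`stratified_split`, `stratified_folds`, `binary_metrics`, `auc`), the random seeds, and the round-robin fold assignment are all assumptions made for illustration. Only the class counts (1156 benign, 2199 malignant) come from the abstract.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_frac=0.3, seed=0):
    """Split indices into (train, test), keeping the class ratio in each part."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)

def stratified_folds(indices, labels, k=5, seed=0):
    """Deal each class round-robin into k folds so class ratios stay similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy, and F1, with malignant coded as 1."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }

def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count half). Quadratic time, fine for small examples."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    # Class counts from the abstract: 0 = benign, 1 = malignant.
    labels = [0] * 1156 + [1] * 2199
    train_idx, test_idx = stratified_split(labels, test_frac=0.3)
    print(len(train_idx), len(test_idx))  # 2348 1007

    folds = stratified_folds(train_idx, labels, k=5)
    for f in folds:
        malignant = sum(labels[i] for i in f)
        print(len(f), round(malignant / len(f), 3))

    # Toy predictions, purely illustrative (not study data).
    print(binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Stratification matters here because the cohort is imbalanced (about 66% malignant): without it, individual folds could drift from that ratio and make per-fold sensitivity and specificity less comparable. In practice a library implementation such as scikit-learn's `train_test_split(..., stratify=y)` and `StratifiedKFold` would normally replace the hand-written helpers above.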