Machine learning and discriminant analysis model for predicting benign and malignant pulmonary nodules.
Authors
Affiliations (10)
- The Key Laboratory of Advanced Interdisciplinary Studies, The First Affiliated Hospital of Guangzhou Medical University, The Institute for Chemical Carcinogenesis, School of Public Health, Guangzhou Medical University, Guangzhou, 511436, China.
- School of Public Health, Guangzhou Medical University, Xinzao, Panyu District, Guangzhou, 511436, China.
- The Health Management Center, The Third Affiliated Hospital of Southern Medical University, Guangzhou, 511436, China.
- Key Laboratory of Research on Clinical Molecular Diagnosis for High Incidence Diseases in Western Guangxi, Center for Medical Laboratory Science, The Affiliated Hospital of Youjiang Medical University for Nationalities, No. 18 Zhongshaner Rd., Youjiang District, Baise, 533000, China. [email protected].
- School of Public Health, Guangzhou Medical University, Xinzao, Panyu District, Guangzhou, 511436, China. [email protected].
- The Key Laboratory of Advanced Interdisciplinary Studies, The First Affiliated Hospital of Guangzhou Medical University, The Institute for Chemical Carcinogenesis, School of Public Health, Guangzhou Medical University, Guangzhou, 511436, China. [email protected].
- Department of Thoracic Surgery, The First Affiliated Hospital of Guangzhou Medical University, State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, Guangzhou, 510120, China. [email protected].
- School of Public Health, Guangzhou Medical University, Xinzao, Panyu District, Guangzhou, 511436, China. [email protected].
- The Key Laboratory of Advanced Interdisciplinary Studies, The First Affiliated Hospital of Guangzhou Medical University, The Institute for Chemical Carcinogenesis, School of Public Health, Guangzhou Medical University, Guangzhou, 511436, China. [email protected].
- School of Public Health, Guangzhou Medical University, Xinzao, Panyu District, Guangzhou, 511436, China. [email protected].
Abstract
Pulmonary nodules (PNs) are often considered an early manifestation of lung cancer. PNs that remain stable for more than two years or whose pathological results indicate that they are not lung cancer are considered benign PNs (BPNs), while PNs that follow a tumor growth pattern or whose pathological results indicate lung cancer are considered malignant PNs (MPNs). Currently, more than 90% of PNs detected by screening tests are benign, with a false positive rate of up to 96.4%. Although a range of predictive models have been developed to identify MPNs, distinguishing BPNs from MPNs remains challenging. We included a total of 5197 patients in this case-control study according to the preset exclusion criteria and sample size. Among them, 4735 with BPNs and 2509 with MPNs were randomly divided into training, validation, and test sets at a 7:1.5:1.5 ratio. Three widely used machine learning algorithms (Random Forests, Gradient Boosting Machine, and XGBoost) were used to screen candidate features, the corresponding predictive models were then constructed using discriminant analysis, and the best-performing model was selected as the target model. The model was internally validated with 10-fold cross-validation and compared with the PKUPH and Block models. We collated information from chest CT examinations performed from 2018 to 2021 in the physical examination population and found that the detection rate of PNs was 21.57%, with an overall upward trend. The GMU_D model, constructed by discriminant analysis on the features screened by machine learning, showed excellent discriminative performance (AUC = 0.866, 95% CI: 0.858-0.874), with a higher AUC than the PKUPH model (AUC = 0.559, 95% CI: 0.552-0.567) and the Block model (AUC = 0.823, 95% CI: 0.814-0.833). The cross-validation results were similarly strong (AUC = 0.866, 95% CI: 0.858-0.874).
The detection rate of PNs was 21.57% in the physical examination population undergoing chest CT. In addition, based on a real-world study of PNs, an improved prediction tool was developed and validated that accurately distinguishes BPNs from MPNs, with excellent predictive performance and discrimination.
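The data-splitting and evaluation steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `stratified_split` and `auc` are hypothetical, splitting each class separately is an assumption (the abstract says only that patients were randomly divided at a 7:1.5:1.5 ratio), and the AUC is computed with the standard pairwise (Mann-Whitney) definition. Feature screening and discriminant analysis are not shown.

```python
import random


def stratified_split(labels, seed=0):
    """Split sample indices into train/validation/test at a 7:1.5:1.5 ratio.

    Each class is shuffled and split separately (an assumption made here)
    so that all three sets keep roughly the same benign/malignant mix.
    """
    rng = random.Random(seed)
    train, val, test = [], [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = len(idx) * 70 // 100  # 70% for training
        n_val = len(idx) * 15 // 100    # 15% for validation
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])  # remainder (about 15%) for testing
    return train, val, test


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen malignant case (label 1)
    receives a higher score than a randomly chosen benign case (label 0);
    ties count as half a win. Quadratic in sample size, so for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Synthetic labels only: 200 benign (0) and 100 malignant (1) cases.
    y = [0] * 200 + [1] * 100
    train_idx, val_idx, test_idx = stratified_split(y, seed=42)
    print(len(train_idx), len(val_idx), len(test_idx))
```

In practice a model would be fitted on the training indices, tuned on the validation indices, and scored with `auc` on the held-out test indices. A confidence interval such as those reported above would need an additional step (for example, bootstrapping), which this sketch omits.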