Externally Tested AI Models for Malignancy Classification of Lung Nodules at Chest CT: A Systematic Review and Meta-Analysis.

June 3, 2026

Authors

Asmara OD, Steenhuis EGM, de Jong K, Joseph A, Timmer M, Tenda ED, Boerma EC, Heuvelmans MA, van Geffen WH

Affiliations (10)

  • Department of Pulmonary Medicine, Frisius Medical Center, Henri Dunantweg 2, 8934 AD Leeuwarden, the Netherlands.
  • Division of Respirology and Critical Illness, Department of Internal Medicine, Faculty of Medicine, Universitas Indonesia, Dr. Cipto Mangunkusumo National General Hospital, Jakarta, Indonesia.
  • Faculty Campus Fryslân, University of Groningen, Leeuwarden, the Netherlands.
  • Department of Pulmonary Diseases, Radboud University Medical Center, Nijmegen, the Netherlands.
  • Department of Pulmonology, Isala, Zwolle, the Netherlands.
  • Department of Epidemiology, Frisius Medical Center, Leeuwarden, the Netherlands.
  • Knowledge and Information Center, Frisius Medical Center, Leeuwarden, the Netherlands.
  • Department of Epidemiology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands.
  • Institute for Diagnostic Accuracy, Groningen, the Netherlands.
  • Department of Respiratory Medicine, Amsterdam University Medical Center, Amsterdam, the Netherlands.

Abstract

Purpose: To evaluate the pooled diagnostic accuracy of externally tested AI models for malignancy classification of lung nodules on chest CT.

Materials and Methods: A systematic search of PubMed, Embase, Web of Science, CINAHL, and the Cochrane Library was performed in January 2025 to identify studies evaluating AI models for malignancy classification of lung nodules on chest CT using pathology and/or at least 2-year follow-up as reference standards. Risk of bias was assessed using QUADAS-2, and pooled sensitivity and specificity were estimated using bivariate random-effects models.

Results: Twenty-one studies including 7,454 nodules were analyzed, with lung cancer prevalence ranging from 5.7% (17/297) to 91.5% (214/234). All models were based on deep learning; 17 studies (81%) involved Asian populations, 15 (71%) used non-screening populations, 14 (67%) reported 2D or 3D CNN architectures, and eight (38%) specified predefined malignancy thresholds. High risk of bias was identified in five studies for patient selection and two for index testing. Pooled sensitivity was 88%, specificity 75%, positive likelihood ratio 3.55, negative likelihood ratio 0.16, area under the receiver operating characteristic curve 0.89, and diagnostic odds ratio 22.4. Heterogeneity was high (I² > 90%). Model architecture was associated with specificity, with higher values in studies reporting 2D or 3D CNNs compared with those without reported architecture (82-83% vs 58%, P = .03; meta-regression P = .02); other subgroup analyses showed no evidence of differences.

Conclusion: Externally tested AI models demonstrated high sensitivity but moderate specificity for malignancy classification of lung nodules on chest CT, supporting a potential role in rule-out strategies. However, substantial heterogeneity, inconsistent reporting, and risk of bias limit interpretation.

©RSNA, 2026.
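
As a rough arithmetic check on how the reported summary measures relate to one another, the sketch below derives likelihood ratios and a diagnostic odds ratio from the rounded pooled sensitivity (88%) and specificity (75%), then applies the negative likelihood ratio at the two extremes of reported prevalence. This is illustrative only and is not the authors' method: the abstract's values come from a bivariate random-effects model, so figures recomputed from rounded inputs will differ slightly (for example, 3.52 rather than 3.55). The function names are hypothetical, and treating study prevalence as a pre-test probability is an assumption made for the example.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-, DOR) from a single sensitivity/specificity pair."""
    lr_pos = sensitivity / (1.0 - specificity)    # odds multiplier for a positive call
    lr_neg = (1.0 - sensitivity) / specificity    # odds multiplier for a negative call
    dor = lr_pos / lr_neg                         # diagnostic odds ratio
    return lr_pos, lr_neg, dor


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Rounded pooled estimates quoted in the abstract.
SENSITIVITY = 0.88
SPECIFICITY = 0.75

lr_pos, lr_neg, dor = likelihood_ratios(SENSITIVITY, SPECIFICITY)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, DOR = {dor:.1f}")

# Lowest and highest lung cancer prevalence reported across included studies,
# used here as assumed pre-test probabilities.
for prevalence in (17 / 297, 214 / 234):
    residual = post_test_probability(prevalence, lr_neg)
    print(f"prevalence {prevalence:.1%} -> probability after a negative call {residual:.1%}")
```

Under these assumptions, a negative model output leaves a residual malignancy probability of roughly 1% at the lowest reported prevalence but roughly 63% at the highest, which illustrates why any rule-out use depends heavily on the population in which the model is applied.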

Topics

Journal Article
