Comparative Performance of 3 Artificial Intelligence Systems for Lung Nodule Characterization in Low-Dose Computed Tomography Screening.

March 12, 2026

Authors

Khurelsukh K, Lin YP, Chang HM, Hsu WC, Huang PC, Wu CT, Wan YL

Affiliations (3)

  • Department of Medical Imaging and Radiological Sciences, College of Medicine, Chang Gung University.
  • Department of Radiology, Intermed Hospital, Ulaanbaatar, Mongolia.
  • Department of Medical Imaging and Intervention, Linkou Chang Gung Memorial Hospital, Taoyuan City, Taiwan.

Abstract

This study evaluates the performance of 3 artificial intelligence (AI) systems in detecting, characterizing, and classifying lung nodules on low-dose computed tomography (LDCT) scans of 100 subjects, assessing agreement with a reference standard and consistency across vendors.

The performance of 3 commercially available AI platforms (AI 1, AI 2, and AI 3) was retrospectively compared with evaluations by 2 thoracic radiologists, with discordances resolved by consensus to establish the reference standard. Agreement was assessed for nodule presence, nodule type (solid, part-solid, or ground-glass), and Lung-RADS category using Cohen's kappa. Agreement among the AI systems for continuous measurements (nodule diameter and volume) was evaluated using intraclass correlation coefficients (ICC). Continuous variables were compared across groups using the Kruskal-Wallis test, with Mann-Whitney U tests for post hoc pairwise comparisons; categorical variables were compared using χ² tests. Bland-Altman analysis was used to evaluate variability in diameter and volume measurements.

The 3 AI systems detected 435, 152, and 70 nodules, respectively, whereas the radiologists identified 126 nodules (P<0.001). Sensitivity, specificity, and accuracy were 77.0%, 8.2%, and 25.7% for AI 1; 72.2%, 83.4%, and 80.6% for AI 2; and 42.9%, 95.7%, and 82.2% for AI 3. Agreement with the reference standard was perfect for AI 2 and almost perfect for AI 3, but absent for AI 1. Inter-AI agreement was substantial (κ=0.66 to 0.78), and diameter and volume measurements showed moderate to good reliability (ICC=0.57 to 0.87).

Commercial AI systems show variable performance in nodule detection and classification, underscoring the need for users to understand each system's characteristics and to interpret results within the clinical context.
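Several of the statistics named in the abstract follow standard textbook definitions. The Python sketch below illustrates three of them: sensitivity, specificity, and accuracy from 2×2 counts; unweighted Cohen's kappa between two raters; and Bland-Altman bias with 95% limits of agreement. It is not the authors' analysis code. The function names and all example inputs are hypothetical, and the sketch assumes unweighted kappa, because the abstract does not say which kappa variant was applied to the ordinal Lung-RADS categories.

```python
from collections import Counter
from statistics import mean, stdev


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) from 2x2 confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of reference-positive cases flagged by the AI
    specificity = tn / (tn + fp)  # share of reference-negative cases correctly not flagged
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # share of all cases classified correctly
    return sensitivity, specificity, accuracy


def cohen_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of both raters' marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    p_expected = sum(counts_a[c] * counts_b[c] for c in labels) / n**2
    if p_expected == 1.0:
        return float("nan")  # kappa is undefined when both raters use one identical label
    return (p_observed - p_expected) / (1 - p_expected)


def bland_altman(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Return (bias, lower, upper): mean paired difference and 95% limits of agreement."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired measurements of equal length")
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # 1.96 x sample SD of the paired differences
    return bias, bias - half_width, bias + half_width


# Hypothetical inputs for illustration only; these are not data from the study.
sens, spec, acc = diagnostic_metrics(tp=8, fp=2, tn=18, fn=2)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")

reader = ["solid", "solid", "part-solid", "ground-glass", "solid", "ground-glass"]
ai = ["solid", "part-solid", "part-solid", "ground-glass", "solid", "solid"]
print(f"kappa={cohen_kappa(reader, ai):.3f}")

bias, lower, upper = bland_altman([10.0, 12.0, 14.0], [9.0, 12.0, 13.0])
print(f"bias={bias:.3f}, 95% limits of agreement=({lower:.3f}, {upper:.3f})")
```

The ICC and the group comparisons are left out to keep the sketch short. The Kruskal-Wallis, Mann-Whitney U, and χ² tests are available in SciPy as `scipy.stats.kruskal`, `scipy.stats.mannwhitneyu`, and `scipy.stats.chi2_contingency`.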

Topics

Journal Article
