Comparative Performance of 3 Artificial Intelligence Systems for Lung Nodule Characterization in Low-Dose Computed Tomography Screening.
Authors
Affiliations (3)
- Department of Medical Imaging and Radiological Sciences, College of Medicine, Chang Gung University.
- Department of Radiology, Intermed Hospital, Ulaanbaatar, Mongolia.
- Department of Medical Imaging and Intervention, Linkou Chang Gung Memorial Hospital, Taoyuan City, Taiwan.
Abstract
This study evaluates 3 artificial intelligence (AI) systems in detecting, characterizing, and classifying lung nodules on low-dose computed tomography (LDCT) scans of 100 subjects, assessing agreement with a reference standard and inter-vendor consistency. Performance of 3 commercially available AI platforms (AI 1, AI 2, and AI 3) was retrospectively analyzed against evaluations by 2 thoracic radiologists, with discordances resolved by consensus as the reference standard. Agreements were assessed for nodule presence, type (solid, part-solid, ground-glass), and Lung-RADS category using Cohen's kappa. Agreement for continuous measurements (nodule diameter and volume) across AI systems was evaluated using intraclass correlation coefficients (ICC). Group comparisons for continuous variables were performed using the Kruskal-Wallis test, with Mann-Whitney U tests for post hoc pairwise comparisons. Categorical variables were compared using χ² tests. Bland-Altman analysis evaluated variability in diameter and volume measurements. The 3 AI systems detected 435, 152, and 70 nodules, respectively, whereas radiologists identified 126 nodules (P<0.001). Sensitivity, specificity, and accuracy were 77.0%, 8.2%, and 25.7% for AI 1; 72.2%, 83.4%, and 80.6% for AI 2; and 42.9%, 95.7%, and 82.2% for AI 3. Agreement with the reference standard was perfect for AI 2 and almost perfect for AI 3, but absent for AI 1. Inter-AI agreement was substantial (κ=0.66 to 0.78), and diameter/volume measurements showed moderate to good reliability (ICC=0.57 to 0.87). Commercial AI systems show variable performance in nodule detection and classification, underscoring the need for users to understand each system's characteristics and interpret results within clinical context.
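As a rough illustration of how the reported sensitivity, specificity, and accuracy relate to detection counts, the Python sketch below computes them from a 2x2 confusion table. Only the detection totals (435, 152, 70) and the 126 reference nodules come from the abstract. The true-positive counts (97, 91, 54) and the pool of 368 non-nodule candidate findings are assumptions back-calculated to be consistent with the reported percentages; the abstract does not state them, and the authors' actual tabulation may differ. The names `Confusion`, `from_detections`, and `ASSUMED_NEGATIVES` are hypothetical.

```python
from dataclasses import dataclass

REFERENCE_NODULES = 126   # from the abstract: nodules identified by radiologists
ASSUMED_NEGATIVES = 368   # ASSUMPTION: non-nodule candidate findings (not in the abstract)


@dataclass(frozen=True)
class Confusion:
    """2x2 confusion table for one AI system against the reference standard."""
    tp: int  # AI detections confirmed by the reference standard
    fp: int  # AI detections not confirmed by the reference standard
    fn: int  # reference nodules the AI missed
    tn: int  # non-nodule candidates the AI correctly did not flag

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)


def from_detections(detected: int, true_positives: int) -> Confusion:
    """Build the table from a detection total and an assumed true-positive count."""
    fp = detected - true_positives
    fn = REFERENCE_NODULES - true_positives
    tn = ASSUMED_NEGATIVES - fp
    return Confusion(tp=true_positives, fp=fp, fn=fn, tn=tn)


# Detection totals are from the abstract; true-positive counts are ASSUMED.
systems = {
    "AI 1": from_detections(detected=435, true_positives=97),
    "AI 2": from_detections(detected=152, true_positives=91),
    "AI 3": from_detections(detected=70, true_positives=54),
}

if __name__ == "__main__":
    for name, c in systems.items():
        print(
            f"{name}: sensitivity={100 * c.sensitivity:.1f}% "
            f"specificity={100 * c.specificity:.1f}% "
            f"accuracy={100 * c.accuracy:.1f}%"
        )
```

Under these assumed counts the three metrics round to the percentages quoted in the abstract, which shows why AI 1 can pair fairly high sensitivity with very low specificity: most of its 435 detections would be unconfirmed findings.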