Diagnostic performance of commercial AI systems versus participating radiologists for pulmonary nodule detection in routine clinical practice.
Authors
Affiliations (5)
- Department of Diagnostic Radiology, University of Yamanashi, 1110, Shimokato, Chuo, Yamanashi, 409-3898, Japan.
- Department of Diagnostic Radiology, University of Yamanashi, 1110, Shimokato, Chuo, Yamanashi, 409-3898, Japan.
- Department of Therapeutic Radiology, University of Yamanashi, Chuo, Yamanashi, Japan.
- Department of Diagnostic Radiology, Yamanashi Central Hospital, Kofu, Yamanashi, Japan.
- Division of Radiology, University of Yamanashi Hospital, Chuo, Yamanashi, Japan.
Abstract
This study aimed to evaluate the diagnostic performance of two commercial artificial intelligence (AI) systems versus that of radiologists on routine clinical chest computed tomography (CT) and to identify the imaging characteristics that limit AI performance. We retrospectively analyzed 5-mm-slice chest CT scans of 102 patients (353 nodules or masses). The detection performance of two board-certified radiologists and two commercial AI systems was compared against an expert-established reference standard. Sensitivity, false positives (FPs) per scan, and positive predictive value (PPV) were evaluated. Logistic regression was used to identify factors influencing detection. The radiologists demonstrated higher sensitivity (90.4-94.3%) than the AI systems (80.2-89.2%) and significantly fewer FPs per scan (1.61-3.56 vs. 5.28-7.97; p < 0.001), resulting in superior PPVs. Multivariate analysis revealed divergent limitations: radiologists were challenged by intrinsic nodule features (e.g., ground-glass nodules [GGNs]), whereas AI performance was degraded by case-level complexity (multiple nodules), specific locations (central), and atypical morphologies (large masses). In this study, the participating radiologists outperformed the commercial AI systems in routine clinical settings, and the two groups exhibited distinct weakness profiles. Current AI systems are best suited as complementary tools rather than autonomous readers, provided FPs are managed effectively.
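The three reported metrics can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the lesion-level definitions below are assumed conventional ones rather than the authors' confirmed method. The function name `detection_metrics` and all counts in the example are hypothetical and are not taken from the study.

```python
import math


def detection_metrics(true_positives, false_positives, reference_lesions, scans):
    """Lesion-level detection metrics for one reader (assumed conventional definitions).

    sensitivity  = TP / number of lesions in the reference standard
    fps_per_scan = FP / number of scans read
    ppv          = TP / (TP + FP), i.e. the share of reader marks that are real lesions
    """
    if reference_lesions <= 0 or scans <= 0:
        raise ValueError("reference_lesions and scans must be positive")
    if true_positives < 0 or false_positives < 0:
        raise ValueError("counts must be non-negative")
    if true_positives > reference_lesions:
        raise ValueError("true_positives cannot exceed reference_lesions")

    marks = true_positives + false_positives
    return {
        "sensitivity": true_positives / reference_lesions,
        "fps_per_scan": false_positives / scans,
        # PPV is undefined when the reader made no marks at all.
        "ppv": true_positives / marks if marks else math.nan,
    }


# Hypothetical counts for illustration only (not data from the study).
example = detection_metrics(
    true_positives=40, false_positives=30, reference_lesions=50, scans=20
)
print(example)
```

Under these definitions, a reader with the same sensitivity but more FPs per scan necessarily has a lower PPV, which is consistent with the relationship the abstract describes between FP rate and PPV.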