Artificial Intelligence for Low-Dose CT Lung Cancer Screening: Comparison of Utilization Scenarios.
Authors
Affiliations (6)
- Department of Radiology, Seoul National University Hospital, 101 Daehak-ro, Jongno-gu, Seoul 03080, Republic of Korea.
- Department of Radiology, Seoul National University College of Medicine, Seoul, Republic of Korea.
- Department of Radiology, Seoul National University Bundang Hospital, Gyeonggi-do, Republic of Korea.
- Department of Radiology, Inje University Haeundae Paik Hospital, Busan, Republic of Korea.
- Department of Radiology, Chung-Ang University Hospital, Chung-ang University College of Medicine, Seoul, Republic of Korea.
- Department of Radiology, Yeungnam University Medical Center, Daegu, Republic of Korea.
Abstract
<b>BACKGROUND</b>. Artificial intelligence (AI) tools for evaluating low-dose CT (LDCT) lung cancer screening examinations are used predominantly for assisting radiologists' interpretations. Alternative utilization scenarios (e.g., use of AI as a prescreener or backup) warrant consideration. <b>OBJECTIVE</b>. The purpose of this study was to evaluate the impact of different AI utilization scenarios on diagnostic outcomes and interpretation times for LDCT lung cancer screening. <b>METHODS</b>. This retrospective study included 366 individuals (358 men, 8 women; mean age, 64 years) who underwent LDCT from May 2017 to December 2017 as part of an earlier prospective lung cancer screening trial. Examinations were interpreted by one of five readers, who reviewed their assigned cases in two sessions (with and without a commercial AI computer-aided detection tool). These interpretations were used to reconstruct simulated AI utilization scenarios: as an assistant (i.e., radiologists interpret all examinations with AI assistance), as a prescreener (i.e., radiologists only interpret examinations with a positive AI result), or as a backup (i.e., radiologists reinterpret examinations when AI suggests a missed finding). A group of thoracic radiologists determined the reference standard. Diagnostic outcomes and mean interpretation times were assessed. Decision-curve analysis was performed. <b>RESULTS</b>. 
Compared with interpretation without AI (recall rate, 22.1%; per-nodule sensitivity, 64.2%; per-examination specificity, 88.8%; mean interpretation time, 164 seconds), AI as an assistant showed higher recall rate (30.3%; <i>p</i> < .001), lower per-examination specificity (81.1%), and no significant change in per-nodule sensitivity (64.8%; <i>p</i> = .86) or mean interpretation time (161 seconds; <i>p</i> = .48); AI as a prescreener showed lower recall rate (20.8%; <i>p</i> = .02) and mean interpretation time (143 seconds; <i>p</i> = .001), higher per-examination specificity (90.3%; <i>p</i> = .04), and no significant difference in per-nodule sensitivity (62.9%; <i>p</i> = .16); and AI as a backup showed increased recall rate (33.6%; <i>p</i> < .001), per-examination sensitivity (66.4%; <i>p</i> < .001), and mean interpretation time (225 seconds; <i>p</i> = .001), with lower per-examination specificity (79.9%; <i>p</i> < .001). Among scenarios, only AI as a prescreener demonstrated higher net benefit than interpretation without AI; AI as an assistant had the least net benefit. <b>CONCLUSION</b>. Different AI implementation approaches yield varying outcomes. The findings support use of AI as a prescreener as the preferred scenario. <b>CLINICAL IMPACT</b>. An approach whereby radiologists only interpret LDCT examinations with a positive AI result can reduce radiologists' workload while preserving sensitivity.
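To make the three simulated scenarios concrete, the sketch below shows one way they could be reconstructed from paired reads (with and without AI) of the same examination, together with the standard decision-curve net benefit, NB = TP/N - (FP/N) x pt/(1 - pt), at threshold probability pt. This is an illustration, not the study's implementation. The `Exam` fields and the function names are hypothetical, and several rules are assumptions the abstract does not specify: the prescreener scenario uses the AI-assisted read for AI-positive examinations and counts zero reading time for AI-negative ones, and the backup scenario treats "AI suggests a missed finding" as an AI-positive examination that the unaided radiologist called negative, with the two reading times summed.

```python
from dataclasses import dataclass


@dataclass
class Exam:
    """One screening examination (all field names are hypothetical)."""
    truth: bool          # reference standard: examination is positive
    read_alone: bool     # radiologist's call without AI
    read_with_ai: bool   # radiologist's call with AI assistance
    ai_positive: bool    # standalone AI result
    time_alone: float    # reading time without AI, in seconds
    time_with_ai: float  # reading time with AI, in seconds


def simulate(exam: Exam, scenario: str) -> tuple[bool, float]:
    """Return (final call, radiologist reading time) under one scenario."""
    if scenario == "no_ai":
        return exam.read_alone, exam.time_alone
    if scenario == "assistant":
        # Every examination is read with AI assistance.
        return exam.read_with_ai, exam.time_with_ai
    if scenario == "prescreener":
        # Assumption: AI-negative examinations are never read (time 0).
        if not exam.ai_positive:
            return False, 0.0
        return exam.read_with_ai, exam.time_with_ai
    if scenario == "backup":
        # Assumption: a second, AI-assisted read happens only when the AI
        # is positive and the unaided read was negative; times add up.
        if exam.ai_positive and not exam.read_alone:
            return exam.read_with_ai, exam.time_alone + exam.time_with_ai
        return exam.read_alone, exam.time_alone
    raise ValueError(f"unknown scenario: {scenario}")


def summarize(exams: list[Exam], scenario: str, threshold: float) -> dict:
    """Recall rate, mean reading time, and net benefit for one scenario."""
    n = len(exams)
    results = [simulate(e, scenario) for e in exams]
    tp = sum(1 for e, (call, _) in zip(exams, results) if call and e.truth)
    fp = sum(1 for e, (call, _) in zip(exams, results) if call and not e.truth)
    return {
        "recall_rate": (tp + fp) / n,
        "mean_time": sum(t for _, t in results) / n,
        # Decision-curve net benefit at the given threshold probability.
        "net_benefit": tp / n - (fp / n) * threshold / (1 - threshold),
    }
```

Calling `summarize(exams, scenario, threshold)` for each of `"no_ai"`, `"assistant"`, `"prescreener"`, and `"backup"` over the same list of examinations gives per-scenario figures that can be compared directly; sweeping `threshold` over a range traces out a decision curve.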