AI-CAD for diagnostic mammography: comparison to radiologists according to different indications.
Authors
Affiliations (4)
- Department of Radiology, Research Institute of Radiological Science, Yongin Severance Hospital, Yonsei University College of Medicine, Yongin, Korea.
- Biostatistics Collaboration Unit, Yonsei University College of Medicine, Seoul, Korea.
- Department of Radiology, Research Institute of Radiological Science, Severance Hospital, Yonsei University College of Medicine, Seoul, Korea.
- Department of Radiology, Research Institute of Radiological Science, Severance Hospital, Yonsei University College of Medicine, Seoul, Korea.
Abstract
Although artificial intelligence-based computer-aided diagnosis (AI-CAD) is increasingly applied in screening mammography, its use in diagnostic settings is less established. This study compared the diagnostic performance of AI-CAD abnormality scores at optimized thresholds with that of radiologists across various diagnostic indications. This retrospective study included 1534 women (mean age, 51.4 ± 8.8 years) who underwent diagnostic mammography between March 2015 and February 2016. Cases were categorized into three diagnostic indications: (1) symptomatic, (2) BI-RADS 3 follow-up, and (3) referral for abnormal imaging. A commercially available AI-CAD system provided abnormality scores (0-100%). Final diagnosis was confirmed by pathology or ≥ 2-year imaging stability. AI-CAD performance (sensitivity, specificity, accuracy, PPV, and AUC) was evaluated at two thresholds, the vendor-recommended 10% for screening and an optimized 50% derived from ROC analysis (Youden's index), and compared with the original radiologist interpretations. Among the 1534 patients, 397 (25.9%) were diagnosed with breast cancer. At the 50% threshold, AI-CAD showed significantly higher specificity (95.0% vs 86.2%), accuracy (91.7% vs 87.2%), and PPV (85.1% vs 69.5%) than radiologists (all p < 0.001). AUCs were comparable (AI-CAD: 0.886; radiologists: 0.882; p = 0.75). In symptomatic patients, the AI-CAD AUC was significantly higher than that of radiologists (0.873 vs 0.815, p = 0.002); in BI-RADS 3 follow-ups and asymptomatic imaging-detected abnormalities, specificity improved at the cost of lower sensitivity. AI-CAD demonstrated diagnostic performance comparable to radiologists in mammography and, at an optimized threshold, offered superior specificity, PPV, and accuracy. In symptomatic patients in particular, a higher threshold increased diagnostic performance without compromising sensitivity.
Question: AI-CAD has the potential to be applied to diagnostic mammography by using different thresholds.
Findings: Using an optimized threshold, AI-CAD demonstrated higher specificity, accuracy, and positive predictive value than radiologists.
Clinical relevance: When an optimized threshold is applied, AI-CAD shows performance comparable to radiologists, with higher specificity, accuracy, and positive predictive value.
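The abstract evaluates abnormality scores at fixed thresholds and selects an optimized threshold with Youden's index (J = sensitivity + specificity - 1). The following is a minimal sketch of how such an analysis could be set up, not the authors' actual pipeline. The function names `confusion_metrics` and `youden_threshold` are hypothetical, and the scores and labels are synthetic toy values invented for illustration; they are not study data.

```python
from typing import Dict, Sequence, Tuple


def confusion_metrics(scores: Sequence[float], labels: Sequence[int],
                      threshold: float) -> Dict[str, float]:
    """Call a case positive when score >= threshold, then summarize.

    labels use 1 for cancer and 0 for no cancer (assumed encoding).
    """
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        "ppv": tp / (tp + fp) if (tp + fp) else nan,
        "accuracy": (tp + tn) / len(labels) if len(labels) else nan,
    }


def youden_threshold(scores: Sequence[float],
                     labels: Sequence[int]) -> Tuple[float, float]:
    """Return the observed score that maximizes Youden's J, and that J.

    Ties are resolved in favor of the lowest threshold.
    """
    best_t, best_j = float("nan"), float("-inf")
    for t in sorted(set(scores)):
        m = confusion_metrics(scores, labels, t)
        j = m["sensitivity"] + m["specificity"] - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Synthetic toy data (NOT from the study): abnormality scores in percent.
TOY_SCORES = [3, 8, 12, 20, 35, 48, 55, 62, 70, 81, 90, 97]
TOY_LABELS = [0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1]

at_10 = confusion_metrics(TOY_SCORES, TOY_LABELS, 10)  # screening-style cutoff
at_50 = confusion_metrics(TOY_SCORES, TOY_LABELS, 50)  # higher cutoff
opt_t, opt_j = youden_threshold(TOY_SCORES, TOY_LABELS)

print("threshold 10:", at_10)
print("threshold 50:", at_50)
print("Youden-optimal threshold:", opt_t, "J =", round(opt_j, 3))
```

On the toy data, raising the threshold trades some sensitivity for higher specificity and PPV, which is the same direction of tradeoff the abstract reports. A real analysis would also need confidence intervals and paired statistical tests, which this sketch omits.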