
Artificial Intelligence Versus Radiologist False Positives on Digital Breast Tomosynthesis Examinations in a Population-Based Screening Program.

Authors

Shahrvini T, Wood EJ, Joines MM, Nguyen H, Hoyt AC, Chalfant JS, Capiro NM, Fischer CP, Sayre J, Hsu W, Milch HS

Affiliations (3)

  • David Geffen School of Medicine at UCLA, Los Angeles, CA 90095, USA.
  • Department of Radiology, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095, USA.
  • Department of Bioengineering at UCLA, Los Angeles, CA 90095, USA.

Abstract

Background: Insights into the nature of false-positive findings flagged by contemporary mammography artificial intelligence (AI) systems could inform the potential use of AI to reduce false-positive recall rates.

Objective: To compare AI and radiologists in terms of the characteristics of false-positive digital breast tomosynthesis (DBT) examinations in a breast cancer screening population.

Methods: This retrospective study included 2977 women (mean age, 58 years) participating in an observational population-based screening study who underwent 3183 screening DBT examinations from January 2013 to June 2017. A commercial AI tool analyzed the DBT examinations. Positive examinations were defined for AI as an elevated-risk result and for interpreting radiologists as a BI-RADS category 0 assessment. False-positive examinations were defined as positive examinations without a breast cancer diagnosis within 1 year. Radiologists re-reviewed the imaging for AI-flagged false-positive findings.

Results: The false-positive rate was 10% for both AI (308/3183) and radiologists (304/3183). Of 541 total false-positive examinations, 233 (43%) were false positives for AI only, 237 (44%) for radiologists only, and 71 (13%) for both. Compared with radiologist-only false positives, AI-only false positives were associated with greater mean patient age (60 vs 52 years, p<.001), a lower frequency of dense breasts (24% vs 57%, p<.001), and greater frequencies of a personal history of breast cancer (13% vs 4%, p<.001), prior breast imaging studies (95% vs 78%, p<.001), and prior breast surgical procedures (37% vs 11%, p<.001). The false-positive examinations included 932 findings flagged by AI only, 315 flagged by radiologists only, and 49 flagged concordantly by AI and radiologists. AI-only flagged findings were most commonly benign calcifications (40%), asymmetries (13%), and benign postsurgical change (12%); radiologist-only flagged findings were most commonly masses (47%), asymmetries (19%), and indeterminate calcifications (15%). Of 18 concordant flagged findings that underwent biopsy, 44% yielded high-risk lesions.

Conclusion: Imaging-level and patient-level differences were observed between AI and radiologist false-positive DBT examinations. Although only a small fraction of false-positive examinations overlapped between AI and radiologists, concordant flagged findings had a high rate of representing high-risk lesions.

Clinical Impact: The findings may help guide strategies for using AI to improve DBT recall specificity. In particular, concordant findings may represent an enriched subset of actionable abnormalities.
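As a quick sanity check on the reported overlap, the short Python sketch below reproduces the abstract's percentages from the stated counts. It uses only numbers quoted in the Results; the variable names and the script itself are illustrative and not drawn from the study's analysis code.

```python
# Recompute the false-positive overlap breakdown reported in the Results.
# All counts come directly from the abstract; names are illustrative only.

ai_only = 233   # examinations that were false positives for AI only
rad_only = 237  # examinations that were false positives for radiologists only
both = 71       # examinations that were false positives for both

total_fp = ai_only + rad_only + both  # 541 total false-positive examinations

for label, count in [("AI only", ai_only),
                     ("Radiologists only", rad_only),
                     ("Both (concordant)", both)]:
    print(f"{label}: {count}/{total_fp} ({count / total_fp:.0%})")

# Per-reader false-positive rates over all 3183 screening DBT examinations
exams = 3183
print(f"AI false-positive rate: {308 / exams:.0%}")           # 10%
print(f"Radiologist false-positive rate: {304 / exams:.0%}")  # 10%
```

Run as written, the script prints 43%, 44%, and 13% for the three overlap categories and 10% for each reader's false-positive rate, matching the figures reported above.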

