Performance of artificial intelligence in breast cancer screening programmes: a systematic review.
Authors
Affiliations (4)
- Royal College of Surgeons in Ireland and Medical University of Bahrain, Muharraq, Busaiteen, Bahrain [email protected].
- NHS North West, Manchester, Greater Manchester, UK.
- Library & Learning Resource Centre, Royal College of Surgeons in Ireland Medical University of Bahrain, Muharraq, Busaiteen, Bahrain.
- Royal College of Surgeons in Ireland Medical University of Bahrain, Muharraq, Busaiteen, Bahrain.
Abstract
With growing interest in applying artificial intelligence (AI) to population breast cancer screening, the evidence base has expanded rapidly. This review aims to summarise the published evidence on the use of AI in breast cancer screening. We conducted a systematic review of primary studies assessing AI for screening mammography, extracting test-accuracy metrics (sensitivity, specificity, recall and cancer detection rates) and workflow outcomes. We searched the Cochrane Breast Cancer Group Specialised Register, Cochrane CENTRAL, PubMed, Embase (Elsevier), Scopus, ClinicalTrials.gov and the WHO International Clinical Trials Registry Platform from January 2012 to June 2025; we also screened reference lists of included studies and relevant reviews. No language restrictions were applied. We included primary studies evaluating AI for screening mammography (digital mammography or digital breast tomosynthesis) in asymptomatic women, assessing AI as a standalone reader or in AI-assisted radiologist workflows versus radiologists alone. Eligible designs included randomised trials, prospective paired reader studies, real-world implementation/registry cohorts, retrospective cohorts and multireader-multicase reader studies conducted in population-based or opportunistic screening settings. Key outcomes included diagnostic accuracy metrics (eg, sensitivity, specificity, area under the curve (AUC)) and/or programme metrics (cancer detection rate (CDR), recall/abnormal interpretation rate, positive predictive value, arbitration/workload). We excluded protocols, pilot/feasibility studies, case reports, editorials and studies without relevant accuracy or screening outcomes. Two independent reviewers extracted data and assessed risk of bias.
Study quality was appraised with the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool and an AI-specific critical appraisal tool, and findings were synthesised narratively with stratification by study design and AI integration role. Thirty-one studies met the inclusion criteria, encompassing randomised controlled trials, prospective paired-reader studies, registry-based implementations and retrospective simulations, representing more than two million screening examinations across Europe, Asia, North America and Australia. When used as a second reader or within double-reading workflows, AI generally maintained or modestly increased sensitivity (up to +9 percentage points) while preserving or improving specificity. Triage and decision-referral configurations delivered the greatest operational benefit, reducing reading volumes by 40-90% while maintaining non-inferior cancer detection when thresholds were conservatively calibrated. Stand-alone AI achieved AUC values comparable to radiologists and similar cancer detection in real-world, non-enriched cohorts, although interval-cancer follow-up remains incomplete in several datasets. In prospective randomised evidence, including the Mammography Screening with Artificial Intelligence (MASAI) trial, AI-supported screening achieved higher CDRs (6.4 versus 5.0 per 1000; p=0.0021) with stable or reduced false-positive and recall rates. Across implementation and simulation settings, integration of AI reduced radiologist workload substantially, with triage and band-pass approaches cutting the number of reads by approximately 40-90%. Overall certainty is limited by heterogeneity across study designs, reliance on enriched datasets for some accuracy estimates and incomplete interval-cancer follow-up in several major studies. Contemporary AI systems show diagnostic performance that is broadly comparable to radiologists and can substantially reduce reading workload, particularly when used as a second reader or triage tool.
Emerging prospective evidence supports their safe integration in these roles, although transparent reporting, standardised evaluation and long-term population studies are still required before considering AI as a stand-alone reader. AI may improve workflow efficiency and possibly cancer detection, but definitive evidence on safety, especially interval cancer outcomes, remains essential.