Adding artificial intelligence case malignancy scoring to reduce screen-reading workload in breast screening program: results of the retrospective REAI program.
Authors
Affiliations (7)
- Medical Physics Unit - Oncology and Innovative Technologies Department, Azienda USL - IRCCS di Reggio Emilia, Reggio Emilia, Italy.
- Epidemiology Unit, Azienda USL - IRCCS di Reggio Emilia, 42123, Reggio Emilia, Italy.
- Medical Physics Unit - Oncology and Innovative Technologies Department, Azienda USL - IRCCS di Reggio Emilia, Reggio Emilia, Italy. [email protected].
- Enzo Ferrari Department of Engineering, University of Modena and Reggio Emilia, Modena, Italy. [email protected].
- Oncology Screening Center, Azienda USL - IRCCS di Reggio Emilia, Reggio Emilia, Italy.
- Radiology Unit, Department of Diagnostic Imaging, Azienda USL - IRCCS di Reggio Emilia, Reggio Emilia, Italy.
- Department of Medical and Surgical Sciences, University of Modena and Reggio Emilia, Modena, Italy.
Abstract
The AI case malignancy score (AI-CMS) represents the AI algorithm's confidence (from 0 to 100%) that a mammography exam is malignant. This work retrospectively evaluates, through simulation on real-world data, a strategy that integrates AI-CMS into a standard screening scenario to reduce radiologists' workload. A total of 89,176 consecutive screening exams from the 2023-2024 Reggio Emilia Breast Screening Program (REBSP) were retrospectively considered, including 479 biopsy-proven cancers. Interval cancers were only partially available, so false negatives beyond those detected in the real screening workflow could not be assessed. In the proposed strategy, computer-aided detection (CAD) acts as a reader (CR), recalling women with an AI-CMS greater than a predefined threshold (ranging from 5% to 25%). If the first radiologist (HR1) disagrees with CR, the case goes to a second radiologist (HR2) and, if the two radiologists disagree, to a third radiologist (HR3). For each threshold, the final recall rate (RR), cancer detection rate (CDR), number of detected cancers (DC), positive predictive value (PPV) of recalls, false positive rate (FPR), human reading workload, and economic impact were estimated. At AI-CMS thresholds of 5%, 8%, 10%, 15%, 20%, and 25%, the reduction in human workload ranged from 13.4% to 36.1%. The final RR ranged from 4.3% to 4.0%, slightly lower than the current 4.4% with human double reading. The PPV ranged from 12.6% to 13.3%, higher than the current PPV of 12.2%. The FPR ranged from 3.8% to 3.5%, down from the current 3.9%. With thresholds up to 5%, no true positive cases were missed, maintaining the CDR of 5.4‰ achieved by current double reading. Considering CAD payback periods of either 6 or 8 years, financial savings from our strategy ranged from approximately €17,800 to over €590,000.
Integrating AI-CMS support into a standard screening scenario could substantially reduce the screen-reading workload and slightly reduce unnecessary further assessments without affecting the cancer detection rate. This approach, although limited by its retrospective simulation design and the partial availability of interval cancer data, has also proven to be economically sustainable.
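The reading flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several points are assumptions the abstract does not state: that HR1 reads every exam, that agreement between CR and HR1 (or between HR1 and HR2) closes the case with the agreed decision, and that HR3's decision is final when it is needed. The names `simulate_reading`, `ReadingOutcome`, and `screening_metrics` are hypothetical, and the metric formulas are the standard definitions of RR, CDR, and PPV.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReadingOutcome:
    recall: bool       # final decision for the exam
    human_reads: int   # number of radiologist reads consumed (1, 2, or 3)


def simulate_reading(ai_cms: float, threshold: float, hr1_recall: bool,
                     hr2_recall: bool,
                     hr3_recall: Optional[bool] = None) -> ReadingOutcome:
    """Simulate one exam; ai_cms and threshold are percentages (0 to 100)."""
    # CAD reader (CR) recalls when the score is strictly above the threshold.
    cr_recall = ai_cms > threshold

    # Assumption: if HR1 agrees with CR, no further human read is needed.
    if hr1_recall == cr_recall:
        return ReadingOutcome(recall=hr1_recall, human_reads=1)

    # HR1 disagrees with CR, so HR2 reads the exam.
    if hr2_recall == hr1_recall:
        # Assumption: human consensus is final.
        return ReadingOutcome(recall=hr1_recall, human_reads=2)

    # The two radiologists disagree, so HR3 reads the exam.
    if hr3_recall is None:
        raise ValueError("HR3 decision required when HR1 and HR2 disagree")
    # Assumption: HR3's decision is final.
    return ReadingOutcome(recall=hr3_recall, human_reads=3)


def screening_metrics(n_exams: int, n_recalled: int, n_detected: int) -> dict:
    """Standard screening indicators computed from aggregate counts."""
    return {
        "rr": n_recalled / n_exams,                    # recall rate
        "cdr_per_1000": 1000 * n_detected / n_exams,   # cancer detection rate
        "ppv": n_detected / n_recalled,                # PPV of recalls
    }
```

Under these assumptions, an exam costs one human read when HR1 and CR agree, which is where the workload reduction relative to double reading would come from. As a consistency check on the reported figures, 479 cancers in 89,176 exams correspond to `1000 * 479 / 89176`, roughly 5.4‰, matching the CDR quoted for current double reading.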