Keeping AI on Track: Regular monitoring of algorithmic updates in mammography.

Authors

Taib AG, James JJ, Partridge GJW, Chen Y

Affiliations (3)

  • Translational Medical Sciences, School of Medicine, University of Nottingham, Clinical Sciences Building, Nottingham City Hospital, Nottingham NG5 1PB, United Kingdom.
  • Nottingham Breast Institute, Nottingham University Hospitals NHS Trust, Nottingham NG5 1PB, United Kingdom.
  • Translational Medical Sciences, School of Medicine, University of Nottingham, Clinical Sciences Building, Nottingham City Hospital, Nottingham NG5 1PB, United Kingdom. Electronic address: [email protected].

Abstract

To demonstrate a method of benchmarking two consecutive software releases of the same commercial artificial intelligence (AI) product against trained human readers using the Personal Performance in Mammographic Screening (PERFORMS) external quality assurance scheme. In this retrospective study, ten PERFORMS test sets, each consisting of 60 challenging cases, were evaluated by human readers between 2012 and 2023 and by Version 1 (V1) and Version 2 (V2) of the same AI model in 2022 and 2023, respectively. Both AI and human readers assessed each breast independently, using the highest suspicion-of-malignancy score per breast for non-malignant cases and per lesion for breasts with malignancy. Sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) were calculated for comparison, with the study powered to detect a medium-sized effect (odds ratio, 3.5 or 0.29) for sensitivity. The study included 1,254 human readers, with a total of 328 malignant lesions, 823 normal breasts, and 55 benign breasts analysed. No significant difference was found between the AUCs for AI V1 (0.93) and V2 (0.94) (p = 0.13). For sensitivity, no difference was observed between human readers and AI V1 (83.2% vs 87.5%, respectively; p = 0.12); however, V2 outperformed human readers (88.7%; p = 0.04). Specificity was higher for both AI V1 (87.4%) and V2 (88.2%) than for human readers (79.0%; p < 0.01 for both). The upgraded AI model showed no significant difference in diagnostic performance compared with its predecessor when evaluating mammograms from PERFORMS test sets.
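The metrics reported in the abstract can be illustrated with a minimal sketch. This is not the study's analysis code; the labels and suspicion scores below are synthetic placeholders, and the AUC is computed with the standard rank-based (Mann-Whitney) estimator over continuous scores, with sensitivity and specificity taken at an assumed operating threshold.

```python
# Minimal sketch of per-breast benchmarking metrics (synthetic data,
# not study data): sensitivity, specificity, and rank-based AUC.

def sensitivity(labels, preds):
    # labels: 1 = malignant breast, 0 = non-malignant; preds: recall decision
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return tp / (tp + fn)

def specificity(labels, preds):
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    return tn / (tn + fp)

def auc(labels, scores):
    # Mann-Whitney AUC: fraction of (malignant, non-malignant) pairs
    # where the malignant breast receives the higher suspicion score.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical example: 4 malignant and 4 non-malignant breasts.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]  # AI suspicion scores
preds = [1 if s >= 0.5 else 0 for s in scores]      # assumed threshold

print(sensitivity(labels, preds))   # 0.75
print(specificity(labels, preds))   # 0.75
print(round(auc(labels, scores), 3))
```

In the study, the same three metrics are computed per breast (highest score per breast for non-malignant cases, per lesion for malignant breasts) and compared between human readers and the two AI versions.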

Topics

Mammography; Breast Neoplasms; Artificial Intelligence; Algorithms; Radiographic Image Interpretation, Computer-Assisted; Journal Article
