Deep learning for clinically significant prostate cancer detection on MRI: a systematic review, HSROC meta-analysis, and direct comparison with PI-RADS-based interpretation.
Authors
Affiliations (2)
- School of Medicine, College of Medicine and Health Sciences, Bahir Dar University, Bahir Dar, Ethiopia. [email protected].
- School of Health, Faculty of Medicine and Health, University of New England, Armidale, Australia.
Abstract
To estimate patient-level diagnostic accuracy of deep learning (DL) for MRI-based detection of clinically significant prostate cancer (csPCa), assess heterogeneity and clinical-readiness signals, and compare DL-alone, PI-RADS-alone, and AI-assisted/DL + PI-RADS interpretation where direct comparator data were available.

Following PRISMA-DTA, MEDLINE, Embase, and Web of Science were searched from 2010 to June 2025 for studies reporting patient-level 2 × 2 diagnostic accuracy data for DL applied to prostate MRI. Risk of bias was assessed using QUADAS-2. Pooled sensitivity and specificity were estimated using bivariate random-effects and HSROC models, with prespecified subgroup, meta-regression, and sensitivity analyses. Deeks' funnel plot asymmetry test assessed publication bias and small-study effects. A secondary direct three-way comparative analysis was performed in studies reporting DL-alone, PI-RADS-alone, and AI-assisted/DL + PI-RADS data within the same or closely matched cohorts. AI-specific reporting and clinical-readiness signals were mapped using items adapted from STARD-AI, CLAIM, and DECIDE-AI.

Thirty-six studies including 9,411 patients were included. Pooled sensitivity was 0.91 (95% CI, 0.89-0.93), specificity was 0.55 (95% CI, 0.46-0.64), LR+ was 2.04, and LR- was 0.16. Sensitivity was relatively consistent, whereas specificity varied widely, with a broad HSROC prediction region indicating limited transportability of pooled specificity. Deeks' test showed no significant funnel plot asymmetry (p = 0.393). Sensitivity analyses excluding MRI + clinical-variable hybrid models and small cohorts produced similar estimates. In nine directly comparative studies, sensitivity was similar across groups, while specificity was highest for AI-assisted/DL + PI-RADS and lowest for PI-RADS-alone.

DL for prostate MRI shows high sensitivity and low LR-, supporting a rule-out or assistive role. However, moderate and variable specificity limits stand-alone rule-in use. Combined AI/DL + PI-RADS workflows may reduce false positives, but prospective validation, calibration, interpretability evaluation, and patient-level safety studies are needed.
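
The relationship between the pooled sensitivity and specificity and the reported likelihood ratios can be illustrated with a short calculation. The sketch below is not the authors' analysis code: it applies the textbook definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity to the pooled point estimates, then converts a pre-test probability into a post-test probability using odds. The function names and the 30% pre-test probability are assumptions chosen for illustration only. A naive calculation from rounded point estimates gives an LR+ of about 2.02 rather than the reported 2.04; small differences of this kind are expected when likelihood ratios are derived from the fitted model rather than from rounded summary values.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity (textbook definitions)."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_prob, likelihood_ratio):
    """Update a pre-test probability with a likelihood ratio via odds."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Pooled point estimates reported in the abstract.
POOLED_SENSITIVITY = 0.91
POOLED_SPECIFICITY = 0.55

# Hypothetical pre-test probability of csPCa, chosen only for illustration.
ASSUMED_PRE_TEST_PROB = 0.30

lr_pos, lr_neg = likelihood_ratios(POOLED_SENSITIVITY, POOLED_SPECIFICITY)
prob_after_negative = post_test_probability(ASSUMED_PRE_TEST_PROB, lr_neg)
prob_after_positive = post_test_probability(ASSUMED_PRE_TEST_PROB, lr_pos)

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"Post-test probability after a negative DL result: {prob_after_negative:.3f}")
print(f"Post-test probability after a positive DL result: {prob_after_positive:.3f}")
```

Under the assumed pre-test probability, a negative result lowers the probability of csPCa substantially, whereas a positive result raises it only moderately. This asymmetry is the arithmetic behind the rule-out rather than rule-in interpretation stated in the conclusion.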