Deep learning classification of significant prostate cancer on MRI: a systematic review and meta-analysis.

June 27, 2026

Authors

Chung C, Michalak J, Rubinov Y, Watts K, Kanmaniraja D, Baca A, Hou W, Duong T

Affiliations (2)

  • Albert Einstein College of Medicine, The Bronx, United States.
  • Albert Einstein College of Medicine, The Bronx, United States.

Abstract

Deep learning (DL) image analysis of prostate MRI can be integrated into the prostate cancer diagnostic pathway to improve biopsy decision-making, reduce radiologist interpretation time, and decrease inter-reader variability. In this paper, we systematically review the current literature on DL classification of clinically significant prostate cancer on magnetic resonance imaging (MRI) and perform a meta-analysis of reported external validation performance metrics. A literature search of PubMed, Embase, and ClinicalTrials.gov was performed to identify studies describing end-to-end DL classification models that identify clinically significant prostate cancer (defined as Gleason Grade Group ≥ 2) on biparametric or multiparametric MRI. We extracted information on the data sources used for model training and external validation, reference standards, imaging sequences, DL architectures, model inputs, the use of gradient-weighted class activation mapping (Grad-CAM) or saliency maps, and the area under the receiver operating characteristic curve (AUC). Of 387 potentially relevant studies, 7 were included in the final meta-analysis. Six studies were retrospective in design and performed external validation using data from different institutions. One study employed both retrospective and prospective designs, validating the model on the prospective dataset. The pooled patient-level AUC was 0.83 [0.80, 0.86]. Although current performance is promising, future research should prioritize prospective clinical trials, standardize reporting measures, and include direct comparisons with experienced radiologists. To support the development of widely generalizable models, future studies should evaluate performance on shared, large, and diverse datasets, enabling meaningful cross-study comparisons.

Topics

Journal Article, Review
