Deep learning classification of significant prostate cancer on MRI: a systematic review and meta-analysis.
Authors
Affiliations (2)
- Albert Einstein College of Medicine, The Bronx, United States.
- Albert Einstein College of Medicine, The Bronx, United States.
Abstract
Deep learning (DL) image analysis of prostate MRI can be integrated into the prostate cancer diagnostic pathway to improve biopsy decision making, reduce radiologist interpretation time, and decrease inter-reader variability. In this paper, we systematically review the current literature on DL classification of clinically significant prostate cancer on magnetic resonance imaging (MRI) and perform a meta-analysis of reported external validation performance metrics. A literature search was performed on PubMed, Embase, and ClinicalTrials.gov to identify studies describing end-to-end DL classification models that distinguish clinically significant prostate cancer (defined as Gleason Grade Group ≥ 2) on biparametric or multiparametric MRI. For each study, we extracted the data sources used for model training and external validation, reference standards, imaging sequences, DL architectures, model inputs, the use of gradient-weighted class activation mapping (Grad-CAM) or saliency maps, and the area under the receiver operating characteristic curve (AUC). Of 387 potentially relevant studies, 7 were included in the final meta-analysis. Six studies were retrospective in design and performed external validation using data from different institutions. One study employed both retrospective and prospective designs, validating the model on the prospective dataset. The pooled patient-level AUC was 0.83 [0.80, 0.86]. While current performance is promising, future research should prioritize prospective clinical trials, standardize reporting measures, and include direct comparisons to experienced radiologists. To ensure the development of widely generalizable models, future studies should evaluate performance on shared, large, and diverse datasets, enabling meaningful cross-study comparisons.
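As an illustration of how a pooled AUC with an interval, like the patient-level estimate above, can be computed from per-study results, the following is a minimal sketch of inverse-variance random-effects pooling using the DerSimonian-Laird estimator. The abstract does not state which pooling model was used, so the choice of estimator, the pooling on the raw AUC scale, the function name `pool_auc_random_effects`, and the example inputs are all assumptions made for illustration. The example values are hypothetical and are not data from the included studies.

```python
import math


def pool_auc_random_effects(aucs, ses):
    """Pool per-study AUCs with a DerSimonian-Laird random-effects model.

    aucs: per-study AUC estimates (hypothetical inputs in this sketch)
    ses:  per-study standard errors of those AUCs
    Returns (pooled_auc, ci_low, ci_high, tau2) with a 95% normal-theory CI.
    """
    if len(aucs) != len(ses) or len(aucs) < 2:
        raise ValueError("need at least two studies with matching AUCs and SEs")

    # Fixed-effect (inverse-variance) weights and estimate.
    w = [1.0 / se ** 2 for se in ses]
    sum_w = sum(w)
    fixed = sum(wi * a for wi, a in zip(w, aucs)) / sum_w

    # Cochran's Q and the DerSimonian-Laird between-study variance tau^2.
    q = sum(wi * (a - fixed) ** 2 for wi, a in zip(w, aucs))
    df = len(aucs) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    sum_w_re = sum(w_re)
    pooled = sum(wi * a for wi, a in zip(w_re, aucs)) / sum_w_re
    se_pooled = math.sqrt(1.0 / sum_w_re)

    z = 1.959963984540054  # two-sided 95% normal quantile
    return pooled, pooled - z * se_pooled, pooled + z * se_pooled, tau2


if __name__ == "__main__":
    # Hypothetical per-study AUCs and standard errors, for illustration only.
    example_aucs = [0.79, 0.84, 0.86, 0.81]
    example_ses = [0.03, 0.02, 0.04, 0.025]
    pooled, low, high, tau2 = pool_auc_random_effects(example_aucs, example_ses)
    print(f"pooled AUC = {pooled:.3f} [{low:.3f}, {high:.3f}], tau^2 = {tau2:.5f}")
```

When the estimated between-study variance is zero, this reduces to ordinary fixed-effect inverse-variance pooling. In practice, AUCs are often pooled on a transformed scale (for example, logit) so that the interval stays within [0, 1]. The raw scale is used here only to keep the sketch short.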