
Evaluation of AI for prostate cancer detection in biparametric-MRI screening population data.

December 8, 2025

Authors

Langkilde F, Gren M, Wallström J, Kuczera S, Maier SE

Affiliations (5)

  • Department of Radiology, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden.
  • Department of Radiology, Sahlgrenska University Hospital, Gothenburg, Sweden.
  • Department of Radiology, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden.
  • Department of Radiology, Sahlgrenska University Hospital, Gothenburg, Sweden.
  • Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.

Abstract

The goal of this study was to curate a prostate MRI dataset from a screening population and to train and evaluate a deep-learning segmentation method on the same data.

An artificial intelligence (AI) system built on a deep-learning segmentation model (nnU-Net) was trained and evaluated with MRI data from a prostate cancer screening population (G2-trial). The AI was designed to detect clinically significant prostate cancer (csPC), defined as International Society of Urological Pathology (ISUP) grade 2 or higher. The performance of the AI system was compared with that of radiologists using PI-RADS v2. Histopathology served as the reference standard. To better verify negative cases, 288 men underwent systematic biopsies regardless of MRI findings, and all men had at least 3 years of follow-up.

A total of 1354 MRI examinations in 1254 men with a median age of 58 years (range 50-63 years) were randomly divided into a training set (1086 examinations) and a test set (268 examinations). The AI system achieved an area under the receiver operating characteristic curve (AUROC) of 0.83 (95% CI 0.73-0.92), but its specificity at matched sensitivity levels was significantly lower than that of radiologists.

A prostate MRI dataset from a screening population with histological confirmation was curated and evaluated with AI. The neural network trained and tested on this data produced lower specificities than the radiologists.

Question: Does an AI system trained in a screening cohort perform as well as radiologists?

Findings: An AI trained on screening data achieved an AUROC of 0.83 (95% CI 0.73-0.92), with lower specificity than radiologists at the same sensitivity levels.

Clinical relevance: An AI system trained in a screening population has lower specificity than radiologists using PI-RADS v2.
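The two headline metrics in the abstract, AUROC and specificity at a matched sensitivity, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the per-examination scores, and the labels below are hypothetical toy values chosen only to show how such a comparison might be computed. The target sensitivity stands in for whatever sensitivity the radiologists reach at a given PI-RADS cut-off.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def specificity_at_sensitivity(scores, labels, target_sensitivity):
    """Return (threshold, sensitivity, specificity) at the highest threshold
    whose sensitivity reaches the target. A case is called positive when
    its score is greater than or equal to the threshold.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Scan candidate thresholds from strictest to most permissive.
    for t in sorted(set(scores), reverse=True):
        sens = sum(s >= t for s in pos) / len(pos)
        if sens >= target_sensitivity:
            spec = sum(s < t for s in neg) / len(neg)
            return t, sens, spec
    raise ValueError("target sensitivity must not exceed 1.0")


# Hypothetical toy data: one AI score per examination, label 1 = csPC on histopathology.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0, 1, 0]

print(auroc(toy_scores, toy_labels))
# Assumed reader sensitivity of 0.75, used only for illustration.
print(specificity_at_sensitivity(toy_scores, toy_labels, 0.75))
```

Matching on sensitivity first and then comparing specificity keeps the comparison at a single operating point, which is how the abstract frames the difference between the AI system and the radiologists. Confidence intervals such as the reported 95% CI would require an additional step (for example, bootstrapping), which this sketch omits.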

Topics

Journal Article
