Study Evaluates LLMs for Automated PI-RADS Classification in Prostate MRI Reports

AuntMinnie | Industry

Large language models demonstrate promising performance in automating PI-RADS classification from structured prostate MRI reports, with some limitations in intermediate-risk lesions.

Key Details

  • Study included 146 structured prostate MRI reports from October 2023 to October 2024.
  • Four LLMs compared: GPT-4o, GPT-o1, Google Gemini 1.5 Pro, Google Gemini 2.0 Experimental Advanced.
  • Radiologist consensus used as ground truth; Cohen's kappa measured agreement.
  • GPT-o1 achieved the highest agreement (kappa = 0.87) and perfect F1 score (1.00) for high-risk PI-RADS category.
  • All LLMs struggled with PI-RADS 3 (equivocal risk) category (F1 scores 0.53–0.75).
  • Authors recommend further multicenter validation and larger datasets before clinical adoption.
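
For readers unfamiliar with the two metrics above, the following is a minimal Python sketch of how Cohen's kappa and a per-category F1 score can be computed from paired labels. It assumes unweighted kappa (the summary does not say whether the study used a weighted variant), and the PI-RADS labels in `truth` and `pred` are made-up toy values, not data from the study. The function names are illustrative only.

```python
from collections import Counter


def cohen_kappa(truth, pred):
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    n = len(truth)
    # Observed agreement: fraction of cases where both raters give the same label.
    p_observed = sum(t == p for t, p in zip(truth, pred)) / n
    # Chance agreement: derived from each rater's marginal label frequencies.
    truth_counts, pred_counts = Counter(truth), Counter(pred)
    labels = set(truth) | set(pred)
    p_chance = sum(truth_counts[l] * pred_counts[l] for l in labels) / (n * n)
    if p_chance == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)


def f1_for_class(truth, pred, label):
    """F1 score for one category, treating it as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(truth, pred))
    fp = sum(t != label and p == label for t, p in zip(truth, pred))
    fn = sum(t == label and p != label for t, p in zip(truth, pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy labels (NOT from the study): radiologist consensus vs. one model.
truth = [2, 2, 3, 3, 4, 4, 5, 5]
pred = [2, 2, 3, 4, 4, 4, 5, 5]

kappa = cohen_kappa(truth, pred)
f1_pirads3 = f1_for_class(truth, pred, 3)
f1_pirads5 = f1_for_class(truth, pred, 5)
print(f"kappa={kappa:.2f}, F1(PI-RADS 3)={f1_pirads3:.2f}, F1(PI-RADS 5)={f1_pirads5:.2f}")
```

In this toy case a single PI-RADS 3 report misread as PI-RADS 4 is enough to pull the category 3 F1 well below the category 5 F1, which illustrates why a model can show high overall agreement while still scoring poorly on the equivocal category.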

Why It Matters

Automating PI-RADS classification could streamline prostate MRI reporting, reduce variability, and enhance workflow efficiency for radiologists. However, challenges in intermediate-risk lesion classification underscore the need for more robust validation before real-world clinical use.
