Large language models demonstrate promising performance in automating PI-RADS classification from structured prostate MRI reports, with some limitations in intermediate-risk lesions.
Key Details
- Study included 146 structured prostate MRI reports from October 2023 to October 2024.
- Four LLMs compared: GPT-4o, GPT-o1, Google Gemini 1.5 Pro, Google Gemini 2.0 Experimental Advanced.
- Radiologist consensus used as ground truth; Cohen's kappa measured agreement.
- GPT-o1 achieved the highest agreement (kappa = 0.87) and perfect F1 score (1.00) for high-risk PI-RADS category.
- All LLMs struggled with PI-RADS 3 (equivocal risk) category (F1 scores 0.53–0.75).
- Authors recommend further multicenter validation and larger datasets before clinical adoption.
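For readers unfamiliar with the two metrics cited above, the following is a minimal sketch of how Cohen's kappa and a per-category F1 score can be computed from paired labels. It is not the study's code: the function names and the six-report toy data are made up for illustration, and the unweighted form of kappa is assumed because the summary does not say which variant the authors used.

```python
from collections import Counter


def cohens_kappa(truth, pred):
    """Unweighted Cohen's kappa: agreement corrected for chance."""
    n = len(truth)
    observed = sum(t == p for t, p in zip(truth, pred)) / n
    truth_counts = Counter(truth)
    pred_counts = Counter(pred)
    # Chance agreement from the marginal frequency of each category.
    categories = set(truth) | set(pred)
    expected = sum(truth_counts[c] * pred_counts[c] for c in categories) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1 - expected)


def f1_for_class(truth, pred, cls):
    """F1 score for one category, treating it as the positive class."""
    pairs = list(zip(truth, pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data (NOT from the study): PI-RADS categories for six reports.
radiologist = [2, 3, 3, 4, 5, 5]  # consensus ground truth
model = [2, 3, 4, 4, 5, 5]        # one PI-RADS 3 report upgraded to 4

kappa = cohens_kappa(radiologist, model)
f1_pirads3 = f1_for_class(radiologist, model, 3)
f1_pirads5 = f1_for_class(radiologist, model, 5)
print(f"kappa={kappa:.2f}  F1(PI-RADS 3)={f1_pirads3:.2f}  F1(PI-RADS 5)={f1_pirads5:.2f}")
```

In this toy case a single misclassified PI-RADS 3 report lowers that category's F1 sharply while the PI-RADS 5 score stays perfect, which is the same kind of pattern the study reports.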
Why It Matters
Automating PI-RADS classification could streamline prostate MRI reporting, reduce variability, and enhance workflow efficiency for radiologists. However, challenges in intermediate-risk lesion classification underscore the need for more robust validation before real-world clinical use.

Source
AuntMinnie