Large language models (LLMs) show promising performance in automating PI-RADS classification from structured prostate MRI reports, although they are less reliable for intermediate-risk (PI-RADS 3) lesions.
Key Details
- The study included 146 structured prostate MRI reports from October 2023 to October 2024.
- Four LLMs were compared: GPT-4o, GPT-o1, Google Gemini 1.5 Pro, and Google Gemini 2.0 Experimental Advanced.
- Radiologist consensus served as the ground truth, and agreement was measured with Cohen's kappa.
- GPT-o1 achieved the highest agreement (kappa = 0.87) and a perfect F1 score (1.00) for the high-risk PI-RADS category.
- All LLMs struggled with the PI-RADS 3 (equivocal) category, with F1 scores of 0.53–0.75.
- The authors recommend multicenter validation on larger datasets before clinical adoption.
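The two metrics named above, Cohen's kappa and per-category F1, can be illustrated with a short sketch. It is not the study's code. It assumes unweighted Cohen's kappa, because the summary does not say whether a weighted variant was used, and it assumes one-vs-rest F1 for each PI-RADS category. The function names `cohens_kappa` and `f1_for_class` and the ten score pairs are hypothetical toy data, not results from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of reports where both raters chose the same score.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, estimated from each rater's marginal score frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined: both raters used one identical label
    return (p_o - p_e) / (1 - p_e)


def f1_for_class(truth, pred, label):
    """One-vs-rest F1 for a single category (e.g. PI-RADS 3)."""
    pairs = list(zip(truth, pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0  # no correct hits: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: radiologist consensus vs. one model's PI-RADS scores.
radiologist = [1, 2, 3, 3, 4, 5, 5, 3, 2, 4]
model = [1, 2, 3, 4, 4, 5, 5, 2, 2, 4]

print(round(cohens_kappa(radiologist, model), 2))  # 0.75
for label in range(1, 6):
    print(label, round(f1_for_class(radiologist, model, label), 2))
```

In this invented example, the two PI-RADS 3 cases that the model scored up or down leave category 3 with the lowest F1 (0.5). Kappa still comes out at 0.75, because most scores match. A real evaluation would likely use an established library such as scikit-learn and report which kappa weighting it applied.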
Source
AuntMinnie