Large language models demonstrate promising performance in automating PI-RADS classification from structured prostate MRI reports, with some limitations in intermediate-risk lesions.
Key Details
- The study included 146 structured prostate MRI reports from October 2023 to October 2024.
- Four LLMs were compared: GPT-4o, GPT-o1, Google Gemini 1.5 Pro, and Google Gemini 2.0 Experimental Advanced.
- Radiologist consensus served as the ground truth; Cohen's kappa measured agreement between each LLM and that consensus.
- GPT-o1 achieved the highest agreement (kappa = 0.87) and a perfect F1 score (1.00) for the high-risk PI-RADS category.
- All LLMs struggled with the PI-RADS 3 (equivocal risk) category (F1 scores 0.53–0.75).
- The authors recommend further multicenter validation and larger datasets before clinical adoption.
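For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how Cohen's kappa and a per-category F1 score can be computed from paired labels. It is not the study's code: the function names and the eight toy labels are hypothetical, and the kappa shown is the unweighted form, since the summary does not say whether the authors used a weighted variant.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of cases where both raters give the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal frequency, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:  # both raters always use the same single label
        return 1.0
    return (observed - expected) / (1.0 - expected)


def f1_for_class(truth, predicted, label):
    """F1 score for one category, treating it as the positive class."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data, NOT taken from the study:
# radiologist consensus vs. one model's PI-RADS categories for eight reports.
consensus = [2, 2, 3, 3, 4, 4, 5, 5]
model = [2, 2, 3, 4, 4, 4, 5, 5]

print(f"kappa = {cohen_kappa(consensus, model):.2f}")
print(f"F1 (PI-RADS 3) = {f1_for_class(consensus, model, 3):.2f}")
print(f"F1 (PI-RADS 5) = {f1_for_class(consensus, model, 5):.2f}")
```

In this toy example a single PI-RADS 3 report upgraded to PI-RADS 4 is enough to pull the PI-RADS 3 F1 score well below that of PI-RADS 5, which illustrates why a category with few cases and ambiguous wording can score poorly even when overall agreement is high.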
Why It Matters

Source
AuntMinnie