Large language models show promising performance in automating PI-RADS classification from structured prostate MRI reports, although they perform less well on intermediate-risk lesions.
Key Details
- The study included 146 structured prostate MRI reports from October 2023 to October 2024.
- Four LLMs were compared: GPT-4o, GPT-o1, Google Gemini 1.5 Pro, and Google Gemini 2.0 Experimental Advanced.
- Radiologist consensus served as the ground truth, and agreement was measured with Cohen's kappa.
- GPT-o1 achieved the highest agreement (kappa = 0.87) and a perfect F1 score (1.00) for the high-risk PI-RADS category.
- All LLMs struggled with the PI-RADS 3 (equivocal risk) category, with F1 scores of 0.53–0.75.
- The authors recommend further multicenter validation with larger datasets before clinical adoption.
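The two metrics named above can be illustrated with a short Python sketch. Cohen's kappa corrects the observed agreement p_o for the agreement p_e expected by chance from each rater's label frequencies, kappa = (p_o - p_e) / (1 - p_e), while the per-category F1 score is the harmonic mean of precision and recall for one PI-RADS category treated one-vs-rest. This sketch assumes unweighted kappa; the summary does not say whether the study used a weighted variant. The function names and the ten PI-RADS labels below are hypothetical toy values, not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of cases where both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal frequency, summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if p_e == 1:
        # Convention: kappa is undefined when both raters used one identical label.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def f1_for_category(truth, pred, category):
    """One-vs-rest F1 score for a single PI-RADS category."""
    tp = sum(t == category and p == category for t, p in zip(truth, pred))
    fp = sum(t != category and p == category for t, p in zip(truth, pred))
    fn = sum(t == category and p != category for t, p in zip(truth, pred))
    if tp == 0:
        # Convention: report 0.0 when there are no true positives.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels (PI-RADS 2-5), NOT taken from the study.
consensus = [2, 3, 4, 5, 3, 4, 2, 5, 3, 4]
model = [2, 4, 4, 5, 3, 4, 2, 5, 2, 4]

print(round(cohens_kappa(consensus, model), 2))        # 0.73
print(round(f1_for_category(consensus, model, 3), 2))  # 0.5
print(f1_for_category(consensus, model, 5))            # 1.0
```

With these made-up labels both disagreements fall on PI-RADS 3 cases, so category 3 scores lower than category 5. For real analyses, scikit-learn's `cohen_kappa_score` (which accepts `weights="linear"` or `weights="quadratic"` for ordinal scales) and `f1_score` cover the same calculations.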
Source
AuntMinnie