Stanford and Mayo Clinic Arizona researchers demonstrated that LLMs like GPT-4 can categorize critical findings in radiology reports using few-shot prompting.
Key Details
- GPT-4 and Mistral-7B LLMs tested for classifying critical findings in radiology reports from ICU patients.
- 252 MIMIC-III reports (mixed modalities: 56% CT, ~30% x-ray, 9% MRI) and 180 external chest x-ray reports evaluated.
- LLMs categorized findings as true, known/expected, or equivocal critical findings.
- GPT-4 achieved 90.1% precision and 86.9% recall for true critical findings in the internal test set; 82.6% precision and 98.3% recall in the external test set.
- Mistral-7B showed lower precision (75.6%) but comparable recall (77.4%-93.1%).
- Study highlights few-shot prompting as an efficient strategy; real-world deployment requires further refinement.
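As a rough illustration of the approach described above, the sketch below assembles a few-shot prompt for the three categories and computes precision and recall from raw counts. It is a minimal sketch, not the study's actual prompt or code: the example report snippets, the instruction wording, and the function names `build_few_shot_prompt` and `precision_recall` are all hypothetical, and no model is called.

```python
# Hypothetical sketch: labels follow the categories named in the summary,
# but the example reports and prompt wording are invented for illustration.
LABELS = (
    "true critical finding",
    "known/expected critical finding",
    "equivocal critical finding",
)

# Invented few-shot examples: (report snippet, label).
FEW_SHOT_EXAMPLES = [
    ("New large right pneumothorax with mediastinal shift.", LABELS[0]),
    ("Known left pleural effusion, unchanged from prior study.", LABELS[1]),
    ("Possible small pulmonary embolus; study limited by motion.", LABELS[2]),
]


def build_few_shot_prompt(report: str) -> str:
    """Return a prompt with labeled examples followed by the new report."""
    lines = [
        "Classify the critical finding in each radiology report as one of: "
        + "; ".join(LABELS) + ".",
        "",
    ]
    for text, label in FEW_SHOT_EXAMPLES:
        lines += [f"Report: {text}", f"Category: {label}", ""]
    # The model is expected to complete the final "Category:" line.
    lines += [f"Report: {report}", "Category:"]
    return "\n".join(lines)


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    print(build_few_shot_prompt("Free air under the diaphragm."))
    print(precision_recall(tp=9, fp=1, fn=3))
```

High recall with lower precision, as reported for the external test set, means few true critical findings are missed at the cost of more false alarms.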
Source
AuntMinnie