Stanford and Mayo Clinic Arizona researchers demonstrated that LLMs like GPT-4 can categorize critical findings in radiology reports using few-shot prompting.
Key Details
- GPT-4 and Mistral-7B LLMs tested for classifying critical findings in radiology reports from ICU patients.
- 252 MIMIC-III reports (mixed modalities: 56% CT, ~30% x-ray, 9% MRI) and 180 external chest x-ray reports evaluated.
- LLMs categorized findings as true, known/expected, or equivocal critical findings.
- GPT-4 achieved 90.1% precision and 86.9% recall for true critical findings in internal test set; 82.6% precision and 98.3% recall in external test set.
- Mistral-7B showed lower precision (75.6%) but comparable recall (77.4%-93.1%).
- Study highlights few-shot prompting as an efficient strategy; real-world deployment requires further refinement.
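For readers unfamiliar with the approach, the Python sketch below shows roughly how a few-shot prompt for the three categories could be assembled and how precision and recall for the "true" category are computed. It is illustrative only: the function names, prompt wording, and label strings are assumptions, not the study's actual prompts or code, and no model is called.

```python
from typing import List, Sequence, Tuple

# Category labels taken from the summary above; the exact strings are an assumption.
CATEGORIES = ("true", "known/expected", "equivocal")


def build_few_shot_prompt(examples: List[Tuple[str, str]], report: str) -> str:
    """Assemble an instruction, a few labeled examples, and the report to classify."""
    parts = [
        "Classify the critical finding in each radiology report as one of: "
        + ", ".join(CATEGORIES) + "."
    ]
    for text, label in examples:
        if label not in CATEGORIES:
            raise ValueError(f"unknown category: {label}")
        parts.append(f"Report: {text}\nCategory: {label}")
    # The final block is left unlabeled so the model supplies the category.
    parts.append(f"Report: {report}\nCategory:")
    return "\n\n".join(parts)


def precision_recall(
    gold: Sequence[str], predicted: Sequence[str], positive: str = "true"
) -> Tuple[float, float]:
    """Precision and recall for one category, treating all others as negative."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

In this framing, precision is the share of reports flagged as true critical findings that really were, and recall is the share of real true critical findings that were flagged.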
Why It Matters
This research suggests that off-the-shelf LLMs could automate detection of urgent findings in radiology workflows with minimal manual annotation, potentially easing communication bottlenecks and improving patient safety. Clinical implementation will require further development and EHR integration.

Source
AuntMinnie