At SIIM 2026, large language models (LLMs) demonstrated high accuracy on numerical tasks drawn from radiology reports, particularly in extraction and judgment tests.
Key Details
- LLMs evaluated included Llama 3.1 8B, DeepSeek R1-distilled Llama 8B, OpenAI o1-mini, and OpenAI GPT 5-mini.
- Tasks tested involved extraction and judgment from DEXA, ultrasound, CT, and PET radiology reports.
- Most models, except Llama, achieved over 95% accuracy on extraction tasks; Llama ranged from 86% to 98.7%.
- GPT 5-mini achieved the highest minimum accuracy among the tested models (91.7%, on judgment tasks).
- o1-mini and GPT 5-mini reached perfect accuracy in detecting osteoporosis and made no mathematical errors.
- Answer-only output formats reduced accuracy for Llama and DeepSeek, but not for the OpenAI models.
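To make the two task types concrete, here is a minimal rule-based sketch of an extraction step followed by a judgment step on a DEXA report. It is not the study's method or prompt. The report text, the regular expression, and the function names are all hypothetical, and the cutoffs are the widely used WHO T-score thresholds, which the summary above does not state.

```python
import re

# Hypothetical report snippet, invented for illustration only.
REPORT = (
    "Lumbar spine (L1-L4): BMD 0.812 g/cm2, T-score -2.7. "
    "Femoral neck: BMD 0.701 g/cm2, T-score -1.9."
)

# Assumed phrasing: "T-score", an optional "of", ":" or "=", then a signed decimal.
T_SCORE = re.compile(r"T-score\s*(?:of|:|=)?\s*([+-]?\d+(?:\.\d+)?)", re.IGNORECASE)


def extract_t_scores(text):
    """Extraction task: pull every T-score value out of the report text."""
    return [float(value) for value in T_SCORE.findall(text)]


def classify(t_scores):
    """Judgment task: apply WHO T-score cutoffs to the lowest reported value."""
    if not t_scores:
        return "indeterminate"
    lowest = min(t_scores)
    if lowest <= -2.5:
        return "osteoporosis"
    if lowest < -1.0:
        return "low bone mass (osteopenia)"
    return "normal"


scores = extract_t_scores(REPORT)
print(scores, classify(scores))
```

A fixed pattern like this breaks as soon as the report wording changes, which is part of the appeal of using an LLM for the extraction step instead.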
Why It Matters
Reliable numerical extraction and judgment by LLMs could streamline radiology workflows by making repetitive data extraction more efficient. However, caution is still warranted, because the models remain at risk of non-mathematical errors and errors rooted in medical knowledge.

Source
AuntMinnie