Multimodal LLMs achieved up to 94% accuracy for scoliosis detection on spine x-rays, but struggled with lumbar stenosis on MRI.
Key Details
- Five multimodal LLMs tested: Grok 2, Grok 3, Grok 4, ChatGPT 4o, and Gemini 1.5 Flash.
- The study used 171 spine x-rays (100 scoliosis, 71 normal) and 200 lumbar spine MRIs (100 severe stenosis, 100 normal).
- Best x-ray result: Grok 4 with 94.2% accuracy for scoliosis detection; best MRI result: Gemini at 60% accuracy for stenosis.
- ChatGPT 4o showed better confidence calibration when incorrect, considered a 'superior metacognitive capability.'
- The authors emphasize that LLMs are not ready for clinical diagnosis, but highlight their potential for patient education in obvious cases.
- Study published in World Neurosurgery on May 2, 2024.
Why It Matters

Source
AuntMinnie