
Multimodal LLMs Achieve High Accuracy Detecting Scoliosis on X-rays

Multimodal LLMs achieved up to 94% accuracy for scoliosis detection on spine x-rays, but struggled with lumbar stenosis on MRI.

Key Details

  • Five multimodal LLMs were tested: Grok 2, Grok 3, Grok 4, ChatGPT 4o, and Gemini 1.5 Flash.
  • The study used 171 spine x-rays (100 scoliosis, 71 normal) and 200 lumbar spine MRIs (100 severe stenosis, 100 normal).
  • Best x-ray result: Grok 4, with 94.2% accuracy for scoliosis detection. Best MRI result: Gemini 1.5 Flash, with 60% accuracy for stenosis.
  • ChatGPT 4o showed better confidence calibration when its answers were incorrect, which was described as a 'superior metacognitive capability.'
  • The authors emphasize that LLMs are not ready for clinical diagnosis, but highlight their potential for patient education in obvious cases.
  • The study was published in World Neurosurgery on May 2, 2024.
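To put the headline numbers in context, the short Python sketch that follows back-calculates roughly how many cases each reported accuracy implies, and compares each result with a trivial majority-class baseline (a classifier that always predicts the more common label). The baseline comparison and the helper names `implied_correct` and `majority_baseline` are illustrative additions, not metrics or code from the study, and the case counts are approximate because the published percentages are rounded.

```python
# Dataset sizes as reported in the study summary.
XRAY_POSITIVE, XRAY_NEGATIVE = 100, 71   # scoliosis vs. normal x-rays
MRI_POSITIVE, MRI_NEGATIVE = 100, 100    # severe stenosis vs. normal MRIs


def implied_correct(accuracy: float, total: int) -> int:
    """Approximate number of correct cases implied by a (rounded) reported accuracy."""
    return round(accuracy * total)


def majority_baseline(positive: int, negative: int) -> float:
    """Hypothetical baseline: accuracy of always predicting the larger class."""
    return max(positive, negative) / (positive + negative)


results = [
    ("Grok 4, scoliosis (x-ray)", 0.942, XRAY_POSITIVE, XRAY_NEGATIVE),
    ("Gemini 1.5 Flash, stenosis (MRI)", 0.60, MRI_POSITIVE, MRI_NEGATIVE),
]

for name, acc, pos, neg in results:
    total = pos + neg
    base = majority_baseline(pos, neg)
    print(f"{name}: ~{implied_correct(acc, total)}/{total} correct, "
          f"baseline {base:.1%}, margin {acc - base:+.1%}")
```

On these figures, Grok 4's 94.2% corresponds to about 161 of 171 x-rays, well above the roughly 58.5% a model would score by always answering "scoliosis." Gemini 1.5 Flash's 60% corresponds to about 120 of 200 MRIs, only 10 points above the 50% expected from always giving the same answer on the balanced MRI set.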

Why It Matters

As patients increasingly turn to commercial LLMs for medical advice, understanding these models' capabilities and risks in radiology is crucial. The results show both the promise and the current limitations of generalist AI in medical image interpretation, especially for subtler pathologies such as lumbar stenosis on MRI.
