Large language models with image processing in automated Cobb angle.

February 20, 2026

Authors

Gibson J, Kharwadkar S, Lam C, Harland W, Jones M, Botchu R

Affiliations (5)

  • College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK.
  • King's College London GKT School of Medical Education, King's College London, London, UK.
  • Department of Orthopaedics, St George's University Hospitals NHS Foundation Trust, London, UK.
  • Department of Spinal Surgery, Royal Orthopaedic Hospital, Birmingham, UK.
  • Department of Musculoskeletal Radiology, Royal Orthopaedic Hospital, Birmingham, UK.

Abstract

Scoliosis severity is quantified by the Cobb angle, which clinicians measure on radiographs. With the increasing adoption of artificial intelligence (AI) in clinical workflows, it remains uncertain whether large language models (LLMs) with image processing capabilities can streamline and improve spinal deformity classification. This study assesses the diagnostic capabilities of four leading LLMs (ChatGPT, Gemini, Perplexity and Grok) in calculating Cobb angles from radiographs. A cross-sectional analysis of 122 scoliosis patients was undertaken. Cobb angles were independently calculated using Horos software by a fellowship-trained radiologist, serving as the reference standard. All 122 radiographs were then uploaded to each of the four AI models to identify the type of scoliosis, generate a Cobb angle overlay and calculate the Cobb angle. Qualitative usability was assessed through pre-defined questions rated on a Likert scale. Statistical analysis included mean difference, paired t-tests and intraclass correlation coefficients. Gemini produced no calculated Cobb angles. ChatGPT failed to produce Cobb angles for 90 radiographs and, even when it did calculate them, the errors were large (MAE 58.6° ± 45.9°). Both Perplexity and Grok generated estimates for all thoracolumbar cases, with mean differences of 18.8° (± 13.3°) and 24.2° (± 18.3°), respectively. None of the AI models successfully identified the S-shaped scoliosis cases. All AI models showed a difference greater than the clinically accepted threshold (≤ 10%). This study concludes that current commercially available LLMs show limited accuracy in Cobb angle measurement. Although Perplexity and Grok performed best of the four models assessed, no model demonstrated clinically acceptable performance. These findings highlight the need for dedicated and rigorous development of a spinal deformity AI tool before clinical integration of Cobb angle determination.
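
For readers unfamiliar with the quantities reported above, the following is a minimal Python sketch, not the authors' method. It assumes the Cobb angle is taken as the angle between the two end-vertebra endplate lines, each given by two hypothetical landmark points in image coordinates, and it computes the mean difference, mean absolute error (MAE) and paired t statistic for model readings against a reference. The function names and all numbers in the usage note are illustrative and are not data from the study; the intraclass correlation coefficient is not covered.

```python
import math
from statistics import mean, stdev


def endplate_tilt(line):
    """Tilt of an endplate line in degrees from horizontal, in (-90, 90].

    `line` is ((x1, y1), (x2, y2)); the order of the two points does not matter.
    """
    (x1, y1), (x2, y2) = line
    tilt = math.degrees(math.atan2(y2 - y1, x2 - x1))
    # A line has no direction, so fold the angle into (-90, 90].
    if tilt > 90:
        tilt -= 180
    elif tilt <= -90:
        tilt += 180
    return tilt


def cobb_angle(upper_endplate, lower_endplate):
    """Cobb angle in degrees: absolute difference between the two endplate tilts."""
    return abs(endplate_tilt(upper_endplate) - endplate_tilt(lower_endplate))


def paired_summary(model_angles, reference_angles):
    """Mean difference, MAE and paired t statistic (model minus reference)."""
    diffs = [m - r for m, r in zip(model_angles, reference_angles)]
    n = len(diffs)
    mean_diff = mean(diffs)
    mae = mean([abs(d) for d in diffs])
    sd_diff = stdev(diffs)  # sample standard deviation of the paired differences
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return {"mean_diff": mean_diff, "mae": mae, "sd_diff": sd_diff, "t": t_stat}
```

As a usage example with made-up values, `cobb_angle(((0, 0), (10, 2)), ((0, 50), (10, 47)))` gives the angle between an upper endplate tilted one way and a lower endplate tilted the other, and `paired_summary([30, 45, 20, 60], [25, 40, 28, 50])` summarises four hypothetical model readings against four reference readings. The mean difference keeps the sign of the error, so over- and under-estimates can cancel, whereas the MAE does not.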

Topics

Journal Article
