Diagnostic accuracy of artificial intelligence for spinopelvic parameters in standing total spine X-ray and limitations after fusion surgery.

July 4, 2026

Authors

Winkler T, Khakzad T, Pumberger M, Schömig F, Becher M, Diekhoff T

Affiliations (4)

  • Department of Radiology, Charité - Universitätsmedizin Berlin, Campus Mitte, Humboldt-Universität Zu Berlin, Freie Universität Berlin, Charitéplatz 1, 10117, Berlin, Germany.
  • Center for Musculoskeletal Surgery, Charité - Universitätsmedizin Berlin, Campus Mitte, Humboldt-Universität Zu Berlin, Freie Universität Berlin, Charitéplatz 1, 10117, Berlin, Germany.
  • Institute of Biometry and Clinical Epidemiology, Charité - Universitätsmedizin Berlin, Campus Mitte, Humboldt-Universität Zu Berlin, Freie Universität Berlin, Charitéplatz 1, 10117, Berlin, Germany.
  • Department of Radiology, Immanuel Clinic Ruedersdorf, Brandenburg Medical School, Ruedersdorf Bei Berlin, Germany.

Abstract

The aim of this study was to evaluate the accuracy of a commercial deep learning algorithm in measuring spinopelvic parameters on full spine radiographs.

This retrospective study analyzed total spine X-rays from a clinical cohort, assessing five spinopelvic parameters: coronal Cobb angle, thoracic kyphosis, lumbar lordosis, pelvic incidence, and sagittal vertical axis. Measurements were performed by clinical radiologists, by a trained research reader whose measurements served as ground truth, and by a commercial AI software. Inter-rater reliability was determined via intraclass correlation coefficients between an orthopedic surgeon and the trained reader (n = 50 images). The performance of both the AI and the radiologists was compared to the ground truth using mean absolute error, intraclass correlation coefficients, the Spearman correlation, and diagnostic accuracy metrics (sensitivity, specificity, predictive values).

Four hundred ninety-five radiographs were analyzed. Inter-rater reliability between the human raters was excellent (intraclass correlation coefficients 0.94-1). Agreement between the algorithm and the ground truth was good to excellent (intraclass correlation coefficients 0.79-0.99), except for lumbar lordosis (0.68, moderate). Mean absolute errors were lowest for coronal Cobb angles (4.6°; 95% confidence interval = 3.6-5.5°) and highest for lumbar lordosis (8.1°; 95% confidence interval = 6.9-9.3°). The Spearman rank correlations ranged from 0.74 to 0.99, and sensitivity was moderate to excellent (72.6-94.8%). These results closely matched those obtained when comparing the radiologists to the ground truth. In a subgroup of patients with prior spinal fusion surgery, the correlation between the algorithm's predictions and the ground truth was lower, and measurement deviations were larger, than in non-instrumented patients.

The commercial AI software predicted most spinopelvic parameters with good reliability and accuracy, comparable to radiologists in clinical practice; however, it showed limitations in patients with spinal instrumentation.

Topics

Journal Article
