Diagnostic Accuracy of Deep Learning for Automated Detection of Spinal Degenerative Disease on MRI: A Systematic Review and Meta-Analysis.

March 9, 2026

Authors

Gete KY,Durga P,Bekele BA,Tesfay RH,Jibat N,Assefa AB,Gebre ME,Yimer KA,Kassahun NB,Ayele BA

Affiliations (10)

  • School of Medicine, College of Medicine and Health Sciences, Bahir Dar University, Bahir Dar, Ethiopia.
  • EPIC Health Systems, Addis Ababa, Ethiopia.
  • Indira Gandhi Government Medical College & Hospital, Nagpur, Maharashtra, India.
  • School of Public Health, Washington University in St. Louis, Saint Louis, MO, USA.
  • School of Medicine, College of Health Sciences, Addis Ababa University, Addis Ababa, Ethiopia.
  • Johns Hopkins Bloomberg School of Public Health, Baltimore, USA.
  • School of Medicine, Hayat Medical College, Addis Ababa, Ethiopia.
  • Rollins School of Public Health, Emory University, Atlanta, GA, USA.
  • School of Medicine and Public Health, College of Health, Medicine and Wellbeing, The University of Newcastle, Newcastle, NSW, Australia.
  • Amhara Regional Health Bureau, Amhara, Ethiopia.

Abstract

This study aims to estimate the diagnostic accuracy of deep learning (DL) models for automated detection/classification of spinal degenerative disease (SDD) on spine MRI and to explore clinically relevant heterogeneity. We searched Ovid MEDLINE, Ovid Embase and Web of Science (January 2010 to 5 December 2025) for diagnostic accuracy studies of DL applied to spine MRI with reconstructible 2 × 2 data (TP/FP/FN/TN). Risk of bias was assessed with QUADAS-2. Pooled sensitivity and specificity were synthesised using hierarchical bivariate/HSROC models with a prespecified arm-selection hierarchy. Prespecified subgroup/sensitivity analyses examined spinal region, severity threshold, validation type and target focus. Fourteen studies (2020-2025) were included from 2363 records. Sample sizes ranged from 29 to 2991. Overall pooled sensitivity was 0.94 (95% CI 0.89-0.97) and specificity 0.95 (95% CI 0.90-0.97) (LR+ 17.5; LR- 0.06). Stenosis-focused studies showed lower pooled sensitivity/specificity (0.88/0.92) than studies targeting broader degenerative changes (0.96/0.96). Excluding small studies (n ≤ 50) yielded similar estimates (sensitivity 0.95; specificity 0.95; 12 studies). No study was low risk across all QUADAS-2 domains; 9/14 had ≥ 1 high-risk domain. Deeks' test showed no evidence of small-study effects (p = 0.28). DL models show high pooled accuracy for SDD detection on MRI, but clinical readiness is constrained by risk of bias, predominantly retrospective single-centre designs, subjective reference standards and limited external validation; prospective multicentre evaluations with prespecified clinically meaningful thresholds are needed.
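
For readers less familiar with the quantities reported above, the following is a minimal sketch of how sensitivity, specificity and the positive and negative likelihood ratios are derived from a single 2 × 2 table. The function name `diagnostic_metrics` and the counts in the usage example are hypothetical and are not taken from any included study. It is a per-study calculation only; it does not implement the hierarchical bivariate/HSROC pooling used in the review.

```python
import math


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute basic accuracy measures from one 2 x 2 table.

    Illustrative helper (hypothetical name), not the review's pooling model.
    """
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate

    # LR+ = sensitivity / (1 - specificity); undefined when there are no
    # false positives, so report infinity in that case.
    lr_pos = sensitivity / (1 - specificity) if fp > 0 else math.inf

    # LR- = (1 - sensitivity) / specificity; undefined when there are no
    # true negatives, so report infinity in that case.
    lr_neg = (1 - sensitivity) / specificity if tn > 0 else math.inf

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
    }


# Hypothetical counts chosen so that sensitivity = 0.94 and specificity = 0.95.
example = diagnostic_metrics(tp=94, fp=5, fn=6, tn=95)
print(example)
```

With these hypothetical counts the likelihood ratios come out at about 18.8 and 0.06. The LR+ differs from the 17.5 reported in the abstract, presumably because the published values are derived from the fitted model rather than from rounded summary estimates.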

Topics

Journal Article
