Diagnostic Accuracy of Deep Learning for Automated Detection of Spinal Degenerative Disease on MRI: A Systematic Review and Meta-Analysis.

March 9, 2026

DOI: 10.1007/s10278-026-01897-0 PMID: 41803519

Authors

Gete KY, Durga P, Bekele BA, Tesfay RH, Jibat N, Assefa AB, Gebre ME, Yimer KA, Kassahun NB, Ayele BA

Affiliations (10)

School of Medicine, College of Medicine and Health Sciences, Bahir Dar University, Bahir Dar, Ethiopia.
EPIC Health Systems, Addis Ababa, Ethiopia.
Indira Gandhi Government Medical College & Hospital, Nagpur, Maharashtra, India.
School of Public Health, Washington University in St. Louis, Saint Louis, MO, USA.
School of Medicine, College of Health Sciences, Addis Ababa University, Addis Ababa, Ethiopia.
Johns Hopkins Bloomberg School of Public Health, Baltimore, USA.
School of Medicine, Hayat Medical College, Addis Ababa, Ethiopia.
Rollins School of Public Health, Emory University, Atlanta, GA, USA.
School of Medicine and Public Health, College of Health, Medicine and Wellbeing, The University of Newcastle, Newcastle, NSW, Australia.
Amhara Regional Health Bureau, Amhara, Ethiopia.

Abstract

This study aims to estimate the diagnostic accuracy of deep learning (DL) models for automated detection/classification of spinal degenerative disease (SDD) on spine MRI and to explore clinically relevant sources of heterogeneity. We searched Ovid MEDLINE, Ovid Embase and Web of Science (January 2010 to 5 December 2025) for diagnostic accuracy studies of DL applied to spine MRI with reconstructible 2 × 2 data (TP/FP/FN/TN). Risk of bias was assessed with QUADAS-2. Pooled sensitivity and specificity were synthesised using hierarchical bivariate/HSROC models with a prespecified arm-selection hierarchy. Prespecified subgroup and sensitivity analyses examined spinal region, severity threshold, validation type and target focus. Fourteen studies (2020-2025) were included from 2363 records, with sample sizes ranging from 29 to 2991. Overall pooled sensitivity was 0.94 (95% CI 0.89-0.97) and specificity 0.95 (95% CI 0.90-0.97), giving a positive likelihood ratio (LR+) of 17.5 and a negative likelihood ratio (LR−) of 0.06. Stenosis-focused studies showed lower pooled sensitivity/specificity (0.88/0.92) than studies targeting broader degenerative changes (0.96/0.96). Excluding small studies (n ≤ 50) yielded similar estimates (sensitivity 0.95; specificity 0.95; 12 studies). No study was at low risk of bias across all QUADAS-2 domains, and 9 of 14 had at least one high-risk domain. Deeks' test showed no evidence of small-study effects (p = 0.28). DL models show high pooled accuracy for SDD detection on MRI, but their clinical readiness is constrained by risk of bias, predominantly retrospective single-centre designs, subjective reference standards and limited external validation. Prospective multicentre evaluations with prespecified, clinically meaningful thresholds are needed.
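Every accuracy measure in the abstract starts from per-study 2 × 2 tables: sensitivity = TP/(TP + FN), specificity = TN/(TN + FP), LR+ = sensitivity/(1 − specificity) and LR− = (1 − sensitivity)/specificity. The minimal Python sketch below shows that arithmetic for a single study. The names `TwoByTwo` and `accuracy`, the 0.5 zero-cell continuity correction, and the counts are all hypothetical and chosen for illustration. The review itself pooled studies with hierarchical bivariate/HSROC models, which this per-study calculation does not reproduce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical container for one study's 2 x 2 table (DL model vs reference standard)."""
    tp: int  # DL positive, reference standard positive
    fp: int  # DL positive, reference standard negative
    fn: int  # DL negative, reference standard positive
    tn: int  # DL negative, reference standard negative

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


def accuracy(t: TwoByTwo, cc: float = 0.5) -> dict:
    """Per-study sensitivity, specificity, likelihood ratios and diagnostic odds ratio.

    Assumption: if any cell is zero, `cc` is added to all four cells so the
    ratios stay finite (a common convention, not necessarily the review's).
    """
    cells = (t.tp, t.fp, t.fn, t.tn)
    if 0 in cells:
        cells = tuple(c + cc for c in cells)
    tp, fp, fn, tn = cells
    sens = tp / (tp + fn)  # share of reference-positive cases flagged by the model
    spec = tn / (tn + fp)  # share of reference-negative cases correctly called negative
    return {
        "sensitivity": sens,
        "specificity": spec,
        "lr_pos": sens / (1 - spec),   # LR+
        "lr_neg": (1 - sens) / spec,   # LR-
        "dor": (tp * tn) / (fp * fn),  # diagnostic odds ratio, equals LR+ / LR-
    }


# Made-up counts for illustration only; these are not data from the review.
studies = {
    "study_a": TwoByTwo(tp=90, fp=5, fn=10, tn=95),
    "study_b": TwoByTwo(tp=20, fp=0, fn=1, tn=24),  # zero cell triggers the correction
}

for name, t in studies.items():
    m = accuracy(t)
    # Mirrors the abstract's sensitivity analysis that excluded studies with n <= 50.
    note = "  [n <= 50: excluded in small-study analysis]" if t.n <= 50 else ""
    print(f"{name}: n={t.n} sens={m['sensitivity']:.3f} spec={m['specificity']:.3f} "
          f"LR+={m['lr_pos']:.1f} LR-={m['lr_neg']:.3f}{note}")
```

As a reading aid, an LR+ of 17.5 means a positive DL result multiplies the pre-test odds of SDD by about 17.5, while an LR− of 0.06 shrinks them to roughly 6% of their pre-test value.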

Topics

Journal Article