Deep learning for the diagnosis of lumbar disc herniation: a systematic review and meta-analysis.

June 11, 2026

Authors

Li Y, Zhang C, Qi Z, Wang Q, Li D, Gao F, Ren X, Chen C

Affiliations (4)

  • Nanjing University of Chinese Medicine, Nanjing, Jiangsu, 210023, China.
  • Suzhou TCM Hospital Affiliated to Nanjing University of Chinese Medicine, Suzhou, Jiangsu, 215019, China.
  • Department of Orthopaedics, Jiangyin Hospital Affiliated to Nanjing University of Chinese Medicine, Jiangyin, Jiangsu, 214400, China.
  • Department of Orthopaedics, Jiangyin Hospital Affiliated to Nanjing University of Chinese Medicine, Jiangyin, Jiangsu, 214400, China. [email protected].

Abstract

Lumbar disc herniation (LDH) is a major cause of low back pain and disability worldwide. Although magnetic resonance imaging (MRI) is the standard modality for diagnosis, interpretation remains subject to interobserver variability. Deep learning (DL)-based approaches have been increasingly applied to improve diagnostic accuracy; however, their overall performance and sources of heterogeneity remain unclear.

PubMed, Web of Science, and the Cochrane Library were searched from inception to March 2026. Studies evaluating imaging-based DL models for LDH diagnosis were included if sufficient diagnostic performance data were available. Two reviewers independently performed study selection, data extraction, and quality assessment using QUADAS-2. Pooled sensitivity and specificity were estimated using random-effects models, and summary receiver operating characteristic (SROC), subgroup, and sensitivity analyses were performed. To reduce non-independence, the primary subgroup analysis used one primary standalone DL result per study.

Ten retrospective studies were included. The pooled sensitivity and specificity were 0.94 (95% CI: 0.90-0.96) and 0.94 (95% CI: 0.90-0.97), respectively, with an area under the SROC curve of 0.99. Heterogeneity was substantial (I² > 97%) and no obvious threshold effect was found (ρ = -0.188, P = 0.603), so the pooled estimates should be interpreted as exploratory. External validation studies showed lower specificity than internal or same-center temporally independent validation studies (0.87 vs. 0.96; P = 0.034), while sensitivity was similar. Sensitivity analyses suggested that differences in model task and output structure contributed to heterogeneity. At a pretest probability of 20%, a positive DL result increased the posttest probability to approximately 80%, whereas a negative result reduced it to approximately 2%.

DL-based imaging models show promising diagnostic potential for LDH and may support assisted screening, triage, and lesion localization. However, the evidence is limited by substantial heterogeneity, retrospective study designs, non-patient-level analytical units, variable reference standards, and limited external validation. Future studies should use standardized task definitions, annotation procedures, AI reporting frameworks, and multicenter prospective patient-level validation before routine clinical implementation.

Clinical trial registration is not applicable: this study is a systematic review and meta-analysis based on previously published literature and did not involve any prospective intervention involving human participants. The review was registered in PROSPERO (CRD420261353452).
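
The posttest probabilities quoted in the abstract can be approximately reproduced from the pooled sensitivity and specificity using likelihood ratios and the odds form of Bayes' theorem. The sketch below is an illustration, not the authors' method: it assumes the likelihood ratios are computed directly from the pooled point estimates (0.94 and 0.94), whereas the review may have derived them from its fitted model. The function names are hypothetical.

```python
# Illustrative sketch only: reproduces the abstract's posttest probabilities
# from the pooled point estimates. Assumes likelihood ratios are simple ratios
# of pooled sensitivity/specificity; the review's own method may differ.

def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a binary diagnostic test."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def posttest_probability(pretest: float, lr: float) -> float:
    """Convert a pretest probability to a posttest probability via odds."""
    pretest_odds = pretest / (1.0 - pretest)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1.0 + posttest_odds)


SENSITIVITY = 0.94  # pooled sensitivity reported in the abstract
SPECIFICITY = 0.94  # pooled specificity reported in the abstract
PRETEST = 0.20      # pretest probability used in the abstract

lr_pos, lr_neg = likelihood_ratios(SENSITIVITY, SPECIFICITY)
p_after_positive = posttest_probability(PRETEST, lr_pos)
p_after_negative = posttest_probability(PRETEST, lr_neg)

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
print(f"Posttest probability after a positive result: {p_after_positive:.1%}")
print(f"Posttest probability after a negative result: {p_after_negative:.1%}")
```

Under these assumptions the results round to roughly 80% after a positive result and 2% after a negative result, consistent with the figures in the abstract. Substituting the external-validation specificity of 0.87 would lower LR+ and therefore the posttest probability after a positive result.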

Topics

Journal Article, Systematic Review
