Deep learning for the diagnosis of lumbar disc herniation: a systematic review and meta-analysis.

June 11, 2026

DOI: 10.1186/s12880-026-02502-0 PMID: 42277705

Authors

Li Y, Zhang C, Qi Z, Wang Q, Li D, Gao F, Ren X, Chen C

Affiliations (4)

Nanjing University of Chinese Medicine, Nanjing, Jiangsu, 210023, China.
Suzhou TCM Hospital Affiliated to Nanjing University of Chinese Medicine, Suzhou, Jiangsu, 215019, China.
Department of Orthopaedics, Jiangyin Hospital Affiliated to Nanjing University of Chinese Medicine, Jiangyin, Jiangsu, 214400, China.
Department of Orthopaedics, Jiangyin Hospital Affiliated to Nanjing University of Chinese Medicine, Jiangyin, Jiangsu, 214400, China.

Abstract

Lumbar disc herniation (LDH) is a major cause of low back pain and disability worldwide. Although magnetic resonance imaging (MRI) is the standard modality for diagnosis, interpretation remains subject to interobserver variability. Deep learning (DL)-based approaches have been increasingly applied to improve diagnostic accuracy; however, their overall performance and sources of heterogeneity remain unclear.

PubMed, Web of Science, and the Cochrane Library were searched from inception to March 2026. Studies evaluating imaging-based DL models for LDH diagnosis were included if sufficient diagnostic performance data were available. Two reviewers independently performed study selection, data extraction, and quality assessment using QUADAS-2. Pooled sensitivity and specificity were estimated using random-effects models, and summary receiver operating characteristic (SROC), subgroup, and sensitivity analyses were performed. The primary subgroup analysis used one primary standalone DL result per study to reduce non-independence.

Ten retrospective studies were included. The pooled sensitivity and specificity were 0.94 (95% CI: 0.90-0.96) and 0.94 (95% CI: 0.90-0.97), respectively, with an area under the SROC curve of 0.99. Substantial heterogeneity was observed (I² > 97%), with no obvious threshold effect (ρ = -0.188, P = 0.603), indicating that the pooled estimates should be interpreted as exploratory. External validation studies showed lower specificity than internal or same-center temporally independent validation studies (0.87 vs. 0.96; P = 0.034), while sensitivity was similar. Sensitivity analyses suggested that differences in model task and output structure contributed to heterogeneity. At a pretest probability of 20%, a positive DL result increased the posttest probability to approximately 80%, whereas a negative result reduced it to approximately 2%.

DL-based imaging models show promising diagnostic potential for LDH and may support assisted screening, triage, and lesion localization. However, the evidence is limited by substantial heterogeneity, retrospective study designs, non-patient-level analytical units, variable reference standards, and limited external validation. Future studies should use standardized task definitions, annotation procedures, AI reporting frameworks, and multicenter prospective patient-level validation before routine clinical implementation.

Clinical trial number: not applicable. This study is a systematic review and meta-analysis based on previously published literature and did not involve any prospective intervention involving human participants. This systematic review and meta-analysis was registered in PROSPERO (CRD420261353452).
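
The posttest probabilities quoted in the abstract can be approximately reproduced from the pooled sensitivity and specificity using likelihood ratios. The sketch below is illustrative only: it assumes the pooled point estimates (0.94 and 0.94) are plugged directly into the standard odds form of Bayes' theorem, which may differ from how the authors derived their figures (for example, from model-based pooled likelihood ratios). The function name `posttest_probability` is hypothetical and not taken from the paper.

```python
def posttest_probability(pretest: float, sensitivity: float,
                         specificity: float, positive: bool) -> float:
    """Convert a pretest probability into a posttest probability.

    Uses the positive likelihood ratio (sens / (1 - spec)) for a positive
    test result and the negative likelihood ratio ((1 - sens) / spec) for
    a negative one, then applies Bayes' theorem on the odds scale.
    """
    if positive:
        likelihood_ratio = sensitivity / (1.0 - specificity)
    else:
        likelihood_ratio = (1.0 - sensitivity) / specificity
    pretest_odds = pretest / (1.0 - pretest)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Assumed inputs: pooled point estimates and the 20% pretest probability
# stated in the abstract.
pos = posttest_probability(0.20, 0.94, 0.94, positive=True)
neg = posttest_probability(0.20, 0.94, 0.94, positive=False)
print(f"positive result: {pos:.3f}, negative result: {neg:.3f}")
```

Under these assumptions the results come out close to the reported values of roughly 80% after a positive result and roughly 2% after a negative one. Because the calculation uses only point estimates, it carries none of the uncertainty implied by the confidence intervals or by the reported heterogeneity.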

Topics

Journal Article, Systematic Review
