Evaluating the Diagnostic Accuracy of Artificial Intelligence in Spondylolisthesis Detection: A Systematic Review and Meta-analysis.
Authors
Affiliations (6)
- School of Medicine, Tehran University of Medical Sciences, Tehran, Iran (M.-T.P.-F., A.S.K., F.S., P.Y.); Department of Radiology, Division of Musculoskeletal Imaging and Intervention, University of Washington, Seattle, WA, USA (M.-T.P.-F., M.C., S.H.).
- School of Medicine, Kermanshah University of Medical Sciences, Kermanshah, Iran (A.-M.A.).
- School of Medicine, Tehran University of Medical Sciences, Tehran, Iran (M.-T.P.-F., A.S.K., F.S., P.Y.); Department of Neurosurgery, Tehran University of Medical Sciences, Tehran, Iran (A.S.K.).
- Department of Radiology, Division of Musculoskeletal Imaging and Intervention, University of Washington, Seattle, WA, USA (M.-T.P.-F., M.C., S.H.).
- School of Medicine, Tehran University of Medical Sciences, Tehran, Iran (M.-T.P.-F., A.S.K., F.S., P.Y.).
- School of Medicine, Tehran University of Medical Sciences, Tehran, Iran (M.-T.P.-F., A.S.K., F.S., P.Y.); School of Medicine, Guilan University of Medical Sciences, Guilan, Iran (P.Y.). Electronic address: [email protected].
Abstract
Spondylolisthesis, a vertebral displacement condition affecting 5-26% of adults, poses a significant health burden. Artificial intelligence (AI) has emerged as a tool for enhancing diagnostic accuracy. However, heterogeneity in model performance calls for a synthesis of the existing evidence. This study evaluated the diagnostic accuracy of AI models for the detection of spondylolisthesis across multiple imaging modalities. Following PRISMA guidelines, PubMed, Scopus, Embase, and Web of Science were searched, yielding 24 included studies (21 in the meta-analysis) with 8029 observations. Inclusion criteria focused on original studies using standalone deep learning (DL) models with reported diagnostic metrics. Quality assessment was performed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool, and statistical analysis employed random-effects meta-analysis. AI models demonstrated high diagnostic performance, with a pooled sensitivity of 94.7% (95% CI: 92.6-96.2%) and a pooled specificity of 97.1% (95% CI: 95.0-98.4%). The area under the curve (AUC) was 0.979, indicating robust discriminative ability. MRI-based models slightly outperformed radiography-based models (sensitivity: 95.71% vs. 94.95%; specificity: 98.38% vs. 96.80%), though the differences were not significant (p = 0.651). Classification models significantly surpassed detection-focused models (p = 0.026), while biomechanical feature-based models and DL image processing models showed comparable performance (p = 0.264). Notably, models such as FAR networks and YOLOv8 achieved high accuracy (89-98%) in grading and localization tasks. AI models show considerable diagnostic accuracy for spondylolisthesis, underscoring their potential as adjunctive clinical tools. However, the substantial heterogeneity across studies highlights the need for standardized study designs. These findings support integrating AI into diagnostic workflows, particularly in resource-limited settings, while calling for further research to ensure real-world applicability.
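To illustrate how a pooled sensitivity with a 95% CI can arise from a random-effects meta-analysis, the following is a minimal sketch only. The abstract does not state which estimator was used; this sketch assumes a simplified univariate DerSimonian-Laird model on the logit scale, whereas diagnostic accuracy reviews often fit a bivariate model instead. The function name `pool_logit_random_effects` and the per-study counts are hypothetical and are not data from the review.

```python
import math


def pool_logit_random_effects(events, totals):
    """Pool proportions (e.g. per-study sensitivity) on the logit scale.

    Assumption: univariate DerSimonian-Laird random-effects model with a
    0.5 continuity correction. Returns (estimate, ci_low, ci_high, tau2).
    """
    logits, variances = [], []
    for e, n in zip(events, totals):
        a = e + 0.5          # corrected count of events (e.g. true positives)
        b = n - e + 0.5      # corrected count of non-events (e.g. false negatives)
        logits.append(math.log(a / b))
        variances.append(1.0 / a + 1.0 / b)

    # Fixed-effect (inverse-variance) weights and Cochran's Q
    w = [1.0 / v for v in variances]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, logits)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, logits))

    # DerSimonian-Laird estimate of the between-study variance tau^2
    k = len(logits)
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights, pooled logit, and its standard error
    w_re = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w_re, logits)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    def inv_logit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return inv_logit(mu), inv_logit(mu - 1.96 * se), inv_logit(mu + 1.96 * se), tau2


# Hypothetical per-study counts: true positives and total diseased cases
true_positives = [92, 180, 47, 260]
diseased = [100, 190, 50, 270]
est, lo, hi, tau2 = pool_logit_random_effects(true_positives, diseased)
print(f"pooled sensitivity {est:.3f} (95% CI {lo:.3f}-{hi:.3f}), tau^2 = {tau2:.3f}")
```

Pooling on the logit scale keeps the back-transformed interval inside 0-1 and makes it asymmetric around the estimate, which is consistent with the asymmetric intervals reported above. The same function could be applied to true negatives over non-diseased cases to sketch a pooled specificity.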