Diagnostic performance of X-ray-based deep learning models for detecting ankle and foot fractures: a systematic review and meta-analysis.

November 19, 2025 · PubMed

Authors

Pahlevan-Fallahy MT, Hosseinzadeh N, Shaker F, Asgari AM, Teymouri Athar MM, Rouzrokh P

Affiliations (5)

  • School of Medicine, Tehran University of Medical Sciences, Tehran, Iran.
  • School of Medicine, Kermanshah University of Medical Sciences, Kermanshah, Iran.
  • School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
  • Department of Radiology & Biomedical Imaging, Yale School of Medicine, Yale University, New Haven, CT, USA.
  • Department of Radiology, Mayo Clinic Artificial Intelligence Laboratory, Mayo Clinic, Rochester, MN, USA.

Abstract

Lower extremity fractures are prevalent in vulnerable populations and impose a significant burden, which highlights the need for timely and precise diagnosis. Integrating deep learning with imaging has shown promising results for improving fracture detection accuracy. This study assesses the diagnostic accuracy of AI models that use X-ray images to detect ankle and foot fractures and investigates factors that may influence their performance. Four databases were searched comprehensively up to September 30, 2025. Studies that investigated the accuracy of deep learning models for detecting ankle and foot fractures on X-rays were included. A bivariate random-effects model was used for the meta-analysis. A total of 506 studies were reviewed, and 14 were included in the meta-analysis. Across the representative models of the included studies, the pooled sensitivity and specificity were 93.2% and 94.5% (95% CI 88.8-95.9% and 90.1-97.0%, respectively; I² 6.6%). The pooled F1 score was estimated at 0.94 (95% CI 0.88-0.97). Subgroup analysis revealed no difference in sensitivity or specificity between studies using multi-view and single-view X-rays (P = 0.64 and P = 0.89, respectively). Models detecting calcaneal fractures performed significantly better than models detecting foot or ankle fractures, with a pooled sensitivity of 95.1% (95% CI 93.0-96.6%, P = 0.008), a specificity of 98.3% (95% CI 97.0-99.0%), and a diagnostic odds ratio (DOR) of 1751.8 (95% CI 445.9-6882.5). The type of dataset used (validation vs. test, internal vs. external testing) did not significantly affect performance. Deep learning models can be considered a notably accurate method for detecting ankle and foot fractures. Further studies with larger samples, external validation, and clinical implementation are required. This systematic review and meta-analysis was registered with PROSPERO (registration number CRD42024624044).
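For readers less familiar with the diagnostic odds ratio, the following minimal Python sketch shows how a DOR follows from a single sensitivity/specificity pair, using the standard definition DOR = LR+ / LR-. The function name `diagnostic_odds_ratio` is illustrative and is not taken from the paper. The inputs are the pooled point estimates quoted in the abstract, and this is not the authors' bivariate random-effects procedure.

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """Return the diagnostic odds ratio for one sensitivity/specificity pair.

    Uses the standard definition DOR = LR+ / LR-, where
    LR+ = sens / (1 - spec) and LR- = (1 - sens) / spec.
    Both inputs are proportions strictly between 0 and 1.
    """
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    positive_lr = sensitivity / (1.0 - specificity)   # positive likelihood ratio
    negative_lr = (1.0 - sensitivity) / specificity   # negative likelihood ratio
    return positive_lr / negative_lr


# Pooled point estimates quoted in the abstract (as proportions).
dor_all_models = diagnostic_odds_ratio(0.932, 0.945)
dor_calcaneal = diagnostic_odds_ratio(0.951, 0.983)

print(f"All models: DOR ~ {dor_all_models:.1f}")
print(f"Calcaneal:  DOR ~ {dor_calcaneal:.1f}")
```

Plugging in the calcaneal point estimates gives a DOR of roughly 1120, which is lower than the reported pooled DOR of 1751.8. A gap of this kind is expected: a pooled DOR is typically estimated from study-level data rather than back-calculated from the pooled sensitivity and specificity, so the two figures need not match.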

Topics

Journal Article
