Identifying the diagnostic utility of artificial intelligence for elbow effusion detection: A systematic review and meta-analysis.
Authors
Affiliations (6)
- Rowan-Virtua School of Osteopathic Medicine, Stratford, NJ, USA.
- Futures Forward Research Institute, Toms River, NJ, USA.
- Rowan-Virtua School of Osteopathic Medicine, Stratford, NJ, USA.
- Futures Forward Research Institute, Toms River, NJ, USA.
- UBMD Orthopaedics and Sports Medicine, Buffalo, NY, USA.
- Virtua Health System, Marlton, NJ, USA.
Abstract
Introduction: Elbow fractures are often difficult to assess on radiographs, and a joint effusion may be the only indication of a fracture. Because radiologists are not always available in the acute care setting, artificial intelligence (AI) may be beneficial for effusion detection. This systematic review and meta-analysis evaluates the sensitivity and specificity of AI for elbow effusion detection and compares them with those of physician radiographic interpretation. We hypothesize that there will be no significant difference in diagnostic utility between AI and physicians.
Methods: A systematic review and meta-analysis was performed using five databases (Cochrane, Embase, Scopus, PubMed, Web of Science). Included studies reported the numbers of true positives, true negatives, false positives, and false negatives, the number of radiographs, sensitivity, specificity, and positive and negative predictive values. A bivariate random-effects meta-analysis was performed in RStudio v.4.5.1 to calculate pooled sensitivity, specificity, and area under the curve (AUC), together with 95% confidence intervals for sensitivity and specificity.
Results: Four retrospective studies were included in the analysis, encompassing 5790 radiographs (2–5). AI software had a sensitivity of 92.7% (95% CI: 75.3–98.2%), specificity of 97.8% (95% CI: 84.3–99.7%), AUC of 97.8%, and normalized AUC of 95.8%. There was minimal heterogeneity between the AI studies (I² = 39.3%). Physicians, including residents and attendings, had a sensitivity of 94.8% (95% CI: 85.5–98.3%), specificity of 96.8% (95% CI: 84.2–99.4%), AUC of 97.9%, and normalized AUC of 94.6%. There was no significant heterogeneity between the physician studies (I² = 0%). There was no significant difference between AI and physicians in sensitivity (p = 0.79) or specificity (p = 0.82).
Discussion: No significant differences were observed between the radiologists and the AI software, with both groups yielding high specificity and sensitivity, indicating comparable performance.
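As an illustration of the per-study quantities the included studies reported, the following minimal Python sketch derives sensitivity, specificity, and predictive values from a single 2×2 table and attaches an approximate 95% confidence interval on the logit scale. It is not the authors' analysis: the pooled estimates above come from a bivariate random-effects model, which this sketch does not implement. The function names `diagnostic_metrics` and `logit_ci` and all counts are hypothetical and chosen only for illustration.

```python
import math


def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV, and NPV from 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all with effusion
        "specificity": tn / (tn + fp),  # true negatives among all without effusion
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


def logit_ci(events, total, z=1.96):
    """Approximate 95% CI for a proportion, computed on the logit scale.

    A 0.5 continuity correction is added to both cells so the interval
    is defined even when a cell count is zero (an assumption of this sketch).
    """
    a = events + 0.5
    b = (total - events) + 0.5
    logit = math.log(a / b)
    se = math.sqrt(1.0 / a + 1.0 / b)
    lower, upper = logit - z * se, logit + z * se
    # Back-transform from the logit scale to a proportion.
    return 1.0 / (1.0 + math.exp(-lower)), 1.0 / (1.0 + math.exp(-upper))


# Hypothetical counts for one study; NOT data from the review.
tp, fp, fn, tn = 90, 5, 10, 95

metrics = diagnostic_metrics(tp, fp, fn, tn)
sens_ci = logit_ci(tp, tp + fn)
spec_ci = logit_ci(tn, tn + fp)

print(f"sensitivity = {metrics['sensitivity']:.3f}, 95% CI {sens_ci[0]:.3f}-{sens_ci[1]:.3f}")
print(f"specificity = {metrics['specificity']:.3f}, 95% CI {spec_ci[0]:.3f}-{spec_ci[1]:.3f}")
```

Working on the logit scale keeps the interval inside 0 to 1, which matters when proportions sit close to 100%, as they do here. Pooling across studies would additionally require modeling between-study variance and the correlation between sensitivity and specificity.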