Real-world clinical impact of three commercial AI algorithms on musculoskeletal radiography interpretation: A prospective crossover reader study.
Authors
Affiliations (7)
Affiliations (7)
- Institute for Diagnostic and Interventional Radiology, TUM School of Medicine and Health, TUM University Hospital Rechts der Isar, Munich, Germany.
- Institute for Diagnostic and Interventional Radiology, TUM School of Medicine and Health, TUM University Hospital Rechts der Isar, Munich, Germany; Institute of Diagnostic and Interventional Neuroradiology, TUM School of Medicine and Health, TUM University Hospital Rechts der Isar, Munich, Germany.
- Department of Neuroradiology, Charité - Universitätsmedizin Berlin, Corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany.
- Department of Neuroradiology, Charité - Universitätsmedizin Berlin, Corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany; Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany.
- Department of Gynecology and Center for Hereditary Breast and Ovarian Cancer, Technical University of Munich (TUM), School of Medicine and Health, Klinikum Rechts der Isar, TUM University Hospital, Munich, Germany.
- Institute for Diagnostic and Interventional Radiology, TUM School of Medicine and Health, TUM University Hospital Rechts der Isar, Munich, Germany; Institute for Cardiovascular Radiology and Nuclear Medicine, TUM School of Medicine and Health, German Heart Center Munich, Munich, Germany.
- Institute for Diagnostic and Interventional Radiology, TUM School of Medicine and Health, TUM University Hospital Rechts der Isar, Munich, Germany. Electronic address: [email protected].
Abstract
To prospectively assess the diagnostic performance, workflow efficiency, and clinical impact of three commercial deep-learning tools (BoneView, Rayvolve, RBfracture) for routine musculoskeletal radiograph interpretation. From January to March 2025, two radiologists (4 and 5 years' experience) independently interpreted 1,037 adult musculoskeletal studies (2,926 radiographs) first unaided and, after 14-day washouts, with each AI tool in a randomized crossover design. Ground truth was established by confirmatory CT when available. Outcomes included sensitivity, specificity, accuracy, area under the receiver operating characteristic curve (AUC), interpretation time, diagnostic confidence (5-point Likert), and rates of additional CT recommendations and senior consultations. DeLong tests compared AUCs; Mann-Whitney U and χ2 tests assessed secondary endpoints. AI assistance did not significantly change performance for fractures, dislocations, or effusions. For fractures, AUCs were comparable to baseline (Reader 1: 96.50 % vs. 96.30-96.50 %; Reader 2: 95.35 % vs. 95.97 %; all p > 0.11). For dislocations, baseline AUCs (Reader 1: 92.66 %; Reader 2: 90.68 %) were unchanged with AI (92.76-93.95 % and 92.00 %; p ≥ 0.280). For effusions, baseline AUCs (Reader 1: 92.52 %; Reader 2: 96.75 %) were similar with AI (93.12 % and 96.99 %; p ≥ 0.157). Median interpretation times decreased with AI (Reader 1: 34 s to 21-25 s; Reader 2: 30 s to 21-26 s; all p < 0.001). Confidence improved across tools: BoneView increased combined "very good/excellent" ratings versus unaided reads (Reader 1: 509 vs. 449, p < 0.001; Reader 2: 483 vs. 439, p < 0.001); Rayvolve (Reader 1: 456 vs. 449, p = 0.029; Reader 2: 449 vs. 439, p < 0.001) and RBfracture (Reader 1: 457 vs. 449, p = 0.017; Reader 2: 448 vs. 439, p = 0.001) yielded smaller but significant gains. Reader 1 recommended fewer CT scans with AI assistance (33 vs. 22-23, p = 0.007). In a real-world clinical setting, AI-assisted interpretation of musculoskeletal radiographs reduced reading time and increased diagnostic confidence without materially affecting diagnostic performance. These findings support AI assistance as a lever for workflow efficiency and potential cost-effectiveness at scale.