Real-world clinical impact of three commercial AI algorithms on musculoskeletal radiography interpretation: A prospective crossover reader study.
Prucker P, Lemke T, Mertens CJ, Ziegelmayer S, Graf MM, Weller D, Kim SH, Gassert FT, Kader A, Dorfner FJ, Meddeb A, Makowski MR, Lammert J, Huber T, Lohöfer F, Bressem KK, Adams LC, Luiken I, Busch F
Sep 17 2025
This study prospectively assessed the diagnostic performance, workflow efficiency, and clinical impact of three commercial deep-learning tools (BoneView, Rayvolve, RBfracture) for routine musculoskeletal radiograph interpretation. From January to March 2025, two radiologists (4 and 5 years' experience) independently interpreted 1,037 adult musculoskeletal studies (2,926 radiographs), first unaided and then, after 14-day washouts, with each AI tool in a randomized crossover design. Ground truth was established by confirmatory CT when available. Outcomes included sensitivity, specificity, accuracy, area under the receiver operating characteristic curve (AUC), interpretation time, diagnostic confidence (5-point Likert), and rates of additional CT recommendations and senior consultations. DeLong tests compared AUCs; Mann-Whitney U and χ² tests assessed secondary endpoints.
AI assistance did not significantly change performance for fractures, dislocations, or effusions. For fractures, AUCs were comparable to baseline (Reader 1: 96.50 % vs. 96.30-96.50 %; Reader 2: 95.35 % vs. 95.97 %; all p > 0.11). For dislocations, baseline AUCs (Reader 1: 92.66 %; Reader 2: 90.68 %) were unchanged with AI (92.76-93.95 % and 92.00 %; p ≥ 0.280). For effusions, baseline AUCs (Reader 1: 92.52 %; Reader 2: 96.75 %) were similar with AI (93.12 % and 96.99 %; p ≥ 0.157). Median interpretation times decreased with AI (Reader 1: 34 s to 21-25 s; Reader 2: 30 s to 21-26 s; all p < 0.001). Confidence improved across tools: BoneView increased combined "very good/excellent" ratings versus unaided reads (Reader 1: 509 vs. 449, p < 0.001; Reader 2: 483 vs. 439, p < 0.001); Rayvolve (Reader 1: 456 vs. 449, p = 0.029; Reader 2: 449 vs. 439, p < 0.001) and RBfracture (Reader 1: 457 vs. 449, p = 0.017; Reader 2: 448 vs. 439, p = 0.001) yielded smaller but significant gains. Reader 1 recommended fewer CT scans with AI assistance than unaided (22-23 vs. 33, p = 0.007).
In a real-world clinical setting, AI-assisted interpretation of musculoskeletal radiographs reduced reading time and increased diagnostic confidence without materially affecting diagnostic performance. These findings support AI assistance as a lever for workflow efficiency and potential cost-effectiveness at scale.
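For readers less familiar with the outcome measures named above, the following is a minimal sketch of how sensitivity, specificity, accuracy, and AUC can be computed for a single reader. It is not the study's analysis code. The function names `binary_metrics` and `auc`, the per-study suspicion scores, and the toy data are all hypothetical, and the sketch assumes both positive and negative cases are present. The AUC is computed through its rank-based interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
from typing import Dict, Sequence


def binary_metrics(truth: Sequence[int], pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy from binary labels (1 = finding present)."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),    # share of true findings detected
        "specificity": tn / (tn + fp),    # share of normal studies called normal
        "accuracy": (tp + tn) / len(truth),
    }


def auc(truth: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: P(score of positive > score of negative), ties count 0.5."""
    pos = [s for t, s in zip(truth, scores) if t == 1]
    neg = [s for t, s in zip(truth, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data only (NOT from the study): eight studies, CT-confirmed truth,
# one reader's binary calls, and hypothetical 1-5 suspicion scores.
truth = [1, 1, 1, 0, 0, 0, 0, 1]
pred = [1, 1, 0, 0, 0, 1, 0, 1]
scores = [5, 4, 2, 1, 2, 4, 1, 5]

print(binary_metrics(truth, pred))
print(auc(truth, scores))
```

Comparing two such AUCs measured on the same cases, as the DeLong test does, additionally requires the covariance between the paired estimates, which this sketch does not attempt.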