YOLO26x-based automated fracture detection on radiographs and its impact on radiologist performance: A multi-reader multi-case study.
Authors
Affiliations (6)
- Bagcilar Training and Research Hospital, Radiology Clinic, Istanbul, Türkiye.
- Icahn School of Medicine at Mount Sinai Biomedical Engineering and Imaging Institute, NY, USA.
- Esenler Obstetrics & Gynecology and Pediatrics Hospital, Radiology Clinic, Istanbul, Türkiye. Electronic address: [email protected].
- Ceylanpinar State Hospital, Radiology Clinic, Sanliurfa, Türkiye.
- Basaksehir Cam and Sakura City Hospital, Radiology Clinic, Istanbul, Türkiye.
- Bakirkoy Dr. Sadi Konuk Training and Research Hospital, Radiology Clinic, Istanbul, Türkiye.
Abstract
Missed fractures are among the most common diagnostic errors in emergency radiology and may lead to delayed treatment and adverse outcomes. Deep learning-based approaches have shown promise for automated fracture detection; however, many existing models are restricted to specific anatomical regions, operate on downsampled images, or focus primarily on image classification rather than robust localization. This study aimed to develop and validate a YOLO26x-based deep learning model trained on high-resolution input (1280 × 1280 pixels) for detecting appendicular fractures on radiographs, and to evaluate its impact on radiologist diagnostic performance, reading time, and diagnostic confidence in a multi-reader multi-case (MRMC) study.

A total of 8,690 appendicular radiographs from an institutional picture archiving and communication system (PACS) and an open-source repository were used to train a YOLO26x-based object detection model with transfer learning; the 1280 × 1280 pixel input was chosen to preserve fine fracture detail. The test set comprised 500 images in a balanced design (250 fracture-positive, 250 fracture-negative; 250 institutional, 250 open-source), giving equal representation of both data sources and a prevalence-controlled evaluation of diagnostic performance. Model performance was assessed at both image and object levels. The MRMC study involved three radiologists (3, 6, and 21 years of experience) and evaluated the effect of model assistance on diagnostic accuracy, efficiency, and self-reported confidence, using a sequential unassisted-then-assisted design with a 3-week washout interval between sessions. Statistical comparisons included McNemar's test with Holm-Bonferroni correction, Wilcoxon signed-rank tests, Cohen's and Fleiss' κ statistics, and source-stratified analyses.
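As an illustration of the paired-accuracy comparison named above, the following is a minimal sketch of an exact McNemar test followed by Holm-Bonferroni adjustment across three readers. It is not the authors' analysis code: the function names `mcnemar_exact` and `holm_adjust` and the discordant-pair counts are hypothetical, and the abstract does not state whether the exact or the chi-square form of McNemar's test was used.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: cases correct unassisted but wrong assisted (hypothetical count)
    c: cases wrong unassisted but correct assisted (hypothetical count)
    Under the null hypothesis, the discordant pairs follow Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def holm_adjust(pvals):
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

# Illustrative (b, c) discordant-pair counts for three readers; NOT study data.
discordant = [(20, 56), (30, 38), (18, 57)]
raw = [mcnemar_exact(b, c) for b, c in discordant]
for reader, (p, p_adj) in enumerate(zip(raw, holm_adjust(raw)), start=1):
    print(f"RAD{reader}: raw p = {p:.4f}, Holm-adjusted p = {p_adj:.4f}")
```

Only the discordant pairs enter the test, which is why a paired design on the same 500 images can detect accuracy changes that an unpaired comparison of proportions might miss.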
The model achieved an image-level AUC-ROC of 0.847 (95% CI: 0.811-0.881) with an F1-score of 0.799 (95% CI: 0.760-0.834) at the primary operating threshold. Source-stratified analysis demonstrated comparable performance across the institutional (AUC 0.843) and independent open-source (AUC 0.852) subsets. Model-assisted reading significantly improved accuracy for two of three readers (RAD1: 76.2% → 83.4%, p = 0.001; RAD3: 75.6% → 83.4%, p < 0.001; Holm-corrected), with sensitivity gains of 2.4-16.4 percentage points. Inter-reader agreement improved substantially (Fleiss' κ: 0.432 → 0.642). Median interpretation time decreased by 33.2% overall (12.9 → 8.6 s; p < 0.001), and self-reported confidence increased significantly for all readers (all p < 0.001). Decision-change analysis demonstrated a net positive effect, with beneficial changes substantially outnumbering detrimental ones (197 vs 104 across 1,500 reader-case pairs).

The proposed YOLO26x model trained on high-resolution input demonstrated robust image-level fracture detection, with consistent performance across institutional and independent open-source data sources. Model assistance improved diagnostic accuracy, efficiency, inter-reader agreement, and diagnostic confidence. These findings support the potential of high-resolution deep learning-based systems as clinically practical decision-support tools in emergency radiology, although prospective multicenter validation and workflow integration studies remain warranted before routine clinical implementation.

AI-assisted fracture detection using high-resolution radiograph input may enhance diagnostic consistency, improve interpretation efficiency, and help reduce missed-injury rates in high-volume emergency settings. With additional validation and workflow integration, such systems may support triage prioritization and clinical decision-making in emergency radiology.
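For readers unfamiliar with the inter-reader agreement statistic reported above, the following is a minimal sketch of Fleiss' κ for several readers making binary fracture calls. The helper names `to_counts` and `fleiss_kappa` and the example calls are hypothetical and are not taken from the study.

```python
def to_counts(calls_per_image, n_categories=2):
    """Convert per-image reader calls (0 = no fracture, 1 = fracture) to category counts."""
    counts = []
    for calls in calls_per_image:
        row = [0] * n_categories
        for call in calls:
            row[call] += 1
        counts.append(row)
    return counts

def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of readers assigning image i to category j.

    Assumes every image was rated by the same number of readers (at least two)
    and that the ratings are not all in a single category, in which case kappa
    is undefined (division by zero).
    """
    n_images = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    # Overall proportion of ratings falling in each category.
    p_cat = [sum(row[j] for row in counts) / (n_images * n_raters)
             for j in range(n_categories)]
    # Observed agreement per image: fraction of reader pairs that agree.
    p_obs = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
             for row in counts]
    mean_obs = sum(p_obs) / n_images
    expected = sum(p * p for p in p_cat)  # agreement expected by chance
    return (mean_obs - expected) / (1 - expected)

# Hypothetical calls from three readers on four images; NOT study data.
example_calls = [(1, 1, 1), (1, 1, 0), (1, 0, 0), (0, 0, 0)]
print(round(fleiss_kappa(to_counts(example_calls)), 3))
```

On the decision-change figures reported above, 197 beneficial versus 104 detrimental changes correspond to a net of 93 favorable changes, or 6.2% of the 1,500 reader-case pairs.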