Cross-validation of an artificial intelligence tool for fracture classification and localization on conventional radiography in a Dutch population.
Affiliations (5)
- Department of Radiology and Nuclear Medicine, Erasmus MC, University Medical Center Rotterdam, Rotterdam, The Netherlands.
- Qure.ai, Level 7, Oberoi Commerz II, Goregaon East, Mumbai, India.
- Department of Epidemiology, Erasmus Medical Center Rotterdam, Rotterdam, The Netherlands.
- Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA.
- Department of Radiology and Nuclear Medicine, Erasmus MC, University Medical Center Rotterdam, Rotterdam, The Netherlands.
Abstract
The aim of this study is to validate, in a Dutch medical center, the effectiveness of an AI tool trained on Indian data and to assess its ability to classify and localize fractures. Conventional radiographs acquired between January 2019 and November 2022 were analyzed using a multitask deep neural network. The tool identifies and localizes fractures in 17 body parts. The reference standard was based on radiology reports from the routine clinical workflow, confirmed by an experienced musculoskeletal radiologist. The analysis included both patient-wise and fracture-wise evaluations, using binary classification metrics to assess fracture detection and Intersection over Union (IoU) to assess localization accuracy.

In total, 14,311 radiographs (median age, 48 years (range, 18-98); 7265 male) were analyzed and categorized by body part: clavicle, shoulder, humerus, elbow, forearm, wrist, hand and finger, pelvis, hip, femur, knee, lower leg, ankle, foot and toe. Of these, 4156 (29%) showed fractures. The AI tool achieved an overall patient-wise sensitivity, specificity, and AUC of 87.1% (95% CI: 86.1-88.1%), 87.1% (95% CI: 86.4-87.7%), and 0.92 (95% CI: 0.91-0.93), respectively. In the fracture-wise analysis, the overall detection rate was 60%, ranging from 7% for rib fractures to 90% for clavicle fractures.

This study validates, on a Western European dataset, a fracture detection AI tool originally trained on Indian data. Although classification performance is robust on real clinical data, the fracture-wise analysis reveals variable localization accuracy, underscoring the need to refine fracture localization.

AI may help by enabling optimal use of limited resources or personnel. This study evaluates an AI tool designed to aid fracture detection, potentially reducing reading time or optimizing the radiology workflow by prioritizing fracture-positive cases. Cross-validation on a consecutive Dutch cohort confirms the clinical robustness of this AI tool.
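The patient-wise results are summarized as sensitivity, specificity, and AUC with 95% confidence intervals. The following is a minimal Python sketch of how such figures can be computed from per-radiograph labels and AI scores. The abstract does not state the confidence-interval method, so the Wilson score interval here is an assumption (the study may have used another method, such as bootstrapping), and `wilson_ci`, `sensitivity_specificity`, `roc_auc`, and the 0.5 operating threshold are hypothetical.

```python
import math
from typing import Sequence, Tuple


def wilson_ci(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (assumed CI method); z = 1.96 gives ~95% coverage."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def sensitivity_specificity(labels: Sequence[int], calls: Sequence[int]) -> Tuple[float, float]:
    """Patient-wise sensitivity and specificity; 1 = fracture present / AI-positive."""
    pairs = list(zip(labels, calls))
    tp = sum(1 for y, c in pairs if y == 1 and c == 1)
    fn = sum(1 for y, c in pairs if y == 1 and c == 0)
    tn = sum(1 for y, c in pairs if y == 0 and c == 0)
    fp = sum(1 for y, c in pairs if y == 0 and c == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative), counting ties as 0.5.

    Quadratic in the number of cases: fine for a sketch, slow for ~14,000 radiographs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
calls = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical operating threshold
print(sensitivity_specificity(labels, calls))  # (0.5, 0.5)
print(roc_auc(labels, scores))  # 0.75
```

For counts of the size reported here, for example `wilson_ci(3620, 4156)` (a hypothetical split of about 87% of the 4156 fracture-positive radiographs), the interval extends roughly one percentage point on each side of the point estimate.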
The tool detected fractures with 87% sensitivity, 87% specificity, and an AUC of 0.92. It localized 60% of fractures, with the highest rate for the clavicle (90%) and the lowest for the ribs (7%).
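The fracture-wise evaluation scores localization with Intersection over Union (IoU), the area of overlap between an AI output and a reference annotation divided by the area of their union. The following is a minimal Python sketch for a single radiograph, assuming each fracture and each AI output is an axis-aligned bounding box and that a fracture counts as detected when any AI box reaches a hypothetical IoU threshold of 0.5. The abstract does not state the output format, threshold, or matching rule, and `Box`, `iou`, and `fracture_detection_rate` are illustrative names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned box: (x1, y1) top-left, (x2, y2) bottom-right, in pixels."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        # Clamp at zero so an empty (inverted) box has no area.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two boxes; 0.0 when they do not overlap."""
    inter = Box(max(a.x1, b.x1), max(a.y1, b.y1), min(a.x2, b.x2), min(a.y2, b.y2)).area()
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def fracture_detection_rate(
    reference: Sequence[Box], predicted: Sequence[Box], threshold: float = 0.5
) -> float:
    """Share of reference fractures overlapped by at least one prediction with IoU >= threshold.

    The 0.5 threshold and the any-match rule are illustrative assumptions, not the study's settings.
    """
    if not reference:
        raise ValueError("no reference fractures to evaluate")
    detected = sum(
        1 for ref in reference if any(iou(ref, pred) >= threshold for pred in predicted)
    )
    return detected / len(reference)


# Example: two annotated fractures, one AI box that overlaps only the first.
reference = [Box(0, 0, 10, 10), Box(50, 50, 60, 60)]
predicted = [Box(1, 1, 11, 11)]
print(round(iou(reference[0], predicted[0]), 3))  # 0.681
print(fracture_detection_rate(reference, predicted))  # 0.5
```

The any-match rule is the simplest choice; a stricter evaluation would pair each AI box with at most one reference fracture so that a single large box cannot count as several detections.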