Diagnostic performance of ChatGPT-4.0 in elbow fracture detection: A comparative study of radial head, distal humerus, and olecranon fractures.
Authors
Affiliations (1)
- University of Health Science Kocaeli City Hospital, Orthopedics and Traumatology Clinic, Kocaeli, Turkey.
Abstract
Artificial intelligence has been increasingly used for radiographic fracture detection in recent years. However, its performance in diagnosing displaced and non-displaced fractures in specific anatomical regions has not been sufficiently investigated. This study aimed to evaluate the accuracy and sensitivity of Chat Generative Pretrained Transformer (ChatGPT-4.0) in the diagnosis of radial head, distal humerus, and olecranon fractures. Anonymized anteroposterior and lateral radiographs of 266 patients, with fractures previously confirmed by an expert radiologist and orthopedist, were analyzed. Each fracture site was divided into 2 groups: displaced and non-displaced. ChatGPT-4.0 was asked 2 questions about whether each image showed a fracture. Responses were categorized as "fracture detected in the first question," "fracture detected in the second question," or "no fracture detected." ChatGPT-4.0 showed significantly higher accuracy in diagnosing displaced fractures at all sites (P < .001). The highest fracture detection rate in the first question was observed for displaced distal humerus fractures (87.7%). The success rate was significantly lower in non-displaced fractures; within the non-displaced group, the highest diagnostic rate was observed in radial head fractures (25.3%). Pairwise sensitivity comparisons between the non-displaced fracture sites showed no statistically significant differences (P > .05). ChatGPT-4.0 shows promising diagnostic performance in the detection of displaced olecranon, radial head, and distal humerus fractures. However, its limited success in non-displaced fractures indicates that the model requires further training and development before clinical use. Level 3.
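As an illustration of how the three response categories could be turned into a first-question detection rate and a sensitivity for one fracture group, a minimal Python sketch follows. It is not the authors' analysis code. The function name `detection_rates` and the example counts are hypothetical, and counting a detection on either question toward sensitivity is an assumption, since the abstract does not define how second-question detections were scored.

```python
from collections import Counter

# Response categories as named in the abstract.
FIRST = "fracture detected in the first question"
SECOND = "fracture detected in the second question"
NONE = "no fracture detected"


def detection_rates(responses):
    """Return (first-question rate, sensitivity) for confirmed fractures.

    Assumption: sensitivity counts a detection on either question.
    """
    total = len(responses)
    if total == 0:
        raise ValueError("no responses to evaluate")
    counts = Counter(responses)
    first_rate = counts[FIRST] / total
    sensitivity = (counts[FIRST] + counts[SECOND]) / total
    return first_rate, sensitivity


# Hypothetical counts for illustration only; these are NOT the study's data.
example = [FIRST] * 6 + [SECOND] * 2 + [NONE] * 2
first_rate, sensitivity = detection_rates(example)
print(f"first-question rate: {first_rate:.1%}, sensitivity: {sensitivity:.1%}")
```

Because every radiograph in the study had a confirmed fracture, a calculation of this kind yields sensitivity only; specificity would require images without fractures.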