External validation of an artificial intelligence tool for fracture detection in children with osteogenesis imperfecta: a multireader study.
Authors
Affiliations (9)
- University College London, Gower Street, London, UK.
- Department of Paediatric Radiology, Evelina London Children's Hospital, Guy's and St Thomas' NHS Foundation Trust, London, United Kingdom.
- Department of Clinical Radiology, University Hospital Coventry and Warwickshire, Coventry, UK.
- Department of Paediatric Radiology, Alder Hey Children's NHS Foundation Trust Hospital, Liverpool, UK.
- Victoria Hospital, NHS Fife, Kirkcaldy, UK.
- Guy's and St Thomas' NHS Foundation Trust, London, UK.
- Great Ormond Street Hospital for Children, London, UK.
- UCL Great Ormond Street Institute of Child Health, University College London, London, UK.
- Great Ormond Street Hospital NIHR Biomedical Research Centre, London, UK.
Abstract
To determine the performance of a commercially available AI tool for fracture detection when used in children with osteogenesis imperfecta (OI).
All appendicular and pelvic radiographs from 48 patients attending an OI clinic at a single centre were included. Seven radiologists evaluated anonymised images in two rounds, first without and then with AI assistance. Differences in diagnostic accuracy between the rounds were analysed.
Forty-eight patients (mean age 12 years) provided 336 images containing 206 fractures, established by the consensus opinion of two radiologists. The AI tool alone achieved a per-examination accuracy of 74.8% [95% CI: 65.4%, 82.7%], compared with an average radiologist accuracy of 83.4% [95% CI: 75.2%, 89.8%]. With AI assistance, average radiologist accuracy per examination improved to 90.7% [95% CI: 83.5%, 95.4%]. AI gave more false negatives than radiologists, with 80 missed fractures versus 41, respectively. When radiologists altered their original decision at the per-image level, they more often (74.6%) did so to agree with AI; 82.8% of these changes led to a correct result, 64.0% of which were changes from a false positive to a true negative.
Despite inferior standalone performance, AI assistance can still improve radiologist fracture detection in a rare disease paediatric population. Radiologists using AI typically reached more accurate diagnostic outcomes through a reduction in false positives. Future studies focusing on the real-world application of AI tools in a larger population of children with bone fragility disorders will help to evaluate whether these improvements in accuracy translate into improved patient outcomes.
Question How well does a commercially available artificial intelligence (AI) tool identify fractures on appendicular radiographs of children with osteogenesis imperfecta (OI), and can it also improve radiologists' identification of fractures in this population?
Findings Specialist human radiologists outperformed the AI fracture detection tool when acting alone; however, their overall diagnostic performance improved with AI assistance.
Clinical relevance AI assistance improves specialist radiologist fracture detection in children with osteogenesis imperfecta, even though the AI tool alone performs worse than the radiologists acting alone. This was because the AI moderated the number of false positives generated by the radiologists.
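The per-examination accuracies in the abstract are reported with 95% confidence intervals. As a rough illustration of how such a figure can be derived from reader decisions, the following Python sketch computes accuracy from confusion-matrix counts and a Wilson score interval around it. The abstract does not state which interval method the authors used, so the Wilson interval, the `Counts` class, the `wilson_interval` function, and the example counts are all assumptions made for illustration; they are not study data.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Counts:
    """Hypothetical confusion-matrix counts for one reader or for the AI tool."""
    tp: int  # fracture present, called present
    fp: int  # fracture absent, called present
    tn: int  # fracture absent, called absent
    fn: int  # fracture present, called absent (missed fracture)

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.tn + self.fn

    @property
    def correct(self) -> int:
        return self.tp + self.tn

    @property
    def accuracy(self) -> float:
        return self.correct / self.total


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Illustrative counts only (assumption): 100 examinations, 80 read correctly.
example = Counts(tp=40, fp=10, tn=40, fn=10)
low, high = wilson_interval(example.correct, example.total)
print(f"accuracy = {example.accuracy:.1%}, 95% CI [{low:.1%}, {high:.1%}]")
```

Under this framing, a reader who changes a false positive to a true negative after seeing the AI output moves one case from `fp` to `tn`, which raises `accuracy` without changing the number of missed fractures (`fn`).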