Clinical deployment and prospective validation of an AI model for limb-length discrepancy measurements using an open-source platform.
Authors
Affiliations (2)
- Department of Radiology, Boston Children's Hospital, Boston, MA, USA. [email protected].
- Department of Radiology, Boston Children's Hospital, Boston, MA, USA.
Abstract
To deploy an AI model to measure limb-length discrepancy (LLD) and prospectively validate its performance. We encoded the inference of an LLD AI model into a Docker container, incorporated it into a computational platform for clinical deployment, and conducted two prospective validation studies: a shadow trial (07/2024-09/2024) and a clinical trial (11/2024-01/2025). During each trial period, we queried for LLD EOS scanograms to serve as inputs to our model. In the shadow trial, the AI-annotated outputs were hidden from the radiologists; in the clinical trial, they were displayed to the radiologists at the time of study interpretation. Afterward, we collected the bilateral femoral and tibial lengths from the radiology reports and compared them against those generated by the AI model. We summarized model performance with the median absolute difference (MAD) and interquartile range (IQR). The shadow trial consisted of 84 EOS scanograms from 84 children, with 168 femoral and 168 tibial lengths. The MAD (IQR) between reported and AI-generated lengths was 0.2 cm (0.3 cm) for the femur and 0.2 cm (0.3 cm) for the tibia. The clinical trial consisted of 114 EOS scanograms from 114 children, with 228 femoral and 228 tibial lengths. The MAD (IQR) was 0.3 cm (0.4 cm) for the femur and 0.2 cm (0.3 cm) for the tibia. We successfully used a computational platform to integrate and deploy an LLD AI model in our clinical workflow, and prospectively validated its performance.
Question No AI models have been clinically deployed for limb-length discrepancy (LLD) assessment in children, and their performance under prospective validation is unknown.
Findings We deployed an LLD AI model using a homegrown platform, with prospective trials showing a median absolute difference of 0.2-0.3 cm in estimating bone lengths.
Clinical relevance An LLD AI model with performance comparable to that of radiologists can serve as a secondary reader to increase the confidence and accuracy of LLD measurements.
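
The summary statistics named in the abstract can be illustrated with a minimal sketch. It assumes the MAD is the median of the absolute differences between report and AI lengths, and that the IQR is the width (Q3 minus Q1) of those same absolute differences; the abstract does not state the quartile method, so the "inclusive" method used here is an assumption. The function name `mad_and_iqr` and all length values are hypothetical and are not data from the study.

```python
from statistics import median, quantiles


def mad_and_iqr(report_cm, ai_cm):
    """Return (median absolute difference, IQR width) for paired lengths in cm.

    Assumption: the IQR is taken over the absolute differences and reported
    as Q3 - Q1. The quartile method ("inclusive") is also an assumption.
    """
    if len(report_cm) != len(ai_cm):
        raise ValueError("report and AI measurements must be paired")
    # Absolute difference for each paired measurement.
    abs_diff = [abs(r - a) for r, a in zip(report_cm, ai_cm)]
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = quantiles(abs_diff, n=4, method="inclusive")
    return median(abs_diff), q3 - q1


# Hypothetical femoral lengths (cm), for illustration only.
report = [40.0, 41.0, 42.0, 43.0, 44.0]
ai = [40.1, 41.3, 41.8, 43.0, 44.5]

mad, iqr = mad_and_iqr(report, ai)
print(f"MAD = {mad:.1f} cm, IQR = {iqr:.1f} cm")
```

The median and IQR are less sensitive than the mean and standard deviation to a few large disagreements, which is one plausible reason to prefer them for measurement differences of this kind.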