Diagnostic accuracy of artificial intelligence-assisted infant hip ultrasound interpretation for developmental dysplasia of the hip: systematic review and meta-analysis.
Authors
Affiliations (3)
- Faculty of Medicine and Dentistry, University of Alberta, 2J2.00 Walter C. MacKenzie Health Sciences Centre, T6G 2R7, Edmonton, Canada.
- Department of Radiology & Diagnostic Imaging, University of Alberta, Edmonton, Canada.
- Faculty of Medicine and Dentistry, University of Alberta, 2J2.00 Walter C. MacKenzie Health Sciences Centre, T6G 2R7, Edmonton, Canada.
Abstract
Ultrasound is the standard imaging test for infant developmental dysplasia of the hip (DDH) but is highly operator-dependent, leading to variable image quality and classification. Artificial intelligence (AI)-assisted ultrasound may standardize acquisition and interpretation and support DDH screening beyond specialist centers. This study aimed to evaluate the diagnostic accuracy and feasibility of AI-assisted ultrasound for infant DDH. We performed a systematic review and diagnostic test-accuracy meta-analysis of studies enrolling infants (≤12 months) undergoing hip ultrasound, in which the index test was AI applied to two-dimensional (2D) or three-dimensional (3D) ultrasound and the reference standard was expert Graf-based interpretation or follow-up consensus. Risk of bias was assessed with the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool. Sensitivity and specificity were pooled with a bivariate random-effects model. Twenty-nine studies were eligible; nine provided 2×2 data (6,351 hips) for pooling. Pooled sensitivity was 0.92 (95% CI 0.86-0.95) and pooled specificity was 0.96 (95% CI 0.91-0.98). Risk of bias was frequently high or unclear in the patient selection and index test domains. Feasibility signals included short operator training times (approximately 1-2 h) and reductions in scan acquisition time (approximately 20-50%), while economic reporting was limited. AI-assisted ultrasound demonstrates high diagnostic accuracy for infant DDH and may help standardize hip imaging and facilitate safe use by nonexpert operators, but larger multicenter studies with external validation and robust economic evaluation are needed.
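To make the pooling step concrete, the following is a minimal sketch of how per-study 2×2 counts can be turned into pooled sensitivity and specificity with 95% confidence intervals. It is not the authors' analysis: the abstract reports a bivariate random-effects model, which estimates sensitivity and specificity jointly with their between-study correlation, whereas this sketch substitutes a simpler univariate DerSimonian-Laird random-effects pooling on the logit scale, applied separately to each measure. The function names and the three 2×2 tables are hypothetical and chosen only for illustration; they are not data from the included studies.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def inv_logit(x):
    """Map a logit value back to the probability scale."""
    return 1.0 / (1.0 + math.exp(-x))


def logit_proportion(events, total):
    """Logit of events/total and its approximate variance.

    A 0.5 continuity correction is added to both cells so that
    proportions of exactly 0 or 1 remain finite on the logit scale.
    """
    a = events + 0.5
    b = (total - events) + 0.5
    return math.log(a / b), 1.0 / a + 1.0 / b


def pool_random_effects(estimates, variances):
    """Univariate DerSimonian-Laird pooling.

    Returns (pooled estimate, standard error, between-study variance tau^2).
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    tau2 = 0.0
    if k > 1:
        # Cochran's Q and the method-of-moments estimate of tau^2.
        q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
        c = sw - sum(wi ** 2 for wi in w) / sw
        if c > 0:
            tau2 = max(0.0, (q - (k - 1)) / c)
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


def pooled_proportion(counts):
    """counts: list of (events, total) pairs, one per study.

    Returns (estimate, ci_low, ci_high) on the probability scale.
    """
    pairs = [logit_proportion(e, n) for e, n in counts]
    ys = [p[0] for p in pairs]
    vs = [p[1] for p in pairs]
    pooled, se, _ = pool_random_effects(ys, vs)
    return (
        inv_logit(pooled),
        inv_logit(pooled - Z_95 * se),
        inv_logit(pooled + Z_95 * se),
    )


# Hypothetical 2x2 tables, one per study: (TP, FN, FP, TN). Not real data.
studies = [(45, 5, 8, 192), (30, 2, 10, 258), (88, 12, 6, 394)]

# Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
sens = pooled_proportion([(tp, tp + fn) for tp, fn, fp, tn in studies])
spec = pooled_proportion([(tn, tn + fp) for tp, fn, fp, tn in studies])

print("pooled sensitivity: %.3f (95%% CI %.3f-%.3f)" % sens)
print("pooled specificity: %.3f (95%% CI %.3f-%.3f)" % spec)
```

Pooling on the logit scale keeps the back-transformed confidence limits inside the 0-1 range. A full bivariate analysis would typically be fitted with a dedicated mixed-model package rather than by hand, and its estimates can differ from those of two separate univariate poolings.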