
Landmark-based deep learning for radiographic screening for developmental dysplasia of the hip in infants: Development and external evaluation with IHDI-guided triage.

February 18, 2026

Authors

Oba M, Kawabe Y, Tsuzawa K, Yokoyama Y, Sumi K, Nakamura N, Choe H, Inaba Y

Affiliations (3)

  • Department of Pediatric Orthopedics, Kanagawa Children's Medical Center, Yokohama, Japan. Electronic address: [email protected].
  • Department of Pediatric Orthopedics, Kanagawa Children's Medical Center, Yokohama, Japan.
  • Department of Orthopaedic Surgery, Yokohama City University, Japan.

Abstract

In Japan, secondary screening for developmental hip dysplasia has expanded. However, this expansion has outpaced the availability of ultrasonography and the number of clinicians able to perform and interpret examinations outside tertiary centers. Plain radiography is widely accessible, but interpreting images in infants can be challenging. This study developed and validated a deep learning-based system to support radiographic diagnosis and tested a prespecified two-step triage strategy for clinical use. Overall, 1188 anteroposterior pelvic radiographs of infants aged 2-12 months were retrospectively analyzed. Three non-overlapping test subsets (50 images each) represented routine screening, images without a visible femoral-head ossification center, and images from external hospitals; the remainder were used for training and internal validation. The system generates measurements and International Hip Dysplasia Institute (IHDI) grades for each radiograph. All test images were independently graded by two pediatric orthopedic surgeons, and their consensus served as the categorical reference. Agreement was summarized using the intraclass correlation coefficient for measurements and quadratic-weighted kappa for grades. The triage strategy was as follows: (1) no further imaging or referral when both hips were grade 1, and (2) high-priority alert when either hip was grade ≥2, the acetabular angle was at least 25°, or both. Agreement for the principal measurement between the system and each reader was 0.83-0.84 by intraclass correlation, comparable to inter-reader agreement (0.81), with small biases and acceptable limits of agreement. For grades, quadratic-weighted kappa was 0.63-0.75 across subsets, with disagreements mainly between adjacent categories. With the 25° cutoff, the triage strategy achieved sensitivities of 0.75-0.93 and specificities of 0.62-0.95 across subsets.
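For readers unfamiliar with the grade-agreement statistic, the following is a minimal sketch of quadratic-weighted kappa for ordinal grades, written in plain Python. It is not the authors' analysis code; the function name `quadratic_weighted_kappa`, the parameter `n_categories`, and the assumption that grades are integers from 1 to 4 are illustrative choices made here.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, n_categories=4):
    """Quadratic-weighted kappa for two equal-length lists of ordinal grades.

    Grades are assumed (for this sketch) to be integers in 1..n_categories.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("rater lists must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("need at least two categories")
    for grade in list(rater_a) + list(rater_b):
        if not 1 <= grade <= n_categories:
            raise ValueError(f"grade out of range: {grade}")

    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts per grade pair
    marginal_a = Counter(rater_a)              # per-grade counts, rater A
    marginal_b = Counter(rater_b)              # per-grade counts, rater B
    max_sq_distance = (n_categories - 1) ** 2

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(1, n_categories + 1):
        for j in range(1, n_categories + 1):
            # Disagreement weight grows with the squared grade distance.
            weight = (i - j) ** 2 / max_sq_distance
            weighted_observed += weight * observed[(i, j)] / n
            weighted_expected += weight * marginal_a[i] * marginal_b[j] / n ** 2

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both raters use one identical grade")
    return 1.0 - weighted_observed / weighted_expected
```

Because the weights are squared distances, a disagreement between adjacent grades is penalized far less than one spanning several grades, which is why disagreements concentrated in adjacent categories can still yield moderate-to-good kappa values.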
The system supported radiographic screening decisions across diverse images typical of this age range, achieving agreement with clinicians comparable to that between clinicians. A prospective multicenter evaluation with thresholds adjusted for age and location is required.
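The two-step triage rule described in the abstract can be sketched as a small decision function. This is an illustrative reading, not the authors' implementation: the names `HipReading` and `triage`, the output labels, and the per-hip angle field are hypothetical. The abstract does not say how a case with both hips at grade 1 but an angle at or above the cutoff is handled; the sketch assumes the alert takes precedence.

```python
from dataclasses import dataclass

ANGLE_CUTOFF_DEG = 25.0  # cutoff reported in the abstract


@dataclass
class HipReading:
    """Hypothetical per-hip output: IHDI grade (1-4) and acetabular angle."""
    ihdi_grade: int
    acetabular_angle_deg: float


def triage(left: HipReading, right: HipReading,
           cutoff_deg: float = ANGLE_CUTOFF_DEG) -> str:
    """Return a triage label for one radiograph (both hips)."""
    hips = (left, right)
    for hip in hips:
        if not 1 <= hip.ihdi_grade <= 4:
            raise ValueError(f"IHDI grade out of range: {hip.ihdi_grade}")

    any_high_grade = any(hip.ihdi_grade >= 2 for hip in hips)
    any_wide_angle = any(hip.acetabular_angle_deg >= cutoff_deg for hip in hips)

    # Assumption: the alert overrides the "both hips grade 1" rule
    # when the angle criterion is met.
    if any_high_grade or any_wide_angle:
        return "high_priority_alert"
    # Remaining case: both hips grade 1 and both angles below the cutoff.
    return "no_further_imaging"
```

For example, `triage(HipReading(1, 20.0), HipReading(1, 22.0))` returns `"no_further_imaging"`, whereas a grade 2 hip or an angle of 25° or more on either side returns `"high_priority_alert"`. Passing `cutoff_deg` explicitly leaves room for the age- and location-adjusted thresholds that the abstract calls for.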

Topics

Journal Article
