Artificial intelligence for predicting the pubertal growth spurt using cephalometric and hand-wrist radiographs: a systematic review and meta-analysis.
Authors
Affiliations (2)
- Hospital de Clínicas de Porto Alegre, Federal University of Rio Grande do Sul (UFRGS), Rua Ramiro Barcelos, 2350, Porto Alegre, RS, 90035-903, Brazil.
- Hospital de Clínicas de Porto Alegre, Federal University of Rio Grande do Sul (UFRGS), Rua Ramiro Barcelos, 2350, Porto Alegre, RS, 90035-903, Brazil.
Abstract
The optimal timing of dentofacial orthopedic and growth-modifying orthodontic interventions depends on accurate assessment of skeletal maturation, particularly the pubertal growth spurt (PGS). This systematic review and meta-analysis evaluated the performance of artificial intelligence (AI) models for identifying the PGS from cephalometric and hand-wrist radiographs and compared their diagnostic performance across imaging modalities and algorithm types. A PRISMA 2020-guided systematic review and meta-analysis was conducted and prospectively registered in PROSPERO (CRD42024594040). PubMed, Embase, Web of Science, and LILACS were searched through December 2025. Eligible studies included individuals aged ≤ 21 years in whom AI was applied to cephalometric and/or hand-wrist radiographs to predict the PGS or classify validated skeletal maturation stages associated with it. Two reviewers independently screened studies, extracted data, and assessed methodological quality using QUADAS-AI. Confusion matrices were extracted or reconstructed, skeletal maturation stages were harmonized into a three-class scheme (pre-spurt, spurt, and post-spurt), and the best-performing model from each study was synthesized using random-effects, bivariate, and hierarchical summary receiver operating characteristic (HSROC) models. Twenty-one studies published between 2020 and 2025 were included in the quantitative synthesis. The pooled accuracy was 0.83 (95% confidence interval, 0.76-0.89), with substantial between-study heterogeneity. HSROC analysis including 20 studies showed good overall discriminative ability, with summary sensitivity and specificity estimates of 0.76 and 0.88, respectively; complementary bivariate pooling yielded estimates of 0.86 and 0.93, respectively. In subgroup analyses, hand-wrist-based models showed higher sensitivity, whereas cephalometric models showed higher specificity; however, substantial heterogeneity was observed. 
Deep learning and multimodal approaches generally showed more favorable performance than traditional machine learning models, whereas large language model-based approaches showed more uncertain performance. AI models may have potential for identifying the PGS from cephalometric and hand-wrist radiographs and may support the timing of growth-modifying orthodontic interventions. However, substantial heterogeneity, limited external validation, and methodological weaknesses reduce the certainty and generalizability of current evidence. Multicenter studies are needed before routine clinical implementation.
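As an illustration of the synthesis steps described in the abstract, the sketch below shows how a three-class confusion matrix (pre-spurt, spurt, post-spurt) could be collapsed to one-vs-rest sensitivity and specificity, and how per-study accuracies could be pooled with a random-effects model. It is not the authors' code. The counts are hypothetical, the function names are invented for this example, and the pooling uses a DerSimonian-Laird estimator on the logit scale as a simple stand-in; the bivariate and HSROC models used in the review are not reproduced here.

```python
import math

def one_vs_rest(cm, k):
    """Collapse a square confusion matrix to (tp, fn, fp, tn) for class k.

    cm[i][j] is the number of cases with true class i predicted as class j.
    """
    n = len(cm)
    tp = cm[k][k]
    fn = sum(cm[k][j] for j in range(n)) - tp   # true class k, predicted otherwise
    fp = sum(cm[i][k] for i in range(n)) - tp   # other classes predicted as k
    total = sum(sum(row) for row in cm)
    tn = total - tp - fn - fp
    return tp, fn, fp, tn

def sens_spec(cm, k):
    """One-vs-rest sensitivity and specificity for class k."""
    tp, fn, fp, tn = one_vs_rest(cm, k)
    return tp / (tp + fn), tn / (tn + fp)

def pool_logit_dl(events, totals, z=1.96):
    """DerSimonian-Laird random-effects pooling of proportions (logit scale).

    Assumes 0 < events[i] < totals[i] for every study, so no continuity
    correction is applied. Returns (pooled, ci_low, ci_high) as proportions.
    """
    y = [math.log(e / (n - e)) for e, n in zip(events, totals)]
    v = [1.0 / e + 1.0 / (n - e) for e, n in zip(events, totals)]
    w = [1.0 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)      # between-study variance
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    expit = lambda x: 1.0 / (1.0 + math.exp(-x))
    return expit(mu), expit(mu - z * se), expit(mu + z * se)

# Hypothetical study: rows are true stage, columns are predicted stage,
# ordered pre-spurt, spurt, post-spurt.
cm = [[40, 8, 2],
      [6, 30, 4],
      [1, 5, 44]]
spurt_sens, spurt_spec = sens_spec(cm, 1)

# Hypothetical per-study counts of correct classifications and sample sizes.
pooled, ci_low, ci_high = pool_logit_dl([80, 150, 60], [100, 170, 90])
```

Treating the spurt stage as the positive class is one reasonable reading of a three-class harmonization; a full diagnostic accuracy synthesis would model sensitivity and specificity jointly rather than pooling accuracy alone.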