
Proof-of-concept comparison of an artificial intelligence-based bone age assessment tool with Greulich-Pyle and Tanner-Whitehouse version 2 methods in a pediatric cohort.

Authors

Marinelli L, Lo Mastro A, Grassi F, Berritto D, Russo A, Patanè V, Festa A, Grassi E, Grandone A, Nasto LA, Pola E, Reginelli A

Affiliations (3)

  • Department of Precision Medicine, University of Campania, Luigi Vanvitelli, Piazza Miraglia 2, 80138, Naples, Italy.
  • Department of Precision Medicine, University of Campania, Luigi Vanvitelli, Piazza Miraglia 2, 80138, Naples, Italy.
  • University of Foggia, Foggia, Italy.

Abstract

Bone age assessment is essential in evaluating pediatric growth disorders. Artificial intelligence (AI) systems offer potential improvements in accuracy and reproducibility over traditional methods. This study aimed to compare the performance of a commercially available AI-based software tool (BoneView BoneAge, Gleamer, Paris, France) with two human-assessed methods, the Greulich-Pyle (GP) atlas and the Tanner-Whitehouse version 2 (TW2) method, in a pediatric population.

This proof-of-concept study included 203 pediatric patients (mean age, 9.0 years; range, 2.0-17.0 years) who underwent hand and wrist radiography for suspected endocrine or growth-related conditions. After technically inadequate images were excluded, 157 cases were analyzed with both the AI software and GP assessment. A subset of 35 patients was also evaluated with the TW2 method by a pediatric endocrinologist. Performance was measured using mean absolute error (MAE), root mean square error (RMSE), bias, and Pearson's correlation coefficient, with chronological age as the reference.

The AI model achieved an MAE of 1.38 years, comparable to the radiologist's GP-based estimate (MAE, 1.30 years) and better than TW2 (MAE, 2.86 years). RMSE values were 1.75, 1.80, and 3.88 years for AI, GP, and TW2, respectively. AI showed minimal bias (-0.05 years), whereas TW2-based assessments systematically underestimated bone age relative to chronological age (bias, -2.63 years). Strong correlations with chronological age were observed for AI (r = 0.857) and GP (r = 0.894), but not for TW2 (r = 0.490).

BoneView demonstrated accuracy comparable to the radiologist-assessed GP method and outperformed TW2 assessments in this cohort. AI-based systems may enhance consistency in pediatric bone age estimation but require careful validation, especially in ethnically diverse populations.
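The four agreement statistics in the abstract can be illustrated with a short computation against chronological age. This is a minimal sketch, not the authors' analysis code: the function name `agreement_metrics`, the bias sign convention (estimate minus chronological age, so a negative value means the method assigns a younger bone age), and any sample ages passed to it are assumptions for illustration, not study data.

```python
import math
from typing import Sequence


def agreement_metrics(estimated: Sequence[float], reference: Sequence[float]) -> dict:
    """Hypothetical helper: compare bone age estimates with reference ages (both in years).

    Assumed convention: bias = mean(estimate - reference), so negative bias
    indicates systematic underestimation relative to the reference.
    """
    if len(estimated) != len(reference) or len(estimated) < 2:
        raise ValueError("need two equal-length sequences with at least 2 cases")
    n = len(estimated)

    # Per-case signed errors in years
    errors = [e - r for e, r in zip(estimated, reference)]
    mae = sum(abs(d) for d in errors) / n             # mean absolute error
    rmse = math.sqrt(sum(d * d for d in errors) / n)  # root mean square error
    bias = sum(errors) / n                            # mean signed error

    # Pearson's r between the estimates and the reference ages
    mean_e = sum(estimated) / n
    mean_r = sum(reference) / n
    cov = sum((e - mean_e) * (r - mean_r) for e, r in zip(estimated, reference))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_e == 0 or var_r == 0:
        raise ValueError("Pearson's r is undefined when either input has zero variance")
    pearson_r = cov / math.sqrt(var_e * var_r)

    return {"mae": mae, "rmse": rmse, "bias": bias, "pearson_r": pearson_r}
```

For example, `agreement_metrics([5.5, 8.0, 10.5, 11.0], [6.0, 8.5, 10.0, 12.0])` uses made-up ages and yields an MAE of 0.625 years and a bias of -0.375 years. Because RMSE squares each error before averaging, it weights large misses more heavily than MAE does. This is how one method can have the lower MAE (GP, 1.30 vs 1.38 years) yet the higher RMSE (GP, 1.80 vs 1.75 years). Bias, unlike MAE, lets over- and underestimates cancel, which is why a method can show near-zero bias and still have a sizeable MAE.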

Topics

Journal Article
