Comparison of neural networks for classification of urinary tract dilation from renal ultrasounds: evaluation of agreement with expert categorization.

Authors

Chung K, Wu S, Jeanne C, Tsai A

Affiliations (3)

  • Department of Radiology, Boston Children's Hospital, 300 Longwood Ave, Boston, MA, 02115, USA.
  • Department of Radiology, Boston Children's Hospital, 300 Longwood Ave, Boston, MA, 02115, USA.
  • Department of Radiology, Boston Children's Hospital, 300 Longwood Ave, Boston, MA, 02115, USA.

Abstract

Urinary tract dilation (UTD) is a frequent problem in infants. Automated and objective classification of UTD from renal ultrasounds would streamline their interpretation. Our objective was to develop and evaluate the performance of different deep learning models in predicting UTD classifications from renal ultrasound images.

We searched our image archive to identify renal ultrasounds performed in infants ≤ 3 months old for the clinical indications of prenatal UTD and urinary tract infection (September 2023 to August 2024). An expert pediatric uroradiologist provided the ground-truth UTD labels for representative sagittal sonographic renal images. Three different deep learning models were adapted to the task, trained with cross-entropy loss, and evaluated in four-fold cross-validation experiments to determine overall performance.

Our curated database included 492 right and 487 left renal ultrasounds (mean age ± standard deviation, 1.2 ± 0.1 months for both cohorts; 341 boys/151 girls and 339 boys/148 girls, respectively). The model prediction accuracies for the right and left kidneys were 88.7% (95% confidence interval [CI], [85.8%, 91.5%]) and 80.5% (95% CI, [77.6%, 82.9%]), with weighted kappa scores of 0.90 (95% CI, [0.88, 0.91]) and 0.87 (95% CI, [0.82, 0.92]), respectively. When predictions were binarized into mild (normal/UTD P1) and severe (UTD P2/P3) dilation, accuracies for the right and left kidneys increased to 96.3% (95% CI, [94.9%, 97.8%]) and 91.3% (95% CI, [88.5%, 94.2%]), but agreement decreased to kappa scores of 0.78 (95% CI, [0.73, 0.82]) and 0.75 (95% CI, [0.68, 0.82]), respectively.

Deep learning models demonstrated high accuracy and agreement in classifying UTD from infant renal ultrasounds, supporting their potential as decision-support tools in clinical workflows.
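The abstract reports accuracy and weighted kappa on four ordinal UTD categories, then again after collapsing them into mild and severe groups. The following is a minimal sketch of how such metrics can be computed, not the authors' evaluation code. It assumes labels are encoded 0 to 3 as normal < P1 < P2 < P3 and uses quadratic weights by default, since the abstract does not state whether linear or quadratic weighting was used. The function names (`accuracy`, `weighted_kappa`, `binarize`) and the example labels are hypothetical, and confidence intervals are not computed.

```python
"""Sketch: accuracy and Cohen's weighted kappa for ordinal UTD labels.

Assumptions (not from the study): labels are encoded 0-3 as
normal < P1 < P2 < P3, quadratic weights are the default, and the
example labels below are hypothetical.
"""

# Assumed ordinal encoding of postnatal UTD categories.
CLASSES = ["normal", "P1", "P2", "P3"]


def accuracy(truth, pred):
    """Fraction of cases where the predicted label equals the reference label."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)


def weighted_kappa(truth, pred, n_classes, weights="quadratic"):
    """Cohen's weighted kappa for integer labels in 0..n_classes-1.

    weights="linear" uses |i - j| / (k - 1); "quadratic" squares that distance.
    Undefined (division by zero) if all ratings fall in a single category.
    """
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    n = len(truth)
    # Observed confusion matrix: rows = reference label, columns = prediction.
    obs = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(truth, pred):
        obs[t][p] += 1
    row_totals = [sum(r) for r in obs]
    col_totals = [sum(obs[i][j] for i in range(n_classes)) for j in range(n_classes)]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            d = abs(i - j) / (n_classes - 1)
            w = d ** 2 if weights == "quadratic" else d
            observed_disagreement += w * obs[i][j]
            # Expected count if reference and prediction were independent.
            expected_disagreement += w * row_totals[i] * col_totals[j] / n
    return 1.0 - observed_disagreement / expected_disagreement


def binarize(label):
    """Collapse normal/P1 (0, 1) to 0 (mild) and P2/P3 (2, 3) to 1 (severe)."""
    return int(label >= 2)


if __name__ == "__main__":
    # Hypothetical reference and predicted labels (not study data).
    truth = [0, 0, 1, 1, 2, 2, 3, 3]
    pred = [0, 1, 1, 1, 2, 3, 3, 2]

    print(f"4-class accuracy: {accuracy(truth, pred):.3f}")
    print(f"4-class quadratic kappa: {weighted_kappa(truth, pred, 4):.3f}")

    tb = [binarize(t) for t in truth]
    pb = [binarize(p) for p in pred]
    print(f"binary accuracy: {accuracy(tb, pb):.3f}")
    print(f"binary kappa: {weighted_kappa(tb, pb, 2):.3f}")
```

With two classes, linear and quadratic weights both reduce to 0 for agreement and 1 for disagreement, so the binarized "weighted" kappa equals Cohen's unweighted kappa. Because the four-class weighted kappa gives partial credit for one-step near misses while the binary kappa counts every disagreement in full, and because kappa corrects for chance agreement that depends on category prevalence, collapsing categories can raise accuracy without raising kappa.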

Topics

Journal Article
