Assessing the Performance and Reliability of Deep Learning Auto-Segmentation in Videofluoroscopic Swallowing Studies: A Systematic Review and Meta-Analysis.

March 27, 2026

Authors

Chuang WK, Lin BF, Lee YH, Su PH, Kao YS, Lu CF

Affiliations (6)

  • Department of Radiation Oncology, Shuang Ho Hospital, Taipei Medical University, New Taipei City 235, Taiwan; Department of Biomedical Imaging and Radiological Sciences, National Yang Ming Chiao Tung University, Taipei 112, Taiwan; Department of Radiation Oncology, Saint Paul's Hospital, Taoyuan 330, Taiwan.
  • Department of Biomedical Imaging and Radiological Science, China Medical University, Taichung 404, Taiwan.
  • Department of Physical Medicine and Rehabilitation, Taipei Medical University-Hsin Kuo Min Hospital, Taoyuan City 320, Taiwan; Department of Physical Medicine and Rehabilitation, School of Medicine, College of Medicine, Taipei Medical University 110, Taipei City, Taiwan; Graduate Institute of Sports Science, College of Exercise and Health Sciences, National Taiwan Sport University, Taoyuan City, Taiwan.
  • Department of Biomedical Imaging and Radiological Sciences, National Yang Ming Chiao Tung University, Taipei 112, Taiwan.
  • Department of Radiation Oncology, Taoyuan General Hospital, Ministry of Health and Welfare, Taoyuan 330, Taiwan. Electronic address: [email protected].
  • Department of Biomedical Imaging and Radiological Sciences, National Yang Ming Chiao Tung University, Taipei 112, Taiwan. Electronic address: [email protected].

Abstract

This systematic review and meta-analysis evaluated the accuracy and reliability of deep learning-based auto-segmentation methods in videofluoroscopic swallowing studies (VFSS). A comprehensive literature search was conducted across the PubMed, IEEE Xplore, Embase, Web of Science, and Cochrane Library databases for studies published in English between 2013 and 2024. Studies were included if they applied deep learning techniques to the auto-segmentation of anatomical structures in VFSS, specifically the bolus, cervical spine, hyoid bone, or thyroid cartilage-vocal fold complex (TVC), and reported quantitative performance metrics such as the Dice similarity coefficient. Two independent reviewers extracted data on study characteristics, segmentation targets, deep learning model types, and performance metrics. Methodological quality was assessed using the CLAIM and QUADAS-2 tools. Ten studies met the inclusion criteria. A random-effects meta-analysis yielded an overall pooled Dice score of 0.83 (95% CI: 0.76-0.88, I² = 77%). Subgroup analyses showed similar performance for bolus segmentation (pooled Dice score = 0.84; 95% CI: 0.70-0.92, I² = 74%) and cervical spine segmentation (pooled Dice score = 0.83; 95% CI: 0.69-0.91, I² = 87%). Although pooled accuracy was high, heterogeneity across studies was substantial. Deep learning-based auto-segmentation in VFSS demonstrates promising accuracy across different anatomical targets. However, methodological variability among studies underscores the need for standardized protocols, multi-center datasets, and comparative evaluations of model architectures to enhance generalizability and clinical utility. PROSPERO registration: CRD42024578117.
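
For readers unfamiliar with the Dice similarity coefficient reported above, the following is a minimal sketch of how it is commonly computed for a pair of binary segmentation masks, using the standard definition Dice = 2|A ∩ B| / (|A| + |B|). It is not taken from any of the reviewed studies; the function name `dice_coefficient`, the nested-list mask representation, the toy masks, and the choice to return 1.0 when both masks are empty are all assumptions made for illustration.

```python
def dice_coefficient(pred, truth):
    """Dice similarity between two binary masks given as nested lists of 0/1.

    Hypothetical helper for illustration only; real pipelines typically
    operate on array types rather than nested lists.
    """
    # Both masks must cover the same pixel grid.
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")

    intersection = 0  # pixels labeled foreground in both masks
    pred_sum = 0      # foreground pixels in the predicted mask
    truth_sum = 0     # foreground pixels in the reference mask
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p:
                pred_sum += 1
            if t:
                truth_sum += 1
            if p and t:
                intersection += 1

    total = pred_sum + truth_sum
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


# Toy 2x3 masks (made-up values, not data from the review).
predicted = [[0, 1, 1],
             [0, 1, 0]]
reference = [[0, 1, 0],
             [0, 1, 1]]
score = dice_coefficient(predicted, reference)
```

In this toy case the masks share 2 foreground pixels and each contains 3, so the score is 4/6. A value of 1 indicates complete overlap and 0 indicates none.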

Topics

Journal Article, Review
