Advancing modified barium swallow pre-sorting with deep learning: a new paradigm for the first step analysis in X-ray swallowing study.
Authors
Affiliations (5)
- Department of Head and Neck Surgery, University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA.
- Department of Radiation Oncology, Division of Radiation Oncology, University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA.
- Department of Imaging Physics, Division of Diagnostic Imaging, University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA.
- Department of Head and Neck Surgery, University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA. [email protected].
- Department of Radiation Oncology, Division of Radiation Oncology, University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA. [email protected].
Abstract
Modified barium swallow (MBS) exams are pivotal for assessing swallowing function. Each exam includes diagnostic video segments imaged in different planes, such as anteroposterior (AP, or coronal plane) and lateral (or mid-sagittal plane), alongside non-diagnostic 'scout' segments that are used for anatomic reference and image set-up and do not include bolus swallows. Because of this variation, imaging files must be sorted and labeled manually, which complicates the pre-analysis workflow. Our study introduces a deep learning approach that automates the categorization of swallow videos in MBS exams: it distinguishes between the types of diagnostic videos and identifies non-diagnostic scout videos to streamline the MBS review workflow. Our algorithms were developed on a dataset of 3,740 video segments with a total of 986,808 frames from 285 MBS exams in 216 patients (average age 60 ± 9 years). In differentiating AP from lateral planes, our model achieved an accuracy of 99.68% at the frame level and 100% at the video level. In distinguishing scout from bolus swallowing videos, the model reached an accuracy of 90.26% at the frame level and 93.86% at the video level. Incorporating a multi-task learning approach raised the video-level accuracy for scout/bolus differentiation to 96.35%. Our analysis highlighted the importance of leveraging inter-frame connectivity for improving model performance. These results indicate that automated pre-sorting can substantially improve MBS exam processing efficiency by minimizing manual sorting, allowing raters to devote more attention to clinical interpretation and patient care.
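The abstract reports accuracy at both the frame level and the video level but does not state how frame predictions were combined into a video label. As a minimal sketch, assuming a simple majority vote over per-frame predictions (an assumption made here for illustration, not a method confirmed by the abstract), the two metrics could be computed as follows. The clip names, labels, and helper functions `video_label` and `accuracy` are hypothetical toy values.

```python
from collections import Counter


def video_label(frame_labels):
    """Return the most frequent per-frame label (majority vote).

    Assumption: majority voting is used only for illustration; the
    abstract does not specify the aggregation rule.
    """
    return Counter(frame_labels).most_common(1)[0][0]


def accuracy(predicted, actual):
    """Fraction of positions where the prediction matches the truth."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical per-frame predictions for two toy video segments.
frame_preds = {
    "clip_a": ["lateral", "lateral", "AP", "lateral"],
    "clip_b": ["AP", "AP", "AP"],
}
# Hypothetical ground-truth plane for each segment.
truth = {"clip_a": "lateral", "clip_b": "AP"}

# Frame-level accuracy: every frame inherits its video's true label.
flat_preds = [p for vid in frame_preds for p in frame_preds[vid]]
flat_truth = [truth[vid] for vid in frame_preds for _ in frame_preds[vid]]
frame_acc = accuracy(flat_preds, flat_truth)

# Video-level accuracy: one aggregated prediction per video.
video_preds = [video_label(frame_preds[vid]) for vid in frame_preds]
video_acc = accuracy(video_preds, [truth[vid] for vid in frame_preds])

print(f"frame-level accuracy: {frame_acc:.4f}")
print(f"video-level accuracy: {video_acc:.4f}")
```

In this toy case one misclassified frame lowers the frame-level score while the vote still recovers the correct video label, which illustrates how video-level accuracy can exceed frame-level accuracy, as in the figures reported above.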