Multicenter validation and randomized crossover reader evaluation of deep learning-assisted tri-sequence three-dimensional MRI segmentation for hypopharyngeal tumor.
Affiliations (12)
- Division of Radiation Oncology, Department of Oncology, National Taiwan University Hospital, Taipei, Taiwan.
- Department of Mathematics and Institute of Applied Mathematical Sciences, National Taiwan University, Taipei, Taiwan.
- Department of Radiation Oncology, Chang Gung Memorial Hospital, Linkou Branch, Taiwan.
- Department of Radiation Oncology, Taipei Medical University Hospital, Taipei, Taiwan.
- Division of Radiation Oncology, Department of Oncology, National Taiwan University BioMedical Park Hospital Zhubei Campus, Hsinchu, Taiwan.
- Division of Radiation Oncology, Department of Oncology, National Taiwan University BioMedical Park Hospital Zhubei Campus, Hsinchu, Taiwan; Graduate Program of Data Science, National Taiwan University, Taipei, Taiwan.
- Department of Medical Imaging, National Taiwan University Hospital, Taipei, Taiwan.
- Graduate Program of Data Science, National Taiwan University, Taipei, Taiwan.
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan.
- Department of Otolaryngology, National Taiwan University Hospital, Taipei, Taiwan.
- Division of Radiation Oncology, Department of Oncology, National Taiwan University Hospital, Taipei, Taiwan.
- Department of Mathematics and Institute of Applied Mathematical Sciences, National Taiwan University, Taipei, Taiwan.
Abstract
Accurate MRI-based target delineation for hypopharyngeal squamous cell carcinoma (HPSCC) is clinically important but expertise-dependent. We aimed to develop a multicenter-validated tri-sequence deep-learning model, to determine whether AI assistance narrows the expertise gap in contouring, and to explore quality-aware modeling of low-overlap risk to inform deployment support. This retrospective study included 727 HPSCC patients from three institutions. A tri-sequence 3D nnU-Net trained on the development cohort (n = 530) was evaluated in the internal test cohort (n = 37), external cohort 1 (n = 109), and external cohort 2 (n = 51) using the Dice similarity coefficient (DSC), surface DSC, average symmetric surface distance (ASSD), and mean surface distance (MSD). Clinical utility was assessed in a randomized double-crossover reader study on the 51 cases of external cohort 2, in which three junior and three senior radiation oncologists contoured with and without AI assistance; manual and AI-assisted contouring were compared by DSC, contouring time, Fleiss' κ, and 5-point Likert scores. For an exploratory deployment-support analysis, MRI-quality features and auto-segmentation-derived tumor volume were used to characterize domain shift and to perform XGBoost-based classification of low-overlap cases (DSC < 0.75). Tri-sequence mean DSC was 0.87 ± 0.11 in the internal test cohort and 0.85 ± 0.14 and 0.82 ± 0.16 in external cohorts 1 and 2, respectively. In the reader study, AI assistance increased mean DSC in juniors from 0.73 ± 0.16 to 0.86 ± 0.14 and in seniors from 0.79 ± 0.13 to 0.84 ± 0.15, reduced contouring time by 55%, and improved Fleiss' κ from 0.69 ± 0.12 to 0.86 ± 0.12 (all p < 0.01). The multivariable low-overlap risk model achieved an area under the receiver operating characteristic curve (AUC) of 0.89 internally and 0.71–0.78 externally. Deep-learning-assisted tri-sequence MRI segmentation enabled robust multicenter HPSCC delineation, improved contouring efficiency and consistency, and supported quality-aware analysis in radiotherapy planning.
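The abstract reports volumetric DSC, a low-overlap label defined as DSC < 0.75, and Fleiss' κ for inter-reader agreement. The sketch below shows one common way to compute these quantities from binary masks. It is not the authors' code: the function names, the flattened 0/1 list representation, the convention that two empty masks score DSC = 1, and the voxel-wise treatment of κ (each voxel treated as an item rated tumor/background by every reader) are all assumptions made for illustration. Surface DSC, ASSD, MSD, and the XGBoost model are not covered.

```python
from typing import Sequence

# Threshold taken from the abstract's low-overlap definition (DSC < 0.75).
LOW_OVERLAP_THRESHOLD = 0.75


def dice(pred: Sequence[int], ref: Sequence[int]) -> float:
    """Volumetric Dice similarity coefficient between two flattened binary masks."""
    if len(pred) != len(ref):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def is_low_overlap(pred: Sequence[int], ref: Sequence[int],
                   threshold: float = LOW_OVERLAP_THRESHOLD) -> bool:
    """Hypothetical helper: label a case as low-overlap when DSC < threshold."""
    return dice(pred, ref) < threshold


def fleiss_kappa(ratings: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for binary voxel labels.

    ratings[r][i] is reader r's label (0 = background, 1 = tumor) for voxel i.
    Treating voxels as rated items is an assumption of this sketch.
    """
    n_raters = len(ratings)
    if n_raters < 2:
        raise ValueError("Fleiss' kappa needs at least two raters")
    n_items = len(ratings[0])
    if n_items == 0 or any(len(r) != n_items for r in ratings):
        raise ValueError("all raters must label the same non-empty set of voxels")

    # Number of readers labeling each voxel as tumor.
    tumor_counts = [sum(rater[i] for rater in ratings) for i in range(n_items)]

    # Per-voxel observed agreement: P_i = (sum_j n_ij^2 - n) / (n (n - 1)).
    per_item = [
        (k * k + (n_raters - k) ** 2 - n_raters) / (n_raters * (n_raters - 1))
        for k in tumor_counts
    ]
    p_bar = sum(per_item) / n_items

    # Chance agreement from the pooled proportion of tumor labels.
    p_tumor = sum(tumor_counts) / (n_items * n_raters)
    p_e = p_tumor ** 2 + (1 - p_tumor) ** 2
    if p_e == 1.0:
        # Every label identical: kappa is undefined; this sketch returns 1.0.
        return 1.0
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    reference = [1, 1, 1, 0, 0, 0]
    prediction = [1, 1, 0, 0, 0, 0]
    print(dice(prediction, reference))            # 0.8
    print(is_low_overlap(prediction, reference))  # False
    print(fleiss_kappa([reference, reference, prediction]))
```

Per-case low-overlap labels produced this way could serve as the binary target for a classifier such as the XGBoost model described in the abstract; which features and hyperparameters the authors used is not stated here.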