Benchmarking Deep Learning Models for Laryngeal Cancer Staging Using the LaryngealCT Dataset
Authors
Abstract
Laryngeal cancer imaging research lacks standardised datasets to enable reproducible deep learning (DL) model development. We present LaryngealCT, a curated benchmark of 1,029 computed tomography (CT) scans aggregated from six collections from The Cancer Imaging Archive (TCIA). Uniform 1 mm isotropic volumes of interest encompassing the larynx were extracted using a weakly supervised parameter search framework validated by clinical experts. 3D DL architectures (3D CNN, ResNet18,50,101, DenseNet121) were benchmarked on (i) early (Tis,T1,T2) vs. advanced (T3,T4) and (ii) T4 vs. non-T4 classification tasks. 3D CNN (AUC-0.881, F1-macro-0.821) and ResNet18 (AUC-0.892, F1-macro-0.646) respectively outperformed the other models in the two tasks. Model explainability assessed using 3D GradCAMs with thyroid cartilage overlays revealed greater peri-cartilage attention in non-T4 cases and focal activations in T4 predictions. Through open-source data, pretrained models, and integrated explainability tools, LaryngealCT offers a reproducible foundation for AI-driven research to support clinical decisions in laryngeal oncology.