
Multimodal deep learning for laryngeal squamous cell carcinoma staging using CT and laryngoscopy.

January 30, 2026

Authors

Liu R, Zhou Y, Wang R, Chen X, Yang Y, Jiang H, Xie K, Ning Y, Deng Y, Yu Q, Xu L, Hu G, Peng J

Affiliations (5)

  • Department of Radiology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China.
  • Department of Otorhinolaryngology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China.
  • School of Intelligent Medicine, Chengdu University of TCM, Chengdu, China. [email protected].
  • Department of Otorhinolaryngology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China. [email protected].
  • Department of Radiology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China. [email protected].

Abstract

To develop and validate a multimodal deep learning model integrating clinical data, contrast-enhanced CT, and laryngoscopic images for differentiating early-stage (I-II) from advanced-stage (III-IV) laryngeal squamous cell carcinoma (LSCC).

This retrospective multicenter study included 450 patients with pathologically confirmed LSCC from two Chinese medical centers. All patients had contrast-enhanced CT, white-light laryngoscopy, and clinical records. They were divided into training (n = 235), internal validation (n = 101), and external validation (n = 114) cohorts. Three single-modality models (CT-based deep learning [CT-DL], laryngoscopy-based multiple instance learning [L-MIL], and a clinical logistic regression model [CL]) and their combinations were compared. A feature-level fusion strategy was applied, and the final integrated multimodal model (CL + CT + L) was built using a stochastic gradient descent (SGD) classifier. Performance was evaluated by AUC, accuracy, sensitivity, specificity, calibration, and decision curve analysis (DCA), with prognostic value assessed by Kaplan-Meier analysis and the concordance index (C-index).

A total of 450 patients were included (median age, 62 years [range, 31-88]; 365 men). The integrated multimodal model achieved AUCs of 0.902 (0.833-0.954) in the internal cohort and 0.888 (0.826-0.944) in the external cohort, outperforming all single- and dual-modality models (p < 0.05). Calibration and DCA confirmed strong consistency and clinical utility. The model categorized patients into distinct risk groups with notable differences in progression-free survival (C-index = 0.584, p = 0.036).

The integrated multimodal model showed high accuracy and generalizability for preoperative LSCC staging and may aid individualized treatment planning.

Question: Can a multimodal deep learning model combining clinical, CT, and laryngoscopic data improve preoperative staging accuracy of LSCC?

Findings: The integrated multimodal model achieved higher diagnostic accuracy and provided reliable prognostic stratification compared with conventional approaches.

Clinical relevance: This multimodal model offers a non-invasive, accurate, and generalizable tool for LSCC staging, supporting individualized treatment planning and enhancing patient management.

Topics

Journal Article
