Leveraging vision transformer for histological grade prediction in laryngeal and hypopharyngeal squamous cell carcinoma: a large-scale multicenter study.

December 27, 2025

Authors

Guo R, Qu X, Tian S, Li Z, Wang X, Sun Z, Xin R, Xian J

Affiliations (5)

  • Department of Radiology, Beijing Tongren Hospital, Capital Medical University, Beijing, China.
  • Department of Radiology, Beijing Luhe Hospital, Capital Medical University, Beijing, China.
  • Philips Healthcare, Beijing, China.
  • Department of Radiology, Linyi People's Hospital, Linyi, China.
  • Department of Radiology, Beijing Tongren Hospital, Capital Medical University, Beijing, China.

Abstract

Pretreatment determination of histological differentiation grade is critical for prognostic evaluation in laryngeal and hypopharyngeal squamous cell carcinoma (LHSCC). This study aimed to develop a contrast-enhanced CT (CECT)-based Vision Transformer (ViT) model for noninvasive evaluation of histological grade in LHSCC. In this retrospective multicenter study, 1,648 LHSCC patients who underwent CECT were enrolled from three hospitals. Patients from one hospital formed the training cohort (n = 1,239) and the internal validation cohort (n = 310); patients from the other two hospitals formed the external validation cohort (n = 99). The diagnostic model combines a pre-trained ViT for CECT feature extraction with an XGBoost classifier for prediction. Predictive performance was evaluated using the area under the receiver operating characteristic curve (AUC), decision curve analysis (DCA), and calibration curves. The ViT model achieved AUCs of 0.887 (95% CI: 0.848-0.927) in the internal validation cohort and 0.796 (95% CI: 0.693-0.899) in the external validation cohort, significantly outperforming the conventional radiomics model (AUCs: 0.775, 95% CI: 0.714-0.837 and 0.544, 95% CI: 0.388-0.699; p < 0.001 and 0.002, respectively). DCA demonstrated superior clinical utility, and calibration curves showed excellent prediction reliability. Gradient-weighted Class Activation Mapping (Grad-CAM) visualization identified the CT image regions most influential for the model's predictions, providing interpretability for clinical decision-making. The CECT-based ViT deep learning model developed in this study demonstrated excellent predictive performance for histological grading of LHSCC and shows promise for patient prognosis assessment.
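
For readers unfamiliar with how an AUC and its 95% confidence interval can be obtained from model outputs, the following is a minimal sketch in plain Python. It is not the authors' code: the abstract does not state how the intervals were computed, so the percentile bootstrap used here is an assumption, and the function names (`auc_score`, `bootstrap_auc_ci`) and the toy labels and scores are hypothetical.

```python
import random


def auc_score(labels, scores):
    """AUC as the probability that a random positive case receives a higher
    score than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        # Resample cases with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined when a resample contains one class only
        aucs.append(auc_score(ys, [scores[i] for i in idx]))
    aucs.sort()
    lo = aucs[int((alpha / 2) * n_boot)]
    hi = aucs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


# Toy data (hypothetical): 1 = one grade group, 0 = the other;
# scores stand in for a classifier's predicted probabilities.
labels = [0, 0, 1, 1, 0, 1, 1, 0, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.5, 0.7, 0.3]

auc = auc_score(labels, scores)
lo, hi = bootstrap_auc_ci(labels, scores, n_boot=500)
print(f"AUC = {auc:.3f} (95% CI: {lo:.3f}-{hi:.3f})")
```

The pairwise comparison is quadratic in the number of cases, which is acceptable for cohorts of this size; a rank-based formulation would be preferable for much larger datasets.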

Topics

Journal Article
