Text-guided automatic segmentation of clinical target volume in rectal cancer radiotherapy.

March 18, 2026

Authors

Peng H,Liang Y,Wei S,Liu Q,Chen X,Tang Y,Men K,Dai J

Affiliations (3)

  • National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Radiation Oncology, Cancer Hospital, Chinese Academy of Medical Sciences, Chaoyang District, Beijing, 100021, China.
  • Radiation Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Cancer Hospital, Chinese Academy of Medical Sciences, Chaoyang District, Beijing, 100021, China.
  • Department of Radiation Oncology, National Cancer Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Cancer Hospital, Chinese Academy of Medical Sciences, Chaoyang District, Beijing, 100021, China.

Abstract

Current automatic segmentation models in radiotherapy, which are predominantly unimodal and image-based, have limited generalizability due to boundary ambiguity and the lack of guideline integration. This study proposes a text-guided segmentation network, termed TG-SegNet, for the automatic delineation of clinical target volumes (CTVs) in rectal cancer radiotherapy. Data from 567 preoperative patients with rectal cancer were retrospectively collected. Text prompts contained (i) patient case information (age, sex, tumor stage, tumor location, position) and (ii) guideline-derived descriptions indicating which CTV subsites should be included. TG-SegNet integrates computed tomography (CT)-derived visual features with structured clinical text prompts encoded by PubMedBERT, fused via cross-attention and fine-grained fusion. The model was trained on 452 patients and tested on 115. Its performance was compared with that of nnU-Net and two ablated variants (TG-SegNet without text prompts and TG-SegNet with simplified fusion). The evaluation comprised quantitative metrics, including Dice similarity coefficient (DSC), 95% Hausdorff distance (HD95), mean surface distance (MSD), surface DSC (S-DSC), and average path length (APL), along with blinded expert scoring and an efficiency analysis. In additional analyses, we conducted text-prompt and module ablations. TG-SegNet achieved the best performance across all quantitative metrics: DSC 0.927±0.022, HD95 7.01±6.05 mm, MSD 1.94±1.08 mm, S-DSC 0.799±0.074, and APL 7372±4452 (all p<0.01). In clinical evaluation, TG-SegNet significantly improved target coverage, guideline adherence, and overall clinical acceptability compared with nnU-Net and the ablated variants (p<0.05), with boundary appropriateness comparable to nnU-Net. TG-SegNet had the shortest correction time (3.39±1.10 minutes), corresponding to 82.1% time savings versus manual delineation. Text-prompt ablations suggested that, of the two prompt components, the CTV-subsite component contributed more to performance. Module ablations showed that both cross-attention and fine-grained fusion were beneficial. By integrating clinical semantics with imaging, TG-SegNet demonstrated superior accuracy, efficiency, and clinical acceptability over nnU-Net and ablated models, highlighting its potential for clinical translation.
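
For readers unfamiliar with the headline metric, the following is a minimal sketch of how a Dice similarity coefficient can be computed between a predicted and a reference binary mask. It is an illustration only, not the authors' evaluation code: the function name dice_coefficient and the flat 0/1 mask representation are assumptions made here, and the convention of returning 1.0 for two empty masks is one common choice among several.

```python
def dice_coefficient(pred, ref):
    """Dice similarity coefficient for two binary masks.

    Both masks are flat sequences of 0/1 voxel labels of equal length
    (a hypothetical representation chosen for this sketch).
    DSC = 2 * |pred AND ref| / (|pred| + |ref|).
    """
    if len(pred) != len(ref):
        raise ValueError("masks must contain the same number of voxels")

    pred_count = sum(1 for p in pred if p)   # foreground voxels in prediction
    ref_count = sum(1 for r in ref if r)     # foreground voxels in reference
    overlap = sum(1 for p, r in zip(pred, ref) if p and r)

    if pred_count + ref_count == 0:
        # Assumed convention: two empty masks are treated as perfect agreement.
        return 1.0
    return 2.0 * overlap / (pred_count + ref_count)


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]
    reference = [1, 0, 1, 0]
    # One voxel overlaps; each mask has two foreground voxels.
    print(dice_coefficient(predicted, reference))
```

A value of 1 means the two masks coincide exactly and 0 means they share no foreground voxels, so the reported DSC of 0.927 indicates a high volumetric overlap with the reference contours. Boundary-focused metrics such as HD95 and S-DSC complement it because DSC alone is relatively insensitive to local contour errors on large structures.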

Topics

Journal Article
