LEARNABLE HIERARCHICAL VISUAL CONTEXTS FOR TUMOR SEGMENTATION IN COMPUTED TOMOGRAPHY IMAGES.
Authors
Affiliations (1)
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, 10065 NY, USA.
Abstract
Despite advances in deep learning (DL), automated tumor segmentation on computed tomography (CT) scans remains challenging for radiotherapy applications due to large variability in tumor shape and appearance and to diffuse tumor boundaries. We present LeVal, learnable visual query contexts that refine attention toward tumor-relevant regions for improved segmentation. LeVal combines task-agnostic learnable tokens, called semantic contexts, with task-specific query tokens. The semantic contexts cross-attend to multi-scale features of a 3D Swin transformer encoder, and both are jointly subjected to 2-stage pretraining: (a) self-supervised learning (SSL) using 14,000 unlabeled CTs and (b) supervised pretraining for multi-organ segmentation using pseudo-contours generated by bespoke methods. Task queries are refined through cross-attention with the semantic contexts and then modulate the decoder output to generate the segmentation. LeVal was evaluated on four public datasets involving pancreas, colon, adrenal, and head-and-neck cancers, where it consistently outperformed existing methods. LeVal also demonstrated stronger embedding separation between tumor and surrounding healthy tissues, indicating better discriminability. Code and model checkpoints will be made available through GitHub upon manuscript acceptance.
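The token flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the token counts, feature dimension, single-head attention without learned projections, residual update, and the dot-product "modulation" of the decoder output are all assumptions made for illustration, and random arrays stand in for the Swin encoder and decoder features.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, keys_values):
    # Single-head scaled dot-product cross-attention with a residual
    # connection; learned projections are omitted (assumption).
    d = queries.shape[-1]
    weights = softmax(queries @ keys_values.T / np.sqrt(d), axis=-1)
    return queries + weights @ keys_values

rng = np.random.default_rng(0)
dim = 16  # hypothetical embedding size

# Task-agnostic learnable tokens ("semantic contexts"); count is hypothetical.
semantic_contexts = rng.normal(size=(8, dim))
# Task-specific query tokens, here one per class (background, tumor).
task_queries = rng.normal(size=(2, dim))

# Random stand-ins for multi-scale encoder features, each flattened to
# (num_voxels_at_scale, dim).
multi_scale_features = [rng.normal(size=(n, dim)) for n in (512, 64, 8)]
# Random stand-in for the per-voxel decoder output.
decoder_features = rng.normal(size=(512, dim))

# 1) Semantic contexts cross-attend to the encoder features at each scale.
for feats in multi_scale_features:
    semantic_contexts = cross_attend(semantic_contexts, feats)

# 2) Task queries are refined by cross-attending to the semantic contexts.
task_queries = cross_attend(task_queries, semantic_contexts)

# 3) Refined queries modulate the decoder output; a dot product per voxel
#    yields class logits (one plausible form of modulation, assumed here).
mask_logits = decoder_features @ task_queries.T  # shape (512, 2)
segmentation = mask_logits.argmax(axis=1)        # per-voxel class index
```

In a trained model the tokens and projections would be learned parameters and the features would come from the pretrained encoder and decoder; the sketch only shows the order in which the three sets of tokens and features interact.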