LEARNABLE HIERARCHICAL VISUAL CONTEXTS FOR TUMOR SEGMENTATION IN COMPUTED TOMOGRAPHY IMAGES.

May 20, 2026

Authors

Jiang J, Veeraraghavan H

Affiliations (1)

  • Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.

Abstract

Despite advances in deep learning (DL), automated tumor segmentation on computed tomography (CT) scans remains challenging for radiotherapy applications due to large variability in tumor shape and appearance and to diffuse boundaries. We present LeVal, learnable visual query contexts that refine attention toward tumor-relevant regions for improved segmentation. LeVal combines task-agnostic learnable tokens, called semantic contexts, with task-specific query tokens. Semantic contexts cross-attend to multi-scale features of a 3D Swin transformer encoder, and both are jointly subject to two-stage pretraining: (a) self-supervised learning (SSL) using 14,000 unlabeled CTs and (b) supervised pretraining for multi-organ segmentation using pseudo-contours generated by bespoke methods. Task queries are refined through cross-attention with semantic contexts and then modulate the decoder output to generate the segmentation. LeVal was evaluated across four public datasets involving pancreas, colon, adrenal, and head-and-neck cancers. It consistently outperformed existing methods. LeVal also demonstrated stronger embedding separation between tumor and surrounding healthy tissues, indicating better discriminability. Code and model checkpoints will be made available through GitHub upon manuscript acceptance.
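
The token flow described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (their code is not yet released). It assumes single-head attention with no learned projections, random arrays in place of the 3D Swin encoder and decoder features, and a dot-product form of "modulation" that the abstract does not specify. All names and sizes (`cross_attend`, `n_contexts`, `dim`, and so on) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(queries, keys_values):
    """Single-head scaled dot-product cross-attention with a residual add.

    Assumption: no learned projection matrices, to keep the sketch short.
    queries: (n_q, dim), keys_values: (n_kv, dim) -> (n_q, dim)
    """
    dim = queries.shape[-1]
    weights = softmax(queries @ keys_values.T / np.sqrt(dim), axis=-1)
    return queries + weights @ keys_values


rng = np.random.default_rng(0)
dim, n_contexts, n_queries = 16, 8, 2  # hypothetical sizes

# Stand-ins for multi-scale encoder features, each flattened to (tokens, dim).
multi_scale_feats = [rng.normal(size=(n, dim)) for n in (64, 16, 4)]
semantic_contexts = rng.normal(size=(n_contexts, dim))  # task-agnostic tokens
task_queries = rng.normal(size=(n_queries, dim))        # task-specific tokens

# Step 1: semantic contexts cross-attend to each feature scale in turn.
for feats in multi_scale_feats:
    semantic_contexts = cross_attend(semantic_contexts, feats)

# Step 2: task queries are refined by attending to the semantic contexts.
task_queries = cross_attend(task_queries, semantic_contexts)

# Step 3 (assumed form of modulation): score each voxel's decoder feature
# against each refined query, then take the best-scoring query per voxel.
decoder_feats = rng.normal(size=(4 * 4 * 4, dim))  # stand-in decoder output
logits = decoder_feats @ task_queries.T            # (voxels, n_queries)
segmentation = logits.argmax(axis=-1).reshape(4, 4, 4)
```

In this sketch each task query acts as a per-class template matched against voxel features. The published model may combine queries and decoder output differently.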

Topics

Journal Article
