A Multi-Resolution Hybrid CNN-Transformer Network With Scale-Guided Attention for Medical Image Segmentation.

Authors

Zhu S,Li Y,Dai X,Mao T,Wei L,Yan Y

Abstract

Medical image segmentation remains a challenging task due to the intricate nature of anatomical structures and the wide range of target sizes. In this paper, we propose a novel U -shaped segmentation network that integrates CNN and Transformer architectures to address these challenges. Specifically, our network architecture consists of three main components. In the encoder, we integrate an attention-guided multi-scale feature extraction module with a dual-path downsampling block to learn hierarchical features. The decoder employs an advanced feature aggregation and fusion module that effectively models inter-dependencies across different hierarchical levels. For the bottleneck, we explore multi-scale feature activation and multi-layer context Transformer modules to facilitate high-level semantic feature learning and global context modeling. Additionally, we implement a multi-resolution input-output strategy throughout the network to enrich feature representations and ensure fine-grained segmentation outputs across different scales. The experimental results on diverse multi-modal medical image datasets (ultrasound, gastrointestinal polyp, MR, and CT images) demonstrate that our approach can achieve superior performance over state-of-the-art methods in both quantitative measurements and qualitative assessments. The code is available at https://github.com/zsj0577/MSAGHNet.

Topics

Journal Article

Ready to Sharpen Your Edge?

Join hundreds of your peers who rely on RadAI Slice. Get the essential weekly briefing that empowers you to navigate the future of radiology.

We respect your privacy. Unsubscribe at any time.