A Multi-Resolution Hybrid CNN-Transformer Network With Scale-Guided Attention for Medical Image Segmentation.

June 11, 2025

DOI: 10.1109/JBHI.2025.3578625 PMID: 40498622

Authors

Zhu S, Li Y, Dai X, Mao T, Wei L, Yan Y

Abstract

Medical image segmentation remains a challenging task due to the intricate nature of anatomical structures and the wide range of target sizes. In this paper, we propose a novel U-shaped segmentation network that integrates CNN and Transformer architectures to address these challenges. Specifically, our network architecture consists of three main components. In the encoder, we integrate an attention-guided multi-scale feature extraction module with a dual-path downsampling block to learn hierarchical features. The decoder employs an advanced feature aggregation and fusion module that effectively models inter-dependencies across different hierarchical levels. For the bottleneck, we explore multi-scale feature activation and multi-layer context Transformer modules to facilitate high-level semantic feature learning and global context modeling. Additionally, we implement a multi-resolution input-output strategy throughout the network to enrich feature representations and ensure fine-grained segmentation outputs across different scales. Experimental results on diverse multi-modal medical image datasets (ultrasound, gastrointestinal polyp, MR, and CT images) demonstrate that our approach achieves superior performance over state-of-the-art methods in both quantitative measurements and qualitative assessments. The code is available at https://github.com/zsj0577/MSAGHNet.
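
The multi-resolution input side of the strategy described in the abstract can be illustrated with a toy image pyramid. The following is only a minimal sketch, assuming 2x2 average pooling as the downsampling step and three scales; the abstract specifies neither, and the names `downsample_2x` and `build_input_pyramid` are hypothetical and not taken from the authors' repository.

```python
from typing import List

Image = List[List[float]]  # a single-channel image as nested lists


def downsample_2x(image: Image) -> Image:
    """Halve height and width with 2x2 average pooling.

    Assumption: average pooling stands in for whatever downsampling
    the actual network uses. An odd trailing row or column is dropped.
    """
    height, width = len(image) // 2, len(image[0]) // 2
    return [
        [
            (
                image[2 * r][2 * c]
                + image[2 * r][2 * c + 1]
                + image[2 * r + 1][2 * c]
                + image[2 * r + 1][2 * c + 1]
            )
            / 4.0
            for c in range(width)
        ]
        for r in range(height)
    ]


def build_input_pyramid(image: Image, num_scales: int) -> List[Image]:
    """Return copies of the image at full, 1/2, 1/4, ... resolution."""
    pyramid = [image]
    for _ in range(num_scales - 1):
        pyramid.append(downsample_2x(pyramid[-1]))
    return pyramid


# Toy 4x4 image with pixel values 0..15 (illustrative data only).
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
pyramid = build_input_pyramid(image, num_scales=3)
sizes = [(len(level), len(level[0])) for level in pyramid]
print(sizes)  # [(4, 4), (2, 2), (1, 1)]
```

In a U-shaped network, each pyramid level would typically be fed to the encoder stage of matching spatial size, and the decoder would emit one prediction per scale; how the published model wires these together is documented in the linked repository rather than here.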


Topics

Journal Article
