X-UNet: A novel global context-aware collaborative fusion U-shaped network with progressive feature fusion of codec for medical image segmentation.
Authors
Affiliations (7)
- School of Information Engineering, Southwest University of Science and Technology, Mianyang, 621010, PR China. Electronic address: [email protected].
- School of Information Engineering, Southwest University of Science and Technology, Mianyang, 621010, PR China. Electronic address: [email protected].
- School of Information Engineering, Southwest University of Science and Technology, Mianyang, 621010, PR China. Electronic address: [email protected].
- Mianyang Central Hospital, School of Medicine, University of Electronic Science and Technology of China, Mianyang, 621000, China; NHC Key Laboratory of Nuclear Technology Medical Transformation, Mianyang Central Hospital, Mianyang, 621000, China. Electronic address: [email protected].
- School of Information Engineering, Southwest University of Science and Technology, Mianyang, 621010, PR China. Electronic address: [email protected].
- School of Information Engineering, Southwest University of Science and Technology, Mianyang, 621010, PR China. Electronic address: [email protected].
- School of Information Engineering, Southwest University of Science and Technology, Mianyang, 621010, PR China. Electronic address: [email protected].
Abstract
Owing to the inductive bias of convolutions, CNNs extract hierarchical features efficiently in medical image segmentation. However, the local-correlation assumption behind this bias limits the ability of convolutions to capture global information, and in recent years Transformer-based methods have surpassed CNNs on some segmentation tasks. Although combining CNNs with Transformers can address this limitation, it also increases computational complexity and adds a considerable number of parameters. In addition, narrowing the semantic gap between encoder and decoder to generate high-quality masks is a key challenge, which recent works address by aggregating features from different skip connections. However, this aggregation often results in semantic mismatches and additional noise. In this paper, we propose a novel segmentation method, X-UNet, whose backbones employ the CFGC (Collaborative Fusion with Global Context-aware) module. The CFGC module enables multi-scale feature extraction and effective global context modeling. In addition, we employ the CSPF (Cross Split-channel Progressive Fusion) module to progressively align and fuse features from corresponding encoder and decoder stages through channel-wise operations, offering a novel approach to feature integration. Experimental results demonstrate that X-UNet, with fewer computations and parameters, achieves superior performance on various medical image datasets. The code and models are available at https://github.com/XSJ0410/X-UNet.
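The computational trade-off between convolutions and Transformers noted in the abstract can be illustrated with a rough operation count. The sketch below is not taken from the X-UNet code or paper. It uses the standard textbook cost estimates for a single convolution layer and a single global self-attention layer, and the function names, the channel width of 64, and the feature-map sizes are illustrative assumptions.

```python
def conv_macs(h, w, c_in, c_out, k=3):
    """Multiply-accumulates for one k x k convolution (stride 1, same padding)."""
    return h * w * c_in * c_out * k * k


def self_attention_macs(h, w, c):
    """Multiply-accumulates for one single-head global self-attention layer.

    Counts the Q, K, V and output projections (4 * N * c^2) plus the
    attention scores and the weighted sum (2 * N^2 * c), where N = h * w
    is the number of tokens (one token per spatial position).
    """
    n = h * w
    return 4 * n * c * c + 2 * n * n * c


if __name__ == "__main__":
    channels = 64  # assumed channel width, for illustration only
    for side in (32, 64, 128):
        conv = conv_macs(side, side, channels, channels)
        attn = self_attention_macs(side, side, channels)
        print(f"{side}x{side}: conv={conv:,} MACs, attention={attn:,} MACs, "
              f"ratio={attn / conv:.1f}")
```

Under these assumptions the convolution cost grows linearly with the number of spatial positions, while the attention cost contains a term that grows quadratically, which is one reason hybrid CNN-Transformer designs tend to be more expensive at high resolution.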