A dual attention and cross layer fusion network with a hybrid CNN and transformer architecture for medical image segmentation.
Authors
Affiliations (4)
- School of Computer, Electronics and Information, Guangxi University, Nanning, 530004, China.
- Guangxi Key Laboratory of Multimedia Communications and Network Technology, Nanning, 530004, China.
- School of Computer, Electronics and Information, Guangxi University, Nanning, 530004, China. [email protected].
- Parallel and Distributed Laboratory, Guangxi University, Nanning, 530004, China. [email protected].
Abstract
Medical image segmentation is a crucial technology for disease diagnosis and treatment planning. However, current approaches face challenges in capturing global semantic dependencies and integrating cross-layer features. While Convolutional Neural Networks (CNNs) excel at extracting local features, they struggle with long-range dependencies; Transformers effectively model global context but may compromise spatial details. To address these limitations, this paper proposes a novel hybrid CNN-Transformer architecture, the Dual Attention and Cross-layer Fusion Network (DCF-Net). Built on an encoder-decoder framework, DCF-Net introduces two key modules: the Channel-Adaptive Sparse Attention (CASA) module and the Synergistic Skip-connection and Cross-layer Fusion (SSCF) module. Specifically, CASA enhances semantic modeling by filtering critical features and focusing on anatomically important regions, while SSCF enables effective hierarchical feature fusion by bridging encoder and decoder representations. Extensive experiments on the Synapse, ACDC, and ISIC2017 datasets demonstrate that DCF-Net achieves state-of-the-art performance without pre-training. This work highlights the value of cross-layer fusion and attention mechanisms, providing a robust and generalizable solution for medical image segmentation tasks.