DCTC-Net: Dual-Branch Cross-Fusion Transformer-CNN Architecture for Medical Image Segmentation.
Authors
Abstract
Hybrid architectures that combine convolutional neural networks (CNNs) with Transformers have emerged as a promising approach for medical image segmentation. However, existing networks based on this hybrid architecture often encounter two challenges. First, while the CNN branch effectively captures local image features through convolution operations, vanilla convolution lacks the ability to achieve adaptive feature extraction. Second, although the Transformer branch can model global image information, conventional self-attention (SA) primarily focuses on spatial relationships, neglecting channel and cross-dimensional attention, leading to suboptimal segmentation results, particularly for medical images with complex backgrounds. To address these limitations, we propose a dual-branch cross-fusion Transformer-CNN architecture for medical image segmentation (DCTC-Net). Our network provides two key advantages. First, a dynamic deformable convolution (DDConv) is integrated into the CNN branch to overcome the limitations of adaptive feature extraction with fixed-size convolution kernels and also eliminate the issue of shared convolution kernel parameters across different inputs, significantly enhancing the feature expression capabilities of the CNN branch. Second, a (shifted)-window adaptive complementary attention module ((S)W-ACAM) and compact convolutional projection are incorporated into the Transformer branch, enabling the network to comprehensively learn cross-dimensional long-range dependencies in medical images. Experimental results demonstrate that the proposed DCTC-Net achieves superior medical image segmentation performance compared to state-of-the-art (SOTA) methods, including CNN and Transformer networks. In addition, our DCTC-Net requires fewer parameters and lower computational costs and does not rely on pretraining.