Dual-branch attention network with deep split convolution and multi-dimensional transformers for medical image segmentation.
Authors
Affiliations (3)
- School of Public Health, Qiqihar Medical University, Qiqihar, 161003, China.
- College of Pharmacy, Qiqihar Medical University, Qiqihar, 161003, China.
- School of Public Health, Qiqihar Medical University, Qiqihar, 161003, China.
Abstract
Accurate segmentation of anatomical structures and pathological regions is indispensable for reliable disease assessment, yet contemporary algorithms often fail to delineate boundaries sharply. This deficiency stems from the high morphological heterogeneity across cases, which obscures contours and compromises segmentation accuracy. To address this gap, we propose a dual-branch attention network (D3T-Net) that combines deep split convolution with multi-dimensional Transformers. The network places parallel CNN and Transformer branches in the encoder and decoder to capture local details and model global contextual information, respectively. In the CNN branch, a deep split module (DSM) enhances local representations through multiple sub-branches and a fusion attention mechanism. The Transformer branch employs multi-dimensional modules to capture long-range spatial and channel-wise dependencies, thereby mitigating the locality limitations of standard self-attention. To facilitate robust feature exchange between the two branches, we introduce a direction-aware interaction attention (LA) module within the encoder, designed to accentuate critical structural characteristics across various orientations. In the decoder, a cross-attention mechanism reorganizes and integrates the features of the two branches. Additionally, the model adopts a multi-scale fusion skip connection mechanism between the encoder and decoder to transfer features efficiently, improve boundary retention, and improve the segmentation of small objects. Extensive evaluations show that D3T-Net outperforms contemporary methods in segmenting the liver and associated lesions. Such advances in automated image analysis can improve diagnostic accuracy, thereby offering robust support for precision medicine in hepatology.
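The decoder's cross-attention between the two branches can be illustrated with a minimal sketch of generic single-head scaled dot-product cross-attention. This is not the authors' implementation: the abstract does not say which branch supplies the queries, so the choice below (CNN-branch tokens as queries, Transformer-branch tokens as keys and values) is an assumption, as are the names `cross_attention`, `cnn_tokens`, and `transformer_tokens`. Learned projections and multiple heads are omitted for brevity.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    Each query attends over all keys; the output for a query is the
    attention-weighted sum of the values. No learned projections are
    applied (an assumption made to keep the sketch short).
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by 1/sqrt(d).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output channel at a time.
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


# Hypothetical toy features: 3 tokens from the CNN branch (queries) and
# 2 tokens from the Transformer branch (keys and values), 2 channels each.
cnn_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
transformer_tokens = [[1.0, 0.0], [0.0, 1.0]]

fused = cross_attention(cnn_tokens, transformer_tokens, transformer_tokens)
```

Each row of `fused` is a CNN-branch token re-expressed as a mixture of Transformer-branch tokens, which is one plausible reading of "feature reorganization and integration"; the actual module in D3T-Net may differ.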