PDAFormer 3+: A full-scale connected modified transformer with parallel dual attention for 3D medical image segmentation.
Affiliations (3)
- School of Automation, Beijing Institute of Technology, Beijing 100081, China.
- School of Automation, Beijing Institute of Technology, Beijing 100081, China.
- School of Medical Technology, Beijing Institute of Technology, Beijing 100081, China.
Abstract
Medical image segmentation is essential for enhancing diagnostic and therapeutic accuracy, improving healthcare efficiency, and advancing medical research. In recent years, transformers have attracted increasing attention in medical image segmentation owing to their ability to capture long-range dependencies, which compensates for the limited global context modeling of convolutional neural networks (CNNs). This paper proposes PDAFormer 3+, a full-scale connected 3D medical image segmentation framework built on a modified transformer with parallel dual attention. Specifically, we introduce a parallel dual attention (PDA) mechanism that replaces the conventional self-attention mechanism in transformers and models global dependencies in the spatial and channel dimensions in parallel. In addition, the multi-layer perceptron (MLP) that serves as the transformer's feed-forward network is replaced with a residual convolution block (RCB), which reduces computational complexity while enhancing local representations. Inspired by U-Net 3+, we further aggregate full-scale features at each network layer and design a convolution excitation module (CEM) to enhance the fused features. A deep supervision strategy is employed to learn multi-level representations from the aggregated feature maps. Notably, the linear mappings and convolution modules make the modified transformer applicable to large-scale 3D medical images and significantly reduce the computational burden of the network. Extensive experiments on the Synapse, Automated Cardiac Diagnosis Challenge (ACDC), and type-B aortic dissection (type-B AD) datasets show that PDAFormer 3+ achieves mean Dice Similarity Coefficients (DSC) of 86.90%, 92.54%, and 91.93%, respectively, while maintaining strong efficiency. Overall, PDAFormer 3+ combines the complementary strengths of CNNs (local detail) and transformers (global context) to deliver accurate and efficient 3D medical image segmentation.
Our code is publicly available at https://github.com/BitGyy/PDAFormer.
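The abstract does not give the equations of the PDA block. What follows is only a minimal NumPy sketch of one plausible reading, not the authors' implementation, which lives in the repository above. It assumes that the "linear mappings" compress the token axis of the keys and values, as in Linformer. Under that assumption, spatial attention costs O(N·k·C) rather than O(N²·C). It also assumes that the channel branch forms a C×C attention map. The weight names `wq`, `wk`, `wv`, `e`, the shared query/key/value projections, and the summation that fuses the two branches are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along one axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def parallel_dual_attention(x, params):
    """Hypothetical PDA sketch on a flattened 3D feature map.

    x:      (N, C) tokens, with N = D * H * W voxels and C channels.
    params: assumed weights, not taken from the paper:
            "wq", "wk", "wv": (C, C) query/key/value projections
            "e": (k, N) linear mapping that compresses the token axis, k << N
    """
    n, c = x.shape
    q, key, v = x @ params["wq"], x @ params["wk"], x @ params["wv"]

    # Spatial branch: compress keys and values from N tokens to k tokens,
    # so the attention map is (N, k) instead of (N, N).
    key_s = params["e"] @ key                               # (k, C)
    v_s = params["e"] @ v                                   # (k, C)
    attn_s = softmax(q @ key_s.T / np.sqrt(c), axis=-1)     # (N, k)
    out_s = attn_s @ v_s                                    # (N, C)

    # Channel branch: a (C, C) channel-to-channel attention map whose size
    # does not depend on the number of voxels.
    attn_c = softmax(q.T @ key / np.sqrt(n), axis=-1)       # (C, C)
    out_c = v @ attn_c.T                                    # (N, C)

    # Both branches read the same input, so they can run in parallel;
    # fusing them by summation is an assumption of this sketch.
    return out_s + out_c


# Usage on a random 8x8x8 volume with 16 channels.
rng = np.random.default_rng(0)
d, h, w, c, k = 8, 8, 8, 16, 32
n = d * h * w
x = rng.standard_normal((n, c))
params = {name: rng.standard_normal((c, c)) / np.sqrt(c) for name in ("wq", "wk", "wv")}
params["e"] = rng.standard_normal((k, n)) / np.sqrt(n)
y = parallel_dual_attention(x, params)
print(y.shape)  # (512, 16)
```

In this toy setting the spatial attention map has N·k = 512 × 32 = 16,384 entries, versus N² = 262,144 for standard self-attention. The gap widens quickly as the number of voxels grows. This is the kind of saving the abstract attributes to the linear mappings.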
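The full-scale connections can be illustrated by listing what each decoder level receives. The sketch below follows the wiring described in the U-Net 3+ paper, where every decoder level concatenates maps from all scales. These are the finer encoder maps downsampled by max pooling, the same-level encoder map, and the coarser decoder maps plus the bottleneck, upsampled. Each of these is first reduced to 64 channels. Several details are assumptions: the five-level depth, the uniform factor-of-2 resolution change, the function name `full_scale_plan`, and the `enc`/`dec` labels. PDAFormer 3+ may use a different depth, anisotropic scaling, or other channel widths.

```python
def full_scale_plan(num_levels=5, per_source_channels=64):
    """Hypothetical U-Net 3+ style wiring for every decoder level.

    Levels run from 1 (finest) to num_levels (coarsest, the bottleneck),
    and each level is assumed to halve the resolution of the one above.
    Returns {decoder_level: (inputs, fused_channels)}, where inputs is a
    list of (source, op, factor) tuples.
    """
    plan = {}
    for i in range(num_levels - 1, 0, -1):  # decoder levels, coarse to fine
        inputs = []
        for j in range(1, num_levels + 1):
            # Levels deeper than i (except the bottleneck) have already been
            # decoded; shallower, same-level and bottleneck maps come from the encoder.
            source = f"dec{j}" if i < j < num_levels else f"enc{j}"
            if j < i:
                op, factor = "maxpool", 2 ** (i - j)    # finer map: downsample
            elif j == i:
                op, factor = "identity", 1              # same resolution
            else:
                op, factor = "upsample", 2 ** (j - i)   # coarser map: upsample
            inputs.append((source, op, factor))
        # Each source is reduced to per_source_channels before concatenation.
        plan[i] = (inputs, per_source_channels * num_levels)
    return plan


plan = full_scale_plan()
for level in sorted(plan, reverse=True):
    inputs, channels = plan[level]
    print(level, channels, inputs)
```

With these defaults the fused tensor at every decoder level has 320 channels. According to the abstract, this fused representation is what the CEM enhances. The sketch does not model that module or the deep-supervision heads.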
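The reported scores are mean Dice Similarity Coefficients. For a predicted mask P and a ground-truth mask G of one class, DSC = 2|P ∩ G| / (|P| + |G|). The mean DSC averages this value over the foreground classes. Below is a minimal pure-Python sketch on flattened label maps. The function names are hypothetical, and scoring a class that is absent from both masks as 1.0 is an assumed convention.

```python
def dice(pred, target, label):
    """DSC = 2|P ∩ G| / (|P| + |G|) for one class on flattened label maps."""
    p = [v == label for v in pred]
    g = [v == label for v in target]
    inter = sum(a and b for a, b in zip(p, g))
    denom = sum(p) + sum(g)
    # Assumed convention: a class absent from both maps counts as a perfect match.
    return 1.0 if denom == 0 else 2.0 * inter / denom


def mean_dice(pred, target, labels):
    """Average the per-class DSC over the given foreground labels."""
    return sum(dice(pred, target, c) for c in labels) / len(labels)


# Toy example: two foreground classes (1 and 2) on six voxels.
pred = [0, 1, 1, 2, 2, 0]
target = [0, 1, 2, 2, 2, 0]
print(round(dice(pred, target, 1), 4))            # 0.6667
print(round(dice(pred, target, 2), 4))            # 0.8
print(round(mean_dice(pred, target, [1, 2]), 4))  # 0.7333
```

The percentages in the abstract correspond to DSC × 100. The abstract does not state the details of the evaluation protocol. These include which organs or structures are scored and whether scores are averaged per case before averaging over the dataset.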