MDFormer: a multi-scale dense dilated transformer model for 3D medical image segmentation.
Authors
Affiliations (3)
- School of Medical and Information Engineering, Key Laboratory of Prevention and Treatment of Cardiovascular and Cerebrovascular Diseases, Ministry of Education, Jiangxi Provincial Key Laboratory of Tissue Engineering (2024SSY06291), Gannan Medical University, Ganzhou, 341000, China.
- Science & Technology Institute, Wuhan Textile University, Wuhan, 430200, China.
- School of Medical and Information Engineering, Key Laboratory of Prevention and Treatment of Cardiovascular and Cerebrovascular Diseases, Ministry of Education, Jiangxi Provincial Key Laboratory of Tissue Engineering (2024SSY06291), Gannan Medical University, Ganzhou, 341000, China. [email protected].
Abstract
To improve the precision of medical image segmentation for clinical diagnosis and treatment, this study addresses the limitations of existing models in capturing multi-scale information under resolution constraints while maintaining efficiency without sacrificing accuracy. We developed a multi-scale dense dilated Transformer (MDFormer) built around a multi-scale dense dilated self-attention (MDDSA) module. This module dynamically adjusts the size of the dense dilated matrix according to the resolution of the input features, enabling spatial downsampling at a fixed resolution and cross-scale information aggregation, which effectively reduces computational cost. The model achieved a Dice similarity coefficient (DSC) of 92.7% on the ACDC dataset and 86.88% on the Synapse dataset, with 37.7 M parameters and 47.39 GFLOPs. Paired t-tests confirmed statistically significant improvements in segmentation performance over competing models. By effectively exploiting multi-scale information, the proposed MDFormer significantly enhances medical image segmentation, showing promise across a range of medical image segmentation tasks and laying a solid foundation for future transformer-based models.
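The abstract does not give the MDDSA implementation, but the general idea it describes — subsampling the key/value tokens with a dilation (stride) chosen per resolution, then aggregating attention outputs across several dilations — can be sketched as follows. This is a minimal, hypothetical NumPy illustration of that style of attention, not the authors' code: the function names, the identity Q/K/V projections, and the averaging fusion are all simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dilated_sr_attention(x, h, w, dilation):
    """Single-head self-attention whose keys/values come from a dilated
    (strided) subsample of the feature map. At a fixed resolution this
    reduces the attention cost from O(N^2) to roughly O(N * N / d^2),
    mirroring the spatial-downsampling idea described for MDDSA.

    x: (N, C) token matrix with N == h * w; dilation: sampling stride.
    Q/K/V use identity projections here purely for brevity (assumption).
    """
    n, c = x.shape
    grid = x.reshape(h, w, c)
    kv = grid[::dilation, ::dilation].reshape(-1, c)  # dense dilated sampling
    attn = softmax(x @ kv.T / np.sqrt(c))             # (N, N/d^2) weights
    return attn @ kv                                  # aggregate sampled values

def multi_scale_attention(x, h, w, dilations=(1, 2)):
    # Run attention at several dilations (chosen per resolution) and fuse
    # the per-scale outputs; plain averaging is an illustrative choice.
    outs = [dilated_sr_attention(x, h, w, d) for d in dilations]
    return np.mean(outs, axis=0)
```

For example, a 4x4 feature map with 8 channels gives `x` of shape `(16, 8)`; with `dilations=(1, 2)` the second branch attends to only 4 sampled key/value tokens instead of 16, while the fused output keeps the full `(16, 8)` shape.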