Lightweight Multi-Stage Aggregation Transformer for robust medical image segmentation.
Authors
Affiliations (5)
- School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, 310023, China; Zhejiang Key Laboratory of Visual Information Intelligent Processing, Hangzhou, 310023, China.
- School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, 310023, China; Zhejiang Key Laboratory of Visual Information Intelligent Processing, Hangzhou, 310023, China. Electronic address: [email protected].
- The Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, 310009, China. Electronic address: [email protected].
- School of Life and Health Sciences, Fujian Fuyao University of Science and Technology, Fuzhou, 350000, China; Nanjing Jingsan Medical Science and Technology, Ltd., Jiangsu, China.
- School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, 300384, China.
Abstract
Capturing rich multi-scale features is essential for addressing the complex variations in medical image segmentation. Numerous hybrid networks have been developed to integrate the complementary strengths of convolutional neural networks (CNNs) and Transformers. However, existing methods suffer either from the huge computational cost of complicated networks or from the unsatisfactory performance of lighter ones. How to fully exploit the advantages of both convolution and self-attention while keeping a network both effective and efficient remains an open problem. In this work, we propose a robust lightweight multi-stage hybrid architecture, named Multi-stage Aggregation Transformer version 2 (MA-TransformerV2), which extracts multi-scale features with progressive aggregation for accurate segmentation of highly variable medical images at low computational cost. Specifically, lightweight Trans blocks and lightweight CNN blocks are introduced in parallel into the dual-branch encoder module at each stage, and a vector quantization block is incorporated at the bottleneck to discretize the features and discard redundancy. This design not only enhances the representational capability and computational efficiency of the model but also improves its interpretability. Extensive experiments on public datasets show that our method outperforms state-of-the-art approaches, including CNN-based, Transformer-based, advanced hybrid CNN-Transformer, and several lightweight models, in terms of both segmentation accuracy and model size. Code will be made publicly available at https://github.com/zjmiaprojects/MATransformerV2.
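To make the two architectural ideas in the abstract concrete, the sketch below shows (i) a dual-branch encoder stage that runs a lightweight CNN block and a lightweight Transformer block in parallel before fusing them, and (ii) a vector-quantization (VQ) bottleneck that discretizes features against a learned codebook. This is a minimal PyTorch illustration under our own assumptions; the module names, channel sizes, branch designs, and fusion choice are hypothetical and are not taken from the authors' released implementation (see the GitHub link above for the official code).

```python
# Illustrative sketch only: all design details here are assumptions, not the
# authors' MA-TransformerV2 implementation.
import torch
import torch.nn as nn


class DualBranchStage(nn.Module):
    """One encoder stage: parallel CNN and Transformer branches, fused by 1x1 conv."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Lightweight CNN branch: depthwise 3x3 + pointwise 1x1 convolution.
        self.cnn_branch = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1, groups=dim),
            nn.Conv2d(dim, dim, 1),
            nn.BatchNorm2d(dim),
            nn.GELU(),
        )
        # Lightweight Transformer branch: a single multi-head self-attention layer.
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Fuse the two branches by concatenation followed by a 1x1 convolution.
        self.fuse = nn.Conv2d(2 * dim, dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local_feat = self.cnn_branch(x)
        tokens = self.norm(x.flatten(2).transpose(1, 2))   # (B, H*W, C)
        global_feat, _ = self.attn(tokens, tokens, tokens)
        global_feat = global_feat.transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([local_feat, global_feat], dim=1))


class VQBottleneck(nn.Module):
    """Quantize each spatial feature vector to its nearest codebook entry."""

    def __init__(self, dim: int, codebook_size: int = 512):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        flat = x.permute(0, 2, 3, 1).reshape(-1, c)        # (B*H*W, C)
        dists = torch.cdist(flat, self.codebook.weight)    # pairwise L2 distances
        codes = dists.argmin(dim=1)                        # nearest codebook index
        quantized = self.codebook(codes).reshape(b, h, w, c).permute(0, 3, 1, 2)
        # Straight-through estimator: gradients bypass the discrete lookup.
        return x + (quantized - x).detach()


if __name__ == "__main__":
    feat = torch.randn(1, 64, 32, 32)
    stage = DualBranchStage(dim=64)
    vq = VQBottleneck(dim=64)
    print(vq(stage(feat)).shape)  # torch.Size([1, 64, 32, 32])
```

The discrete codes selected by the VQ step are one plausible source of the interpretability claimed in the abstract: each spatial location is explained by a single codebook entry, which can be inspected and compared across images.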