MedFormer: hierarchical medical vision transformer with content-aware dual sparse selection attention.
Authors
Affiliations (1)
Affiliations (1)
- Chongqing University of Technology, No.69 Hongguang Avenue, Banan District, Chongqing 400054, China, Chongqing, 400050, CHINA.
Abstract
Medical image recognition serves as a key way to aid in clinical diagnosis, enabling more accurate and timely identification of diseases and abnormalities. Vision transformer-based approaches have proven effective in handling various medical recognition tasks. However, these methods encounter two primary challenges. First, they are often task-specific and architecture-tailored, limiting their general applicability. Second, they usually either adopt full attention to model long-range dependencies, resulting in high computational costs, or rely on handcrafted sparse attention, potentially leading to suboptimal performance. To tackle these issues, we present MedFormer, an efficient medical vision transformer with two key ideas. First, it employs a pyramid scaling structure as a versatile backbone for various medical image recognition tasks, including image classification and dense prediction tasks such as semantic segmentation and lesion detection. This structure facilitates hierarchical feature representation while reducing the computation load of feature maps, highly beneficial for boosting performance. Second, it introduces a novel Dual Sparse Selection Attention (DSSA) with content awareness to improve computational efficiency and robustness against noise while maintaining high performance. As the core building technique of MedFormer, DSSA is designed to explicitly attend to the most relevant content. Theoretical analysis demonstrates that MedFormer outperforms existing medical vision transformers in terms of generality and efficiency. Extensive experiments across various imaging modality datasets show that MedFormer consistently enhances performance in all three medical image recognition tasks mentioned above. MedFormer provides an efficient and versatile solution for medical image recognition, with strong potential for clinical application. The code is available on GitHub.