MMSeg: Multi-scale Vision Mamba for Lightweight Generalizable Medical Image Segmentation.
Authors
Affiliations (2)
- School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai, 200000, China.
- School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai, 200000, China. [email protected].
Abstract
Medical image segmentation plays a crucial role in computer-aided diagnosis and treatment planning, yet existing methods suffer from limited cross-domain generalization and high computational cost. Although the Segment Anything Model (SAM) has recently shown remarkable potential in medical image analysis, its large Vision Transformer (ViT) backbone relies on self-attention, whose cost grows quadratically with the number of image tokens, which hinders deployment in resource-constrained clinical environments. To address these limitations, we propose MMSeg, a lightweight and efficient medical image segmentation framework based on the Vision Mamba architecture. Specifically, we introduce a Multi-scale Lightweight Mamba Encoder (MLME) that combines a multi-resolution pyramid structure with Mamba's linear computational complexity to process multi-scale features in parallel and efficiently. Furthermore, we design an Automatic Domain Matching Decoder (ADMD) that automatically identifies and aligns feature distributions across medical domains, dynamically calibrating domain discrepancies to enhance cross-domain generalization. To reduce dependence on extensive pre-training data and accelerate convergence, we incorporate a Feature Distillation Module (FDM) that uses few-shot knowledge distillation from teacher models to obtain enriched feature representations. Extensive experiments on diverse medical datasets demonstrate that MMSeg consistently outperforms state-of-the-art methods while maintaining computational efficiency. The framework also exhibits superior cross-domain generalization, making it well suited for practical clinical deployment.
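The contrast between quadratic and linear complexity can be illustrated with a minimal sketch. It is not the MLME described above: it uses a scalar, time-invariant state-space recurrence as a simplified stand-in for a Mamba-style scan, and the function names, the parameters `a`, `b`, `c`, and the image and patch sizes are all assumptions chosen for illustration.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Scalar linear state-space recurrence (hypothetical, simplified):
    h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t.
    """
    h = 0.0
    ys = []
    for x in xs:  # one constant-cost update per token, so O(L) overall
        h = a * h + b * x
        ys.append(c * h)
    return ys


def scan_step_count(num_tokens):
    """Number of recurrence updates performed by ssm_scan: L."""
    return num_tokens


def attention_pair_count(num_tokens):
    """Number of query-key score entries in full self-attention: L * L."""
    return num_tokens * num_tokens


def tokens_for_image(side, patch):
    """Token count for a square image cut into non-overlapping square patches."""
    return (side // patch) ** 2


if __name__ == "__main__":
    # Illustrative sizes only (assumed, not taken from the paper).
    for side in (256, 512, 1024):
        n = tokens_for_image(side, patch=16)
        print(side, n, scan_step_count(n), attention_pair_count(n))
```

Under these assumed sizes, doubling the image side quadruples the token count, which quadruples the number of scan updates but multiplies the number of attention score entries by sixteen. That gap is the efficiency argument the abstract makes for replacing a ViT backbone with a Mamba-based encoder.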