SwiftMSeg: lightweight multi-scale local-global context modeling with transformer for medical image segmentation.

June 7, 2026


DOI: 10.1038/s41598-026-56845-3 PMID: 42252324

Authors

Rony JH, Hossain MS, Siddiqui FH

Affiliations (2)

Department of CSE, Dhaka University of Engineering and Technology, Gazipur, Bangladesh.
School of Informatics, Kochi University of Technology, Kami, 782-8502, Japan.

Abstract

Accurate medical image segmentation requires both fine boundary localization and robust contextual understanding, which are often difficult to achieve simultaneously, particularly in lightweight architectures. In this paper, we propose SwiftMSeg, a lightweight encoder-decoder framework that integrates a convolutional encoder, a transformer-based local-global-local module, and a hierarchical multi-scale decoder. The framework addresses this boundary-context challenge by combining progressive multi-scale refinement, for fine boundary separation, with global context modeling through long-range dependency aggregation. Extensive evaluations on publicly available colonoscopy, pathology, ultrasound, and magnetic resonance imaging (MRI) datasets demonstrated that SwiftMSeg accurately segments diverse anatomical structures, ranging from tiny nuclei to polyps and large tumor regions. The model further demonstrated moderate domain-independent generalization on an external dataset, achieving Dice scores of 0.896 (colonoscopy), 0.860 (pathology), 0.850 (ultrasound), and 0.870 (MRI), consistently outperforming most baseline methods. In addition, it achieved improved boundary localization, with lower Hausdorff distance (e.g., 16.43 for MRI and 33.89 for ultrasound) and reduced average symmetric surface distance (ASSD), indicating more precise and stable segmentation. Statistical analysis, using both paired t-tests and Wilcoxon tests, further confirmed that the improvements of SwiftMSeg are statistically significant ([Formula: see text]) with large effect sizes across modalities. Despite its strong performance, SwiftMSeg remains highly efficient, requiring only 4.48M parameters and 0.940 giga floating-point operations (GFLOPs). This reduces computational cost by approximately 53× in GFLOPs compared with U-Net-based baselines (standard U-Net: ∼31M parameters and ∼50 GFLOPs) while maintaining high segmentation accuracy.
These results highlight the effectiveness of SwiftMSeg as a practical and scalable solution for real-world medical image segmentation across diverse modalities.
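The abstract reports Dice, Hausdorff distance (HD), and average symmetric surface distance (ASSD). The following minimal NumPy sketch shows how these metrics are commonly defined for a pair of 2-D binary masks. It is not the authors' evaluation code. The helper names (`dice_score`, `hausdorff_distance`, `assd`) are hypothetical. Boundaries are assumed to be foreground pixels with at least one 4-connected background neighbour, and distances are in pixel units. The brute-force pairwise search is only practical for small masks. Published pipelines may instead use physical voxel spacing (mm), 3-D volumes, or the 95th-percentile HD, and the abstract does not state which convention produced its values.

```python
import numpy as np


def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """Dice = 2|P ∩ G| / (|P| + |G|) for binary masks (two empty masks give 0 here)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / (pred.sum() + target.sum() + eps))


def _boundary_points(mask: np.ndarray) -> np.ndarray:
    """(row, col) coordinates of foreground pixels touching background (4-connectivity, assumed)."""
    mask = mask.astype(bool)
    padded = np.pad(mask, 1, constant_values=False)
    # A pixel is interior if it and its four neighbours are all foreground.
    interior = (
        padded[1:-1, 1:-1]
        & padded[:-2, 1:-1]   # neighbour above
        & padded[2:, 1:-1]    # neighbour below
        & padded[1:-1, :-2]   # neighbour to the left
        & padded[1:-1, 2:]    # neighbour to the right
    )
    return np.argwhere(mask & ~interior).astype(float)


def surface_distances(pred: np.ndarray, target: np.ndarray):
    """Nearest-boundary distances from pred to target and from target to pred."""
    p = _boundary_points(pred)
    g = _boundary_points(target)
    if len(p) == 0 or len(g) == 0:
        raise ValueError("both masks must contain foreground pixels")
    # Brute-force pairwise Euclidean distances, shape (len(p), len(g)).
    d = np.linalg.norm(p[:, None, :] - g[None, :, :], axis=-1)
    return d.min(axis=1), d.min(axis=0)


def hausdorff_distance(pred: np.ndarray, target: np.ndarray) -> float:
    """Symmetric Hausdorff distance: worst-case boundary mismatch in either direction."""
    d_pg, d_gp = surface_distances(pred, target)
    return float(max(d_pg.max(), d_gp.max()))


def assd(pred: np.ndarray, target: np.ndarray) -> float:
    """Average symmetric surface distance: mean of all nearest-boundary distances."""
    d_pg, d_gp = surface_distances(pred, target)
    return float((d_pg.sum() + d_gp.sum()) / (len(d_pg) + len(d_gp)))


# Usage: a 4x4 ground-truth square and a prediction shifted one column to the right.
gt = np.zeros((8, 8), dtype=bool)
gt[2:6, 2:6] = True
pred = np.zeros((8, 8), dtype=bool)
pred[2:6, 3:7] = True
print(round(dice_score(pred, gt), 4))   # 0.75
print(hausdorff_distance(pred, gt))     # 1.0
print(assd(pred, gt))                   # 0.5
```

Dice rewards overlap, while HD and ASSD measure boundary agreement. This is why the abstract reports them together: a mask can have a high Dice yet a large HD if a few boundary pixels stray far from the reference.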


Topics

Journal Article
