CLT-MambaSeg: An integrated model of Convolution, Linear Transformer and Multiscale Mamba for medical image segmentation.
Affiliations (2)
- Department of Computer Science & Engineering, Indian Institute of Technology Indore, Indore 453552, India.
- Department of Computer Science & Engineering, Indian Institute of Technology Indore, Indore 453552, India.
Abstract
Recent advances in deep learning have significantly improved medical image segmentation. However, balancing feature localization, global context modeling, and computational efficiency remains a critical research challenge. Convolutional Neural Networks (CNNs) capture fine-grained local features effectively through hierarchical convolutions, but their limited receptive field makes it difficult for them to model long-range dependencies. Transformers overcome this limitation by using self-attention to capture global context. However, self-attention is computationally intensive, with a cost that grows quadratically with the number of tokens, and Transformers typically require large-scale data for effective training. The Mamba architecture has emerged as a promising alternative. It captures long-range dependencies at a computational cost that grows only linearly with sequence length, while achieving high segmentation accuracy. Building on these observations, we propose CLT-MambaSeg, a method that integrates Convolution, Linear Transformer, and Multiscale Mamba architectures to capture local features, model global context, and improve computational efficiency for medical image segmentation. CLT-MambaSeg uses a convolution-based Spatial Representation Extraction (SREx) module to capture intricate spatial relationships and dependencies. It also includes a Mamba Vision Linear Transformer (MVLTrans) module that captures multiscale context, spatial and sequential dependencies, and enhanced global context. In addition, to address the problem of limited training data, we propose a novel Memory-Guided Augmentation Generative Adversarial Network (MeGA-GAN) that generates realistic synthetic images to further improve segmentation performance. We conduct extensive experiments and ablation studies on five benchmark datasets: CVC-ClinicDB, Breast UltraSound Images (BUSI), PH2, and two datasets from the International Skin Imaging Collaboration (ISIC), ISIC-2016 and ISIC-2017.
Experimental results demonstrate the effectiveness of CLT-MambaSeg compared with other state-of-the-art methods.
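The abstract contrasts the quadratic cost of standard Transformer self-attention with the linear-cost sequence modeling offered by linear Transformers and Mamba. The NumPy sketch below illustrates these generic mechanisms under stated assumptions; it is not the authors' SREx or MVLTrans implementation. The `elu(x) + 1` feature map follows a common kernelized linear-attention formulation, and the state space recurrence uses fixed (non-selective) diagonal parameters as a simplification of Mamba, which makes its parameters input-dependent. All function names are hypothetical.

```python
import numpy as np

def softmax_attention(q, k, v):
    """Standard self-attention over N tokens.
    Builds an (N, N) score matrix, so cost grows as O(N^2 * d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ v

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized linear attention (assumed feature map: phi(x) = elu(x) + 1).
    Summarizes all keys/values once as phi(K)^T V, a (d, d_v) matrix,
    so cost grows as O(N * d * d_v) instead of O(N^2)."""
    # elu(x) + 1 equals x + 1 for x > 0 and exp(x) otherwise; always positive.
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))
    qf, kf = phi(q), phi(k)
    kv = kf.T @ v                 # (d, d_v) summary of every key/value pair
    z = qf @ kf.sum(axis=0)       # (N,) per-query normalizer
    return (qf @ kv) / (z[:, None] + eps)

def ssm_scan(x, a, b, c):
    """Diagonal linear state space recurrence over a 1-D sequence x:
        h_t = a * h_{t-1} + b * x_t,   y_t = c . h_t
    One pass over the sequence, so cost is O(N * state_dim).
    Simplification: a, b, c are fixed here, whereas Mamba makes them
    depend on the input (a "selective" scan)."""
    h = np.zeros_like(a, dtype=float)
    ys = np.empty(len(x))
    for t, xt in enumerate(x):
        h = a * h + b * xt
        ys[t] = c @ h
    return ys
```

For example, with `q`, `k`, `v` of shape `(N, d)`, both attention functions return an `(N, d)` output, but only `softmax_attention` materializes an `N x N` matrix. For high-resolution images flattened into long token sequences, this difference motivates replacing full self-attention with linear attention or state space scans.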