UNet with self-adaptive Mamba-like attention and causal-resonance learning for medical image segmentation.
Affiliations (6)
- Faculty of Computing and IT (FCIT), Sohar University, Sohar, 311, Oman.
- Department of Intelligent Systems, KTH Royal Institute of Technology, 10044, Stockholm, Sweden.
- College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, Saudi Arabia. [email protected].
- Auckland Bioengineering Institute, University of Auckland, P. O. Box 1010, Auckland, New Zealand.
- College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, Saudi Arabia.
- Department of Computer Science, Faculty of Science, Northern Border University, Arar, 73213, Kingdom of Saudi Arabia.
Abstract
Medical image segmentation plays an important role in many clinical applications, but existing deep learning models face a trade-off between efficiency and accuracy. Convolutional Neural Networks (CNNs) capture local detail well but miss global context, whereas Transformers model global context at a high computational cost. Recently, State Space Sequence Models (SSMs) have shown potential for capturing long-range dependencies with linear complexity, but their direct use in medical image segmentation remains limited by their incompatibility with 2D image structure and their autoregressive assumptions. To overcome these challenges, we propose SAMA-UNet, a novel U-shaped architecture that introduces two key innovations. First, the Self-Adaptive Mamba-like Aggregated Attention (SAMA) block adaptively integrates local and global features through dynamic attention weighting, enabling an efficient representation of complex anatomical patterns. Second, the Causal Resonance Multi-Scale Module (CR-MSM) improves encoder-decoder interaction by adjusting feature resolution and causal dependencies across scales, enhancing the semantic alignment between low- and high-level features. Extensive experiments on MRI, CT, and endoscopy datasets demonstrate that SAMA-UNet consistently outperforms CNN-, Transformer-, and Mamba-based methods, achieving 85.38% DSC and 87.82% NSD on BTCV, 92.16% and 96.54% on ACDC, 67.14% and 68.70% on EndoVis17, and 84.06% and 88.47% on ATLAS23, establishing new benchmarks across modalities. These results confirm that SAMA-UNet combines efficiency with accuracy, making it a promising solution for real-world clinical segmentation tasks. The source code is available at https://github.com/sqbqamar/SAMA-UNet.
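The abstract describes the SAMA block only at a high level: local and global features are fused through dynamic attention weighting. Below is a minimal PyTorch sketch of that fusion pattern, written under stated assumptions. The module name AdaptiveLocalGlobalBlock, the depthwise-convolution local branch, the standard multi-head attention used as a stand-in for the Mamba-like aggregated attention, and the sigmoid channel gate are all illustrative choices, not the paper's actual SAMA implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

class AdaptiveLocalGlobalBlock(nn.Module):
    """Illustrative local/global fusion with a learned gate.

    Hypothetical sketch: this is NOT the paper's SAMA block. Ordinary
    multi-head self-attention stands in for the Mamba-like aggregated
    attention, and the gating scheme is an assumption.
    """

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Local branch: depthwise 3x3 convolution for fine anatomical detail.
        self.local = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        # Global branch: self-attention over flattened spatial tokens.
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Dynamic per-channel gate deciding the local/global mix.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(dim, dim, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local = self.local(x)                               # (B, C, H, W)
        tokens = self.norm(x.flatten(2).transpose(1, 2))    # (B, HW, C)
        global_feat, _ = self.attn(tokens, tokens, tokens)  # (B, HW, C)
        global_feat = global_feat.transpose(1, 2).reshape(b, c, h, w)
        g = self.gate(x)                                    # (B, C, 1, 1)
        # Residual mix: gate g in [0, 1] trades off global vs. local features.
        return x + g * global_feat + (1 - g) * local

# Example usage on a dummy feature map:
# block = AdaptiveLocalGlobalBlock(dim=64)
# out = block(torch.randn(2, 64, 32, 32))   # out.shape == (2, 64, 32, 32)
```

A content-dependent gate of this kind lets the network emphasize global context for large structures and local detail near fine boundaries, which matches the intuition the abstract attributes to the SAMA block's dynamic weighting.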