An Ultra-Lightweight Cross-scale Attention Mamba Network for Accurate Skin Lesion Segmentation
Authors
Abstract
Accurate skin lesion segmentation is essential for the early detection and effective management of skin cancer. Existing deep learning architectures face a persistent trade-off between computation and precision: models either require heavy computation that limits practical deployment, or accept reduced accuracy when running in resource-limited environments. Recent approaches reach strong performance by integrating advanced modules, but the added complexity restricts practical use. We address this limitation with the Cross-scale Attention Mamba Network (UCA-MNet), a hierarchical encoder-decoder that integrates three components: a multi-scale feature encoder, a precision-focused fusion decoder, and a Feature Compression and Fusion Module (FCFM). The central component is the Multi-Scale Module (MSM), which uses a bidirectional Mamba (Bi-Mamba) to model long-range spatial dependencies with linear computational complexity, in contrast to the quadratic cost of transformer-based methods. Combined with Cross-Scale Attention (CSA) and Pyramidal Squeeze Attention (PSA), UCA-MNet captures both fine-grained local textures and global contextual information. Experiments on the ISIC-2017, ISIC-2018, and PH² benchmarks show that UCA-MNet reaches F1 scores of 0.8254, 0.8814, and 0.9202, and mIoU scores of 0.8180, 0.8515, and 0.8604, respectively, using only 0.33 million parameters and 4.3 GFLOPs. This makes UCA-MNet about 83× smaller than VM-UNET, a leading existing model, while achieving competitive accuracy. The findings indicate that effective medical image segmentation can be achieved with much lower computational requirements, supporting deployment in resource-constrained environments. The code is available at: https://github.com/razanharith/UCA-MNet.
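For readers less familiar with the F1 and mIoU figures reported above, the following is a minimal sketch of how such overlap scores are commonly computed for a single binary lesion mask. It assumes the standard definitions F1 = 2TP / (2TP + FP + FN) and IoU = TP / (TP + FP + FN); the abstract does not state the exact averaging protocol, and the function names here are hypothetical rather than taken from the UCA-MNet repository.

```python
from typing import Sequence, Tuple


def confusion_counts(pred: Sequence[int], target: Sequence[int]) -> Tuple[int, int, int]:
    """Count true positives, false positives, and false negatives
    for two flattened binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    tp = fp = fn = 0
    for p, t in zip(pred, target):
        if p and t:
            tp += 1          # lesion pixel predicted as lesion
        elif p and not t:
            fp += 1          # background pixel predicted as lesion
        elif t and not p:
            fn += 1          # lesion pixel missed by the prediction
    return tp, fp, fn


def f1_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """F1 (Dice) overlap; defined as 1.0 when both masks are empty (an assumption)."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


def iou_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Intersection over union; defined as 1.0 when both masks are empty (an assumption)."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = tp + fp + fn
    return 1.0 if denom == 0 else tp / denom


if __name__ == "__main__":
    # Toy 4-pixel masks: one pixel agrees on lesion, one FP, one FN.
    pred = [1, 1, 0, 0]
    target = [1, 0, 1, 0]
    print(f1_score(pred, target), iou_score(pred, target))
```

A dataset-level score would typically average these per-image values or pool the counts over all test images; which of the two the reported numbers use is not specified in this abstract.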