FreqConvMamba: Frequency-guided hierarchical hybrid SSM-CNN for medical image segmentation.
Authors
Affiliations (5)
- Institute of Big Data Science and Industry, Taiyuan, 030006, Shanxi, China; Key Laboratory of Evolutionary Science Intelligence of Shanxi Province, Taiyuan, 030006, Shanxi, China. Electronic address: [email protected].
Abstract
Accurate medical image segmentation is a fundamental prerequisite for quantitative disease diagnosis, treatment planning, and computational pathology. Convolutional neural networks (CNNs) and Mamba-based approaches have both shown promise in this domain, but each has distinct strengths and limitations. To address these limitations, we propose a hierarchical hybrid network named FreqConvMamba. Its core innovation is a frequency-guided feature extraction mechanism that models local and global information simultaneously in both the spatial and frequency domains. A Haar wavelet transform decomposes features into separate frequency components, which enhances the representation of fine details such as anatomical boundaries. We also introduce a Frequency Position Encoding (FPE) module that applies positional encoding along the frequency dimension, embedding spatial structural awareness while preserving the discriminative nature of frequency representations. This design mitigates the lack of spatial perception in frequency-domain features and significantly improves the efficiency of frequency-aware feature extraction. Experimental evaluations on five public datasets spanning three imaging modalities demonstrate that FreqConvMamba outperforms state-of-the-art methods across multiple performance metrics. Code is available at: https://github.com/ccode-Rookie/FreqConvMamba.
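The Haar wavelet decomposition mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (the linked repository is the authoritative source); it shows only the standard single-level 2D Haar transform, which splits a feature map into one low-frequency band and three detail bands. The function names `haar_dwt2` and `haar_idwt2` and the band names are hypothetical, chosen for illustration, and the sketch assumes even height and width with orthonormal scaling.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D Haar transform over the last two axes.

    Illustrative sketch only; assumes even height and width.
    Returns (low, col_diff, row_diff, diag_diff), each half the input size.
    """
    x = np.asarray(x, dtype=np.float64)
    h, w = x.shape[-2:]
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    # The four pixels of every non-overlapping 2x2 block.
    a = x[..., 0::2, 0::2]  # top-left
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right
    low = (a + b + c + d) / 2.0        # local average: low-frequency content
    col_diff = (a - b + c - d) / 2.0   # change between left and right columns
    row_diff = (a + b - c - d) / 2.0   # change between top and bottom rows
    diag_diff = (a - b - c + d) / 2.0  # diagonal change
    return low, col_diff, row_diff, diag_diff


def haar_idwt2(low, col_diff, row_diff, diag_diff):
    """Inverse of haar_dwt2: rebuilds the full-resolution array."""
    h, w = low.shape[-2:]
    out = np.empty(low.shape[:-2] + (2 * h, 2 * w), dtype=np.float64)
    out[..., 0::2, 0::2] = (low + col_diff + row_diff + diag_diff) / 2.0
    out[..., 0::2, 1::2] = (low - col_diff + row_diff - diag_diff) / 2.0
    out[..., 1::2, 0::2] = (low + col_diff - row_diff - diag_diff) / 2.0
    out[..., 1::2, 1::2] = (low - col_diff - row_diff + diag_diff) / 2.0
    return out


if __name__ == "__main__":
    # Toy "feature map" with a vertical boundary between columns 0 and 1.
    img = np.zeros((4, 4))
    img[:, 1:] = 1.0
    low, col_diff, row_diff, diag_diff = haar_dwt2(img)
    print("low band:\n", low)
    print("column-difference band:\n", col_diff)
    print("reconstruction exact:",
          np.allclose(haar_idwt2(low, col_diff, row_diff, diag_diff), img))
```

In this toy input the boundary shows up only in the column-difference band, while the low band keeps a coarse average. That separation is the property the abstract relies on when it says the decomposition enhances fine details such as anatomical boundaries. With the orthonormal scaling used here the transform is lossless and preserves signal energy, so no information is discarded by the split itself.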