HMC-Transducer: hierarchical Mamba-CNN transducer for robust liver tumor segmentation.
Authors
Affiliations (11)
- Department of Hepatopancreatobiliary Surgery, The First Affiliated Hospital of Ningbo University, Ningbo, Zhejiang, China.
- Department of Oncology, The Second Affiliated Hospital of Zhejiang Chinese Medical University, Hangzhou, Zhejiang, China.
- Zhejiang Key Laboratory of Blood-Stasis-Toxin Syndrome, Zhejiang Chinese Medical University, Hangzhou, Zhejiang Province, China.
- Jinling Hospital, Affiliated Hospital of Medical School, Nanjing University, Nanjing, Jiangsu Province, China.
- School of Basic Medical Sciences, Zhejiang Chinese Medical University, Hangzhou, Zhejiang Province, China.
- Traditional Chinese Medicine "Preventing Disease" Wisdom Health Project Research Center of Zhejiang, Hangzhou, Zhejiang Province, China.
- Emergency Department, The First Affiliated Hospital of Ningbo University, Ningbo, Zhejiang, China.
- Department of Hepatobiliary Surgery, The Second Hospital of Hebei Medical University, Shijiazhuang, Hebei, China.
- Department of Oncology, Tongde Hospital of Zhejiang Province, Hangzhou, Zhejiang Province, China.
- Zhejiang Key Laboratory of Disease-Syndrome Integration for Cancer Prevention and Treatment, Tongde Hospital of Zhejiang Province, Hangzhou, Zhejiang Province, China.
- Emergency Department, The First Affiliated Hospital of Ningbo University, Ningbo, Zhejiang, China.
Abstract
Accurate segmentation of liver tumors from computed tomography (CT) scans is critical for clinical diagnosis and treatment planning, yet it remains challenging because tumors vary widely in shape and size and often have indistinct boundaries. Existing deep learning models, dominated by convolutional neural networks (CNNs) and Transformers, face a fundamental trade-off: CNNs excel at capturing local features but are limited in modeling long-range spatial dependencies, while Transformers capture global context at a cost that grows quadratically with sequence length, which is computationally prohibitive for high-resolution 3D volumes. To overcome this, we propose the Hierarchical Mamba-CNN Transducer (HMC-Transducer), a novel and efficient hybrid architecture. Our model combines the strengths of CNNs with the linear-complexity long-range modeling capabilities of Mamba, a recent state space model. The core innovations are twofold: (1) a direction-aware 3D Mamba (DA3D-Mamba) block, specifically designed to process volumetric data by preserving spatial topology along all three axes, and (2) a Mamba-CNN Transducer block with a gated fusion mechanism that learns to adaptively weigh and combine local and global features at each level of the network hierarchy. Extensive experiments on multiple public benchmarks, including LiTS17, MSD-liver, and KiTS21, demonstrate that the HMC-Transducer not only achieves state-of-the-art segmentation accuracy but also exhibits superior generalization and computational efficiency compared to leading CNN- and Transformer-based methods. Our work represents a significant step towards generalizable and practical segmentation models for clinical applications.
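The abstract does not give the exact form of the gated fusion mechanism, so the following is only a minimal sketch of one common formulation, not the authors' implementation. It assumes a sigmoid gate computed from the concatenated local (CNN) and global (Mamba) feature maps, which then blends the two as a convex combination per channel and voxel. The function name `gated_fusion`, the tensor shapes, and the random placeholder features are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_fusion(local_feat, global_feat, weight, bias):
    """Blend two (C, D, H, W) feature maps with a per-channel, per-voxel gate.

    Assumed formulation (not taken from the paper):
        gate  = sigmoid(W @ concat(local, global) + b)
        fused = gate * local + (1 - gate) * global
    weight has shape (C, 2C) and acts like a 1x1x1 convolution; bias has shape (C,).
    """
    stacked = np.concatenate([local_feat, global_feat], axis=0)  # (2C, D, H, W)
    logits = np.einsum("oc,cdhw->odhw", weight, stacked)
    logits = logits + bias[:, None, None, None]
    gate = sigmoid(logits)
    fused = gate * local_feat + (1.0 - gate) * global_feat
    return fused, gate


# Placeholder features standing in for the CNN and Mamba branch outputs.
rng = np.random.default_rng(0)
C, D, H, W = 4, 8, 16, 16
local_feat = rng.standard_normal((C, D, H, W))
global_feat = rng.standard_normal((C, D, H, W))
weight = rng.standard_normal((C, 2 * C)) * 0.1  # would be learned in practice
bias = np.zeros(C)

fused, gate = gated_fusion(local_feat, global_feat, weight, bias)
print(fused.shape)
```

Because the gate lies strictly between 0 and 1, each fused value falls between the corresponding local and global values; in a trained network the gate weights would be learned separately at each level of the hierarchy.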