A Dual-Branch Lightweight Network for Multimodal Image Fusion with Mamba and INN.
Authors
Affiliations (2)
- Xinjiang Laboratory of Phase Transitions and Microstructures in Condensed Matter Physics, Yili Normal University, Yining 835000, China.
- School of Electronic Engineering, Yili Normal University, Yining 835000, China.
Abstract
Multimodal image fusion aims to integrate complementary information from heterogeneous imaging modalities into a single informative image. However, many deep learning-based fusion methods rely on complex feature extractors, leading to high computational cost and limited suitability for real-time deployment on resource-constrained devices. To address this issue, this paper proposes a lightweight Mamba-INN dual-branch network for efficient multimodal image fusion. The proposed model decouples global structure modeling from local detail preservation. A simplified Mamba-inspired branch captures long-range contextual dependencies, while a lightweight invertible neural network (INN) branch preserves high-frequency texture and edge information during the forward feature transformation through reversible feature partitioning, coupled transformations, and exponential scale modulation, thereby reducing the loss of detail caused by feature compression. Compact shallow feature refinement, module reuse, low-dimensional channel design, and a streamlined decoder further reduce redundant computation. Experiments on infrared-visible and medical image fusion benchmarks, including the MSRS, TNO, RoadScene, MRI-CT, MRI-PET, and MRI-SPECT datasets, demonstrate that the proposed method achieves competitive fusion quality with low model complexity. On metrics including MI, VIF, Qabf, and SSIM, its performance is comparable to or better than that of methods such as CDDFuse, U2Fusion, CNN, and SDNet, while the model contains only 0.24 million parameters and requires 24.04 GFLOPs at an input resolution of 256 × 256.
Compared to CDDFuse, our method significantly reduces model complexity, enhancing the potential for lightweight deployment while maintaining fusion quality.
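The abstract names reversible feature partitioning, coupled transformations, and exponential scale modulation as the ingredients of the INN branch but does not give its exact layout. As an illustration only, the following is a minimal sketch of a generic affine coupling step that combines these three ingredients and can be inverted exactly. The function names and the toy `_scale_and_shift` sub-network are hypothetical stand-ins, not the authors' implementation, and the sketch operates on a flat, even-length list of feature values rather than on image tensors.

```python
import math


def _scale_and_shift(cond):
    """Hypothetical stand-in for the learned sub-networks s(.) and t(.).

    In a real INN these would be small convolutional blocks; fixed
    element-wise functions are used here so the sketch stays self-contained.
    """
    s = [math.tanh(0.5 * c) for c in cond]  # bounded log-scale
    t = [0.1 * c for c in cond]             # additive shift
    return s, t


def coupling_forward(x):
    """One affine coupling step on an even-length feature vector."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]              # reversible feature partitioning
    s, t = _scale_and_shift(x1)              # coupling: x2 is conditioned on x1
    # Exponential scale modulation: exp(s) is always positive, so the
    # transformation can be undone without loss of information.
    y2 = [b * math.exp(si) + ti for b, si, ti in zip(x2, s, t)]
    return x1 + y2                           # x1 passes through unchanged


def coupling_inverse(y):
    """Exact inverse of coupling_forward."""
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    s, t = _scale_and_shift(y1)              # y1 equals x1, so s and t match
    x2 = [(b - ti) * math.exp(-si) for b, si, ti in zip(y2, s, t)]
    return y1 + x2


features = [0.3, -1.2, 2.0, 0.7]
transformed = coupling_forward(features)
recovered = coupling_inverse(transformed)
```

Because the first half of the features passes through unchanged, the inverse can recompute the same scale and shift and undo the transformation, so `recovered` matches `features` up to floating-point rounding. This is the sense in which such a transformation is information-preserving.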