3DWaFusion: Three-Dimensional Multiscale Wavelet Convolutional Neural Network for Multimodal Medical Image Fusion.
Authors
Affiliations (2)
- MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China.
- Jinling Clinical Medical College, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China.
Abstract
Multimodal image fusion is a promising technology that combines information from different medical imaging sensors, offering structured insights for disease diagnosis and treatment. However, existing 2D-centric fusion methods fail to capture 3D spatial continuity, and conventional wavelet-based approaches lack adaptability to diverse lesion regions and suffer from background artifacts. To address these issues, we propose 3DWaFusion, a 3D multiscale wavelet convolutional neural network for multimodal medical image fusion. Specifically, a 3D discrete wavelet transform (3D DWT) is introduced to decompose input volumes into multiple frequency bands, isolating anatomical structures and lesion details while reducing 3D spatial redundancy. We embed the hierarchical multi-frequency bands into a Global and Local Feature Calibration (GLFC) module that adaptively enhances single-modal features by fusing global contextual information with local details. Furthermore, a pyramid group-wise multiscale feature interaction is proposed to capture complementary features across different spatial scales. Finally, a voxel-wise weighted averaging strategy reconstructs the fused image by adaptively assigning a contribution to each modality at every spatial position, effectively eliminating artifacts and improving the visual fidelity of the result. Extensive experiments on the BraTS2020 and Hecktor datasets demonstrate that the proposed method outperforms state-of-the-art (SOTA) fusion methods in both subjective visual quality and objective metrics. Moreover, downstream segmentation validation confirms that fused images produced by our method significantly improve tumor segmentation accuracy. The source code and pre-trained models will be made publicly available.
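As an illustration only, the two signal-level operations named in the abstract (a 3D DWT decomposition and voxel-wise weighted averaging) can be sketched in NumPy. This is not the authors' implementation: the Haar wavelet, the function names `haar_dwt3d` and `voxelwise_fuse`, and the per-voxel score maps are all assumptions made for the example. In the proposed network the weights would presumably be predicted by learned layers; here they are derived from hypothetical score volumes through a two-way softmax.

```python
import numpy as np

def haar_dwt3d(vol):
    """Single-level orthonormal 3D Haar DWT (assumed wavelet, for illustration).

    Returns a dict of 8 sub-bands keyed 'LLL' ... 'HHH', where the i-th letter
    says whether axis i was low-pass (L) or high-pass (H) filtered.
    """
    if vol.ndim != 3 or any(s % 2 for s in vol.shape):
        raise ValueError("expected a 3D volume with even-sized dimensions")
    bands = {"": vol.astype(np.float64)}
    for axis in range(3):
        nxt = {}
        for key, arr in bands.items():
            # Split into even- and odd-indexed samples along the current axis.
            even = np.take(arr, np.arange(0, arr.shape[axis], 2), axis=axis)
            odd = np.take(arr, np.arange(1, arr.shape[axis], 2), axis=axis)
            nxt[key + "L"] = (even + odd) / np.sqrt(2.0)  # approximation
            nxt[key + "H"] = (even - odd) / np.sqrt(2.0)  # detail
        bands = nxt
    return bands

def voxelwise_fuse(vol_a, vol_b, score_a, score_b):
    """Fuse two co-registered volumes with per-voxel softmax weights.

    score_a and score_b are hypothetical per-voxel importance maps with the
    same shape as the volumes; the weights sum to 1 at every voxel.
    """
    m = np.maximum(score_a, score_b)      # subtract the max for numerical stability
    ea = np.exp(score_a - m)
    eb = np.exp(score_b - m)
    w_a = ea / (ea + eb)
    return w_a * vol_a + (1.0 - w_a) * vol_b

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mri = rng.random((8, 8, 8))           # stand-in for one modality
    pet = rng.random((8, 8, 8))           # stand-in for a second modality
    bands = haar_dwt3d(mri)
    print(sorted(bands), bands["LLL"].shape)
    # Use local absolute intensity as a toy score map (an assumption).
    fused = voxelwise_fuse(mri, pet, np.abs(mri), np.abs(pet))
    print(fused.shape)
```

Because the Haar filters used here are orthonormal, the eight sub-bands together preserve the energy of the input volume, and each sub-band has half the resolution along every axis, which is one way to read the abstract's remark about reducing 3D spatial redundancy.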