SpectFusion: Cross-modal Spectrum-aware Attention Network for Unsupervised Multimodal Medical Image Fusion
Authors
Abstract
Medical image fusion aims to synthesize relevant and complementary information from different modalities, thereby enhancing clinical diagnosis. Current deep learning-based fusion approaches, particularly Transformer-based architectures, have achieved remarkable results owing to their strong capacity for modeling long-range dependencies. However, their window-based local attention still limits the capture of sufficient global information. Moreover, existing fusion schemes predominantly focus on spatial features and rarely consider spectral features, which degrades fusion performance. To address these challenges, we propose a new unsupervised cross-modal spectrum-aware fusion framework, named SpectFusion, for medical image fusion. Specifically, we devise a spatial-spectrum hybrid block that extracts fine-grained local features via a gradient retention strategy in the spatial domain and captures global features with an image-wide receptive field through Fourier convolution in the frequency domain. Furthermore, we develop a novel cross-modal spectrum-aware attention mechanism to facilitate spatial-spectrum information interaction during fusion: it dynamically guides the retention of relevant spectral components while integrating multimodal spatial features. Additionally, to achieve more precise alignment of image pairs, we incorporate a refined registration module that corrects minor local deviations. We also define corresponding frequency- and spatial-domain losses to jointly constrain the proposed SpectFusion. By leveraging spatial-spectrum information interaction, fine-grained fusion is realized adaptively. Extensive experiments, including clinical brain tumor image fusion, demonstrate that SpectFusion outperforms other state-of-the-art methods both qualitatively and quantitatively. We also show that SpectFusion boosts performance on downstream tasks such as multimodal medical image segmentation.
The code is available at https://github.com/PlumW/SpectFusion.
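To illustrate the key idea behind the frequency-domain branch mentioned in the abstract, the sketch below shows a generic Fourier (spectral) convolution: a single pointwise multiplication in the FFT domain gives every output pixel an image-wide receptive field. This is a minimal NumPy illustration of the general technique, not the paper's actual layer; the function name `fourier_conv` and the random weights are placeholders.

```python
import numpy as np

def fourier_conv(feat, weight_real, weight_imag):
    """Pointwise filtering in the frequency domain.

    Each frequency bin is scaled by a (learnable) complex weight, so the
    resulting spatial-domain output mixes information from the whole image.
    """
    spec = np.fft.rfft2(feat)                       # (H, W//2 + 1) complex spectrum
    spec = spec * (weight_real + 1j * weight_imag)  # per-frequency filter (placeholder weights)
    return np.fft.irfft2(spec, s=feat.shape)        # back to the spatial domain

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32))        # a single-channel feature map
wr = rng.standard_normal((32, 17))       # real part of the filter
wi = rng.standard_normal((32, 17))       # imaginary part of the filter
y = fourier_conv(x, wr, wi)
print(y.shape)  # (32, 32)
```

In a trainable layer the weights would be parameters updated by backpropagation (e.g. in PyTorch via `torch.fft`); the hybrid block in the abstract combines such a spectral path with a conventional spatial path.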