RMT-match: an unsupervised 3D medical image registration network based on RMT and wavelet convolution.
Authors
Affiliations (2)
- Business School, University of Shanghai for Science and Technology, Shanghai 200093, People's Republic of China.
- School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, People's Republic of China.
Abstract
Deformable image registration plays a crucial role in medical image analysis. Although medical image registration (MIR) models based on the vision transformer (ViT) can establish long-range dependencies between patches, their core component, self-attention, lacks important spatial priors. Meanwhile, the U-Net-style down-sampling used in MIR models loses spatial information, and traditional convolutional down-sampling is more sensitive to high-frequency information. Although high-frequency information such as image edges and details is more important for 3D MIR tasks, low-frequency information such as the overall image contour is also crucial during multi-scale sampling. To address these issues, this paper proposes a novel deformable MIR framework called RMT-Match. The framework extends the Retentive Networks Meet Vision Transformers (RMT) structure to 3D. It refines the spatial attenuation matrix in RMT based on the Manhattan distance to enhance the self-attention mechanism, thereby introducing 3D spatial prior information, and it adopts RMT's decomposed form of attention to alleviate the burden of global modeling. In addition, this paper proposes a 3D wavelet convolutional down-sampling module that produces multi-frequency responses to the input data, compensating for the deficiencies of ordinary convolutional down-sampling. In experiments on the IXI and OASIS datasets, RMT-Match proves to be a promising approach to MIR. Compared with the traditional CNN-based VoxelMorph method, RMT-Match achieves performance improvements of 5.3% on the IXI dataset and 2.7% on the OASIS dataset. Furthermore, compared with the state-of-the-art transformer-based method TransMatch, it achieves performance gains of 0.1% and 0.7% while reducing the parameter count by 40%, thereby balancing model performance with computational efficiency.
These results further confirm the significant advantages and potential of this method for MIR tasks.
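The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and every name in it (`decay_1d`, `manhattan_decay_3d`, `haar_downsample_3d`, `gamma`) is hypothetical. It assumes the spatial attenuation takes the form gamma raised to the Manhattan distance between voxel positions, as in the 2D RMT formulation, and it uses a fixed single-level Haar transform as a stand-in for the wavelet down-sampling, since the abstract does not state which wavelet or which learnable layers the module uses.

```python
import numpy as np

def decay_1d(n, gamma):
    """1D decay mask: gamma ** |i - j| for positions along one axis."""
    idx = np.arange(n)
    return gamma ** np.abs(idx[:, None] - idx[None, :])

def manhattan_decay_3d(shape, gamma):
    """Full mask over all voxel pairs: gamma ** (|dz| + |dy| + |dx|)."""
    d, h, w = shape
    grid = np.meshgrid(np.arange(d), np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack(grid, axis=-1).reshape(-1, 3)  # (N, 3), C-order flattening
    dist = np.abs(coords[:, None, :] - coords[None, :, :]).sum(axis=-1)
    return gamma ** dist  # (N, N), multiplied into attention scores

def haar_downsample_3d(vol):
    """One level of an orthonormal 3D Haar transform.

    Input (D, H, W) with even sizes; output (8, D/2, H/2, W/2).
    Band 0 is the low-frequency band; bands 1..7 hold high-frequency detail.
    """
    bands = [vol]
    for axis in range(3):
        split = []
        for b in bands:
            even = np.take(b, np.arange(0, b.shape[axis], 2), axis=axis)
            odd = np.take(b, np.arange(1, b.shape[axis], 2), axis=axis)
            split.append((even + odd) / np.sqrt(2))  # low-pass half
            split.append((even - odd) / np.sqrt(2))  # high-pass half
        bands = split
    return np.stack(bands)

# Usage with hypothetical sizes and decay rate.
shape, gamma = (2, 3, 4), 0.9
mask = manhattan_decay_3d(shape, gamma)

# The Manhattan mask factorizes into per-axis masks, which is what makes
# an axis-wise (decomposed) attention cheaper than the full N x N form.
factored = np.kron(np.kron(decay_1d(2, gamma), decay_1d(3, gamma)),
                   decay_1d(4, gamma))

volume = np.random.default_rng(0).normal(size=(4, 4, 4))
subbands = haar_downsample_3d(volume)
```

Because the Manhattan distance is a sum over axes, the mask is a product of three per-axis masks, so the prior can be applied one axis at a time. The Haar step halves each spatial dimension while keeping both low- and high-frequency content in separate channels, whereas a plain strided convolution produces a single response.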