LADDA: Latent Diffusion-based Domain-adaptive Feature Disentangling for Unsupervised Multi-modal Medical Image Registration.
Authors
Abstract
Deformable image registration (DIR) is critical for accurate clinical diagnosis and effective treatment planning. However, patient movement, significant intensity differences, and large breathing deformations hinder accurate anatomical alignment in multi-modal image registration. These factors exacerbate the entanglement of anatomical and modality-specific style information, thereby severely limiting the performance of multi-modal registration. To address this, we propose a novel LAtent Diffusion-based Domain-Adaptive feature disentangling (LADDA) framework for unsupervised multi-modal medical image registration, which explicitly addresses the representation disentanglement. First, LADDA extracts reliable anatomical priors from the Latent Diffusion Model (LDM), facilitating downstream content-style disentangled learning. A Domain-Adaptive Feature Disentangling (DAFD) module is proposed to promote anatomical structure alignment further. This module disentangles image features into content and style information, boosting the network to focus on cross-modal content information. Next, a Neighborhood-Preserving Hashing (NPH) is constructed to further perceive and integrate hierarchical content information through local neighbourhood encoding, thereby maintaining cross-modal structural consistency. Furthermore, a Unilateral-Query-Frozen Attention (UQFA) module is proposed to enhance the coupling between upstream prior and downstream content information. The feature interaction within intra-domain consistent structures improves the fine recovery of detailed textures. The proposed framework is extensively evaluated on large-scale multi-center datasets, demonstrating superior performance across diverse clinical scenarios and strong generalization on out-of-distribution (OOD) data.