Cross-Modality Image Registration via Generating Aligned Images Using a Reference-Augmented Framework
Authors
Abstract
Aligning a pair of cross-modality images (e.g., MR-CT, CBCT-CT) is important, yet conventional approaches, including registration and image-to-image (I2I) translation methods, often fall short: registration can struggle to align fine structures across modalities, while translation risks hallucinating content absent from the input. To overcome these challenges, we introduce the "Register by Generation" (RbG) framework, a novel 2D deep learning approach designed to generate images that are structurally well aligned with the fixed image while preserving the detailed intensity and contrast of the moving image, which we refer to as the reference image. Our approach operates in two sequential stages. First, we employ a novel semi-global reference-augmented image synthesis network incorporating Patch Adaptive Instance Normalization (PAdaIN). This network leverages a down-sampled reference image to guide locally adaptive synthesis, generating a more accurately aligned image with a reduced risk of hallucination. Second, we introduce a detail-refining reference-augmented network featuring a Deformation-Aware Cross-Attention (DACA) block, which recovers finer details and textures that may be missing after the first stage. The DACA block transfers corresponding relevant features from the reference image, effectively performing a "copy-and-paste" operation within the latent feature space. In addition, we propose a novel combination of loss functions that enables self-supervised training on misaligned datasets, eliminating the need for pre-aligned data. We rigorously evaluate our method on multiple misaligned datasets using metrics focused on structural alignment and distributional consistency, demonstrating consistently superior performance. We further test its robustness by simulating intentional misalignments in a well-aligned dataset, and experiments on a case study and downstream segmentation tasks highlight the broad applicability of our approach.
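To make the PAdaIN idea concrete, the following is a minimal sketch of one plausible reading of patch-wise adaptive instance normalization: each feature patch of the image being synthesized is normalized and then re-styled with the statistics of the corresponding patch of the down-sampled reference. The patch size, feature shapes, and statistic computation below are assumptions for illustration, not the paper's exact implementation.

```python
# Illustrative sketch only: patch-wise AdaIN, where per-patch scale/shift
# statistics come from the corresponding patch of the (down-sampled)
# reference features. Patch size and shapes are assumptions.
import torch
import torch.nn.functional as F

def padain(content: torch.Tensor, reference: torch.Tensor,
           patch: int = 8, eps: float = 1e-5) -> torch.Tensor:
    """content:   (B, C, H, W) features being synthesized
    reference: (B, C, h, w) features from the down-sampled reference image
    H and W are assumed divisible by `patch`."""
    b, c, h, w = content.shape
    # Bring the reference to the content's resolution so patches correspond.
    ref = F.interpolate(reference, size=(h, w), mode="bilinear",
                        align_corners=False)
    # Unfold both maps into non-overlapping patches: (B, C, nH, nW, p, p).
    cp = content.unfold(2, patch, patch).unfold(3, patch, patch)
    rp = ref.unfold(2, patch, patch).unfold(3, patch, patch)
    # Per-patch mean/std over each patch's spatial dims.
    c_mu = cp.mean(dim=(-2, -1), keepdim=True)
    c_sd = cp.std(dim=(-2, -1), keepdim=True) + eps
    r_mu = rp.mean(dim=(-2, -1), keepdim=True)
    r_sd = rp.std(dim=(-2, -1), keepdim=True) + eps
    # Normalize each content patch, then apply the reference patch's
    # statistics (a local "AdaIN").
    out = (cp - c_mu) / c_sd * r_sd + r_mu
    # Fold patches back to (B, C, H, W).
    return out.permute(0, 1, 2, 4, 3, 5).reshape(b, c, h, w)
```

Restricting the style statistics to local patches, rather than the whole image as in standard AdaIN, is what lets the reference guide synthesis adaptively per region.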
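Similarly, the "copy-and-paste in latent space" behavior of the DACA block can be pictured as cross-attention from the features being refined (queries) to the reference features (keys/values). How the block is actually made deformation-aware is not specified here; the learned offset embedding below, and the token shapes, are hypothetical placeholders.

```python
# Illustrative sketch only: generic cross-attention from generated tokens
# to reference tokens. The deformation-awareness is approximated by a
# hypothetical embedding of a per-token deformation field; the real DACA
# block may differ substantially.
import torch
import torch.nn as nn

class DeformationAwareCrossAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        # Hypothetical: embed deformation offsets so attention can account
        # for the misalignment between the two images.
        self.offset_embed = nn.Sequential(nn.Linear(2, dim), nn.GELU(),
                                          nn.Linear(dim, dim))

    def forward(self, gen: torch.Tensor, ref: torch.Tensor,
                flow: torch.Tensor) -> torch.Tensor:
        """gen:  (B, N, C) tokens from the image being refined (queries)
        ref:  (B, M, C) tokens from the reference image (keys/values)
        flow: (B, M, 2) per-token deformation offsets (assumed input)"""
        q = self.norm_q(gen)
        kv = self.norm_kv(ref) + self.offset_embed(flow)
        # Each generated token attends to, and copies features from, the
        # most relevant reference tokens.
        out, _ = self.attn(q, kv, kv, need_weights=False)
        return gen + out  # residual connection
```

Because the copied content comes from attention over real reference features rather than being synthesized from scratch, this refinement step restores texture while limiting hallucination.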