MoNetV2: Enhanced Motion Network for Freehand 3-D Ultrasound Reconstruction.
Authors
Abstract
Three-dimensional ultrasound (US) provides sonographers with the spatial relationships of anatomical structures and plays a crucial role in clinical diagnosis. Recently, deep-learning-based freehand 3-D US has made significant advances. It reconstructs volumes by estimating the transformations between images, without external tracking. However, image-only reconstruction struggles to reduce cumulative drift and to further improve reconstruction accuracy, particularly for complex motion trajectories. To address this, we propose an enhanced motion network (MoNetV2) that improves the accuracy and generalizability of reconstruction under diverse scanning velocities and tactics. First, we propose a sensor-based temporal and multibranch structure (TMS) that fuses image and motion information from a velocity perspective to improve on image-only reconstruction accuracy. Second, we devise an online multilevel consistency constraint (MCC) that exploits the inherent consistency of scans to handle various scanning velocities and tactics. This constraint uses scan-level velocity consistency (SVC), path-level appearance consistency (PAC), and patch-level motion consistency (PMC) to supervise interframe transformation estimation. Third, we distill an online multimodal self-supervised strategy (MSS) that leverages the correlation between network estimation and motion information to further reduce cumulative errors. Extensive experiments on three large datasets demonstrate that MoNetV2 surpasses existing methods in both reconstruction quality and generalizability.
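To make the notion of cumulative drift concrete, the following minimal sketch chains interframe transformations into poses relative to the first frame and shows how a small per-frame estimation bias grows along the scan. It is an illustration only, not the MoNetV2 implementation: the helper names (`translation`, `compose`, `accumulate`), the 0.5 mm elevational step, and the 0.01 mm per-frame bias are all assumptions chosen for this example, and rotations are omitted for brevity.

```python
def translation(tx, ty, tz):
    """Return a 4x4 homogeneous transform (nested lists) that only translates."""
    return [
        [1.0, 0.0, 0.0, tx],
        [0.0, 1.0, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def compose(a, b):
    """Multiply two 4x4 transforms: apply b in the frame reached by a."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]


def accumulate(interframe):
    """Chain interframe transforms into poses relative to the first frame."""
    poses = [translation(0.0, 0.0, 0.0)]  # first frame sits at the identity
    for step in interframe:
        poses.append(compose(poses[-1], step))
    return poses


# Hypothetical scan: 100 interframe steps of 0.5 mm along the elevational axis.
n_steps = 100
true_steps = [translation(0.0, 0.0, 0.5) for _ in range(n_steps)]

# Assumed estimator that overshoots every step by 0.01 mm.
est_steps = [translation(0.0, 0.0, 0.5 + 0.01) for _ in range(n_steps)]

true_poses = accumulate(true_steps)
est_poses = accumulate(est_steps)

# Drift at the last frame: distance between estimated and true positions.
final_drift = abs(est_poses[-1][2][3] - true_poses[-1][2][3])
print(f"final drift after {n_steps} steps: {final_drift:.3f} mm")
```

Because each pose is the product of all preceding interframe estimates, a bias that is negligible for a single frame pair adds up to roughly 1 mm by the end of this toy scan, which is why constraints that act beyond individual frame pairs are of interest.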