OnUVS: An Online Motion Transfer Framework with Content-Texture Decoupling for High-Fidelity Ultrasound Video Synthesis
Authors
Abstract
Ultrasound (US) imaging plays a crucial role in diagnosing heart and pelvic diseases, where sonographers evaluate dynamic motion and structure. However, the scarcity of US videos for rare cases limits training opportunities for novice sonographers and deep learning models, hindering detection rates and clinical diagnostic applications. US video synthesis is a promising solution to this issue. Nevertheless, accurately imitating the intricate motion of the anatomy while preserving image fidelity presents a significant challenge. In this work, we propose OnUVS, a novel online feature-decoupling framework for high-fidelity US video synthesis. First, to simulate realistic motion, we incorporate keypoints into anatomical learning through a weakly supervised training approach, which enhances motion representation and minimizes the need for fully annotated data. Second, we implement a dual-decoder generator that effectively balances the content and textural features of generated frames, significantly enhancing the image fidelity of US videos. Third, a multi-scale discriminator further refines sharpness and fine details, ensuring high-fidelity video synthesis. Fourth, an online learning strategy smooths inter-frame coherence by constraining keypoint trajectories during inference. Validation on echocardiographic and pelvic-floor US datasets demonstrates that OnUVS outperforms existing methods, achieving a 22.08% improvement in motion consistency (FVD) and 25.04% in image fidelity (FID). To facilitate reproducibility, we publicly release the code of OnUVS at: https://github.com/LucyChen159/OnUVS.