EchoFM: Foundation Model for Generalizable Echocardiogram Analysis.
Authors
Abstract
Echocardiography is the first-line noninvasive cardiac imaging modality, providing rich spatio-temporal information on cardiac anatomy and physiology. Recently, foundation model trained on extensive and diverse datasets has shown strong performance in various downstream tasks. However, translating foundation models into the medical imaging domain remains challenging due to domain differences between medical and natural images, the lack of diverse patient and disease datasets. In this paper, we introduce EchoFM, a general-purpose vision foundation model for echocardiography trained on a large-scale dataset of over 20 million echocardiographic images from 6,500 patients. To enable effective learning of rich spatio-temporal representations from periodic videos, we propose a novel self-supervised learning framework based on a masked autoencoder with a spatio-temporal consistent masking strategy and periodic-driven contrastive learning. The learned cardiac representations can be readily adapted and fine-tuned for a wide range of downstream tasks, serving as a strong and flexible backbone model. We validate EchoFM through experiments across key downstream tasks in the clinical echocardiography workflow, leveraging public and multi-center internal datasets. EchoFM consistently outperforms SOTA methods, demonstrating superior generalization capabilities and flexibility. The code and checkpoints are available at: https://github.com/SekeunKim/EchoFM.git.