Comprehensive echocardiogram evaluation with view primed vision language AI.
Authors
Affiliations (10)
- Department of Cardiology, Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA.
- Department of Bioengineering, University of California Los Angeles, Los Angeles, CA, USA.
- Department of Emergency Medicine, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung, Taiwan.
- Division of Cardiology, Department of Medicine, Stanford University, Palo Alto, CA, USA.
- Department of Medicine, University of California, San Francisco, CA; Division of Cardiology, San Francisco Veterans Affairs Medical Center, San Francisco, CA, USA.
- Division of Cardiology, Department of Medicine, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung, Taiwan.
- Biomedical Imaging Research Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA.
- Department of Computer Science, Stanford University, Stanford, CA, USA.
- Department of Cardiology, Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA.
- Division of Research, Kaiser Permanente Northern California, Pleasanton, CA, USA.
Abstract
Echocardiography is the most widely used cardiac imaging modality, capturing ultrasound video data to assess cardiac structure and function<sup>1</sup>. Artificial intelligence (AI) in echocardiography has the potential to streamline manual tasks and improve reproducibility and precision<sup>2</sup>. However, most echocardiography AI models are single-view, single-task systems that do not synthesize complementary information from the multiple views captured during a full exam<sup>3,4</sup>, which limits their performance and scope of application. To address this problem, we introduce EchoPrime, a multi-view, view-informed, video-based vision-language foundation model trained on over 12 million video-report pairs. EchoPrime uses contrastive learning to train a unified embedding model for all standard views in a comprehensive echocardiogram study, with representation of both rare and common diseases and diagnoses. EchoPrime then uses view classification and a view-informed anatomic attention module to weight video-specific embeddings, accurately mapping the relationship between echocardiographic views and anatomical structures. With retrieval-augmented interpretation, EchoPrime integrates information from all echocardiogram videos in a comprehensive study and performs holistic clinical interpretation. In datasets from five international independent healthcare systems, EchoPrime achieves state-of-the-art performance on 23 diverse benchmarks of cardiac form and function, surpassing the performance of both task-specific approaches and prior foundation models. Following rigorous clinical evaluation, EchoPrime can assist physicians in the automated preliminary assessment of comprehensive echocardiography.
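The combination of view-informed weighting and retrieval described in the abstract can be illustrated with a toy sketch. This is not EchoPrime's implementation: the two-dimensional embeddings, the view labels (`A4C`, `PLAX`), the per-view weights, and the candidate report texts are all hypothetical placeholders, and the function names are invented for illustration. The sketch assumes only that each video has an embedding and a predicted view, that a per-structure table assigns a relevance weight to each view, and that report text is retrieved by cosine similarity to the weighted study-level embedding.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def study_embedding(video_embeddings, views, view_weights):
    """Weighted mean of video embeddings for one anatomical structure.

    Each video's weight is looked up from its predicted view; views that
    are absent from the table contribute nothing.
    """
    weights = [view_weights.get(v, 0.0) for v in views]
    total = sum(weights)
    if total == 0:
        raise ValueError("no video has a view relevant to this structure")
    dim = len(video_embeddings[0])
    return [
        sum(w * e[i] for w, e in zip(weights, video_embeddings)) / total
        for i in range(dim)
    ]


def retrieve(query, candidates, k=1):
    """Return the k candidate texts whose embeddings are closest to query."""
    ranked = sorted(candidates, key=lambda c: cosine(query, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy study: three videos with hypothetical embeddings and predicted views.
videos = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.2]]
views = ["A4C", "PLAX", "A4C"]

# Hypothetical relevance of each view to one anatomical structure.
structure_weights = {"A4C": 0.7, "PLAX": 0.3}

query = study_embedding(videos, views, structure_weights)

# Hypothetical candidate report sections with their text embeddings.
reports = [("candidate text A", [1.0, 0.1]), ("candidate text B", [0.0, 1.0])]
print(retrieve(query, reports))
```

In this toy setup the two `A4C` videos dominate the weighted average, so the study-level embedding lies closer to the first candidate. In a real system the weights would come from a learned attention module rather than a hand-written table.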