EchoAtlas: A Conversational, Multi-View Vision-Language Foundation Model for Echocardiography Interpretation and Clinical Reasoning

March 17, 2026 (medRxiv preprint)

Authors

Chao, C.-J., Asadi, M., Li, L., Ramasamy, G., Pecco, N., Wang, Y.-C., Poterucha, T., Arsanjani, R., Kane, G. C., Oh, J. K., Banerjee, I., Langlotz, C. P., Fei-Fei, L., Adeli, E., Erickson, B. J.

Affiliations (1)

  • Mayo Clinic, Rochester, MN

Abstract

Echocardiography is the most widely used cardiac imaging modality, yet artificial intelligence-enabled interpretation remains limited by the inability of existing models to integrate visual assessment, quantitative measurement, and clinical reasoning within a unified framework. Here we present EchoAtlas, the first autoregressive vision-language model developed for echocardiographic interpretation. Trained on over 12.9 million question-answer pairs derived from approximately 2 million echocardiogram videos, EchoAtlas achieves 0.966 accuracy on multiple-choice questions in our internal test set and establishes a new state of the art on the public MIMIC-EchoQA benchmark (0.699 vs. 0.508 previously). EchoAtlas also provides accurate quantitative measurements, segment-level regional wall motion assessment, longitudinal comparison, and diagnostic reasoning across diverse question formats, capabilities not previously demonstrated in this domain. These results highlight the potential of autoregressive vision-language models as a foundation for interactive echocardiographic interpretation, representing an early step toward scalable, auditable artificial intelligence systems in cardiology practice.

Topics

cardiovascular medicine