
Merlin: a computed tomography vision-language foundation model and dataset.

March 4, 2026 (PubMed)

Authors

Blankemeier L, Kumar A, Cohen JP, Liu J, Liu L, Van Veen D, Gardezi SJS, Yu H, Paschali M, Chen Z, Delbrouck JB, Reis E, Holland R, Truyts C, Bluethgen C, Wu Y, Lian L, Jensen MEK, Ostmeier S, Varma M, Valanarasu JMJ, Fang Z, Huo Z, Nabulsi Z, Ardila D, Weng WH, Junior EA, Ahuja N, Fries J, Shah NH, Zaharchuk G, Willis M, Yala A, Johnston A, Boutin RD, Wentland A, Langlotz CP, Hom J, Gatidis S, Chaudhari AS

Affiliations (18)

  • Department of Electrical Engineering, Stanford University, Stanford, CA, USA.
  • Stanford Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Stanford, CA, USA.
  • Department of Radiology, Stanford University, Stanford, CA, USA.
  • Computational Precision Health, University of California, Berkeley, Berkeley, CA, USA.
  • Department of Radiology, University of Wisconsin-Madison, Madison, WI, USA.
  • Department of Radiology, Hospital Israelita Albert Einstein, São Paulo, Brazil.
  • Department of Radiology, University Hospital Zurich, Zurich, Switzerland.
  • Department of Medical Imaging and Intervention, Chang Gung Memorial Hospital at Linkou, Taoyuan City, Taiwan.
  • Department of Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA, USA.
  • Department of Computer Science, Stanford University, Stanford, CA, USA.
  • Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.
  • Google, Mountain View, CA, USA.
  • Department of Medicine, Stanford University, Stanford, CA, USA.
  • Stanford Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Stanford, CA, USA.
  • Department of Radiology, Stanford University, Stanford, CA, USA.
  • Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.
  • Stanford Cardiovascular Institute, Stanford, CA, USA.
  • Weill Cancer Hub West, Stanford, CA, USA.

Abstract

The large volume of abdominal computed tomography (CT) scans<sup>1,2</sup>, coupled with the shortage of radiologists<sup>3-6</sup>, has intensified the need for automated medical image analysis tools. Previous state-of-the-art approaches for automated analysis leverage vision-language models (VLMs) that jointly model images and radiology reports<sup>7-12</sup>. However, current medical VLMs are generally limited to 2D images and short reports. Here, to overcome these shortcomings for abdominal CT interpretation, we introduce Merlin, a 3D VLM that learns from volumetric CT scans, electronic health record data and radiology reports. This approach is enabled by a multistage pretraining framework that does not require additional manual annotations. We trained Merlin using a high-quality clinical dataset of paired CT scans (>6 million images from 15,331 CT scans), diagnosis codes (>1.8 million codes) and radiology reports (>6 million tokens). We comprehensively evaluated Merlin on 6 task types and 752 individual tasks that covered diagnostic, prognostic and quality-related tasks. The non-adapted (off-the-shelf) tasks included zero-shot classification of findings (30 findings), phenotype classification (692 phenotypes) and zero-shot cross-modal retrieval (image-to-findings and image-to-impression). The model-adapted tasks included 5-year chronic disease prediction (6 diseases), radiology report generation and 3D semantic segmentation (20 organs). We validated Merlin at scale, with internal testing on 5,137 CT scans and external testing on 44,098 CT scans from 3 independent sites and 2 public datasets. The results demonstrated high generalization across institutions and anatomies. Merlin outperformed 2D VLMs, CT foundation models and off-the-shelf radiology models. We also computed scaling laws and conducted ablation studies to identify optimal training strategies. We release our trained models, code and a dataset of 25,494 pairs of abdominal CT scans and radiology reports. Our results demonstrate how Merlin may assist in the interpretation of abdominal CT scans and mitigate the burden on radiologists while simultaneously adding value for future biomarker discovery and disease risk stratification.
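
The abstract names zero-shot classification of findings and zero-shot cross-modal retrieval but does not describe how they are computed. As a rough illustration only, one common approach in vision-language models compares an image embedding with text embeddings by cosine similarity. The sketch below assumes that setup; the function names, the "present"/"absent" prompt pairing and the toy three-dimensional vectors are hypothetical and are not taken from the Merlin code release.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_finding(image_emb, present_emb, absent_emb):
    """Hypothetical zero-shot rule: call the finding present when the image
    embedding is closer to the 'present' prompt than to the 'absent' prompt."""
    return cosine(image_emb, present_emb) > cosine(image_emb, absent_emb)


def retrieve(image_emb, report_embs):
    """Hypothetical image-to-text retrieval: return report indices ranked
    from most to least similar to the image embedding."""
    scores = [cosine(image_emb, r) for r in report_embs]
    return sorted(range(len(report_embs)), key=lambda i: scores[i], reverse=True)


# Toy embeddings standing in for encoder outputs (assumed, not real data).
image = [0.9, 0.1, 0.0]
present_prompt = [1.0, 0.0, 0.0]
absent_prompt = [0.0, 1.0, 0.0]
reports = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]

finding_present = zero_shot_finding(image, present_prompt, absent_prompt)
ranking = retrieve(image, reports)
```

In a real pipeline the vectors would come from the image and text encoders of a trained model; no task-specific fine-tuning is involved in this kind of comparison, which is what makes it "zero-shot".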

Topics

Journal Article
