
Multimodal foundation models exploit text to make medical image predictions.

June 12, 2026 · PubMed

Authors

Buckley TA, Diao JA, Srivastava CN, Brodeur PG, Rajpurkar P, Rodman A, Manrai AK

Affiliations (4)

  • Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
  • Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA.
  • Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA.
  • Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.

Abstract

Multimodal foundation models have shown compelling but conflicting performance in medical image interpretation. However, the ways in which these models integrate and prioritize different data modalities, including images and text, remain poorly understood. Here we evaluate 8 proprietary and open-source multimodal foundation models using 1,090 multimodal medical cases. We show that image predictions are largely driven by text, with accuracy increasing monotonically with the amount of informative text. Exploitation of text is a double-edged sword: even mild suggestions of an incorrect diagnosis in the text diminish image-based classification, dramatically reducing performance in cases the model could previously answer using images alone. For example, o3 accuracy fell from 84% to 28% when a misleading clinical vignette was introduced. In physician evaluations of long-form cases, adding images reduces or does not improve performance when text is highly informative (e.g., GPT-4V showed decreased accuracy when images were added to highly informative text across 69 clinicopathological conferences). Our results suggest that multimodal AI models may be useful in medical diagnostic reasoning but that their accuracy is largely driven, for better and worse, by text.
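The abstract does not describe the authors' evaluation code, so the following is only a minimal sketch of the kind of bookkeeping its two quantitative claims imply: per-condition accuracy, a check that accuracy rises monotonically as more informative text is supplied, and the relative size of the o3 drop from 84% to 28%. The record schema (`CaseResult`), the helper names, the condition labels, and the toy records are all hypothetical illustrations, not the study's data or method; only the 84% and 28% figures come from the abstract.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class CaseResult(NamedTuple):
    """Hypothetical schema: one model answer on one case under one input condition."""
    condition: str  # hypothetical label, e.g. "none", "partial", "full" text
    correct: bool


def accuracy_by_condition(results: Iterable[CaseResult]) -> dict[str, float]:
    """Fraction of correct answers for each input condition."""
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for r in results:
        totals[r.condition] += 1
        hits[r.condition] += int(r.correct)
    return {c: hits[c] / totals[c] for c in totals}


def is_monotonic_increasing(values: list[float]) -> bool:
    """True if each value is >= the one before it (non-decreasing)."""
    return all(a <= b for a, b in zip(values, values[1:]))


def relative_drop(before: float, after: float) -> float:
    """Share of the original accuracy lost, e.g. 0.84 -> 0.28."""
    return (before - after) / before


# Toy records for illustration only; NOT results from the study.
toy = [
    CaseResult("none", True), CaseResult("none", False), CaseResult("none", False),
    CaseResult("partial", True), CaseResult("partial", True), CaseResult("partial", False),
    CaseResult("full", True), CaseResult("full", True), CaseResult("full", True),
]
acc = accuracy_by_condition(toy)
levels = ["none", "partial", "full"]  # ordered by amount of informative text

print(is_monotonic_increasing([acc[l] for l in levels]))  # True
print(round(relative_drop(0.84, 0.28), 3))  # 0.667
```

On the abstract's own numbers, the o3 result is an absolute drop of 56 percentage points, which is about two thirds of its image-only accuracy lost once the misleading vignette is added.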

Topics

Journal Article
