
Cognitively Biased Prompt Effects on Large Language Model Accuracy for Radiology Board-Style Examination Questions.

April 15, 2026

Authors

Dietrich NT, Patel D, Bellissimo J, Loh CT, Tyrrell PN

Affiliations (7)

  • Temerty Faculty of Medicine, University of Toronto, 1 King's College Cir, Toronto, ON, Canada M5S 1A8.
  • Institute of Biomedical Engineering, University of Toronto, Toronto, ON, Canada.
  • School of Medicine, Queen's University, Kingston, ON, Canada.
  • Department of ??????, University of North Dakota School of Medicine and Health Sciences, Grand Forks, ND.
  • Department of Medical Imaging, University of Toronto, Toronto, ON, Canada.
  • Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada.
  • Institute of Medical Science, University of Toronto, Toronto, ON, Canada.

Abstract

Large language models (LLMs) are increasingly explored for radiology-related applications, yet their vulnerability to cognitive biases remains undercharacterized. The aim of this study was to investigate whether targeted prompts exploiting cognitive biases degrade LLM accuracy on radiology board-style questions. Ten contemporary LLMs were evaluated on 200 text-based and 200 multimodal American Board of Radiology examination-style questions under baseline and three cognitive bias prompts: authority bias prompts (ABPs), complexity bias prompts (CBPs), and anchoring bias prompts (AnBPs). Two mitigation approaches, a prompt bias audit and a one-shot mitigation strategy, were also evaluated. Under baseline prompts, models achieved a mean accuracy of 84.8 ± 5.5% (154-186 of 200) for text-based and 59.5 ± 7.7% (101-143 of 200) for multimodal questions. All models showed reduced accuracy under cognitively biased prompts, with ABP, CBP, and AnBP yielding absolute declines of 21.1%, 10.1%, and 4.4%, respectively, for text questions (P < .001 for each), and 44.9%, 44.4%, and 39.6%, respectively, for multimodal questions (P < .001 for each). The prompt bias audit increased accuracy by 5.6% for text-based and 15.8% for multimodal questions, while the one-shot mitigation yielded gains of 4.0% for text questions and 24.9% for multimodal questions. These findings demonstrate that LLMs are susceptible to cognitively biased inputs. ©RSNA, 2026.
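The comparison the abstract reports, accuracy under a baseline prompt versus a biased prompt, expressed as an absolute percentage-point decline, can be sketched as below. This is a minimal illustration of the arithmetic only; the question IDs, answer keys, and responses are invented placeholders, not the study's data or evaluation harness.

```python
def accuracy(responses, answer_key):
    """Fraction of questions answered correctly against the key."""
    correct = sum(1 for q, a in answer_key.items() if responses.get(q) == a)
    return correct / len(answer_key)


def absolute_decline(baseline_acc, biased_acc):
    """Absolute accuracy decline in percentage points, as reported in the abstract."""
    return round((baseline_acc - biased_acc) * 100, 1)


# Toy 4-question example (hypothetical IDs and answers).
answer_key = {"q1": "A", "q2": "C", "q3": "B", "q4": "D"}
baseline_responses = {"q1": "A", "q2": "C", "q3": "B", "q4": "D"}
biased_responses = {"q1": "A", "q2": "B", "q3": "B", "q4": "A"}  # e.g. under an authority bias prompt

base = accuracy(baseline_responses, answer_key)   # 1.0
biased = accuracy(biased_responses, answer_key)   # 0.5
print(absolute_decline(base, biased))             # 50.0 percentage points
```

With real data, the same computation run per model and per prompt condition would yield the per-condition declines (e.g. 21.1 points for ABP on text questions) that the study reports.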

Topics

Journal Article
