
Cognitively Biased Prompt Effects on Large Language Model Accuracy for Radiology Board-Style Examination Questions.

April 15, 2026

Authors

Dietrich NT, Patel D, Bellissimo J, Loh CT, Tyrrell PN

Affiliations (7)

  • Temerty Faculty of Medicine, University of Toronto, 1 King's College Cir, Toronto, ON, Canada M5S 1A8.
  • Institute of Biomedical Engineering, University of Toronto, Toronto, ON, Canada.
  • School of Medicine, Queen's University, Kingston, ON, Canada.
  • Department of ??????, University of North Dakota School of Medicine and Health Sciences, Grand Forks, ND.
  • Department of Medical Imaging, University of Toronto, Toronto, ON, Canada.
  • Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada.
  • Institute of Medical Science, University of Toronto, Toronto, ON, Canada.

Abstract

Large language models (LLMs) are increasingly explored for radiology-related applications, yet their vulnerability to cognitive biases remains undercharacterized. The aim of this study was to investigate whether targeted prompts exploiting cognitive biases degrade LLM accuracy on radiology board-style questions. Ten contemporary LLMs were evaluated on 200 text-based and 200 multimodal American Board of Radiology examination-style questions under baseline and three cognitive bias prompts: authority bias prompts (ABPs), complexity bias prompts (CBPs), and anchoring bias prompts (AnBPs). Two mitigation approaches, a prompt bias audit and a one-shot mitigation strategy, were also evaluated. Under baseline prompts, models achieved a mean accuracy of 84.8 ± 5.5% (154-186 of 200) for text-based and 59.5 ± 7.7% (101-143 of 200) for multimodal questions. All models showed reduced accuracy under cognitively biased prompts, with ABP, CBP, and AnBP yielding absolute declines of 21.1%, 10.1%, and 4.4%, respectively, for text questions (P < .001 for each), and 44.9%, 44.4%, and 39.6%, respectively, for multimodal questions (P < .001 for each). The prompt bias audit increased accuracy by 5.6% for text-based and 15.8% for multimodal questions, while the one-shot mitigation yielded gains of 4.0% for text questions and 24.9% for multimodal questions. These findings demonstrate that LLMs are susceptible to cognitively biased inputs. ©RSNA, 2026.
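The comparison the abstract reports, accuracy under a baseline prompt versus a biased prompt, expressed as an absolute percentage-point decline, can be sketched as below. This is a minimal illustration of the arithmetic only; the question IDs, answer keys, and responses are invented placeholders, not the study's data or evaluation harness.

```python
def accuracy(responses, answer_key):
    """Fraction of questions answered correctly against the key."""
    correct = sum(1 for q, a in answer_key.items() if responses.get(q) == a)
    return correct / len(answer_key)


def absolute_decline(baseline_acc, biased_acc):
    """Absolute accuracy decline in percentage points, as reported in the abstract."""
    return round((baseline_acc - biased_acc) * 100, 1)


# Toy 4-question example (hypothetical IDs and answers).
answer_key = {"q1": "A", "q2": "C", "q3": "B", "q4": "D"}
baseline_responses = {"q1": "A", "q2": "C", "q3": "B", "q4": "D"}
biased_responses = {"q1": "A", "q2": "B", "q3": "B", "q4": "A"}  # e.g. under an authority bias prompt

base = accuracy(baseline_responses, answer_key)   # 1.0
biased = accuracy(biased_responses, answer_key)   # 0.5
print(absolute_decline(base, biased))             # 50.0 percentage points
```

With real data, the same computation run per model and per prompt condition would yield the per-condition declines (e.g. 21.1 points for ABP on text questions) that the study reports.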

Topics

Journal Article
