Cultural bias in large language models' ability to follow neuroradiology guidelines.

May 26, 2026

Authors

Bazerbachi N, Bentegeac R, Pistilli G, Kim SH, Amouyel P, Pruvo JP, Hacein-Bey L, Hamroun A, Kuchcinski G, Le Guellec B

Affiliations (10)

  • Neuroradiology Department, Lille University Hospital, Lille, France.
  • UMR1167 RID-AGE, Pasteur Institute of Lille, Inserm, Lille University, Lille University Hospital Center, Lille, France.
  • Department of Public Health and Epidemiology, Lille University Hospital, Lille University, Lille, France.
  • UMR 8011 - Sciences Normes Démocratie (SND), Sorbonne Université, CNRS, Paris, France.
  • Radiology Department, Technical University Munich, Munich, Germany.
  • INSERM, U1172-LilNCog-Lille Neuroscience & Cognition, Université de Lille, Lille, France.
  • Radiology Department, Stanford School of Medicine, Palo Alto, CA, USA.
  • Neuroradiology Department, Lille University Hospital, Lille, France.
  • UMR1167 RID-AGE, Pasteur Institute of Lille, Inserm, Lille University, Lille University Hospital Center, Lille, France.
  • INSERM, U1172-LilNCog-Lille Neuroscience & Cognition, Université de Lille, Lille, France.

Abstract

Large language models (LLMs) are increasingly explored as decision-support tools in medical imaging. However, their ability to align with country-specific guidelines, which often diverge, remains uncertain. We set out to evaluate the geographic neutrality of three state-of-the-art LLMs (GPT-o3, Mistral Large, and DeepSeek R1) and a biomedical LLM (MedGemma 1.5 4B) when applied to neuroradiology scenarios with conflicting U.S. and non-U.S. guidelines.

Vignettes derived from contradictory international guidelines were presented to each model under two conditions: an implicit setting, where no guideline was specified and vignettes were provided in English and French; and an explicit setting, where prompts directed models to follow a named guideline. Performance was reviewed against the target guideline, and mitigation strategies were tested.

Thirty clinical vignettes presenting conflicting guidelines were evaluated by GPT-o3, Mistral Large, and DeepSeek R1. In the implicit setting, all models favored U.S. guidelines, with GPT-o3, Mistral, and DeepSeek aligning with them in 27 of 30 scenarios (90.0%; 95% CI, 74.4-96.5). In the explicit setting, adherence declined sharply for non-U.S. recommendations for all models. Providing the complete guideline text was the most effective mitigation strategy, restoring accuracies above 90% across all models.

Across languages and model origins, LLMs exhibited a systematic bias toward U.S. neuroradiology guidelines, even when explicitly instructed otherwise. This U.S.-centrism likely reflects training data imbalances and raises concerns for safe global deployment. Strategies for local contextualization, such as guideline integration at deployment, are necessary to ensure context-appropriate clinical decision support.

Question: Do large language models display geographical neutrality in neuroradiology decision support?

Findings: Even models developed in France and China systematically preferred United States guidelines, aligning with them in most implicit scenarios while failing to follow explicit guidelines from other sources.

Clinical relevance: This systematic United States-centric bias poses clinical and legal risks for global deployment. Safe implementation requires specific localization strategies, such as providing full guideline texts, to ensure recommendations align with local practice standards.
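
The abstract does not say how the 95% confidence interval around the 27-of-30 result was computed. As a rough check, and assuming a Wilson score interval (an assumption, not something the source states), the reported bounds can be reproduced with the short sketch below. The function name `wilson_interval` is illustrative only.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z defaults to the two-sided 95% normal quantile.
    Assumption: the abstract does not name its CI method; Wilson is a guess
    that happens to match the reported bounds.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# 27 of 30 implicit-setting scenarios aligned with U.S. guidelines.
low, high = wilson_interval(27, 30)
print(f"{27 / 30:.1%} (95% CI, {low:.1%}-{high:.1%})")
# prints: 90.0% (95% CI, 74.4%-96.5%)
```

With only 30 vignettes the interval is wide, so the lower bound of roughly 74% is as informative as the 90% point estimate when judging how strong the U.S. preference is.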

Topics

Journal Article