Comparison between multimodal foundation models and radiologists for the diagnosis of challenging neuroradiology cases with text and images.

May 9, 2025

Authors

Le Guellec B, Bruge C, Chalhoub N, Chaton V, De Sousa E, Gaillandre Y, Hanafi R, Masy M, Vannod-Michel Q, Hamroun A, Kuchcinski G

Affiliations (7)

  • Department of Neuroradiology, CHU Lille, Salengro Hospital, Lille 59000, France; Université Lille, INSERM, CHU Lille, Institut Pasteur de Lille, U1167-RID-AGE - Facteurs de Risque et Déterminants Moléculaires des Maladies Liées au Vieillissement, Lille 59000, France; INSERM, U1172-LilNCog-Lille Neuroscience & Cognition, Université de Lille, Lille 59000, France. Electronic address: [email protected].
  • Department of Radiology, Lens Hospital, Lens 62300, France.
  • Department of Neuroradiology, CHU Lille, Salengro Hospital, Lille 59000, France.
  • Department of Neuroradiology, Saint Philibert Hospital, Lille 59160, France.
  • Department of Neuroradiology, Valenciennes Hospital, Valenciennes 59300, France.
  • Université Lille, INSERM, CHU Lille, Institut Pasteur de Lille, U1167-RID-AGE - Facteurs de Risque et Déterminants Moléculaires des Maladies Liées au Vieillissement, Lille 59000, France; Public Health - Epidemiology Department, CHU Lille, Maison Régionale de la Recherche Clinique, Lille 59000, France.
  • Department of Neuroradiology, CHU Lille, Salengro Hospital, Lille 59000, France; INSERM, U1172-LilNCog-Lille Neuroscience & Cognition, Université de Lille, Lille 59000, France.

Abstract

The purpose of this study was to compare the ability of two multimodal models (GPT-4o and Gemini 1.5 Pro) with that of radiologists to generate differential diagnoses from textual context alone, key images alone, or a combination of both, using complex neuroradiology cases. This retrospective study included neuroradiology cases from the "Diagnosis Please" series published in the journal Radiology between January 2008 and September 2024. The two multimodal models were asked to provide three differential diagnoses from the textual context alone, the key images alone, or the complete case. Six board-certified neuroradiologists solved the cases in the same setting and were randomly assigned to two groups: context alone first and images alone first. Three radiologists solved the cases first without and then with the assistance of Gemini 1.5 Pro. An independent radiologist evaluated the quality of the image descriptions provided by GPT-4o and Gemini 1.5 Pro for each case. Differences in correct answers between the multimodal models and radiologists were analyzed using the McNemar test. GPT-4o and Gemini 1.5 Pro outperformed radiologists using clinical context alone (mean accuracy, 34.0 % [18/53] and 44.7 % [23.7/53] vs. 16.4 % [8.7/53]; both P < 0.01). Radiologists outperformed GPT-4o and Gemini 1.5 Pro using images alone (mean accuracy, 42.0 % [22.3/53] vs. 3.8 % [2/53] and 7.5 % [4/53]; both P < 0.01) and using the complete cases (48.0 % [25.6/53] vs. 34.0 % [18/53] and 38.7 % [20.3/53]; both P < 0.001). While radiologists improved their accuracy when combining multimodal information (from 42.1 % [22.3/53] for images alone to 50.3 % [26.7/53] for complete cases; P < 0.01), GPT-4o and Gemini 1.5 Pro did not benefit from the multimodal context (from 34.0 % [18/53] for text alone to 35.2 % [18.7/53] for complete cases for GPT-4o, P = 0.48, and from 44.7 % [23.7/53] to 42.8 % [22.7/53] for Gemini 1.5 Pro, P = 0.54). Radiologists benefited significantly from the suggestions of Gemini 1.5 Pro, increasing their accuracy from 47.2 % [25/53] to 56.0 % [27/53] (P < 0.01). Both GPT-4o and Gemini 1.5 Pro correctly identified the imaging modality in 53/53 (100 %) and 51/53 (96.2 %) cases, respectively, but frequently failed to identify the key imaging findings (incorrect identification in 43/53 cases [81.1 %] for GPT-4o and 50/53 cases [94.3 %] for Gemini 1.5 Pro). Radiologists show a specific ability to benefit from the integration of textual and visual information, whereas multimodal models mostly rely on the clinical context to suggest diagnoses.
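The abstract's paired accuracy comparisons rely on the McNemar test. As a rough, self-contained illustration of that kind of analysis (not the authors' code, and using placeholder per-case outcomes rather than study data), the Python sketch below builds the 2x2 paired contingency table and runs an exact McNemar test with statsmodels:

```python
# Minimal sketch of a paired accuracy comparison with the McNemar test.
# The per-case outcomes below are illustrative placeholders, not study data.
from statsmodels.stats.contingency_tables import mcnemar

# 1 = case solved (correct diagnosis in top-3), 0 = not solved, one entry per case.
model_correct       = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
radiologist_correct = [0, 0, 1, 1, 1, 0, 1, 1, 1, 0]

# Build the 2x2 paired table:
#                     radiologist correct   radiologist incorrect
# model correct               a                     b
# model incorrect             c                     d
pairs = list(zip(model_correct, radiologist_correct))
a = sum(1 for m, r in pairs if m == 1 and r == 1)
b = sum(1 for m, r in pairs if m == 1 and r == 0)
c = sum(1 for m, r in pairs if m == 0 and r == 1)
d = sum(1 for m, r in pairs if m == 0 and r == 0)

# exact=True performs a binomial test on the discordant pairs (b and c).
result = mcnemar([[a, b], [c, d]], exact=True)
print(f"discordant pairs: b={b}, c={c}; p-value = {result.pvalue:.3f}")
```

With exact=True the test reduces to a binomial test on the discordant pairs (cases solved by one reader but not the other), which is the usual choice when the number of cases is modest, as in this study.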

Topics

Journal Article