Evaluation of GPT-4o for multilingual translation of radiology reports across imaging modalities.
Terzis R, Salam B, Nowak S, Mueller PT, Mesropyan N, Oberlinkels L, Efferoth AF, Kravchenko D, Voigt M, Ginzburg D, Pieper CC, Hayawi M, Kuetting D, Afat S, Maintz D, Luetkens JA, Kaya K, Isaak A
•papers•Jul 29 2025Large language models (LLMs) like GPT-4o offer multilingual and real-time translation capabilities. This study aims to evaluate GPT-4o's effectiveness in translating radiology reports into different languages. In this experimental two-center study, 100 real-world radiology reports from four imaging modalities (X-ray, ultrasound, CT, MRI) were randomly selected and fully anonymized. Reports were translated using GPT-4o with zero-shot prompting from German into four languages including English, French, Spanish, and Russian (n = 400 translations). Eight bilingual radiologists (two per language) evaluated the translations for general readability, overall quality, and utility for translators using 5-point Likert scales (ranging from 5 [best score] to 1 [worst score]). Binary questions (yes/no) were conducted to evaluate potential harmful errors, completeness, and factual correctness. The average processing time of GPT-4o for translating reports ranged from 9 to 24 s. The overall quality of translations achieved a median of 4.5 (IQR 4-5), with English (5 [4,5]), French and Spanish (each 4.5 [4,5]) significantly outperforming Russian (4 [3.5-4]; each p < 0.05). Usefulness for translators was rated highest for English (5 [5-5], p < 0.05 against other languages). Readability scores and translation completeness were significantly higher for translations into Spanish, English and French compared to Russian (each p < 0.05). Factual correctness averaged 79 %, with English (84 %) and French (83 %) outperforming Russian (69 %) (each p < 0.05). Potentially harmful errors were identified in 4 % of translations, primarily in Russian (9 %). GPT-4o demonstrated robust performance in translating radiology reports across multiple languages, with limitations observed in Russian translations.