Large language models for simplifying radiology reports: a systematic review and meta-analysis of patient, public, and clinician evaluations.

February 16, 2026

Authors

Alabed S, Anderson A, Maiter A, Hughes A, McAnenly N, Salehi M, Sharkey M, Dwivedi K, Hokmabadi A, Alahdab F, Stevenson M, Ma N, Gaizauskas R, Chico TJ, Swift AJ, Li JJ, Kleesiek J, Langlotz C

Affiliations (8)

  • School of Medicine and Population Health, Institute for In Silico Medicine, National Institute for Health and Care Research, University of Sheffield, Sheffield, UK; Department of Clinical Radiology, Sheffield Teaching Hospitals, Sheffield, UK.
  • School of Medicine and Population Health, Institute for In Silico Medicine, National Institute for Health and Care Research, University of Sheffield, Sheffield, UK.
  • School of Medicine and Population Health, Institute for In Silico Medicine, National Institute for Health and Care Research, University of Sheffield, Sheffield, UK; Department of Clinical Radiology, Sheffield Teaching Hospitals, Sheffield, UK.
  • School of Computer Science, University of Sheffield, Sheffield, UK.
  • Department of Biomedical Informatics, Biostatistics, and Medical Epidemiology and Department of Cardiology, University of Missouri, Columbia, MO, USA.
  • Department of Linguistics, University of Texas at Austin, Austin, TX, USA.
  • Institute for AI in Medicine, University Medicine Essen, Essen, Germany.
  • Department of Radiology, Department of Medicine, and Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA.

Abstract

Radiology reports are typically written in language that is difficult for patients to understand. Large language models (LLMs) excel at simplifying text. We aimed to evaluate the ability of LLMs to improve the understanding of radiology reports.

In this systematic review and meta-analysis, we searched CENTRAL, MEDLINE, and Embase from inception to Nov 11, 2025, without restrictions on language. Full-text articles and preprints were considered for inclusion. Eligible studies applied LLMs to simplify radiology reports and had these reports assessed by members of the public or medical professionals. We excluded studies that focused solely on dialogues with interactive chatbots, preimaging leaflets, educational materials, appointment letters, or summarising findings without simplifying them for patients. Search results were screened independently by two authors and full-text review and data extraction were done by three authors; disagreements were resolved by consensus. The main outcomes were patient, public, and clinician evaluations (Likert scores) and text readability metrics. We assessed study quality with the MAIC-10 tool. This study was registered with PROSPERO (CRD420251027489).

We identified 2385 records, of which 38 studies were eligible. These 38 studies generated 12 922 simplified reports, assessed by 508 evaluators (387 lay people and 121 clinicians). 35 (92%) of 38 studies used OpenAI GPT models and 29 (76%) produced simplified reports in English. Patients perceived LLM-rewritten reports as significantly more understandable than radiologist reports (mean Likert score 4·04 [SD 1·20] for simplified reports vs 2·16 [SD 0·94] for original reports; mean difference 2·00 [95% CI 1·54 to 2·46]). Clinicians rated LLM-rewritten reports highly for accuracy (mean 4·45 [95% CI 4·27 to 4·63]; 27 studies) and completeness (mean 4·53 [95% CI 4·30 to 4·76]; 14 studies). Readability was improved across imaging modalities, with lower Flesch-Kincaid Grade Level for LLM-rewritten reports, including a mean difference of -6·20 (95% CI -6·91 to -5·48) for CT, -5·07 (-5·99 to -4·15) for x-ray, and -5·0 (-6·0 to -4·0) for MRI. The error rate in LLM-rewritten reports was 7·2% (95% CI 5·1% to 10·0%; 13 studies) for any error and 0·9% (95% CI 0·6% to 1·5%; 2 studies) for clinically significant errors.

LLM-simplified radiology reports improved patient-perceived understanding and readability and were rated by clinicians as largely accurate and complete, although a small proportion contained clinically significant errors. LLM-based simplification shows promise for making radiology communication more patient-centred, but further evaluation of its effect on patient outcomes and clinical workflows is required.

Funding: National Institute for Health and Care Research Sheffield Biomedical Research Centre.
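
For readers unfamiliar with the readability metric reported above, the Flesch-Kincaid Grade Level is computed as 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59, so a lower value indicates text that needs fewer years of schooling to read. The following is a minimal Python sketch of that formula. The syllable counter is a rough vowel-group heuristic and the sample sentences are invented for illustration; neither is taken from the review, and the included studies may have used different tools.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not a dictionary lookup):
    count runs of vowels and discount a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level of a plain-text passage."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


if __name__ == "__main__":
    # Hypothetical report sentences, written only for this example.
    original = (
        "There is a small consolidation in the right lower lobe, "
        "suspicious for pneumonia."
    )
    simplified = (
        "There is a small patch in the lower right lung. "
        "It may be an infection."
    )
    print("original  :", round(flesch_kincaid_grade(original), 2))
    print("simplified:", round(flesch_kincaid_grade(simplified), 2))
```

Shorter sentences and shorter words both lower the score, which is why a rewrite like the hypothetical one above scores several grade levels below its source.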

Topics

Journal Article
