
Use of large language models for providing automated feedback in medical imaging education: a systematic review.

June 10, 2026

Authors

Al-Mashhadani MM, Ajaz F, Guraya SS, Ennab F

Affiliations (2)

  • College of Medicine (CoM), Mohammed Bin Rashid University of Medicine and Health Sciences (MBRU), Dubai Health, Dubai, United Arab Emirates.
  • Institute of Learning (IoL), Mohammed Bin Rashid University of Medicine and Health Sciences (MBRU), Dubai Health, Dubai, United Arab Emirates.

Abstract

Large language models (LLMs) are an emerging form of generative artificial intelligence (AI) with promising applications in medical education, and their ability to provide automated feedback may enhance medical imaging education for trainees. This review aims to systematically examine and synthesize the published literature on the use of LLMs to provide automated feedback in medical imaging education. We conducted this systematic review in accordance with the PRISMA 2020 guidelines. We searched the PubMed, Scopus, and Embase databases for studies published through January 2026. Our search strategy included keywords related to feedback, generative artificial intelligence, large language models, radiology, and medical imaging. Studies were eligible if they examined the use of LLMs to generate automated feedback for medical trainees within medical imaging education. Extracted data were synthesized descriptively, and quality was appraised using ROBINS-I and GRADE. Of 1,003 identified records, 7 met the inclusion criteria. All studies examined automated LLM feedback in the education of radiology residents, with one study also including fellows. Reported educational outcomes included enhanced report quality, improved diagnostic accuracy, and increased efficiency in discrepancy detection. LLM feedback was generally well received by trainees, who expressed satisfaction with it and preferred a hybrid human-AI feedback model. Additionally, fine-tuned models generally showed stronger performance than general-purpose LLMs and demonstrated variable agreement with expert human consensus. LLMs show promise as supportive tools for providing automated feedback in medical imaging education alongside human feedback, with reported gains in accuracy, efficiency, and learner satisfaction.
However, the current published evidence is preliminary and limited. Larger multicenter studies with standardized methods are necessary before widespread adoption can be justified. Our systematic review emphasizes that human expert oversight remains essential, as the current evidence supports preliminary technical feasibility but not yet definitive educational effectiveness. Systematic review registration: https://www.crd.york.ac.uk/PROSPERO/view/CRD420251081394, identifier CRD420251081394.

Topics

  • Journal Article
  • Systematic Review
