Natural language generation in healthcare: A review of methods and applications.

February 10, 2026

Authors

Lyu M, Li X, Chen Z, Pan J, Peng C, Talankar S, Wu Y

Affiliations (2)

  • Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA.
  • Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Preston A. Wells, Jr. Center for Brain Tumor Therapy, Lillian S. Wells Department of Neurosurgery, University of Florida, Gainesville, FL, USA. Electronic address: [email protected].

Abstract

This study presents a systematic review of natural language generation (NLG) methods and applications in the medical domain, providing quantitative and qualitative analyses to answer four key research questions regarding methods, evaluation, applications, and challenges of NLG in healthcare. We searched PubMed, ACM Digital Library, Web of Science, ScienceDirect, Scopus, Embase, and ACL Anthology for NLG-related studies in healthcare from 2018 to 2024. Out of 3,988 research articles, 113 met the inclusion criteria and were analyzed across data modality, model architecture, evaluation metrics, and application domain. NLG in healthcare has grown substantially, with annual publications increasing from 2 in 2018 to 40 in 2024. Of the 113 included studies, text-to-text generation was the most common data modality (65.5%), followed by image-to-text (19.5%) and multimodal-to-text (15.0%). Transformer-based architectures were dominant, especially encoder-decoder models (61.6%). Automatic evaluation metrics such as ROUGE (81.4%) and BLEU (57.5%) were widely used. Human evaluation metrics, such as Likert scales (31.9%), were increasingly adopted. The four most prevalent application domains were summarization (e.g., discharge summaries, radiology reports), clinical documentation, medical dialogue, and data augmentation. Transformer-based large language models (LLMs) and the accumulation of large-scale multimodal clinical datasets have remarkably advanced NLG in healthcare. However, challenges remain in factual consistency, explainability, evaluation robustness, and AI safety. Addressing these challenges is essential for the adoption of NLG in various healthcare applications.

Topics

Journal Article, Review
