Large Language Models for Diagnosis and Prognosis of Chronic Liver Diseases: A Systematic Review.
Authors
Affiliations (11)
- Engelhardt School of Global Health and Bioethics, Euclid University, Bangui, Central African Republic.
- Section of Digestive Diseases, Department of Medicine, Yale University, New Haven, Connecticut, USA.
- VA Connecticut Healthcare, West Haven, Connecticut, USA.
- Ohio University Heritage College of Osteopathic Medicine, Athens, Ohio, USA.
- Yale Liver Center, Yale New Haven Health, New Haven, Connecticut, USA.
- Yale International Medicine Program, Yale University, New Haven, Connecticut, USA.
- Department of Rehabilitation, Montefiore Medical Center, New York, USA.
- Department of Medicine, Morehouse School of Medicine, Atlanta, Georgia, USA.
- School of Medicine, Al Balqa' Applied University, Salt, Jordan.
- Department of Medicine, Allegheny Health Network, Pittsburgh, Pennsylvania, USA.
- Department of Medicine, Yale New Haven Health, Bridgeport Hospital, Connecticut, USA.
Abstract
Chronic liver disease (CLD) affects more than 800 million people worldwide and remains a leading cause of morbidity and mortality. Artificial intelligence (AI), particularly machine learning, has been applied in hepatology for diagnostic and prognostic purposes. Large language models (LLMs) represent a new generation of AI with distinctive capabilities for processing unstructured clinical text, integrating multimodal inputs, and facilitating patient communication. Their role in CLD, however, has not been systematically reviewed.
This systematic review was conducted in accordance with PRISMA guidelines and registered with PROSPERO (CRD420250650268). Five databases were searched using predefined keywords related to LLMs and CLD. Studies were eligible if they reported diagnostic, prognostic, clinical decision support, or patient education applications of LLMs in CLD.
A total of 18 studies published between 2023 and 2025 met the inclusion criteria. The studies spanned multiple regions, including the USA, Europe, China, South Asia, and Australia, and employed diverse designs. Evaluated models included ChatGPT-3.5/4, GPT-4o, Bard, Gemini, vision-enabled GPT, and retrieval-augmented frameworks. Applications clustered into four thematic domains: (1) diagnostics, including HCC detection from CT/MRI, CEUS LI-RADS classification, fibrosis staging from pathology text and histology, and MASLD identification from clinical and laboratory data; (2) prognosis, including cirrhosis phenotyping and fibrosis progression; (3) clinical decision support, with retrieval-augmented generation (RAG)-based systems improving HCV guideline interpretation and agent-based approaches generating guideline-concordant prescriptions; and (4) patient education, where LLMs achieved 70%-90% accuracy on queries about HBV, MASLD, cirrhosis, and AIH, although poor readability and high complexity of the responses limited their patient-facing utility.
LLMs show promising applications across the CLD spectrum, from diagnostics and prognostics to decision support and patient engagement. Current evidence is preliminary, largely retrospective, and heterogeneous. Rigorous prospective studies and careful integration strategies are required to ensure safe, effective, and equitable deployment in hepatology.