
Application of generative artificial intelligence to utilize unstructured clinical data for acceleration of inflammatory bowel disease research.

October 31, 2025 · PubMed

Authors

Kadhim AZ, Green Z, Nazari I, Baker J, George M, Heinson A, Vadgama B, Stammers M, Kipps CM, Beattie RM, Ashton JJ, Ennis S

Affiliations (11)

  • Department of Human Genetics and Genomic Medicine, University of Southampton, Southampton SO16 6YD, UK; National Institute for Health Research (NIHR) Southampton Biomedical Research Centre, Southampton SO16 6YD, UK.
  • Department of Human Genetics and Genomic Medicine, University of Southampton, Southampton SO16 6YD, UK; Department of Paediatric Gastroenterology, Southampton Children's Hospital, Southampton SO16 6YD, UK.
  • Department of Human Genetics and Genomic Medicine, University of Southampton, Southampton SO16 6YD, UK.
  • Department of Paediatric Gastroenterology, Southampton Children's Hospital, Southampton SO16 6YD, UK.
  • Clinical Informatics Research Unit, University Hospital Southampton NHS Trust, Southampton SO16 6YD, UK; Southampton Emerging Therapies and Technologies (SETT) Centre, University Hospital Southampton NHS Trust, Southampton SO16 6YD, UK.
  • Clinical Informatics Research Unit, University Hospital Southampton NHS Trust, Southampton SO16 6YD, UK.
  • Department of Histopathology, University Hospital Southampton NHS Trust, Southampton SO16 6YD, UK.
  • Clinical Informatics Research Unit, University Hospital Southampton NHS Trust, Southampton SO16 6YD, UK; Southampton Emerging Therapies and Technologies (SETT) Centre, University Hospital Southampton NHS Trust, Southampton SO16 6YD, UK; Department of Gastroenterology, University Hospital Southampton NHS Trust, Southampton SO16 6YD, UK.
  • Southampton Emerging Therapies and Technologies (SETT) Centre, University Hospital Southampton NHS Trust, Southampton SO16 6YD, UK; Department of Neurology, University Hospital Southampton NHS Trust, Southampton SO16 6YD, UK.
  • Department of Human Genetics and Genomic Medicine, University of Southampton, Southampton SO16 6YD, UK; Department of Paediatric Gastroenterology, Southampton Children's Hospital, Southampton SO16 6YD, UK.
  • Department of Human Genetics and Genomic Medicine, University of Southampton, Southampton SO16 6YD, UK; National Institute for Health Research (NIHR) Southampton Biomedical Research Centre, Southampton SO16 6YD, UK.

Abstract

Inflammatory bowel disease (IBD) research is a dynamic field, but the growing volume of electronic health records (EHRs) and research data presents significant challenges. Traditional methods for structuring unstructured EHR data are labor-intensive and do not scale. Large language models (LLMs) may offer a solution; however, their usefulness for data standardization in IBD remains unknown. We sought to evaluate LLMs in structuring free-text histology and radiology reports from patients with IBD (n = 32,041), to compare their performance against manual clinician curation, and to assess the usefulness of fine-tuning and retrieval-augmented generation (RAG).

We developed an IBD-specialized, LLM-based framework using structured prompt engineering and fine-tuning. Free-text reports from two independent sites were manually curated and processed using various LLMs (n = 120).

Overall, Llama 3.3 achieved the highest F1 scores for histology and imaging reports (1.00 ± 0 and 0.85 ± 0.29, respectively) in extracting findings and anatomical regions, outperforming the other models in structured data generation. Fine-tuning improved the performance of the smaller Llama 3.1 8B model on imaging reports (F1 0.70 ± 0.46 before vs. 0.82 ± 0.35 after fine-tuning), enabling better extraction with reduced computational requirements.

Our findings demonstrate the feasibility of LLM-based automated structuring of IBD-related medical records. Unstructured data from free-text reports can be reliably converted into standardized ontologies that capture location, severity, and qualifiers. These advances enable scalable, privacy-compliant, AI-driven solutions for data standardization.

Funding: The Institute for Life Sciences, University of Southampton; the NIHR Southampton BRC; and EPSRC (EP/Y01720X/1).
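The abstract reports F1 scores (mean ± SD) for model extractions compared against manual clinician curation, but it does not state the output format, the matching criteria, or how scores were aggregated. The following is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes the model returns a JSON list of objects with hypothetical `finding` and `location` fields, scores each report by exact-match F1 over (finding, anatomical region) pairs, and then summarises the per-report scores as mean ± sample SD. All function names and the example terms are illustrative.

```python
import json
import statistics

# Assumption: the model is prompted to return a JSON list such as
# [{"finding": "ulceration", "location": "sigmoid colon", "severity": "mild"}].
# These field names are hypothetical, not the study's schema.
Pair = tuple[str, str]


def parse_llm_output(raw: str) -> set[Pair]:
    """Turn a model response into a set of (finding, location) pairs."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return set()  # malformed output is scored as "nothing extracted"
    if not isinstance(items, list):
        return set()
    pairs = set()
    for item in items:
        if isinstance(item, dict) and "finding" in item and "location" in item:
            # Light normalisation so "Sigmoid colon" matches "sigmoid colon".
            pairs.add((str(item["finding"]).strip().lower(),
                       str(item["location"]).strip().lower()))
    return pairs


def report_f1(pred: set[Pair], gold: set[Pair]) -> float:
    """Exact-match F1 between model pairs and clinician-curated pairs."""
    if not pred and not gold:
        return 1.0  # convention: agreeing that a report has no findings scores 1
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def summarise(scores: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of per-report F1 scores."""
    sd = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return statistics.mean(scores), sd


if __name__ == "__main__":
    # Illustrative data only, not taken from the study.
    gold = {("ulceration", "sigmoid colon"), ("granuloma", "terminal ileum")}
    raw = '[{"finding": "Ulceration", "location": "Sigmoid colon", "severity": "mild"}]'
    f1 = report_f1(parse_llm_output(raw), gold)
    print(round(f1, 3))  # precision 1.0, recall 0.5, so F1 is about 0.667
```

Per-report scoring followed by averaging is one plausible reading of "± SD": it would explain how a model can reach 1.00 ± 0 when every report is extracted perfectly, while a few failed reports widen the SD considerably. A micro-averaged F1 pooled over all pairs is an equally valid alternative and would yield a single number with no spread.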

Topics

Journal Article
