Multimodal Large Language Model With Knowledge Retrieval Using Flowchart Embedding for Forming Follow-Up Recommendations for Pancreatic Cystic Lesions.

Authors

Zhu Z,Liu J,Hong CW,Houshmand S,Wang K,Yang Y

Affiliations (1)

  • Department of Radiology and Biomedical Imaging, University of California, San Francisco, 505 Parnassus Ave, San Francisco, CA 94143.

Abstract

<b>BACKGROUND</b>. The American College of Radiology (ACR) Incidental Findings Committee (IFC) algorithm provides guidance for pancreatic cystic lesion (PCL) management. Its implementation using plain-text large language model (LLM) solutions is challenging given that key components include multimodal data (e.g., figures and tables). <b>OBJECTIVE</b>. The purpose of the study is to evaluate a multimodal LLM approach incorporating knowledge retrieval using flowchart embedding for forming follow-up recommendations for PCL management. <b>METHODS</b>. This retrospective study included patients who underwent abdominal CT or MRI from September 1, 2023, to September 1, 2024, and whose report mentioned a PCL. The reports' Findings sections were inputted to a multimodal LLM (GPT-4o). For task 1 (198 patients: mean age, 69.0 ± 13.0 [SD] years; 110 women, 88 men), the LLM assessed PCL features (presence of PCL, PCL size and location, presence of main pancreatic duct communication, presence of worrisome features or high-risk stigmata) and formed a follow-up recommendation using three knowledge retrieval methods (default knowledge, plain-text retrieval-augmented generation [RAG] from the ACR IFC algorithm PDF document, and flowchart embedding using the LLM's image-to-text conversion for in-context integration of the document's flowcharts and tables). For task 2 (85 patients: mean initial age, 69.2 ± 10.8 years; 48 women, 37 men), an additional relevant prior report was inputted; the LLM assessed for interval PCL change and provided an adjusted follow-up schedule accounting for prior imaging using flowchart embedding. Three radiologists assessed LLM accuracy in task 1 for PCL findings in consensus and follow-up recommendations independently; one radiologist assessed accuracy in task 2. <b>RESULTS</b>. For task 1, the LLM with flowchart embedding had accuracy for PCL features of 98.0-99.0%. The accuracy of the LLM follow-up recommendations based on default knowledge, plain-text RAG, and flowchart embedding for radiologist 1 was 42.4%, 23.7%, and 89.9% (<i>p</i> < .001), respectively; radiologist 2 was 39.9%, 24.2%, and 91.9% (<i>p</i> < .001); and radiologist 3 was 40.9%, 25.3%, and 91.9% (<i>p</i> < .001). For task 2, the LLM using flowchart embedding showed an accuracy for interval PCL change of 96.5% and for adjusted follow-up schedules of 81.2%. <b>CONCLUSION</b>. Multimodal flowchart embedding aided the LLM's automated provision of follow-up recommendations adherent to a clinical guidance document. <b>CLINICAL IMPACT</b>. The framework could be extended to other incidental findings through the use of other clinical guidance documents as the model input.

Topics

Journal Article

Ready to Sharpen Your Edge?

Join hundreds of your peers who rely on RadAI Slice. Get the essential weekly briefing that empowers you to navigate the future of radiology.

We respect your privacy. Unsubscribe at any time.