Multimodal Large Language Model With Knowledge Retrieval Using Flowchart Embedding for Forming Follow-Up Recommendations for Pancreatic Cystic Lesions.
Authors
Affiliations (1)
Affiliations (1)
- Department of Radiology and Biomedical Imaging, University of California, San Francisco, 505 Parnassus Ave, San Francisco, CA 94143.
Abstract
<b>BACKGROUND</b>. The American College of Radiology (ACR) Incidental Findings Committee (IFC) algorithm provides guidance for pancreatic cystic lesion (PCL) management. Its implementation using plain-text large language model (LLM) solutions is challenging given that key components include multimodal data (e.g., figures and tables). <b>OBJECTIVE</b>. The purpose of the study is to evaluate a multimodal LLM approach incorporating knowledge retrieval using flowchart embedding for forming follow-up recommendations for PCL management. <b>METHODS</b>. This retrospective study included patients who underwent abdominal CT or MRI from September 1, 2023, to September 1, 2024, and whose report mentioned a PCL. The reports' Findings sections were inputted to a multimodal LLM (GPT-4o). For task 1 (198 patients: mean age, 69.0 ± 13.0 [SD] years; 110 women, 88 men), the LLM assessed PCL features (presence of PCL, PCL size and location, presence of main pancreatic duct communication, presence of worrisome features or high-risk stigmata) and formed a follow-up recommendation using three knowledge retrieval methods (default knowledge, plain-text retrieval-augmented generation [RAG] from the ACR IFC algorithm PDF document, and flowchart embedding using the LLM's image-to-text conversion for in-context integration of the document's flowcharts and tables). For task 2 (85 patients: mean initial age, 69.2 ± 10.8 years; 48 women, 37 men), an additional relevant prior report was inputted; the LLM assessed for interval PCL change and provided an adjusted follow-up schedule accounting for prior imaging using flowchart embedding. Three radiologists assessed LLM accuracy in task 1 for PCL findings in consensus and follow-up recommendations independently; one radiologist assessed accuracy in task 2. <b>RESULTS</b>. For task 1, the LLM with flowchart embedding had accuracy for PCL features of 98.0-99.0%. The accuracy of the LLM follow-up recommendations based on default knowledge, plain-text RAG, and flowchart embedding for radiologist 1 was 42.4%, 23.7%, and 89.9% (<i>p</i> < .001), respectively; radiologist 2 was 39.9%, 24.2%, and 91.9% (<i>p</i> < .001); and radiologist 3 was 40.9%, 25.3%, and 91.9% (<i>p</i> < .001). For task 2, the LLM using flowchart embedding showed an accuracy for interval PCL change of 96.5% and for adjusted follow-up schedules of 81.2%. <b>CONCLUSION</b>. Multimodal flowchart embedding aided the LLM's automated provision of follow-up recommendations adherent to a clinical guidance document. <b>CLINICAL IMPACT</b>. The framework could be extended to other incidental findings through the use of other clinical guidance documents as the model input.