Open-Source Offline-Deployable Retrieval-Augmented Large Language Model for Assisting Pancreatic Cancer Staging
Authors
Affiliations (1)
- University of Yamanashi
Abstract
Purpose
Large language models (LLMs) are increasingly applied in radiology, but key challenges remain, including data leakage from cloud-based systems, false outputs, and limited reasoning transparency. This study aimed to develop an open-source, offline-deployable retrieval-augmented LLM (RA-LLM) system in which local execution prevents data leakage and retrieval-augmented generation (RAG) improves output accuracy and transparency using reliable external knowledge (REK), demonstrated in pancreatic cancer staging.
Materials and Methods
Llama-3.2 11B and Gemma-3 27B were used as local LLMs, and GPT-4o mini served as a cloud-based comparator. The Japanese pancreatic cancer guideline served as the REK. Relevant REK excerpts were retrieved to generate retrieval-augmented responses. System performance, including classification accuracy, retrieval metrics, and execution time, was evaluated on 100 simulated pancreatic cancer CT cases, with non-RAG LLMs as baselines. McNemar tests were applied to TNM staging and resectability classification.
Results
RAG improved TNM staging accuracy for all LLMs (GPT-4o mini 61%→90%, p<0.001; Llama-3.2 11B 53%→72%, p<0.001; Gemma-3 27B 59%→87%, p<0.001) and modestly improved resectability classification (72%→84%, p=0.012; 58%→73%, p=0.006; 77%→86%, p=0.093), with Gemma-3 27B performing comparably to GPT-4o mini. Retrieval performance was high (context recall = 1; context precision = 0.5–1), and the local models ran at speeds comparable to the cloud-based GPT-4o mini.
Conclusion
We developed an offline-deployable RA-LLM system for pancreatic cancer staging and publicly released its full source code. RA-LLMs outperformed baseline LLMs, and the offline-capable Gemma-3 27B performed comparably to the widely used cloud-based GPT-4o mini.
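The abstract describes the pipeline only at a high level. The sketch below is a minimal, hypothetical illustration (not the released source code) of the two core steps it names: retrieving guideline (REK) excerpts for a CT findings query by embedding similarity, and comparing paired RAG versus non-RAG correctness with a McNemar test. The embedding model, the example excerpts, the commented-out generate_local call, and the toy contingency counts are all assumptions for illustration.

```python
# Illustrative sketch only (not the released system): retrieve guideline
# excerpts by cosine similarity, then compare paired accuracies with McNemar.
# Assumes sentence-transformers, numpy, and statsmodels are installed; the
# model name, excerpts, local-LLM call, and counts below are placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer
from statsmodels.stats.contingency_tables import mcnemar

# Retrieval step: embed REK (guideline) excerpts and a CT findings query.
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model
rek_excerpts = [
    "T3: tumor >4 cm in greatest dimension, limited to the pancreas.",
    "Resectable: no arterial contact and <=180 degrees venous contact.",
]  # in the actual system, chunks of the Japanese pancreatic cancer guideline
query = "Pancreatic head tumor, 4.5 cm, abutting the SMV over less than 180 degrees."

doc_vecs = embedder.encode(rek_excerpts, normalize_embeddings=True)
q_vec = embedder.encode([query], normalize_embeddings=True)[0]
top_k = np.argsort(doc_vecs @ q_vec)[::-1][:2]  # rank excerpts by cosine similarity
context = "\n".join(rek_excerpts[i] for i in top_k)

prompt = (
    "Using only the guideline excerpts below, assign TNM stage and resectability.\n"
    f"Guideline excerpts:\n{context}\n\nCT findings:\n{query}"
)
# response = generate_local(prompt)  # hypothetical call to an offline, local LLM

# Evaluation step: McNemar test on paired per-case correctness.
# Rows = non-RAG correct/incorrect, columns = RAG correct/incorrect.
table = np.array([[55, 6],   # toy counts, not the study's data
                  [35, 4]])
print(mcnemar(table, exact=True))  # reports the statistic and p-value
```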