Feasibility of retrieval-augmented generation for large language models with Japanese input in radiotherapy.
Affiliations (2)
- Department of Radiation Oncology, Tohoku University Graduate School of Medicine, 1-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8574, Japan.
- Division of Medical Physics, School of Medical Sciences, Fujita Health University, 1-98, Dengakugakubo, Kutsukake, Toyoake, Aichi 470-1192, Japan.
Abstract
Large language models (LLMs) have recently gained attention for their potential in medicine. However, concerns remain about their reliability because of limitations such as hallucinations and insufficient domain-specific knowledge. Retrieval-augmented generation (RAG) has emerged as a promising approach that enables LLMs to reference external knowledge sources and thereby generate more accurate outputs. We aimed to clarify the potential of RAG-enhanced LLMs with Japanese input in the field of radiotherapy by evaluating their performance on three certification examinations in Japan: the Japanese Medical Physicist Examination, the Japanese Board Examination for Radiologists, and the Japanese Board Examination for Radiation Oncologists. In this study, we constructed a RAG system named Rad-Hub, consisting of a Japanese Radiotherapy Knowledge Database (JRKD) and a retrieval framework built on Microsoft Azure. The JRKD was populated with 32 Japanese radiotherapy textbooks and clinical guidelines. We assessed Rad-Hub's utility by inputting all multiple-choice questions from the three examinations into ChatGPT-4o, both with and without Rad-Hub, and recording the answers, which were then compared with reference answers determined by experienced medical physicists and radiation oncologists. Rad-Hub improved accuracy on all three examinations: from 77.0% ± 2.6% to 84.6% ± 1.5% on the Medical Physicist examination, from 74.9% ± 2.0% to 82.1% ± 1.1% on the Radiologist examination, and from 55.6% ± 4.4% to 71.7% ± 4.5% on the Radiation Oncologist examination, corresponding to gains of 7.2 to 16.1 percentage points. These findings highlight the potential of RAG-enhanced LLMs, particularly ChatGPT-4o with Rad-Hub, for radiotherapy applications such as education and clinical decision support.
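The abstract does not describe how Rad-Hub retrieves passages on Azure, so the following is only a minimal, generic sketch of the retrieve-then-prompt idea behind RAG, not the authors' implementation. It assumes a swapped-in retrieval method: cosine similarity over character bigrams, a simple choice for Japanese text, which has no spaces between words. The functions `char_bigrams`, `cosine`, `retrieve`, and `build_prompt`, as well as the three passages, are hypothetical stand-ins for JRKD content; the resulting prompt would then be sent to an LLM such as ChatGPT-4o, a step omitted here.

```python
from collections import Counter
import math


def char_bigrams(text: str) -> Counter:
    """Count character bigrams (whitespace removed); works without word segmentation."""
    s = "".join(text.split())
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bigram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the question (hypothetical retriever)."""
    q = char_bigrams(question)
    ranked = sorted(passages, key=lambda p: cosine(q, char_bigrams(p)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list[str], k: int = 2) -> str:
    """Prepend the retrieved passages to the question as grounding context."""
    hits = retrieve(question, passages, k)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(hits))
    return (
        "Answer using only the reference passages below.\n"
        f"References:\n{context}\n\nQuestion: {question}"
    )


# Hypothetical toy knowledge base standing in for textbook/guideline chunks.
passages = [
    "陽子線治療ではブラッグピークを利用する。",  # Proton therapy uses the Bragg peak.
    "前立腺癌の外部照射では直腸線量に注意する。",  # Watch rectal dose in prostate EBRT.
    "線量の単位はグレイ(Gy)である。",  # The unit of absorbed dose is the gray (Gy).
]
question = "ブラッグピークを利用する治療法は何か。"  # Which therapy uses the Bragg peak?

print(build_prompt(question, passages))
```

Character n-grams avoid the need for a Japanese morphological analyzer; a production system would more likely use a search service or embedding-based vector search, which this sketch does not attempt to reproduce.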