JRadiEvo: A Japanese radiology report generation model enhanced by evolutionary optimization of model merging.
Authors
Affiliations (2)
- Department of Cardiovascular Medicine, The University of Tokyo Hospital, 7-3-1, Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan. Electronic address: [email protected].
- Department of Cardiovascular Medicine, The University of Tokyo Hospital, 7-3-1, Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan.
Abstract
Radiology report generation is an important application of artificial intelligence (AI), as the interpretation of medical images and the production of clinically relevant reports are time-consuming and cognitively demanding tasks, especially in high-volume settings such as chest X-ray screening. Recent advances in large language models (LLMs) and vision-language models (VLMs) have enabled substantial progress in automated medical report generation. However, most existing medical foundation models are trained primarily on English datasets, limiting their practicality in non-English-speaking regions such as Japan. Publicly available radiology datasets are overwhelmingly English, while constructing large-scale non-English datasets is costly and difficult because of translation effort and medical data privacy constraints. Existing adaptation methods, such as fine-tuning and continued pre-training, typically require large amounts of in-domain data. This makes them difficult to apply in low-resource medical language settings where large-scale annotated datasets are unavailable. Accordingly, this study asks: how can an accurate Japanese chest X-ray radiology report generator be developed without access to large-scale, curated Japanese medical image-report pairs? To address this challenge, we propose JRadiEvo, a Japanese chest X-ray report generation model built through evolutionary optimization of model merging. By combining pretrained models with complementary strengths in vision-language alignment, medical knowledge, and Japanese generation, JRadiEvo enables data-efficient adaptation without large-scale training. To the best of our knowledge, this is the first attempt to build a non-English medical vision-language model through evolutionary optimization of model merging. 
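As a rough illustration of what "evolutionary optimization of model merging" can mean, the sketch below searches for merge coefficients over three toy parameter vectors. The abstract does not specify the merge operator or the evolutionary algorithm, so everything here is an assumption: linear interpolation stands in for the merge, a simple elitist evolution strategy stands in for the optimizer, and the names `merge`, `evolve_weights`, and `fitness` are hypothetical. In a real setting the fitness would be a report-quality score computed on the small set of translated samples, not a distance to a known target.

```python
import random

def merge(models, weights):
    """Linearly interpolate parameter vectors using normalized weights (assumed merge rule)."""
    total = sum(weights)
    norm = [w / total for w in weights]
    return [sum(w * m[i] for w, m in zip(norm, models)) for i in range(len(models[0]))]

def evolve_weights(models, fitness, generations=50, offspring=8, sigma=0.2, seed=0):
    """Elitist evolution strategy: mutate the best weights, keep a child only if it scores higher."""
    rng = random.Random(seed)
    best = [1.0] * len(models)  # start from a uniform merge
    best_score = fitness(merge(models, best))
    for _ in range(generations):
        for _ in range(offspring):
            # Gaussian mutation, clipped so every weight stays positive
            child = [max(1e-6, w + rng.gauss(0.0, sigma)) for w in best]
            score = fitness(merge(models, child))
            if score > best_score:
                best, best_score = child, score
    total = sum(best)
    return [w / total for w in best], best_score

# Toy stand-ins for three pretrained models (vision-language, medical, Japanese).
models = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
target = [0.5, 0.3, 0.2]  # hypothetical "ideal" parameters, used only to define a toy fitness

def fitness(params):
    """Higher is better: negative squared error to the toy target."""
    return -sum((p - t) ** 2 for p, t in zip(params, target))

baseline = fitness(merge(models, [1.0, 1.0, 1.0]))
weights, score = evolve_weights(models, fitness)
print("merge weights:", [round(w, 3) for w in weights], "fitness:", score)
```

Because only a handful of merge coefficients are searched and no gradients are computed, a loop of this kind can be driven by a very small evaluation set, which is consistent with the data-efficiency argument made in the abstract.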
Despite using only 50 translated training samples from publicly available data, JRadiEvo outperforms CheXagent, a state-of-the-art model trained on approximately 8.5 million samples, on the ROUGE-L and METEOR metrics. These results provide a proof of concept for extremely data-efficient adaptation in low-resource medical languages.
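For readers unfamiliar with ROUGE-L, the following is a minimal sketch of the standard definition: an F-measure built on the longest common subsequence (LCS) between a generated report and a reference report. It is not the evaluation code used in the study. The function names are hypothetical, `beta=1.0` (a balanced F1) is an assumed setting, and the inputs are pre-tokenized lists; Japanese text would first need a tokenizer, which is omitted here.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l(candidate, reference, beta=1.0):
    """ROUGE-L F-measure from LCS-based precision and recall."""
    if not candidate or not reference:
        return 0.0
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)

# Illustrative English tokens; the sentences are made up for this example.
candidate = "no acute cardiopulmonary abnormality".split()
reference = "no acute abnormality".split()
print(rouge_l(candidate, reference))
```

In this example the LCS has length 3, so precision is 3/4 and recall is 1, giving an F1 of 6/7.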