Bridging the performance gap: systematic optimization of local LLMs for Japanese medical PHI extraction.

January 21, 2026

DOI: 10.1038/s41598-026-36904-5 PMID: 41565967

Authors

Wada A, Nishizawa M, Yamamoto A, Akashi T, Hagiwara A, Irie R, Hayakawa Y, Kikuta J, Shimoji K, Sano K, Nakanishi A, Kamagata K, Aoki S

Affiliations (4)

Department of Radiology, Juntendo University Graduate School of Medicine, 2-1-1 Hongo, Bunkyo-ku, Tokyo, 113-8421, Japan.
Department of Radiology, Juntendo University Urayasu Hospital, 2-1-1 Tomioka, Urayasu, Chiba, 279-0021, Japan.
Faculty of Health Data Science, Juntendo University Graduate School of Medicine, 6-8-1 Hinode, Urayasu, Chiba, 279-0013, Japan.
Department of Radiology, Juntendo University Graduate School of Medicine, 2-1-1 Hongo, Bunkyo-ku, Tokyo, 113-8421, Japan.

Abstract

Cloud-based Large Language Models (LLMs) excel at medical text processing, but privacy regulations impose significant constraints on transmitting Protected Health Information (PHI) to external services, creating barriers to AI adoption for many healthcare institutions. While contractual agreements (e.g., Business Associate Agreements under HIPAA) may permit such transmission under specific conditions, many institutions prefer or require complete data sovereignty. Local LLMs address this need but have historically underperformed. This study introduces a five-phase optimization framework to bridge this performance gap. Using 160 synthetic Japanese radiology reports, we benchmarked 14 local LLMs against cloud leaders. Our key finding is a notable performance pattern: models with baseline scores below 87-88 points gained an average of +6.92 points (p < 0.001), while higher-scoring models did not, suggesting a potential threshold effect for targeted optimization that warrants further investigation. The optimized Mistral-Small-3.2 with Self-Refine achieved 91.54 points (97.8% of GPT-4.1's performance) with perfect rule adherence and a clinically acceptable processing time of 24.6 s per report for batch workflows. Our work demonstrates that systematically optimized local LLMs can approach cloud-leader performance. Importantly, it provides a strategic framework guiding institutions on when and where to apply advanced optimization, enabling efficient and trustworthy AI deployment while ensuring patient privacy.
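
The figures quoted in the abstract can be cross-checked with simple arithmetic. The Python sketch below is illustrative only and is not the authors' code. It backs out the GPT-4.1 score implied by the 97.8% ratio (approximate, because the ratio is rounded), estimates the time to process all 160 reports assuming they are handled one after another, and encodes the reported threshold pattern as a hypothetical decision rule. The function names and the choice of 87.0 as the cut-off (the lower end of the 87-88 band) are assumptions made for this example.

```python
# Figures quoted in the abstract.
LOCAL_SCORE = 91.54        # optimized Mistral-Small-3.2 with Self-Refine
RATIO_TO_GPT41 = 0.978     # "97.8% of GPT-4.1's performance" (rounded)
SECONDS_PER_REPORT = 24.6  # reported processing time per report
NUM_REPORTS = 160          # size of the synthetic report set

# Assumption: lower end of the reported 87-88 point band.
BASELINE_THRESHOLD = 87.0


def implied_reference_score(local_score: float, ratio: float) -> float:
    """Back out the reference score implied by a local score and its ratio."""
    return local_score / ratio


def batch_minutes(num_reports: int, seconds_per_report: float) -> float:
    """Total wall-clock minutes, assuming strictly sequential processing."""
    return num_reports * seconds_per_report / 60.0


def likely_to_benefit(baseline: float, threshold: float = BASELINE_THRESHOLD) -> bool:
    """Hypothetical rule: only models below the threshold showed average gains."""
    return baseline < threshold


if __name__ == "__main__":
    gpt41 = implied_reference_score(LOCAL_SCORE, RATIO_TO_GPT41)
    print(f"Implied GPT-4.1 score: about {gpt41:.1f} points")
    print(f"Remaining gap: about {gpt41 - LOCAL_SCORE:.1f} points")
    total = batch_minutes(NUM_REPORTS, SECONDS_PER_REPORT)
    print(f"Sequential batch time for {NUM_REPORTS} reports: {total:.1f} min")
    for baseline in (82.0, 90.0):
        print(f"Baseline {baseline}: optimize further? {likely_to_benefit(baseline)}")
```

The abstract describes the threshold as a potential effect that warrants further investigation, so the rule here is a planning heuristic rather than a validated cut-off.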


Topics

Journal Article
