
Identification of high-priority radiology reports with unexpected findings using fine-tuned large language models.

December 24, 2025

Authors

Umeno A, Nishio M, Matsuo H, Matsunaga T, Nogami M, Ueshima E, Sofue K, Murakami T

Affiliations (3)

  • Department of Radiology, Kobe University Graduate School of Medicine, Chuo-ku, Japan.
  • Department of Radiology, Kobe University Graduate School of Medicine, Chuo-ku, Japan. [email protected].
  • Division of Medical Imaging, Biomedical Imaging Research Center, University of Fukui, Yoshida, Japan.

Abstract

This study aims to evaluate whether large language models (LLMs) can accurately predict the urgency and severity of radiology reports. Based on the recommendations of the Academy of Royal Colleges, we defined radiology reports that include unexpected findings of high urgency or severity as "high-priority (HP) radiology reports." Overall, 1906 radiology reports were used as the training set, and 176 radiology reports were used as the test set, with a balanced ratio of HP to non-HP radiology reports (1:1) in both sets. Four LLMs (Llama2 7B, Llama3 8B, Llama3 Elyza 8B, and Llama 3.1 8B) were fine-tuned using four different input settings: (1) findings only, (2) findings + referring department, (3) findings + referring department + clinical diagnosis before examination, and (4) findings + referring department + clinical diagnosis before examination + details of examination request. The fine-tuned LLMs predicted whether each radiology report was HP or not. Among the four LLMs, Llama3 Elyza 8B, with inputs comprising findings and the referring department, demonstrated the best performance, achieving PRAUC = 0.962, ROCAUC = 0.968, accuracy = 0.915, sensitivity/recall = 0.932, specificity = 0.898, and F1 = 0.916. Adding the clinical diagnosis before examination and the details of the examination request did not necessarily improve performance. The fine-tuned LLMs accurately predicted HP radiology reports, suggesting their potential utility in supporting communication regarding radiology reports with high urgency or severity.

Question: This study aims to evaluate whether large language models (LLMs) can accurately identify high-priority (HP) radiology reports.

Findings: The best fine-tuned LLM accurately identified HP radiology reports, achieving a PRAUC of 0.962 and a ROCAUC of 0.968.

Clinical relevance: This study demonstrates that fine-tuned LLMs can accurately identify HP radiology reports, potentially improving timely clinical decision-making and enhancing patient safety through faster communication of critical findings.
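The threshold-based metrics reported for the best model can be cross-checked against the test-set size. The sketch below is an illustration, not the authors' code: it assumes the 176 test reports split exactly into 88 HP and 88 non-HP reports (the stated 1:1 ratio), and it infers confusion-matrix counts by rounding the reported sensitivity and specificity. The counts themselves are not given in the abstract, and the function name `metrics_from_counts` is hypothetical. PRAUC and ROCAUC depend on the full score distribution and cannot be reconstructed this way.

```python
def metrics_from_counts(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),           # recall on HP reports
        "specificity": tn / (tn + fp),           # recall on non-HP reports
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Assumption: the 1:1 balanced test set of 176 reports is 88 HP + 88 non-HP.
n_hp = n_non_hp = 176 // 2

# Inferred counts (not reported in the abstract): round the reported rates
# to the nearest whole number of reports.
tp = round(0.932 * n_hp)       # HP reports correctly flagged
tn = round(0.898 * n_non_hp)   # non-HP reports correctly passed
fn = n_hp - tp                 # HP reports missed
fp = n_non_hp - tn             # non-HP reports wrongly flagged

metrics = metrics_from_counts(tp, fn, tn, fp)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions, the inferred counts reproduce the reported accuracy (0.915) and F1 (0.916) to three decimal places, so the four reported threshold metrics are mutually consistent with a balanced 88/88 test set.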

Topics

Journal Article
