Automated MRI protocoling in neuroradiology in the era of large language models.

Authors

Reiner LN, Chelbi M, Fetscher L, Stöckel JC, Csapó-Schmidt C, Guseynova S, Al Mohamad F, Bressem KK, Nawabi J, Siebert E, Wattjes MP, Scheel M, Meddeb A

Affiliations (6)

  • Department of Neuroradiology, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353, Berlin, Germany.
  • Department of Neuroradiology, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353, Berlin, Germany.
  • Department of Radiology, Technical University Munich, Klinikum Rechts Der Isar, Ismaninger Str. 22, 81675, Munich, Germany.
  • Department of Radiology and Nuclear Medicine, Technical University Munich, German Heart Center Munich, Lazarethstr. 36, 80636, Munich, Germany.
  • Department of Neuroradiology, Hôpital Maison-Blanche, CHU Reims, Université Reims-Champagne-Ardenne, Reims, France.
  • Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Charitéplatz 1, 10117, Berlin, Germany.

Abstract

This study investigates the automation of MRI protocoling, a routine task in radiology, using large language models (LLMs). It compares an open-source model (Llama 3.1 405B) and a proprietary model (GPT-4o), each with and without retrieval-augmented generation (RAG), a method for incorporating domain-specific knowledge. This retrospective study included MRI studies conducted between January and December 2023, along with institution-specific protocol assignment guidelines. Clinical questions were extracted, and a neuroradiologist established the gold standard protocol. The LLMs were tasked with assigning MRI protocols and contrast medium administration with and without RAG. The results were compared to protocols selected by four radiologists. Token-based symmetric accuracy, the Wilcoxon signed-rank test, and the McNemar test were used for evaluation. Data from 100 neuroradiology reports (mean age, 54.2 years ± 18.41; 50% women) were included. RAG integration significantly improved accuracy in sequence and contrast media prediction for Llama 3.1 (sequences: 38% vs. 70%, P < .001; contrast media: 77% vs. 94%, P < .001) and GPT-4o (sequences: 43% vs. 81%, P < .001; contrast media: 79% vs. 92%, P = .006). With RAG, GPT-4o outperformed Llama 3.1 in MRI sequence prediction (81% vs. 70%, P < .001), with accuracy comparable to that of the radiologists (81% ± 0.21, P = .43). Both models equaled the radiologists in predicting contrast media administration (Llama 3.1 RAG: 94% vs. 91% ± 0.2, P = .37; GPT-4o RAG: 92% vs. 91% ± 0.24, P = .48). Large language models show great potential as decision-support tools for MRI protocoling, with performance similar to that of radiologists. RAG enhances the ability of LLMs to provide accurate, institution-specific protocol recommendations.
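
As an illustration of the kind of paired comparison the abstract reports (the same cases scored with and without RAG), the following is a minimal sketch of an exact two-sided McNemar test in Python. The function name `mcnemar_exact` and the toy correctness flags are hypothetical and are not taken from the study, whose own implementation and test variant are not described in the abstract.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Exact two-sided McNemar test for paired binary outcomes.

    correct_a, correct_b: equal-length sequences of booleans (or 0/1),
    one entry per case, saying whether each method was correct.
    Returns (only_a, only_b, p_value).
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("paired inputs must have the same length")

    # Only discordant pairs (exactly one method correct) carry information.
    only_a = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    only_b = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    n = only_a + only_b
    if n == 0:
        return only_a, only_b, 1.0

    # Under the null hypothesis, discordant pairs split as Binomial(n, 0.5).
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return only_a, only_b, min(1.0, 2 * tail)


# Toy data (hypothetical): 1 = contrast decision matched the gold standard.
without_rag = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
with_rag = [1, 1, 1, 1, 0, 1, 1, 1, 1, 1]
only_without, only_with, p_value = mcnemar_exact(without_rag, with_rag)
```

The test ignores cases where both configurations agree and asks only whether the disagreements are split evenly, which is why it suits a per-case correct/incorrect outcome such as contrast medium administration.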

Topics

Journal Article
