Implementing a Resource-Light and Low-Code Large Language Model System for Information Extraction from Mammography Reports: A Pilot Study.

Authors

Dennstädt F, Fauser S, Cihoric N, Schmerder M, Lombardo P, Cereghetti GM, von Däniken S, Minder T, Meyer J, Chiang L, Gaio R, Lerch L, Filchenko I, Reichenpfader D, Denecke K, Vojvodic C, Tatalovic I, Sander A, Hastings J, Aebersold DM, von Tengg-Kobligk H, Nairz K

Affiliations (12)

  • Department of Radiation Oncology, Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland.
  • School of Medicine, University of St. Gallen, St. Gallen, Switzerland.
  • Department of Radiation Oncology, Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland.
  • Department of Diagnostic, Interventional and Pediatric Radiology (DIPR), Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland.
  • Department of Neurology, Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland.
  • Institute for Patient-Centered Digital Health, Bern University of Applied Sciences, Biel/Bienne, Switzerland.
  • Faculty of Medicine, University of Geneva, Geneva, Switzerland.
  • Wemedoo AG, Steinhausen, Switzerland.
  • ID Berlin GmbH, Berlin, Germany.
  • School of Medicine, University of St. Gallen, St. Gallen, Switzerland.
  • Institute for Implementation Science in Health Care, University of Zurich, Zurich, Switzerland.
  • Swiss Institute of Bioinformatics, Lausanne, Switzerland.

Abstract

Large language models (LLMs) have been successfully used for data extraction from free-text radiology reports. Most current studies were conducted with LLMs accessed via an application programming interface (API). We evaluated the feasibility of using open-source LLMs deployed on limited local hardware resources for data extraction from free-text mammography reports, using a common data element (CDE)-based structure. Seventy-nine CDEs were defined by an interdisciplinary expert panel, reflecting real-world reporting practice. Sixty-one reports were classified by two independent researchers to establish ground truth. Five different open-source LLMs deployable on a single GPU were used for data extraction using the general-classifier Python package. Extractions were performed for five different prompt approaches with calculation of overall accuracy, micro-recall and micro-F1. Additional analyses were conducted using thresholds for the relative probability of classifications. High inter-rater agreement was observed between manual classifiers (Cohen's kappa 0.83). Using default prompts, the LLMs achieved accuracies of 59.2-72.9%. Chain-of-thought prompting yielded mixed results, while few-shot prompting led to decreased accuracy. Adaptation of the default prompts to precisely define classification tasks improved performance for all models, with accuracies of 64.7-85.3%. Setting certainty thresholds further improved accuracies to >90% but reduced the coverage rate to <50%. Locally deployed open-source LLMs can effectively extract information from mammography reports, maintaining compatibility with limited computational resources. Selection and evaluation of the model and prompting strategy are critical. Clear, task-specific instructions appear crucial for high performance. Using a CDE-based framework provides clear semantics and structure for the data extraction.
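
The abstract reports two quantities that are easy to misread without a concrete definition: inter-rater agreement (Cohen's kappa) and the trade-off between accuracy and coverage once a certainty threshold is applied. The following is a minimal sketch of how such figures could be computed, not the authors' code. It does not use the general-classifier package; the function names are hypothetical, and it assumes each model output is a (label, relative probability) pair with the probability in the range 0 to 1, which the abstract does not specify.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


def thresholded_accuracy(predictions, truth, threshold):
    """Accuracy on predictions at or above the threshold, plus coverage.

    predictions: list of (label, relative_probability) pairs (assumed format).
    Returns (accuracy, coverage); accuracy is None if nothing is retained.
    """
    kept = [
        (label, gold)
        for (label, prob), gold in zip(predictions, truth)
        if prob >= threshold
    ]
    coverage = len(kept) / len(truth)
    accuracy = sum(label == gold for label, gold in kept) / len(kept) if kept else None
    return accuracy, coverage


# Toy data, invented purely for illustration.
rater_a = ["yes", "yes", "no", "no"]
rater_b = ["yes", "no", "no", "no"]
kappa = cohen_kappa(rater_a, rater_b)

predictions = [("a", 0.9), ("b", 0.6), ("a", 0.95), ("b", 0.55)]
truth = ["a", "a", "a", "b"]
baseline = thresholded_accuracy(predictions, truth, 0.0)
strict = thresholded_accuracy(predictions, truth, 0.8)
```

On the toy data, raising the threshold from 0.0 to 0.8 discards the two low-confidence predictions, so accuracy rises while coverage halves. This is the same direction of trade-off the abstract describes, although the numbers here carry no relation to the study's results.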

Topics

Journal Article
