Creating a scalable CT yield metric for pulmonary embolisms in the emergency department using an open-source large language model

January 16, 2026

preprint

DOI: 10.64898/2026.01.13.26344087

Authors

Wiseman, B., Li, H., Mason, E., Lonergan, K., Mitchell, R., Hayward, J.

Affiliations (1)

University of Alberta

Abstract

Background: CT scans are the gold-standard diagnostic test for pulmonary embolisms (PE). Despite stable PE prevalence, CT use is rising in emergency departments (EDs), suggesting test overuse. Current methods for measuring test yield are error-prone or not scalable. We therefore tested the accuracy of an open-source, foundational large language model (LLM) for identifying PEs from free-text radiology reports.

Methods: Our retrospective diagnostic accuracy study used 10,173 CT-PE reports from 216 radiologists at 38 EDs across Alberta, Canada, from April 2021 to April 2023. Human labelers classified each report as PE present, PE absent, or Indeterminate. An LLM (LLAMA-2-70B) was then prompt-engineered to label the reports. The LLM's label accuracy was compared with that of ICD-10-CA codes and a rule-based natural language processing (NLP) algorithm (ChartExtract; University of Toronto). Results were analyzed with descriptive statistics.

Results: A total of 1,070 (11.8%) reports were PE positive. The LLM achieved an area under the curve (AUC) of 99.1%, outperforming both ICD-10-CA (AUC=90.6%) and ChartExtract (AUC=86.5%), with sensitivity 16 to 25 percentage points higher (LLM: sensitivity=98.8%, specificity=99.1%; ICD-10-CA: sensitivity=82.0%, specificity=99.1%; ChartExtract: sensitivity=73.6%, specificity=99.4%). The LLM took an average of 143 milliseconds to label each report and produce a paragraph justifying the classification.

Conclusions: Open-source, foundational LLMs are an accurate and scalable method for interpreting radiology reports and identifying PEs from ED data. If IT resources are available, this is a cost-effective approach to deriving quality metrics for diagnostic processes in large health systems.
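The accuracy figures in the abstract can be illustrated with a minimal sketch. It assumes binary labels (PE present or absent, with Indeterminate reports set aside) and assumes the AUC of a classifier that outputs a single hard label equals the mean of sensitivity and specificity; the abstract does not state how the authors computed AUC. The function name `binary_metrics` and the toy labels are hypothetical and are not study data.

```python
from typing import Dict, Sequence


def binary_metrics(truth: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Compare predicted PE labels against reference labels (True = PE present)."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")

    # Tally the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference label")

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Assumption: area under a ROC curve with a single operating point.
        "single_point_auc": (sensitivity + specificity) / 2,
        # Estimated CT yield: share of scans the classifier labels PE positive.
        "estimated_yield": (tp + fp) / len(truth),
    }


# Hypothetical toy labels, for illustration only.
truth = [True, True, False, False, False, False, True, False]
predicted = [True, False, False, False, False, True, True, False]
metrics = binary_metrics(truth, predicted)
```

Under the same assumption, the reported ICD-10-CA sensitivity (82.0%) and specificity (99.1%) average to 90.55%, which is in line with the reported AUC of 90.6%.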


Topics

health systems and quality improvement
