From text to tables: Zero-shot extraction of structured clinical data from free-text CT scan reports using foundational large language models

Authors

Hongslo, A., Gupta, A., Nguyen, Q., Caldwell, J., Choi, B., Harvey, C. J., Thompson, J., Mazzotti, D., Yao, Z., Noheria, A.

Affiliations (1)

  • Department of Cardiovascular Medicine, University of Kansas Medical Center, Kansas City, KS

Abstract

Background: Large language models (LLMs) are being explored for multiple applications in medical research, including medical text classification. We evaluate the performance of 5 off-the-shelf LLMs for classifying free-text CT angiography reports for pulmonary embolism (PE)-related diagnostic labels.

Methods: We assessed 1,025 manually labeled CT reports using 5 LLMs (ChatGPT-4o, Llama 3.3 70b, Llama 3.1 8b, Llama 3.2 3b, Mixtral 8x7b) with zero-shot prefix prompts. Labels included acute PE, bilateral PE, and large PE. Voting ensemble models combining multiple LLM outputs were also tested.

Results: Llama 3.3 70b and ChatGPT-4o outperformed smaller models for all classification tasks. Highest accuracies were 96.6% (acute PE), 92.7% (bilateral PE), and 82.6% (large PE). Voting ensemble models offered no or minimal improvement in classification performance.

Conclusions: Off-the-shelf LLMs, particularly larger ones, can classify free-text reports with high accuracy using simple prompts. Further work is needed to optimize prompting strategies and evaluate hybrid approaches.
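The abstract does not say how the voting ensembles were built. As a rough illustration only, the sketch below assumes a simple majority vote over per-report binary labels and scores it against manual labels. The model names, the toy labels, and the helper functions `majority_vote` and `accuracy` are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most frequent label; ties go to the earliest-listed model."""
    counts = Counter(votes)
    top = max(counts.values())
    for label in votes:
        if counts[label] == top:
            return label


def accuracy(predicted, reference):
    """Fraction of reports where the predicted label matches the manual label."""
    if len(predicted) != len(reference):
        raise ValueError("label lists must have the same length")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


# Invented toy data: 1 = acute PE present, 0 = absent, one entry per report.
# These are NOT results from the study.
model_outputs = {
    "model_a": [1, 0, 1, 1],
    "model_b": [1, 0, 0, 1],
    "model_c": [0, 0, 1, 0],
}
reference = [1, 0, 1, 0]  # hypothetical manual labels

# Combine the models report by report.
ensemble = [majority_vote(votes) for votes in zip(*model_outputs.values())]

for name, labels in model_outputs.items():
    print(name, accuracy(labels, reference))
print("ensemble", accuracy(ensemble, reference))
```

In this toy case the ensemble only matches the best single model, which is one way a voting scheme can fail to help: when the stronger models already agree on most reports, the vote adds little.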

Topics

health informatics
