From text to tables: Zero-shot extraction of structured clinical data from free-text CT scan reports using foundational large language models
Affiliations (1)
- Department of Cardiovascular Medicine, University of Kansas Medical Center, Kansas City, KS
Abstract
Background
Large language models (LLMs) are being explored for multiple applications in medical research, including medical text classification. We evaluate the performance of five off-the-shelf LLMs in classifying free-text CT angiography reports for pulmonary embolism (PE)-related diagnostic labels.
Methods
We assessed 1,025 manually labeled CT reports using five LLMs (ChatGPT-4o, Llama 3.3 70b, Llama 3.1 8b, Llama 3.2 3b, and Mixtral 8x7b) with zero-shot prefix prompts. Labels included acute PE, bilateral PE, and large PE. We also tested voting ensemble models that combined the outputs of multiple LLMs.
Results
Llama 3.3 70b and ChatGPT-4o outperformed the smaller models on all classification tasks. The highest accuracies were 96.6% (acute PE), 92.7% (bilateral PE), and 82.6% (large PE). Voting ensembles offered little or no improvement in classification performance.
Conclusions
Off-the-shelf LLMs, particularly larger ones, can classify free-text reports with high accuracy using simple prompts. Further work is needed to optimize prompting strategies and to evaluate hybrid approaches.
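The abstract names two techniques: zero-shot prefix prompting (a fixed instruction placed before each report, with no worked examples) and a voting ensemble over several models' outputs. The sketch below shows one way such a pipeline might be wired for a single binary label (acute PE). It is illustrative only and rests on assumptions not stated in the abstract: the prompt wording (`PROMPT_PREFIX`), a yes/no answer format, the helper names (`build_prompt`, `parse_label`, `majority_vote`, `accuracy`), and resolving ties to "no" are all hypothetical. The calls to the models themselves are omitted.

```python
import re
from collections import Counter
from typing import List, Optional

# Hypothetical zero-shot prefix prompt; the study's actual wording is not given.
PROMPT_PREFIX = (
    "Does the following CT angiography report describe an acute pulmonary "
    "embolism? Answer only 'yes' or 'no'.\n\nReport:\n"
)


def build_prompt(report_text: str) -> str:
    """Prepend the fixed instruction (the prefix) to the free-text report."""
    return PROMPT_PREFIX + report_text


def parse_label(response: str) -> Optional[bool]:
    """Map a model reply to True/False if it starts with a whole word 'yes'/'no'.

    Anything else (e.g. 'not sure', 'unclear') is treated as unparseable (None).
    """
    match = re.match(r"\s*(yes|no)\b", response, flags=re.IGNORECASE)
    if match is None:
        return None
    return match.group(1).lower() == "yes"


def majority_vote(labels: List[Optional[bool]]) -> bool:
    """Return the label most models agree on, ignoring unparseable replies.

    Assumption: ties (and all-unparseable inputs) resolve to False.
    """
    counts = Counter(label for label in labels if label is not None)
    return counts[True] > counts[False]


def accuracy(predicted: List[bool], reference: List[bool]) -> float:
    """Fraction of reports whose predicted label matches the manual label."""
    if len(predicted) != len(reference):
        raise ValueError("prediction and reference lists differ in length")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Toy example: three reports, each answered by three hypothetical models.
model_outputs = [
    ["Yes.", "yes", "No"],     # two of three say yes
    ["no", "No.", "unclear"],  # one reply is unparseable
    ["Yes", "no", "maybe"],    # tie after dropping the unparseable reply
]
manual_labels = [True, False, True]

ensemble = [majority_vote([parse_label(r) for r in replies]) for replies in model_outputs]
print(ensemble)                          # [True, False, False]
print(accuracy(ensemble, manual_labels))  # 2 of 3 reports correct
```

Resolving ties to a negative label is only one conservative choice. With an odd number of models and no unparseable replies, a binary vote cannot tie.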