High-Performance Prompting for LLM Extraction of Compression Fracture Findings from Radiology Reports.

May 16, 2025

DOI: 10.1007/s10278-025-01530-6, PMID: 40379860

Authors

Kanani MM, Monawer A, Brown L, King WE, Miller ZD, Venugopal N, Heagerty PJ, Jarvik JG, Cohen T, Cross NM

Affiliations (4)

School of Medicine, University of Washington, Seattle, WA, USA.
Department of Radiology, University of Washington, Seattle, WA, USA.
Department of Biostatistics, University of Washington, Seattle, WA, USA.
Department of Biomedical Informatics, University of Washington, Seattle, WA, USA.

Abstract

Extracting information from radiology reports can supply critical data for many radiology workflows. For spinal compression fractures, these data can facilitate evidence-based care for at-risk populations. Manual extraction from free-text reports is laborious and error-prone. Large language models (LLMs) have shown promise; however, fine-tuning them to optimize performance on specific tasks can be resource-intensive. A variety of prompting strategies have achieved similar results with lower resource demands. Our study pioneers the use of Meta's Llama 3.1, together with prompt-based strategies, for automated extraction of compression fracture findings from free-text radiology reports, producing structured data without model training. We tested performance on a time-based sample of CT exams covering the spine, acquired across our healthcare enterprise from 2/20/2024 to 2/22/2024 (637 anonymized reports; patient ages 18-102; 47% female). Ground truth annotations were generated manually and used to evaluate three models (Llama 3.1 70B, Llama 3.1 8B, and Vicuna 13B) under nine prompting configurations, for a total of 27 model/prompt experiments. The highest F1 score (0.91) was achieved by Llama 3.1 70B when provided with a radiologist-written background, with similar results when the background was written by a separate LLM (0.86). Adding few-shot examples to these prompts had a variable impact on F1 (0.89 and 0.84, respectively). ROC-AUC and PR-AUC results were comparable. Our work demonstrated that an open-weights LLM excelled at extracting compression fracture findings from free-text radiology reports using prompt-based techniques, without requiring extensive manually labeled examples for model training.
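The abstract describes prompts that combine an optional background passage and optional few-shot examples with a report, followed by scoring the structured output against manual annotations using F1. A minimal sketch of that general pattern follows, under stated assumptions: the prompt wording, the `BACKGROUND` text, the JSON key `compression_fracture`, and the helpers `build_prompt`, `parse_answer`, and `f1_score` are all hypothetical illustrations, not the authors' prompts or code. The actual LLM call is omitted, and each report is simplified to a single yes/no label.

```python
import json
from typing import Optional

# Hypothetical background passage; the study's radiologist- and LLM-written
# backgrounds are not reproduced in the abstract.
BACKGROUND = (
    "A compression fracture is a loss of vertebral body height, "
    "often described as wedging or collapse."
)


def build_prompt(report: str, background: Optional[str] = None,
                 examples: Optional[list[tuple[str, bool]]] = None) -> str:
    """Assemble a zero- or few-shot prompt that asks for JSON output."""
    parts = []
    if background:
        parts.append(f"Background:\n{background}")
    # Each few-shot example pairs a report with its expected JSON answer.
    for ex_report, ex_label in examples or []:
        answer = json.dumps({"compression_fracture": ex_label})
        parts.append(f"Report:\n{ex_report}\nAnswer: {answer}")
    parts.append(
        "Does the following report describe a compression fracture? "
        'Respond only with JSON of the form {"compression_fracture": true|false}.'
    )
    parts.append(f"Report:\n{report}\nAnswer:")
    return "\n\n".join(parts)


def parse_answer(text: str) -> bool:
    """Parse the model's JSON reply; malformed output counts as negative (an assumption)."""
    try:
        return json.loads(text)["compression_fracture"] is True
    except (json.JSONDecodeError, KeyError, TypeError):
        return False


def f1_score(truth: list[bool], pred: list[bool]) -> float:
    """Harmonic mean of precision and recall for the positive (fracture) class."""
    tp = sum(t and p for t, p in zip(truth, pred))
    fp = sum((not t) and p for t, p in zip(truth, pred))
    fn = sum(t and (not p) for t, p in zip(truth, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(build_prompt("Mild L1 wedge deformity.", background=BACKGROUND))
    truth = [True, False, True, True]   # manual annotations (toy data)
    pred = [True, True, False, True]    # parsed model outputs (toy data)
    print(round(f1_score(truth, pred), 3))  # 0.667
```

In this sketch, swapping `background` between two texts and toggling `examples` on or off mirrors the kind of configuration comparison the abstract reports; treating unparseable replies as negatives is a design choice that penalizes recall rather than silently dropping reports.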

Journal Article