Using a Large Language Model-generated Prompt to Extract Features from Synthetic MRI Brain Scan Reports: A Cross-sectional Study.

February 19, 2026

Authors

Hanna JJ,Evans CS,Dennis CR,Lee KS,Lehmann CU,Medford RJ

Affiliations (6)

  • Department of Internal Medicine, ECU Brody School of Medicine, Greenville, North Carolina, United States.
  • Information Services, ECU Health, Greenville, North Carolina, United States.
  • Clinical Informatics Center, University of Texas Southwestern, Dallas, Texas, United States.
  • Department of Emergency Medicine, ECU Brody School of Medicine, Greenville, North Carolina, United States.
  • ECU Health Neurosurgery and Spine, ECU Health, Greenville, North Carolina, United States.
  • Department of Pediatrics, University of Texas Southwestern, Dallas, Texas, United States.

Abstract

Feature extraction from free-text medical reports is frequently required for clinical, operational, or research purposes. Large language models (LLMs) hold promise for automating feature extraction, which can also enable category assignment tasks.

To compare the groundedness of features extracted by five LLMs from magnetic resonance imaging (MRI) brain scan reports using a clinician-engineered versus an LLM-generated prompt.

Five OpenAI LLMs were evaluated for their ability to extract nine binary features from synthetic MRI brain reports. Two types of prompts were used: one clinician-engineered and one LLM-generated. Recall, precision, accuracy, and F1 score were calculated to assess model performance.

Across all extracted features, all studied models, and both tested prompts, the overall average recall was 0.956, the average precision was 0.9347, the average accuracy was 0.982, and the average F1 score was 0.9431. Using GPT-3.5-turbo, the LLM-generated prompt had better numerical performance than the clinician-engineered prompt. For the other four GPT-4 models examined, overall recall, precision, and accuracy were higher regardless of the prompt source.

This study highlights the potential of LLMs to generate prompts and accurately extract features, with newer models like GPT-4 performing consistently well. The efficacy of feature extraction by LLMs depends on the engineered prompt and the model used. Our experimentation demonstrates the potential of LLMs to engineer prompts and extract features from MRI brain scan reports.
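The four metrics named in the abstract have standard definitions for a binary feature. The following is a minimal sketch of how they could be computed for one feature, not the authors' evaluation code. The function name `binary_metrics`, the `BinaryMetrics` container, and the toy labels are all hypothetical, and the convention of returning 0.0 when a denominator is zero is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    recall: float
    precision: float
    accuracy: float
    f1: float


def _safe_div(num: float, den: float) -> float:
    # Assumption: report 0.0 when a metric is undefined (zero denominator).
    return num / den if den else 0.0


def binary_metrics(truth: Sequence[int], predicted: Sequence[int]) -> BinaryMetrics:
    """Compute recall, precision, accuracy, and F1 for one binary feature.

    `truth` holds the reference labels and `predicted` the labels extracted
    by a model, one entry per report (1 = feature present, 0 = absent).
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)

    recall = _safe_div(tp, tp + fn)
    precision = _safe_div(tp, tp + fp)
    accuracy = _safe_div(tp + tn, len(truth))
    f1 = _safe_div(2 * precision * recall, precision + recall)
    return BinaryMetrics(recall, precision, accuracy, f1)


# Toy labels for a single hypothetical feature across eight reports.
truth = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
print(binary_metrics(truth, predicted))
```

Averaging such per-feature values across features, models, and prompts would give overall figures of the kind reported, although the abstract does not state which averaging scheme was used.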

Topics

Journal Article
