
Large Language Models for Accurate Medical Chart Abstraction: Enabling Scalable and Secure AI Deployment in Stroke.

February 20, 2026

Authors

Zhong Z, Porto CM, Hong D, Gonzales M, Kim R, Karayi G, Bi L, Feler JR, Shaaya E, Collins S, Shu L, Baird G, Jayaraman M, Yaghi S, Jiao Z, Wolman DN

Affiliations

  • From the Warren Alpert Medical School of Brown University (Z.Z., C.M.P., M.G., R.K., G.K., L.B., J.R.F., E.S., S.C., L.S., G.B., M.J., S.Y., Z.J., D.N.W.), Department of Radiology (Z.Z., G.B., M.J., Z.J.), Brown University Health, Department of Interventional Radiology (D.N.W.), Brown University Health, Department of Neurosurgery (J.R.F.), Brown University Health, Providence 02903, USA.

Abstract

Medical chart abstraction plays a critical role in clinical research and quality monitoring by transforming unstructured narratives in procedure reports into structured variables for large-scale analysis. The purpose of this study was to develop and evaluate a prompting-based large language model (LLM) framework for automated extraction of structured clinical variables from neurointerventional procedure reports in patients with acute ischemic stroke (AIS) due to large-vessel occlusion (LVO). This retrospective study included 2,416 free-text neurointerventional acute stroke intervention (thrombectomy) reports with key radiology findings from three hospitals. Eight clinically relevant variables were annotated by hospital staff without formal clinical training and used as the non-expert reference standard. Twenty-two instruction-tuned open-source LLMs (e.g., LLaMA, Qwen, Gemma) were evaluated across architectures, model sizes, and biomedical adaptations using two prompting strategies: Quick Response and Chain-of-Thought (CoT). Model performance was benchmarked against non-expert staff annotations and medical expert ratings; extraction accuracy, latency, and agreement with expert adjudication were assessed. LLaMA3.3-70B achieved the highest overall accuracy (94.8%). CoT prompting improved performance on inferential variables (e.g., site of occlusion), while Quick Response was optimal for directly stated procedural fields (e.g., stent placement). Expert adjudication confirmed that LLaMA3.3-70B outperformed non-expert annotations on 7 of 8 variables and matched the performance of junior medical students. Annotation accuracy increased with clinical experience, and AI predictions aligned more closely with expert interpretations than with those of non-expert staff, especially for structured variables such as IV tPA, TICI Post, and NIHSS.
Prompted LLMs can accurately and scalably extract critical clinical information from neurovascular radiology reports without custom preprocessing, supporting integration into retrospective research pipelines and automated stroke registry curation.
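The abstract does not publish the study's actual prompts, but the two strategy shapes it describes can be sketched as plain templates. The template text, variable names, and `build_prompt` helper below are illustrative assumptions, not the authors' implementation: a Quick Response prompt asks for the value directly (suited to explicitly stated fields such as stent placement), while a CoT prompt asks the model to reason before answering (suited to inferential variables such as site of occlusion).

```python
# Hypothetical sketch of the two prompting strategies described in the
# abstract; the real study prompts are not published and may differ.

QUICK_RESPONSE_TEMPLATE = (
    "Extract the value of '{variable}' from the procedure report below.\n"
    "Answer with the value only.\n\n"
    "Report:\n{report}\n"
    "Answer:"
)

COT_TEMPLATE = (
    "Extract the value of '{variable}' from the procedure report below.\n"
    "First reason step by step about the relevant findings, then give the\n"
    "final value on the last line in the form 'Answer: <value>'.\n\n"
    "Report:\n{report}\n"
)

def build_prompt(report: str, variable: str, strategy: str = "quick") -> str:
    """Fill the chosen template.

    'quick' mimics the Quick Response strategy for directly stated fields;
    'cot' mimics Chain-of-Thought for variables requiring inference.
    """
    template = QUICK_RESPONSE_TEMPLATE if strategy == "quick" else COT_TEMPLATE
    return template.format(variable=variable, report=report)
```

In use, the filled prompt would be sent to an instruction-tuned LLM and the returned text parsed for the final value; per the abstract's findings, a pipeline might route directly stated fields through the quick template and inferential ones through the CoT template.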

Topics

Journal Article
