Predicting Acute Cerebrovascular Events in Stroke Alerts Using Large-Language Models and Structured Data

September 29, 2025

DOI: 10.1101/2025.09.28.25336852

Authors

Erekat, A., Downes, M. H., Stein, L. K., Delman, B. N., Karp, A. M., Tripathi, A., Nadkarni, G. N., Kupersmith, M. J., Kummer, B. R.

Affiliations (1)

Icahn School of Medicine at Mount Sinai

Abstract

Background: Acute stroke alerts are often activated for non-cerebrovascular conditions. These false positives strain clinical resources and promote diagnostic uncertainty. We sought to develop machine learning (ML) models that integrate large language models (LLMs), structured electronic health record data, and clinical time-series data to predict the presence of acute cerebrovascular disease (ACD) at stroke alert activation.

Methods: We derived a series of ML models using retrospective data from stroke alerts activated at Mount Sinai Health System between 2011 and 2021. We extracted structured data (demographics, medical comorbidities, medications, and engineered time-series features from vital signs and lab results) as well as unstructured clinical notes available before the time of the stroke alert. We processed clinical notes using three embedding approaches: word embeddings, biomedical embeddings (BioWordVec), and LLMs. Using a radiographic gold standard for acute intracranial vascular events, we used an auto-ML approach to train one model based on unstructured data and five models based on different combinations of structured data. We evaluated each model individually using the area under the receiver operating characteristic curve (AUROC), mean positive predictive value (PPV), sensitivity, and F1-score. We then combined the six models into a multimodal ensemble by weighting each model's logits according to its F1-score, and evaluated ensemble performance using the same metrics.

Results: We identified 16,512 stroke alerts corresponding to 14,233 unique patients over the study period. Of these alerts, 9,013 (54.6%) were due to ACD. The multimodal model (AUROC 0.72, PPV 0.68, sensitivity 0.76, F1 0.72) outperformed all individual models on AUROC. One structured model based on demographics, comorbidities, and medications demonstrated the highest sensitivity (0.95).

Conclusions: We developed a multimodal ML model to predict ACD at stroke alert activation. This approach shows promise for optimizing stroke triage and reducing false-positive activations.
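The abstract does not give the exact ensembling formula. The following is therefore only a minimal sketch under stated assumptions and is not the authors' implementation. It assumes that each model emits one logit per stroke alert and that the F1-score weights are normalized to sum to 1. It also assumes that an alert is classified as ACD when the ensemble probability is at least 0.5. The sketch shows how an F1-weighted logit ensemble could be formed and how the reported metrics (PPV, sensitivity, F1-score, AUROC) follow from its output. The function names (`f1_weighted_ensemble`, `binary_metrics`, `auroc`) and the toy data are hypothetical.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Map a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def f1_weighted_ensemble(logits: Sequence[Sequence[float]],
                         f1_scores: Sequence[float]) -> list[float]:
    """Combine per-model logits into one ensemble logit per alert.

    logits[m][i] is model m's logit for alert i. ASSUMPTION: each model's
    weight is its F1-score divided by the sum of all F1-scores.
    """
    total = sum(f1_scores)
    weights = [f / total for f in f1_scores]
    n_alerts = len(logits[0])
    return [sum(w * model[i] for w, model in zip(weights, logits))
            for i in range(n_alerts)]


def binary_metrics(y_true, probs, threshold=0.5):
    """PPV, sensitivity and F1 at a fixed (assumed) probability threshold."""
    preds = [int(p >= threshold) for p in probs]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sens = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * ppv * sens / (ppv + sens) if ppv + sens else 0.0
    return {"ppv": ppv, "sensitivity": sens, "f1": f1}


def auroc(y_true, scores):
    """AUROC as the chance a random positive outranks a random negative (ties = 1/2)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, made-up data: two hypothetical models over six alerts (1 = ACD on imaging).
y = [1, 1, 1, 0, 0, 0]
model_a = [2.0, 0.5, -0.5, 0.2, -1.0, -2.0]
model_b = [1.0, 1.5, 0.5, -0.5, 0.5, -1.5]
ens = f1_weighted_ensemble([model_a, model_b], f1_scores=[0.6, 0.4])
probs = [sigmoid(z) for z in ens]
print(binary_metrics(y, probs))
print(round(auroc(y, probs), 3))  # 0.889
```

The pairwise AUROC loop is quadratic in the number of alerts and is meant only for illustration. On a cohort of this size, a library routine such as scikit-learn's `roc_auc_score` would be the usual choice.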


Topics

neurology
