
Using natural language processing to extract carotid stenosis severity from clinical notes to create a nationwide veteran cohort.

October 31, 2025 · PubMed

Authors

Lee KM, Alba PR, Biagetti GM, Gao A, Yin M, Danilov PN, DiNatale T, Perez C, Hartmann K, Shakt GE, Judy RL, Bellomo TR, Pridgen KM, Tsao PS, Balasundaram N, Levin MG, Damrauer SM, Lynch JA

Affiliations (10)

  • VA Informatics and Computing Infrastructure (VINCI), VA Salt Lake City Health Care System, Salt Lake City.
  • Division of Epidemiology, Department of Internal Medicine, University of Utah School of Medicine, Salt Lake City.
  • Department of Surgery, Hospital of the University of Pennsylvania.
  • Department of Radiology, Hospital of the University of Pennsylvania.
  • Corporal Michael J. Crescenz VA Medical Center, Philadelphia.
  • Department of Surgery, Perelman School of Medicine and the University of Pennsylvania, Philadelphia.
  • Division of Vascular and Endovascular Surgery, Massachusetts General Hospital, Boston.
  • Palo Alto Epidemiology Research and Information Center for Genomics, VA Palo Alto, Palo Alto.
  • Department of Medicine, Stanford University School of Medicine, Stanford.
  • Department of Medicine, Perelman School of Medicine and the University of Pennsylvania, Philadelphia.

Abstract

The prevalences of moderate and severe asymptomatic carotid stenosis (ie, atherosclerotic narrowing of the extracranial carotid arteries) are approximately 6% and 2%, respectively. Most prior studies of carotid stenosis risk factors have been small. This study describes the development and validation of a natural language processing (NLP) tool that identifies the presence and severity of carotid stenosis, and applies it to identify significant risk factors.

We created an NLP tool to extract the ratio of the peak systolic velocity of the internal carotid artery to that of the common carotid artery (ICA/CCA ratio) from the clinical notes of veterans who underwent carotid duplex ultrasound examination in the Veterans Health Administration from 2001 to 2020. Among those with at least one valid ICA/CCA ratio, we classified carotid stenosis severity as <50%, 50%-69%, or ≥70% based on an ICA/CCA ratio of <2, ≥2 to <4, or ≥4, respectively. We then assessed the association of the presence and severity of carotid stenosis with clinical and demographic characteristics, including age, sex, self-identified race and ethnicity, smoking status, body mass index, systolic and diastolic blood pressures, indicator variables for pre-existing hypertension, coronary heart disease, and type 2 diabetes, and selected laboratory measures (ie, hemoglobin A1c, low-density lipoprotein cholesterol, high-density lipoprotein cholesterol, triglycerides, and creatinine).

The F1 score (the harmonic mean of precision and recall) of the NLP tool was 0.907 for the right-side value, 0.882 for the left-side value, and 0.920 for the maximum value. Among the 290,517 veterans in the cohort, the median age was 68.2 years. Black patients had 16% lower odds of more severe carotid stenosis (odds ratio: 0.84; 95% confidence interval: 0.81-0.87; P < .001). All patient-level risk factors except high-density lipoprotein cholesterol were significantly associated with carotid stenosis severity.

The NLP tool performed well, and analyses of the NLP-derived cohort largely confirm the risk factors identified by previous smaller studies, demonstrating the utility of big data and NLP in carotid stenosis research. (JVS-Vascular Insights 2025;3:100302.)
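The severity rule stated in the abstract is simple enough to write down directly. The following is a minimal Python sketch, not the authors' NLP pipeline: it assumes the ICA/CCA ratio (or the two peak systolic velocities) has already been extracted from the note, and the helper names `ica_cca_ratio`, `stenosis_category`, and `patient_category` are hypothetical. The patient-level rule, which grades a patient by the larger of the right and left ratios, is also an assumption; the abstract reports F1 scores for right, left, and maximum values but does not say how the two sides were combined for severity.

```python
from typing import Optional


def ica_cca_ratio(ica_psv_cm_s: float, cca_psv_cm_s: float) -> float:
    """Ratio of internal to common carotid peak systolic velocity (unitless)."""
    if cca_psv_cm_s <= 0:
        raise ValueError("CCA peak systolic velocity must be positive")
    return ica_psv_cm_s / cca_psv_cm_s


def stenosis_category(ratio: float) -> str:
    """Map an ICA/CCA ratio to the abstract's bins: <2, >=2 to <4, >=4."""
    if ratio < 0:
        raise ValueError("ICA/CCA ratio must be non-negative")
    if ratio < 2:
        return "<50%"
    if ratio < 4:
        return "50%-69%"
    return ">=70%"


def patient_category(right: Optional[float], left: Optional[float]) -> Optional[str]:
    """Hypothetical patient-level grade: use the larger valid side.

    Assumption: a missing side is passed as None; a patient with no
    valid ratio on either side gets no category.
    """
    valid = [r for r in (right, left) if r is not None]
    if not valid:
        return None
    return stenosis_category(max(valid))


if __name__ == "__main__":
    # Made-up illustrative velocities in cm/s, not study data.
    print(stenosis_category(ica_cca_ratio(250, 100)))
    print(patient_category(1.8, 4.2))
```

For example, made-up velocities of 250 cm/s (ICA) and 100 cm/s (CCA) give a ratio of 2.5, which falls in the 50%-69% bin. The half-open comparisons (`< 2`, `< 4`) keep the bins non-overlapping, matching the "≥2 to <4" boundary stated in the abstract.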

Topics

Journal Article
