Optimizing the Accuracy of Natural Language Processing Tools for Pulmonary Embolism Detection Through Integration with Claims Data: The PE-EHR+ Study.

January 28, 2026

DOI: 10.1055/a-2796-1975 PMID: 41605431

Authors

Rashedi S, Bukhari S, Krishnathasan D, Khairani CD, Bejjani A, Pfeferman MB, Malejczyk J, Zarghami M, Secemsky E, Rahaghi FN, Hussain M, Mojibian H, Goldhaber S, Jiménez D, Monreal M, Yang R, Zhou L, Piazza G, Krumholz H, Wang L, Bikdeli B

Affiliations (19)

Thrombosis Research Group, Brigham and Women's Hospital, Boston, United States.
Division of Cardiology, Johns Hopkins University, Baltimore, United States.
Department of Internal Medicine, University of Pittsburgh Medical Center Health System, Pittsburgh, United States.
Medicine, Jamaica Hospital Medical Center, Jamaica, United States.
Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital Department of Medicine, Boston, United States.
Department of Medicine, Beth Israel Deaconess Medical Center Richard A and Susan F Smith Center for Outcomes Research in Cardiology, Boston, United States.
Harvard Medical School, Boston, United States.
Division of Pulmonary and Critical Care, Brigham and Women's Hospital Department of Medicine, Boston, United States.
Division of Vascular and Endovascular Surgery and the Center for Surgery and Public Health, Brigham and Women's Hospital, Boston, United States.
Department of Radiology and Biomedical Imaging, Yale University, New Haven, United States.
Division of Cardiovascular Medicine, Brigham and Women's Hospital, Boston, United States.
Respiratory Division, Medicine Department, Ramón y Cajal Hospital, IRYCIS and Alcalá de Henares University, Madrid, Spain.
Medicine Department, Universidad de Alcalá, Alcala de Henares, Spain.
CIBER Enfermedades Respiratorias (CIBERES), Instituto de Salud Carlos III, Madrid, Spain.
Catedra de Enfermedad Tromboembolica, Universidad Catolica San Antonio de Murcia Facultad de Ciencias de la Salud, Barcelona, Spain.
Yale School of Medicine Center for Outcomes Research & Evaluation, New Haven, United States.
Section of Cardiovascular Medicine, Yale School of Medicine Department of Internal Medicine, New Haven, United States.
Department of Health Policy and Management, Yale School of Public Health Department of Health Policy & Management, New Haven, United States.
Yale School of Public Health Department of Health Policy & Management, New Haven, United States.

Abstract

Background: Rule-based natural language processing (NLP) tools can identify pulmonary embolism (PE) via radiology reports. However, their external validity remains uncertain.

Methods: In this cross-sectional study, 1,712 hospitalized patients (with and without PE) at Mass General Brigham (MGB) hospitals (2016-2021) were analyzed. Two previously published NLP algorithms were applied to radiology reports to identify PE. Chart review by two physicians was the reference standard. We tested three approaches: (A) NLP applied to all patients; (B) NLP limited to radiology reports of patients with principal or secondary International Classification of Diseases 10th revision (ICD-10) PE discharge codes; and (C) NLP applied to patients with PE discharge codes or a Present-on-Admission (POA) indicator ("Y") for PE. In Approaches B and C, all other patients were assumed PE-negative to minimize NLP false positives. Weighted estimates were derived from the MGB hospitalized cohort (n=381,642) to calculate F1 scores, defined as the harmonic mean of sensitivity and positive predictive value (PPV).

Results: In Approach A, both NLP tools showed high sensitivity (82.5%, 93.0%) and specificity (98.9%, 98.7%) but low PPV (60.3%, 59.6%). Approach B improved PPV (95.2%, 94.9%) but reduced sensitivity (74.1%, 76.2%), while Approach C preserved both high sensitivity (82.5%, 93.0%) and PPV (95.6%, 95.8%). Approach C demonstrated the best performance, yielding significantly higher F1 scores for both NLP tools (88.6%, 94.4%) compared with Approach A (69.7%, 72.6%) and Approach B (83.3%, 84.5%) (P<0.001).

Conclusion: The accuracy of PE detection improves when rule-based NLP algorithms are operationalized using administrative claims data in addition to radiology reports.
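The three approaches and the F1 calculation can be illustrated with a minimal Python sketch. This is not the study's code: the Patient fields, the function names, and the exact gating logic are assumptions based only on the wording of the abstract. The F1 helper uses the definition given in the abstract (harmonic mean of sensitivity and PPV) and is applied to the rounded percentages reported there.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # All field names are hypothetical, chosen for this illustration only.
    nlp_positive: bool       # rule-based NLP flagged PE in a radiology report
    has_pe_icd10_code: bool  # principal or secondary ICD-10 PE discharge code
    pe_poa_yes: bool         # Present-on-Admission indicator "Y" for PE


def predict_pe(p: Patient, approach: str) -> bool:
    """Return the PE label under Approach A, B, or C (assumed reading of the abstract)."""
    if approach == "A":
        # NLP applied to all patients.
        return p.nlp_positive
    if approach == "B":
        # NLP only counts for patients with a PE discharge code;
        # everyone else is assumed PE-negative.
        return p.has_pe_icd10_code and p.nlp_positive
    if approach == "C":
        # NLP counts for patients with a PE discharge code or POA "Y";
        # everyone else is assumed PE-negative.
        return (p.has_pe_icd10_code or p.pe_poa_yes) and p.nlp_positive
    raise ValueError(f"unknown approach: {approach!r}")


def f1_score(sensitivity: float, ppv: float) -> float:
    """Harmonic mean of sensitivity and PPV (same units in, same units out)."""
    return 2 * sensitivity * ppv / (sensitivity + ppv)


# (sensitivity %, PPV %) pairs for the two NLP tools, as reported in the abstract.
reported = {
    "A": [(82.5, 60.3), (93.0, 59.6)],
    "B": [(74.1, 95.2), (76.2, 94.9)],
    "C": [(82.5, 95.6), (93.0, 95.8)],
}

for approach, tools in reported.items():
    scores = [round(f1_score(sens, ppv), 1) for sens, ppv in tools]
    print(approach, scores)
```

Recomputing F1 from the rounded sensitivity and PPV values reproduces the reported F1 scores to one decimal place (for example, 69.7% and 72.6% for Approach A), which is a quick consistency check on the abstract's figures.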


Topics

Journal Article
