Development and External Validation of a Detection Model to Retrospectively Identify Patients With Acute Respiratory Distress Syndrome.
Levy E, Claar D, Co I, Fuchs BD, Ginestra J, Kohn R, McSparron JI, Patel B, Weissman GE, Kerlin MP, Sjoding MW
Jun 1 2025

The aim of this study was to develop and externally validate a machine-learning model that retrospectively identifies patients with acute respiratory distress syndrome (ARDS) using electronic health record (EHR) data.

In this retrospective cohort study, ARDS was identified by physician adjudication in three cohorts of patients with hypoxemic respiratory failure (training, internal validation, and external validation). Machine-learning models were trained to classify ARDS using vital signs, respiratory support, laboratory data, medications, chest radiology reports, and clinical notes. The best-performing models were internally and externally validated using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve, the integrated calibration index (ICI), sensitivity, specificity, positive predictive value (PPV), and the timing of ARDS identification. The study population comprised patients with hypoxemic respiratory failure undergoing mechanical ventilation within two distinct health systems. There were no interventions.

There were 1,845 patients in the training cohort, 556 in the internal validation cohort, and 199 in the external validation cohort; ARDS prevalence was 19%, 17%, and 31%, respectively. Regularized logistic regression models analyzing structured data alone (EHR model) or structured data plus radiology reports (EHR-radiology model) had the best performance. During internal and external validation, the EHR-radiology model had an AUROC of 0.91 (95% CI, 0.88-0.93) and 0.88 (95% CI, 0.87-0.93), respectively. Externally, the ICI was 0.13 (95% CI, 0.08-0.18). At a specified model threshold, sensitivity and specificity were 80% (95% CI, 75%-98%), PPV was 64% (95% CI, 58%-71%), and the model identified patients a median of 2.2 hours (interquartile range, 0.2-18.6 hours) after they met Berlin ARDS criteria.

Machine-learning models analyzing EHR data can retrospectively identify patients with ARDS across different institutions.
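The abstract reports AUROC, sensitivity, specificity, and PPV without defining them. What follows is a minimal sketch of how these are conventionally computed from binary ARDS labels and model scores. It is not the authors' code: the function names are hypothetical, and it uses only the Python standard library. The last function applies Bayes' rule to check whether the reported external PPV agrees with the reported point estimates for sensitivity, specificity, and prevalence. That check assumes the 31% external prevalence applies at the chosen threshold and ignores confidence intervals. Calibration (ICI) is not covered here.

```python
from typing import Dict, Sequence


def threshold_metrics(y_true: Sequence[int], y_score: Sequence[float],
                      threshold: float) -> Dict[str, float]:
    """Sensitivity, specificity, and PPV when a score >= threshold is called ARDS.

    Hypothetical helper for illustration; y_true holds 1 (ARDS) or 0 (no ARDS).
    """
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        predicted_ards = s >= threshold
        if predicted_ards and y == 1:
            tp += 1
        elif predicted_ards and y == 0:
            fp += 1
        elif not predicted_ards and y == 0:
            tn += 1
        else:
            fn += 1
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,  # true-positive rate
        "specificity": tn / (tn + fp) if tn + fp else nan,  # true-negative rate
        "ppv": tp / (tp + fp) if tp + fp else nan,          # precision
    }


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as 1/2 (Mann-Whitney formulation); O(n_pos * n_neg), fine for a sketch.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ppv_from_prevalence(sensitivity: float, specificity: float,
                        prevalence: float) -> float:
    """Bayes' rule: PPV = sens*prev / (sens*prev + (1 - spec)*(1 - prev))."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Point estimates from the external validation cohort in the abstract.
    print(f"{ppv_from_prevalence(0.80, 0.80, 0.31):.3f}")  # 0.642
```

With 80% sensitivity, 80% specificity, and 31% prevalence, Bayes' rule gives a PPV of about 64%, which matches the reported external PPV. The comparison also shows why PPV depends on the cohort: the same sensitivity and specificity at the 17% to 19% prevalence of the other cohorts would give a lower PPV.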