Discriminating HFrEF vs HFpEF from chest radiographs: Mitigating demographic performance gaps via augmentation and multimodal fusion.
Authors
Affiliations (5)
- Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan.
- Faculty of Medicine and Dentistry, University of Alberta, Edmonton, Canada.
- Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America.
- Division of Pulmonary Critical Care and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts, United States of America.
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America.
Abstract
Deep learning models applied to chest radiography have shown promise for cardiac phenotyping, yet concerns remain regarding demographic performance disparities and deployment robustness. In this proof-of-concept study, we present a fairness-centered evaluation of chest X-ray-based differentiation between heart failure with reduced ejection fraction (HFrEF) and heart failure with preserved ejection fraction (HFpEF) using publicly available data. We constructed a cohort from MIMIC-CXR linked to MIMIC-IV ICD-10 phenotypes, restricted to cases with radiographic pulmonary edema, and evaluated image-only modeling, data augmentation, and a lightweight multimodal fusion strategy incorporating demographic and comorbidity information. A DenseNet121 image-only model achieved modest discrimination (AUC = 0.61-0.64) and exhibited measurable performance gaps across race, age, and sex subgroups. Standard data augmentation improved overall performance and reduced several subgroup disparities. Multimodal fusion further enhanced discrimination (AUC = 0.76) and reduced the largest observed demographic AUC gap by up to 83% (relative reduction). Threshold-level subgroup metrics, including sensitivity and specificity, together with calibration analyses, demonstrated more balanced error profiles across demographic groups at clinically relevant operating points. These findings highlight that simple, reproducible interventions can substantially improve both performance and equity in chest X-ray-based heart failure phenotyping on public datasets.
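The subgroup AUC gap and its relative reduction, as reported above, can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names (`auc`, `subgroup_aucs`, `auc_gap`, `relative_gap_reduction`) are hypothetical, and the gap is assumed here to be the difference between the best and worst subgroup AUC within one demographic attribute, which the abstract does not define explicitly.

```python
from collections import defaultdict


def auc(labels, scores):
    """AUC as the probability that a random positive case outscores a
    random negative case, counting ties as 0.5 (Mann-Whitney form)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def subgroup_aucs(labels, scores, groups):
    """Compute one AUC per demographic subgroup (e.g. per sex or age band)."""
    by_group = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        by_group[g][0].append(y)
        by_group[g][1].append(s)
    return {g: auc(ys, ss) for g, (ys, ss) in by_group.items()}


def auc_gap(aucs):
    """Assumed gap definition: best subgroup AUC minus worst subgroup AUC."""
    return max(aucs.values()) - min(aucs.values())


def relative_gap_reduction(gap_before, gap_after):
    """Fraction of the baseline gap removed by an intervention."""
    if gap_before <= 0:
        raise ValueError("baseline gap must be positive")
    return (gap_before - gap_after) / gap_before


# Toy, made-up predictions for two hypothetical subgroups "A" and "B".
labels = [0, 1, 0, 1, 0, 1, 0, 1]
scores = [0.2, 0.9, 0.3, 0.8, 0.6, 0.5, 0.2, 0.7]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

per_group = subgroup_aucs(labels, scores, groups)
print(per_group, auc_gap(per_group))
```

Under this definition, comparing `auc_gap` for the image-only model with `auc_gap` for the fused model via `relative_gap_reduction` yields a relative figure of the kind quoted in the abstract. The toy inputs are illustrative only and do not reproduce any reported result.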