Chest X-ray Foundation Model with Global and Local Representations Integration
Authors
Abstract
Chest X-ray (CXR) is the most frequently ordered imaging test, supporting diverse clinical tasks from thoracic disease detection to postoperative monitoring. However, task-specific classification models are limited in scope, require costly labeled data, and generalize poorly to out-of-distribution datasets. To address these challenges, we introduce CheXFound, a self-supervised vision foundation model that learns robust CXR representations and generalizes effectively across a wide range of downstream tasks. We pretrained CheXFound on a curated CXR-987K dataset comprising approximately 987K unique CXRs from 12 publicly available sources. For downstream adaptation, we propose a Global and Local Representations Integration (GLoRI) head, which combines fine- and coarse-grained disease-specific local features with global image features to improve multilabel classification. In our experiments, CheXFound outperformed state-of-the-art models in classifying 40 disease findings across different prevalence levels on the CXR-LT 24 dataset and exhibited superior label efficiency on downstream tasks with limited training data. Additionally, CheXFound achieved significant improvements on downstream tasks with out-of-distribution datasets, including opportunistic cardiovascular disease risk estimation, mortality prediction, malpositioned tube detection, and anatomical structure segmentation. These results demonstrate CheXFound's strong generalization capabilities, which will enable diverse downstream adaptations with improved label efficiency in future applications. The project source code is publicly available at https://github.com/RPIDIAL/CheXFound.
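To make the idea of integrating global and local representations concrete, the following is a minimal illustrative sketch, not the released CheXFound implementation. It assumes one learned query vector per finding that attention-pools the patch embeddings into a finding-specific local feature, which is then concatenated with the global image feature and passed to a per-finding linear classifier with a sigmoid output. The function name `glori_like_head`, all dimensions other than the 40 findings, and the random weights are hypothetical, and the fine- versus coarse-grained distinction mentioned in the abstract is not modeled here.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def glori_like_head(global_feat, patch_feats, queries, weights, biases):
    """Hypothetical head: one independent probability per finding."""
    dim = len(global_feat)
    probs = []
    for query, w, b in zip(queries, weights, biases):
        # Local feature: attention-pool the patches with this finding's query.
        attn = softmax([dot(query, p) / math.sqrt(dim) for p in patch_feats])
        local = [sum(a * p[i] for a, p in zip(attn, patch_feats))
                 for i in range(dim)]
        # Integration: concatenate the global and finding-specific features.
        fused = list(global_feat) + local
        # Multilabel output: a sigmoid per finding rather than a softmax.
        probs.append(sigmoid(dot(w, fused) + b))
    return probs


# Toy inputs standing in for encoder outputs (assumed shapes, random values).
random.seed(0)
dim, num_patches, num_findings = 8, 16, 40


def rand_vec(n):
    return [random.gauss(0.0, 1.0) for _ in range(n)]


global_feat = rand_vec(dim)
patch_feats = [rand_vec(dim) for _ in range(num_patches)]
queries = [rand_vec(dim) for _ in range(num_findings)]
weights = [rand_vec(2 * dim) for _ in range(num_findings)]
biases = [0.0] * num_findings

probs = glori_like_head(global_feat, patch_feats, queries, weights, biases)
print(len(probs))
```

Because each finding has its own query and classifier, the predicted probabilities are independent of one another, which suits multilabel CXR classification where several findings can co-occur in one image.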