Graph guided multiscale cross attention for multilabel chest X ray classification.

May 19, 2026

DOI: 10.1038/s41598-026-53115-0 PMID: 42156521

Authors

Shi G, Wang Z, Shi Y, Pan J, Sun L, Fang F, Jin L

Affiliations (3)

School of Medicine and Information Engineering, Anhui University of Chinese Medicine, Hefei, 230012, P.R. China.
School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, 230009, Anhui, P.R. China.
School of Medicine and Information Engineering, Anhui University of Chinese Medicine, Hefei, 230012, P.R. China. [email protected].

Abstract

Multi-label chest X-ray (CXR) classification is challenging because thoracic abnormalities vary substantially in scale, visual saliency, and anatomical distribution, while disease labels often exhibit clinically meaningful dependencies. We propose a visual-semantic framework that integrates heterogeneous visual representations with graph-guided label reasoning for image-level multi-label CXR classification. The visual encoder consists of a Vision Transformer (ViT) branch and a DenseNet-121 branch with complementary inductive biases: the ViT branch provides self-attention-based content-adaptive token representations, whereas the DenseNet branch provides hierarchical convolutional feature maps with explicit spatial layouts. A multi-scale bidirectional dual cross-attention fusion (DCAF) module aligns these two representations and enables bidirectional cross-representation interaction at the [Formula: see text] and [Formula: see text] stages to construct a fused visual memory. To model label dependencies, we construct an ML-GCN-style label graph whose edges are derived from training-set conditional co-occurrence statistics and whose node features are initialized using GloVe label-name embeddings. The resulting GCN-refined label embeddings initialize the label queries of a Transformer decoder, which retrieves label-specific evidence from the fused visual memory and predicts a single logits matrix for multi-label classification. The proposed method achieves a Mean AUC of 0.849 on ChestX-ray14 following its official evaluation protocol and 0.815 on CheXpert using an internal 70%/10%/20% training/validation/testing partition. Qualitative Grad-CAM visualizations on selected cases further suggest that the proposed framework tends to produce activation patterns consistent with manually indicated visually suspicious regions; these visualizations are not intended as a formal localization evaluation. 
Overall, the results indicate that cross-representation visual fusion and graph-guided label-query decoding provide complementary benefits for multi-label CXR classification.
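
As a rough illustration of the label-graph step described in the abstract, the sketch below estimates conditional co-occurrence statistics from a binary training-label matrix and thresholds them into directed edges. It is a minimal sketch, not the authors' implementation: the function name `conditional_cooccurrence`, the threshold `tau`, and the toy label matrix are assumptions introduced here. The abstract states only that edges are derived from training-set conditional co-occurrence statistics; thresholding is one common way to binarize such statistics in ML-GCN-style graphs.

```python
import numpy as np


def conditional_cooccurrence(labels: np.ndarray, tau: float = 0.4):
    """Estimate P(label j | label i) and threshold it into directed edges.

    labels: (num_images, num_labels) binary matrix of training labels.
    tau:    hypothetical binarization threshold (an assumption, not a
            value taken from the paper).
    Returns (prob, adj), both of shape (num_labels, num_labels).
    """
    labels = labels.astype(np.float64)
    counts = labels.sum(axis=0)        # N_i: images that carry label i
    co = labels.T @ labels             # M_ij: images that carry both i and j
    np.fill_diagonal(co, 0.0)          # ignore self co-occurrence

    # prob[i, j] = M_ij / N_i; rows for labels that never occur stay zero.
    denom = counts[:, None]
    prob = np.divide(co, denom, out=np.zeros_like(co), where=denom > 0)

    # Keep an edge i -> j only when the conditional probability is high enough.
    adj = (prob >= tau).astype(np.float64)
    return prob, adj


# Toy example with three hypothetical labels and four images.
toy_labels = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
])
prob, adj = conditional_cooccurrence(toy_labels, tau=0.7)
print(prob)
print(adj)
```

The conditional statistics are asymmetric: in the toy data, the second label always appears together with the first, but the first appears with the second in only two of three images, so thresholding yields a directed edge in one direction only. A graph convolution over such an adjacency matrix, with label-name embeddings as node features, would then produce the refined label embeddings that the abstract describes as initializing the decoder's label queries.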

Topics

Journal Article
