
Graph guided multiscale cross attention for multilabel chest X ray classification.

May 19, 2026

Authors

Shi G, Wang Z, Shi Y, Pan J, Sun L, Fang F, Jin L

Affiliations (3)

  • School of Medicine and Information Engineering, Anhui University of Chinese Medicine, Hefei, 230012, P.R. China.
  • School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, 230009, Anhui, P.R. China.
  • School of Medicine and Information Engineering, Anhui University of Chinese Medicine, Hefei, 230012, P.R. China.

Abstract

Multi-label chest X-ray (CXR) classification is challenging because thoracic abnormalities vary substantially in scale, visual saliency, and anatomical distribution, while disease labels often exhibit clinically meaningful dependencies. We propose a visual-semantic framework that integrates heterogeneous visual representations with graph-guided label reasoning for image-level multi-label CXR classification. The visual encoder consists of a Vision Transformer (ViT) branch and a DenseNet-121 branch with complementary inductive biases: the ViT branch provides self-attention-based content-adaptive token representations, whereas the DenseNet branch provides hierarchical convolutional feature maps with explicit spatial layouts. A multi-scale bidirectional dual cross-attention fusion (DCAF) module aligns these two representations and enables bidirectional cross-representation interaction at the [Formula: see text] and [Formula: see text] stages to construct a fused visual memory. To model label dependencies, we construct an ML-GCN-style label graph whose edges are derived from training-set conditional co-occurrence statistics and whose node features are initialized using GloVe label-name embeddings. The resulting GCN-refined label embeddings initialize the label queries of a Transformer decoder, which retrieves label-specific evidence from the fused visual memory and predicts a single logits matrix for multi-label classification. The proposed method achieves a Mean AUC of 0.849 on ChestX-ray14 following its official evaluation protocol and 0.815 on CheXpert using an internal 70%/10%/20% training/validation/testing partition. Qualitative Grad-CAM visualizations on selected cases further suggest that the proposed framework tends to produce activation patterns consistent with manually indicated visually suspicious regions; these visualizations are not intended as a formal localization evaluation. 
Overall, the results indicate that cross-representation visual fusion and graph-guided label-query decoding provide complementary benefits for multi-label CXR classification.
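
The label-graph step described in the abstract can be illustrated with a minimal sketch in the style of the original ML-GCN recipe: estimate conditional co-occurrence probabilities from training labels, binarize them, re-weight the edges, and pass label embeddings through one graph-convolution layer. The function names `cooccurrence_adjacency` and `gcn_layer`, the threshold `tau`, and the re-weighting factor `p` are assumptions chosen for illustration, not values or code from the paper. Random vectors stand in for the GloVe label-name embeddings.

```python
import numpy as np

def cooccurrence_adjacency(labels, tau=0.4, p=0.25):
    """Build an ML-GCN-style label adjacency from a binary label matrix.

    labels: (num_images, num_classes) array of 0/1 annotations.
    tau, p: hypothetical threshold and re-weighting factor (assumed values).
    """
    labels = np.asarray(labels, dtype=np.float64)
    counts = labels.T @ labels              # counts[i, j]: images with both i and j
    occurrences = np.diag(counts).copy()    # counts[i, i]: images with label i
    denom = occurrences[:, None]
    # cond[i, j] estimates P(label j | label i); rows of unseen labels stay zero.
    cond = np.divide(counts, denom, out=np.zeros_like(counts), where=denom > 0)
    binary = (cond >= tau).astype(np.float64)
    np.fill_diagonal(binary, 0.0)
    # Neighbours of a node share total weight p; the self-loop gets 1 - p.
    row_sum = binary.sum(axis=1, keepdims=True)
    adj = np.divide(p * binary, row_sum, out=np.zeros_like(binary), where=row_sum > 0)
    adj[np.diag_indices_from(adj)] = 1.0 - p
    return adj

def gcn_layer(adj, feats, weight):
    """One graph-convolution layer: ReLU(D^-1/2 A D^-1/2 H W)."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))   # degrees are >= 1 - p > 0
    norm_adj = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ feats @ weight, 0.0)

# Toy data: 4 images, 3 labels; labels 0 and 1 frequently co-occur.
toy_labels = np.array([[1, 1, 0],
                       [1, 1, 0],
                       [1, 0, 0],
                       [0, 0, 1]])
adj = cooccurrence_adjacency(toy_labels)

rng = np.random.default_rng(0)
label_embeddings = rng.normal(size=(3, 4))   # stand-in for GloVe vectors
weight = rng.normal(size=(4, 2))             # stand-in for a learned projection
refined = gcn_layer(adj, label_embeddings, weight)
print(adj)
print(refined.shape)
```

In a pipeline like the one the abstract outlines, rows of `refined` would serve as the initial label queries of the Transformer decoder. Because the adjacency is built from conditional probabilities, it is asymmetric in general, which lets a rare finding point strongly to a common one without the reverse holding.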

Topics

Journal Article
