An attention-infused deep convolutional paradigm for multi-label classification of thoracic pathologies in chest radiographs.
Authors
Affiliations (6)
- Department of Computer Science & IT, University of Malakand, Chakdara, Dir Lower, 18000, Khyber Pakhtunkhwa, Pakistan.
- Department of Software Engineering, University of Malakand, Chakdara, Dir Lower, 18000, Khyber Pakhtunkhwa, Pakistan.
- Quality Enhancement Cell, Shaheed Benazir Bhutto University, Sheringal, Dir Upper, 18050, Khyber Pakhtunkhwa, Pakistan.
- Research Center for Trusted Artificial Intelligence, ISP RAS, Moscow, Russia, 109004.
- Almazov National Medical Research Centre, St. Petersburg, Russia, 197341. [email protected].
- Intelligent Devices Institute, Saint Petersburg Electrotechnical University "LETI", St. Petersburg, Russia, 197022. [email protected].
Abstract
This research presents an empirical benchmarking study of an attention-infused deep convolutional framework for multi-label classification of thoracic pathologies in chest radiographs (CXRs), designed to emulate how radiologists selectively focus on suspicious regions. Despite the progress of deep convolutional neural networks (CNNs) in automated CXR analysis, conventional architectures process entire images uniformly and often miss fine-grained or subtle abnormalities that require localized visual emphasis, especially in multi-label contexts. To overcome these constraints, we introduce an attention-aware deep convolutional paradigm that integrates lightweight spatial attention modules into multiple high-performance CNN backbones, including ResNet101, EfficientNet-B0/B3, and MobileNetV2. The proposed spatial attention module specifically targets spatial feature reweighting to improve localization of fine-grained pathological regions; it does not explicitly model other inherent challenges of multi-label CXR classification such as label noise, class imbalance, or disease co-occurrence, which remain contextual factors of the task. The module is modeled on the way radiologists direct their attention, selectively accentuating diagnostically relevant areas and suppressing background noise. This improves feature recalibration at intermediate layers, allowing the network to learn global context and localized pathologies simultaneously at little computational cost. The models are trained and tested on the NIH ChestX-ray14 dataset in a multi-label setting, with sigmoid outputs and a binary cross-entropy loss over 13 thoracic conditions.
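As an illustration of the kind of module described above, the following is a minimal PyTorch sketch of a lightweight spatial attention block feeding a 13-label sigmoid/binary cross-entropy head. The abstract does not specify the exact attention design, so the CBAM-style layout used here (channel-wise average and max maps passed through a 7x7 convolution) is an assumption, as are the names `SpatialAttention` and `AttentionHead`, the feature-map shape, and the random stand-in data. The sketch also applies attention to a single feature map, whereas the paper describes recalibration at intermediate layers.

```python
import torch
import torch.nn as nn


class SpatialAttention(nn.Module):
    """CBAM-style spatial attention (assumed design, not confirmed by the source)."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        # Two input channels: the channel-wise mean map and the channel-wise max map.
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg_map = x.mean(dim=1, keepdim=True)   # (N, 1, H, W)
        max_map = x.amax(dim=1, keepdim=True)   # (N, 1, H, W)
        # Sigmoid keeps each spatial weight in (0, 1).
        mask = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * mask                          # mask broadcasts over channels


class AttentionHead(nn.Module):
    """Hypothetical head: spatial attention -> global average pool -> label logits."""

    def __init__(self, in_channels: int, num_labels: int = 13):
        super().__init__()
        self.attn = SpatialAttention()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, num_labels)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        x = self.attn(feats)
        x = self.pool(x).flatten(1)
        return self.fc(x)                        # raw logits, one per label


torch.manual_seed(0)
feats = torch.randn(4, 64, 8, 8)                 # stand-in for backbone feature maps
targets = torch.randint(0, 2, (4, 13)).float()   # random multi-hot labels, illustration only

head = AttentionHead(in_channels=64)
logits = head(feats)
# BCEWithLogitsLoss applies the sigmoid internally, one independent output per label.
loss = nn.BCEWithLogitsLoss()(logits, targets)
probs = torch.sigmoid(logits)                    # per-label probabilities at inference
```

In a full model, `feats` would come from a backbone such as ResNet101 or MobileNetV2. The module adds only a single two-channel convolution, which is in keeping with the low computational cost the abstract describes.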
Attention ResNet101 achieved the highest micro-AUC, 0.872, compared with 0.853 for the non-attention ResNet101, while Attention MobileNetV2 improved micro-AUC from 0.818 to 0.826. Localized pathologies showed some of the strongest gains, including emphysema (+0.05 AUC), effusion (+0.04), and pneumothorax (+0.04). ROC analysis showed improved early-lift behavior and fewer false positives, which is important in triage. Ablation experiments verified that spatial attention is a selective, architecture-specific enhancer rather than a universally transformative component, with the largest advantage found in lightweight and deep residual networks. Our results demonstrate that spatial attention is a modest but architecture-dependent enhancement of CNN-based chest radiograph classification. The magnitude of improvement varies considerably across backbone architectures: ResNet101 shows the largest micro-AUC gain (+0.019), MobileNetV2 a moderate one (+0.008), while EfficientNet-B0 and EfficientNet-B3 exhibit minimal change, reflecting diminishing returns in already highly optimized architectures. Claims of broad clinical reliability are tempered by the reliance on NLP-derived, noisy labels from ChestX-ray14; this work is best understood as an empirical foundation for future hybrid attention strategies that may combine spatial, channel-wise, and self-attention for more robust multi-label CXR analysis.
The proposed framework achieves a practical balance between performance, interpretability, and computational efficiency, and may serve as a principled modular plug-in in future AI-assisted radiological diagnostic pipelines, pending further clinical validation on expert-annotated datasets.
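For readers unfamiliar with the micro-AUC figures reported above, the sketch below shows one common way to compute a micro-averaged AUC: every (image, label) pair is pooled into a single binary problem and scored with the rank-based (Mann-Whitney) form of the AUC. This is the standard definition of micro-averaging, but the abstract does not state the authors' exact evaluation code, so treat it as an assumption. The helper names `roc_auc` and `micro_auc` and the toy data are hypothetical and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """Binary AUC via the Mann-Whitney U statistic, using average ranks for ties."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the group of tied scores starting at i.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined unless both classes are present")
    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def micro_auc(y_true, y_score):
    """Micro-average: flatten all (image, label) cells, then compute one AUC."""
    flat_true = [t for row in y_true for t in row]
    flat_score = [s for row in y_score for s in row]
    return roc_auc(flat_true, flat_score)


# Toy example: 3 images x 3 labels (made-up values, not from the paper).
y_true = [[1, 0, 0], [0, 1, 0], [1, 1, 0]]
y_score = [[0.9, 0.2, 0.1], [0.3, 0.8, 0.4], [0.7, 0.35, 0.5]]
print(micro_auc(y_true, y_score))  # 0.9
```

Because micro-averaging pools all labels, frequent findings contribute more pairs than rare ones, so per-pathology AUCs (such as the emphysema, effusion, and pneumothorax gains) are worth reading alongside the pooled figure.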