Cutting-edge Bayesian deep learning and statistical strategies for bias mitigation in COVID-19 detection via chest X-ray imaging.
Authors
Affiliations (8)
- The 960th Hospital of the PLA Joint Logistics Support Force, Jinan, Shandong Province, China.
- Department of Computer Science, Qurtuba University of Science and IT, Dera Ismail Khan, Pakistan.
- Faculty of Computing, Gomal University, Dera Ismail Khan, Pakistan.
- Department of Computer Science, Qurtuba University of Science and IT, Dera Ismail Khan, Pakistan.
- Centre for Wireless Technology, CoE for Intelligent Network, Faculty of AI & Engineering, Multimedia University, Selangor, Malaysia.
- Department of Civil and Airport Engineering, College of Civil Aviation, Nanjing University of Aeronautics and Astronautics, Nanjing, 210016, China.
- Department of Mechanical Engineering, College of Engineering, Taif University, P.O. Box 11099, Taif, 21944, Saudi Arabia.
- Faculty of Computing & Informatics, Multimedia University, Cyberjaya, Malaysia.
Abstract
Chest radiography (CXR) is widely used for triage and follow-up of pulmonary disease, yet COVID-19 classification from CXR remains vulnerable to bias, label noise, and domain shift. We propose a multi-stage Bayesian deep learning framework that combines lung segmentation, segmentation-guided classification, calibrated ensembling, and uncertainty estimation to distinguish four classes (COVID-19, normal, viral pneumonia, bacterial pneumonia) and to grade COVID-19 severity. Models are trained and tested on 1,531 CXRs (100 COVID-19 images from 70 patients; 1,431 non-COVID images from ChestX-ray14) with patient-wise splits. The final ensemble achieves 98.33% test accuracy, and COVID-19 sensitivity reaches 100% on this split. Robustness is quantified by stress-testing five image degradations (Gaussian noise, motion blur, defocus blur, JPEG compression, and downsampling): macro AUC drops remain small at moderate severities and grow larger under strong blur or heavy downsampling. Saliency and context-relevance analyses are used to identify spurious cues. The study is limited by dataset size and the lack of external multi-site validation; a planned evaluation on COVIDx and BIMCV-COVID19+ is outlined.
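Because the 100 COVID-19 images come from only 70 patients, a patient-wise split matters: images from one patient must not appear in both the training and test sets. The abstract does not describe how the split was implemented, so the following is only a minimal sketch of the general idea. The function name `patient_wise_split`, the `test_fraction` default, and the toy records are all hypothetical, and the sketch does not stratify by class.

```python
import random
from collections import defaultdict


def patient_wise_split(records, test_fraction=0.2, seed=0):
    """Split (image_id, patient_id) records so no patient spans both sets.

    test_fraction is applied to patients, not images (assumed default).
    """
    # Group image IDs under the patient they belong to.
    by_patient = defaultdict(list)
    for image_id, patient_id in records:
        by_patient[patient_id].append(image_id)

    # Sort before shuffling so the result depends only on the seed.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    # Hold out whole patients; always keep at least one for testing.
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])

    train = [img for p in patients if p not in test_patients for img in by_patient[p]]
    test = [img for p in patients if p in test_patients for img in by_patient[p]]
    return train, test, test_patients


# Toy data: 10 hypothetical patients; even-numbered patients have two images.
records = [
    (f"img_{p}_{k}", f"patient_{p}")
    for p in range(10)
    for k in range(2 if p % 2 == 0 else 1)
]
train_ids, test_ids, test_patients = patient_wise_split(records)
```

Splitting at the patient level rather than the image level avoids leakage that would otherwise inflate test accuracy when near-duplicate images of one patient land on both sides.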