A unified multimodal framework for chest X-ray retrieval and disease prediction for clinical decision support.
Authors
Affiliations (2)
- Faculty of Information Technology, University of Science, Ho Chi Minh City, Vietnam; Vietnam National University, Ho Chi Minh City, Vietnam.
- Faculty of Information Technology, University of Science, Ho Chi Minh City, Vietnam; Vietnam National University, Ho Chi Minh City, Vietnam. Electronic address: [email protected].
Abstract
Recent advances in medical imaging and natural language processing have opened new opportunities for automated diagnostic support. Chest X-rays (CXRs) remain the most common imaging modality for screening pulmonary, cardiovascular, and systemic diseases; however, the growing volume of studies and free-text reports can overwhelm clinicians. We propose a unified multimodal retrieval and prediction framework that jointly leverages DICOM-format CXRs and radiology reports by projecting visual and textual features into a shared semantic space. Trained with contrastive and multi-label objectives, the system supports disease classification, case-based retrieval, and explainable AI. Experiments on the Open-I dataset demonstrate strong performance, achieving a macro AUROC of 0.95 and a macro F1-score of 0.71 across 22 diagnostic categories. The retrieval module attains high ranking quality (MRR and nDCG > 0.93) with sub-millisecond query latency. A quantitative explainability analysis further shows strong agreement between attention- and gradient-based attribution maps (Pearson ρ ≈ 0.92), supporting trustworthy clinical decision-making.
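To make the training setup concrete, the following is a minimal sketch, not the authors' released code, of a joint contrastive plus multi-label objective over paired image/report embeddings in a shared space. The function name, the temperature, and the weighting factor alpha are illustrative assumptions.

```python
# Hypothetical sketch of a joint contrastive + multi-label objective
# for paired CXR/report embeddings; hyperparameters are assumptions,
# not values reported in the paper.
import torch
import torch.nn.functional as F

def joint_loss(img_emb, txt_emb, logits, labels, temperature=0.07, alpha=0.5):
    """img_emb, txt_emb: (B, D) projected features in the shared space.
    logits: (B, num_labels) classifier scores; labels: (B, num_labels) in {0, 1}."""
    # Normalize embeddings so dot products are cosine similarities.
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    sim = img @ txt.t() / temperature                    # (B, B) image-report similarity
    targets = torch.arange(sim.size(0), device=sim.device)
    # Symmetric InfoNCE: match each image to its own report and vice versa.
    contrastive = 0.5 * (F.cross_entropy(sim, targets) +
                         F.cross_entropy(sim.t(), targets))
    # Multi-label disease classification (e.g., over 22 diagnostic categories).
    multilabel = F.binary_cross_entropy_with_logits(logits, labels.float())
    return alpha * contrastive + (1 - alpha) * multilabel
```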
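The retrieval claims (MRR and nDCG > 0.93) can be reproduced with standard ranking metrics. The sketch below, with illustrative function names, assumes one known-relevant item per query for MRR and graded relevances for nDCG.

```python
# Standard ranking metrics referenced in the abstract; names are illustrative.
import numpy as np

def mean_reciprocal_rank(ranks):
    """ranks: 1-based rank of the first relevant result for each query."""
    return float(np.mean([1.0 / r for r in ranks]))

def ndcg_at_k(relevances, k=10):
    """relevances: graded relevance of retrieved items, in ranked order."""
    rel = np.asarray(relevances, dtype=np.float64)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))  # 1/log2(i+1), i = 1..k
    dcg = float(np.sum(rel * discounts))
    idcg = float(np.sum(np.sort(rel)[::-1] * discounts))   # ideal (sorted) ordering
    return dcg / idcg if idcg > 0 else 0.0
```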
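Similarly, the attribution-agreement figure (Pearson ρ ≈ 0.92) amounts to correlating two flattened saliency maps. A hedged sketch, where the input arrays are hypothetical attention- and gradient-based maps of equal shape:

```python
# Pearson correlation between two attribution maps; inputs are assumed
# (H, W) saliency arrays, not outputs from the paper's pipeline.
import numpy as np

def attribution_agreement(attn_map, grad_map):
    a = np.asarray(attn_map, dtype=np.float64).ravel()
    g = np.asarray(grad_map, dtype=np.float64).ravel()
    return float(np.corrcoef(a, g)[0, 1])  # Pearson rho
```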