SiCLIP: An explainable multimodal framework for silicosis diagnosis.
Authors
Affiliations (6)
- Posts and Telecommunications Institute of Technology, Hanoi, Viet Nam.
- Thai Nguyen University of Medicine and Pharmacy, Thai Nguyen, Viet Nam.
- Posts and Telecommunications Institute of Technology, Hanoi, Viet Nam.
- Hanoi Medical University, Hanoi, Viet Nam.
- Posts and Telecommunications Institute of Technology, Hanoi, Viet Nam.
- Posts and Telecommunications Institute of Technology, Hanoi, Viet Nam.
Abstract
Silicosis is a serious occupational lung disease caused by exposure to crystalline silica dust and remains difficult to detect early in at-risk worker populations. In this paper, we introduce the Silicosis Diagnosis Dataset (SDD), which comprises chest X-ray images and structured patient-profile information, including harmful habits and clinical symptoms. To exploit this multimodal dataset, we propose SiCLIP, a multimodal retrieval framework based on CLIP-ViT for silicosis screening and binary classification on SDD. SiCLIP learns a shared embedding space for chest X-ray images and patient profiles and performs retrieval-based aggregation for prediction. On the internally evaluated SDD benchmark, SiCLIP achieves higher accuracy and F1-score than several strong image-only deep learning baselines and the compared multimodal VLM baseline. In addition, SiCLIP provides case-based interpretability by grounding predictions in retrieved similar cases, complemented by supportive saliency visualizations. These results suggest that multimodal retrieval is a promising approach for silicosis screening support in occupationally exposed populations, while external validation remains necessary before broader clinical deployment.
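The abstract does not specify how retrieval-based aggregation is computed, so the following is only a minimal sketch of one plausible reading: a query embedding is compared by cosine similarity against a bank of labeled case embeddings, and the labels of the top-k neighbours are combined by a similarity-weighted vote. The function names (`cosine`, `retrieve_and_vote`), the weighting scheme, the value of `k`, and the toy 2-dimensional vectors are all assumptions made for illustration; in SiCLIP the embeddings would come from the CLIP-ViT encoders, which are not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_and_vote(query, bank, k=3):
    """Hypothetical aggregation rule: similarity-weighted vote over the top-k cases.

    bank is a list of (embedding, label) pairs with label 1 = silicosis, 0 = normal.
    Returns (predicted_label, positive_score, indices_of_retrieved_cases).
    """
    scored = [(cosine(query, emb), idx, label) for idx, (emb, label) in enumerate(bank)]
    # Most similar cases first; the retrieved cases double as the explanation.
    scored.sort(key=lambda item: item[0], reverse=True)
    top = scored[:k]

    # Negative similarities contribute no weight to the vote.
    weights = [max(sim, 0.0) for sim, _, _ in top]
    total = sum(weights)
    if total == 0.0:
        positive_score = 0.5  # no usable evidence either way
    else:
        positive_score = sum(w for w, (_, _, label) in zip(weights, top) if label == 1) / total

    predicted = 1 if positive_score >= 0.5 else 0
    return predicted, positive_score, [idx for _, idx, _ in top]


# Toy case bank: two positive and two negative cases in a 2-D embedding space.
bank = [
    ([1.0, 0.0], 1),
    ([0.9, 0.1], 1),
    ([0.0, 1.0], 0),
    ([0.1, 0.9], 0),
]
query = [0.8, 0.2]

pred, score, neighbours = retrieve_and_vote(query, bank, k=3)
print(pred, round(score, 3), neighbours)
```

Returning the indices of the retrieved cases alongside the prediction mirrors the case-based interpretability described above: a reader can inspect which stored cases drove the decision.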