An intelligent healthcare system for rare disease diagnosis utilizing electronic health records based on a knowledge-guided multimodal transformer framework.
Authors
Affiliations (4)
- College of Technological Innovation, Zayed University, Abu Dhabi, United Arab Emirates.
- Department of Computer Science and Engineering, Amity School of Engineering and Technology (ASET), Amity University, Mumbai, Maharashtra, India.
- Department of Computer Science and Engineering, University Institute of Technology, Rajiv Gandhi Proudyogiki Vishwavidyalaya (State Technological University), Bhopal, Madhya Pradesh, India.
- Department of Computer Science and Engineering, Manipal University Jaipur, Jaipur, India. [email protected].
Abstract
Rare diseases collectively affect millions of patients worldwide, yet their diagnosis remains difficult because of heterogeneous clinical presentations, small sample sizes, and disparate biomedical data sources. Current diagnostic tools cannot combine multimodal information effectively, which leads to delayed or incorrect diagnoses. To address this gap, this paper proposes an intelligent multimodal healthcare framework that integrates electronic health records (EHRs), genomic sequences, and medical imaging to improve the detection of rare diseases. The framework uses a Swin Transformer to extract hierarchical visual features from radiographic scans, Med-BERT and Transformer-XL to learn semantic and long-range temporal relations in longitudinal EHR narratives, and a Graph Neural Network (GNN)-based encoder to capture functional and structural relations in genomic sequences. Cross-modal representation alignment is further strengthened with a Knowledge-Guided Contrastive Learning (KGCL) mechanism, which leverages rare disease ontologies from Orphanet to improve model interpretability and knowledge infusion. To achieve robust performance, the Nutcracker Optimization Algorithm (NOA) is applied to tune hyperparameters, calibrate attention mechanisms, and enhance multimodal fusion. Experimental results on the MIMIC-IV (EHR), ClinVar (genomics), and CheXpert (imaging) datasets show that the proposed framework significantly outperforms state-of-the-art multimodal baselines in both accuracy and robustness for early rare disease diagnosis. This work demonstrates the potential of integrating hierarchical vision transformers, domain-specific language models, graph-based genomic encoders, and knowledge-guided optimization to support explainable, accurate, and clinically applicable decision-making in rare disease settings.
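To make the KGCL idea concrete, the following is a minimal sketch, not the authors' implementation, of an InfoNCE-style contrastive loss over paired image and text embeddings in which negatives are down-weighted by an ontology-derived similarity matrix (here a toy stand-in for Orphanet relations; the function name, weighting scheme, and `onto_sim` input are illustrative assumptions):

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Project embeddings onto the unit sphere so dot products are cosines.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def knowledge_guided_contrastive_loss(img_emb, txt_emb, onto_sim, temperature=0.07):
    """InfoNCE-style loss over N paired image/text embeddings.

    onto_sim[i, j] in [0, 1] is an assumed ontology-derived similarity
    between the diseases of samples i and j. Negatives that are
    ontologically close to the anchor are down-weighted, so the model is
    penalized less for grouping related rare diseases together.
    """
    z_i = l2_normalize(img_emb)
    z_t = l2_normalize(txt_emb)
    logits = z_i @ z_t.T / temperature  # (N, N) scaled cosine similarities
    # Weight each negative by ontological dissimilarity; keep the
    # positive (diagonal) at full weight.
    weights = 1.0 - onto_sim
    np.fill_diagonal(weights, 1.0)
    exp_logits = np.exp(logits - logits.max(axis=1, keepdims=True)) * weights
    log_prob = np.log(np.diag(exp_logits) / exp_logits.sum(axis=1))
    return -log_prob.mean()

# Toy usage: four paired samples, 32-dim embeddings, identity ontology
# (i.e., each disease is related only to itself).
rng = np.random.default_rng(0)
img = rng.normal(size=(4, 32))
txt = img + 0.05 * rng.normal(size=(4, 32))  # text roughly aligned with images
loss = knowledge_guided_contrastive_loss(img, txt, np.eye(4))
```

In the full framework, an analogous term would presumably be applied across all three modality pairs (imaging, EHR text, genomics), with `onto_sim` derived from Orphanet ontology distances rather than the identity matrix used here.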