RE-LIG: A Faithfulness-Driven Layer Integrated Gradients Framework for Explainable Medical Visual Question Answering.

June 29, 2026

Authors

Balık E, Aygün İ, Kaya M

Affiliations (3)

  • Department of Software Engineering, Bandırma Onyedi Eylül University, Bandırma, Türkiye.
  • Department of Software Engineering, Celal Bayar University, Manisa, Türkiye.
  • Department of Computer Engineering, Fırat University, Elazig, Türkiye.

Abstract

Medical Visual Question Answering (Med-VQA) systems can support medical image interpretation and clinical decision-making. However, the "black-box" nature of existing systems and their low-resolution constraints limit the transparency of model decisions and hinder clinical applicability. This work proposes a high-resolution, holistic framework, robust and efficient layer-integrated gradients (RE-LIG), to improve reliability and explainability in Med-VQA systems. The proposed architecture is built on three key components: (1) High-resolution visual encoding: the PubMedCLIP encoder is scaled to high resolution using dynamic positional embedding interpolation to capture fine details. (2) Multimodal semantic fusion: clinical questions encoded by BioLinkBERT and visual features extracted by PubMedCLIP are aligned through a co-attention mechanism. (3) Explainability framework: to counter the noise of classical gradient methods, the RE-LIG algorithm, which combines noise tunneling and layer-based integration strategies, is integrated into the system. Extensive experiments on the SLAKE dataset show that the framework's main gain is in model faithfulness. Quantitative analyses show that RE-LIG achieves 28.9% higher explanation fidelity than standard gradient approaches (RE-LIG AOPC = 0.3180 vs. Vanilla IG = 0.2467, bootstrap 95% CI [0.262, 0.375], Wilcoxon p < 0.001). This gain in explainability comes without compromising diagnostic performance, which remains competitive with state-of-the-art (SOTA) models (80.77% overall accuracy, 87.61% closed-ended, and 77.34% open-ended). Ablation studies confirm that the integrated noise reduction mechanisms shift the model's focus from background noise to actual pathological boundaries.
The findings demonstrate that explainability is not merely a visual aid for clinical confidence but a measurable and verifiable requirement.
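
The abstract reports fidelity as AOPC but does not spell out the perturbation protocol. As a rough illustration only, the sketch below assumes the common "most relevant first" deletion definition: features are removed in order of decreasing attribution, and the resulting drops in model score are averaged over the perturbation steps (conventions differ on whether the unperturbed step is counted). The names `aopc`, `relative_gain`, `toy_score`, and the toy weights are hypothetical and not taken from the paper.

```python
from typing import Callable, List, Sequence


def aopc(
    score_fn: Callable[[Sequence[float]], float],
    features: Sequence[float],
    attributions: Sequence[float],
    steps: int,
    baseline: float = 0.0,
) -> float:
    """Mean score drop when deleting features most-relevant-first (assumed AOPC form)."""
    if not 0 < steps <= len(features):
        raise ValueError("steps must be between 1 and len(features)")
    # Rank feature indices from highest to lowest attribution.
    order = sorted(range(len(features)), key=lambda i: attributions[i], reverse=True)
    perturbed: List[float] = list(features)
    original_score = score_fn(perturbed)
    drops = []
    for k in range(steps):
        perturbed[order[k]] = baseline  # delete the next most relevant feature
        drops.append(original_score - score_fn(perturbed))
    return sum(drops) / steps


def relative_gain(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old


# Toy linear "model" (hypothetical): the score is a weighted sum of the inputs.
WEIGHTS = [0.5, 0.3, 0.2, 0.0]


def toy_score(x: Sequence[float]) -> float:
    return sum(w * v for w, v in zip(WEIGHTS, x))


inputs = [1.0, 1.0, 1.0, 1.0]
faithful = [0.5, 0.3, 0.2, 0.0]    # matches each feature's true contribution
unfaithful = [0.0, 0.2, 0.3, 0.5]  # ranks the features in the wrong order

aopc_faithful = aopc(toy_score, inputs, faithful, steps=3)
aopc_unfaithful = aopc(toy_score, inputs, unfaithful, steps=3)

# The two AOPC values quoted in the abstract, compared as a relative gain.
reported_gain = relative_gain(0.3180, 0.2467)
```

Under this definition a more faithful attribution map yields a larger AOPC, because deleting the features it ranks highest degrades the prediction faster. Applying `relative_gain` to the two AOPC values quoted in the abstract reproduces the reported figure of roughly 28.9%, which indicates that the figure is a relative rather than an absolute difference.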

Topics

Journal Article
