Token-based fidelity scoring for trustworthy vision transformer interpretations in medical imaging.
Authors
Affiliations (6)
- Center for Biosystems and Biotech Data Science, Ghent University Global Campus, Incheon, Republic of Korea. [email protected].
- IDLab, ELIS, Ghent University, Ghent, Belgium. [email protected].
- Computational Data Sciences Department, George Mason University Korea, Incheon, Republic of Korea. [email protected].
- Center for Biosystems and Biotech Data Science, Ghent University Global Campus, Incheon, Republic of Korea.
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium.
- IDLab, ELIS, Ghent University, Ghent, Belgium.
Abstract
Interpretability methods for medical imaging models are still evaluated mostly by their agreement with human annotations, even though such agreement does not guarantee that an explanation is faithful to the model's actual decision-making. The goal of this work is to quantify fidelity, defined here as the extent to which an explanation reflects the evidence that truly drives model predictions, independent of human intuition, bias, or incomplete annotation. We propose a prediction-confidence-based approach to estimating token importance in vision transformers and introduce the confidence-based fidelity score (CFS), which compares token-importance signals with interpretability maps to measure how faithfully an explanation matches the model's intrinsic reasoning. We evaluate the approach on three medical imaging datasets using vision transformers initialized with both supervised and self-supervised (SSL) pretraining. Across datasets, models, and explanation methods, the experiments reveal clear discrepancies between fidelity and alignment with human annotations, indicating that explanations can appear plausible or overlap with annotated regions while still failing to reflect confidence-sensitive model evidence. The proposed token-importance formulation and CFS provide a practical, model-aligned way to assess interpretability fidelity in transformer-based medical imaging, complementing annotation-based evaluation and supporting more reliable auditing and selection of explanation methods in clinical AI workflows.
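The abstract does not spell out how token importance or the CFS are computed. The sketch below shows one plausible reading, assuming token importance is estimated as the drop in prediction confidence when a patch token is occluded, and that the fidelity score is a rank correlation between that importance signal and a per-token explanation map. All names here (`token_importance`, `confidence_fidelity_score`, `predict_proba`, the toy model) are hypothetical illustrations under those assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.stats import spearmanr


def token_importance(predict_proba, tokens, target_class, baseline=0.0):
    """Estimate per-token importance as the drop in the model's confidence
    for `target_class` when that token is occluded (hypothetical reading
    of the abstract's confidence-based approach).

    predict_proba: callable mapping a (num_tokens, dim) array of patch-token
        embeddings to a vector of class probabilities.
    tokens: (num_tokens, dim) array of patch-token embeddings.
    baseline: neutral value used to occlude a token.
    """
    base_conf = predict_proba(tokens)[target_class]
    importance = np.empty(len(tokens))
    for i in range(len(tokens)):
        occluded = tokens.copy()
        occluded[i] = baseline  # replace token i with the neutral baseline
        importance[i] = base_conf - predict_proba(occluded)[target_class]
    return importance


def confidence_fidelity_score(importance, explanation):
    """Hypothetical CFS stand-in: Spearman rank agreement between the
    confidence-based token importance and a per-token explanation map."""
    rho, _ = spearmanr(importance, explanation)
    return rho


# Toy demo with a dummy 2-class "model" over 4 tokens of dimension 8.
rng = np.random.default_rng(0)
toks = rng.normal(size=(4, 8))
w = rng.normal(size=8)

def dummy_predict(t):
    logit = t.sum(axis=0) @ w
    p1 = 1.0 / (1.0 + np.exp(-logit))
    return np.array([1.0 - p1, p1])

imp = token_importance(dummy_predict, toks, target_class=1)
expl = rng.normal(size=4)  # stand-in for an attribution map over tokens
print(confidence_fidelity_score(imp, expl))
```

A score near 1 would indicate that the explanation ranks tokens the same way the model's confidence does; values near 0 or below would flag explanations that look plausible but do not track confidence-sensitive evidence.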