Graph-Enhanced Visual Prompting for Pre-Trained Models Adaptation in Medical Imaging Classification
Authors
Abstract
Adapting Vision Transformers (ViTs) for medical imaging is constrained by the scarcity of data and high-quality annotations, which hinders effective training and robust generalization. Visual prompt learning offers a parameter-efficient solution for domain adaptation, but its success depends on accurate, task-relevant semantic guidance, a resource that, despite its proven benefits, is rarely available in real-world clinical practice. This motivates mechanisms that can automatically extract reliable semantic cues from existing clinical data. To this end, we propose Graph-Enhanced Visual Prompting (GEVP), the first framework to incorporate cross-modal graph learning into prompt generation for medical imaging. GEVP models image patches and report tokens as graph nodes, captures their spatial and semantic relations via a graph neural network, and produces semantically rich prompts. These prompts are injected into a frozen ViT backbone, guiding attention to diagnostically relevant regions without heavy fine-tuning. A consistent downstream prediction mechanism leverages the pretrained prompt generator to handle both report-available and report-absent settings. Experiments on six public downstream datasets show that GEVP surpasses strong prompt- and adapter-based baselines by up to +9.65% F1 on imbalanced tasks and delivers superior performance on unseen disease classification.
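To make the cross-modal graph idea concrete, the sketch below illustrates one plausible reading of the prompt-generation step: image-patch and report-token features become graph nodes, a single message-passing round mixes their relations, and learned queries pool the node features into prompt tokens that would be prepended to a frozen ViT's input sequence. All function names, shapes, and the single-layer GNN design are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def graph_prompt_generator(patch_feats, token_feats, adj, W_msg, W_pool, queries):
    """Hypothetical GEVP-style prompt generation.

    patch_feats: (n_patches, d) image-patch embeddings
    token_feats: (n_tokens, d)  report-token embeddings
    adj:         (N, N) cross-modal adjacency over all N nodes
    Returns (P, d) prompt tokens, one per learned query.
    """
    # Nodes = image patches + report tokens stacked into one graph.
    nodes = np.concatenate([patch_feats, token_feats], axis=0)       # (N, d)
    # One degree-normalized message-passing step with a residual link.
    deg = adj.sum(-1, keepdims=True).clip(min=1)
    nodes = nodes + np.maximum(adj @ nodes @ W_msg / deg, 0)
    # Learned queries cross-attend over nodes to form prompt tokens.
    attn = softmax(queries @ nodes.T / np.sqrt(nodes.shape[-1]))     # (P, N)
    return attn @ (nodes @ W_pool)                                   # (P, d)

# Toy usage with random features and a sparse random relation graph.
rng = np.random.default_rng(0)
d, P = 64, 4
patches = rng.standard_normal((16, d))
tokens = rng.standard_normal((8, d))
adj = (rng.random((24, 24)) > 0.7).astype(float)
W_msg = rng.standard_normal((d, d)) * 0.1
W_pool = rng.standard_normal((d, d)) * 0.1
queries = rng.standard_normal((P, d))
prompts = graph_prompt_generator(patches, tokens, adj, W_msg, W_pool, queries)
print(prompts.shape)  # (4, 64)
```

In a full system the resulting prompt tokens would be concatenated with the ViT's patch tokens at inference, leaving the backbone weights frozen.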