Grad-CAM based deep learning analytics for image-level colon disease classification based on graph neural networks and vision transformers.
Authors
Affiliations (4)
- Department of Gastrointestinal Surgery, The First Dongguan Affiliated Hospital, Guangdong Medical University, Dongguan, Guangdong, China.
- Department of Surgery, The First Dongguan Affiliated Hospital, Guangdong Medical University, Dongguan, Guangdong, China.
- School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China.
- Department of Mechanical Engineering, Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, Canada.
Abstract
Accurate classification of colonoscopic images is essential for early detection and characterization of colorectal diseases. Recent advances in deep learning, particularly transformer-based architectures and graph neural networks (GNNs), provide alternative strategies for modeling global contextual information and relational structures in image representations. This study evaluates transformer-based and graph-based frameworks under a unified experimental protocol for endoscopic colon disease classification. Experiments were conducted on the Kvasir V2 dataset using two primary paradigms: (i) a Vision Transformer (ViT) with selective fine-tuning and learning-rate scheduling, and (ii) a CNN-GNN pipeline that combines image embeddings with one of three graph construction strategies (cosine similarity, k-nearest neighbors, or epsilon-radius graphs) and multiple GNN architectures. Performance was evaluated using accuracy, precision, recall, and macro-F1 score, with Grad-CAM used for qualitative interpretability analysis. The selectively fine-tuned Vision Transformer achieved 94.6% accuracy with a macro-F1 score of 0.94. The best graph-based configuration (ViT embeddings with an epsilon graph and GIN aggregation) achieved 92% accuracy with a macro-F1 score of 0.92. These results indicate that transformer-based contextual modeling provides strong discriminative capability for image-level colon disease classification, while graph-based relational modeling offers competitive performance when paired with high-quality embeddings.
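The three graph construction strategies named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the toy embeddings, the thresholds, and the choice of Euclidean distance for the epsilon-radius graph and cosine similarity for the k-nearest-neighbor ranking are all assumptions made for illustration. Each function takes a matrix of image embeddings (one row per image) and returns a boolean adjacency matrix that a GNN could consume.

```python
import numpy as np

def cosine_similarity_matrix(x):
    """Pairwise cosine similarity between the rows of x, shape (n_images, dim)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)  # guard against zero-length rows
    return unit @ unit.T

def cosine_threshold_graph(x, threshold=0.8):
    """Connect two images when their cosine similarity is at least `threshold`."""
    adj = cosine_similarity_matrix(x) >= threshold
    np.fill_diagonal(adj, False)  # no self-loops
    return adj

def knn_graph(x, k=1):
    """Connect each image to its k most similar images, then symmetrize."""
    sim = cosine_similarity_matrix(x)
    np.fill_diagonal(sim, -np.inf)  # an image is never its own neighbor
    neighbors = np.argsort(-sim, axis=1)[:, :k]
    adj = np.zeros(sim.shape, dtype=bool)
    rows = np.repeat(np.arange(x.shape[0]), k)
    adj[rows, neighbors.ravel()] = True
    return adj | adj.T

def epsilon_graph(x, eps=0.5):
    """Connect two images when their Euclidean distance is at most `eps`."""
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    adj = dist <= eps
    np.fill_diagonal(adj, False)  # no self-loops
    return adj

# Hypothetical 2-D embeddings forming two tight clusters (illustrative only).
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
graphs = {
    "cosine": cosine_threshold_graph(emb, threshold=0.8),
    "knn": knn_graph(emb, k=1),
    "epsilon": epsilon_graph(emb, eps=0.5),
}
```

On this toy input all three strategies link images 0 and 1 and images 2 and 3. On real embeddings they generally differ: a k-nearest-neighbor graph gives every node at least k edges, whereas threshold-based graphs (cosine or epsilon) can leave outlying images isolated.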