
A novel superpixel based Vision Transformer for improving interpretability in glaucoma screening.

March 27, 2026

Authors

Hernández J, Alayón S, Sigut JF, Díaz-Alemán T

Affiliations (3)

  • Department of Computer Science and Systems Engineering, University of La Laguna, 38200, San Cristóbal de La Laguna, Santa Cruz de Tenerife, Spain. [email protected].
  • Department of Computer Science and Systems Engineering, University of La Laguna, 38200, San Cristóbal de La Laguna, Santa Cruz de Tenerife, Spain.
  • Department of Ophthalmology, Canary Islands University Hospital, 38320, Santa Cruz de Tenerife, Santa Cruz de Tenerife, Spain.

Abstract

Interpretability remains one of the major challenges in the clinical adoption of deep learning models for medical image analysis. In ophthalmology, particularly for glaucoma screening, explainable artificial intelligence (XAI) methods are essential for ensuring trust and diagnostic transparency. This study introduces the Superpixel-based Vision Transformer (SpxViT), a model designed to enhance interpretability while maintaining competitive accuracy. SpxViT replaces the traditional fixed-grid tokenization of Vision Transformers (ViTs) with a superpixel-based approach that preserves semantic boundaries within the retinal image. Two variants, SpxViT_fix and SpxViT_var, were evaluated on public and private glaucoma datasets. Results demonstrate that SpxViT achieves accuracy comparable to ViT-B/16 (91.9% vs. 92.5%) while producing more clinically consistent attention maps focused on the optic disc and cup.
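The core idea of the abstract — replacing a ViT's fixed-grid patches with tokens that follow superpixel boundaries — can be illustrated with a minimal sketch. Assuming a precomputed superpixel label map (e.g., produced by an algorithm such as SLIC) and per-pixel features, each token is formed by mean-pooling the features inside one superpixel. The function name `superpixel_tokens` and the mean-pooling choice are illustrative assumptions here, not the authors' actual SpxViT implementation:

```python
import numpy as np

def superpixel_tokens(features, labels):
    """Pool per-pixel features into one token per superpixel.

    features: (H, W, C) float array of pixel features
    labels:   (H, W) int array assigning each pixel to a superpixel,
              with labels 0..N-1 (e.g., from a SLIC segmentation)
    Returns an (N, C) token matrix: row i is the mean feature
    vector of all pixels belonging to superpixel i.
    """
    n_tokens = int(labels.max()) + 1
    flat_feat = features.reshape(-1, features.shape[-1])
    flat_lab = labels.reshape(-1)

    # Count pixels per superpixel, then sum features per superpixel
    counts = np.bincount(flat_lab, minlength=n_tokens)
    tokens = np.zeros((n_tokens, features.shape[-1]))
    for c in range(features.shape[-1]):
        tokens[:, c] = np.bincount(
            flat_lab, weights=flat_feat[:, c], minlength=n_tokens
        )
    # Mean-pool: divide each summed feature vector by the pixel count
    return tokens / counts[:, None]
```

Unlike fixed 16x16 patches, the resulting tokens vary in shape and follow image structure, so a token tends to cover a semantically coherent region (e.g., staying inside or outside the optic disc boundary) rather than straddling it.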

Topics

Journal Article
