Hierarchical multi-scale vision transformer model for accurate detection and classification of brain tumors in MRI-based medical imaging.

October 31, 2025

Authors

Sankari C, Jamuna V, Kavitha AR

Affiliations (3)

  • Department of EEE, Chennai Institute of Technology, Chennai, India.
  • Department of EEE, Jerusalem College of Engineering, Chennai, India.
  • Department of IT, Chennai Institute of Technology, Chennai, India.

Abstract

Automated brain tumor detection represents a fundamental challenge in contemporary medical imaging, demanding both precision and computational feasibility for practical implementation. This research introduces a novel Vision Transformer (ViT) framework that incorporates an innovative Hierarchical Multi-Scale Attention (HMSA) methodology for automated detection and classification of brain tumors across four distinct categories: glioma, meningioma, pituitary adenoma, and healthy brain tissue. Our methodology presents several key innovations: (1) a multi-resolution patch embedding strategy enabling feature extraction across different spatial scales (8×8, 16×16, and 32×32 patches), (2) a computationally optimized transformer architecture achieving a 35% reduction in training duration compared to conventional ViT implementations, and (3) a probabilistic calibration mechanism enhancing prediction confidence for decision-making applications. Experimental validation was conducted using a comprehensive MRI dataset comprising 7023 T1-weighted contrast-enhanced images sourced from the publicly accessible Brain Tumor MRI Dataset. Our approach achieved superior classification performance with 98.7% accuracy while demonstrating significant improvements over conventional machine learning methodologies (Random Forest: 91.2%, Support Vector Machine: 89.8%, XGBoost: 92.5%), state-of-the-art CNN architectures (EfficientNet-B0: 96.5%, ResNet-50: 95.8%), standard transformers (ViT: 96.8%, Swin Transformer: 97.2%), and hybrid CNN-Transformer approaches (TransBTS: 96.9%, Swin-UNet: 96.6%). The model demonstrates excellent performance with a precision of 0.986, a recall of 0.988, an F1-score of 0.987, and superior calibration quality (Expected Calibration Error: 0.023). The proposed framework establishes a computationally efficient approach for accurate brain tumor classification.
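
The abstract reports an Expected Calibration Error (ECE) of 0.023 but does not say how it was computed. As an illustration only, the Python sketch below shows the common binned ECE estimate: predictions are grouped by their top-class confidence, and the gap between mean confidence and observed accuracy in each bin is averaged, weighted by bin size. The function name expected_calibration_error, the choice of 10 equal-width bins, and the left-closed bin edges are assumptions made for this example, not details taken from the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE estimate (illustrative sketch, not the authors' code).

    confidences: top-class predicted probability per sample, each in [0, 1].
    correct:     booleans, True where the predicted class matched the label.
    n_bins:      number of equal-width confidence bins (assumed value: 10).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    conf_sum = [0.0] * n_bins   # sum of confidences per bin
    hit_count = [0] * n_bins    # number of correct predictions per bin
    bin_count = [0] * n_bins    # number of predictions per bin

    for c, ok in zip(confidences, correct):
        # Left-closed bins [b/n_bins, (b+1)/n_bins); a confidence of exactly
        # 1.0 is placed in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        conf_sum[idx] += c
        hit_count[idx] += 1 if ok else 0
        bin_count[idx] += 1

    ece = 0.0
    for b in range(n_bins):
        if bin_count[b] == 0:
            continue  # empty bins contribute nothing
        mean_conf = conf_sum[b] / bin_count[b]
        accuracy = hit_count[b] / bin_count[b]
        ece += (bin_count[b] / n) * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Toy inputs invented for this example; they are not data from the paper.
    toy_conf = [0.95, 0.95, 0.55, 0.55]
    toy_correct = [True, True, True, False]
    print(expected_calibration_error(toy_conf, toy_correct))
```

Under this definition a lower value means that stated confidences track observed accuracy more closely. Published ECE figures depend on the binning scheme, so a value computed this way is only comparable to the reported 0.023 if the same number and type of bins were used.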

Topics

  • Brain Neoplasms
  • Magnetic Resonance Imaging
  • Image Interpretation, Computer-Assisted
  • Image Processing, Computer-Assisted
  • Journal Article
