
Hierarchical multi-scale vision transformer model for accurate detection and classification of brain tumors in MRI-based medical imaging.

October 31, 2025

Authors

Sankari C, Jamuna V, Kavitha AR

Affiliations (3)

  • Department of EEE, Chennai Institute of Technology, Chennai, India. [email protected].
  • Department of EEE, Jerusalem College of Engineering, Chennai, India.
  • Department of IT, Chennai Institute of Technology, Chennai, India.

Abstract

Automated brain tumor detection represents a fundamental challenge in contemporary medical imaging, demanding both precision and computational feasibility for practical implementation. This research introduces a novel Vision Transformer (ViT) framework that incorporates an innovative Hierarchical Multi-Scale Attention (HMSA) methodology for the automated detection and classification of brain tumors across four distinct categories: glioma, meningioma, pituitary adenoma, and healthy brain tissue. Our methodology presents several key innovations: (1) a multi-resolution patch embedding strategy enabling feature extraction across different spatial scales (8×8, 16×16, and 32×32 patches), (2) a computationally optimized transformer architecture achieving a 35% reduction in training duration compared to conventional ViT implementations, and (3) a probabilistic calibration mechanism enhancing prediction confidence for decision-making applications. Experimental validation was conducted on a comprehensive MRI dataset comprising 7023 T1-weighted contrast-enhanced images sourced from the publicly accessible Brain Tumor MRI Dataset. Our approach achieved superior classification performance with 98.7% accuracy, demonstrating significant improvements over conventional machine learning methodologies (Random Forest: 91.2%, Support Vector Machine: 89.8%, XGBoost: 92.5%), state-of-the-art CNN architectures (EfficientNet-B0: 96.5%, ResNet-50: 95.8%), standard transformers (ViT: 96.8%, Swin Transformer: 97.2%), and hybrid CNN-Transformer approaches (TransBTS: 96.9%, Swin-UNet: 96.6%). The model demonstrates excellent performance with a precision of 0.986, a recall of 0.988, an F1-score of 0.987, and superior calibration quality (Expected Calibration Error: 0.023). The proposed framework establishes a computationally efficient approach to accurate brain tumor classification.
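The abstract does not include an implementation; the following is a minimal PyTorch-style sketch of the multi-resolution patch embedding it describes, tokenizing a slice at 8×8, 16×16, and 32×32 patches and concatenating the resulting token sequences. The module name, the 224×224 input size, and the 256-dimensional embedding are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of a multi-resolution patch embedding; names and
# dimensions are illustrative, not the authors' code.
import torch
import torch.nn as nn

class MultiScalePatchEmbed(nn.Module):
    """Embed an MRI slice at several patch sizes and concatenate the tokens."""

    def __init__(self, in_chans=1, embed_dim=256, patch_sizes=(8, 16, 32)):
        super().__init__()
        # One strided conv per scale: each PxP patch becomes one embed_dim token.
        self.projs = nn.ModuleList(
            [nn.Conv2d(in_chans, embed_dim, kernel_size=p, stride=p)
             for p in patch_sizes]
        )

    def forward(self, x):                                 # x: (B, C, H, W)
        tokens = []
        for proj in self.projs:
            t = proj(x)                                   # (B, D, H/p, W/p)
            tokens.append(t.flatten(2).transpose(1, 2))   # (B, N_p, D)
        return torch.cat(tokens, dim=1)                   # all scales, one sequence

embed = MultiScalePatchEmbed()
out = embed(torch.randn(2, 1, 224, 224))
print(out.shape)  # (2, 28*28 + 14*14 + 7*7, 256) = (2, 1029, 256)
```

Likewise, the reported Expected Calibration Error (0.023) has a standard definition: the weighted average gap between confidence and accuracy across confidence bins. A small NumPy sketch, assuming the common 15-bin formulation (the paper does not state its binning):

```python
# Standard Expected Calibration Error (ECE); binning choice is an assumption.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """Weighted average |accuracy - confidence| over equal-width bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap   # weight by fraction of samples in bin
    return ece

conf = np.array([0.95, 0.80, 0.99, 0.60])
hits = np.array([1, 1, 1, 0], dtype=float)  # 1 = prediction was correct
print(expected_calibration_error(conf, hits))
```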

Topics

  • Brain Neoplasms
  • Magnetic Resonance Imaging
  • Image Interpretation, Computer-Assisted
  • Image Processing, Computer-Assisted
  • Journal Article
