Hierarchical multi-scale vision transformer model for accurate detection and classification of brain tumors in MRI-based medical imaging.
Authors
Affiliations (3)
- Department of EEE, Chennai Institute of Technology, Chennai, India.
- Department of EEE, Jerusalem College of Engineering, Chennai, India.
- Department of IT, Chennai Institute of Technology, Chennai, India.
Abstract
Automated brain tumor detection remains a fundamental challenge in medical imaging: a practical system must be both accurate and computationally feasible. This research introduces a Vision Transformer (ViT) framework with a Hierarchical Multi-Scale Attention (HMSA) mechanism for automated detection and classification of brain MRI scans into four categories: glioma, meningioma, pituitary adenoma, and healthy brain tissue. Our methodology makes three main contributions: (1) a multi-resolution patch embedding strategy that extracts features at different spatial scales (8×8, 16×16, and 32×32 patches), (2) a computationally optimized transformer architecture that reduces training duration by 35% compared with conventional ViT implementations, and (3) a probabilistic calibration mechanism that improves the reliability of prediction confidence for decision-making applications. Experimental validation used an MRI dataset of 7023 T1-weighted contrast-enhanced images from the publicly accessible Brain Tumor MRI Dataset. Our approach achieved 98.7% classification accuracy, outperforming conventional machine learning methods (Random Forest: 91.2%, Support Vector Machine: 89.8%, XGBoost: 92.5%), state-of-the-art CNN architectures (EfficientNet-B0: 96.5%, ResNet-50: 95.8%), standard transformers (ViT: 96.8%, Swin Transformer: 97.2%), and hybrid CNN-Transformer approaches (TransBTS: 96.9%, Swin-UNet: 96.6%). The model attained a precision of 0.986, a recall of 0.988, and an F1-score of 0.987, with an Expected Calibration Error of 0.023. The proposed framework offers a computationally efficient approach to accurate brain tumor classification.
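To make the multi-resolution patch embedding concrete, the following is a minimal NumPy sketch of tokenizing one image at the three patch sizes named in the abstract (8×8, 16×16, and 32×32). It is an illustration under stated assumptions, not the authors' implementation: the 224×224 single-channel input, the embedding width of 64, the random projection weights, and the function names `patchify` and `multi_scale_tokens` are all hypothetical, and the abstract does not say how HMSA combines the scales.

```python
import numpy as np

PATCH_SIZES = (8, 16, 32)  # patch sizes stated in the abstract


def patchify(image, patch):
    """Split an (H, W, C) image into non-overlapping patch x patch tiles.

    Returns an array of shape (num_patches, patch * patch * C),
    with tiles ordered row by row.
    """
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by the patch size")
    gh, gw = h // patch, w // patch
    # (gh, patch, gw, patch, C) -> (gh, gw, patch, patch, C)
    tiles = image.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return tiles.reshape(gh * gw, patch * patch * c)


def multi_scale_tokens(image, patch_sizes=PATCH_SIZES, dim=64, seed=0):
    """Embed the image at each patch size into a shared width `dim`.

    Assumption: one linear projection per scale, with random weights
    standing in for learned ones. Returns one token matrix per scale.
    """
    rng = np.random.default_rng(seed)
    tokens = []
    for p in patch_sizes:
        flat = patchify(image, p)                      # (N_p, p*p*C)
        weight = rng.normal(scale=flat.shape[1] ** -0.5,
                            size=(flat.shape[1], dim))
        tokens.append(flat @ weight)                   # (N_p, dim)
    return tokens


# Assumed input: a 224x224 single-channel slice (size not given in the abstract).
image = np.random.default_rng(1).random((224, 224, 1))
tokens = multi_scale_tokens(image)
for p, t in zip(PATCH_SIZES, tokens):
    print(f"patch {p}x{p}: {t.shape[0]} tokens of width {t.shape[1]}")
```

Under these assumptions the finest scale yields 784 tokens, the middle scale 196, and the coarsest 49, which shows why the fine scale dominates attention cost and why a hierarchical design that limits full attention at that scale could plausibly reduce training time.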