An Explainable Hybrid CNN-Transformer Framework with Aquila Optimization for MRI-Based Brain Tumor Classification
Authors
Affiliations (1)
- The University of Arizona
Abstract
Accurate and interpretable brain tumor classification remains a critical challenge due to the heterogeneity of tumor types and the complexity of MRI data. This paper presents a hybrid deep learning framework that combines Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) for multi-class brain tumor diagnosis. The model leverages CNNs for localized spatial feature extraction and ViTs for capturing long-range contextual information, followed by an attention-guided fusion mechanism. To enhance generalization and reduce feature redundancy, an Improved Aquila Optimizer (AQO) is employed for metaheuristic feature selection. The model is trained and evaluated on the Kaggle brain MRI dataset, comprising 3,264 T1-weighted contrast-enhanced axial slices categorized into four classes: glioma, meningioma, pituitary tumor, and no tumor. To ensure interpretability, SHAP and Grad-CAM are integrated to visualize semantic and spatial relevance in predictions, respectively. The proposed method achieves a classification accuracy of 97.2%, an F1-score of 0.96, and an AUC-ROC of 0.98, outperforming baseline CNN and ViT models.
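The abstract does not specify the exact form of the attention-guided fusion mechanism; as an illustration only, the sketch below shows one common way such a fusion can work: each branch's feature vector is scored by a learned projection, the scores are passed through a softmax to produce per-sample branch weights, and the fused representation is the weighted sum of the CNN and ViT features. All names (`attention_guided_fusion`, the projection vectors `w_cnn`, `w_vit`) and the 256-dimensional feature size are hypothetical, not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_guided_fusion(f_cnn, f_vit, w_cnn, w_vit):
    """Fuse CNN and ViT feature vectors with learned attention weights.

    f_cnn, f_vit: (batch, d) feature matrices from the two branches.
    w_cnn, w_vit: (d,) learned scoring projections (hypothetical).
    Returns the fused (batch, d) features and the (batch, 2) branch weights.
    """
    scores = np.stack([f_cnn @ w_cnn, f_vit @ w_vit], axis=-1)  # (batch, 2)
    alpha = softmax(scores, axis=-1)                            # branch attention weights
    fused = alpha[:, :1] * f_cnn + alpha[:, 1:] * f_vit         # convex combination
    return fused, alpha

# Toy demonstration with random features standing in for the two branches
rng = np.random.default_rng(0)
f_cnn = rng.normal(size=(4, 256))  # stand-in for CNN spatial features
f_vit = rng.normal(size=(4, 256))  # stand-in for ViT contextual features
w_cnn = rng.normal(size=256)
w_vit = rng.normal(size=256)
fused, alpha = attention_guided_fusion(f_cnn, f_vit, w_cnn, w_vit)
```

Because the weights come from a softmax, each sample's fused vector is a convex combination of its two branch features, which keeps the fused representation in the same scale as the inputs.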