A hybrid Swin Transformer-BiLSTM framework and ensemble learning for multimodal brain stroke detection and risk prediction.
Authors
Affiliations (6)
- Department of Biomedical Engineering, Islamic University, Kushtia, 7003, Bangladesh; Bio-Imaging Research Laboratory, Islamic University, Kushtia, 7003, Bangladesh. Electronic address: [email protected].
- Department of Biomedical Engineering, Islamic University, Kushtia, 7003, Bangladesh; Bio-Imaging Research Laboratory, Islamic University, Kushtia, 7003, Bangladesh. Electronic address: [email protected].
- Department of Biomedical Engineering, Islamic University, Kushtia, 7003, Bangladesh; Bio-Imaging Research Laboratory, Islamic University, Kushtia, 7003, Bangladesh. Electronic address: [email protected].
- Department of Electrical and Electronic Engineering, Islamic University, Kushtia, 7003, Bangladesh. Electronic address: [email protected].
- Department of Electrical and Electronic Engineering, Islamic University, Kushtia, 7003, Bangladesh. Electronic address: [email protected].
- Department of Biomedical Engineering, Islamic University, Kushtia, 7003, Bangladesh; Bio-Imaging Research Laboratory, Islamic University, Kushtia, 7003, Bangladesh. Electronic address: [email protected].
Abstract
Stroke is one of the leading causes of mortality and long-term disability worldwide, primarily resulting from the sudden disruption of cerebral blood flow. Early and accurate diagnosis plays a crucial role in minimizing neurological damage and improving recovery outcomes. This study proposes a comprehensive multimodal framework integrating a hybrid Swin Transformer-Bidirectional Long Short-Term Memory (SwinT-BiLSTM) model and an ensemble learning-based classifier for automated stroke detection and risk prediction from medical images and tabular clinical data. Two brain stroke Computed Tomography (CT) datasets are used: a primary dataset, BrSCTHD-2025, collected from hospitals in Dhaka and Faridpur, Bangladesh, and a secondary Kaggle CT dataset. In addition, a primary clinical tabular dataset was collected from Kushtia Medical College Hospital for multimodal analysis. The proposed SwinT-BiLSTM model efficiently extracts global spatial and sequential dependencies from CT images, while the ensemble classifier predicts stroke risk from clinical and lifestyle parameters. Experimental results show that the model achieves 98% accuracy with an AUC of 1.00 on the BrSCTHD-2025 dataset and 97% accuracy with an AUC of 0.99 on the secondary Kaggle dataset, outperforming standalone SwinT by 2.5% and Convolutional Neural Network (CNN) architectures such as VGG16 and ResNet50 by 3%-4%. The ensemble classifier trained on tabular data achieved 80.36% accuracy and identified critical stroke risk factors such as heart disease, prolonged sitting duration, and cholesterol level. Furthermore, Explainable Artificial Intelligence (XAI) techniques such as LIME, SHAP, enhanced Grad-CAM, and attention maps improve interpretability by identifying the most influential visual and clinical features.
Overall, the proposed SwinT-BiLSTM-Ensemble framework establishes a robust foundation for accurate, interpretable, and clinically reliable stroke diagnosis and personalized risk assessment in real-world healthcare environments.
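The abstract does not state which ensemble strategy the tabular classifier uses. As an illustration only, the following minimal sketch shows soft voting, one common ensemble technique, in which the stroke probabilities predicted by several base learners are averaged and then thresholded. The function name `soft_vote`, the base-learner names, the weights, the 0.5 threshold, and all probability values are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def soft_vote(
    prob_by_model: Dict[str, Sequence[float]],
    weights: Optional[Dict[str, float]] = None,
    threshold: float = 0.5,
) -> Tuple[List[float], List[int]]:
    """Combine per-model stroke probabilities by (weighted) averaging.

    prob_by_model maps a base-learner name to its predicted probability
    of stroke for each patient. Returns the averaged probabilities and
    the binary labels obtained by thresholding them.
    """
    names = list(prob_by_model)
    if not names:
        raise ValueError("at least one base learner is required")

    n_patients = len(prob_by_model[names[0]])
    if any(len(prob_by_model[m]) != n_patients for m in names):
        raise ValueError("all base learners must score the same patients")

    # Default to equal weights (plain averaging) when none are given.
    if weights is None:
        weights = {m: 1.0 for m in names}
    total_weight = sum(weights[m] for m in names)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    averaged = [
        sum(weights[m] * prob_by_model[m][i] for m in names) / total_weight
        for i in range(n_patients)
    ]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


# Hypothetical probabilities for three patients from three assumed learners.
example_probs = {
    "random_forest": [0.9, 0.2, 0.6],
    "gradient_boosting": [0.8, 0.4, 0.3],
    "logistic_regression": [0.7, 0.3, 0.45],
}
averaged_probs, predicted_labels = soft_vote(example_probs)
print(predicted_labels)
```

Averaging probabilities rather than counting hard votes lets a confident base learner outweigh two uncertain ones, which is one reason soft voting is often preferred when the base learners are reasonably well calibrated.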