A multi-center study of transformer-based CNNs for multiple sclerosis lesion segmentation on 3D FLAIR MRI.
Authors
Affiliations (13)
- Faculty of Pharmacy, Middle East University, Amman, 11831, Jordan.
- Department of Medical Laboratory Technics, College of Health and Medical Technology, Alnoor University, Mosul, Iraq.
- Ahl al Bayt University, Kerbala, Iraq.
- Marwadi University Research Center, Department of Computer Engineering, Faculty of Engineering & Technology, Marwadi University, Gujarat, Rajkot, 360003, India.
- Department of Chemistry and Biochemistry, School of Sciences, JAIN (Deemed to be University), Karnataka, Bangalore, India.
- Centre for Research Impact & Outcome, Chitkara University Institute of Engineering and Technology, Chitkara University, 140401, Punjab, Rajpura, India.
- Centre for Research Impact & Outcome, Sathyabama Institute of Science and Technology, Chennai, Punjab, 140401, India.
- Department of Maxillofacial Surgery, Samarkand State Medical University, 18 Amir Temur Street, Samarkand, 140100, Uzbekistan.
- College of Nursing, National University of Science and Technology, Dhi Qar, Iraq.
- Pharmacy College, Al-Farahidi University, Baghdad, Iraq.
- Department of Pharmacy, Al-Zahrawi University College, Karbala, Iraq.
- Gilgamesh Ahliya University, Baghdad, Iraq.
- Department of Medical Physics and Radiology, Faculty of Paramedical Sciences, Kashan University of Medical Sciences, Kashan, Islamic Republic of Iran.
Abstract
This study aimed to develop and evaluate a Transformer-CNN framework for automated segmentation of multiple sclerosis (MS) lesions on FLAIR MRI. The model was benchmarked against U-Net and DeepLabV3 and assessed for both segmentation accuracy and across-center performance under internal 5-fold cross-validation. The dataset comprised 1,800 3D FLAIR MRI scans from five clinical centers and was split using 5-fold cross-validation. Preprocessing included isotropic resampling, intensity normalization, and bias field correction. The Transformer-CNN combined CNN-based local feature extraction with Transformer-based global context modeling. Data augmentation, including geometric transformations and noise injection, was applied to improve generalization. Performance was evaluated using Dice score, intersection over union (IoU), 95th-percentile Hausdorff distance (HD95), and pixel accuracy, together with internal cross-validation-based across-center metrics: Generalized Dice Similarity Coefficient (GDSC), Domain-wise IoU (DwIoU), Cross-Fold Dice Deviation (CFDD), and volume agreement measured by the intraclass correlation coefficient (ICC). Differences between models were tested using the Kruskal-Wallis test with Dunn's post-hoc analysis. The Transformer-CNN achieved the best overall performance, with a Dice score of 92.3%, IoU of 91.4%, HD95 of 2.25 mm, and pixel accuracy of 95.6%. It also led on the internal cross-validation-based across-center metrics, with the highest GDSC (91.3%) and DwIoU (89.2%), the lowest CFDD (1.05%), and the highest ICC (96.5%). DeepLabV3 and U-Net achieved Dice scores of 85.1% and 83.0%, with HD95 values of 4.15 mm and 4.30 mm, respectively. U-Net performed worst, showing high variability across datasets and difficulty detecting small lesions. Overall, the Transformer-CNN outperformed U-Net and DeepLabV3 in both segmentation accuracy and across-center performance under internal 5-fold cross-validation. Its robustness, minimal variability, and ability to generalize across diverse datasets make it a practical and reliable tool for clinical MS lesion segmentation and monitoring.
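For readers unfamiliar with the overlap metrics named in the abstract, the following is a minimal sketch of how Dice score, IoU, and pixel accuracy are commonly computed for a pair of binary lesion masks. It is not the authors' implementation, which the abstract does not describe. The function name `overlap_metrics`, the toy volume, and the convention that two empty masks count as perfect agreement are all assumptions made for illustration.

```python
import numpy as np


def overlap_metrics(pred, truth):
    """Return (Dice, IoU, pixel accuracy) for two binary masks of equal shape.

    Hypothetical helper for illustration only; not taken from the study.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    if pred.shape != truth.shape:
        raise ValueError("masks must have the same shape")

    intersection = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    total = pred.sum() + truth.sum()

    # Assumed convention: two empty masks are treated as perfect agreement.
    dice = 2.0 * intersection / total if total > 0 else 1.0
    iou = intersection / union if union > 0 else 1.0
    # Fraction of voxels (lesion and background) labeled identically.
    accuracy = (pred == truth).mean()
    return float(dice), float(iou), float(accuracy)


# Toy 3D volume: an 8-voxel "lesion" and a 12-voxel prediction that covers it.
truth = np.zeros((4, 4, 4), dtype=bool)
truth[1:3, 1:3, 1:3] = True
pred = np.zeros_like(truth)
pred[1:3, 1:3, 1:4] = True

dice, iou, acc = overlap_metrics(pred, truth)
print(f"Dice={dice:.3f}  IoU={iou:.3f}  accuracy={acc:.3f}")
```

Pixel accuracy counts background voxels as well as lesion voxels, so on volumes where lesions occupy a small fraction of the brain it tends to be higher than Dice or IoU.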