Transformer-enhanced vertebrae segmentation and anatomical variation recognition from CT images.
Authors
Affiliations (6)
- Department of Human Movement Sciences, Faculty of Physical Education and Health, Gan Nan Normal University, Ganzhou, 341000, China.
- First Department of Rehabilitation Medicine, Affiliated Hospital of Jiangxi University of Traditional Chinese Medicine, Nanchang, 330006, Jiangxi, China.
- Affiliated Rehabilitation Hospital, Jiang Xi Medical College, Nanchang University, Nanchang, 330003, Jiangxi, China.
- School of Physical Therapy, Faculty of Associated Medical Sciences, Khon Kaen University, Khon Kaen, 40002, Thailand.
- School of Electronic and Information Technology, Sun Yat-Sen University, Guangzhou, 510275, China.
- School of Rehabilitation Medicine, Gan Nan Medical University, Ganzhou, 341000, China. [email protected].
Abstract
Accurate segmentation and anatomical classification of vertebrae in spinal CT scans are crucial for clinical diagnosis, surgical planning, and disease monitoring. However, the task is complicated by anatomical variability, degenerative changes, and the presence of rare vertebral anomalies. In this study, we propose a hybrid framework that combines a high-resolution WNet segmentation backbone with a Vision Transformer (ViT)-based classification module to perform vertebral identification and anomaly detection. Our model incorporates an attention-based anatomical variation module and leverages patient-specific metadata (age, sex, vertebral distribution) to improve the accuracy and personalization of vertebra typing. Extensive experiments on the VerSe 2019 and 2020 datasets demonstrate that our approach outperforms state-of-the-art baselines such as nnUNet and SwinUNet, especially in detecting transitional vertebrae (e.g., T13, L6) and in modeling morphological diversity. The system remains robust under slice skipping, noise perturbation, and scanner variation, while offering interpretability through attention heatmaps and case-specific alerts. Our findings suggest that integrating anatomical priors and demographic context into transformer-based pipelines is a promising direction for personalized, intelligent spinal image analysis.
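To make the pipeline described above concrete, the sketch below shows one plausible PyTorch arrangement of the three components named in the abstract: a cascaded (WNet-style) segmentation backbone, a metadata encoder for the age/sex/vertebral-distribution priors, and a ViT-style classifier with an extra attention block standing in for the anatomical variation module. All module names, patch sizes, channel widths, and class counts here are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of the hybrid WNet + ViT pipeline, assuming a PyTorch
# implementation. Every name and hyperparameter below is hypothetical.
import torch
import torch.nn as nn


class SegmentationBackbone(nn.Module):
    """Stand-in for the high-resolution WNet: two cascaded conv stages,
    where the second stage refines the coarse mask of the first."""
    def __init__(self, in_ch=1, out_ch=1, width=16):
        super().__init__()
        def block(i, o):
            return nn.Sequential(nn.Conv3d(i, o, 3, padding=1), nn.ReLU(),
                                 nn.Conv3d(o, o, 3, padding=1), nn.ReLU())
        self.stage1 = nn.Sequential(block(in_ch, width), nn.Conv3d(width, out_ch, 1))
        self.stage2 = nn.Sequential(block(in_ch + out_ch, width), nn.Conv3d(width, out_ch, 1))

    def forward(self, x):
        coarse = self.stage1(x)                                # coarse vertebra mask
        refined = self.stage2(torch.cat([x, coarse], dim=1))   # refinement pass
        return torch.sigmoid(refined)


class MetadataEncoder(nn.Module):
    """Embeds patient metadata (age, sex, vertebral-count prior) as one token."""
    def __init__(self, meta_dim=3, embed_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(meta_dim, embed_dim), nn.GELU(),
                                 nn.Linear(embed_dim, embed_dim))

    def forward(self, meta):
        return self.mlp(meta).unsqueeze(1)                     # (B, 1, D) metadata token


class VertebraClassifier(nn.Module):
    """ViT-style classifier over flattened 16^3 patches of a segmented vertebra,
    with one extra attention block standing in for the variation module."""
    def __init__(self, n_classes=26, embed_dim=256, depth=4, heads=4, n_patches=64):
        super().__init__()
        self.patch_proj = nn.Linear(16 * 16 * 16, embed_dim)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches + 2, embed_dim))
        layer = nn.TransformerEncoderLayer(embed_dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.variation_attn = nn.MultiheadAttention(embed_dim, heads, batch_first=True)
        self.head = nn.Linear(embed_dim, n_classes)            # e.g. C1-C7, T1-T13, L1-L6

    def forward(self, patch_tokens, meta_token):
        x = self.patch_proj(patch_tokens)                      # (B, n_patches, D)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, meta_token, x], dim=1) + self.pos_embed
        x = self.encoder(x)
        x, attn = self.variation_attn(x, x, x)                 # attn maps double as heatmaps
        return self.head(x[:, 0]), attn


# Illustrative forward pass on dummy data (shapes only, no real CT volume).
seg = SegmentationBackbone()
mask = seg(torch.randn(1, 1, 64, 64, 64))
meta = MetadataEncoder()(torch.tensor([[0.63, 1.0, 24.0]]))    # age, sex, count prior
logits, attn = VertebraClassifier()(torch.randn(1, 64, 4096), meta)
```

In this arrangement the metadata token attends alongside the image patch tokens, so demographic context can shift the class posterior toward or away from transitional labels such as T13 or L6, and the final attention weights could be rendered as the interpretability heatmaps the abstract mentions. This is a sketch under stated assumptions, not a definitive reproduction of the proposed architecture.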