Semi-SwinUNeTR: Towards 3D Swin Vision Transformer-Based UNet for Medical Image Segmentation with Limited Annotations.

June 17, 2026

DOI: 10.3390/bioengineering13060695 PMID: 42351939

Authors

Tian Y, Wang Z, Guo L

Affiliations (5)

School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China.
Engineering Research Center of Blockchain and Network Convergence Technology, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing 100876, China.
National Engineering Research Center for Mobile Internet Security Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China.
Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing 100876, China.
School of Computer Science and Digital Technologies, Aston University, Birmingham B4 7ET, UK.

Abstract

Accurate brain tumor segmentation from magnetic resonance imaging (MRI) is essential for computer-assisted diagnosis, treatment planning, and disease monitoring. However, brain tumors usually exhibit irregular, heterogeneous, and multi-scale spatial patterns with complex and ambiguous boundaries. At the same time, the performance of deep segmentation models is often constrained by the limited availability of voxel-level annotations, which are expensive and time-consuming to obtain. To address these challenges, this paper proposes Semi-SwinUNeTR, a semi-supervised framework for 3D brain tumor segmentation with limited annotated data. The proposed method adopts SwinUNeTR as the segmentation backbone, enabling hierarchical volumetric representation learning through shifted-window self-attention while preserving the encoder-decoder structure required for dense prediction. On top of this backbone, we introduce a dual-consistency semi-supervised learning strategy, consisting of mean teacher-based model consistency and interpolation consistency-based data consistency. In addition, voxel-wise consistency weights are used to redistribute semi-supervised supervision toward structurally complex and boundary-irregular tumor regions without changing the SwinUNeTR backbone. Experiments on the BraTS 2019 benchmark demonstrate that the proposed framework achieves strong performance across different annotation ratios. Without consistency weighting, Semi-SwinUNeTR achieves Dice scores of 84.93%, 86.25%, 87.05%, and 87.83% under the 10%, 20%, 40%, and 80% labeled-data settings, respectively. With the weighted consistency extension, the Dice scores are further improved to 85.64%, 87.94%, and 88.59% under the 10%, 20%, and 80% labeled-data settings, respectively, while the corresponding HD95 values are reduced to 8.9826, 8.1854, and 7.4533.
These results indicate that combining a SwinUNeTR backbone with complementary model consistency, data consistency, and voxel-wise consistency weighting is an effective strategy for semi-supervised volumetric medical image segmentation under limited annotation.
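The dual-consistency objective described in the abstract can be illustrated with a minimal NumPy sketch. It assumes a hypothetical voxel-wise logistic `toy_model` standing in for the SwinUNeTR backbone, a weighted mean-squared consistency loss, an EMA decay of 0.99 for the mean teacher, and a Beta-distributed mixing coefficient in the style of interpolation consistency training. The abstract does not say how the voxel-wise weights are computed, so `voxel_weights` uses an assumed boundary proxy (the gradient magnitude of the teacher's probability map); it is not the paper's weighting scheme.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def toy_model(params, x):
    # Hypothetical stand-in for SwinUNeTR: a voxel-wise logistic map that
    # returns a foreground probability for every voxel of a 3D volume.
    return sigmoid(params["w"] * x + params["b"])

def ema_update(teacher, student, alpha=0.99):
    # Mean teacher: teacher weights are an exponential moving average of
    # student weights; the teacher itself is never trained by gradients.
    return {k: alpha * teacher[k] + (1.0 - alpha) * student[k] for k in teacher}

def voxel_weights(prob, beta=1.0):
    # ASSUMED weighting (not from the paper): up-weight voxels where the
    # teacher probability changes sharply, i.e. near irregular boundaries.
    grads = np.gradient(prob)  # one derivative array per spatial axis
    magnitude = np.sqrt(sum(g ** 2 for g in grads))
    w = 1.0 + beta * magnitude
    return w / w.mean()  # normalize so the average weight is 1

def model_consistency(student, teacher, x, weights):
    # Model consistency: student and EMA teacher should agree on the same input.
    diff = toy_model(student, x) - toy_model(teacher, x)
    return float(np.mean(weights * diff ** 2))

def interpolation_consistency(student, teacher, x1, x2, lam, weights):
    # Data consistency (ICT-style): the student's prediction on a mixed input
    # should match the same mixture of the teacher's predictions.
    x_mix = lam * x1 + (1.0 - lam) * x2
    target = lam * toy_model(teacher, x1) + (1.0 - lam) * toy_model(teacher, x2)
    diff = toy_model(student, x_mix) - target
    return float(np.mean(weights * diff ** 2))

# Usage on two random unlabeled 8x8x8 "volumes".
rng = np.random.default_rng(0)
student = {"w": 1.2, "b": -0.1}
teacher = {"w": 1.0, "b": 0.0}
x1, x2 = rng.normal(size=(2, 8, 8, 8))
weights = voxel_weights(toy_model(teacher, x1))
lam = rng.beta(0.5, 0.5)
loss = (model_consistency(student, teacher, x1, weights)
        + interpolation_consistency(student, teacher, x1, x2, lam, weights))
new_teacher = ema_update(teacher, student, alpha=0.99)
```

In a full training loop, a supervised loss on the labeled volumes would be added to these two terms, commonly with a ramped-up consistency coefficient; only the student receives gradient updates, while the teacher follows it through the EMA step.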
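The results above are reported as Dice scores and HD95 (95th-percentile Hausdorff distance). A minimal NumPy sketch of both metrics for one binary mask pair follows. It assumes 6-connected surface extraction and takes HD95 as the larger of the two directed 95th-percentile surface distances; evaluation toolkits differ on these conventions, and the helper names are illustrative rather than the authors' evaluation code. Distances are in voxel units unless a physical `spacing` is passed, empty masks are not handled, and the brute-force pairwise distance matrix is only practical for small volumes.

```python
import numpy as np

def dice(pred, gt, eps=1e-8):
    # Dice = 2|A ∩ B| / (|A| + |B|) for binary masks.
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return float(2.0 * inter / (pred.sum() + gt.sum() + eps))

def surface_points(mask):
    # A foreground voxel lies on the surface if any of its 6 face neighbours
    # is background; padding with False treats the volume edge as background.
    padded = np.pad(mask, 1, constant_values=False)
    interior = padded[1:-1, 1:-1, 1:-1].copy()
    for axis in range(3):
        for shift in (-1, 1):
            interior &= np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
    return np.argwhere(mask & ~interior)

def hd95(pred, gt, spacing=(1.0, 1.0, 1.0)):
    # Assumed convention: max of the two directed 95th-percentile distances.
    a = surface_points(pred.astype(bool)) * np.asarray(spacing)
    b = surface_points(gt.astype(bool)) * np.asarray(spacing)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(max(np.percentile(d.min(axis=1), 95),
                     np.percentile(d.min(axis=0), 95)))

# Usage: a 10x10x10 cube and a copy shifted by one voxel along the first axis.
gt = np.zeros((20, 20, 20), dtype=bool)
gt[5:15, 5:15, 5:15] = True
pred = np.zeros_like(gt)
pred[6:16, 5:15, 5:15] = True
print(dice(pred, gt), hd95(pred, gt))
```

For full BraTS-sized volumes, a Euclidean distance transform is usually preferred over the pairwise matrix used here for clarity.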

Journal Article
