Three dimensional segmentation of abdominal arteries and veins using vision transformers and domain adaptation.
Authors
Affiliations (3)
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, People's Republic of China.
- Medical Image and Health Informatics Lab, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200030, People's Republic of China.
- Department of Electrical and Computer Engineering, University of Massachusetts Lowell, Lowell, MA 01854, United States of America.
Abstract
Accurate segmentation of abdominal three-dimensional (3D) vascular structures from computed tomography (CT) scans is crucial for clinical applications, yet it remains challenging because existing methods depend on large annotated datasets and generalize poorly across domains. These two limitations can be addressed through effective pretraining and cross-domain feature alignment, respectively. In this paper, we propose a novel transformer-based framework built on the masked autoencoder (MAE) and UNEt TRansformers (UNETR), dubbed Adaptive MAE-UNETR, that integrates self-supervised pretraining and adversarial domain adaptation to achieve robust artery/vein segmentation with enhanced generalization. Specifically, first, we develop an MAE pretraining paradigm that takes 3D CT scans as input and learns hierarchical feature representations from source domain data through self-reconstruction tasks. This targeted design ensures high fidelity in feature extraction and improves segmentation accuracy and stability. Second, the pretrained encoder is transferred to a UNETR segmentation network augmented with a domain adaptation technique, which adversarially aligns feature distributions between the source and target domains via a domain discriminator. Third, we establish a segmentation framework that simultaneously optimizes segmentation accuracy and domain invariance. Comprehensive evaluations on three public datasets demonstrate state-of-the-art performance of our proposed method. On the AMOS22 dataset, our model achieves Dice similarity coefficient (DSC) scores of 0.924 (aorta) and 0.892 (inferior vena cava, IVC). Cross-domain tests yield aorta/IVC DSC scores of 0.917/0.888 on the BTCV dataset and 0.931/0.906 on the FLARE23 dataset, showing consistent superiority over conventional methods across diverse datasets.
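For readers unfamiliar with the reported metric, the following is a minimal sketch of how a DSC value can be computed for one structure, using the standard definition DSC = 2|A ∩ B| / (|A| + |B|) on binary masks. The function name `dice_coefficient`, the flattened-list mask representation, the empty-mask convention, and the toy masks are all assumptions made for illustration; this is not the authors' evaluation code, and the numbers are not from the paper.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks.

    Both masks are flat sequences of 0/1 voxel labels (an assumed,
    simplified representation of a 3D segmentation volume).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled 1 in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy flattened masks for a single structure; values are illustrative only.
pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 0, 1, 1]
print(dice_coefficient(pred_mask, true_mask))
```

In a multi-structure setting such as aorta and IVC, the same computation would typically be applied once per label and reported separately, which matches the paired scores quoted above.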