ResTransUNet: a dual-encoder hybrid network for automated liver segmentation in CT scans.
Authors
Affiliations (1)
- The School of Medicine, Medical Sciences and Nutrition, University of Aberdeen, Scotland, AB24 3FX, UK.
Abstract
Precise delineation of the liver is essential for diagnosis and treatment planning in hepatocellular carcinoma, yet current clinical practice relies heavily on manual annotation, which is slow and prone to human error. We propose ResTransUNet, a fully automated segmentation framework that combines the complementary strengths of convolutional operations and self-attention mechanisms. The architecture uses a dual-encoder design: a convolutional path captures spatially local features, while a parallel Transformer-based path models long-range dependencies. To bridge the two streams, a feature refinement module injects the globally aware Transformer representations into the CNN-derived features, enhancing contextual understanding without incurring excessive computational cost. This hybrid strategy addresses a limitation commonly observed in conventional U-Net structures, namely the inability to capture global semantics, and avoids the high complexity often associated with pure Transformer models. On the LiTS2017 benchmark, the proposed model achieves superior segmentation accuracy, with a Dice score of 0.9535, a volumetric overlap error of 0.0804, and a relative volume difference of -0.0007. Additional validation on the public 3Dircadb, CHAOS, and Sliver07 datasets demonstrates consistent generalization and robustness, particularly in challenging cases involving small, fragmented, or poorly contrasted liver regions.
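The abstract reports three overlap metrics without stating their formulas. As a rough illustration, the sketch below computes them using the definitions commonly used in liver segmentation benchmarks: Dice = 2|A∩B| / (|A| + |B|), VOE = 1 - |A∩B| / |A∪B|, and RVD = (|A| - |B|) / |B|, where A is the predicted mask and B the reference mask. These definitions, the sign convention for RVD, and the function name `overlap_metrics` are assumptions for illustration, not details taken from the paper.

```python
from typing import Iterable, Tuple


def overlap_metrics(pred: Iterable[int], truth: Iterable[int]) -> Tuple[float, float, float]:
    """Return (Dice, VOE, RVD) for two flattened binary masks of equal length.

    Assumed definitions (not confirmed by the source):
      Dice = 2|A∩B| / (|A| + |B|)
      VOE  = 1 - |A∩B| / |A∪B|
      RVD  = (|A| - |B|) / |B|, with A = prediction and B = reference
    """
    pred = [bool(p) for p in pred]
    truth = [bool(t) for t in truth]
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")

    # Count voxels in the intersection, the union, and each mask.
    intersection = sum(p and t for p, t in zip(pred, truth))
    union = sum(p or t for p, t in zip(pred, truth))
    n_pred, n_truth = sum(pred), sum(truth)
    if n_truth == 0:
        raise ValueError("reference mask is empty; metrics are undefined")

    dice = 2.0 * intersection / (n_pred + n_truth)
    voe = 1.0 - intersection / union
    rvd = (n_pred - n_truth) / n_truth
    return dice, voe, rvd


# Toy example: two flattened 8-voxel masks that share 2 of 3 foreground voxels.
prediction = [1, 1, 1, 0, 0, 0, 0, 0]
reference = [1, 1, 0, 1, 0, 0, 0, 0]
dice, voe, rvd = overlap_metrics(prediction, reference)
print(f"Dice={dice:.4f}  VOE={voe:.4f}  RVD={rvd:.4f}")
```

Under these definitions, a negative RVD means the predicted volume is smaller than the reference, so a value near zero such as -0.0007 would indicate almost no systematic over- or under-segmentation. An RVD of zero does not by itself imply good overlap, as the toy example shows, which is why it is usually read alongside Dice and VOE.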