GH-UNet: group-wise hybrid convolution-VIT for robust medical image segmentation.
Authors
Affiliations (8)
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, China.
- Department of Radiology, Xiangya Hospital, Central South University, Changsha, China.
- Department of Radiology, The Second Xiangya Hospital, Central South University, Changsha, China.
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, China.
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, China.
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, China.
- Department of Radiology, The Second Xiangya Hospital, Central South University, Changsha, China.
- School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, China.
Abstract
Medical image segmentation is vital for accurate diagnosis. U-Net-based models are effective but struggle to capture long-range dependencies in complex anatomy. To address this limitation, we propose GH-UNet, a Group-wise Hybrid Convolution-ViT model built on the U-Net framework. GH-UNet combines three components: a hybrid convolution-Transformer encoder that models both local detail and global context, a Group-wise Dynamic Gating (GDG) module for adaptive feature weighting, and a cascaded decoder for multi-scale integration. Both the encoder and the GDG module are modular and compatible with various CNN or ViT backbones. In extensive experiments on five public datasets and one private dataset, GH-UNet consistently achieves superior performance. On ISIC2016, it surpasses H2Former by 1.37% in Dice and 1.94% in IoU while using only 38% of the parameters and 49.61% of the FLOPs. The code is freely available at https://github.com/xiachashuanghua/GH-UNet.
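For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of the standard Dice and IoU definitions for binary masks. It is not taken from the GH-UNet repository: the function name `dice_and_iou`, the smoothing term `eps`, and the toy masks are assumptions made for illustration, and the authors' evaluation code may differ in thresholding, averaging, or smoothing.

```python
import numpy as np


def dice_and_iou(pred, target, eps=1e-7):
    """Return (Dice, IoU) for two binary masks of the same shape.

    Hypothetical helper for illustration; `eps` is an assumed smoothing
    term that avoids division by zero when both masks are empty.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)

    intersection = np.logical_and(pred, target).sum()
    pred_sum = pred.sum()
    target_sum = target.sum()
    union = pred_sum + target_sum - intersection

    # Dice = 2|A ∩ B| / (|A| + |B|), IoU = |A ∩ B| / |A ∪ B|
    dice = (2.0 * intersection + eps) / (pred_sum + target_sum + eps)
    iou = (intersection + eps) / (union + eps)
    return float(dice), float(iou)


# Toy example: the prediction covers 4 pixels, the ground truth 6,
# and all 4 predicted pixels lie inside the ground truth.
pred = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
target = np.array([[1, 1, 1, 0],
                   [1, 1, 1, 0],
                   [0, 0, 0, 0],
                   [0, 0, 0, 0]])

dice, iou = dice_and_iou(pred, target)
print(f"Dice = {dice:.4f}, IoU = {iou:.4f}")
```

For a single pair of masks the two scores are related by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU. Gains in the two metrics, such as those reported on ISIC2016, therefore need not be equal.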