Hybrid-MedNet: a hybrid CNN-transformer network with multi-dimensional feature fusion for medical image segmentation.
Authors
Affiliations (1)
- School of Computer Science and Engineering, Central South University, Changsha, Hunan, 410083, CHINA.
Abstract
Twin-to-Twin Transfusion Syndrome (TTTS) is a complex prenatal condition in which monochorionic twins experience an imbalance in blood flow due to abnormal vascular connections in the shared placenta. Fetoscopic Laser Photocoagulation (FLP) is the first-line treatment for TTTS, aimed at coagulating these abnormal connections. However, the procedure is complicated by a limited field of view, occlusions, poor-quality endoscopic images, and distortions caused by artifacts. To optimize the visualization of placental vessels during surgical procedures, we propose Hybrid-MedNet, a novel hybrid CNN-transformer network that incorporates multi-dimensional deep feature learning techniques. The network introduces a BiPath Tokenization module that enhances vessel boundary detection by capturing both channel dependencies and spatial features through parallel attention mechanisms. A Context-Aware Transformer block addresses the weak inductive bias of traditional transformers while preserving the spatial relationships crucial for accurate vessel identification in distorted fetoscopic images. Furthermore, we develop a Multi-Scale Trifusion Module that integrates multi-dimensional features to capture rich vascular representations from the encoder and facilitate precise vessel information transfer to the decoder for improved segmentation accuracy. Experimental results show that our approach achieves a Dice score of 95.40% on fetoscopic images, outperforming ten state-of-the-art segmentation methods. Consistently superior performance across four segmentation tasks and ten distinct datasets confirms the robustness and effectiveness of our method for diverse and complex medical imaging applications.
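The BiPath Tokenization idea described above — reweighting features along the channel and spatial dimensions in parallel and then fusing the two paths — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; the function name, the pooling choices, and the additive fusion are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bipath_attention(x):
    """Illustrative sketch of parallel channel + spatial attention.

    x: feature map of shape (C, H, W).
    Channel path: global average pooling yields one descriptor per
      channel, softmax-normalized into channel weights.
    Spatial path: averaging over channels yields a per-pixel saliency
      map, softmax-normalized over all pixel locations.
    The two reweighted feature maps are fused by summation (an assumed
    fusion rule, not necessarily the one used in Hybrid-MedNet).
    """
    C, H, W = x.shape
    # Channel path: (C,) descriptor -> (C, 1, 1) weights
    channel_w = softmax(x.mean(axis=(1, 2)))[:, None, None]
    # Spatial path: (H, W) saliency map normalized over all pixels
    spatial_w = softmax(x.mean(axis=0).reshape(-1)).reshape(H, W)
    # Parallel fusion of the two attention paths
    return x * channel_w + x * spatial_w[None, :, :]

# Example: a 4-channel 8x8 feature map keeps its shape after fusion
y = bipath_attention(np.random.rand(4, 8, 8))
```

A real implementation would learn the pooling projections and fusion weights end-to-end (e.g. with 1x1 convolutions); the sketch only shows how the two attention paths operate on the same input in parallel before fusion.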