A hybrid CNN-Transformer network integrating multiscale spatially detailed features for medical image segmentation.

April 29, 2026

Authors

Li B, Zhou W, Li H

Affiliations (2)

  • School of Computer and Software Engineering, Xihua University, Chengdu, Sichuan, China.
  • Department of Cancer Center, The Second People's Hospital of Neijiang, Neijiang, Sichuan, China.

Abstract

The rapid advancement of deep learning has established Convolutional Neural Networks (CNNs) as the mainstream approach for medical image segmentation, yet their limited receptive field hinders the capture of long-range dependencies. While Transformers excel at modeling global features via self-attention, their high computational complexity makes high-resolution image processing costly. To leverage the complementary strengths of both architectures and integrate local and global features in a lightweight framework for improved accuracy and efficiency, this work proposes a novel encoder based on parallel CNN and Swin Transformer branches. Their effective integration is achieved by the Semantics and Detail Infusion (SDI) module, which fuses multi-scale features and employs attention to prioritize critical details, enriching the features used for resolution recovery in the decoder. Evaluations were conducted on two publicly available datasets, namely the Synapse Multi-Organ Segmentation dataset and the Aortic Vessel Tree dataset. The proposed model achieved Dice coefficients of 84.19% and 87.91%, respectively, with corresponding Hausdorff Distances of 12.64 mm and 7.06 mm. These results represent significant improvements over the UNet benchmark, with Dice score gains of 7.34% and 5.02%, respectively. They further underscore the model's robustness, efficiency, and clinical relevance in accurately delineating complex anatomical structures, particularly in abdominal segmentation tasks. By effectively fusing the advantages of CNNs and Transformers, our approach meets high-performance standards for medical image segmentation while offering practical benefits for real-world clinical deployment in resource-constrained environments. The code is publicly available at https://github.com/Palpitate-v/HybridNet.
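The abstract's core idea, fusing multi-scale features at a common resolution and weighting them by attention before decoding, can be illustrated with a toy sketch. This is not the authors' SDI implementation; it is a minimal pure-Python analogue on 1-D feature maps, with the resampling strategy (nearest-neighbour) and the attention weighting (softmax over mean activations) chosen as assumptions for illustration only.

```python
import math

def resize(feat, target_len):
    """Nearest-neighbour resample a 1-D feature map to target_len."""
    n = len(feat)
    return [feat[min(int(i * n / target_len), n - 1)]
            for i in range(target_len)]

def softmax(xs):
    """Numerically stable softmax over a list of scalars."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def sdi_fuse(scales, target_len):
    """Toy SDI-style fusion: bring every scale to the target
    resolution, weight each scale by a softmax attention score
    derived from its mean activation, and sum the weighted maps."""
    resized = [resize(f, target_len) for f in scales]
    weights = softmax([sum(f) / len(f) for f in resized])
    return [sum(w * f[i] for w, f in zip(weights, resized))
            for i in range(target_len)]

# Example: fuse a coarse semantic map with a fine detail map.
coarse = [1.0, 3.0]            # low resolution, high-level semantics
fine = [0.5, 0.2, 0.9, 0.4]    # high resolution, spatial detail
fused = sdi_fuse([coarse, fine], target_len=4)
print(fused)
```

In the real model the same principle operates on 2-D feature tensors from the CNN and Swin Transformer branches, and the attention is learned rather than derived from mean activations.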

Topics

  • Neural Networks, Computer
  • Image Processing, Computer-Assisted
  • Deep Learning
  • Journal Article
