PMSFINet: Progressive Multi-Scale Feature Interaction Network for Medical Image Segmentation
Authors
Abstract
Recently, the Swin Transformer has demonstrated strong performance in dense prediction tasks such as image segmentation by computing multi-head self-attention within local windows, which reduces the computational cost of attention from quadratic to linear in the number of image tokens. However, it remains limited in multi-scale feature fusion and boundary preservation, which leads to suboptimal segmentation of the complex or ambiguous structures common in medical images. To address these challenges, we propose PMSFINet, a medical image segmentation network that strengthens representation learning through progressive multi-scale feature interaction. The framework comprises three key components: (1) a Progressive Multi-Scale Feature Interactive (PMSFI) module built from Dual-Scale Window Interactive Attention (DSWIA) blocks, which enable efficient computation and cross-scale information exchange; (2) a Multi-Scale Super-Resolution Decoder (MSRD) that combines super-resolution and spatial attention with a Local Similarity-Aware Sampler (LSAS) to refine structural details and sharpen boundaries; and (3) a Cross-Attention Fusion (CAF) module that uses hybrid attention to dynamically fuse the dual-branch features, improving their complementarity and joint representation. On the Synapse, ACDC, and ISIC2018 datasets, PMSFINet achieves Dice scores of 84.94%, 92.43%, and 90.79%, respectively, indicating strong generalization and robustness across diverse medical imaging tasks. Ablation studies further confirm the contribution of each proposed component.
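To make the complexity argument concrete, the following is a minimal NumPy sketch of plain Swin-style window attention (W-MSA), together with the cost estimates given in the Swin Transformer paper: Ω(MSA) = 4hwC² + 2(hw)²C for global attention versus Ω(W-MSA) = 4hwC² + 2M²hwC for window size M. It is not PMSFINet's DSWIA block. It assumes a single head, identity query/key/value projections, no shifted windows, and no relative position bias, and the function names `window_partition`, `window_self_attention`, and `attention_flops` are hypothetical, chosen only for illustration.

```python
import numpy as np


def window_partition(x, m):
    """Split an (H, W, C) feature map into non-overlapping m x m windows.

    Returns an array of shape (num_windows, m * m, C), windows ordered row by row.
    """
    h, w, c = x.shape
    assert h % m == 0 and w % m == 0, "H and W must be divisible by the window size"
    x = x.reshape(h // m, m, w // m, m, c)
    x = x.transpose(0, 2, 1, 3, 4)  # -> (H/m, W/m, m, m, C)
    return x.reshape(-1, m * m, c)


def window_self_attention(x, m):
    """Single-head self-attention computed independently inside each window.

    Simplifying assumption: Q = K = V = the input tokens (identity projections),
    so this shows only the windowing, not a trainable attention layer.
    """
    windows = window_partition(x, m)  # (nW, m*m, C)
    c = windows.shape[-1]
    # Attention scores exist only between tokens of the same window: (nW, m*m, m*m).
    scores = windows @ windows.transpose(0, 2, 1) / np.sqrt(c)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ windows  # (nW, m*m, C)


def attention_flops(h, w, c, m=None):
    """Swin Transformer cost estimates: global MSA if m is None, else W-MSA with window m."""
    if m is None:
        return 4 * h * w * c**2 + 2 * (h * w) ** 2 * c
    return 4 * h * w * c**2 + 2 * m**2 * h * w * c


if __name__ == "__main__":
    feat = np.random.rand(56, 56, 96)
    out = window_self_attention(feat, 7)
    print(out.shape)  # 64 windows of 49 tokens with 96 channels
    print(attention_flops(56, 56, 96) / attention_flops(56, 56, 96, m=7))
```

For a 56 × 56 × 96 feature map with 7 × 7 windows (the first-stage setting of Swin-T), the attention term of W-MSA grows linearly with hw rather than quadratically, so its estimated cost is roughly 14 times lower than that of global attention.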