Local global feature enhanced transformer with attention pruning for precise medical image segmentation.
Authors
Affiliations (3)
- The Third Affiliated Hospital of Kunming Medical University, Yunnan Cancer Hospital, Peking University Cancer Hospital Yunnan, Kunming, China.
- Chongqing Technology and Business University, School of Artificial Intelligence, 19 Xuefu Avenue, Nan'an District, Chongqing, China.
- The First People's Hospital of Yunnan Province, 157 Jinbi Rd, Xishan District, Kunming, Yunnan, China.
Abstract
While Vision Transformer (ViT)-based methods excel at global modeling for medical image segmentation, they still need to capture fine-grained local details and to mitigate attention redundancy. To address these issues, we propose a hybrid Transformer network named Local Feature Enhanced AgentTopk UNet (LFEAT-UNet). LFEAT-UNet features three key designs: a Local Feature Enhancement (LFE) module that better represents detailed structural information, a Token Selection and Filtering (TSF) mechanism that dynamically prunes background tokens, and a Cascaded Multi-scale Patch Perception (CMSPP) module that improves feature fusion. We conducted extensive experiments on the Synapse and Automated Cardiac Diagnosis Challenge (ACDC) datasets and on a large-scale private adrenal nodule dataset. Our method achieves Dice scores of 83.12% on Synapse and 92.20% on ACDC. The superior results on the challenging private dataset further support the model's robustness. LFEAT-UNet thus provides an effective, sparsity-aware solution for medical image segmentation that excels at delineating complex clinical targets and shows significant potential for clinical applications.
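The abstract does not specify how the TSF mechanism selects tokens, so the following is only a minimal sketch of generic top-k sparse attention, the broad family of techniques that prune low-scoring tokens before the softmax. It is not the authors' implementation. The function name `topk_attention`, the single-query setup, and the toy vectors are all assumptions made for illustration.

```python
import math

def topk_attention(query, keys, values, k):
    """Hypothetical helper: single-query attention over the k best-scoring tokens."""
    d = len(query)
    # Scaled dot-product score between the query and every key token.
    scores = [sum(q * x for q, x in zip(query, key)) / math.sqrt(d) for key in keys]
    # Indices of the k largest scores; every other token is pruned.
    kept = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the kept scores only (subtract the max for numerical stability).
    top = max(scores[i] for i in kept)
    exps = {i: math.exp(scores[i] - top) for i in kept}
    total = sum(exps.values())
    weights = {i: e / total for i, e in exps.items()}
    # Weighted sum of the kept value vectors.
    dim = len(values[0])
    out = [sum(weights[i] * values[i][j] for i in kept) for j in range(dim)]
    return out, sorted(kept)

# Toy example (assumed data): tokens 1 and 3 score lowest and are pruned.
query = [1.0, 0.0]
keys = [[4.0, 0.0], [0.0, 4.0], [3.0, 0.0], [-2.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [9.0, 9.0]]
output, kept_tokens = topk_attention(query, keys, values, k=2)
print(kept_tokens)  # [0, 2]
```

Because the pruned tokens receive exactly zero weight, the outlier value at token 3 has no influence on the output. In a segmentation setting, the analogous effect would be that background tokens stop diluting the attention paid to the target structure.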