U-Net-like Transformer with variable shifted windows for low-dose CT denoising.
Authors
Affiliations (4)
- Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, 190 Kaiyuan Avenue, Science City, Guangzhou, Guangdong 510530, China.
- Institute of High Energy Physics, Chinese Academy of Sciences, 19B Yuquan Road, Shijingshan District, Beijing 100049, China.
- Dongguan University of Technology, 1 University Road, Songshan Lake, Dongguan, Guangdong 523808, China.
- Spallation Neutron Source Science Center, Zhongziyuan Road, Dalang, Dongguan, Guangdong 523803, China.
Abstract
Low-dose computed tomography (LDCT) is crucial for reducing radiation exposure in medical imaging, but it often yields noisy images with artifacts that compromise diagnostic accuracy. Recently, Transformer-based models have shown great potential for LDCT denoising by modeling long-range dependencies and global context. However, standard Transformers incur prohibitive computational costs when applied to high-resolution medical images. To address this challenge, we propose a novel pure Transformer architecture for LDCT image restoration, designed within a hierarchical U-Net framework. The core of our innovation is the integration of an agent attention mechanism into a variable shifted-window design. This agent attention module efficiently approximates global self-attention by using a small set of agent tokens to aggregate and broadcast global contextual information, thereby achieving a global receptive field with only linear computational complexity. By embedding this mechanism within a multi-scale U-Net structure, our model effectively captures both fine-grained local details and long-range structural dependencies without sacrificing computational efficiency. Comprehensive experiments on a public LDCT dataset demonstrate that our method achieves state-of-the-art performance, outperforming existing approaches in both quantitative metrics and qualitative visual comparisons.
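The two-stage agent attention described in the abstract can be sketched as follows. This is a minimal, unbatched, single-head NumPy illustration of the general agent-attention idea (a small set of agent tokens first aggregates context from all tokens, then broadcasts it back), not the authors' actual implementation; all shapes and the function name `agent_attention` are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def agent_attention(q, k, v, agent):
    """Two-stage agent attention (illustrative sketch).

    q, k, v : (n, d) token queries/keys/values
    agent   : (m, d) agent tokens, with m << n

    Cost is O(n*m*d) rather than the O(n^2*d) of full
    self-attention, i.e. linear in the token count n.
    """
    d = q.shape[-1]
    # Stage 1: agents attend to all tokens, aggregating global context.
    v_agent = softmax(agent @ k.T / np.sqrt(d)) @ v      # (m, d)
    # Stage 2: tokens attend to the agents, receiving the broadcast context.
    return softmax(q @ agent.T / np.sqrt(d)) @ v_agent   # (n, d)

# Toy usage: 64 tokens, 8 agent tokens, 16-dim features.
rng = np.random.default_rng(0)
n, m, d = 64, 8, 16
q = rng.standard_normal((n, d))
k = rng.standard_normal((n, d))
v = rng.standard_normal((n, d))
agent = rng.standard_normal((m, d))
out = agent_attention(q, k, v, agent)
print(out.shape)  # (64, 16)
```

In a windowed Transformer block, `n` would be the number of tokens inside one (shifted) window and the agent tokens would be learned or pooled from the window, so each block keeps linear cost while the U-Net hierarchy supplies the multi-scale structure.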