A token efficient vision framework using patch residual transformer for Alzheimer's disease diagnosis.
Authors
Affiliations (6)
- School of Computer Engineering and Science, Shanghai University, Shanghai, China.
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China.
- Department of Biomedical Engineering, The Hong Kong Polytechnic University, Hong Kong, China.
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China. [email protected].
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, USA. [email protected].
- Mental Health Research Centre, The Hong Kong Polytechnic University, Hong Kong, China. [email protected].
Abstract
Alzheimer's disease (AD) is a progressive neurodegenerative disorder, and early diagnosis using structural magnetic resonance imaging (sMRI) is crucial for timely intervention. Recently, the Vision Transformer (ViT) has been applied to sMRI-based AD diagnosis, but it often struggles with information redundancy when handling high-dimensional data, leading to focus loss and missed critical features. To address this, we propose the Patch Residual Transformer (PRT), a token-efficient vision framework designed for AD diagnosis and for predicting mild cognitive impairment (MCI) conversion (i.e., stable MCI vs. progressive MCI). The PRT model partitions sMRI into patches and incorporates a novel Patch Residual Block (PRB), which has two key components: Top-K patch selection and linear stitch patch token fusion (LSPTF). The Top-K component scores and ranks patches to identify those most relevant to AD diagnosis and MCI conversion prediction. The LSPTF component employs retention and linear stitch strategies that let the ViT concentrate on the most critical regions, thereby avoiding focus loss, diversifying the retained patches, improving model performance, and accelerating training. We validated our method on two large-scale datasets, the Alzheimer's Disease Neuroimaging Initiative (ADNI) and the Open Access Series of Imaging Studies-3 (OASIS-3). Our results demonstrate that PRT outperforms existing slice-, patch-, ROI-, and subject-level approaches in both AD diagnosis and MCI conversion prediction. Additionally, the model generalizes well and produces consistent saliency maps across datasets.
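The general idea of token reduction described in the abstract (keep the K highest-ranked patch tokens, then merge the rest so their information is not discarded outright) can be illustrated with a minimal sketch. This is not the authors' PRB or LSPTF: the abstract does not specify how patches are scored or how the linear stitch works, so the scores are taken as given inputs and the fusion step is a simple score-weighted mean used purely as a stand-in. All function names here (`select_top_k`, `fuse_remaining`, `reduce_tokens`) are hypothetical.

```python
from typing import List, Optional, Sequence

Token = List[float]


def select_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest scores, in original patch order."""
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of patches")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def fuse_remaining(tokens: Sequence[Token], scores: Sequence[float],
                   kept: Sequence[int]) -> Optional[Token]:
    """Merge all non-kept tokens into one token via a score-weighted mean.

    Assumption: this weighted mean is an illustrative placeholder,
    not the LSPTF operation from the paper.
    """
    kept_set = set(kept)
    rest = [i for i in range(len(tokens)) if i not in kept_set]
    if not rest:
        return None  # every token was kept, nothing to fuse
    total = sum(scores[i] for i in rest)
    if total > 0:
        weights = [scores[i] / total for i in rest]
    else:
        weights = [1.0 / len(rest)] * len(rest)  # fall back to a plain mean
    dim = len(tokens[0])
    return [sum(w * tokens[i][d] for w, i in zip(weights, rest))
            for d in range(dim)]


def reduce_tokens(tokens: Sequence[Token], scores: Sequence[float],
                  k: int) -> List[Token]:
    """Keep the top-k tokens and append one fused token for the remainder."""
    kept = select_top_k(scores, k)
    reduced = [list(tokens[i]) for i in kept]
    fused = fuse_remaining(tokens, scores, kept)
    if fused is not None:
        reduced.append(fused)
    return reduced


# Four toy 2-dimensional patch tokens with made-up importance scores.
tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0]]
scores = [1.0, 7.0, 9.0, 3.0]
reduced = reduce_tokens(tokens, scores, k=2)
```

With K patches kept, a sequence of N tokens shrinks to at most K + 1, which is where the reduction in attention cost comes from in schemes of this kind. How the real model obtains its scores and stitches the retained tokens is described in the full paper, not in this sketch.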