Efficient Slice-Patch Selection Transformer for Interpretable Alzheimer's Disease Diagnosis Using Structural MRI
Authors
Abstract
Structural magnetic resonance imaging (sMRI) plays a crucial role in the early screening of Alzheimer's disease (AD). Recent advances in vision Transformers (ViTs) demonstrate strong potential for sMRI-based computer-aided diagnosis because they capture long-range dependencies. However, their high computational demands and opaque decision-making hinder clinical application. To address these challenges, we exploit the inherent structural redundancy in brain sMRI and propose an efficient and interpretable slice-patch selection Transformer (SPSFormer) framework that selectively focuses on task-relevant sMRI slices and patches, significantly reducing the computational overhead of existing ViTs. Specifically, SPSFormer employs lightweight learnable scorers, placed before and within the ViT recognizer, to estimate the importance of slice and patch candidates. A perturbed-maximum-based differentiable Top-k operator then selects the top-scoring elements, enabling end-to-end training. We conduct rigorous cross-dataset validation (NACC, ADNI, AIBL) to evaluate generalizability. Across DeiT and Swin recognizers, SPSFormer reduces required GFLOPs by approximately $2$--$4\times$ while maintaining diagnostic accuracy. Analysis of the learned selection policies highlights key regions (e.g., hippocampus, parahippocampal gyrus, amygdala, thalamus) consistent with established AD neuropathology, supporting interpretability. The model's predicted AD probability shows significant associations with cognitive and biomarker measures, confirming neurobiological validity, and offers prognostic value: higher predicted AD probability is associated with shorter time to conversion from mild cognitive impairment to AD. These findings suggest that coupling high computational efficiency with intrinsic explainability offers a promising direction toward clinically deployable, trustworthy artificial intelligence for AD detection.
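
To illustrate the general idea behind a perturbed-maximum Top-k relaxation, the following is a minimal sketch and not the SPSFormer implementation. It assumes Gaussian noise and a Monte Carlo average over hard Top-k indicators; the function names, the noise scale `sigma`, the sample count `num_samples`, and the example scores are all hypothetical choices made for illustration. Only the forward (smoothed selection) pass is shown; the gradient estimator used for end-to-end training is omitted.

```python
import random


def hard_top_k(scores, k):
    """Return a 0/1 indicator list marking the k highest-scoring candidates."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    mask = [0.0] * len(scores)
    for i in top:
        mask[i] = 1.0
    return mask


def perturbed_top_k(scores, k, sigma=0.5, num_samples=500, seed=0):
    """Monte Carlo estimate of E[hard_top_k(scores + sigma * Z)], Z ~ N(0, I).

    Averaging hard selections over noisy copies of the scores gives a soft
    selection mask that varies smoothly with the scores (assumed setup).
    """
    rng = random.Random(seed)
    n = len(scores)
    soft = [0.0] * n
    for _ in range(num_samples):
        noisy = [s + sigma * rng.gauss(0.0, 1.0) for s in scores]
        mask = hard_top_k(noisy, k)
        for i in range(n):
            soft[i] += mask[i] / num_samples
    return soft


# Hypothetical importance scores for five slice (or patch) candidates.
scores = [2.0, 0.1, 1.5, -0.3, 0.9]
soft_mask = perturbed_top_k(scores, k=2)
print([round(v, 3) for v in soft_mask])
```

Each entry of `soft_mask` estimates the probability that the corresponding candidate falls in the Top-k under noise, so the entries sum to `k`. As `sigma` shrinks, the soft mask approaches the hard Top-k indicator.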