Vision Transformer Autoencoders for Unsupervised Representation Learning: Revealing Novel Genetic Associations through Learned Sparse Attention Patterns
Authors
Affiliations (1)
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston
Abstract
The discovery of genetic loci associated with brain architecture can provide deeper insights into neuroscience and potentially lead to improved personalized medicine outcomes. Previously, we designed the Unsupervised Deep learning-derived Imaging Phenotypes (UDIPs) approach to extract phenotypes from brain imaging using a convolutional neural network (CNN)-based autoencoder, and conducted brain imaging GWAS on the UK Biobank (UKBB). In this work, we design a vision transformer (ViT)-based autoencoder, leveraging its distinct inductive bias and its ability to capture unique patterns through its pairwise attention mechanism. The encoder generates contextual embeddings for the input patches, from which we derive a 128-dimensional latent representation, interpreted as phenotypes, by applying average pooling. The GWAS on these 128 phenotypes discovered 10 loci not reported by the CNN-based UDIP model, 3 of which had no previously reported associations with brain structure in the GWAS Catalog. Our interpretation results suggest that these novel associations stem from the ViT's capability to learn sparse attention patterns, enabling it to capture non-local patterns such as left-right hemisphere symmetry within brain MRI data. Our results highlight the advantages of transformer-based architectures in feature extraction and representation learning for genetic discovery.
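To illustrate the latent-extraction step described above (contextual patch embeddings from a ViT encoder, average-pooled into a 128-dimensional phenotype vector), the following is a minimal PyTorch sketch. It is not the authors' implementation: the 2D input (the study uses 3D brain MRI), patch size, depth, head count, and class name are illustrative assumptions; only the 128-dimensional latent comes from the abstract.

```python
# Minimal sketch (assumed, not the authors' code): a ViT encoder whose
# patch-token embeddings are average-pooled into a 128-d latent phenotype.
import torch
import torch.nn as nn


class ViTEncoderWithPooledLatent(nn.Module):
    def __init__(self, img_size=128, patch_size=16, in_chans=1,
                 embed_dim=128, depth=6, num_heads=8):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2
        # Patch embedding: split the image into non-overlapping patches
        # and linearly project each patch to the embedding dimension.
        self.patch_embed = nn.Conv2d(in_chans, embed_dim,
                                     kernel_size=patch_size, stride=patch_size)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))
        # Transformer encoder: pairwise self-attention across patch tokens
        # produces contextual embeddings for every patch.
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):
        # x: (batch, channels, height, width)
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, D)
        tokens = tokens + self.pos_embed
        tokens = self.encoder(tokens)   # contextual patch embeddings
        return tokens.mean(dim=1)       # average pooling -> 128-d latent


if __name__ == "__main__":
    model = ViTEncoderWithPooledLatent()
    latent = model(torch.randn(2, 1, 128, 128))
    print(latent.shape)  # torch.Size([2, 128])
```

In this sketch the embedding dimension is set equal to the latent dimension so that average pooling alone yields the 128-dimensional representation; in the full autoencoder, a decoder reconstructing the input from the patch embeddings would drive the training signal.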