Optimized AI-based Neural Decoding from BOLD fMRI Signal for Analyzing Visual and Semantic ROIs in the Human Visual System.
Authors
Affiliations (4)
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, Piazza L. da Vinci, 32, 20133 Milano, Milan, Lombardy, 20133, ITALY.
- Department of Neuroradiology, San Raffaele Hospital, Via Olgettina 60, Milan, Lombardy, 20132, ITALY.
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, via Ponzio 34/5, 20133 Milano, Milan, Lombardy, 20133, ITALY.
- Department of Industrial and Information Engineering, Università di Pavia, via A. Ferrata 5, 27100 Pavia, Pavia, Lombardia, 27100, ITALY.
Abstract
AI-based neural decoding reconstructs visual perception by leveraging generative models to map brain activity measured through functional MRI (fMRI) into the observed visual stimulus. Traditionally, ridge linear models transform fMRI into a latent space, which is then decoded using variational autoencoders (VAE) or latent diffusion models (LDM). Owing to the complexity and noisiness of fMRI data, newer approaches split the reconstruction into two sequential stages: the first provides a rough visual approximation using a VAE, and the second incorporates semantic information through an LDM guided by contrastive language-image pre-training (CLIP) embeddings. This work addressed key scientific and technical gaps in two-stage neural decoding by: 1) implementing a gated recurrent unit (GRU)-based architecture to establish a non-linear mapping between the fMRI signal and the VAE latent space, 2) optimizing the dimensionality of the VAE latent space, 3) systematically evaluating the contribution of the first reconstruction stage, and 4) analyzing the impact of different brain regions of interest (ROIs) on reconstruction quality. Experiments on the Natural Scenes Dataset, containing 73,000 unique natural images along with fMRI data from eight subjects, demonstrated that the proposed architecture maintained competitive performance while reducing the complexity of its first stage by 85%. The sensitivity analysis showed that the first reconstruction stage is essential for preserving high structural similarity in the final reconstructions. Restricting the analysis to semantic ROIs while excluding early visual areas diminished visual coherence but preserved semantics. The inter-subject repeatability across ROIs was about 92% and 98% for visual and semantic metrics, respectively. This study represents a key step toward optimized neural decoding architectures leveraging non-linear models for stimulus prediction.
Sensitivity analysis highlighted the interplay between the two reconstruction stages, while ROI-based analysis provided strong evidence that the two-stage AI model reflects the brain's hierarchical processing of visual information.
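For orientation, the traditional ridge linear mapping from fMRI responses to a latent space, which the abstract names as the baseline, can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the function names `fit_ridge` and `predict_latents`, the array sizes, the noise level, and the regularization strength `alpha` are all assumptions chosen for the example, and no intercept term is fitted.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y.

    X: (n_trials, n_voxels) fMRI responses (hypothetical layout).
    Y: (n_trials, latent_dim) target latent vectors.
    """
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    # Solve the regularized normal equations instead of forming an inverse.
    return np.linalg.solve(gram, X.T @ Y)


def predict_latents(X, W):
    """Map fMRI responses to predicted latent vectors."""
    return X @ W


# Synthetic stand-in for voxel responses and latents (sizes are assumptions).
rng = np.random.default_rng(0)
n_trials, n_voxels, latent_dim = 200, 50, 8
W_true = rng.normal(size=(n_voxels, latent_dim))
X = rng.normal(size=(n_trials, n_voxels))
Y = X @ W_true + 0.1 * rng.normal(size=(n_trials, latent_dim))

# Fit on the first 150 trials, evaluate on the remaining 50.
W_hat = fit_ridge(X[:150], Y[:150], alpha=1.0)
Y_pred = predict_latents(X[150:], W_hat)
mse = float(np.mean((Y_pred - Y[150:]) ** 2))
print(f"held-out MSE: {mse:.4f}")
```

In a real decoding setting the predicted latents would be passed to a generative decoder such as a VAE; the GRU-based mapping described in the abstract replaces this linear step with a non-linear one.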