Cross-Scanner Reliability of Brain MRI Foundation Model Embeddings: A Travelling-Heads Study
Authors
Affiliations (1)
- Laboratorio de Procesado de Imagen, Universidad de Valladolid
Abstract
Foundation models (FMs) for brain magnetic resonance imaging (MRI) are increasingly adopted as pretrained backbones for clinical tasks such as brain age prediction, disease classification, and anomaly detection. However, if FM embeddings (internal representations) shift systematically across MRI scanners, downstream analyses built on them may reflect acquisition hardware rather than biology. No study has yet quantified this cross-scanner reproducibility. Here, we assess the cross-scanner reliability of brain MRI FM embeddings and investigate which design factors (pretraining strategy, network architecture, embedding dimensionality, and pretraining dataset scale) best explain the observed differences. Using the ON-Harmony travelling-heads dataset (20 participants, eight scanners, three vendors), we evaluate the embeddings of five architecturally diverse FMs and a FreeSurfer morphometric baseline via within- and between-scanner intraclass correlation coefficient (ICC), variance decomposition, and scanner fingerprinting. Reliability spanned the full spectrum: biology-guided models achieved good-to-excellent cross-scanner ICC (AnatCL: 0.97 [95% confidence interval (CI): 0.94, 0.98]; y-Aware: 0.81 [0.63, 0.88]), matching or surpassing FreeSurfer (0.93 [0.83, 0.96]), whereas purely self-supervised models fell in the poor range (BrainIAC: 0.45, BrainSegFounder: 0.31, 3D-Neuro-SimCLR: 0.25), with 23-58% of their embedding variance attributable to scanner identity. Among the models evaluated, the strongest correlate of cross-scanner reliability was pretraining strategy: incorporating biological metadata (cortical morphometrics, age) into the contrastive objective produced scanner-robust embeddings, whereas architecture, dimensionality, and dataset scale did not predict reliability.
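To make the reliability metrics concrete, the following is a minimal sketch of how a cross-scanner ICC and a scanner variance fraction could be computed for a single embedding dimension arranged as a participants-by-scanners matrix. The abstract does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1), is an assumption here. The function name `icc_and_scanner_fraction`, the simulated data, and the noise scales are all hypothetical and are not taken from the study.

```python
import numpy as np


def icc_and_scanner_fraction(x):
    """Return (ICC(2,1), scanner variance fraction) for one feature.

    x: array of shape (n_participants, n_scanners).
    Assumption: two-way random-effects ANOVA without replication.
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)  # per-participant means
    col_means = x.mean(axis=0)  # per-scanner means

    # Mean squares for participants (rows), scanners (columns), residual.
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)
    resid = x - row_means[:, None] - col_means[None, :] + grand
    mse = np.sum(resid ** 2) / ((n - 1) * (k - 1))

    # ICC(2,1): absolute agreement, so scanner offsets lower the score.
    icc = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

    # Method-of-moments variance components, clipped at zero.
    var_subject = max((msr - mse) / k, 0.0)
    var_scanner = max((msc - mse) / n, 0.0)
    total = var_subject + var_scanner + mse
    return icc, var_scanner / total


# Hypothetical simulation: 20 participants, 8 scanners (sizes follow the abstract).
rng = np.random.default_rng(0)
subject_effect = rng.normal(size=(20, 1))           # stable biological signal
noise = 0.1 * rng.normal(size=(20, 8))              # scan-to-scan noise
scanner_offset = rng.normal(scale=1.5, size=(1, 8))  # systematic scanner shift

robust = subject_effect + noise        # feature unaffected by scanner
sensitive = robust + scanner_offset    # same feature plus scanner offsets

icc_robust, frac_robust = icc_and_scanner_fraction(robust)
icc_sensitive, frac_sensitive = icc_and_scanner_fraction(sensitive)
print(f"robust:    ICC={icc_robust:.2f}, scanner fraction={frac_robust:.2f}")
print(f"sensitive: ICC={icc_sensitive:.2f}, scanner fraction={frac_sensitive:.2f}")
```

For a multi-dimensional embedding, one plausible approach is to apply the function to each dimension and summarise across dimensions; whether the study aggregates this way is not stated in the abstract.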