Cross-Scanner Reliability of Brain MRI Foundation Model Embeddings: A Travelling-Heads Study

March 25, 2026 · medRxiv preprint

Authors

Navarro-Gonzalez, R., Aja-Fernandez, S., Planchuelo-Gomez, A., de Luis-Garcia, R.

Affiliations (1)

  • Laboratorio de Procesado de Imagen, Universidad de Valladolid

Abstract

Foundation models (FMs) for brain magnetic resonance imaging (MRI) are increasingly adopted as pretrained backbones for clinical tasks such as brain age prediction, disease classification, and anomaly detection. However, if FM embeddings (internal representations) shift systematically across MRI scanners, downstream analyses built on them may reflect acquisition hardware rather than biology. No study has yet quantified this cross-scanner reproducibility. Here, we assess the cross-scanner reliability of brain MRI FM embeddings and investigate which design factors (pretraining strategy, network architecture, embedding dimensionality, and pretraining dataset scale) best explain the observed differences. Using the ON-Harmony travelling-heads dataset (20 participants, eight scanners, three vendors), we evaluate the embeddings of five architecturally diverse FMs and a FreeSurfer morphometric baseline via within- and between-scanner intraclass correlation coefficient (ICC), variance decomposition, and scanner fingerprinting. Reliability spanned the full spectrum: biology-guided models achieved good-to-excellent cross-scanner ICC (AnatCL: 0.97 [95% confidence interval (CI): 0.94, 0.98]; y-Aware: 0.81 [0.63, 0.88]), matching or surpassing FreeSurfer (0.93 [0.83, 0.96]), whereas purely self-supervised models fell below the poor threshold (BrainIAC: 0.45, BrainSegFounder: 0.31, 3D-Neuro-SimCLR: 0.25), with 23-58% of embedding variance attributable to scanner identity. The strongest correlate of cross-scanner reliability among the models evaluated was pretraining strategy: incorporating biological metadata (cortical morphometrics, age) into the contrastive objective produced scanner-robust embeddings, whereas architecture, dimensionality, and dataset scale did not predict reliability.
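
To make the reliability metric concrete, the following is a minimal sketch of a cross-scanner ICC for a single embedding feature arranged as a participants × scanners matrix. The abstract does not state which ICC form the authors used or how per-feature values are aggregated, so the choice of ICC(2,1) (two-way random effects, absolute agreement, single measurement) and the function name `icc_2_1` are assumptions for illustration only, as is the synthetic data.

```python
import numpy as np


def icc_2_1(x: np.ndarray) -> float:
    """ICC(2,1) for an (n_subjects, n_scanners) matrix of one feature.

    Assumption: two-way random effects, absolute agreement, single
    measurement. The paper may use a different ICC form.
    """
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1, keepdims=True)  # per-participant means
    col_means = x.mean(axis=0, keepdims=True)  # per-scanner means

    # Mean squares from a two-way ANOVA without replication.
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)
    resid = x - row_means - col_means + grand
    ms_err = np.sum(resid ** 2) / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    return float((ms_rows - ms_err) / denom)


# Synthetic illustration (not data from the study): 20 participants, 8 scanners.
rng = np.random.default_rng(0)
subject_effect = rng.normal(size=(20, 1))            # "biological" signal
noise = 0.05 * rng.normal(size=(20, 8))              # small measurement noise
scanner_offset = np.linspace(-2.0, 2.0, 8)[None, :]  # systematic scanner shift

stable_feature = subject_effect + noise
shifted_feature = stable_feature + scanner_offset

print("ICC without scanner shift:", icc_2_1(stable_feature))
print("ICC with scanner shift:   ", icc_2_1(shifted_feature))
```

Because ICC(2,1) measures absolute agreement, a systematic per-scanner offset lowers it even when the participant ranking is perfectly preserved, which is the kind of scanner effect the abstract describes.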

Topics

radiology and imaging
