
Vision foundation model for 3D magnetic resonance imaging segmentation, classification, and registration.

February 12, 2026

Authors

Wang S, Safari M, Li Q, Chang CW, Qiu RLJ, Roper J, Yu DS, Yang X

Affiliations

  • Department of Radiation Oncology, Emory University School of Medicine, Atlanta, GA 30322, USA. Electronic address: [email protected]. (shared by all eight authors)

Abstract

Vision foundation models (VFMs) are pre-trained on extensive image datasets to learn general representations. These models can subsequently be fine-tuned for specific downstream tasks, markedly boosting performance across a broad range of applications. However, existing vision foundation models that claim broad downstream applicability are mostly pre-trained on imaging modalities whose characteristics differ from magnetic resonance imaging (MRI); these differences in imaging principles, signal characteristics, and data distribution may hinder their practical performance and versatility in MRI-specific applications. Here, we propose Triad, a vision foundation model for 3D MRI segmentation, classification, and registration. Triad learns robust representations from 129K 3D MRI volumes using the SimMIM framework, and constrains the semantic distribution of the visual modality with textual descriptions of the modality, device parameters, and imaging parameters. This pre-training dataset, called Triad-129K, is currently the largest 3D MRI pre-training dataset. We evaluate Triad across three tasks (organ/tumor segmentation, organ/cancer classification, and medical image registration) under two settings (within-domain and out-of-domain) using 25 downstream datasets. By initializing models with Triad's pre-trained weights, nnUNet-Triad-SimMIM improves segmentation performance by 2.13% compared to nnUNet-Scratch across 17 datasets. Swin-B-Triad-SimMIM achieves a 4.38% improvement over Swin-B-Scratch in classification tasks across five datasets. SwinUNETR-Triad-SimMIM improves by 3.84% compared to SwinUNETR-Scratch in registration tasks across two datasets. Our study demonstrates that pre-training can improve performance when the data modalities and organs of upstream and downstream tasks are consistent. This work highlights the value of large-scale pre-training techniques for downstream tasks in 3D MRI.
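The SimMIM-style objective the abstract refers to (masked image modeling, with the reconstruction loss computed only on masked regions) can be sketched for a 3D volume as below. This is a minimal NumPy illustration, not Triad's actual implementation: the patch size, mask ratio, and function names are illustrative assumptions, and the real model predicts the masked voxels with a learned encoder rather than receiving them.

```python
import numpy as np

def random_patch_mask(shape, patch=4, mask_ratio=0.6, rng=None):
    """Binary voxel mask built by masking whole non-overlapping 3D patches,
    as in SimMIM-style masked image modeling (patch size and ratio are
    illustrative, not Triad's configuration)."""
    if rng is None:
        rng = np.random.default_rng(0)
    d, h, w = (s // patch for s in shape)          # patch-grid dimensions
    n = d * h * w
    flat = np.zeros(n, dtype=bool)
    flat[rng.choice(n, size=int(n * mask_ratio), replace=False)] = True
    grid = flat.reshape(d, h, w)
    # expand each patch-grid cell to a patch^3 block of voxels
    return np.kron(grid, np.ones((patch, patch, patch))).astype(bool)

def masked_recon_loss(pred, target, mask):
    """L1 reconstruction loss averaged over masked voxels only --
    the defining choice of the SimMIM objective."""
    return float(np.abs(pred - target)[mask].mean())
```

In actual pre-training, `pred` would be the decoder's reconstruction of the masked input volume; restricting the loss to masked voxels is what distinguishes SimMIM from plain autoencoding.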

Topics

Journal Article
