A generalist foundation model and database for open-world medical image segmentation.
Authors
Affiliations (11)
- State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China.
- State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University, Beijing, China.
- State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University, Beijing, China. [email protected].
- South China Hospital, Medical School, Shenzhen University, Shenzhen, China.
- Department of Radiology, Sun Yat-Sen Memorial Hospital and Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Medical Research Center, Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, Guangzhou, China.
- Department of Urology, Peking University Third Hospital, Beijing, China.
- Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Gastrointestinal Cancer Center, Peking University Cancer Hospital and Institute, Beijing, China.
- Department of Materials Science and Engineering, Stanford University, Stanford, CA, USA.
- John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA.
- School of Science and Engineering (SSE), the Future Network of Intelligence Institute (FNii) and the Guangdong Provincial Key Laboratory of Future Networks of Intelligence, The Chinese University of Hong Kong, Shenzhen, China.
- State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China. [email protected].
Abstract
Vision foundation models have shown considerable potential for generalist medical segmentation, offering a versatile, task-agnostic solution through a single model. However, current generalist models rely on simple pre-training over heterogeneous medical data that contain task-irrelevant information, which often leads to negative transfer and degraded performance. Furthermore, the practical applicability of foundation models across diverse open-world scenarios, especially in out-of-distribution (OOD) settings, has not been extensively evaluated. Here we construct a publicly accessible database, MedSegDB, organized as a tree-structured hierarchy and annotated from 129 public medical segmentation repositories and 5 in-house datasets. We further propose a Generalist Medical Segmentation model (MedSegX), a vision foundation model trained with a model-agnostic Contextual Mixture of Adapter Experts (ConMoAE) for open-world segmentation. We conduct a comprehensive evaluation of MedSegX across a range of medical segmentation tasks. Experimental results indicate that MedSegX achieves state-of-the-art performance across various modalities and organ systems in in-distribution (ID) settings. In OOD and real-world clinical settings, MedSegX maintains its performance in both zero-shot and data-efficient generalization, outperforming other foundation models.
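The abstract names a Contextual Mixture of Adapter Experts but does not describe its internals. As a rough illustration of the general idea only, the sketch below shows a generic mixture of bottleneck adapters whose outputs are blended by a gate conditioned on a context vector. It is not the authors' implementation: the class names, dimensions, random weights, and the one-hot context encoding are all assumptions made for this example.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def make_matrix(rows, cols, rng, scale=0.1):
    """Random matrix standing in for learned weights (assumption)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class AdapterExpert:
    """Generic bottleneck adapter: down-project, ReLU, up-project."""

    def __init__(self, dim, bottleneck, rng):
        self.down = make_matrix(bottleneck, dim, rng)
        self.up = make_matrix(dim, bottleneck, rng)

    def __call__(self, features):
        hidden = [max(0.0, h) for h in matvec(self.down, features)]
        return matvec(self.up, hidden)


class ContextGatedAdapters:
    """Hypothetical layer: a context vector decides how experts are mixed."""

    def __init__(self, dim, context_dim, num_experts, bottleneck=4, seed=0):
        rng = random.Random(seed)
        self.experts = [AdapterExpert(dim, bottleneck, rng) for _ in range(num_experts)]
        self.gate = make_matrix(num_experts, context_dim, rng, scale=1.0)

    def gate_weights(self, context):
        # One logit per expert, normalized so the weights sum to 1.
        return softmax(matvec(self.gate, context))

    def __call__(self, features, context):
        weights = self.gate_weights(context)
        outputs = [expert(features) for expert in self.experts]
        mixed = [
            sum(w * out[i] for w, out in zip(weights, outputs))
            for i in range(len(features))
        ]
        # Residual connection: the frozen backbone features pass through unchanged.
        return [f + m for f, m in zip(features, mixed)]


layer = ContextGatedAdapters(dim=8, context_dim=3, num_experts=4)
features = [0.5] * 8          # stand-in for one backbone feature vector
context = [1.0, 0.0, 0.0]     # hypothetical one-hot context code
weights = layer.gate_weights(context)
adapted = layer(features, context)
```

In a design of this kind the backbone stays fixed and only the small adapters and the gate are trained, which is one common way to limit interference between unrelated tasks. Whether MedSegX follows this exact scheme is not stated in the abstract.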