UltraSam: a foundation model for ultrasound using large open-access segmentation datasets.
Authors
Affiliations (5)
- University of Strasbourg, CNRS, INSERM, ICube, UMR7357, Strasbourg, France. [email protected].
- IHU-Strasbourg, Institute of Image-Guided Surgery, Strasbourg, France. [email protected].
- University of Strasbourg, CNRS, INSERM, ICube, UMR7357, Strasbourg, France.
- IHU-Strasbourg, Institute of Image-Guided Surgery, Strasbourg, France.
- Hôpitaux Universitaires de Strasbourg, Strasbourg, France.
Abstract
Automated ultrasound (US) image analysis remains a longstanding challenge due to anatomical complexity and the scarcity of annotated data. Although large-scale pretraining has improved data efficiency in many visual domains, its impact in US is limited by a pronounced domain shift from other imaging modalities and high variability across clinical applications, such as chest, ovarian, and endoscopic imaging. To address this, we propose UltraSam, a SAM-style model trained on a heterogeneous collection of publicly available segmentation datasets, originally developed in isolation. UltraSam is trained under the prompt-conditioned segmentation paradigm, which eliminates the need for unified labels and enables generalization to a broad range of downstream tasks. We compile US-43d, a large-scale collection of 43 open-access US datasets comprising over 282,000 images with segmentation masks covering 58 anatomical structures. We explore adaptation and fine-tuning strategies for SAM and systematically evaluate transferability across downstream tasks, comparing against state-of-the-art pretraining methods. We further propose prompted classification, a new use case where object-specific prompts and image features are jointly decoded to improve classification performance. In experiments on three diverse public US datasets, UltraSam outperforms existing SAM variants on prompt-based segmentation and surpasses self-supervised US foundation models on downstream (prompted) classification and instance segmentation tasks. UltraSam demonstrates that SAM-style training on diverse, sparsely annotated US data enables effective generalization across tasks. By unlocking the value of fragmented public datasets, our approach lays the foundation for scalable, real-world US representation learning. We release our code and pretrained models at https://github.com/CAMMA-public/UltraSam and invite the community to further this effort by contributing high-quality datasets.
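The prompted-classification use case mentioned in the abstract can be pictured as follows: object-specific prompt embeddings (e.g., from point or box prompts) are cross-attended to image-encoder features and the result is decoded into class logits. The sketch below is a minimal, hypothetical PyTorch illustration of such joint decoding, not the UltraSam implementation; the module name, dimensions, and pooling choice are all assumptions, and the actual architecture is defined in the released code at https://github.com/CAMMA-public/UltraSam.

```python
# Hypothetical sketch of a prompted-classification head: prompt tokens are
# cross-attended to image features, pooled, and mapped to class logits.
# Names and dimensions are illustrative only, not the UltraSam architecture.
import torch
import torch.nn as nn


class PromptedClassificationHead(nn.Module):
    """Jointly decodes prompt embeddings and image features into class logits."""

    def __init__(self, embed_dim: int = 256, num_classes: int = 2, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, prompt_tokens: torch.Tensor, image_tokens: torch.Tensor) -> torch.Tensor:
        # prompt_tokens: (B, P, C) embeddings of the point/box prompts for one object
        # image_tokens:  (B, HW, C) flattened image-encoder features
        attended, _ = self.cross_attn(prompt_tokens, image_tokens, image_tokens)
        pooled = self.norm(attended).mean(dim=1)   # pool over prompt tokens
        return self.classifier(pooled)             # (B, num_classes) logits


# Toy usage with random tensors standing in for encoder outputs.
head = PromptedClassificationHead(embed_dim=256, num_classes=3)
prompts = torch.randn(2, 4, 256)         # 4 prompt tokens per image
features = torch.randn(2, 64 * 64, 256)  # 64x64 feature map, flattened
logits = head(prompts, features)
print(logits.shape)  # torch.Size([2, 3])
```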