Adapt or specialize? A comprehensive evaluation of adapted SAM versus task-specific CNNs for fetal abdominal segmentation.
Affiliations (3)
- Department of Information Engineering, Marche Polytechnic University, Ancona, Italy.
- Department of Information Engineering, Marche Polytechnic University, Ancona, Italy.
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milano, Italy.
Abstract
The fetal abdomen is central to prenatal screening, providing key information on fetal growth and congenital anomalies. However, segmenting internal abdominal structures in ultrasound (US) remains challenging because of anatomical variability, overlapping organs, and low contrast. While CNN-based models have shown strong performance in fetal head analysis, most existing methods focus on biometric measurements (e.g., head or abdominal circumference), leaving the segmentation of internal abdominal organs largely underexplored. Recently, foundation models such as the Segment Anything Model (SAM) have emerged as flexible alternatives that enable zero- or few-shot segmentation. However, their performance on fetal US remains poorly understood, and whether they require adaptation is still an open question. We compare two segmentation strategies: (1) task-specific CNNs, including UNet, Attention UNet, nnUNet, DeepLabv3+, and their focal-loss variants; and (2) adapted SAM-based models. The latter comprise zero-shot variants (SAMPoint, SAMBBox), pre-trained models (SAM Med2DBBox, MedSAMBBox), and adapted configurations, including lightweight fine-tuned models (SAM-LoRA, MedSAM-FrozenEncoder) and SAM Med2D variants with adapted layers. Experiments are conducted on a curated dataset of fetal abdominal US images with manual segmentations of the liver, stomach, artery, and umbilical vein. Performance is evaluated using the Dice Similarity Coefficient (DSC), Intersection over Union (IoU), and precision, and statistical significance is assessed with pairwise Friedman chi-square tests. Zero-shot SAM variants performed poorly, particularly on small or low-contrast structures. In contrast, adapted SAM models consistently outperformed CNNs, reaching DSC values of up to 0.90 (liver) and 0.80 (artery). Prompt-based interaction enables semi-automated, human-in-the-loop workflows, supporting clinical applicability.