Exploration and Performance Analysis of Deep Learning Applications in Spermatic Vein Ultrasound Segmentation.
Authors
Affiliations (3)
- School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 516, Jungong Road, Yangpu District, Shanghai, 200093, China.
- Department of Ultrasound, Shanghai Jiao Tong University School of Medicine, 227 South Chongqing Road, Shanghai, 200025, China.
- School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 516, Jungong Road, Yangpu District, Shanghai, 200093, China.
Abstract
Varicocele is a common cause of male infertility, with ultrasound (US) serving as the primary diagnostic tool. Current practice relies on manual, subjective measurements of the spermatic vein, which are time-consuming and lack reproducibility. Developing automated tools is hindered by scarce annotated data and intrinsic US challenges like low contrast and high noise. 
Objectives: This study aimed to: (1) develop and validate an efficient semi-automated annotation workflow; (2) establish the first performance benchmark for automated spermatic vein segmentation using deep learning; and (3) critically evaluate the efficacy of state-of-the-art and customised segmentation models for this specific task.
Methods: We proposed a semi-automated pipeline using the Segment Anything Model (SAM) with clinician refinement. Using the resulting dataset, we conducted a comprehensive benchmark, evaluating a baseline U-Net, advanced models (U-Net++, Attention U-Net, and RPA-UNet), and a proposed U-Net with deep supervision (UNet-DS). All models were assessed via leave-one-patient-out cross-validation and statistical tests. 
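To illustrate the evaluation protocol named above, here is a minimal sketch of a leave-one-patient-out split. The function name `leave_one_patient_out` and the patient identifiers are hypothetical; the abstract does not describe the authors' implementation. The point is that every image from the held-out patient goes to the test fold, so no patient contributes to both training and testing.

```python
from collections import defaultdict


def leave_one_patient_out(patient_ids):
    """Yield (held_out_patient, train_indices, test_indices), one fold per patient.

    patient_ids[i] is the patient that image i belongs to.
    """
    by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)

    for held_out in sorted(by_patient):
        test_idx = by_patient[held_out]
        # Every image from every other patient goes into the training fold.
        train_idx = sorted(
            i for pid, idxs in by_patient.items() if pid != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


# Hypothetical data: five images from three patients.
patient_ids = ["p1", "p1", "p2", "p3", "p3"]
folds = list(leave_one_patient_out(patient_ids))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

Splitting by patient rather than by image avoids leakage between near-identical frames from the same examination, which would otherwise inflate the reported scores.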
Results: The 'SAM+clinician' workflow showed excellent agreement with expert annotation (Dice Similarity Coefficient (DSC) = 92.66%; Kappa = 91.92%). In segmentation, the baseline U-Net achieved a mean DSC of 61.33%. Only Attention U-Net showed a statistically significant improvement over this baseline (p = 0.0391). UNet-DS attained the highest mean DSC (64.65%), but this improvement was not statistically significant (p = 0.0781). All models plateaued in a narrow range (DSC: 61%-65%), far below the performance reported in mature US segmentation domains.
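For reference, the two agreement metrics reported above can be computed from a pair of binary masks as in the following sketch. It uses the standard definitions, DSC = 2|A ∩ B| / (|A| + |B|) and Cohen's kappa = (p_o - p_e) / (1 - p_e), applied pixel-wise. The function names and the toy masks are illustrative assumptions, and the abstract does not state which kappa variant the authors used.

```python
import numpy as np


def dice_coefficient(pred, ref):
    """Dice Similarity Coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        # Both masks empty: treat as perfect agreement (a convention, not a rule).
        return 1.0
    return float(2.0 * np.logical_and(pred, ref).sum() / denom)


def cohen_kappa(pred, ref):
    """Pixel-wise Cohen's kappa between two binary masks."""
    pred = np.asarray(pred, dtype=bool).ravel()
    ref = np.asarray(ref, dtype=bool).ravel()
    p_o = np.mean(pred == ref)  # observed agreement
    # Agreement expected by chance, from each mask's foreground fraction.
    p_e = pred.mean() * ref.mean() + (1 - pred.mean()) * (1 - ref.mean())
    if p_e == 1.0:
        # Both masks are constant and identical; kappa is undefined, return 1.0.
        return 1.0
    return float((p_o - p_e) / (1 - p_e))


# Toy 2x4 masks (hypothetical): the prediction has one extra foreground pixel.
pred = [[1, 1, 0, 0], [0, 1, 0, 0]]
ref = [[1, 1, 0, 0], [0, 0, 0, 0]]
dsc = dice_coefficient(pred, ref)
kappa = cohen_kappa(pred, ref)
print(f"DSC = {dsc:.4f}, kappa = {kappa:.4f}")
```

Unlike DSC, kappa also credits agreement on background pixels and corrects for chance, so the two values can differ noticeably on small structures such as a vein lumen.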
Conclusion: This work validates an efficient semi-automated annotation solution and establishes the first performance benchmark for this task. The results reveal a distinct performance ceiling, indicating that the primary barrier is inherent data limitations rather than model architecture. Future breakthroughs will require a shift towards bespoke, physics-informed algorithms rather than the application of generic deep learning models.