Domain adaptation of stable diffusion for ultrasound inpainting: a synthetic data approach for enhanced thyroid nodule segmentation.
Authors
Affiliations (2)
- Institute of Biophysics and Informatics, 1st Faculty of Medicine, Charles University, Salmovska 1, Prague 2 120 00, Czech Republic.
- Institute of Biophysics and Informatics, 1st Faculty of Medicine, Charles University, Salmovska 1, Prague 2 120 00, Czech Republic.
Abstract
This study aimed to enhance the cross-domain generalization of thyroid nodule segmentation models by augmenting limited ultrasound training data with synthetic images generated by a fine-tuned Stable Diffusion model. Three public thyroid ultrasound datasets with heterogeneous acquisition characteristics were used: TN3K (training and testing), TDID, and TUCC. The denoising UNet inside Stable Diffusion v1.4 was fine-tuned on 2303 TN3K nodules and then used to synthesize realistic thyroid nodules. Using the model's inpainting capability, the same number of synthetic nodules was inserted into original ultrasound images. The combined data were then used to train ResUNet, DeepLabV3+ and MITUNet segmentation networks with identical hyper-parameters. Models trained on native data only were compared with models trained on native + synthetic data using the Dice similarity coefficient (Dice score) and Intersection-over-Union (IoU). On the in-domain TN3K test set (n = 614), performance gains were modest, with the best improvement reaching +2.2 % in Dice score for DeepLabV3+. In contrast, substantial gains were observed on the external datasets. On the TDID dataset (n = 462), DeepLabV3+ improved from 38.2 % to 59.1 % Dice (+20.9 %), while MITUNet and ResUNet gained 7.1 % and 6.9 %, respectively. On the TUCC dataset (n = 192), DeepLabV3+ improved by 11.4 % in Dice, MITUNet by 6.9 %, and ResUNet by 3.1 %. All improvements, except those on the in-domain TN3K test set, were statistically significant (p < 0.01, paired t-test or Wilcoxon signed-rank test), confirming that synthetic images generated by Stable Diffusion enhance cross-domain segmentation robustness. Augmenting ultrasound datasets with synthetic images generated by a task-specific Stable Diffusion model substantially improves the robustness of thyroid nodule segmentation across datasets acquired with different devices, at different institutions, and by different operators.
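For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how the Dice score and IoU can be computed for a pair of binary masks. It is not the authors' evaluation code. The function names, the nested-list mask representation, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; the abstract does not state how empty masks or per-image averaging were handled.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]  # binary mask as rows of 0/1 values (assumed format)


def overlap_counts(pred: Mask, target: Mask) -> Tuple[int, int, int]:
    """Return (intersection, predicted area, target area) in pixels."""
    if len(pred) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, target)
    ):
        raise ValueError("masks must have the same shape")
    inter = pred_area = target_area = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p_on, t_on = bool(p), bool(t)
            inter += p_on and t_on
            pred_area += p_on
            target_area += t_on
    return inter, pred_area, target_area


def dice_score(pred: Mask, target: Mask, empty_value: float = 1.0) -> float:
    """Dice = 2 * |A ∩ B| / (|A| + |B|)."""
    inter, pred_area, target_area = overlap_counts(pred, target)
    denom = pred_area + target_area
    # Assumption: two empty masks count as a perfect match.
    return empty_value if denom == 0 else 2.0 * inter / denom


def iou_score(pred: Mask, target: Mask, empty_value: float = 1.0) -> float:
    """IoU = |A ∩ B| / |A ∪ B|."""
    inter, pred_area, target_area = overlap_counts(pred, target)
    union = pred_area + target_area - inter
    # Assumption: two empty masks count as a perfect match.
    return empty_value if union == 0 else inter / union


# Toy 3x3 example: 2 overlapping pixels, 3 predicted, 3 in the ground truth.
example_pred = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
example_target = [[0, 1, 0], [0, 1, 1], [0, 0, 0]]
example_dice = dice_score(example_pred, example_target)  # 4 / 6
example_iou = iou_score(example_pred, example_target)    # 2 / 4
```

For a single mask pair the two metrics are linked by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU. The relation does not carry over exactly to scores averaged across a test set, which is one reason both are commonly reported.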