Back to all papers

A Dual-Mode ViT-Conditioned Diffusion Framework with an Adaptive Conditioning Bridge for Breast Cancer Segmentation

November 8, 2025arxiv logopreprint

Authors

Prateek Singh,Moumita Dholey,P. K. Vinod

Abstract

In breast ultrasound images, precise lesion segmentation is essential for early diagnosis; however, low contrast, speckle noise, and unclear boundaries make this difficult. Even though deep learning models have demonstrated potential, standard convolutional architectures frequently fall short in capturing enough global context, resulting in segmentations that are anatomically inconsistent. To overcome these drawbacks, we suggest a flexible, conditional Denoising Diffusion Model that combines an enhanced UNet-based generative decoder with a Vision Transformer (ViT) encoder for global feature extraction. We introduce three primary innovations: 1) an Adaptive Conditioning Bridge (ACB) for efficient, multi-scale fusion of semantic features; 2) a novel Topological Denoising Consistency (TDC) loss component that regularizes training by penalizing structural inconsistencies during denoising; and 3) a dual-head architecture that leverages the denoising objective as a powerful regularizer, enabling a lightweight auxiliary head to perform rapid and accurate inference on smaller datasets and a noise prediction head. Our framework establishes a new state-of-the-art on public breast ultrasound datasets, achieving Dice scores of 0.96 on BUSI, 0.90 on BrEaST and 0.97 on BUS-UCLM. Comprehensive ablation studies empirically validate that the model components are critical for achieving these results and for producing segmentations that are not only accurate but also anatomically plausible.

Topics

cs.CV

Ready to Sharpen Your Edge?

Join hundreds of your peers who rely on RadAI Slice. Get the essential weekly briefing that empowers you to navigate the future of radiology.

We respect your privacy. Unsubscribe at any time.