VGDM: Vision-Guided Diffusion Model for Brain Tumor Detection and Segmentation
Authors
Abstract
Accurate detection and segmentation of brain tumors from magnetic resonance imaging (MRI) are essential for diagnosis, treatment planning, and clinical monitoring. While convolutional architectures such as U-Net have long been the backbone of medical image segmentation, their limited capacity to capture long-range dependencies constrains performance on complex tumor structures. Recent advances in diffusion models have demonstrated strong potential for generating high-fidelity medical images and refining segmentation boundaries. In this work, we propose VGDM: Vision-Guided Diffusion Model for Brain Tumor Detection and Segmentation framework, a transformer-driven diffusion framework for brain tumor detection and segmentation. By embedding a vision transformer at the core of the diffusion process, the model leverages global contextual reasoning together with iterative denoising to enhance both volumetric accuracy and boundary precision. The transformer backbone enables more effective modeling of spatial relationships across entire MRI volumes, while diffusion refinement mitigates voxel-level errors and recovers fine-grained tumor details. This hybrid design provides a pathway toward improved robustness and scalability in neuro-oncology, moving beyond conventional U-Net baselines. Experimental validation on MRI brain tumor datasets demonstrates consistent gains in Dice similarity and Hausdorff distance, underscoring the potential of transformer-guided diffusion models to advance the state of the art in tumor segmentation.