Bilateral Information-Guided Diagnosis of Breast Masses in Mammography Using Vision Transformer.
Authors
Abstract
Early-stage breast cancer is often asymptomatic, which makes computer-aided diagnostic (CAD) systems critical to mammography screening. Radiologists routinely compare the two breasts for asymmetry when identifying abnormalities, yet most existing CAD methods either analyze unilateral views, which limits their ability to model structural heterogeneity, or require image registration, which often introduces distortion. To address this, we propose a registration-free, structure-aware diagnostic framework that integrates bilateral mammography with soft spatial prompting via a Vision Transformer (ViT). By directly concatenating bilateral images and introducing a soft attention mask generated by a lightweight segmentation network, our approach models cross-breast structural differences end to end without region-of-interest extraction. Extensive evaluations on both public and clinical datasets demonstrate that our method consistently outperforms CNN and lightweight Transformer baselines, achieving up to 0.930 accuracy and 0.972 AUC. To our knowledge, this is the first framework to combine bilateral structural modeling and soft guidance in a unified, interpretable, and scalable ViT-based pipeline for breast cancer diagnosis.
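The input construction summarized above (direct concatenation of the two views followed by soft mask guidance) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `concat_bilateral` and `apply_soft_mask`, the side-by-side layout, and the reweighting rule `floor + (1 - floor) * mask` are all assumptions made for illustration, and the mask is supplied by hand here rather than predicted by a segmentation network.

```python
from typing import List

# Row-major grayscale image with values in [0, 1] (toy representation).
Image = List[List[float]]


def concat_bilateral(left: Image, right: Image) -> Image:
    """Place the two views side by side, row by row, with no registration step."""
    if len(left) != len(right):
        raise ValueError("both views must have the same height")
    return [l_row + r_row for l_row, r_row in zip(left, right)]


def apply_soft_mask(image: Image, mask: Image, floor: float = 0.5) -> Image:
    """Reweight each pixel by floor + (1 - floor) * mask.

    Assumed form of soft guidance: pixels outside the mask are dimmed
    to `floor` of their value instead of being cropped away.
    """
    if len(image) != len(mask) or any(
        len(i_row) != len(m_row) for i_row, m_row in zip(image, mask)
    ):
        raise ValueError("image and mask must have the same shape")
    return [
        [px * (floor + (1.0 - floor) * m) for px, m in zip(i_row, m_row)]
        for i_row, m_row in zip(image, mask)
    ]


# Toy 2x2 views standing in for the left and right mammograms.
left_view = [[0.2, 0.4], [0.6, 0.8]]
right_view = [[1.0, 0.0], [0.5, 0.5]]

pair = concat_bilateral(left_view, right_view)  # shape 2x4

# Hypothetical soft mask over the concatenated pair (1.0 = suspected mass).
soft_mask = [[0.0, 1.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
guided = apply_soft_mask(pair, soft_mask)
```

Because the mask only rescales intensities, the full bilateral context remains available to a downstream classifier, which is the property that distinguishes soft guidance from hard region-of-interest cropping.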