A Hybrid CNN-Transformer Deep Learning Model for Differentiating Benign and Malignant Breast Tumors Using Multi-View Ultrasound Images
Authors
Affiliations (1)
- The Second Affiliated Hospital of Guangzhou University of Chinese Medicine
Abstract
Breast cancer is a leading malignancy threatening women's health globally, making early and accurate diagnosis crucial. Ultrasound is a key screening and diagnostic tool because it is non-invasive, real-time, and cost-effective. However, its diagnostic accuracy depends heavily on operator experience, and conventional single-image analysis often fails to capture the comprehensive features of a lesion. This study introduces a computer-aided diagnosis (CAD) system that emulates a clinician's multi-view diagnostic process. We developed a novel hybrid deep learning model that integrates a Convolutional Neural Network (CNN) with a Transformer architecture. The model uses a pretrained EfficientNetV2 to extract spatial features from multiple, unordered ultrasound images of a single lesion. These features are then processed by a Transformer encoder, whose self-attention mechanism globally models and fuses the correlations among them. A strict lesion-level data partitioning strategy, in which all images of a given lesion are assigned to the same data subset, ensured a rigorous evaluation. On an internal test set, our CNN-Transformer model achieved an accuracy of 0.93, a sensitivity of 0.92, a specificity of 0.94, and an Area Under the Curve (AUC) of 0.98. On an external test set, it achieved an accuracy of 0.93, a sensitivity of 0.94, a specificity of 0.91, and an AUC of 0.97. These results exceed those of a baseline single-image model, which achieved accuracies of 0.88 and 0.89 and AUCs of 0.95 and 0.94 on the internal and external test sets, respectively. This study indicates that combining CNNs with Transformers yields a highly accurate and robust diagnostic system for breast ultrasound. By effectively fusing multi-view information, our model aligns with clinical logic and shows considerable potential for improving diagnostic reliability.
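To illustrate the lesion-level partitioning mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes each image is tagged with a lesion identifier. The function name `lesion_level_split`, the `test_fraction` and `seed` parameters, and the toy file names are all hypothetical. The point it shows is that the split is drawn over lesions rather than over individual images, so views of one lesion cannot leak between the training and test sets.

```python
import random
from collections import defaultdict


def lesion_level_split(records, test_fraction=0.2, seed=0):
    """Split (lesion_id, image_path) records so that no lesion spans both sets.

    Hypothetical helper for illustration; the abstract does not specify
    the split ratio or the procedure actually used.
    """
    # Group all images (views) belonging to the same lesion.
    by_lesion = defaultdict(list)
    for lesion_id, image_path in records:
        by_lesion[lesion_id].append(image_path)

    # Shuffle lesion IDs, not images. Sorting first makes the result
    # independent of the input order for a given seed.
    lesion_ids = sorted(by_lesion)
    random.Random(seed).shuffle(lesion_ids)

    # Reserve a fraction of the lesions (at least one) for testing.
    n_test = max(1, round(len(lesion_ids) * test_fraction))
    test_ids = set(lesion_ids[:n_test])

    train = {lid: imgs for lid, imgs in by_lesion.items() if lid not in test_ids}
    test = {lid: imgs for lid, imgs in by_lesion.items() if lid in test_ids}
    return train, test


# Hypothetical toy data: 10 lesions with 3 ultrasound views each.
records = [
    (f"lesion_{i:02d}", f"lesion_{i:02d}_view{v}.png")
    for i in range(10)
    for v in range(3)
]
train, test = lesion_level_split(records, test_fraction=0.2, seed=0)
print(len(train), "training lesions;", len(test), "test lesions")
```

A naive image-level shuffle could place two views of the same lesion on opposite sides of the split, which would tend to inflate test metrics. Grouping by lesion avoids this. Where scikit-learn is available, `GroupShuffleSplit` provides comparable group-aware splitting.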