Advancing breast cancer diagnosis through ultrasound imaging: A cross-attention multi-scale vision transformer approach with comparative analysis against ResNet architectures.
Authors
Affiliations (1)
- Department of Future Technology, Korea University of Technology and Education, Cheonan, South Korea.
Abstract
In recent years, the field of medical imaging has witnessed substantial progress due to the integration of advanced machine learning techniques, particularly in the diagnosis of critical conditions such as breast cancer. This study aims to improve the predictive accuracy of breast cancer diagnosis using ultrasound images by employing the cross-attention multi-scale vision transformer (CrossViT). The proposed methodology involves a dual-branch architecture in which each branch processes image patches of different sizes, thereby capturing both fine-grained and coarse-grained features. The model incorporates a cross-attention mechanism that efficiently fuses these multi-scale features, enhancing its ability to discern complex patterns in medical images. A public ultrasound dataset was partitioned at the patient level using a stratified 80/10/10 train/validation/test split. The development data used for model optimization included 4074 benign and 4042 malignant training images, along with 500 benign and 400 malignant validation images after preprocessing and augmentation, and final model performance was assessed on a held-out test set. Hyperparameters were fine-tuned using a grid search strategy to optimize performance, and training was conducted with stochastic gradient descent and regularization techniques to support stable convergence. Results from this single-dataset experiment showed that CrossViT yielded higher observed performance metrics than the evaluated ResNet architectures across accuracy, precision, recall, F1-score, and AUC. However, these findings should be interpreted as exploratory comparative observations rather than statistically confirmed evidence of model superiority. In conclusion, CrossViT represents a promising technical approach in medical imaging and may have potential utility for automated breast cancer diagnosis in future clinically validated settings.
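The patient-level stratified 80/10/10 partition described above can be illustrated with a minimal sketch. It is not the authors' code: the function names `patient_level_split` and `assign_images`, the dictionary input formats, the fixed seed, and the toy patient IDs are all assumptions made for illustration. The sketch assigns whole patients (not individual images) to a split, separately within each class, so that no patient's images appear in more than one partition.

```python
import random
from collections import defaultdict


def patient_level_split(patient_labels, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split patient IDs into train/val/test, stratified by class label.

    patient_labels: dict of patient ID -> label (hypothetical input format).
    """
    by_label = defaultdict(list)
    for pid, label in patient_labels.items():
        by_label[label].append(pid)

    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for label in sorted(by_label):
        pids = sorted(by_label[label])  # sort first so the shuffle is reproducible
        rng.shuffle(pids)
        n_train = round(ratios[0] * len(pids))
        n_val = round(ratios[1] * len(pids))
        splits["train"] += pids[:n_train]
        splits["val"] += pids[n_train:n_train + n_val]
        splits["test"] += pids[n_train + n_val:]  # remainder goes to test
    return splits


def assign_images(image_to_patient, splits):
    """Route each image to the split that owns its patient."""
    owner = {pid: name for name, pids in splits.items() for pid in pids}
    out = {"train": [], "val": [], "test": []}
    for image, pid in image_to_patient.items():
        out[owner[pid]].append(image)
    return out


# Toy data (hypothetical): 20 benign and 20 malignant patients, 3 images each.
patients = {f"B{i:02d}": "benign" for i in range(20)}
patients.update({f"M{i:02d}": "malignant" for i in range(20)})
images = {f"{pid}_img{k}.png": pid for pid in patients for k in range(3)}

splits = patient_level_split(patients)
image_splits = assign_images(images, splits)
```

Splitting before any augmentation, and augmenting only the training partition afterwards, keeps augmented copies of one image from leaking across partitions; the abstract does not state the order used in the study.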