U-CBAMNet: an attention-guided deep learning model for accurate and explainable prediction of HER2 expression from breast ultrasound cine videos.
Authors
Affiliations (5)
- Department of Medical Ultrasound, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi Zhuang Autonomous Region, 530021, China.
- Department of Ultrasound, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, Shandong, 250021, China.
- School of Control Science and Engineering, Shandong University, Jinan, Shandong, 250061, China.
- Department of Ultrasound, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, Shandong, 250021, China. [email protected].
- Department of Medical Ultrasound, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi Zhuang Autonomous Region, 530021, China. [email protected].
Abstract
Accurate assessment of human epidermal growth factor receptor 2 (HER2) expression is essential for guiding targeted therapy in breast cancer. Conventional immunohistochemistry and fluorescence in situ hybridization remain the diagnostic standard but are invasive, costly, and limited by sampling bias. This study aimed to develop and internally evaluate an explainable deep learning model, termed U-CBAMNet, that integrates an improved Convolutional Block Attention Module (CBAM) with EfficientNet-B3 for non-invasive prediction of HER2 expression from breast ultrasound cine videos. A retrospective cohort of 149 patients with pathologically confirmed HER2 status was used. Ultrasound cine videos were split by patient ID into training (70%) and test (30%) sets, so that no patient contributed videos to both sets. For each lesion, dynamic cine sequences were processed frame-wise using U-CBAMNet, and frame-level features were aggregated via temporal average pooling to obtain video-level predictions. The proposed model incorporated a refined CBAM with adaptive weighted pooling and spatial attention to emphasize diagnostically informative regions. Performance was compared against ResNet50, DenseNet121, Swin-Transformer, and baseline EfficientNet-B3 using accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC). Model interpretability was evaluated through Grad-CAM-based heatmaps computed on representative video frames. U-CBAMNet achieved an accuracy of 87.32%, precision of 88.91%, recall of 87.64%, F1-score of 88.12%, and a macro-average AUC of 0.88, outperforming all comparator models. Ablation analysis confirmed the complementary contributions of the channel and spatial attention mechanisms. Visual attention maps highlighted lesion-centric regions consistent with radiologist-identified areas, demonstrating strong biological plausibility. The proposed U-CBAMNet model enables accurate and interpretable non-invasive prediction of HER2 expression directly from routine cine ultrasound imaging. This approach may serve as a cost-effective adjunct to molecular testing, facilitating preoperative risk stratification and personalized treatment planning in breast cancer management.
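The two data-handling steps named in the abstract, the patient-level split and temporal average pooling of frame-level features, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `split_by_patient` and `temporal_average_pool`, the random seed, the patient ID format, and the toy feature values are all assumptions made for illustration, and the abstract does not state how the 70/30 split was randomized.

```python
import random
from statistics import fmean


def split_by_patient(patient_ids, test_fraction=0.3, seed=0):
    """Split unique patient IDs so that no patient appears in both sets.

    Hypothetical helper: the seed and the rounding rule are assumptions.
    """
    unique_ids = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    n_test = round(len(unique_ids) * test_fraction)
    test_ids = set(unique_ids[:n_test])
    train_ids = set(unique_ids[n_test:])
    return train_ids, test_ids


def temporal_average_pool(frame_features):
    """Average T per-frame feature vectors of length D into one vector of length D."""
    if not frame_features:
        raise ValueError("video has no frames")
    # zip(*...) iterates over feature dimensions; each column holds one
    # feature's values across all frames.
    return [fmean(column) for column in zip(*frame_features)]


# Toy video: 3 frames with 2 made-up features each.
frames = [[0.2, 0.8], [0.4, 0.6], [0.6, 0.4]]
video_vector = temporal_average_pool(frames)

# Illustrative cohort of 149 patient IDs (the ID format is invented).
train_ids, test_ids = split_by_patient([f"P{i:03d}" for i in range(149)])
```

In a full pipeline the pooled vector would be passed to a classification head to produce the video-level prediction. Splitting on patient ID rather than on individual videos or frames keeps frames from the same lesion out of both sets, which would otherwise inflate test performance.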