Multi-scale and context-aware enhanced YOLOv8 for breast tumor detection in ultrasound images.
Authors
Affiliations (5)
- Research Institute of General Surgery, Jinling Hospital, Affiliated Hospital of Medical School, Nanjing University, Nanjing, China.
- Department of General Surgery, the Second Affiliated Hospital of Fujian Medical University, Quanzhou, 362000, Fujian Province, China.
- Quanzhou Hospital of Traditional Chinese Medicine, Quanzhou, China.
- The Affiliated Jiangning Hospital of Nanjing Medical University, Nanjing, China.
- Department of Thyroid and Breast Surgery, Jinjiang Municipal Hospital, Quanzhou, Fujian, China.
Abstract
Early and accurate detection of breast tumors in ultrasound images is essential for effective clinical intervention. Single-stage object detectors offer the real-time processing that clinical use requires, but their performance in the medical domain is constrained by loss of spatial information during downsampling, insufficient multi-scale feature representation, and high susceptibility to false positives against complex anatomical backgrounds. To address these limitations, we present an efficient, lightweight single-stage detection architecture for breast ultrasound analysis. Built on the YOLOv8n framework, our network integrates three structural changes: (1) the ADown downsampling module, which preserves fine-grained edge details and critical structural textures; (2) a Multi-Scale Dilation-Wise Residual (C2f_DWR) module, which adapts receptive fields to capture highly variable tumor morphologies; and (3) a dual-branch Context-Aware Feature Module (CAFM), which suppresses background glandular noise while isolating localized tumor features. Evaluations on the BUSI benchmark show that our model achieves a precision of 80.6% and an mAP@0.5 of 71.9%. This performance is attained with only 2.88 M parameters and 7.6 GFLOPs. By demonstrating a favorable accuracy-efficiency trade-off compared with recent state-of-the-art architectures, including YOLOv10, YOLO11, and YOLO12, the proposed network provides a viable and scalable solution for real-time clinical Computer-Aided Diagnosis systems.
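The abstract does not state how the C2f_DWR module varies its receptive field. As a rough illustration of the general idea behind dilation-based multi-scale branches, the sketch below computes the spatial extent covered by a single dilated convolution kernel. The 3x3 kernel and the dilation rates (1, 3, 5) are assumptions chosen for illustration, not values reported by the authors, and the helper names are hypothetical.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent (in pixels) covered by one dilated convolution kernel."""
    if kernel_size < 1 or dilation < 1:
        raise ValueError("kernel_size and dilation must be positive integers")
    # Dilation inserts (dilation - 1) gaps between adjacent kernel taps.
    return dilation * (kernel_size - 1) + 1


def padding_to_preserve_size(kernel_size: int, dilation: int) -> int:
    """Symmetric padding that keeps the feature-map size at stride 1 (odd kernels)."""
    return (effective_kernel_size(kernel_size, dilation) - 1) // 2


# Assumed branch dilation rates, for illustration only (not from the abstract).
ASSUMED_DILATIONS = (1, 3, 5)

# Extent of a 3x3 kernel in each assumed branch.
branch_extents = {d: effective_kernel_size(3, d) for d in ASSUMED_DILATIONS}
print(branch_extents)  # {1: 3, 3: 7, 5: 11}
```

Under these assumed settings, parallel branches with the same 3x3 parameter count would cover 3, 7, and 11 pixels, which suggests how a dilation-wise design could respond to both small and large lesions without adding weights per branch.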