Comparison of breast ultrasound image classification accuracy between convolutional neural networks and human experts using multicenter external validation cohort data.

March 26, 2026

Authors

Yamakawa M, Shiina T, Ito T, Akashi ST, Murakami K, Watanabe T, Morishima H, Tsugawa K, Uematsu T, Nishida N, Kudo M

Affiliations (9)

  • SIT Research Laboratories, Shibaura Institute of Technology, 3-7-5, Toyosu, Koto-ku, Tokyo, 135-8548, Japan. [email protected].
  • SIT Research Laboratories, Shibaura Institute of Technology, 3-7-5, Toyosu, Koto-ku, Tokyo, 135-8548, Japan.
  • Faculty of Medicine, Kindai University, Osaka, Japan.
  • Department of Breast Surgery, School of Medicine, Tokyo Women's Medical University, Tokyo, Japan.
  • Department of Radiology, Showa Medical University, Tokyo, Japan.
  • Department of Breast Surgery, National Hospital Organization Sendai Medical Center, Sendai, Miyagi, Japan.
  • Department of Breast Endocrine Surgery, Rinku General Medical Center, Osaka, Japan.
  • Department of Breast and Endocrine Surgery, St. Marianna University School of Medicine, Kawasaki, Kanagawa, Japan.
  • Department of Breast Imaging and Breast Interventional Radiology, Shizuoka Cancer Center Hospital, Shizuoka, Japan.

Abstract

In recent years, much research has examined ultrasound diagnosis of breast tumors using convolutional neural networks (CNNs). Although many CNNs for breast tumor classification have been investigated, previous studies evaluated them on data from the same institution that supplied the training data, which may have biased the reported accuracy. To perform a fairer evaluation, we compared the accuracy of CNNs with that of human experts using a multicenter external validation cohort. In addition, whereas previous studies used fewer than about 2,000 images to train CNNs, this study used 16,530. We trained a 2-class (benign, malignant) classification CNN and a 4-class (breast cancer, fibroadenoma, simple cyst, and other benign tumors) classification CNN on these 16,530 images and, using the multicenter external validation cohort data, compared their classification accuracy with that of human experts. The 2-class classification CNN achieved an accuracy of 88.1%; the benign/malignant classification derived from the 4-class CNN achieved 86.3%; human experts achieved 83.2%. Thus, the 2-class CNN was slightly more accurate than the benign/malignant classification derived from the 4-class CNN, and both developed CNNs were more accurate than the human experts. CNNs developed using a large-scale breast ultrasound image database achieved higher accuracy than human experts when evaluated on multicenter external validation cohort data.
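The abstract reports a benign/malignant accuracy derived from the 4-class CNN, which implies collapsing the four predicted classes into two before scoring. A minimal sketch of that evaluation step is shown below; the class names, label strings, and mapping are assumptions for illustration, not the authors' actual pipeline.

```python
# Hypothetical sketch: collapsing 4-class predictions (breast cancer,
# fibroadenoma, simple cyst, other benign) into the benign/malignant
# decision, then scoring accuracy. Label strings are assumptions.

def to_binary(label: str) -> str:
    """Map a 4-class label to the 2-class benign/malignant label."""
    return "malignant" if label == "breast_cancer" else "benign"

def binary_accuracy(predictions, truths):
    """Fraction of cases where the collapsed prediction matches truth."""
    correct = sum(to_binary(p) == t for p, t in zip(predictions, truths))
    return correct / len(truths)

preds = ["breast_cancer", "simple_cyst", "fibroadenoma", "breast_cancer"]
truth = ["malignant", "benign", "malignant", "malignant"]
print(f"{binary_accuracy(preds, truth):.1%}")  # 3 of 4 collapsed labels match
```

Note that under this mapping a fibroadenoma misclassified as a simple cyst still counts as correct at the benign/malignant level, which is one reason the collapsed 4-class accuracy can differ from the dedicated 2-class CNN's accuracy.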

Topics

Journal Article
