A 3D self-configuring hybrid transformer with multi-task learning for 3D automated breast ultrasound segmentation.

October 31, 2025

Authors

Jeong H, Yoon C, Lim H, Won J, Kim K, Luo G, Xu M, Kim N, Kim C

Affiliations (6)

  • Graduate School of Artificial Intelligence (GSAI), Department of Electrical Engineering, Convergence IT Engineering, Mechanical Engineering, Medical Science and Engineering, and Medical Device Innovation Center, Pohang University of Science and Technology (POSTECH), Pohang, Republic of Korea.
  • Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea.
  • Department of Convergence Medicine, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Republic of Korea.
  • Faculty of Computing, Harbin Institute of Technology, Harbin, China.
  • Department of Biomedical Engineering, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, College of Medicine, University of Ulsan, Seoul, Republic of Korea.
  • Graduate School of Artificial Intelligence (GSAI), Department of Electrical Engineering, Convergence IT Engineering, Mechanical Engineering, Medical Science and Engineering, and Medical Device Innovation Center, Pohang University of Science and Technology (POSTECH), Pohang, Republic of Korea. Electronic address: [email protected].

Abstract

Tumor segmentation of 3D automated breast ultrasound (ABUS) images facilitates comprehensive analyses for breast cancer treatment. Although automatic segmentation models for 3D ABUS have been explored to relieve doctors of labor-intensive tasks, these models are still limited by variations in tumor size and shape, lesion-related artifacts, and/or small datasets. Transformer models offer increased capacity but are susceptible to overfitting on small datasets. Here, we demonstrate a 3D self-configuring hybrid transformer with multi-task learning for tumor segmentation of 3D ABUS images. First, a sparse-adaptive attention (SA2) block extracts comprehensive features, such as local and global contextual features, with sequential refinement. The SA2 module downscales key and value feature maps with adaptive subsampling, reducing the computational load. Second, we introduce a novel twin mixed attention gate with a cross-channel attention (C2A) module. To capture long-range dependencies between channels at the bottom layer, the C2A module uses a channel-prior attention mechanism that derives channel-specific attention maps from pooled context vectors. Additionally, a compressed multi-layer perceptron reduces the number of output channels and improves memory efficiency. The output of the C2A module then passes through a spatial attention mechanism, generating a mixed attention feature. Last, multi-task learning with classification, segmentation, and consistency losses improves calibration and model robustness against lesion-related artifacts. With 200 datasets and 5-fold cross-validation, our method outperforms conventional models on overall metrics, with an average Dice similarity coefficient of 59.80%, a 95% Hausdorff distance of 17.85, a Jaccard index of 49.36%, a precision of 64.25%, a recall of 62.41%, a false positive rate of 0.046%, and an average surface distance of 7.99.
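
The abstract states that the SA2 block reduces the cost of 3D self-attention by downscaling the key and value feature maps with adaptive subsampling. The PyTorch sketch below illustrates that general idea only; it is not the authors' SA2 block, and the class name, the fixed `reduction` factor, and the strided-convolution subsampling are illustrative assumptions (the paper presumably chooses the reduction adaptively per stage).

```python
# Minimal sketch: 3D attention with subsampled keys/values (not the authors' SA2 block).
import torch
import torch.nn as nn

class KVSubsampledAttention3D(nn.Module):
    def __init__(self, dim, num_heads=4, reduction=2):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        self.proj = nn.Linear(dim, dim)
        # Strided conv downscales the K/V feature volume; a fixed "reduction"
        # stands in for the adaptive subsampling described in the abstract.
        self.sr = nn.Conv3d(dim, dim, kernel_size=reduction, stride=reduction)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, dhw):
        # x: (B, N, C) tokens flattened from a (D, H, W) feature volume.
        B, N, C = x.shape
        D, H, W = dhw
        q = self.q(x).reshape(B, N, self.num_heads, self.head_dim).transpose(1, 2)

        # Reduce the token count for keys/values before computing attention.
        x_ = x.transpose(1, 2).reshape(B, C, D, H, W)
        x_ = self.sr(x_).flatten(2).transpose(1, 2)          # (B, N', C), N' << N
        x_ = self.norm(x_)
        k, v = self.kv(x_).chunk(2, dim=-1)
        k = k.reshape(B, -1, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.reshape(B, -1, self.num_heads, self.head_dim).transpose(1, 2)

        attn = (q @ k.transpose(-2, -1)) * self.scale         # (B, heads, N, N')
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

# Example: tokens from an 8x16x16 feature volume with 64 channels.
# attn = KVSubsampledAttention3D(dim=64)
# y = attn(torch.randn(1, 8 * 16 * 16, 64), (8, 16, 16))
```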
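For the twin mixed attention gate, the abstract describes channel attention built from pooled context vectors, a compressed MLP that reduces output channels, and a subsequent spatial attention step that yields a mixed attention feature. The sketch below is only a simplified channel-then-spatial gate in that spirit (closer to SE/CBAM-style attention than to a full cross-channel attention); the class name and layer choices are assumptions, not the authors' C2A module.

```python
# Minimal sketch of a channel-then-spatial "mixed attention" gate (illustrative only).
import torch
import torch.nn as nn

class MixedAttentionGate3D(nn.Module):
    def __init__(self, in_ch, out_ch, compress=4):
        super().__init__()
        # Pooled context vectors -> per-channel attention weights.
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.channel_mlp = nn.Sequential(
            nn.Linear(in_ch, in_ch // compress), nn.GELU(),
            nn.Linear(in_ch // compress, in_ch), nn.Sigmoid(),
        )
        # "Compressed" projection: 1x1x1 conv reducing the channel count.
        self.compress = nn.Conv3d(in_ch, out_ch, kernel_size=1)
        # Spatial attention from channel-pooled statistics (mean + max).
        self.spatial = nn.Sequential(
            nn.Conv3d(2, 1, kernel_size=7, padding=3), nn.Sigmoid(),
        )

    def forward(self, x):
        b, c = x.shape[:2]
        # Channel attention: reweight channels using global context.
        ctx = self.pool(x).flatten(1)                      # (B, C)
        w = self.channel_mlp(ctx).view(b, c, 1, 1, 1)
        x = x * w
        # Reduce channels, then modulate spatially.
        x = self.compress(x)
        stats = torch.cat([x.mean(dim=1, keepdim=True),
                           x.amax(dim=1, keepdim=True)], dim=1)
        return x * self.spatial(stats)                     # mixed attention feature
```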
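The multi-task objective combines segmentation, classification, and a consistency term. The abstract does not give the individual losses or their weights, so the sketch below assumes Dice plus binary cross-entropy for segmentation, binary cross-entropy for classification, an MSE consistency penalty between predictions on two views of the same volume, and hypothetical weights `w_seg`, `w_cls`, `w_cons`.

```python
# Minimal sketch of a segmentation + classification + consistency objective
# (loss choices and weights are assumptions, not the paper's exact setup).
import torch
import torch.nn.functional as F

def dice_loss(logits, target, eps=1e-5):
    # logits, target: (B, 1, D, H, W) tumor masks.
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum(dim=(1, 2, 3, 4))
    union = prob.sum(dim=(1, 2, 3, 4)) + target.sum(dim=(1, 2, 3, 4))
    return (1 - (2 * inter + eps) / (union + eps)).mean()

def multitask_loss(seg_logits, seg_logits_aug, cls_logits, seg_gt, cls_gt,
                   w_seg=1.0, w_cls=0.5, w_cons=0.1):
    # Segmentation: Dice + binary cross-entropy on the voxel-wise mask.
    l_seg = dice_loss(seg_logits, seg_gt) + \
            F.binary_cross_entropy_with_logits(seg_logits, seg_gt)
    # Classification: does the volume contain a lesion?
    l_cls = F.binary_cross_entropy_with_logits(cls_logits, cls_gt)
    # Consistency: predictions on two views of the same volume should agree.
    l_cons = F.mse_loss(torch.sigmoid(seg_logits), torch.sigmoid(seg_logits_aug))
    return w_seg * l_seg + w_cls * l_cls + w_cons * l_cons
```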

Topics

Journal Article
