Anatomy-guided visual prompt tuning for cross-modal breast cancer understanding.
Authors
Affiliations (8)
- Key Laboratory of Breast Cancer Prevention and Therapy (Tianjin Medical University, Ministry of Education), The Third Department of Breast Cancer, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Tianjin, China.
- Key Laboratory of Cancer Prevention and Therapy, Tianjin's Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Tianjin, China.
- Key Laboratory of Breast Cancer Prevention and Therapy (Tianjin Medical University, Ministry of Education), Department of Anesthesiology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Tianjin, China.
- Department of Breast Oncology, Tianjin Cancer Hospital Airport Hospital, Tianjin, China.
- Baiyunshan Pharmaceutical General Factory/Guangdong Province Key Laboratory for Core Technology of Chemical Raw Materials and Pharmaceutical Formulations, Guangzhou Baiyunshan Pharmaceutical Holding Co., Ltd, Guangzhou, China.
- The Genetics Laboratory, Longgang District Maternity & Child Healthcare Hospital of Shenzhen City (Longgang Maternity and Child Institute of Shantou University Medical College), Shantou, Guangdong, China.
- Key Laboratory of Breast Cancer Prevention and Therapy (Tianjin Medical University, Ministry of Education), The Third Department of Breast Cancer, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Tianjin, China.
- Key Laboratory of Cancer Prevention and Therapy, Tianjin's Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Tianjin, China.
Abstract
Early and reliable detection of breast cancer across imaging modalities remains a long-standing challenge due to the heterogeneous appearance of lesions and the lack of cross-domain consistency among medical imaging systems. Recent advances in Vision Transformers (ViTs) and parameter-efficient fine-tuning (PEFT) techniques have enabled rapid model adaptation, yet most existing approaches remain data-driven and fail to incorporate domain-specific anatomical priors. In this work, we propose A-VPT (Anatomy-Guided Visual Prompt Tuning), a novel framework that integrates explicit anatomical structure into the prompt space of a frozen ViT backbone. Unlike conventional prompt tuning methods, A-VPT dynamically generates tissue-aware prompts guided by glandular, fatty, and ductal region embeddings, and performs hierarchical prompt-token interaction across transformer layers. Furthermore, a cross-modal contrastive alignment strategy harmonizes anatomical semantics among mammography, ultrasound, and MRI, enabling robust multi-domain generalization. Extensive experiments on three benchmark datasets (INbreast, BUSI, and Duke-Breast-MRI) demonstrate that A-VPT achieves state-of-the-art performance in both lesion classification and segmentation while using less than 2% of the tunable parameters required for full fine-tuning. Qualitative analyses confirm that anatomy-guided prompts yield interpretable attention patterns consistent with radiological structures. Our results suggest that embedding anatomical priors into prompt tuning not only enhances efficiency and generalization but also provides an interpretable bridge between deep learning representations and human anatomical reasoning.
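The abstract names two mechanisms without giving their form: tissue-aware prompts fed to a frozen ViT, and cross-modal contrastive alignment. The following NumPy sketch illustrates one plausible reading of each and is not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the additive mixing of tissue embeddings into the prompts, the use of a symmetric InfoNCE loss, the temperature of 0.07, and all tensor sizes.

```python
import numpy as np


def tissue_aware_prompts(base_prompts, tissue_embeddings, tissue_fractions):
    """Shift each prompt by a mixture of tissue embeddings (assumed design).

    base_prompts:      (P, D) learnable prompt tokens
    tissue_embeddings: (T, D) one vector per tissue type
    tissue_fractions:  (T,)   per-image share of each tissue, summing to 1
    """
    anatomy_vector = tissue_fractions @ tissue_embeddings  # shape (D,)
    return base_prompts + anatomy_vector  # broadcast over the P prompts


def prepend_prompts(patch_tokens, prompts):
    """Place prompt tokens in front of the patch tokens of a frozen backbone."""
    return np.concatenate([prompts, patch_tokens], axis=0)


def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE loss; row i of `a` and row i of `b` form a positive pair."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature

    def cross_entropy_on_diagonal(scores):
        scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
        log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))


# Hypothetical sizes: 4 prompts, 3 tissue types, 196 patch tokens, width 16.
rng = np.random.default_rng(0)
D = 16
base = rng.normal(size=(4, D))
tissues = rng.normal(size=(3, D))        # glandular, fatty, ductal
fractions = np.array([0.5, 0.4, 0.1])    # assumed per-image tissue shares

prompts = tissue_aware_prompts(base, tissues, fractions)
tokens = prepend_prompts(rng.normal(size=(196, D)), prompts)
print(tokens.shape)  # (200, 16)

# Embeddings of the same cases from two modalities (synthetic stand-ins).
z = rng.normal(size=(8, D))
aligned_loss = info_nce(z, z)            # every pair matches
misaligned_loss = info_nce(z, z[::-1])   # every pair is mismatched
print(aligned_loss < misaligned_loss)    # True
```

In this reading, only the prompt tokens and the tissue embeddings would be trained while the backbone weights stay fixed, which is the general idea behind a small tunable-parameter budget in prompt tuning. The contrastive term rewards embeddings of the same case from different modalities for lying close together.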