Multimodal Fusion for Fine-Grained Classification of Breast Fibroadenoma and Phyllodes Tumors

July 2, 2026

arXiv: 2607.02091v1

Authors

Chuxi Nan, Di Wu, Hongming Guo, Ning Cao, Xiaohui Zhu, Zhaoting Shi, Jiawei Li

Abstract

Breast fibroadenoma (FA) and phyllodes tumor (PT) are fibroepithelial breast lesions with highly overlapping appearances on B-mode ultrasound, making benign and borderline PT prone to being misclassified as FA and complicating preoperative decision-making. Existing computer-aided diagnosis methods commonly rely on single-modal imaging features and insufficiently exploit complementary clinical and textual information. To address this limitation, we construct the FAPT-M Dataset, a pathology-confirmed multimodal dataset comprising 910 patients with strictly reviewed ultrasound images, structured clinical attributes, and ultrasound diagnostic descriptions. Based on this dataset, we propose a clinically guided multimodal framework that integrates DenseNet-based visual encoding, CLIP-inspired text encoding, and lightweight clinical encoding, and further introduces clinical-conditioned adaptive modulation, cross-modal Transformer fusion, and dual-path representation learning to improve feature alignment and multimodal interaction. Under patient-level five-fold cross-validation, the proposed method achieves an accuracy of 77.64%, F1-score of 73.38%, and AUC of 89.74%, outperforming representative CNN-, Transformer-, and vision-language-based baselines. Ablation studies and class-balanced evaluations further confirm the contribution of three-modality fusion and the key architectural components. Overall, this work provides an effective multimodal approach for fine-grained FA-PT classification and establishes a high-quality benchmark for multimodal breast ultrasound analysis.
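The abstract reports results under patient-level five-fold cross-validation. As a minimal sketch of what a patient-level split means, the Python below groups samples by patient so that no patient contributes images to both the training and test side of any fold. It is an illustration under stated assumptions, not the authors' protocol: the function names, the round-robin assignment, the seed, and the absence of class stratification are all hypothetical choices, since the abstract does not describe how folds were built.

```python
import random
from collections import defaultdict


def patient_level_folds(patient_ids, n_folds=5, seed=0):
    """Return n_folds lists of sample indices; each patient lands in exactly one fold.

    patient_ids[i] is the (hypothetical) patient identifier of sample i.
    """
    # Shuffle the unique patients reproducibly, then deal them out round-robin.
    unique_patients = sorted(set(patient_ids))
    random.Random(seed).shuffle(unique_patients)
    fold_of_patient = {pid: i % n_folds for i, pid in enumerate(unique_patients)}

    # Every sample follows its patient into that patient's fold.
    folds = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        folds[fold_of_patient[pid]].append(idx)
    return [folds[k] for k in range(n_folds)]


def train_test_splits(patient_ids, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = patient_level_folds(patient_ids, n_folds, seed)
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = [i for j in range(n_folds) if j != k for i in folds[j]]
        yield train_idx, test_idx
```

Splitting by patient rather than by image matters when a patient has several ultrasound images: an image-level split could place near-duplicate views of the same lesion on both sides of a fold and inflate the reported metrics. A stratified variant that also balances FA and PT labels across folds would be a natural extension, but it is not shown here.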


Topics

cs.CV
