Attention-based multimodal fusion transformer for predicting the efficacy of neoadjuvant therapy in breast cancer: a cross-institutional retrospective study.
Authors
Affiliations (10)
- Department of Pathology, West China Hospital, Sichuan University, Chengdu, 610041, Sichuan Province, China.
- Institute of Clinical Pathology, West China Hospital, Sichuan University, Chengdu, 610041, Sichuan Province, China.
- Department of Pathology, Shengjing Hospital of China Medical University, Shenyang, 110004, Liaoning Province, China.
- Key Laboratory of Intelligent and Precision Pathology Diagnosis in Oncology, China Medical University, Shenyang, 110004, Liaoning Province, China.
- AI Thrust, Information Hub, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, 511453, Guangdong Province, China.
- Department of Pathology, Shengjing Hospital of China Medical University, Shenyang, 110004, Liaoning Province, China. [email protected].
- Key Laboratory of Intelligent and Precision Pathology Diagnosis in Oncology, China Medical University, Shenyang, 110004, Liaoning Province, China. [email protected].
- Department of Radiology, West China Hospital, Sichuan University, Chengdu, 610041, Sichuan Province, China. [email protected].
- Institute of Clinical Pathology, West China Hospital, Sichuan University, Chengdu, 610041, Sichuan Province, China. [email protected].
- College of Computer Science, Sichuan University, Chengdu, 610041, Sichuan Province, China. [email protected].
Abstract
Neoadjuvant therapy (NAC) is a standard treatment for breast cancer, yet only a subset of patients derives significant benefit, so identifying those most likely to respond is crucial. Because single-modality data often overlook patient heterogeneity, we developed an interpretable, attention-based multimodal full-information feature fusion transformer, MuFi, to predict NAC response by integrating whole slide images (WSIs) and magnetic resonance imaging (MRI). Data from 567 biopsy-confirmed breast cancer patients at two institutions were retrospectively analyzed and split into a training cohort (n = 290), a validation cohort (n = 73), and an external test cohort (n = 204). Multimodal data included pre-treatment pathology slides, MRI scans, and clinical information. A memory-efficient multimodal model fused WSIs and MRI, with a transformer capturing interactions between histological patches and MRI features. MuFi achieved AUCs of 81.9% in the training cohort, 78.5% in the validation cohort, and 79.3% in the external test cohort, outperforming clinical, single-modality, and late-fusion models. Integrating clinical data (cT stage and molecular subtype) with the MuFi and Feature Re-calibration based Multiple Instance Learning (FRMIL) models further increased AUCs to 90.2%, 81.8%, and 81.6% across the three cohorts, indicating improved predictive accuracy and generalizability, especially on external testing. By fusing pathology and radiology features, MuFi improves decision reliability and identifies critical multimodal predictors. This integration framework better captures patient heterogeneity, supporting personalized NAC decision-making through improved accuracy and generalizability.
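The core fusion idea described in the abstract — histological patch embeddings attending to MRI features via a transformer-style attention mechanism — can be sketched as follows. This is an illustrative NumPy sketch of scaled dot-product cross-attention only, not the authors' MuFi implementation; all array shapes, names, and the mean-pooling step are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(wsi_feats, mri_feats):
    """Each WSI patch embedding (query) attends over MRI feature
    tokens (keys/values), producing an MRI-informed patch embedding."""
    d = wsi_feats.shape[-1]
    scores = wsi_feats @ mri_feats.T / np.sqrt(d)   # (n_patches, n_mri_tokens)
    weights = softmax(scores, axis=-1)              # rows sum to 1
    return weights @ mri_feats                      # (n_patches, d)

rng = np.random.default_rng(0)
wsi = rng.standard_normal((100, 64))   # hypothetical: 100 patch embeddings
mri = rng.standard_normal((32, 64))    # hypothetical: 32 MRI feature tokens
fused = cross_attention(wsi, mri)
# A pooled slide-level representation that a response classifier head
# could consume; mean pooling is an assumed simplification.
pooled = fused.mean(axis=0)
```

In a full model, learned query/key/value projections, multiple heads, and feed-forward layers would surround this operation; the sketch isolates only the cross-modal interaction the abstract highlights.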