Attention-based multimodal deep learning for interpretable and generalizable prediction of pathological complete response in breast cancer.
Authors
Affiliations (4)
Affiliations (4)
- Department of Diagnostic Radiology, Warren Alpert Medical School of Brown University, Providence, RI, USA.
- Department of Radiology, Montefiore Health System and Albert Einstein College of Medicine, Bronx, NY, USA.
- Department of Diagnostic Radiology, Warren Alpert Medical School of Brown University, Providence, RI, USA. [email protected].
- Department of Radiology, Montefiore Health System and Albert Einstein College of Medicine, Bronx, NY, USA. [email protected].
Abstract
Accurate prediction of pathological complete response (pCR) to neoadjuvant chemotherapy has significant clinical utility in the management of breast cancer treatment. Although multimodal deep learning models have shown promise for predicting pCR from medical imaging and other clinical data, their adoption has been limited due to challenges with interpretability and generalizability across institutions. We developed a multimodal deep learning model combining post contrast-enhanced whole-breast MRI at pre- and post-treatment timepoints with non-imaging clinical features. The model integrates 3D convolutional neural networks and self-attention to capture spatial and cross-modal interactions. We utilized two public multi-institutional datasets to perform internal and external validation of the model. For model training and validation, we used data from the I-SPY 2 trial (N = 660). For external validation, we used the I-SPY 1 dataset (N = 114). Of the 660 patients in I-SPY 2, 217 patients achieved pCR (32.88%). Of the 114 patients in I-SPY 1, 29 achieved pCR (25.44%). The attention-based multimodal model yielded the best predictive performance with an AUC of 0.73 ± 0.04 on the internal data and an AUC of 0.71 ± 0.02 on the external dataset. The MRI-only model (internal AUC = 0.68 ± 0.03, external AUC = 0.70 ± 0.04) and the non-MRI clinical features-only model (internal AUC = 0.66 ± 0.08, external AUC = 0.71 ± 0.03) trailed in performance, indicating the combination of both modalities is most effective. We present a robust and interpretable deep learning framework for pCR prediction in breast cancer patients undergoing NAC. By combining imaging and clinical data with attention-based fusion, the model achieves strong predictive performance and generalizes across institutions.