Cross-Attention Multimodal Learning for Predicting Response to Neoadjuvant Imatinib in Gastrointestinal Stromal Tumors: A Multicenter Retrospective Study
Authors
Abstract
Background: Response to neoadjuvant imatinib in gastrointestinal stromal tumors (GISTs) is highly variable and cannot be reliably predicted using current clinical or molecular markers. This study developed and evaluated an explainable multimodal deep learning framework integrating computed tomography (CT) imaging and clinical variables to predict treatment response.
Methods: Patients from four tertiary centers between 2000 and 2023 were retrospectively included in independent pretraining (n=935) and prediction (n=213) cohorts. A cross-attention framework integrating clinical variables and tumor-centered CT imaging was developed to predict response to neoadjuvant imatinib. Two training strategies were evaluated: (1) self-supervised pretraining with low-rank adaptation and (2) training from scratch. Hyperparameters were optimized using SMAC3. Performance was assessed through internal cross-validation and external testing. Ablation analyses and attention-based explanations were used to quantify modality contributions.
Results: Among 213 patients (54.5% responders), responders had larger tumors (112 vs. 89 mm, P=0.026), higher mitotic index (3 vs. 0, P<0.001), and more frequent KIT mutations (69.0% vs. 56.7%, P=0.019). Cross-attention models achieved the highest internal performance (AUC up to 0.99) but lower external performance (AUC 0.60-0.63). Clinical-only performance was moderate (AUC 0.66), whereas imaging-only models showed limited generalizability (AUC 0.56-0.66). Explainability analyses identified significant differences in feature importance between responders and non-responders, including CD117, BRAF, PDGFRA, age, sex, disease status, and comorbidities (FDR-adjusted P≤0.036).
Conclusion: The cross-attention framework shows potential for improving imatinib response prediction in GIST while providing interpretable insights into multimodal determinants of treatment response.
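As an illustration of the cross-attention fusion named in the Methods, the following is a minimal single-head sketch in NumPy. It is not the study's implementation. The choice of clinical variables as queries and CT patch embeddings as keys and values, the token counts, the embedding sizes, and the random projection weights are all assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(clinical_tokens, image_tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    Assumption for this sketch: clinical tokens form the queries and
    image tokens form the keys and values.
    """
    q = clinical_tokens @ w_q            # (n_clinical, d)
    k = image_tokens @ w_k               # (n_image, d)
    v = image_tokens @ w_v               # (n_image, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)   # each row sums to 1 over image tokens
    return weights @ v, weights

# Hypothetical sizes: 8 clinical variables, 16 CT patch embeddings.
rng = np.random.default_rng(0)
n_clinical, n_image, d_in, d = 8, 16, 32, 24
clinical_tokens = rng.normal(size=(n_clinical, d_in))
image_tokens = rng.normal(size=(n_image, d_in))
w_q, w_k, w_v = (rng.normal(size=(d_in, d)) / np.sqrt(d_in) for _ in range(3))

fused, weights = cross_attention(clinical_tokens, image_tokens, w_q, w_k, w_v)
```

Each row of `weights` shows how one clinical variable distributes attention over the image tokens, which is the kind of quantity an attention-based explanation would inspect. In a trained model the projection matrices would be learned rather than drawn at random.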