
Bridging radiology and pathology: domain-generalized cross-modal learning for clinical.

February 16, 2026

Authors

Zhong X, Gu Z, Shanmuganathan M, Li M, Sun H, Du M, Chen Q, Jiang G

Affiliations (8)

  • Department of General Surgery, The Second Affiliated Hospital of Soochow University, Suzhou, Jiangsu, China.
  • University of Tabuk, Faculty of Computers and Information Technology, Tabuk, Kingdom of Saudi Arabia.
  • School of Nano-Tech and Nano-Bionics, University of Science and Technology of China, Hefei, Anhui, China.
  • CAS Key Laboratory of Nano-Bio Interface, Division of Nanobiomedicine and i-Lab, Suzhou Institute of Nano-Tech and Nano-Bionics, Chinese Academy of Sciences, Suzhou, Jiangsu, China.
  • Wolfson Institute for Biomedical Research, University College London, London, UK. [email protected].
  • CAS Key Laboratory of Nano-Bio Interface, Division of Nanobiomedicine and i-Lab, Suzhou Institute of Nano-Tech and Nano-Bionics, Chinese Academy of Sciences, Suzhou, Jiangsu, China. [email protected].
  • Medical Science and Technology Innovation Center, The Affiliated Suzhou Hospital of Nanjing Medical University, Suzhou Municipal Hospital, Gusu School of Nanjing Medical University, Suzhou, Jiangsu, China. [email protected].
  • Department of General Surgery, The Second Affiliated Hospital of Soochow University, Suzhou, Jiangsu, China. [email protected].

Abstract

Reliable interpretation of clinical imaging requires integrating complementary evidence across modalities, yet most AI systems remain limited by single-modality analysis and poor generalization across institutions. We propose a unified cross-modal framework that bridges mammography and histopathology for breast cancer diagnosis through: (1) a shared vision transformer encoder with lightweight modality-specific adapters, (2) a weakly supervised patient-level contrastive alignment module that learns cross-modal correspondences without pixel-level supervision, (3) domain generalization strategies combining MixStyle augmentation and invariant risk minimization, and (4) causal test-time adaptation for unseen target domains. The model jointly addresses classification, lesion localization, and pathological grading while generating reasoning-guided attention maps that explicitly link suspicious mammographic regions with corresponding histopathological evidence. Evaluated on four public benchmarks (CBIS-DDSM, INbreast, BACH, CAMELYON16/17), the framework consistently outperforms state-of-the-art unimodal, multimodal, and domain generalization baselines, achieving a mean AUC of 0.90 under rigorous leave-one-domain-out evaluation and substantially smaller domain gaps (0.03 vs. 0.06-0.10). Visualization and interpretability analyses further confirm that predictions align with clinically meaningful features, supporting transparency and trust. By advancing multimodal integration, cross-institutional robustness, and explainability, this study represents a step toward clinically deployable AI systems for diagnostic decision support.
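
The paper provides no code, but the patient-level contrastive alignment in component (2) can plausibly be read as a CLIP-style symmetric InfoNCE objective: the embeddings of one patient's mammogram and histopathology slide are pulled together, while cross-patient pairs in the batch act as negatives. The sketch below assumes PyTorch; the names patient_level_infonce, mammo_emb, and path_emb are hypothetical and stand in for pooled per-patient embeddings produced by the shared encoder's two modality adapters.

    import torch
    import torch.nn.functional as F

    def patient_level_infonce(mammo_emb, path_emb, temperature=0.07):
        """Symmetric InfoNCE over a batch of patients (a sketch, not the
        authors' implementation): the mammography and histopathology
        embeddings of the same patient form the positive pair; every
        cross-patient pair in the batch is treated as a negative."""
        mammo = F.normalize(mammo_emb, dim=-1)   # (N, D), one row per patient
        path = F.normalize(path_emb, dim=-1)     # (N, D)
        logits = mammo @ path.t() / temperature  # (N, N) cosine similarities
        targets = torch.arange(logits.size(0), device=logits.device)
        loss_m2p = F.cross_entropy(logits, targets)      # mammography -> pathology
        loss_p2m = F.cross_entropy(logits.t(), targets)  # pathology -> mammography
        return 0.5 * (loss_m2p + loss_p2m)

Under this reading, supervision enters only through the patient identity that pairs the two modalities, so no pixel-level annotation is required, consistent with the "weakly supervised" framing in the abstract.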

Topics

Journal Article
