A Genomics-Guided Multimodal Contrastive Learning Framework for Clinically Significant Prostate Cancer Risk Stratification with Missing Clinical Data.

June 16, 2026

DOI: 10.3390/cancers18121952 PMID: 42352485

Authors

Shahid M, Ather MA, Fatima Z, Mejorada CGS, Ruiz MJT, Téllez RQ, Mata-Rivera MF, Zagal-Flores R

Affiliations (5)

Centro de Investigación en Computación (CIC), Instituto Politécnico Nacional (IPN), Mexico City 07738, Mexico.
Department of Computer Sciences, Bahria University, Lahore 54600, Pakistan.
Faculty of Allied Health Sciences, Superior University, Lahore 54000, Pakistan.
Interdisciplinary Professional Unit in Engineering and Advanced Technologies (UPIITA), Instituto Politécnico Nacional (IPN), Mexico City 07340, Mexico.
Higher School of Computing (ESCOM), Instituto Politécnico Nacional (IPN), Mexico City 07738, Mexico.

Abstract

Heterogeneous data integration remains a major challenge in intelligent information systems, particularly under missing-modality and cross-domain conditions. Existing multimodal fusion approaches often rely on complete datasets and weak alignment mechanisms, which limits their robustness and practical applicability. This study aims to develop and evaluate a genomics-guided multimodal representation learning framework that enables robust heterogeneous data fusion, reliable cross-modal correspondence, and accurate prediction under incomplete-data conditions. We propose a multimodal learning architecture that treats genomics as the primary biological anchor and learns conditional projections to imaging modalities, including multiparametric MRI and whole-slide histopathology images (WSI). The framework formulates multimodal fusion as a genomics-guided contrastive learning problem, incorporates domain-specific optimization constraints, and learns a latent shared-state representation that supports inference without fully paired datasets. Evaluation used public datasets, including TCGA-PRAD and TCIA, on three tasks under realistic multimodal and missing-modality scenarios: discrimination of low-risk versus higher-risk/clinically significant prostate cancer (csPCa), Gleason-based risk stratification, and clinically significant outcome prediction. In the adequately powered Genomics+WSI cohort (n = 486), the framework achieved an AUROC of 0.985 ± 0.005 for low-risk versus higher-risk/csPCa discrimination (p < 0.001). Exploratory analysis in a small, matched Genomics+MRI cohort (n = 28) yielded an AUROC of 0.980 ± 0.006 for the same endpoint; because of the limited sample size, these findings are reported descriptively with bootstrap confidence intervals. Because the negative reference group consisted of low-risk prostate cancer cases rather than cancer-free controls, the results should be interpreted as within-cancer risk discrimination rather than de novo cancer detection.
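The abstract frames fusion as a genomics-guided contrastive learning problem in which genomics is the anchor and imaging embeddings are aligned to it, without requiring every case to have every modality. The exact loss is not given in the abstract, so the following is only a minimal sketch, assuming a standard one-directional InfoNCE objective (genomics-to-imaging) and a per-case presence mask as a hypothetical way to skip cases missing an imaging modality. The function names `l2_normalize` and `genomics_anchored_infonce`, the temperature of 0.07, and the masking scheme are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def l2_normalize(z: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row to unit L2 norm so dot products become cosine similarities."""
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)


def genomics_anchored_infonce(
    g: np.ndarray,
    x: np.ndarray,
    present: np.ndarray,
    temperature: float = 0.07,
) -> float:
    """Hypothetical genomics-anchored InfoNCE loss (a sketch, not the authors' code).

    g:       (n, d) genomics embeddings, used as anchors
    x:       (n, d) embeddings from one imaging modality (e.g. MRI or WSI)
    present: (n,) bool mask, True where that imaging modality is available
    """
    # Assumed missing-modality handling: drop cases without the imaging modality.
    idx = np.flatnonzero(present)
    if idx.size < 2:
        # With fewer than two paired cases there are no negatives to contrast.
        return 0.0
    g_p = l2_normalize(g[idx])
    x_p = l2_normalize(x[idx])
    # Row i: scaled cosine similarity of genomics anchor i to every imaging embedding.
    logits = g_p @ x_p.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positive for anchor i is imaging embedding i (the diagonal);
    # only the genomics-to-imaging direction is scored.
    return float(-np.mean(np.diag(log_probs)))


# Toy usage with random embeddings (illustrative data only).
rng = np.random.default_rng(0)
g = rng.normal(size=(8, 16))
x_aligned = g + 0.05 * rng.normal(size=(8, 16))  # imaging close to its genomics partner
x_random = rng.normal(size=(8, 16))              # no cross-modal correspondence
present = np.array([True] * 6 + [False] * 2)     # last two cases lack imaging
print(genomics_anchored_infonce(g, x_aligned, present))
print(genomics_anchored_infonce(g, x_random, present))
```

Scoring only the genomics-to-imaging direction is one simple way to make genomics the anchor; a symmetric (CLIP-style) loss that also scores imaging-to-genomics is a common alternative, and the abstract does not say which form the paper uses.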
The framework achieved weighted accuracy of up to 92.1%, Cohen's κ of up to 0.86, and a 58% reduction in critical decision errors. Calibration remained strong (expected calibration error, ECE, 0.021–0.024), and in retrospective modeling, decision-curve analysis indicated improved clinical utility with fewer unnecessary invasive workups. Under domain shift, AUROC degraded by less than 0.04. Single-modality inference using genomics alone maintained an AUROC above 0.90. Interpretability analysis showed feature attributions aligned with domain-relevant genomic markers. The proposed framework provides a scalable and generalizable solution for heterogeneous multimodal data fusion, supporting reliable prediction, robustness to missing modalities, and applicability to complex information systems beyond the studied domain.
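The calibration, agreement, and decision-curve results above rest on standard metric definitions. As a reference for readers, here is a minimal pure-Python sketch of three of them: binary ECE, unweighted Cohen's κ, and decision-curve net benefit. The binning scheme (10 equal-width bins on the predicted positive-class probability), the function names, the 0/1 label encoding, and the toy data are assumptions for illustration; the paper's own evaluation settings are not stated in the abstract.

```python
from collections import Counter
from typing import Sequence


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Binary ECE with equal-width bins on the predicted positive-class probability:
    sum over bins of (bin size / n) * |mean predicted prob - observed positive rate|."""
    n = len(probs)
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[k].append((p, y))
    ece = 0.0
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            pos_rate = sum(y for _, y in b) / len(b)
            ece += len(b) / n * abs(mean_p - pos_rate)
    return ece


def cohens_kappa(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n  # observed agreement
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement from the two marginal label distributions.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined when both raters use a single class
    return (p_o - p_e) / (1 - p_e)


def net_benefit(probs: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Decision-curve net benefit of acting when prob >= threshold (labels are 0/1):
    NB = TP/n - (FP/n) * threshold / (1 - threshold)."""
    n = len(labels)
    flagged = [y for p, y in zip(probs, labels) if p >= threshold]
    tp = sum(1 for y in flagged if y == 1)
    fp = len(flagged) - tp
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(labels: Sequence[int], threshold: float) -> float:
    """Reference strategy: refer every case for further workup."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy usage (illustrative data only, not from the paper).
probs = [0.92, 0.81, 0.35, 0.10, 0.66, 0.05]
labels = [1, 0, 1, 0, 1, 0]
preds = [int(p >= 0.5) for p in probs]
print(expected_calibration_error(probs, labels))
print(cohens_kappa(labels, preds))
print(net_benefit(probs, labels, 0.2), net_benefit_treat_all(labels, 0.2))
```

In a decision curve, a model is judged useful at a given threshold when its net benefit exceeds both the treat-all curve and zero (the treat-none strategy).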

Topics

Journal Article
