A Digital Twin-Inspired Closed-Loop Latent Simulation Framework for Cross-Cohort Breast Cancer Subtype Classification under Modality-Disjoint Learning
Authors
Abstract
Breast cancer PAM50 subtype classification is hindered by the single-pass prediction paradigm of existing deep learning systems, which provide no mechanism for iterative representation refinement or uncertainty trajectory analysis. We present the Cross-Cohort Modality-Disjoint Latent Simulation (CDLS) framework, a closed-loop latent trajectory classification system that integrates histopathology whole-slide images (WSI), transcriptomics (RNA-seq), mammography sequences, and clinical covariates from three non-overlapping cohorts. Training follows a modality-disjoint regime: no patient possesses all modalities simultaneously, and learning proceeds via shared latent alignment rather than per-patient fusion. A PPO-governed stochastic policy refines the latent state $z\!\in \!\mathbb {R}^{7}$ across $T\!=\!5$ optimisation steps through a Twin-GRU transition model. A defining feature is the closed-loop latent feedback step applied after each transition, which aligns simulated states with real patient embeddings via kNN retrieval: $z_{t+1}\!=\!\mathcal {T}(z_{t},a_{t})+\lambda \tfrac{1}{k}\sum _{i\in \mathcal {N}(z_{t})}(z_{i}-z_{t})$, where $\mathcal {T}$ is the transition model, $a_{t}$ the policy action, $\mathcal {N}(z_{t})$ the set of $k$ nearest real patient embeddings to $z_{t}$, and $\lambda$ the weight of the feedback correction. Multi-seed evaluation ($n\!=\!4$, training stability only) yields Balanced Accuracy $0.870\!\pm \!0.044$ and MCC $0.904\!\pm \!0.046$; five-fold cross-validated results confirm stability (Accuracy $0.871\!\pm \!0.029$, MCC $0.897\!\pm \!0.038$, $df\!=\!19$). An extended 8-seed analysis yields $0.872\!\pm \!0.029$, consistent with the 4-seed result; all comparative conclusions remain variance-aware ($\sigma \!\approx \!0.034$). PPO is selected for its trajectory geometry properties (broader latent coverage, higher path diversity) rather than for marginal accuracy gains, which fall within multi-seed variance. The $d_{twin}\!=\!7$ bottleneck is empirically guided by intrinsic dimensionality estimates ($\hat{d}_{id}\!=\!6.3\!\pm \!0.4$); it is not claimed to be a theoretically optimal value.
The term "closed-loop latent framework" is used in a computational, representation-space sense only.
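The closed-loop feedback update stated in the abstract, $z_{t+1}=\mathcal{T}(z_{t},a_{t})+\lambda \tfrac{1}{k}\sum_{i\in \mathcal{N}(z_{t})}(z_{i}-z_{t})$, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `knn_feedback_step`, the Euclidean distance metric, the default values of `k` and `lam`, and the random embedding bank are all assumptions made for illustration, and the transition output `z_pred` is supplied by the caller rather than computed by a Twin-GRU.

```python
import numpy as np


def knn_feedback_step(z_pred, z_t, bank, k=5, lam=0.1):
    """Apply z_{t+1} = z_pred + lam * mean_{i in N(z_t)} (z_i - z_t).

    z_pred : output of the transition model T(z_t, a_t), shape (d,)
    z_t    : current latent state, shape (d,)
    bank   : real patient embeddings, shape (n, d)
    k, lam : hypothetical defaults; the abstract does not state their values
    """
    # Euclidean distance from z_t to every stored embedding (assumed metric).
    dists = np.linalg.norm(bank - z_t, axis=1)
    # Indices of the k nearest real embeddings, i.e. N(z_t).
    idx = np.argsort(dists)[:k]
    # Mean displacement from z_t towards its neighbours.
    correction = (bank[idx] - z_t).mean(axis=0)
    return z_pred + lam * correction


# Illustrative usage with random data in the 7-dimensional latent space.
rng = np.random.default_rng(0)
bank = rng.normal(size=(100, 7))                 # stand-in for real patient embeddings
z_t = rng.normal(size=7)                         # current simulated state
z_pred = z_t + 0.05 * rng.normal(size=7)         # stand-in for T(z_t, a_t)
z_next = knn_feedback_step(z_pred, z_t, bank)
```

Setting `lam=0` recovers the open-loop transition, which makes the role of the feedback term easy to isolate in an ablation.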