Unsupervised risk factor identification across cancer types and data modalities via explainable artificial intelligence
Authors
Abstract
Risk stratification is a key tool in clinical decision-making, yet current approaches often fail to translate sophisticated survival analysis into actionable clinical criteria. We present a novel method for unsupervised machine learning that directly optimizes for survival heterogeneity across patient clusters through a differentiable adaptation of the multivariate logrank statistic. Unlike most existing methods that rely on proxy metrics, our approach represents novel methodology for training any neural network architecture on any data modality to identify prognostically distinct patient groups. We thoroughly evaluate the method in simulation experiments and demonstrate its utility in practice by applying it to two distinct cancer types: analyzing laboratory parameters from multiple myeloma patients and computed tomography images from non-small cell lung cancer patients, identifying prognostically distinct patient subgroups with significantly different survival outcomes in both cases. Post-hoc explainability analyses uncover clinically meaningful features determining the group assignments which align well with established risk factors and thus lend strong weight to the methods utility. This pan-cancer, model-agnostic approach represents a valuable advancement in clinical risk stratification, enabling the discovery of novel prognostic signatures across diverse data types while providing interpretable results that promise to complement treatment personalization and clinical decision-making in oncology and beyond.