Unsupervised risk factor identification across cancer types and data modalities via explainable artificial intelligence.
Affiliations (13)
- Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI) Dresden/Leipzig, Universität Leipzig, Leipzig, Germany.
- Innovation Center Computer Assisted Surgery (ICCAS), University of Leipzig, Leipzig, Germany.
- Department of Diagnostics, Fraunhofer Institute for Cell Therapy and Immunology, Leipzig, Germany.
- Department of Hematology, Hemostaseology, Cellular Therapy and Infectiology, University Hospital of Leipzig, Leipzig, Germany.
- Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI) Dresden/Leipzig, Universität Leipzig, Leipzig, Germany.
- Innovation Center Computer Assisted Surgery (ICCAS), University of Leipzig, Leipzig, Germany.
- Department of Hematology, Hemostaseology, Cellular Therapy and Infectiology, University Hospital of Leipzig, Leipzig, Germany.
- Department of Internal Medicine V, University Hospital Heidelberg, Heidelberg, Germany.
- National Center for Tumor Diseases (NCT), Heidelberg, Germany.
- Department of Diagnostic and Interventional Radiology, University Hospital Leipzig, Leipzig, Germany.
- Department of Diagnostics, Fraunhofer Institute for Cell Therapy and Immunology, Leipzig, Germany.
- Institute for Clinical Immunology, University Hospital of Leipzig, Leipzig, Germany.
- Myeloma Service, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
Abstract
Risk stratification is an important tool in clinical decision-making, yet current approaches often fail to translate sophisticated survival analysis into actionable clinical criteria. We present a novel method for training any neural network architecture on any data modality to identify prognostically distinct patient groups by directly optimizing for survival heterogeneity across patient clusters. We evaluate the method in simulation experiments and demonstrate its practical utility on two distinct cancer types: laboratory parameters from multiple myeloma (MM) patients in the CoMMpass dataset and computed tomography images from non-small cell lung cancer (NSCLC) patients in the Lung1 dataset. Post-hoc explainability analyses uncover clinically meaningful features that determine group assignments and align well with established risk factors in both cases. Our MM findings were externally validated on the GMMG-MM5 study dataset and our NSCLC findings on data from our own institution, lending strong support to the method's utility. This pan-cancer, model-agnostic approach enables the discovery of novel prognostic signatures across diverse data types while providing interpretable results that promise to complement treatment personalization and clinical decision-making in oncology and beyond.
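The abstract does not specify the training objective. As an illustration only, the sketch below assumes that a network outputs soft cluster probabilities for each patient and that survival heterogeneity across clusters is scored with a log-rank-type statistic. The function name `soft_logrank_score` and the use of the simplified statistic sum_k (O_k - E_k)^2 / E_k (instead of the exact variance-based log-rank test) are our assumptions, not the authors' method. The code only evaluates the score with NumPy; for training, the same arithmetic would be written in an autodiff framework and maximized.

```python
import numpy as np


def soft_logrank_score(times, events, probs):
    """Hypothetical heterogeneity score: approximate K-group log-rank
    chi-square computed with soft (probabilistic) group memberships.

    times:  (n,) follow-up times
    events: (n,) 1 if the event was observed, 0 if censored
    probs:  (n, K) soft cluster assignments, each row summing to 1
    Returns sum_k (O_k - E_k)^2 / E_k, a common simplification of the
    log-rank statistic (assumption: not the exact variance-based form).
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    probs = np.asarray(probs, dtype=float)
    n_groups = probs.shape[1]
    observed = np.zeros(n_groups)
    expected = np.zeros(n_groups)
    # Iterate over distinct times at which at least one event occurred.
    for t in np.unique(times[events]):
        at_risk = times >= t                 # patients still under observation
        died = (times == t) & events         # patients with an event at time t
        n_total = at_risk.sum()
        d_total = died.sum()
        n_k = probs[at_risk].sum(axis=0)     # soft number at risk per group
        observed += probs[died].sum(axis=0)  # soft observed events per group
        expected += d_total * n_k / n_total  # expected events if groups are equal
    expected = np.maximum(expected, 1e-12)   # guard against empty groups
    return float(np.sum((observed - expected) ** 2 / expected))
```

Uniform memberships give a score of zero, because every group then expects exactly the events it observes; assignments that separate early events from long survivors raise the score, which is the direction a training loop would push the network.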