Explanation-Guided Reconstruction of Missing Clinical Features for Survival Prediction in Pancreatic Cancer
Authors
Abstract
Pancreatic ductal adenocarcinoma (PDAC) remains one of the most lethal cancers, with survival influenced by a variety of factors, including early diagnosis, tumor profile, treatment regimen, and treatment response. The development of PDAC prognostic models is often compromised by incomplete clinical records, which must span imaging, pathology, surgery, and treatment workflows. We propose an explanation-guided reconstruction framework (xRF) that combines an autoencoder with an ensemble of gradient- and perturbation-based explainability methods to identify and prioritize clinically relevant features during autoencoder training. This dual-module architecture steers the reconstruction toward the features most critical for downstream survival prediction, rather than spreading its capacity across less relevant but easier-to-reconstruct features. We validated xRF on a cohort of 1531 PDAC patients treated in the Danish Capital Region, with clinical features drawn from CT image readings, surgery and pathology protocols, and chemotherapy records. Four survival prediction tasks were evaluated: post-diagnosis, post-metastasis, post-surgery, and post-chemotherapy survival. We conducted experiments with synthetic masking levels ranging from 10% to 80% and also tested performance on unobserved but clinically meaningful features. Relative to reference predictions made without missing data, the predictive performance of xRF drops by 1-2% at moderate levels of missingness and by 10-15% when a high percentage of features is missing. These results compare favorably with six alternative machine-learning-based reconstruction algorithms.
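The abstract does not specify how the explainability ensemble enters autoencoder training. One plausible reading, sketched below purely as an assumption, is that per-feature importance scores from several explainers are normalized, averaged, and used to weight the reconstruction error on synthetically masked entries, so errors on important features cost more. All function names (`make_mask`, `ensemble_importance`, `weighted_masked_mse`) are hypothetical, the masking is assumed to be uniform at random, and the importance vectors would in practice come from gradient- and perturbation-based attributions of the downstream survival model; here they are supplied by hand. This is an illustrative pure-Python sketch, not the authors' implementation.

```python
import random
from typing import List, Sequence


def make_mask(n_rows: int, n_features: int, rate: float, seed: int = 0) -> List[List[bool]]:
    """Synthetic missingness (assumed uniform at random): True marks a value hidden from the model."""
    rng = random.Random(seed)
    return [[rng.random() < rate for _ in range(n_features)] for _ in range(n_rows)]


def ensemble_importance(*score_sets: Sequence[float]) -> List[float]:
    """Hypothetical ensemble: scale each explainer's |scores| to sum to 1, then average them."""
    n = len(score_sets[0])
    combined = [0.0] * n
    for scores in score_sets:
        total = sum(abs(s) for s in scores)
        for j, s in enumerate(scores):
            # Fall back to uniform weights if an explainer returns all zeros.
            share = abs(s) / total if total > 0 else 1.0 / n
            combined[j] += share / len(score_sets)
    return combined


def weighted_masked_mse(
    x: Sequence[Sequence[float]],
    x_hat: Sequence[Sequence[float]],
    mask: Sequence[Sequence[bool]],
    weights: Sequence[float],
) -> float:
    """Importance-weighted squared error, computed only on the entries that were masked."""
    num, den = 0.0, 0.0
    for row, row_hat, row_mask in zip(x, x_hat, mask):
        for j, (v, v_hat, hidden) in enumerate(zip(row, row_hat, row_mask)):
            if hidden:
                num += weights[j] * (v - v_hat) ** 2
                den += weights[j]
    return num / den if den > 0 else 0.0


if __name__ == "__main__":
    # Hand-made attributions standing in for gradient- and perturbation-based explainers.
    w = ensemble_importance([0.8, 0.1, 0.1], [0.6, 0.3, 0.1])  # approximately [0.7, 0.2, 0.1]
    # The same unit error costs more on the feature the explainers rank highest.
    print(weighted_masked_mse([[1.0, 2.0, 3.0]], [[0.0, 2.0, 3.0]], [[True, True, True]], w))
    print(weighted_masked_mse([[1.0, 2.0, 3.0]], [[1.0, 2.0, 2.0]], [[True, True, True]], w))
```

Under this weighting, an autoencoder minimizing the loss gains more from reconstructing high-importance features accurately than from fitting easy but low-importance ones, which matches the motivation stated above; whether xRF uses loss weighting, sample reweighting, or another mechanism is not stated in the abstract.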