Robust Disease Prognosis via Diagnostic Knowledge Preservation: A Sequential Learning Approach

September 25, 2025

preprint

DOI: 10.1101/2025.09.22.25336414

Authors

Rajamohan, H. R., Xu, Y., Zhu, W., Kijowski, R., Cho, K., Geras, K., Razavian, N., Deniz, C. M.

Affiliations (1)

New York University

Abstract

Accurate disease prognosis is essential for patient care but is often hindered by the lack of long-term data. This study explores deep learning training strategies that utilize large, accessible diagnostic datasets to pretrain models aimed at predicting future disease progression in knee osteoarthritis (OA), Alzheimer's disease (AD), and breast cancer (BC). While diagnostic pretraining improves prognostic task performance, naive fine-tuning for prognosis can cause catastrophic forgetting, where the model's original diagnostic accuracy degrades, a significant patient safety concern in real-world settings. To address this, we propose a sequential learning strategy with experience replay. We used cohorts with knee radiographs, brain MRIs, and digital mammograms to predict 4-year structural worsening in OA, 2-year cognitive decline in AD, and 5-year cancer diagnosis in BC. Our results showed that diagnostic pretraining on larger datasets improved prognosis model performance compared to standard baselines, boosting both the Area Under the Receiver Operating Characteristic curve (AUROC) (e.g., Knee OA external: 0.77 vs 0.747; Breast Cancer: 0.874 vs 0.848) and the Area Under the Precision-Recall Curve (AUPRC) (e.g., Alzheimer's Disease: 0.752 vs 0.683). Additionally, a sequential learning approach with experience replay achieved prognostic performance comparable to dedicated single-task models (e.g., Breast Cancer AUROC 0.876 vs 0.874) while also preserving diagnostic ability. This method maintained high diagnostic accuracy (e.g., Breast Cancer Balanced Accuracy 50.4% vs 50.9% for a dedicated diagnostic model), unlike simpler multitask methods prone to catastrophic forgetting (e.g., 37.7%). Our findings show that leveraging large diagnostic datasets is a reliable and data-efficient way to enhance prognostic models while maintaining essential diagnostic skills.

Author Summary

In our research, we addressed a common problem in medical AI: how to accurately predict the future course of a disease when long-term patient data is rare. We focused on knee osteoarthritis, Alzheimer's disease, and breast cancer. We found that we could significantly improve a model's ability to predict disease progression by first training it on a much larger, more common type of data: diagnostic images used to assess a patient's current disease state. We then developed a specialized training method that allows a single AI model to perform both diagnosis and prognosis tasks effectively. A key challenge is that models often "forget" their original diagnostic skills when they learn a new prognostic task. In a clinical setting, this poses a safety risk, as it could lead to missed diagnoses. We use experience replay to overcome this by continually refreshing the model's diagnostic knowledge. This creates a more robust and efficient model that mirrors a clinician's workflow, offering the potential to improve patient care with only a limited amount of hard-to-obtain longitudinal data.
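
The abstract does not give implementation details for experience replay, so the following is only a minimal, generic sketch of the idea: while training on the new prognosis task, each batch is topped up with diagnostic examples drawn from a stored buffer. The function name `build_replay_batches`, the `replay_fraction` parameter, and the `(image identifier, label)` data layout are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical stand-in for real data: (image identifier, label).
Example = Tuple[str, int]


def build_replay_batches(
    prognosis_data: Sequence[Example],
    diagnosis_buffer: Sequence[Example],
    batch_size: int = 8,
    replay_fraction: float = 0.5,
    seed: int = 0,
) -> List[List[Tuple[str, Example]]]:
    """Build one epoch of mixed batches over the prognosis data.

    Each batch holds new prognosis examples plus diagnostic examples
    replayed from the buffer. Every example is tagged with its task so a
    model with two output heads could route it to the matching loss.
    """
    n_replay = int(round(batch_size * replay_fraction))
    n_new = batch_size - n_replay
    if n_new < 1:
        raise ValueError("each batch needs at least one prognosis example")
    if n_replay > 0 and not diagnosis_buffer:
        raise ValueError("diagnosis_buffer is empty but replay was requested")

    rng = random.Random(seed)
    order = list(prognosis_data)
    rng.shuffle(order)

    batches = []
    for start in range(0, len(order), n_new):
        new_part = [("prognosis", ex) for ex in order[start:start + n_new]]
        # Sample with replacement so a small buffer can be replayed every step.
        replay_part = [
            ("diagnosis", rng.choice(diagnosis_buffer)) for _ in range(n_replay)
        ]
        batches.append(new_part + replay_part)
    return batches
```

In a real training loop, each tagged example would be passed through a shared image encoder, with the diagnosis and prognosis losses computed on their respective subsets of the batch. The replay ratio and the buffer sampling scheme used here are illustrative assumptions.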

Topics

health informatics
