Synthetic data generation method improves risk prediction model for early tumor recurrence after surgery in patients with pancreatic cancer.
Authors
Affiliations (6)
- Division of Hepatobiliary-Pancreatic Surgery, Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea.
- Division of Hepatobiliary-Pancreatic Surgery, Department of Surgery, Daejeon Eulji University Medical Center, Eulji University School of Medicine, Daejeon, South Korea.
- Department of Surgery, Seoul National University College of Medicine, Seoul, South Korea.
- Department of Radiology and Center for Imaging Science, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea.
- Department of Nuclear Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea.
- Division of Hepatobiliary-Pancreatic Surgery, Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea.
Abstract
Pancreatic cancer is aggressive and has high recurrence rates, so accurate prediction models are needed for effective treatment planning, particularly when deciding between neoadjuvant chemotherapy and upfront surgery. This study explores the use of variational autoencoder (VAE)-generated synthetic data to predict early tumor recurrence (within six months) in pancreatic cancer patients who underwent upfront surgery. Preoperative data from 158 patients (January 2021 to December 2022) were analyzed, and machine learning models, including Logistic Regression, Random Forest (RF), Gradient Boosting Machine (GBM), and Deep Neural Networks (DNN), were trained on both original and synthetic datasets. The VAE-generated dataset (n = 94) closely matched the original data (p > 0.05) and enhanced model performance, improving accuracy (GBM: 0.81 to 0.87; RF: 0.84 to 0.87) and sensitivity (GBM: 0.73 to 0.91; RF: 0.82 to 0.91). PET/CT-derived metabolic parameters were the strongest predictors, accounting for 54.7% of the model's predictive power, with maximum standardized uptake value (SUVmax) showing the highest importance (0.182, 95% CI: 0.165-0.199). This study demonstrates that synthetic data can significantly enhance predictive models for pancreatic cancer recurrence, especially in data-limited scenarios, offering a promising strategy for oncology prediction models.
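To make the reported improvements concrete, the following minimal Python sketch restates the two metrics quoted in the abstract and computes the absolute gain for each model. The accuracy and sensitivity values are taken from the abstract; the function names, the `REPORTED` table, and `absolute_gains` are illustrative assumptions and are not part of the authors' code, which the abstract does not describe.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all patients classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    """Share of true early recurrences that the model detects."""
    return tp / (tp + fn)


# Values quoted in the abstract: (trained on original data, trained with synthetic data).
# The dictionary layout is an assumption made for this illustration.
REPORTED = {
    "GBM": {"accuracy": (0.81, 0.87), "sensitivity": (0.73, 0.91)},
    "RF": {"accuracy": (0.84, 0.87), "sensitivity": (0.82, 0.91)},
}


def absolute_gains(reported):
    """Absolute change (after minus before) per model and metric, rounded to 2 decimals."""
    return {
        model: {
            metric: round(after - before, 2)
            for metric, (before, after) in metrics.items()
        }
        for model, metrics in reported.items()
    }


GAINS = absolute_gains(REPORTED)

if __name__ == "__main__":
    for model, metrics in GAINS.items():
        for metric, gain in metrics.items():
            print(f"{model} {metric}: +{gain:.2f}")
```

Read this way, the largest reported change is in GBM sensitivity (0.73 to 0.91), which is the metric that reflects how many true early recurrences are caught.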