A machine learning-based risk prediction framework for atypical hyperplasia and endometrial cancer in postmenopausal women.
Authors
Affiliations (4)
- Department of Obstetrics and Gynecology, People's Hospital of Henan University, Henan Provincial People's Hospital, Zhengzhou, Henan, China.
- Department of Gynecology, Henan Provincial People's Hospital, 7 Weiwu Road, Jinshui District, Zhengzhou, 450003, Henan, China.
- Department of Obstetrics and Gynecology, Zhengzhou University People's Hospital, Henan Provincial People's Hospital, Zhengzhou, Henan, China.
- Department of Gynecology, Henan Provincial People's Hospital, 7 Weiwu Road, Jinshui District, Zhengzhou, 450003, Henan, China. [email protected].
Abstract
This study aimed to develop and internally validate a machine learning-based model for predicting endometrial malignancy, defined as atypical hyperplasia or endometrial cancer (AH/EC), in postmenopausal women. The model integrates routinely available clinical, ultrasound, and laboratory features to support individualized diagnostic triage and potentially reduce unnecessary invasive diagnostic procedures in low-risk patients.
This retrospective, single-center study included 858 postmenopausal women who underwent endometrial histopathological evaluation at Henan Provincial People's Hospital between February 2022 and September 2025. The cohort was randomly divided into a training set (70%, n = 602) and a validation set (30%, n = 256). Feature selection was performed in the training cohort using univariate analysis (P < 0.001), LASSO regression with the λ₁se criterion, and the Boruta algorithm. Nine supervised machine learning models were developed in the training cohort and evaluated in the validation cohort. Model performance was assessed in terms of discrimination (area under the receiver operating characteristic curve [AUC], sensitivity, specificity, F1 score), calibration (Brier score, calibration curves), and clinical utility (decision curve analysis). SHapley Additive exPlanations (SHAP) were applied to interpret the optimal model, and a nomogram was constructed from the logistic regression model.
A total of 155 patients (18.1%) were diagnosed with AH/EC. Six predictors were retained for model development: endometrial thickness, postmenopausal bleeding, presence of a blood flow signal, CA19-9, CA125, and lesion outline regularity. In the validation cohort, the neural network model achieved the highest AUC (0.840, 95% CI: 0.770-0.909), comparable to that of logistic regression (AUC 0.838, 95% CI: 0.768-0.908). The neural network had higher sensitivity than logistic regression (0.739 vs. 0.674), and calibration was similar (Brier score 0.099 for both models). Both models showed acceptable validation performance.
A prediction framework based on routinely obtainable clinical, ultrasound, and laboratory variables may support personalized risk assessment in postmenopausal women undergoing diagnostic evaluation for suspected endometrial lesions. Further multicenter external validation and prospective studies are needed to confirm its generalizability and clinical applicability.
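The evaluation metrics named in the abstract (AUC, sensitivity, specificity, F1 score, Brier score, and the net benefit underlying decision curve analysis) can be illustrated with a minimal Python sketch. The abstract does not state which software the authors used, so the function names below are hypothetical, and the labels and predicted probabilities are invented toy values, not study data.

```python
from typing import Dict, Sequence


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def confusion_counts(y_true, y_prob, threshold):
    """Return (tp, fp, tn, fn) when cases with probability >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_prob, threshold: float = 0.5) -> Dict[str, float]:
    """Sensitivity, specificity, and F1 score at a fixed probability threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}


def net_benefit(y_true, y_prob, threshold: float) -> float:
    """Net benefit at one threshold probability, the quantity plotted in a decision curve."""
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


# Toy example (illustrative values only): 1 = AH/EC, 0 = benign histology.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.90]

toy_auc = auc(y_true, y_prob)
toy_brier = brier_score(y_true, y_prob)
toy_metrics = classification_metrics(y_true, y_prob, threshold=0.5)
toy_net_benefit = net_benefit(y_true, y_prob, threshold=0.2)
print(toy_auc, toy_brier, toy_metrics, toy_net_benefit)
```

A lower Brier score indicates better-calibrated and more accurate probabilities, while a decision curve repeats the net benefit calculation across a range of thresholds and compares the model with the "treat all" and "treat none" strategies.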