Precision identification of endometrial malignancy and precancerous lesions: Development of a machine learning model incorporating multidimensional clinical and imaging parameters.
Affiliations (2)
- Department of Gynecology, Xianning Central Hospital, The First Affiliated Hospital of Hubei University of Science and Technology, Xianning, Hubei, China.
- Department of Oncology, Xianning Central Hospital, The First Affiliated Hospital of Hubei University of Science and Technology, Xianning, Hubei, China.
Abstract
This study aimed to develop and validate a machine learning (ML) model that integrates multidimensional clinical, pathomic, and ultrasound radiomic parameters for the precise identification of endometrial malignancy and precancerous lesions. It focused on the diagnostic needs of younger patients pursuing fertility preservation.
This retrospective study enrolled patients with suspected endometrial lesions at a single institution. Clinical baseline data (e.g., age, body mass index, menopausal status), pathomic features (e.g., cellular atypia, gland density), and ultrasound radiomic parameters (e.g., endometrial thickness, resistance index) were collected. Key predictors were selected using Pearson correlation, SHapley Additive exPlanations (SHAP) analysis, and least absolute shrinkage and selection operator (LASSO) regression. Seven ML models were constructed and optimized via 5-fold cross-validation. Model performance in the training (70%) and testing (30%) sets was evaluated using the area under the receiver operating characteristic curve (ROC-AUC), sensitivity, specificity, and precision-recall performance.
A total of 1221 patients (854 in the training set, 367 in the testing set) were included. Seven variables (age, body mass index, menopausal status, cellular atypia, gland density, endometrial thickness, and resistance index) emerged as robust predictors. Among the 7 ML models, the random forest model performed best, with a ROC-AUC of 0.98 in the training set and 0.96 in the testing set (95% confidence interval [CI] for the testing set: 0.93-0.98). In the testing set it showed balanced sensitivity (0.89, 95% CI: 0.75-0.96) and specificity (0.86, 95% CI: 0.82-0.90). It remained stable across varying risk thresholds and cost-benefit ratios and outperformed the other models in precision-recall balance.
Integration of multidimensional data via ML, particularly the random forest model, enhances the precision of endometrial malignancy detection.
This approach enables personalized risk stratification, supporting targeted management for younger patients and advancing patient-centered care.
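The abstract does not report implementation details. The following is a minimal sketch of the workflow it describes, assuming scikit-learn and synthetic data in place of the patient cohort. It covers a Pearson-correlation filter, LASSO-type selection, a random forest tuned by 5-fold cross-validation on a 70/30 split, and test-set ROC-AUC, sensitivity, specificity, and average precision. The LASSO-type selection here is an L1-penalized logistic regression, substituted because the outcome is binary. The correlation threshold of 0.9, the 0.5 classification cutoff, the roughly 15% positive rate, the L1 strength, and the hyperparameter grid are all illustrative assumptions, and the hypothetical helper `pearson_filter` is not from the study. SHAP-based ranking is omitted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cohort: 1221 patients, 12 candidate predictors.
# NOT study data; the ~15% positive rate is an assumption for illustration.
X, y = make_classification(n_samples=1221, n_features=12, n_informative=7,
                           n_redundant=2, weights=[0.85, 0.15], random_state=0)

# 70/30 stratified split -> 854 training and 367 testing patients.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)


def pearson_filter(X, threshold=0.9):
    """Hypothetical helper: keep a column only if its |Pearson r| with every
    previously kept column is <= threshold."""
    r = np.corrcoef(X, rowvar=False)
    kept = []
    for j in range(X.shape[1]):
        if all(abs(r[j, k]) <= threshold for k in kept):
            kept.append(j)
    return kept


cols = pearson_filter(X_tr)  # correlations computed on training data only
X_tr, X_te = X_tr[:, cols], X_te[:, cols]

pipe = Pipeline([
    ("scale", StandardScaler()),
    # L1-penalized logistic regression; features with non-zero coefficients are kept.
    ("lasso", SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear", C=0.1))),
    ("rf", RandomForestClassifier(random_state=0)),
])

# Hyperparameters tuned by 5-fold stratified cross-validation on the training set.
grid = GridSearchCV(
    pipe,
    param_grid={"rf__n_estimators": [100, 300], "rf__max_depth": [None, 5]},
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
grid.fit(X_tr, y_tr)

# Evaluate the refitted best pipeline on the held-out 30%.
p_te = grid.predict_proba(X_te)[:, 1]
tn, fp, fn, tp = confusion_matrix(y_te, (p_te >= 0.5).astype(int)).ravel()
auc_te = roc_auc_score(y_te, p_te)
sens = tp / (tp + fn)
spec = tn / (tn + fp)
ap = average_precision_score(y_te, p_te)
print(f"test ROC-AUC  {auc_te:.3f}")
print(f"sensitivity   {sens:.3f}")
print(f"specificity   {spec:.3f}")
print(f"avg precision {ap:.3f}")
```

Placing the scaler and the L1 selector inside the `Pipeline` means each cross-validation fold selects features from its own training portion only, which avoids leaking information from the held-out fold into the selection step.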
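The abstract reports a 95% CI for the testing-set AUC and stability "across varying risk thresholds and cost-benefit ratios" without naming the methods used. A percentile bootstrap and decision curve analysis are common choices for these two tasks. The sketch below assumes both and uses hypothetical test-set labels and predicted probabilities rather than study results. The helper names `bootstrap_auc_ci` and `net_benefit` are illustrative.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical test-set labels and predicted probabilities for 367 patients
# (synthetic values, not results from the study).
y = rng.binomial(1, 0.15, size=367)
p = np.clip(0.3 * y + rng.normal(0.3, 0.15, size=367), 0.0, 1.0)


def bootstrap_auc_ci(y, p, n_boot=1000, alpha=0.05, seed=0):
    """Point ROC-AUC plus a percentile bootstrap CI, resampling patients with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, size=n)
        if y[idx].min() == y[idx].max():  # AUC undefined if a resample has one class
            continue
        aucs.append(roc_auc_score(y[idx], p[idx]))
    lo, hi = np.quantile(aucs, [alpha / 2, 1 - alpha / 2])
    return roc_auc_score(y, p), lo, hi


def net_benefit(y, p, pt):
    """Decision-curve net benefit of treating patients whose risk p >= pt.
    False positives are weighted by the odds pt / (1 - pt)."""
    n = len(y)
    treat = p >= pt
    tp = np.sum(treat & (y == 1))
    fp = np.sum(treat & (y == 0))
    return tp / n - (fp / n) * pt / (1 - pt)


auc, lo, hi = bootstrap_auc_ci(y, p)
print(f"ROC-AUC {auc:.2f} (95% CI {lo:.2f}-{hi:.2f})")

prevalence = y.mean()
for pt in (0.1, 0.2, 0.3, 0.4):
    nb_model = net_benefit(y, p, pt)
    nb_all = prevalence - (1 - prevalence) * pt / (1 - pt)  # "treat everyone" strategy
    print(f"pt={pt:.1f}  model {nb_model:.3f}  treat-all {nb_all:.3f}  treat-none 0.000")
```

In a decision curve, choosing a threshold probability p_t is equivalent to valuing a false positive at p_t/(1 - p_t) of a true positive. This is how a model's behavior across risk thresholds maps onto different cost-benefit ratios.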