Developing and validating ultrasound-based machine-learning models incorporating radiomics features to predict malignancy in adnexal masses.
Authors
Affiliations (13)
- UniCamillus-International Medical University of Rome, Rome, Italy.
- Dipartimento Scienze della Salute della Donna e del Bambino, Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy.
- Radiomics G-STeP Research Core Facility, Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy.
- Dipartimento Universitario Scienze della Vita e Sanità Pubblica, Università Cattolica del Sacro Cuore, Rome, Italy.
- Data Collection G-STeP Research Core Facility, Fondazione Policlinico Universitario A. Gemelli IRCCS, Rome, Italy.
- Dipartimento di Diagnostica per Immagini, Radioterapia Oncologica ed Ematologia, Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy.
- Università Cattolica del Sacro Cuore, Rome, Italy.
- Epidemiology and Biostatistics Facility, G-STeP Generator, Fondazione Policlinico Universitario A. Gemelli IRCCS, Rome, Italy.
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium.
- Department of Obstetrics and Gynaecology, University Hospitals Leuven, Leuven, Belgium.
- Department of Obstetrics and Gynaecology, Imperial College London, London, UK.
- Department of Obstetrics and Gynaecology, Skåne University Hospital, Malmö, Sweden.
- Department of Clinical Sciences Malmö, Lund University, Malmö, Sweden.
Abstract
The primary aim of this study was to develop and internally validate ultrasound-based radiomics models to discriminate between all types of benign and malignant adnexal masses. The secondary aim was to compare the performance of the radiomics models with that of the Assessment of Different NEoplasias in the adneXa (ADNEX) model. This was a retrospective, observational, single-center study. All patients with an adnexal mass who were included in the ongoing International Ovarian Tumor Analysis phase-5 and phase-7 studies and were examined using ultrasound between January 2012 and December 2023 at Fondazione Policlinico Universitario A. Gemelli IRCCS, Rome, Italy, were eligible for inclusion. Inclusion criteria were: adnexal mass detected by ultrasound; surgical removal of the adnexal mass within 180 days after the ultrasound examination; histological confirmation of an adnexal mass; and absence of a synchronous malignant tumor. Patients without digital ultrasound images saved in DICOM format were excluded. The patient cohort was split randomly into training and validation sets using a stratified split with a ratio of 70:30, to preserve the proportion of benign and malignant cases in the two sets. Two machine-learning models for discriminating between benign and malignant adnexal masses were built using one image per tumor, with 5-fold cross-validation for hyperparameter tuning, and were tested on the validation set. The variables used in model building were patient age, serum CA 125 level and the radiomics features that both differed significantly between benign and malignant tumors (determined using the Mann-Whitney U-test with Benjamini-Hochberg correction) and were not redundant based on Pearson correlation analysis. Histology was the reference standard.
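The feature-selection step described above can be illustrated with a short sketch. It is not the authors' code: the abstract does not report the significance level, the correlation threshold or the rule used to decide which of two redundant features to drop, so `alpha=0.05`, `max_abs_r=0.9`, the greedy keep-the-most-significant rule and all function and feature names below are assumptions made for illustration. The per-feature P-values are taken as given here (in practice they would come from a Mann-Whitney U-test comparing benign and malignant tumors).

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest P first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def pearson_r(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_features(values, p_values, alpha=0.05, max_abs_r=0.9):
    """Keep significant features, dropping any that is redundant with a kept one.

    values:   dict mapping feature name -> list of per-tumor values
    p_values: dict mapping feature name -> unadjusted P-value
    alpha and max_abs_r are assumed thresholds, not values from the study.
    """
    names = sorted(values)
    adjusted = dict(zip(names, benjamini_hochberg([p_values[n] for n in names])))
    # Most significant first; ties broken by name for reproducibility.
    significant = sorted(
        (n for n in names if adjusted[n] <= alpha),
        key=lambda n: (adjusted[n], n),
    )
    kept = []
    for name in significant:
        if all(abs(pearson_r(values[name], values[k])) < max_abs_r for k in kept):
            kept.append(name)
    return kept


# Toy data (hypothetical): feat_b is almost a rescaled copy of feat_a.
toy_values = {
    "feat_a": [1, 2, 3, 4, 5],
    "feat_b": [2, 4, 6, 8, 10.5],
    "feat_c": [5, 1, 4, 2, 3],
    "feat_d": [3, 3, 1, 2, 5],
}
toy_p = {"feat_a": 0.001, "feat_b": 0.002, "feat_c": 0.01, "feat_d": 0.9}
selected = select_features(toy_values, toy_p)
```

With these toy inputs, `feat_d` fails the significance filter and `feat_b` is dropped as redundant with `feat_a`, so `selected` holds `feat_a` and `feat_c`. The selected features would then be passed, together with age and serum CA 125 level, to the model-building step.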
We assessed the discriminative performance of the radiomics models using the area under the receiver-operating-characteristics curve (AUC), and their classification performance using sensitivity and specificity at the optimal cut-off of each model for classifying the mass as malignant, as determined by Youden's index. The diagnostic performance of the developed radiomics models was compared with that of the ADNEX model (AUC, sensitivity and specificity at the 10% risk-of-malignancy cut-off, which is the recommended threshold for clinical use of the ADNEX model). In total, 4501 patients met the inclusion criteria. Among these, 2428 patients were excluded because ultrasound images were absent or unsuitable for radiomics analysis. A total of 2073 patients were included in the analysis, of whom 803 (38.7%) had a histologically confirmed malignant tumor. In the validation set (n = 622, including 254 malignancies), the clinical-radiomics model trained using the eXtreme Gradient Boosting algorithm, including age, serum CA 125 level and 14 selected radiomics features, achieved the highest performance among the radiomics models, with an AUC of 0.89 (95% CI, 0.86-0.92), sensitivity of 0.83 (95% CI, 0.79-0.88) and specificity of 0.81 (95% CI, 0.77-0.85) at the optimal cut-off (31% risk of malignancy, based on Youden's index). At a 10% risk-of-malignancy cut-off, it had a sensitivity of 0.94 (95% CI, 0.91-0.97) and specificity of 0.48 (95% CI, 0.42-0.53). The ADNEX model had an AUC of 0.95 (95% CI, 0.93-0.97), sensitivity of 0.97 (95% CI, 0.95-0.99) and specificity of 0.72 (95% CI, 0.68-0.77) at the 10% risk-of-malignancy cut-off in the validation set. Our results support further exploration of radiomics analysis for distinguishing between benign and malignant adnexal masses in larger study populations.
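The cut-off selection and threshold-based metrics described above can be sketched as follows. This is an illustrative sketch, not the study's code: the function names, the toy labels and risks, and the convention that a mass is classified as malignant when its predicted risk is greater than or equal to the cut-off are all assumptions. Youden's index is J = sensitivity + specificity - 1, and the cut-off that maximizes J is taken as optimal.

```python
def sensitivity_specificity(y_true, risk, cutoff):
    """Sensitivity and specificity when risk >= cutoff is called malignant.

    y_true: 1 for malignant, 0 for benign (histology as reference standard).
    The >= convention is an assumption of this sketch.
    """
    tp = sum(1 for y, r in zip(y_true, risk) if y == 1 and r >= cutoff)
    fn = sum(1 for y, r in zip(y_true, risk) if y == 1 and r < cutoff)
    tn = sum(1 for y, r in zip(y_true, risk) if y == 0 and r < cutoff)
    fp = sum(1 for y, r in zip(y_true, risk) if y == 0 and r >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def youden_cutoff(y_true, risk):
    """Return (cutoff, J) maximizing Youden's index over the observed risks.

    Ties are resolved in favor of the lowest cut-off (an arbitrary choice).
    """
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(risk)):
        sens, spec = sensitivity_specificity(y_true, risk, cutoff)
        j = sens + spec - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Toy data (hypothetical): four benign and four malignant masses.
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
toy_risk = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.80, 0.90]

optimal_cutoff, optimal_j = youden_cutoff(toy_labels, toy_risk)
sens_at_10, spec_at_10 = sensitivity_specificity(toy_labels, toy_risk, 0.10)
```

On the toy data, the Youden-optimal cut-off is 0.30 (J = 0.75), whereas a fixed 0.10 cut-off gives a sensitivity of 1.0 and a specificity of 0.25. This mirrors the trade-off reported above, in which lowering the cut-off from the Youden-optimal value to 10% raises sensitivity at the expense of specificity.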
Future studies should consider using multiple images per tumor and testing alternative model-building methods, and should perform external validation to assess the generalizability of the radiomics models.
© 2026 The Author(s). Ultrasound in Obstetrics & Gynecology published by John Wiley & Sons Ltd on behalf of International Society of Ultrasound in Obstetrics and Gynecology.