Artificial intelligence applied to ultrasound diagnosis of pelvic gynecological tumors: a systematic review and meta-analysis.
Authors
Abstract
To perform a systematic review on artificial intelligence (AI) studies focused on identifying and differentiating pelvic gynecological tumors on ultrasound scans. Studies developing or validating AI models for diagnosing gynecological pelvic tumors on ultrasound scans were eligible for inclusion. We systematically searched PubMed, Embase, Web of Science, and Cochrane Central from their database inception until April 30th, 2024. To assess the quality of the included studies, we adapted the QUADAS-2 risk of bias tool to address the unique challenges of AI in medical imaging. Using multi-level random effects models, we performed a meta-analysis to generate summary estimates of the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. To provide a reference point of current diagnostic support tools for ultrasound examiners, we descriptively compared the pooled performance to that of the well-recognized ADNEX model on external validation. Subgroup analyses were performed to explore sources of heterogeneity. From 9151 records retrieved, 44 studies were eligible: 40 on ovarian, three on endometrial, and one on myometrial pathology. Overall, 95% were at high risk of bias - primarily due to inappropriate study inclusion criteria, the absence of a patient-level split of training and testing image sets, and no calibration assessment. For ovarian tumors, the summary AUC for AI models distinguishing benign from malignant tumors was 0.89 (95% CI: 0.85-0.92). In lower-risk studies (at least three low-risk domains), the summary AUC dropped to 0.87 (0.83-0.90), with deep learning models outperforming radiomics-based machine learning approaches in this subset. Only five studies included an external validation, and six evaluated calibration performance. In a recent systematic review of external validation studies, the ADNEX model had a pooled AUC of 0.93 (0.91-0.94) in studies at low risk of bias. Studies on endometrial and myometrial pathologies were reported individually. Although AI models show promising discriminative performances for diagnosing gynecological tumors on ultrasound, most studies have methodological shortcomings that result in a high risk of bias. In addition, the ADNEX model appears to outperform most AI approaches for ovarian tumors. Future research should emphasize robust study designs - ideally large, multicenter, and prospective cohorts that mirror real-world populations - along with external validation, proper calibration, and standardized reporting. This study was pre-registered with Open Science Framework (OSF): https://doi.org/10.17605/osf.io/bhkst.