Diagnostic Performance of CT-Based Artificial Intelligence for Early Recurrence of Cholangiocarcinoma: A Systematic Review and Meta-Analysis.
Affiliations
- Department of Radiology, The First Hospital of Jilin University, 71 Xinxin Street, Chaoyang District, Changchun, Jilin Province, CN.
- General Surgery Center, Department of Hepatobiliary and Pancreatic Surgery, The First Hospital of Jilin University, Changchun, CN.
- Department of Urology, The First Hospital of Jilin University, Changchun, CN.
Abstract
Although artificial intelligence (AI) models have demonstrated high predictive accuracy for early recurrence of cholangiocarcinoma (CCA), their clinical application faces challenges such as limited reproducibility and generalizability, hidden biases, and uncertain performance across diverse datasets and populations, which raise concerns about their practical applicability. This meta-analysis aims to systematically assess the diagnostic performance of AI models that use computed tomography (CT) imaging to predict early recurrence of CCA. A systematic search was conducted in PubMed, Embase, and Web of Science for studies published up to May 2025. Studies were selected according to the PIRTOS framework:
- Participants (P): patients diagnosed with CCA, including intrahepatic and extrahepatic tumors.
- Index test (I): AI techniques applied to CT imaging to predict early recurrence, defined as recurrence within 1 year.
- Reference standard (R): pathological diagnosis or imaging follow-up confirming recurrence.
- Target condition (T): early recurrence of CCA (positive group: recurrence; negative group: no recurrence).
- Outcomes (O): sensitivity, specificity, diagnostic odds ratio (DOR), and area under the receiver operating characteristic curve (AUC), assessed in both internal and external validation cohorts.
- Setting (S): retrospective or prospective studies using hospital datasets.
Methodological quality was assessed using an optimized version of the QUADAS-2 tool. Heterogeneity was quantified with the I² statistic. Pooled sensitivity, specificity, DOR, and AUC were calculated using a bivariate random-effects model. Nine studies comprising 30 datasets and 1,537 patients were included. In internal validation cohorts, CT-based AI models showed a pooled sensitivity of 0.87 (95% CI: 0.81-0.92), specificity of 0.85 (95% CI: 0.79-0.89), DOR of 37.71 (95% CI: 18.35-77.51), and AUC of 0.93 (95% CI: 0.90-0.94).
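As a quick consistency check, a DOR can be recovered from a sensitivity/specificity pair as [sens / (1 − sens)] × [spec / (1 − spec)], which equals the positive likelihood ratio divided by the negative likelihood ratio. The minimal Python sketch below applies that plug-in formula to the rounded internal-validation estimates. The function name is hypothetical, and this is not the authors' pooling procedure, which used a bivariate random-effects model.

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """Plug-in DOR = LR+ / LR- for a single sensitivity/specificity pair.

    Hypothetical helper for illustration; it does not fit any meta-analytic model.
    """
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    positive_lr = sensitivity / (1.0 - specificity)   # LR+: P(test+ | recurrence) / P(test+ | no recurrence)
    negative_lr = (1.0 - sensitivity) / specificity   # LR-: P(test- | recurrence) / P(test- | no recurrence)
    return positive_lr / negative_lr


# Rounded pooled internal-validation estimates reported in the abstract.
internal_dor = diagnostic_odds_ratio(0.87, 0.85)
print(f"plug-in DOR (internal): {internal_dor:.2f}")  # prints: plug-in DOR (internal): 37.92
```

The plug-in value (about 37.92) falls close to the reported 37.71 and well inside its 95% CI. Exact agreement should not be expected: the reported sensitivity and specificity are rounded to two decimals, and the abstract does not state whether the DOR was derived from the bivariate fit or pooled separately.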
In external validation cohorts, the pooled sensitivity was 0.87 (95% CI: 0.81-0.91), specificity 0.82 (95% CI: 0.77-0.86), DOR 30.81 (95% CI: 18.79-50.52), and AUC 0.85 (95% CI: 0.82-0.88). The AUC was significantly lower in external than in internal validation cohorts (P < .001). These results indicate that CT-based AI models predict early CCA recurrence with high performance in internal validation sets and moderate performance in external validation sets. However, the high heterogeneity observed may limit the robustness of these findings. Future research should focus on prospective studies and on standardizing the reference standard used to confirm recurrence, to further validate the clinical applicability and generalizability of AI models.
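The abstract quantifies heterogeneity with the I² statistic but does not give its values or the outcome scale on which it was computed. For reference, here is a minimal sketch of the standard Higgins-Thompson definition, I² = max(0, (Q − df) / Q) × 100%, where Q is Cochran's inverse-variance-weighted statistic and df is the number of studies minus one. The input values are hypothetical log-DORs, not data from the included studies, and the function name is illustrative.

```python
from math import fsum


def cochran_q_and_i2(effects: list[float], variances: list[float]) -> tuple[float, float]:
    """Return Cochran's Q and Higgins' I^2 (in percent) for study-level effects.

    effects   -- per-study estimates, e.g. log diagnostic odds ratios (hypothetical here)
    variances -- their within-study variances
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    # Fixed-effect (inverse-variance) weighted mean of the study effects.
    pooled = fsum(w * y for w, y in zip(weights, effects)) / fsum(weights)
    # Cochran's Q: weighted sum of squared deviations from the pooled mean.
    q = fsum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2 is truncated at 0 when Q <= df (this also avoids dividing by Q = 0).
    i2 = 0.0 if q <= df else (q - df) / q * 100.0
    return q, i2


# Hypothetical log-DORs and variances for four datasets (illustration only).
q, i2 = cochran_q_and_i2([1.0, 2.0, 3.0, 4.0], [0.25, 0.25, 0.25, 0.25])
print(f"Q = {q:.1f}, I^2 = {i2:.1f}%")  # prints: Q = 20.0, I^2 = 85.0%
```

In bivariate diagnostic meta-analyses, I² is often reported separately for sensitivity and specificity on the logit scale; the same function applies to any study-level effect whose within-study variances are known.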