Deep-learning-based non-contrast CT for detecting acute ischemic stroke: a systematic review and HSROC meta-analysis of patient-level diagnostic accuracy.
Authors
Affiliations (4)
- School of Medicine, College of Medicine and Health Sciences, Bahir Dar University, Bahir Dar, Ethiopia.
- EPIC Health Systems, Addis Ababa, Ethiopia.
- School of Health, Faculty of Medicine and Health, University of New England, Armidale, 2351, Australia.
- Department of Clinical Pharmacy, School of Pharmacy, College of Medicine and Health Science, University of Gondar, Gondar, Ethiopia.
Abstract
Non-contrast CT (NCCT) is first-line imaging for suspected acute ischemic stroke (AIS) but has limited early sensitivity; deep learning (DL) may improve patient-level detection. We aimed to estimate the diagnostic accuracy of DL applied to NCCT for patient-level AIS detection and to examine prespecified sources of between-study heterogeneity. We searched MEDLINE, Embase, and Web of Science (January 2010 to May 2025). Eligible prospective or retrospective diagnostic studies evaluated DL on NCCT against an appropriate reference standard and reported (or allowed reconstruction of) patient-level 2 × 2 data. Two-gate case-control and lesion-only reports were excluded. Two reviewers screened records and extracted data; risk of bias was assessed with QUADAS-2, and AI-reporting quality was assessed against items adapted from STARD-AI, CLAIM, and CONSORT-AI. Summary sensitivity and specificity were estimated with bivariate random-effects/HSROC models. Prespecified moderators were posterior-fossa inclusion, reference-standard robustness, and validation type. Sensitivity analyses were restricted to external-only cohorts, robust reference standards, studies including the posterior fossa, and a "Direct AIS" construct subset. Of 1,899 records, 16 studies met the inclusion criteria; 13 contributed patient-level data to the meta-analysis. Summary sensitivity was 0.91 (95% CI, 0.81-0.96) and summary specificity was 0.90 (95% CI, 0.85-0.94). Sensitivity was lower for externally validated models than for internally validated ones (0.82 [0.67-0.91] vs. 0.95 [0.89-0.98]), with similar specificity (0.88 [0.83-0.92] vs. 0.93 [0.82-0.97]). Findings were directionally robust across sensitivity analyses. QUADAS-2 frequently indicated concerns in the patient-selection and index-test domains; AI-reporting quality was mostly moderate, and explicit external validation remained uncommon. DL applied to NCCT shows high accuracy for patient-level AIS detection. However, generalizability is the principal gap; broader external validation and guideline-concordant reporting are needed to support safe clinical adoption.
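As a minimal sketch of the per-study inputs that a bivariate random-effects/HSROC analysis pools, the Python below computes sensitivity and specificity from one patient-level 2 × 2 table and an approximate 95% CI on the logit scale. It is not the pooled model used in the review; the function names, the 0.5 continuity correction, and the example counts are assumptions chosen for illustration and are not taken from any included study.

```python
import math


def sens_spec(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from patient-level 2x2 counts."""
    return tp / (tp + fn), tn / (tn + fp)


def logit_ci(events, total, z=1.96):
    """Approximate 95% CI for a proportion, computed on the logit scale.

    Adding 0.5 to both cells (an assumed continuity correction) keeps the
    logit finite when a cell count is zero.
    """
    a = events + 0.5
    b = (total - events) + 0.5
    logit = math.log(a / b)
    se = math.sqrt(1.0 / a + 1.0 / b)

    def inv_logit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return inv_logit(logit - z * se), inv_logit(logit + z * se)


# Hypothetical counts for a single study (illustrative only).
tp, fp, fn, tn = 90, 10, 10, 90

sens, spec = sens_spec(tp, fp, fn, tn)
sens_lo, sens_hi = logit_ci(tp, tp + fn)
spec_lo, spec_hi = logit_ci(tn, tn + fp)

print(f"sensitivity {sens:.2f} ({sens_lo:.2f}-{sens_hi:.2f})")
print(f"specificity {spec:.2f} ({spec_lo:.2f}-{spec_hi:.2f})")
```

In a full analysis, the logit-transformed sensitivity and specificity of each study would be modeled jointly with correlated random effects, which this sketch does not attempt.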