Commercial AI platforms improve reproducibility but fail to meet clinical precision thresholds for intracranial aneurysm measurement: implications for serial surveillance.
Authors
Affiliations (2)
- 1Department of Radiology, Zhuzhou Hospital Affiliated to Xiangya School of Medicine, Central South University, No. 116, South Changjiang Road, Tianyuan District, Zhuzhou, Hunan, 412000, China.
Abstract
Intracranial aneurysms affect 3-7% of the global population; rupture causes >80% of non-traumatic subarachnoid hemorrhages and carries approximately 50% mortality. Clinical management relies on precise measurement of aneurysm neck width and maximum length, where growth of ≥1 mm signals elevated rupture risk. Computed tomography angiography enables non-invasive monitoring, but manual measurements suffer from inter-observer variability. Commercial artificial intelligence platforms offer potential improvements, yet their consistency and accuracy versus digital subtraction angiography, the gold standard, are understudied. This retrospective study analyzed 148 patients with 163 intracranial aneurysms imaged by computed tomography angiography, including a subgroup of 86 with digital subtraction angiography within 1 week. Measurements were obtained using Shukun artificial intelligence, UIH artificial intelligence, manual computed tomography angiography measurement (repeated by the same observer after 1 month), and digital subtraction angiography. Reproducibility was assessed by the coefficient of variation and agreement by Bland-Altman analysis, with 95% limits of agreement within ±1.0 mm considered clinically acceptable. For aneurysm neck width, manual measurements had a mean difference of -1.62 mm and 95% limits of agreement of -4.87 to 1.62 mm vs. digital subtraction angiography, while UIH artificial intelligence and Shukun artificial intelligence had mean differences of +0.80 mm and -1.01 mm, and 95% limits of agreement of -3.68 to 2.08 mm and -3.06 to 1.04 mm, respectively. For aneurysm maximum length, UIH artificial intelligence systematically overestimated (mean difference: +3.46 mm) and Shukun artificial intelligence systematically underestimated (mean difference: -2.20 mm) vs. digital subtraction angiography.
Both artificial intelligence platforms had lower coefficients of variation than manual measurements for aneurysm neck width (0.31-0.36 vs. 0.36-0.42) and similar values for aneurysm maximum length (0.23-0.26 vs. 0.23-0.24). However, the 95% limits of agreement of all methods exceeded the clinically acceptable ±1.0 mm threshold. Artificial intelligence platforms are therefore more reproducible than manual measurements but show systematic biases and do not meet the clinical ±1.0 mm precision requirement against the digital subtraction angiography gold standard. Because inter-platform variability exceeds the 1 mm aneurysm growth threshold, the same artificial intelligence platform should be used throughout serial surveillance to avoid misinterpreting measurement differences as aneurysm growth.
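
The agreement and reproducibility statistics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' analysis code. It assumes the conventional Bland-Altman definition (mean difference ± 1.96 sample standard deviations of the paired differences, taken as test method minus digital subtraction angiography) and a coefficient of variation defined as sample standard deviation divided by the mean; the abstract does not state either formula. The function names are hypothetical, and the only data used are the neck-width limits of agreement reported above.

```python
from statistics import mean, stdev


def bland_altman(test, reference):
    """Return (mean difference, lower LoA, upper LoA) for paired measurements.

    Assumption: differences are test minus reference, and the 95% limits of
    agreement are mean difference +/- 1.96 * sample SD of the differences.
    """
    if len(test) != len(reference) or len(test) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [t - r for t, r in zip(test, reference)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


def coefficient_of_variation(values):
    """Sample SD divided by the mean (assumed definition, not stated in the abstract)."""
    return stdev(values) / mean(values)


def within_tolerance(lower, upper, tolerance_mm=1.0):
    """True only if both limits of agreement lie inside +/- tolerance_mm."""
    return lower >= -tolerance_mm and upper <= tolerance_mm


# Neck-width 95% limits of agreement vs. DSA, as reported in the abstract (mm).
reported_loa = {
    "manual": (-4.87, 1.62),
    "UIH": (-3.68, 2.08),
    "Shukun": (-3.06, 1.04),
}

# Apply the abstract's acceptability criterion (limits within +/- 1.0 mm).
acceptable = {
    name: within_tolerance(lower, upper)
    for name, (lower, upper) in reported_loa.items()
}

for name, ok in acceptable.items():
    print(f"{name}: limits within +/-1.0 mm -> {ok}")
```

Under this criterion a method passes only when both limits fall inside the tolerance band, so a small mean difference alone is not sufficient; the spread of the differences matters as much as the bias.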