AI-assisted detection of cerebral aneurysms on 3D time-of-flight MR angiography: user variability and clinical implications.

Authors

Liao L, Puel U, Sabardu O, Harsan O, Medeiros LL, Loukoul WA, Anxionnat R, Kerrien E

Affiliations (4)

  • Department of Diagnostic and Interventional Neuroradiology, CHRU Nancy, France; INRIA, LORIA, CNRS, Université de Lorraine, Nancy, France. Electronic address: [email protected].
  • Department of Diagnostic and Interventional Neuroradiology, CHRU Nancy, France; IADI, INSERM U1254, Université de Lorraine, Nancy, France.
  • Department of Diagnostic and Interventional Neuroradiology, CHRU Nancy, France.
  • INRIA, LORIA, CNRS, Université de Lorraine, Nancy, France.

Abstract

The generalizability and reproducibility of AI-assisted detection of cerebral aneurysms on 3D time-of-flight MR angiography remain unclear. We aimed to evaluate physician performance with AI assistance, focusing on inter- and intra-user variability, and to identify factors influencing performance and their clinical implications. In this retrospective study, four state-of-the-art AI models were hyperparameter-optimized on an in-house dataset (2019-2021) and evaluated via 5-fold cross-validation on a public external dataset. The two best-performing models were selected for evaluation on an expert-revised external dataset restricted to saccular aneurysms without prior treatment. Five physicians, grouped by expertise, each performed two AI-assisted evaluations, one with each model. Lesion-wise sensitivity and false positives per case (FPs/case) were calculated for each physician-AI pair and for the AI models alone. Agreement was assessed using the kappa statistic. Aneurysm size comparisons used the Mann-Whitney U test. The in-house dataset included 132 patients with 206 aneurysms (mean size: 4.0 mm); the revised external dataset, 270 patients with 174 aneurysms (mean size: 3.7 mm). Standalone AI achieved 86.8% sensitivity and 0.58 FPs/case. With AI assistance, non-experts achieved 72.1% sensitivity and 0.037 FPs/case; experts, 88.6% and 0.076 FPs/case; and the intermediate-level physician, 78.5% and 0.037 FPs/case. Intra-group agreement was 80% for non-experts (kappa: 0.57, 95% CI: 0.54-0.59) and 77.7% for experts (kappa: 0.53, 95% CI: 0.51-0.55). Among experts, false positives were smaller than true positives (2.7 vs. 3.8 mm, p < 0.001); no significant difference was observed for non-experts (p = 0.09). Missed-aneurysm locations were mainly model-dependent, whereas true- and false-positive locations reflected physician expertise. Non-experts more often rejected AI suggestions and added fewer annotations of their own; experts were more conservative in rejecting suggestions and added more. Evaluating AI models in isolation therefore provides an incomplete view of their clinical applicability: detection performance and error patterns differ between standalone AI and AI-assisted use, and are modulated by physician expertise. Rigorous external validation is essential before clinical deployment.
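For illustration, here is a minimal Python sketch of the evaluation metrics named in the abstract: lesion-wise sensitivity, FPs/case, kappa agreement, and the Mann-Whitney U size comparison. The per-case data layout and the match() criterion are hypothetical assumptions made for this sketch; the abstract does not specify how predictions were matched to ground-truth lesions.

```python
# Minimal sketch of the lesion-wise metrics described in the abstract.
# Hypothetical assumptions: each case is a (gt_lesions, pred_lesions) pair,
# and match(gt, pred) -> bool is an unspecified matching rule
# (e.g. centroid distance); neither detail comes from the paper.
from scipy.stats import mannwhitneyu           # TP-vs-FP size comparison
from sklearn.metrics import cohen_kappa_score  # inter-reader agreement

def lesion_wise_metrics(cases, match):
    """Return (sensitivity, FPs/case); assumes at least one lesion overall."""
    tp = fn = fp = n_cases = 0
    for gt_lesions, pred_lesions in cases:
        n_cases += 1
        hit = set()  # indices of predictions matched to any ground truth
        for gt in gt_lesions:
            idx = [i for i, p in enumerate(pred_lesions) if match(gt, p)]
            if idx:
                tp += 1
                hit.update(idx)
            else:
                fn += 1
        fp += len(pred_lesions) - len(hit)  # unmatched predictions are FPs
    return tp / (tp + fn), fp / n_cases

if __name__ == "__main__":
    # Toy demo: 1D lesion positions, matched within a 5 mm tolerance.
    match = lambda gt, p: abs(gt - p) <= 5.0
    cases = [([10.0, 42.0], [11.0, 80.0]),  # 1 TP, 1 FN, 1 FP
             ([25.0], [24.0])]              # 1 TP
    print(lesion_wise_metrics(cases, match))   # -> (0.666..., 0.5)
    # Agreement on shared candidates (accept=1 / reject=0) between readers.
    print(cohen_kappa_score([1, 0, 1, 1], [1, 0, 0, 1]))
    # Two-sided Mann-Whitney U on FP vs. TP sizes in mm.
    print(mannwhitneyu([2.5, 2.7, 3.0], [3.6, 3.8, 4.1]).pvalue)
```

The per-lesion loop mirrors how lesion-wise sensitivity is conventionally defined (each ground-truth aneurysm counts once, however many predictions hit it), while every prediction that matches no lesion contributes to FPs/case.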

Topics

Journal Article
