Systematic Review: Agentic AI in Neuroradiology: Technical Promise with Limited Clinical Evidence.
Authors
Affiliations (4)
- Radiology Informatics Lab, Department of Radiology, Mayo Clinic, Rochester, MN, 55905, USA. [email protected].
- Radiology Informatics Lab, Department of Radiology, Mayo Clinic, Rochester, MN, 55905, USA.
- Department of Radiology, Mayo Clinic, Rochester, MN, 55905, USA.
- Radiology Informatics Lab, Department of Radiology, Mayo Clinic, Rochester, MN, 55905, USA. [email protected].
Abstract
Agentic artificial intelligence (AI) systems featuring iterative reasoning, autonomous tool use, or multi-agent collaboration have been proposed as solutions to the limitations of large language models (LLMs) in neuroradiology. However, the extent of their implementation and clinical validation remains unclear. We systematically searched PubMed, Web of Science, and Scopus (January 2022 to August 2025) for studies implementing agentic AI in neuroradiology. Six independent reviewers (three medical doctors and three AI specialists) assessed full texts. A system was classified as agentic only if it showed iterative reasoning together with either autonomous tool use or multi-agent collaboration. Study quality was evaluated using adapted QUADAS-AI criteria. From 230 records, 9 studies (3.9%) met inclusion criteria. Of these, five (55.6%) implemented true multi-agent architectures, two (22.2%) used hybrid or conceptual frameworks, and two (22.2%) relied on single-model LLMs without genuine agentic behavior. All nine studies were single-center with no external validation. Sample sizes were small (median 142 cases; range 16-302). The only randomized controlled trial, INSPIRE (neurophysiology with imaging correlation), demonstrated high technical performance (≈92% accuracy; AIGERS 0.94 for AI-assisted vs. 0.70 for AI-only, p < 0.001) but showed no measurable clinical benefit when physicians used AI assistance compared with independent reporting. Safety assessments were absent from all studies. Agentic AI in neuroradiology remains technically promising but clinically unproven. Severe evidence scarcity (3.9% inclusion rate), frequent overextension of the "agentic" label (30% of studies lacked genuine autonomy), and the persistent gap between technical performance and clinical utility indicate that the field remains in its early research phase. Current evidence is insufficient to support clinical deployment.
Rigorous, multi-center prospective trials with patient-centered and safety outcomes are essential before clinical implementation can be responsibly considered.
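
The classification rule and the proportions reported in the abstract can be illustrated with a minimal sketch. It is not the reviewers' actual screening tool: the `StudyFeatures` fields and the helper names `is_agentic` and `percent` are hypothetical, introduced only to make the stated definition (iterative reasoning plus either autonomous tool use or multi-agent collaboration) and the arithmetic concrete. The counts are taken from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StudyFeatures:
    """Hypothetical per-study flags a reviewer might record (assumed names)."""
    iterative_reasoning: bool
    autonomous_tool_use: bool
    multi_agent_collaboration: bool


def is_agentic(study: StudyFeatures) -> bool:
    # Iterative reasoning is mandatory; at least one of the other two
    # capabilities must also be present.
    return study.iterative_reasoning and (
        study.autonomous_tool_use or study.multi_agent_collaboration
    )


def percent(part: int, whole: int) -> float:
    # Percentage rounded to one decimal place.
    return round(100 * part / whole, 1)


# Counts as reported in the abstract.
records, included = 230, 9
print(percent(included, records))  # inclusion rate: 3.9
print(percent(5, included))        # true multi-agent architectures: 55.6
print(percent(2, included))        # hybrid/conceptual, or single-model: 22.2
```

Under this rule, a single-model LLM pipeline that reasons iteratively but neither calls tools on its own nor coordinates multiple agents would not count as agentic.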