Researchers found ChatGPT-4 Turbo could efficiently monitor the performance of Aidoc's ICH detection AI across real-world radiology practices.
Key Details
- Study used ChatGPT-4 Turbo to monitor Aidoc's intracranial hemorrhage (ICH) detection on 332,809 head CT exams from 37 sites.
- Compared LLM data extraction to a ground-truth set of 1,000 radiologist-labeled reports.
- LLM achieved 0.995 accuracy, 0.99 AUC, PPV of 1, and NPV of 0.98.
- Discordant cases were mostly due to Aidoc overcalls; only 0.5% due to LLM extraction error.
- Aidoc AI's performance varied across CT scanner models and was influenced by scanner manufacturer, exam artifacts, and patient symptoms.
- Authors emphasized that LLM monitoring is cost-effective and scalable compared to manual review.
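For readers less familiar with the metrics above, the following minimal sketch shows how accuracy, PPV, and NPV are computed from a confusion matrix that compares LLM-extracted labels against radiologist labels. The `ConfusionCounts` class and the counts passed to it are hypothetical: they were chosen only because they happen to reproduce the reported accuracy, PPV, and NPV on 1,000 reports, and they are not the study's actual breakdown.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts of LLM-extracted labels versus radiologist ground truth."""
    tp: int  # LLM says ICH, radiologist label says ICH
    fp: int  # LLM says ICH, radiologist label says no ICH
    fn: int  # LLM says no ICH, radiologist label says ICH
    tn: int  # LLM says no ICH, radiologist label says no ICH


def accuracy(c: ConfusionCounts) -> float:
    # Fraction of all reports where the LLM label matches the ground truth.
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def ppv(c: ConfusionCounts) -> float:
    # Positive predictive value: of the reports the LLM flagged, how many were truly positive.
    return c.tp / (c.tp + c.fp)


def npv(c: ConfusionCounts) -> float:
    # Negative predictive value: of the reports the LLM cleared, how many were truly negative.
    return c.tn / (c.tn + c.fn)


# Hypothetical counts (assumption, not study data) summing to 1,000 reports.
example = ConfusionCounts(tp=750, fp=0, fn=5, tn=245)

print(f"accuracy={accuracy(example):.3f}")
print(f"PPV={ppv(example):.2f}")
print(f"NPV={npv(example):.2f}")
```

A PPV of 1 means no false positives in the labeled set, so under these assumed counts every extraction error would be a missed positive.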
Why It Matters

Source
AuntMinnie