Researchers found ChatGPT-4 Turbo could efficiently monitor the performance of Aidoc's ICH detection AI across real-world radiology practices.
Key Details
- Study used ChatGPT-4 Turbo to monitor Aidoc's intracranial hemorrhage (ICH) detection on 332,809 head CT exams from 37 sites.
- Compared LLM data extraction to a ground-truth set of 1,000 radiologist-labeled reports.
- LLM achieved 0.995 accuracy, 0.99 AUC, PPV of 1, and NPV of 0.98.
- Discordant cases were mostly due to Aidoc overcalls; only 0.5% due to LLM extraction error.
- Aidoc AI's performance varied across CT scanner models and was influenced by scanner manufacturer, exam artifacts, and patient symptoms.
- Authors emphasized that LLM monitoring is cost-effective and scalable compared to manual review.
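For readers less familiar with the reported metrics, the sketch below shows how accuracy, PPV, and NPV are conventionally computed when extracted labels are compared against a radiologist-labeled reference set. The `ConfusionCounts` class, the function names, and the counts are hypothetical and chosen only for illustration; they are not figures from the study, and this is not the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing extracted labels with reference labels."""
    tp: int  # extraction says ICH, reference says ICH
    fp: int  # extraction says ICH, reference says no ICH
    fn: int  # extraction says no ICH, reference says ICH
    tn: int  # extraction says no ICH, reference says no ICH


def accuracy(c: ConfusionCounts) -> float:
    """Share of all reports where the extraction matches the reference."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def ppv(c: ConfusionCounts) -> float:
    """Positive predictive value: share of positive calls that are correct."""
    return c.tp / (c.tp + c.fp)


def npv(c: ConfusionCounts) -> float:
    """Negative predictive value: share of negative calls that are correct."""
    return c.tn / (c.tn + c.fn)


# Hypothetical counts for illustration only (not study data).
example = ConfusionCounts(tp=40, fp=10, fn=5, tn=45)
print(f"accuracy={accuracy(example):.3f}")
print(f"PPV={ppv(example):.3f}")
print(f"NPV={npv(example):.3f}")
```

A PPV of 1, as reported above, corresponds to zero false positives in the comparison set, so every positive extraction matched the radiologist label.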
Source
AuntMinnie