Researchers found ChatGPT-4 Turbo could efficiently monitor the performance of Aidoc's ICH detection AI across real-world radiology practices.
Key Details
- Study used ChatGPT-4 Turbo to monitor Aidoc's intracranial hemorrhage (ICH) detection on 332,809 head CT exams from 37 sites.
- Compared LLM data extraction to a ground-truth set of 1,000 radiologist-labeled reports.
- LLM achieved 0.995 accuracy, 0.99 AUC, PPV of 1, and NPV of 0.98.
- Discordant cases were mostly due to Aidoc overcalls; only 0.5% due to LLM extraction error.
- Aidoc AI's performance varied across CT scanner models and was influenced by scanner manufacturer, exam artifacts, and patient symptoms.
- Authors emphasized that LLM monitoring is cost-effective and scalable compared to manual review.
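For readers less familiar with the metrics above, here is a minimal sketch of how accuracy, PPV, and NPV are computed when LLM-extracted labels are compared against radiologist labels. The function name `binary_metrics` and the five example labels are hypothetical and purely illustrative; they are not the study's code or data.

```python
from typing import Dict, Sequence


def binary_metrics(truth: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Compare predicted labels (e.g. LLM-extracted ICH findings) against
    reference labels (e.g. radiologist labels) and return summary metrics.

    Hypothetical helper for illustration only.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")

    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives

    def ratio(num: int, den: int) -> float:
        # Return NaN rather than raising when a denominator is zero.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "ppv": ratio(tp, tp + fp),  # positive predictive value
        "npv": ratio(tn, tn + fn),  # negative predictive value
    }


# Made-up labels for five reports (True = ICH reported).
reference = [True, True, False, False, False]
extracted = [True, False, False, False, False]
print(binary_metrics(reference, extracted))
```

In this toy example one positive report is missed, so PPV stays at 1 while NPV and accuracy drop. This shows how a PPV of 1 can coexist with an NPV below 1.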
Why It Matters
Reliable, scalable performance monitoring is crucial as AI tools become increasingly common in radiological practice. Using large language models like ChatGPT could streamline post-deployment quality assurance and help maintain diagnostic accuracy over time.

Source
AuntMinnie