AI-RADS: A Framework for Assessment of Artificial Intelligence Output in Radiology - Development and Multireader Evaluation.
Abstract
Background: Despite the growing number of artificial intelligence (AI)-based applications used in radiology, no structured framework exists to assess their case-level reliability or to document overridden outputs in reports.
Purpose: To develop and evaluate the Artificial Intelligence Reporting and Data System (AI-RADS), a structured framework for the objective, case-level assessment of AI output reliability, clinical utility, and recommended actions in radiology.
Materials and Methods: The AI-RADS framework was tested in a retrospective multireader study in which 5 board-certified radiologists independently evaluated 350 cases processed by 7 representative AI applications covering image-based and generative tasks. Each case was assigned one of 5 AI-RADS categories, any applicable modifiers, and an independent correctness rating as a reference standard. Interreader agreement was quantified using Krippendorff's α with 95% CIs.
Results: Interreader agreement for the core AI-RADS categories was substantial for both image-based (Krippendorff's α=0.87; 95% CI: 0.83-0.91) and generative AI tasks (Krippendorff's α=0.93; 95% CI: 0.91-0.95). Reader-assigned correctness aligned well with AI-RADS categories 1 and 2, which indicate outputs suitable for integration into clinical workflows, whereas outputs rated as "incorrect" were predominantly assigned to categories 4 and 5, warranting override or removal from display.
Conclusion: AI-RADS provides a structured framework for the case-level evaluation of AI output reliability, clinical utility, and consequences for report communication. This multireader study demonstrated substantial interreader agreement and applicability across a range of AI applications.
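For readers who want to reproduce the agreement statistic reported above, the following is a minimal Python sketch, not the authors' analysis code. It assumes the third-party krippendorff package (pip install krippendorff), a hypothetical 5-reader x 350-case ratings matrix, an ordinal treatment of the AI-RADS categories, and a percentile bootstrap over cases for the 95% CI; all of these are illustrative assumptions.

# Hedged sketch: Krippendorff's alpha with a bootstrap 95% CI.
# The ratings matrix below is simulated, not study data.
import numpy as np
import krippendorff

rng = np.random.default_rng(0)

# Hypothetical ratings: 5 readers x 350 cases, AI-RADS categories 1-5.
# np.nan would mark any missing ratings.
ratings = rng.integers(1, 6, size=(5, 350)).astype(float)

# Krippendorff's alpha, treating the 5 categories as ordinal.
alpha = krippendorff.alpha(reliability_data=ratings,
                           level_of_measurement="ordinal")

# Percentile bootstrap 95% CI, resampling cases (columns) with replacement.
n_cases = ratings.shape[1]
boot = []
for _ in range(1000):
    idx = rng.integers(0, n_cases, size=n_cases)
    boot.append(krippendorff.alpha(reliability_data=ratings[:, idx],
                                   level_of_measurement="ordinal"))
ci_lo, ci_hi = np.percentile(boot, [2.5, 97.5])
print(f"alpha = {alpha:.2f} (95% CI: {ci_lo:.2f}-{ci_hi:.2f})")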