Predicting the Value of Radiology Artificial Intelligence Applications: Large-Scale Predeployment Evaluation of a Portfolio of Models.

March 4, 2026 · PubMed

Authors

Larson DB, Poff JA, Krishnan S, Avondo J, Armstrong BA, Na HS, Chaudhari A, Kottler N

Affiliations (3)

  • AI Development and Evaluation (AIDE) Lab, Department of Radiology, Stanford University.
  • Clinical Artificial Intelligence Team, Radiology Partners, Nashville, TN.
  • Aidoc, Tel Aviv-Yafo, Israel.

Abstract

Background: Real-world performance of radiology artificial intelligence (AI) applications frequently diverges from previously reported results, creating challenges in anticipating a model's clinical value and impact.

Objective: To develop a structured predeployment evaluation method for radiology AI models that combines standard performance metrics with new augmentation metrics in predicting overall AI model value, and to test this method's predictions against radiologists' real-world postdeployment perceptions of model value.

Methods: In this prospective study, a large national radiology practice conducted a predeployment evaluation, from July 2022 to November 2024, of a single vendor's portfolio of 13 AI models for 12 clinical tasks. A four-radiologist workgroup identified attributes contributing to the inherent value of AI assistance for clinical tasks, assigned weights to those attributes, and rated models accordingly. Performance of radiologists (based on clinical reports) and of AI was assessed for 88,645 examinations across clinical sites using conventional metrics and augmentation metrics reflecting enhanced detection cases (i.e., AI-detected, radiologist-missed positive cases). The workgroup combined inherent task values and pooled AI performance to predict each model's overall value. Radiologists completed a postdeployment survey.

Results: The workgroup identified three attributes as most likely to contribute to the inherent value of AI assistance: the tediousness of the task, the likelihood that the radiologist would miss the finding, and a missed finding's potential clinical impact. Five, five, and two tasks were rated as having high, medium, and low inherent value, respectively. Across tasks, radiologists generally had higher positive predictive value (PPV), whereas AI generally had higher sensitivity. Models showed widely varying absolute and relative enhanced detection rates (0.03-2.28% and 4.5-60.5%, respectively). Five, five, and three models were predicted to have high, medium, and low overall value, respectively. The survey response rate was 43.2% (54/125). Perceived value categories agreed between survey respondents and workgroup predictions for 10 of 12 tasks.

Conclusion: We present a structured method for predeployment evaluation of AI models' potential value, combining task-inherent value assessments with radiologist and AI performance metrics. A validation survey indicated high agreement between predeployment predictions and real-world postdeployment value perceptions.

Clinical Impact: This practical evaluation approach can help guide radiology practices in evidence-based purchasing and deployment decisions for radiology AI models.
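For readers who want a concrete sense of the augmentation metrics, the sketch below shows one plausible way to compute absolute and relative enhanced detection rates from per-task counts. The metric definitions, the TaskCounts structure, and the example numbers are illustrative assumptions, not taken from the paper, whose exact formulas may differ.

```python
# Minimal sketch of the augmentation metrics described in the abstract.
# Assumed definitions (not confirmed by the paper):
#   absolute enhanced detection rate = enhanced detections / all examinations
#   relative enhanced detection rate = enhanced detections / positive examinations
# where an "enhanced detection" is an AI-detected, radiologist-missed positive case.
from dataclasses import dataclass


@dataclass
class TaskCounts:
    exams: int                 # total examinations evaluated for the task
    positives: int             # examinations positive for the finding
    enhanced_detections: int   # AI-detected, radiologist-missed positives


def absolute_enhanced_detection_rate(c: TaskCounts) -> float:
    """Enhanced detections as a share of all examinations."""
    return c.enhanced_detections / c.exams


def relative_enhanced_detection_rate(c: TaskCounts) -> float:
    """Enhanced detections as a share of positive examinations."""
    return c.enhanced_detections / c.positives


# Hypothetical counts for one clinical task (illustrative only).
task = TaskCounts(exams=10_000, positives=400, enhanced_detections=50)
print(f"absolute: {absolute_enhanced_detection_rate(task):.2%}")  # 0.50%
print(f"relative: {relative_enhanced_detection_rate(task):.1%}")  # 12.5%
```

Under these assumed definitions, the two rates answer different questions: the absolute rate reflects how often AI assistance adds a detection per examination read, while the relative rate reflects what fraction of true positives the radiologist would have missed without AI.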

Topics

Journal Article
