Using a large language model for post-deployment monitoring of FDA approved AI: pulmonary embolism detection use case.

June 30, 2025

Authors

Sorin V,Korfiatis P,Bratt AK,Leiner T,Wald C,Butler C,Cook CJ,Kline TL,Collins JD

Affiliations (6)

  • Department of Radiology, Mayo Clinic College of Medicine and Science, Mayo Clinic, Rochester, MN, USA. Electronic address: [email protected].
  • Department of Radiology, Mayo Clinic College of Medicine and Science, Mayo Clinic, Rochester, MN, USA.
  • Department of Radiology, Mayo Clinic College of Medicine and Science, Mayo Clinic, Rochester, MN, USA; Chair, Thoracic Division, Department of Radiology, Mayo Clinic, Rochester, MN, USA.
  • Department of Radiology, Mayo Clinic College of Medicine and Science, Mayo Clinic, Rochester, MN, USA; Medical Director, Artificial Intelligence for Cardiovascular Imaging Research and Exploration Program, Mayo Clinic, Rochester, MN, USA.
  • Department of Radiology, Mayo Clinic College of Medicine and Science, Mayo Clinic, Rochester, MN, USA; Chair, ACR Informatics Commission; Vice Chair, ACR Board of Chancellors.
  • Department of Radiology, Mayo Clinic College of Medicine and Science, Mayo Clinic, Rochester, MN, USA; Chair, Informatics Division, Department of Radiology, Mayo Clinic, Rochester, MN, USA; Medical Director, Advanced Imaging Post-Processing Lab, Mayo Clinic, Rochester, MN, USA. Electronic address: [email protected].

Abstract

Artificial intelligence (AI) is increasingly integrated into clinical workflows, but its performance in production can diverge from initial evaluations. Post-deployment monitoring (PDM) remains a challenging component of ongoing quality assurance once AI is deployed in clinical production.

We developed and evaluated a PDM framework that combines large language model (LLM) free-text classification of radiology reports with human oversight, and we demonstrate its application to monitoring a commercially vended pulmonary embolism (PE) detection AI (CVPED).

We retrospectively analyzed 11,999 CT pulmonary angiography (CTPA) studies performed between 04/30/2023 and 06/17/2024. Ground truth was determined by combining LLM-based radiology-report classification and the CVPED outputs, with human review of discrepancies. We simulated a daily monitoring framework to track discrepancies between CVPED and the LLM. Drift was defined as the discrepancy rate exceeding a fixed 95% confidence interval (CI) for seven consecutive days. The CI and the optimal retrospective assessment period were determined from a stable dataset with consistent performance. We simulated drift by systematically altering CVPED or LLM sensitivity and specificity, and we modeled an approach to detect data shifts. We incorporated a human-in-the-loop selective alerting framework for continuous prospective evaluation and to investigate the potential for incremental detection.

Of 11,999 CTPAs, 1,285 (10.7%) had PE. Overall, 373 (3.1%) had discrepant classifications between CVPED and the LLM. Among 111 CVPED-positive and LLM-negative cases, 29 would have triggered an alert because the radiologist did not interact with CVPED. Of those, 24 were CVPED false positives, one was an LLM false negative, and the framework ultimately identified four true alerts for incremental PE cases. The optimal retrospective assessment period for drift detection was determined to be two months.
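The drift rule described above (a fixed 95% interval, exceeded on seven consecutive days) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not state how the interval was constructed, so the sketch assumes a normal-approximation band (mean ± 1.96 standard deviations of the daily discrepancy rates in a stable baseline period). The function names `fixed_band` and `first_drift_day` and all daily rates are hypothetical.

```python
from statistics import mean, stdev


def fixed_band(baseline_rates, z=1.96):
    """Return (lower, upper) bounds from a stable baseline of daily discrepancy rates.

    Assumption: a normal-approximation band; the paper's exact CI
    construction is not given in the abstract.
    """
    m = mean(baseline_rates)
    s = stdev(baseline_rates)
    return max(0.0, m - z * s), m + z * s


def first_drift_day(daily_rates, upper, run_length=7):
    """Return the index of the day that completes `run_length` consecutive
    days above `upper`, or None if no such run occurs."""
    run = 0
    for day, rate in enumerate(daily_rates):
        run = run + 1 if rate > upper else 0  # any in-band day resets the run
        if run >= run_length:
            return day
    return None


# Hypothetical daily CVPED-vs-LLM discrepancy rates (illustrative values only).
baseline = [0.030, 0.028, 0.033, 0.031, 0.029, 0.032, 0.030, 0.034, 0.027, 0.031]
lower, upper = fixed_band(baseline)

# Two in-band days, then seven consecutive elevated days, then recovery.
monitored = [0.031, 0.029] + [0.060] * 7 + [0.030]
drift_day = first_drift_day(monitored, upper)
print(drift_day)
```

Requiring a run of consecutive exceedances, rather than alerting on a single day, trades detection speed for fewer false alarms from day-to-day noise in small daily case counts.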
A 2-3% decline in model specificity caused a 2- to 3-fold increase in discrepancies, whereas a 10% drop in sensitivity was required to produce a similar effect. For example, a 2.5% drop in LLM specificity led to a 1.7-fold increase in CVPED-negative-LLM-positive discrepancies, which would have taken 22 days to detect using the proposed framework.

A PDM framework combining LLM-based free-text classification with a human-in-the-loop alerting system can continuously track an image-based AI's performance, alert for performance drift, and provide incremental clinical value.
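One way to see why discrepancies respond more strongly to specificity than to sensitivity is a first-order calculation using the reported prevalence (1,285 of 11,999 studies). The sketch below is not the paper's simulation. It assumes the drops are absolute percentage points and that every newly misclassified study becomes a discrepancy, meaning the other classifier is correct on those studies. The function name `added_discrepancy_fraction` is hypothetical.

```python
def added_discrepancy_fraction(prevalence, sens_drop=0.0, spec_drop=0.0):
    """Approximate extra discrepancies, as a fraction of all studies,
    caused by an absolute drop in one classifier's sensitivity or specificity.

    Assumption: each newly misclassified study disagrees with the other
    classifier, so it counts as one new discrepancy.
    """
    new_false_negatives = prevalence * sens_drop          # affects PE-positive studies only
    new_false_positives = (1.0 - prevalence) * spec_drop  # affects PE-negative studies only
    return new_false_negatives + new_false_positives


prevalence = 1285 / 11999  # about 10.7%, from the abstract

from_spec = added_discrepancy_fraction(prevalence, spec_drop=0.025)
from_sens = added_discrepancy_fraction(prevalence, sens_drop=0.10)

print(f"2.5-point specificity drop: +{from_spec:.2%} of studies")
print(f"10-point sensitivity drop:  +{from_sens:.2%} of studies")
```

Because roughly 89% of studies are PE-negative, each point of lost specificity touches about eight times as many studies as a point of lost sensitivity. This approximation does not reproduce the reported fold increases, which also depend on the baseline discrepancy count in each direction.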

Topics

Journal Article
