Development and International Validation of a Deep Learning Model for Predicting Acute Pancreatitis Severity from CT Scans

Authors

Xu, Y., Teutsch, B., Zeng, W., Hu, Y., Rastogi, S., Hu, E. Y., DeGregorio, I. M., Fung, C. W., Richter, B. I., Cummings, R., Goldberg, J. E., Mathieu, E., Appiah Asare, B., Hegedus, P., Gurza, K.-B., Szabo, I. V., Tarjan, H., Szentesi, A., Borbely, R., Molnar, D., Faluhelyi, N., Vincze, A., Marta, K., Hegyi, P., Lei, Q., Gonda, T., Huang, C., Shen, Y.

Affiliations (1)

  • Department of Radiology, New York University Grossman School of Medicine; Center for Data Science, New York University

Abstract

Background and aims: Acute pancreatitis (AP) is a common gastrointestinal disease with rising global incidence. While most cases are mild, severe AP (SAP) carries high mortality. Early and accurate severity prediction is crucial for optimal management. However, existing severity prediction models, such as BISAP and mCTSI, have modest accuracy and often rely on data unavailable at admission. This study proposes a deep learning (DL) model to predict AP severity using abdominal contrast-enhanced CT (CECT) scans acquired within 24 hours of admission.

Methods: We collected 10,130 studies from 8,335 patients across a multi-site U.S. health system. The model was trained in two stages: (1) self-supervised pretraining on large-scale unlabeled CT studies and (2) fine-tuning on 550 labeled studies. Performance was evaluated against mCTSI and BISAP on a hold-out internal test set (n=100 patients) and externally validated on a Hungarian AP registry (n=518 patients).

Results: On the internal test set, the model achieved AUROCs of 0.888 (95% CI: 0.800-0.960) for SAP and 0.888 (95% CI: 0.819-0.946) for mild AP (MAP), outperforming mCTSI (p = 0.002). External validation showed robust AUROCs of 0.887 (95% CI: 0.825-0.941) for SAP and 0.858 (95% CI: 0.826-0.888) for MAP, surpassing mCTSI (p = 0.024) and BISAP (p = 0.002). Retrospective simulation suggested the model's potential to support admission triage and serve as a second reader during CECT interpretation.

Conclusions: The proposed DL model outperformed standard scoring systems for AP severity prediction, generalized well to external data, and shows promise for providing early clinical decision support and improving resource allocation.
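
The abstract reports each AUROC with a 95% confidence interval but does not say how the intervals were computed. As a minimal sketch, assuming a percentile bootstrap over patients (a common choice, not confirmed by the source), the calculation could look like the following. The function names `auroc` and `bootstrap_auroc_ci`, the parameter defaults, and the toy data are all hypothetical and are not taken from the study.

```python
import numpy as np


def auroc(labels, scores):
    """AUROC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as 0.5.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUROC needs at least one positive and one negative case")
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def bootstrap_auroc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate plus percentile-bootstrap CI (assumed method, see lead-in)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)
    n = labels.size
    stats = []
    while len(stats) < n_boot:
        idx = rng.integers(0, n, size=n)  # resample patients with replacement
        resampled = labels[idx]
        if resampled.all() or not resampled.any():
            continue  # skip resamples that contain only one class
        stats.append(auroc(resampled, scores[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return auroc(labels, scores), float(lo), float(hi)


if __name__ == "__main__":
    # Synthetic example only: 100 "patients", 20 of them labeled severe.
    rng = np.random.default_rng(42)
    y = np.array([1] * 20 + [0] * 80)
    s = rng.normal(loc=y * 1.5, scale=1.0)  # severe cases score higher on average
    point, lo, hi = bootstrap_auroc_ci(y, s)
    print(f"AUROC {point:.3f} (95% CI: {lo:.3f}-{hi:.3f})")
```

With a test set of about 100 patients, an interval of this kind is naturally wide, which is consistent with the broader internal-test interval for SAP compared with the larger external cohort.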

Topics

radiology and imaging
