AI-derived CT biomarker score for robust COVID-19 mortality prediction across multiple waves and regions using machine learning.
Affiliations (6)
- Department of Radiology, AZ Delta General Hospital, Roeselare, Belgium.
- Department of Laboratory Medicine, AZ Delta General Hospital, Deltalaan 1, Roeselare, 8800, Belgium.
- RADar Innovation Center, AZ Delta General Hospital, Roeselare, Belgium.
- Department of Laboratory Medicine, AZ Delta General Hospital, Deltalaan 1, Roeselare, 8800, Belgium.
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium.
- Department of Radiology, UZ Brussel, Brussels, Belgium.
Abstract
This study aimed to develop a simple, interpretable model that predicts COVID-19 mortality at hospital admission from routinely available data, addressing the limitations of more complex models. A second aim was to provide a statistically robust framework that manages model uncertainty, so that the model can be used responsibly in controlled clinical settings. Data from Belgium's first COVID-19 wave (UZ Brussel, n = 252) were used for model development. External validation used data from unvaccinated patients hospitalized during the late second and early third waves (AZ Delta, n = 175). After data preprocessing and feature selection, several machine learning methods were trained and compared on diagnostic performance. The final model, the M3-score, incorporates three features: age, white blood cell (WBC) count, and AI-derived total lung involvement (TOTAL_AI) quantified from CT scans with the Icolung software. The M3-score showed strong classification performance in the training cohort (AUC 0.903) and clinically useful performance in the external validation cohort (AUC 0.826), indicating potential for generalization. To improve clinical utility and interpretability, predicted probabilities were grouped into actionable likelihood ratio (LR) intervals derived from the training cohort: highly unlikely (LR 0.0), unlikely (LR 0.13), gray zone (LR 0.85), more likely (LR 2.14), and likely (LR 8.19). External validation suggested temporal and geographical robustness, although some variability in AUC and LR performance was observed, as expected in real-world settings. The parsimonious M3-score, which integrates AI-based CT quantification with clinical and laboratory data, offers an interpretable tool for predicting in-hospital COVID-19 mortality and performed robustly in the training cohort.
The performance variations observed in external validation call for careful interpretation and for further extensive validation in international cohorts to confirm broader applicability and robustness before widespread clinical adoption.
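The abstract summarizes the M3-score by its AUC and by interval likelihood ratios computed on the training cohort. The Python snippet below is a minimal sketch of how these quantities are commonly computed from predicted probabilities and observed outcomes. It is not the authors' code. It shows a rank-based AUC, interval LRs for a set of probability cut-points, and the conversion of an interval LR into a post-test probability. The function names, the cut-points (0.3 and 0.7), the toy data, and the 20% pre-test mortality are all hypothetical, because the abstract does not report the M3-score's cut-points or model coefficients. The formula also explains one reported value: an interval LR of 0.0, as in the "highly unlikely" band, arises whenever no deaths fall in that interval.

```python
import math
from bisect import bisect_right


def auc(probs, died):
    """Rank-based AUC: probability that a randomly chosen death receives a
    higher predicted probability than a randomly chosen survivor (ties = 0.5)."""
    pos = [p for p, y in zip(probs, died) if y]
    neg = [p for p, y in zip(probs, died) if not y]
    if not pos or not neg:
        raise ValueError("both outcomes must be present")
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def interval_likelihood_ratios(probs, died, cutpoints):
    """Interval LR_k = P(prob in interval k | died) / P(prob in interval k | survived).

    `cutpoints` (hypothetical here) must be sorted ascending. Interval k covers
    [cutpoints[k-1], cutpoints[k]); the first and last intervals are open-ended.
    """
    n_bins = len(cutpoints) + 1
    deaths = [0] * n_bins
    survivors = [0] * n_bins
    for p, y in zip(probs, died):
        k = bisect_right(cutpoints, p)  # index of the interval containing p
        if y:
            deaths[k] += 1
        else:
            survivors[k] += 1
    n_deaths, n_survivors = sum(deaths), sum(survivors)
    if n_deaths == 0 or n_survivors == 0:
        raise ValueError("both outcomes must be present")
    lrs = []
    for d, s in zip(deaths, survivors):
        if s == 0:
            # No survivors in this interval: LR is unbounded (undefined if the interval is empty).
            lrs.append(math.inf if d > 0 else math.nan)
        else:
            lrs.append((d / n_deaths) / (s / n_survivors))
    return lrs


def post_test_probability(pre_test_prob, lr):
    """Bayes' rule in odds form: post-test odds = pre-test odds * LR."""
    if math.isinf(lr):
        return 1.0
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    # Toy, made-up predictions and outcomes (1 = in-hospital death).
    probs = [0.05, 0.10, 0.20, 0.35, 0.50, 0.60, 0.80, 0.90]
    died = [0, 0, 0, 1, 0, 1, 1, 1]
    print(auc(probs, died))  # 0.9375
    lrs = interval_likelihood_ratios(probs, died, [0.3, 0.7])
    print(lrs)  # [0.0, 2.0, inf]
    # Hypothetical 20% pre-test mortality, patient falling in the middle interval.
    print(round(post_test_probability(0.20, lrs[1]), 3))  # 0.333
```

In this sketch, `bisect_right` assigns a probability that equals a cut-point to the higher interval. A different boundary convention would move borderline patients between bands and could change the interval LRs on small cohorts.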