Improving the Robustness of Deep Learning Models in Predicting Hematoma Expansion from Admission Head CT.
Affiliations
- From the Department of Radiology (A.T.T., D.Z., S.P.), NewYork-Presbyterian/Columbia University Irving Medical Center, Columbia University, New York, New York.
- Department of Radiology and Biomedical Imaging (A.T.T., G.A.K., D.Z., A.M.), Yale School of Medicine, New Haven, Connecticut.
- Zeenat Qureshi Stroke Institute and Department of Neurology (A.I.Q.), University of Missouri, Columbia, Missouri.
- Department of Neurosurgery, Icahn School of Medicine at Mount Sinai (S.M.), Mount Sinai Hospital, New York, New York.
- Department of Neurology (S.B.M.), Weill Cornell Medical College, Cornell University, New York, New York.
- Department of Neurology (S.P., D.K.), NewYork-Presbyterian/Columbia University Irving Medical Center, Columbia University, New York, New York.
- Department of Neurology (G.J.F., K.N.S.), Yale School of Medicine, New Haven, Connecticut.
- Center for Brain and Mind Health (G.J.F., K.N.S.), Yale School of Medicine, New Haven, Connecticut.
Abstract
Robustness against input data perturbations is essential for deploying deep learning models in clinical practice. Adversarial attacks involve subtle, voxel-level manipulations of scans designed to increase a deep learning model's prediction error. Testing model performance on adversarial images provides a measure of robustness, and including adversarial images in the training set can improve robustness. In this study, we examined adversarial training and input modifications to improve the robustness of deep learning models in predicting hematoma expansion (HE) from admission head CTs of patients with acute intracerebral hemorrhage (ICH). We used a multicenter cohort of n = 890 patients for cross-validation/training and a cohort of n = 684 consecutive patients with ICH from 2 stroke centers for independent validation. Fast gradient sign method (FGSM) and projected gradient descent (PGD) adversarial attacks were applied for training and testing. We developed and tested 4 models predicting ≥3 mL, ≥6 mL, ≥9 mL, and ≥12 mL HE in the independent validation cohort, evaluated with the area under the receiver operating characteristic curve (AUC). We examined varying mixtures of adversarial and nonperturbed (clean) scans for training, as well as adding hyperparameter-free Otsu multithreshold segmentation as an additional model input. When deep learning models trained solely on clean scans were tested with PGD and FGSM adversarial images, the average HE prediction AUC decreased from 0.80 to 0.67 and 0.71, respectively. Overall, the best performing strategy to improve model robustness was training with a 5:3 mix of clean and PGD adversarial scans combined with adding Otsu multithreshold segmentation to the model input, which increased the average AUC to 0.77 against both PGD and FGSM adversarial attacks. Adversarial training with FGSM improved robustness against same-type attacks but offered limited cross-attack robustness against PGD-type images. Adversarial training and inclusion of threshold-based segmentation as an additional input can improve the robustness of deep learning models predicting HE from admission head CTs in acute ICH.
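To make the two attack types concrete, below is a minimal PyTorch sketch of FGSM (single gradient-sign step) and PGD (iterated steps projected back into an L-infinity ball), assuming a binary HE classifier that maps a CT volume to a single logit; the names and the epsilon/alpha/step values are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch of FGSM and PGD attacks for a binary hematoma-expansion
# classifier. `model`, `epsilon`, `alpha`, and `steps` are illustrative
# assumptions; the paper's actual attack parameters are not given here.
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.01):
    """Single-step FGSM: perturb the input along the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.binary_cross_entropy_with_logits(model(x_adv), y)  # y: float labels
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

def pgd_attack(model, x, y, epsilon=0.01, alpha=0.0025, steps=10):
    """Iterative PGD: repeated gradient-sign steps, each projected back
    into the L-infinity ball of radius epsilon around the clean scan."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.binary_cross_entropy_with_logits(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project the accumulated perturbation back into the epsilon ball.
        x_adv = x + torch.clamp(x_adv - x, -epsilon, epsilon)
    return x_adv.detach()
```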
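The 5:3 clean-to-adversarial training mix could be realized in several ways; one simple reading, sketched below under stated assumptions, is to perturb 3 of every 8 training batches with PGD (reusing `pgd_attack` from the sketch above). The batch-level scheduling is an assumption for illustration, not the paper's documented procedure.

```python
# Hedged sketch of adversarial training with a 5:3 clean:PGD mix:
# in every block of 8 batches, 5 are clean and 3 are PGD-perturbed.
# `loader` and `optimizer` are assumed standard PyTorch objects.
def train_epoch(model, loader, optimizer):
    model.train()
    for i, (x, y) in enumerate(loader):
        if i % 8 >= 5:  # batches 5, 6, 7 of each block of 8 -> adversarial
            x = pgd_attack(model, x, y)
        optimizer.zero_grad()
        loss = F.binary_cross_entropy_with_logits(model(x), y)
        loss.backward()
        optimizer.step()
```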
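Finally, the hyperparameter-free Otsu multithreshold input can be illustrated with scikit-image's `threshold_multiotsu`: the CT volume is partitioned into intensity classes and the resulting label map is stacked with the raw scan as an extra channel. The two-channel layout and class count below are assumptions for illustration.

```python
# Hedged sketch: append an Otsu multithreshold segmentation map as an
# additional input channel, using scikit-image. The channel layout
# (raw CT + label map) and classes=3 are illustrative assumptions.
import numpy as np
from skimage.filters import threshold_multiotsu

def add_otsu_channel(ct_volume: np.ndarray, classes: int = 3) -> np.ndarray:
    """Stack the CT volume with its Otsu multiclass segmentation map."""
    thresholds = threshold_multiotsu(ct_volume, classes=classes)
    seg = np.digitize(ct_volume, bins=thresholds).astype(ct_volume.dtype)
    return np.stack([ct_volume, seg], axis=0)  # shape: (2, D, H, W)
```

Because Otsu thresholding has no tunable hyperparameters, this extra channel gives the network a perturbation-resistant summary of coarse tissue intensity structure alongside the raw voxels.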