External validity of deep learning solution for spontaneous intracranial hemorrhage detection on head CT scans.
Authors
Affiliations (5)
- Department of Neurosurgery, Helsinki University Hospital and University of Helsinki, Helsinki, Finland, P.O. Box 266, FI-00029.
- Machine Intelligence in Clinical Neuroscience & Microsurgical Neuroanatomy (MICN) Laboratory, Department of Neurosurgery, Clinical Neuroscience Center, University Hospital Zürich, University of Zürich, Zürich, Switzerland.
- Department of Neurosurgery, HOCH Health Ostschweiz, Kantonsspital St. Gallen, St. Gallen, Switzerland.
- Department of Neurosurgery, Klinikum rechts der Isar, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany.
- Department of Neurosurgery, Helsinki University Hospital and University of Helsinki, Helsinki, Finland, P.O. Box 266, FI-00029.
Abstract
We externally validated the performance of a deep learning (DL) solution for detecting spontaneous intracerebral (ICH), intraventricular (IVH), and subarachnoid hemorrhages (SAH) on non-contrast-enhanced head CT scans (NCCTs). We analyzed 901 NCCTs collected retrospectively from two Swiss hospitals: University Hospital Zürich (USZ) and HOCH Health Ostschweiz, Kantonsspital St. Gallen (KSSG). Of these 901 NCCTs, 81 had spontaneous ICH, IVH, or SAH. Diagnostic accuracy was evaluated using the radiologists' reports as the reference standard. The DL solution correctly identified 74 of the 81 intracranial hemorrhages (sensitivity 91.4%). In the USZ cohort, the sensitivity was 88.5% and the specificity was 89.7%. On the original KSSG NCCTs, the DL solution had a sensitivity of 100.0% and a specificity of 47.5%. After the pixel matrix of the KSSG NCCTs was padded to a standardized 512 × 512 resolution, the KSSG cohort sensitivity remained at 100.0% and the specificity increased to 74.0%. The overall specificity was 78.4% with the original imaging data and increased to 85.5% when the padded KSSG NCCTs were used together with the original USZ imaging data. Specificity in particular varied substantially depending on image acquisition parameters. In a clinical setting, this would mean high variability in the false positive rate.
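The headline figures can be checked with simple arithmetic. The sketch below is illustrative only and is not the authors' analysis code: the function names and dictionary labels are hypothetical, the counts (74 detected of 81 hemorrhages) and specificities are taken from the abstract, and the false positive rate is derived as 1 minus specificity.

```python
# Illustrative check of the reported metrics; names are hypothetical.

def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of hemorrhage-positive scans that were flagged."""
    return true_positives / (true_positives + false_negatives)


def false_positive_rate(specificity: float) -> float:
    """Fraction of hemorrhage-negative scans that were wrongly flagged."""
    return 1.0 - specificity


# Counts reported in the abstract.
detected = 74
total_hemorrhages = 81
missed = total_hemorrhages - detected

overall_sensitivity = sensitivity(detected, missed)
print(f"Overall sensitivity: {overall_sensitivity:.1%}")

# Specificities as reported in the abstract (fractions, not percentages).
reported_specificity = {
    "USZ": 0.897,
    "KSSG, original matrix": 0.475,
    "KSSG, padded to 512 x 512": 0.740,
    "Overall, original data": 0.784,
    "Overall, padded KSSG + original USZ": 0.855,
}

# Derived false positive rate for each setting.
fpr = {name: false_positive_rate(spec) for name, spec in reported_specificity.items()}
for name, rate in fpr.items():
    print(f"{name}: false positive rate {rate:.1%}")
```

Read this way, the KSSG false positive rate falls from roughly 52.5% on the original images to 26.0% after padding, which is the variability the final sentences of the abstract refer to.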