Optimization and External Validation of a Deep Learning Model for Segmentation and Quantification of Traumatic Brain Injury Lesions on Head Computed Tomography.
Authors
Affiliations (7)
- Department of Surgery, Division of Neurosurgery, University of Toronto, Toronto, Ontario, Canada.
- Interdepartmental Division of Critical Care Medicine, University of Toronto, Toronto, Ontario, Canada.
- Department of Computing, Biomedical Image Analysis Group, Imperial College London, London, UK.
- Li Ka Shing Knowledge Institute, St Michael's Hospital, Toronto, Ontario, Canada.
- Institute of Health Policy, Management and Evaluation, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada.
- Department of Medicine, University Division of Anaesthesia and PACE Section, University of Cambridge, Cambridge, UK.
- Department of Critical Care Medicine, Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada.
Abstract
Current methods for radiological classification in traumatic brain injury (TBI), such as the Marshall and Rotterdam scores, provide an incomplete description of intracranial lesion burden, rely on time-consuming manual assessments by experts, and are prone to intra- and interrater disagreement. To circumvent these limitations, we previously proposed the Brain Lesion Analysis and Segmentation Tool for Computed Tomography (BLAST-CT), a deep learning-based method using convolutional neural networks (CNNs) to perform multiclass, voxel-wise segmentation and quantification of TBI lesions on CT in an automated fashion. In this study, we expand on our previous work by (1) optimizing the performance of our model using additional training data from CENTER-TBI, and (2) externally validating the findings reported in our internal development study by applying our algorithm to an independent imaging dataset from the Prophylaxis for Venous Thromboembolism in Severe Traumatic Brain Injury (PROTEST) multicenter randomized controlled trial. A total of 680 scans from CENTER-TBI were annotated by neuroradiological experts and used to retrain the CNN, creating BLAST-CT version 2.0. Traumatic lesions were subdivided into four distinct classes: intraparenchymal hemorrhage (IPH), extra-axial hemorrhage (EAH), intraventricular hemorrhage (IVH), and perilesional edema. Fifty-one scans from PROTEST were manually annotated to obtain ground-truth lesion labels on an independent imaging dataset. The same PROTEST scans were then run through both versions 1.0 and 2.0 of BLAST-CT to evaluate the change in performance resulting from the optimization training procedure and to calculate segmentation accuracy metrics on an external validation dataset. The additional training phase implemented for version 2.0 yielded an overall improvement of 4% in mean Dice similarity coefficient (DSC) across lesions of any size and class (range 0-13% for individual lesion classes).
Mean absolute volume errors between automated and ground-truth segmentations also improved for most lesion types (2.94 vs. 1.55 mL for IPH, 18.44 vs. 16.33 mL for EAH, 0.74 vs. 0.80 mL for IVH, and 1.56 vs. 0.27 mL for perilesional edema using version 1.0 vs. 2.0, respectively). Overall, the performance of BLAST-CT on the PROTEST external validation dataset was comparable to or better than the results obtained on our internal development sample (median DSC for all lesion classes was 0.60 [IQR 0.0-0.94] on the PROTEST images vs. 0.36 [IQR 0.0-0.63] on the CENTER-TBI development dataset). Mean volume differences between the ground-truth and predicted lesion maps were also comparable to the internal sample for most lesion subtypes, with the exception of EAH. We present one of the first models for automated multiclass volumetric lesion segmentation in TBI to be trained and externally validated in a multicenter fashion. After optimizing our model using a large additional training sample from CENTER-TBI, we achieved a level of performance comparable to other state-of-the-art methods. We also make this optimized model (BLAST-CT 2.0) publicly available to provide a robust and generalizable platform for application to large prospective TBI datasets and clinical trials.
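For readers unfamiliar with the two accuracy metrics reported above, the following is a minimal sketch of how a per-class Dice similarity coefficient and an absolute volume error in mL can be computed from voxel-wise label maps. It is not the BLAST-CT evaluation code. The integer label encoding, the function names, the voxel volume, and the convention of returning 1.0 when a class is absent from both maps are all assumptions made for illustration.

```python
from typing import Sequence

# Hypothetical label encoding (an assumption, not taken from BLAST-CT):
# 0 = background, 1 = IPH, 2 = EAH, 3 = perilesional edema, 4 = IVH


def dice_coefficient(pred: Sequence[int], truth: Sequence[int], label: int,
                     empty_value: float = 1.0) -> float:
    """Dice similarity coefficient for one class over flattened voxel labels.

    DSC = 2 * |P intersect T| / (|P| + |T|), where P and T are the sets of
    voxels assigned to `label` in the prediction and the ground truth.
    `empty_value` is returned when the class is absent from both maps;
    the value 1.0 is an assumed convention.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must contain the same number of voxels")
    pred_count = sum(1 for p in pred if p == label)
    truth_count = sum(1 for t in truth if t == label)
    overlap = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    if pred_count + truth_count == 0:
        return empty_value
    return 2.0 * overlap / (pred_count + truth_count)


def absolute_volume_error_ml(pred: Sequence[int], truth: Sequence[int],
                             label: int, voxel_volume_mm3: float) -> float:
    """Absolute difference in lesion volume for one class, in mL.

    Volume = voxel count * voxel volume; 1 mL = 1000 mm^3.
    """
    pred_count = sum(1 for p in pred if p == label)
    truth_count = sum(1 for t in truth if t == label)
    return abs(pred_count - truth_count) * voxel_volume_mm3 / 1000.0


# Toy example: eight voxels, flattened from a 3D volume
pred = [0, 1, 1, 2, 2, 0, 4, 0]
truth = [0, 1, 1, 1, 2, 0, 0, 0]

for name, label in [("IPH", 1), ("EAH", 2), ("edema", 3), ("IVH", 4)]:
    dsc = dice_coefficient(pred, truth, label)
    err = absolute_volume_error_ml(pred, truth, label, voxel_volume_mm3=2.0)
    print(f"{name}: DSC={dsc:.2f}, abs. volume error={err:.4f} mL")
```

Because DSC is a ratio of overlapping to total labelled voxels, a small lesion that is missed entirely scores 0 regardless of its size, whereas the volume error for the same lesion stays small. This is one reason the two metrics are reported together.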