Towards Automated Spine Fracture Detection on Whole-Body CT of Polytraumatized Patients.
Authors
Affiliations (4)
- Institute of Radiology and Neuroradiology, BG Klinikum Unfallkrankenhaus Berlin, 12683 Berlin, Germany.
- Center for Clinical Research, BG Klinikum Unfallkrankenhaus Berlin, 12683 Berlin, Germany.
- Institute of Diagnostic Radiology and Neuroradiology, Universitätsklinikum Greifswald, 17489 Greifswald, Germany.
- Philips Research Hamburg, Roentgenstrasse 24-26, 22335 Hamburg, Germany.
Abstract
Treatment of severely injured patients is challenging, and timely reading of whole-body computed tomography (WBCT) images is therefore crucial. Artificial intelligence is increasingly used to prioritize and detect acute injuries in this context. Algorithms focusing on the cervical spine and on compression fractures have been deployed successfully. However, tools that assess the whole spine and cover the entirety of fracture morphologies are lacking. We aimed to investigate the capabilities of an algorithm to detect spine fractures on WBCTs and the factors contributing to the difficulties in its development. Version 1.0 (v1) of the algorithm was previously trained on 454 cervical spine fractures using a U-Net with four-fold cross-validation to segment spine fractures and the spine via a multi-task loss. Further training expanded towards whole-spine assessment with additional annotated fractures (Cohort 1) of the cervical (n = 50), thoracic (n = 30), and lumbar spine (n = 20), resulting in version 2.0 (v2). The baseline was set to reach the highest sensitivity at a maximum of five false positives per case. Version 1.0 was tested on Cohort 1, and both versions were compared on prospectively collected real-world data (Cohort 2, n = 712 WBCTs). An additional systematic review served to compare the algorithmic performance against the state of the art. Version 1.0 showed promising performance not only for the cervical but also for the thoracic and lumbar spine owing to generalization (sensitivities ranging between 60% and 87%). Version 2.0 also achieved decent sensitivities for Cohort 2 (ranging between 77% and 85%) but generated an abundance of false positives. Various reasons led to false-positive results; for Version 2.0, the trabecular structure itself provoked false alerts.
Variances in training and test data (image quality, dose, reconstructions), heterogeneity of fractures and anatomies, and the size of the training sets explain some of the difficulties during algorithm development. Only five other groups have described work on whole-spine fracture detection; they encountered similar difficulties and have likewise failed to develop a clinically deployable tool. Spine fracture detection on WBCT is feasible, but multiple factors hinder the development of commercially available AI tools. Expansion and improved design of training cohorts are necessary for further development and for simulation of real-life conditions.
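The operating-point criterion stated in the abstract (highest sensitivity at a maximum of five false positives per case) can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `select_operating_point`, the input format (one confidence score and one true-positive flag per candidate detection), and the assumption that each true-positive detection matches a distinct annotated fracture are all hypothetical and chosen for illustration only.

```python
from typing import Optional, Sequence, Tuple


def select_operating_point(
    detections: Sequence[Tuple[float, bool]],
    n_fractures: int,
    n_cases: int,
    max_fp_per_case: float = 5.0,
) -> Optional[Tuple[float, float, float]]:
    """Pick the score threshold with the highest sensitivity under an FP budget.

    detections: (score, is_true_positive) for every candidate detection,
        pooled over all cases (hypothetical input format).
    n_fractures: number of annotated fractures (assumes each true-positive
        detection corresponds to a distinct fracture).
    n_cases: number of scans the detections were pooled from.
    Returns (threshold, sensitivity, fp_per_case), or None if no threshold
    satisfies the false-positive budget.
    """
    if n_fractures <= 0 or n_cases <= 0:
        raise ValueError("n_fractures and n_cases must be positive")

    best: Optional[Tuple[float, float, float]] = None
    # Sweep thresholds from strict to permissive; detections with
    # score >= threshold are kept.
    for threshold in sorted({score for score, _ in detections}, reverse=True):
        kept = [is_tp for score, is_tp in detections if score >= threshold]
        tp = sum(kept)
        fp = len(kept) - tp
        fp_per_case = fp / n_cases
        if fp_per_case > max_fp_per_case:
            # Lowering the threshold further can only add false positives.
            break
        sensitivity = tp / n_fractures
        # Strict comparison keeps the strictest threshold among ties.
        if best is None or sensitivity > best[1]:
            best = (threshold, sensitivity, fp_per_case)
    return best


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the study.
    toy = [(0.9, True), (0.8, False), (0.7, True),
           (0.6, False), (0.5, False), (0.4, True)]
    print(select_operating_point(toy, n_fractures=4, n_cases=1,
                                 max_fp_per_case=1.0))
```

Sweeping from strict to permissive thresholds allows the loop to stop as soon as the false-positive budget is exceeded, because both the true-positive and the false-positive counts can only grow as the threshold is lowered.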