Performance comparison between a deep learning model and spine surgeons in detecting cervical spinal cord compression on radiographs.

May 22, 2026


DOI: 10.3171/2026.1.SPINE251409 PMID: 42172670

Authors

Chen R, Liang M, Zhang Y, Liu X, Wang T, Wang A, Fan N, Yuan S, Du P, Ma Z, Xi Y, Gu Z, Fei Q, Zang L

Affiliations (6)

1. Department of Orthopedics, Beijing Chaoyang Hospital, Capital Medical University, Beijing.
2. Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing.
3. Longwood Valley Medical Technology Co. Ltd., Beijing.
4. School of Life Sciences, Tsinghua University, Beijing.
5. Institute of Biomedical and Health Engineering (iBHE), Tsinghua Shenzhen International Graduate School, Shenzhen.
6. Department of Orthopedics, Beijing Friendship Hospital, Capital Medical University, Beijing, China.

Abstract

This study aimed to develop a deep learning (DL) model for detecting cervical spinal cord compression on cervical radiographs and to compare its performance with that of spine surgeons.

The authors conducted a retrospective study of consecutive hospitalized patients who underwent both cervical spine radiography and MRI at their center. Data from 600 patients were randomly divided into training (n = 480), validation (n = 60), and internal test (n = 60) sets. Patients from another center formed an external test set (n = 60). MR images served as the gold standard for determining the presence of compression at each cervical segment. The model was trained on cervical radiographs in two stages: a segmentation-based localization algorithm first identified the cervical segments, and a binary classifier then diagnosed spinal cord compression. Gradient-weighted class activation mapping (Grad-CAM) was used to visualize the areas with high feature density extracted by the model. Model performance was evaluated by accuracy, sensitivity, specificity, F1 score, and area under the receiver operating characteristic curve (AUC), and was compared with the diagnoses of two spine surgeons.

In the internal test set, the model achieved 94.67% accuracy and an AUC of 0.9911, significantly outperforming the two spine surgeons (69.09% and 71.18% accuracy, p < 0.05). In the external test set, the model achieved 93.33% accuracy with an AUC of 0.9868. Against the reference standard, the kappa coefficients for the model, reader 1, and reader 2 were 0.893, 0.378, and 0.422, respectively. Grad-CAM showed high feature density in the intervertebral discs, intervertebral foramina, and facet joints.

The DL model developed in this study achieved binary classification of cervical spinal cord compression and localized the affected segments on cervical radiographs, demonstrating diagnostic performance superior to that of spine surgeons. The model achieved high detection rates for cervical spinal cord compression and holds promise for clinical diagnosis, particularly in resource-limited or remote settings.
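
For readers less familiar with the evaluation metrics named in the abstract, the following is a minimal illustrative sketch of how accuracy, sensitivity, specificity, F1 score, AUC, and Cohen's kappa can be computed for a binary classifier. It is not the authors' code. The function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration and are not data from the study.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics for binary labels (1 = compression, 0 = none)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn
    nan = float("nan")

    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn) if (tp + fn) else nan  # true-positive rate
    specificity = tn / (tn + fp) if (tn + fp) else nan  # true-negative rate
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else nan

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = accuracy
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (p_observed - p_expected) / (1 - p_expected) if p_expected != 1 else nan

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "kappa": kappa,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case outscores a random
    negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy example only: hypothetical reference labels and model scores,
# NOT values from the study.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.2, 0.1, 0.6]
toy_pred = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed 0.5 threshold

toy_result = binary_metrics(toy_true, toy_pred)
toy_auc = roc_auc(toy_true, toy_scores)
```

Kappa is reported alongside accuracy because it discounts the agreement expected by chance, which is why a reader with roughly 70% accuracy can still have a kappa near 0.4.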


Topics

Journal Article
