Deep learning models for lumbar spinal stenosis on MRI: model comparison and clinical benchmarking.
Authors
Affiliations (9)
- Department of Diagnostic Imaging, National University Hospital, 5 Lower Kent Ridge Rd, Singapore, 119074, Singapore.
- Department of Computer Science, School of Computing, National University of Singapore, 13 Computing Drive, Singapore, 117417, Singapore.
- Department of Diagnostic Radiology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
- National University Spine Institute, Department of Orthopaedic Surgery, National University Hospital, Singapore, Singapore.
- Department of Radiology, Ng Teng Fong General Hospital, 1 Jurong East Street 21, Singapore, 609606, Singapore.
- Biostatistics Unit, Yong Loo Lin School of Medicine, National University of Singapore, 10 Medical Drive, Singapore, 117597, Singapore.
- Division of Spine Surgery, Department of Orthopaedic Surgery, Ng Teng Fong General Hospital, 1 Jurong East Street 21, Singapore, 609606, Singapore.
- Department of Diagnostic Imaging, National University Hospital, 5 Lower Kent Ridge Rd, Singapore, 119074, Singapore.
- Department of Diagnostic Radiology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
Abstract
To compare deep learning models of different architectures for automated lumbar spinal stenosis classification on MRI and to benchmark their performance against radiologists and orthopedists. Lumbar spine MRI studies from September 2015 to September 2019 were retrospectively obtained. Exclusion criteria included previous spinal instrumentation, suboptimal image quality, post-gadolinium studies, and severe scoliosis. Axial T2-weighted and sagittal T1-weighted images were used. Studies were split into training/validation and test sets. An external test set of 100 studies was used. Training data were labelled by 4 radiologists using predefined gradings. Two models, one CNN-based and one transformer-based, were developed. Consensus labelling by two expert spine radiologists served as the reference standard. Test sets were labelled by 8 participants (2 general radiologists, 2 radiologists-in-training, 2 orthopedists, 2 orthopedists-in-training). Detection recall (%), interrater agreement (Gwet κ), sensitivity, and specificity were evaluated. A total of 564 lumbar spine MRI studies were included (mean age, 52 ± 19 years [SD]; 302 women), with 464 (82%) used for training/validation and 100 (18%) for internal testing. Both models showed high recall for all regions of interest (> 94%), similar to participants. For dichotomous classification (normal/mild vs. moderate/severe), the respective kappas for the CNN model, transformer model, and participants were 0.99/0.99/0.97-0.98 for the central canal, 0.98/0.94/0.81-0.94 for the lateral recesses, and 0.98/0.95/0.91-0.95 for the neural foramina on internal testing (p < 0.001), and 0.99/0.97/0.92-0.97 for the central canal, 0.97/0.90/0.61-0.91 for the lateral recesses, and 0.99/0.94/0.87-0.93 for the neural foramina on external testing (p < 0.001). Compared to clinicians, the CNN model showed superior performance and the transformer model showed similar to superior performance for classifying lumbar spinal stenosis. These models could assist clinicians in report generation, surgical planning and education.
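The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the grade labels, function names, and toy data are hypothetical, and it assumes that "Gwet κ" refers to Gwet's AC1 coefficient computed for one reader against the reference standard on dichotomized grades. The abstract does not state how the agreement statistic was actually computed.

```python
# Illustrative sketch only; names and data are hypothetical, not from the study.
GRADES = ("normal", "mild", "moderate", "severe")


def dichotomize(grade):
    """Map a four-level grade to 0 (normal/mild) or 1 (moderate/severe)."""
    if grade not in GRADES:
        raise ValueError(f"unknown grade: {grade!r}")
    return int(grade in ("moderate", "severe"))


def gwet_ac1(rater_a, rater_b, categories=(0, 1)):
    """Gwet's AC1 agreement coefficient for two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    q = len(categories)
    # Observed agreement: fraction of items on which both raters agree.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: based on the average prevalence of each category.
    p_e = 0.0
    for k in categories:
        pi_k = (rater_a.count(k) + rater_b.count(k)) / (2 * n)
        p_e += pi_k * (1 - pi_k)
    p_e /= q - 1
    return (p_a - p_e) / (1 - p_e)


def sensitivity_specificity(reference, predicted):
    """Sensitivity and specificity of binary predictions vs. a reference."""
    pairs = list(zip(reference, predicted))
    tp = sum(r == 1 and p == 1 for r, p in pairs)
    fn = sum(r == 1 and p == 0 for r, p in pairs)
    tn = sum(r == 0 and p == 0 for r, p in pairs)
    fp = sum(r == 0 and p == 1 for r, p in pairs)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must contain both classes")
    return tp / (tp + fn), tn / (tn + fp)


# Toy data: reference-standard grades and one reader's grades for 8 regions.
reference = ["normal", "mild", "moderate", "severe",
             "severe", "normal", "mild", "moderate"]
reader = ["normal", "moderate", "moderate", "severe",
          "moderate", "normal", "mild", "mild"]

ref_bin = [dichotomize(g) for g in reference]
reader_bin = [dichotomize(g) for g in reader]

ac1 = gwet_ac1(ref_bin, reader_bin)
sens, spec = sensitivity_specificity(ref_bin, reader_bin)
print(f"AC1={ac1:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

On this toy data the reader agrees with the reference on 6 of 8 regions, which gives an AC1 of 0.50 and a sensitivity and specificity of 0.75 each. The chance-agreement term in AC1 depends on category prevalence differently from Cohen's kappa, which is one reason it is often preferred when one class dominates.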