Automated detection of spinal bone marrow oedema in axial spondyloarthritis: training and validation using two large phase 3 trial datasets.
Authors
Affiliations (12)
- Visual Geometry Group, Department of Engineering Science, University of Oxford, Oxford, UK.
- Rheuma Praxis Berlin, Berlin, Germany.
- Ruhr-Universität Bochum, Bochum, Germany.
- Division of Rheumatology, University of California San Francisco, San Francisco, California, USA.
- Copenhagen Center for Arthritis Research, Center for Rheumatology and Spine Diseases, Rigshospitalet, Glostrup, Denmark.
- Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark.
- Division of Rheumatology, Department of Medicine, University Health Network and University of Toronto, Toronto, Canada.
- Department of Gastroenterology, Infectiology and Rheumatology (including Nutrition Medicine), Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany.
- Novartis Pharmaceuticals Corporation, East Hanover, USA.
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, University College London, London, UK.
- National Institute for Health Research (NIHR) University College London Hospitals Biomedical Research Centre, London, UK.
- Department of Rheumatology, Northwick Park Hospital, London North West University Healthcare NHS Trust, London, UK.
Abstract
To evaluate the performance of machine learning (ML) models for automated scoring of spinal MRI bone marrow oedema (BMO) in patients with axial spondyloarthritis (axSpA) and to compare them with expert scoring. ML algorithms based on SpineNet software were trained and validated on 3483 spinal MRIs from 686 patients with axSpA across two clinical trial datasets. The scoring pipeline involved (i) detection and labelling of vertebral bodies and (ii) classification of vertebral units for the presence or absence of BMO. Two models were tested: Model 1, without manual segmentation, and Model 2, incorporating an intermediate manual segmentation step. Model outputs were compared with those of human experts using kappa statistics, balanced accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). Both models performed comparably to expert readers in classifying the presence or absence of BMO, and Model 1 outperformed Model 2. In the radiographic axSpA dataset, with absolute reader consensus scoring as the external reference, Model 1 achieved an AUC of 0.94 (vs 0.88 for Model 2), accuracy of 75.8% (vs 70.5%), and kappa of 0.50 (vs 0.31), similar to the expert inter-reader accuracy of 76.8% and kappa of 0.47. In the non-radiographic axSpA dataset, Model 1 achieved an AUC of 0.97 (vs 0.91 for Model 2), accuracy of 74.6% (vs 70%), and kappa of 0.52 (vs 0.27), comparable to the expert inter-reader accuracy of 74.2% and kappa of 0.46. ML software shows potential for automated MRI BMO assessment in axSpA, offering benefits such as improved consistency, reduced labour costs, and minimised inter- and intra-reader variability. Clinicaltrials.gov: MEASURE 1 study (NCT01358175); PREVENT study (NCT02696031).
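For readers interested in how the reported agreement statistics are typically computed, the sketch below illustrates a per-vertebral-unit comparison of binary BMO calls against an expert consensus reference using standard scikit-learn metrics (kappa, balanced accuracy, sensitivity, specificity, AUC). This is an illustrative assumption about the style of analysis, not the authors' pipeline; the function name `bmo_agreement_metrics` and the toy data are hypothetical.

```python
# Minimal sketch (not the study's code): agreement metrics between a model's
# per-vertebral-unit BMO predictions and an expert consensus reference.
# Assumes binary labels (1 = BMO present) and a continuous model score for AUC.
import numpy as np
from sklearn.metrics import (
    cohen_kappa_score,
    balanced_accuracy_score,
    roc_auc_score,
    confusion_matrix,
)

def bmo_agreement_metrics(reference, predicted, model_scores):
    """reference, predicted: 0/1 arrays, one entry per vertebral unit;
    model_scores: continuous BMO probabilities used for the AUC."""
    reference = np.asarray(reference)
    predicted = np.asarray(predicted)
    tn, fp, fn, tp = confusion_matrix(reference, predicted, labels=[0, 1]).ravel()
    return {
        "kappa": cohen_kappa_score(reference, predicted),
        "balanced_accuracy": balanced_accuracy_score(reference, predicted),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "auc": roc_auc_score(reference, model_scores),
    }

# Toy example with synthetic values (not trial data):
ref = [1, 0, 0, 1, 0, 1, 0, 0]
pred = [1, 0, 1, 1, 0, 0, 0, 0]
score = [0.9, 0.2, 0.6, 0.8, 0.1, 0.4, 0.3, 0.2]
print(bmo_agreement_metrics(ref, pred, score))
```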