Comparison of lumbar disc degeneration grading between deep learning model SpineNet and radiologist: a longitudinal study with a 14-year follow-up.

Authors

Murto N, Lund T, Kautiainen H, Luoma K, Kerttula L

Affiliations (5)

  • Department of Radiology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland.
  • Department of Radiology, Helsinki University Central Hospital, PO Box 200, Helsinki, 00029 HUS, Finland.
  • Department of Orthopaedics and Traumatology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland.
  • Primary Health Care Unit, Kuopio University Hospital, Kuopio, Finland and Folkhälsan Research Center, Helsinki, Finland.
  • Department of Radiology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland.

Abstract

The aim of this study was to assess the agreement between lumbar disc degeneration (DD) grading by the convolutional neural network model SpineNet and a radiologist's visual grading. In a 14-year follow-up MRI study involving 19 male volunteers, lumbar DD was assessed by SpineNet and by two radiologists using the Pfirrmann classification at baseline (age 37) and after 14 years (age 51). Pfirrmann summary scores (PSS) were calculated by summing the individual disc grades. Agreement between the first radiologist and SpineNet was analyzed, and the second radiologist's grading was used to assess inter-observer agreement. Significant differences were observed between the Pfirrmann grades and PSS assigned by the radiologist and by SpineNet at both time points. SpineNet assigned Pfirrmann grade 1 to several discs and assigned grade 5 to more discs than the radiologists did. The concordance correlation coefficients (CCC) of PSS between the radiologist and SpineNet were 0.54 (95% CI: 0.28 to 0.79) at baseline and 0.54 (0.27 to 0.80) at follow-up, and the average kappa (κ) values were 0.74 (0.68 to 0.81) at baseline and 0.68 (0.58 to 0.77) at follow-up. The CCC of PSS between the two radiologists was 0.83 (0.69 to 0.97) at baseline and 0.78 (0.61 to 0.95) at follow-up, with κ values ranging from 0.73 to 0.96. We found fair to substantial agreement in DD grading between SpineNet and the radiologist, albeit with notable discrepancies. These findings indicate that AI-based systems such as SpineNet hold promise as complementary tools in radiological evaluation, including in longitudinal studies, but also underscore the need for ongoing refinement of AI algorithms.
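The abstract reports agreement as PSS, CCC and κ without giving the formulas. The following is a minimal sketch of those three computations, under assumptions the abstract does not confirm: five lumbar levels (L1/L2 to L5/S1) are summed into the PSS, the CCC is Lin's original definition with population (1/n) moments, and κ is plain unweighted Cohen's kappa over one list of paired grades (the abstract states neither the kappa variant nor how the "average" κ was formed). The function names and the example grades are hypothetical, and the 95% confidence intervals are not reproduced.

```python
"""Illustrative sketch only: PSS, Lin's CCC and unweighted Cohen's kappa.
All names and grades here are hypothetical, not taken from the study."""
from collections import Counter
from statistics import fmean

# Assumption: five lumbar levels contribute to the PSS (not stated in the abstract).
LEVELS = ("L1/L2", "L2/L3", "L3/L4", "L4/L5", "L5/S1")


def pfirrmann_summary_score(grades: dict[str, int]) -> int:
    """Sum the Pfirrmann grades (1-5) of one subject's discs over LEVELS."""
    missing = set(LEVELS) - set(grades)
    if missing:
        raise ValueError(f"missing grades for: {sorted(missing)}")
    for level in LEVELS:
        if not 1 <= grades[level] <= 5:
            raise ValueError(f"{level}: Pfirrmann grade must be 1-5")
    return sum(grades[level] for level in LEVELS)


def lin_ccc(x: list[float], y: list[float]) -> float:
    """Lin's CCC = 2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2),
    using population (divide-by-n) variances and covariance."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    mx, my = fmean(x), fmean(y)
    sxx = fmean((a - mx) ** 2 for a in x)
    syy = fmean((b - my) ** 2 for b in y)
    sxy = fmean((a - mx) * (b - my) for a, b in zip(x, y))
    denom = sxx + syy + (mx - my) ** 2
    if denom == 0:
        raise ValueError("CCC undefined when both samples are constant and equal")
    return 2 * sxy / denom


def cohen_kappa(r1: list[int], r2: list[int]) -> float:
    """Unweighted Cohen's kappa = (p_o - p_e) / (1 - p_e)."""
    n = len(r1)
    if n == 0 or n != len(r2):
        raise ValueError("need two non-empty rating lists of equal length")
    p_o = sum(a == b for a, b in zip(r1, r2)) / n      # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[k] * c2[k] for k in c1) / n**2        # chance agreement
    if p_e == 1:
        raise ValueError("kappa undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical PSS values for five subjects from two graders.
    radiologist_pss = [15, 17, 14, 19, 16]
    model_pss = [16, 19, 14, 21, 18]
    print("CCC of PSS:", round(lin_ccc(radiologist_pss, model_pss), 2))
    # Hypothetical per-disc grades from the same two graders.
    print("kappa:", round(cohen_kappa([2, 3, 3, 4, 5], [2, 3, 4, 4, 5]), 2))
```

Unlike Pearson's r, the CCC is penalized by a systematic offset between the graders (the (mean_x − mean_y)² term), so a model that consistently assigns higher grades can correlate well with the radiologist yet still show only moderate concordance. A study that averaged per-level kappas, or used a weighted kappa for the ordinal Pfirrmann scale, would need a different computation.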

Topics

Journal Article