Automated vertebral bone quality score measurement on lumbar MRI using deep learning: Development and validation of an AI algorithm.

Authors

Jayasuriya NM, Feng E, Nathani KR, Delawan M, Katsos K, Bhagra O, Freedman BA, Bydon M

Affiliations (5)

  • Department of Neurologic Surgery, Mayo Clinic, Rochester, MN, USA.
  • Department of Neurologic Surgery, Mayo Clinic, Rochester, MN, USA; Department of Neurosurgery, University of Minnesota, Minneapolis, MN, USA.
  • Department of Neurologic Surgery, Mayo Clinic, Rochester, MN, USA; Department of Neurological Surgery, University of Chicago, Chicago, IL, USA.
  • Department of Orthopedic Surgery, Mayo Clinic, Rochester, MN, USA.
  • Department of Neurological Surgery, University of Chicago, Chicago, IL, USA.

Abstract

Bone health is a critical determinant of spine surgery outcomes, yet many patients undergo procedures without adequate preoperative assessment because of limitations in current bone quality assessment methods. This study aimed to develop and validate an artificial intelligence-based algorithm that predicts Vertebral Bone Quality (VBQ) scores from routine MRI scans, enabling improved preoperative identification of patients at risk for poor surgical outcomes.

This study used 257 lumbar spine T1-weighted MRI scans from the SPIDER challenge dataset. VBQ scores were calculated in three steps: selecting the mid-sagittal slice, measuring vertebral body signal intensity at L1-L4, and normalizing by cerebrospinal fluid (CSF) signal intensity. A YOLOv8 model was developed to automate region of interest (ROI) placement and VBQ score calculation. The system was validated against manual annotations from 47 lumbar spine surgery patients, with performance evaluated using precision, recall, mean average precision (mAP), intraclass correlation coefficient (ICC), Pearson correlation, root mean square error (RMSE), and mean error.

The YOLOv8 model demonstrated high accuracy in vertebral body detection (precision: 0.9429, recall: 0.9076, mAP@0.5: 0.9403, mAP@[0.5:0.95]: 0.8288). Interrater reliability was strong, with ICC values of 0.95 for the human-human comparison and 0.88 and 0.93 for the two human-AI comparisons. Pearson correlations for VBQ scores between human and AI measurements were 0.86 and 0.90, with RMSE values of 0.58 and 0.42, respectively.

The AI-based algorithm accurately predicts VBQ scores from routine lumbar MRIs. This approach has the potential to enhance early identification of, and intervention for, patients with poor bone health, which could lead to improved surgical outcomes. Further external validation is recommended to ensure generalizability and clinical applicability.
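The VBQ calculation and the human-AI agreement statistics summarized in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the commonly cited VBQ formulation (median of the L1-L4 vertebral body ROI mean intensities divided by the CSF ROI mean intensity). It represents each ROI as a hypothetical half-open pixel box `(row0, col0, row1, col1)`, such as one derived from a detector's bounding box; the conversion from YOLOv8 output is not shown. Mid-sagittal slice selection is approximated by taking the middle slice of the sagittal stack. The helper names (`mid_sagittal_slice`, `roi_mean`, `vbq_score`, `agreement`) are hypothetical, and "mean error" is interpreted here as the signed mean difference (AI minus human). ICC is omitted because the abstract does not state which ICC form was used.

```python
from math import sqrt
from statistics import mean, median
from typing import Dict, List, Sequence, Tuple

# Hypothetical ROI format: (row0, col0, row1, col1), half-open pixel bounds.
Box = Tuple[int, int, int, int]
Slice2D = Sequence[Sequence[float]]

LEVELS = ("L1", "L2", "L3", "L4")


def mid_sagittal_slice(volume: Sequence[Slice2D]) -> Slice2D:
    """Simplification: assume the middle slice of the sagittal stack is mid-sagittal."""
    if not volume:
        raise ValueError("empty volume")
    return volume[len(volume) // 2]


def roi_mean(image: Slice2D, box: Box) -> float:
    """Mean signal intensity of the pixels inside a rectangular ROI."""
    r0, c0, r1, c1 = box
    pixels = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    if not pixels:
        raise ValueError(f"empty ROI: {box}")
    return mean(pixels)


def vbq_score(image: Slice2D, vertebral_boxes: Dict[str, Box], csf_box: Box) -> float:
    """Assumed VBQ formulation: median L1-L4 ROI intensity / CSF ROI intensity."""
    missing = [lv for lv in LEVELS if lv not in vertebral_boxes]
    if missing:
        raise ValueError(f"missing vertebral ROIs: {missing}")
    csf_si = roi_mean(image, csf_box)
    if csf_si <= 0:
        raise ValueError("CSF signal intensity must be positive")
    vb_si = [roi_mean(image, vertebral_boxes[lv]) for lv in LEVELS]
    return median(vb_si) / csf_si


def agreement(human: List[float], ai: List[float]) -> Dict[str, float]:
    """Pearson r, RMSE, and signed mean error (AI minus human) for paired scores."""
    if len(human) != len(ai) or len(human) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mh, ma = mean(human), mean(ai)
    cov = sum((h - mh) * (a - ma) for h, a in zip(human, ai))
    var_h = sum((h - mh) ** 2 for h in human)
    var_a = sum((a - ma) ** 2 for a in ai)
    if var_h == 0 or var_a == 0:
        raise ValueError("Pearson r is undefined for constant scores")
    diffs = [a - h for h, a in zip(human, ai)]
    return {
        "pearson_r": cov / sqrt(var_h * var_a),
        "rmse": sqrt(mean([d * d for d in diffs])),
        "mean_error": mean(diffs),
    }


if __name__ == "__main__":
    # Synthetic 8x4 slice: column 3 stands in for CSF (intensity 100);
    # rows 0-1, 2-3, 4-5, 6-7 (columns 0-2) stand in for L1-L4.
    level_si = [300.0, 320.0, 340.0, 360.0]
    image = [[level_si[r // 2]] * 3 + [100.0] for r in range(8)]
    boxes = {lv: (2 * i, 0, 2 * i + 2, 3) for i, lv in enumerate(LEVELS)}
    print(vbq_score(image, boxes, csf_box=(0, 3, 8, 4)))  # 3.3 (median 330 / 100)
```

Taking the median of the four levels, rather than the mean, makes the score less sensitive to a single vertebra with atypical signal. In a real pipeline, the image would be a 2D array from the MRI series and the boxes would come from the detector rather than from hand-built coordinates.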

Topics

Journal Article
