Osteoporosis prediction using lumbar CT Hounsfield units: comparative performance and clinical implications of seven machine learning models.
Authors
Affiliations (2)
- Department of Orthopaedic Surgery, Tokai University School of Medicine, Isehara, Kanagawa, Japan.
- Department of Orthopaedic Surgery, Tokai University School of Medicine, Isehara, Kanagawa, Japan.
Abstract
This study aimed to evaluate the utility of L1-L4 average Hounsfield Unit (HU) values from lumbar spine computed tomography (CT) in predicting osteoporosis using multiple machine learning (ML) models. We retrospectively analyzed 172 patients (≥50 years) who underwent lumbar spine surgery and received preoperative CT and dual-energy X-ray absorptiometry (DXA) within 3 months. Osteoporosis was defined as a T-score < -2.5 at either the lumbar spine or femoral neck. The L1-L4 average HU value was used as the sole input variable to develop seven supervised ML models. Model performance was compared using accuracy, precision, recall, F1 score, and area under the receiver operating characteristic curve (AUC), and HU-based thresholds for osteoporosis screening were explored. Of 172 patients, 59 (34.3%) were classified as having osteoporosis. The osteoporosis group showed significantly lower L1-L4 HU values than the non-osteoporosis group (105.1 ± 47.5 vs. 140.1 ± 61.3, p < 0.01). Among the models, K-nearest neighbors (KNN) achieved the most balanced diagnostic performance (accuracy: 0.714 ± 0.048; F1 score: 0.466 ± 0.073). Logistic Regression and Naive Bayes showed the highest AUCs (0.785 ± 0.096 and 0.777 ± 0.098, respectively) but limited recall, whereas Support Vector Machine (SVM) demonstrated moderate performance. Tree-based models yielded comparatively lower discriminatory ability. Optimal HU ranges for identifying high osteoporosis risk generally converged around 90-130 HU. ML models using L1-L4 HU values can aid in osteoporosis screening. KNN provided the most robust and balanced diagnostic performance, while Logistic Regression and SVM offered stable threshold-based classification. These findings support the utility of CT-based ML approaches in preoperative spinal surgery settings, particularly where DXA is unavailable or limited.
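To make the reported metrics and the idea of an HU-based screening threshold concrete, the following is a minimal illustrative sketch, not the study's method. It assumes a single-cutoff rule ("HU at or below t means osteoporosis") rather than any of the seven models, and it uses a synthetic cohort drawn from normal distributions with the reported group sizes, means, and standard deviations. All function names are hypothetical, and the numbers it prints are not expected to match the figures in the abstract.

```python
import random


def confusion(hu_values, labels, threshold):
    """Count TP, FP, FN, TN for the rule 'HU <= threshold -> osteoporosis'."""
    tp = fp = fn = tn = 0
    for hu, has_op in zip(hu_values, labels):
        predicted = hu <= threshold
        if predicted and has_op:
            tp += 1
        elif predicted and not has_op:
            fp += 1
        elif not predicted and has_op:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, specificity, and F1 from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


def auc(hu_values, labels):
    """AUC as the probability that a random osteoporosis case has a lower HU
    than a random non-case (ties count as 0.5)."""
    pos = [hu for hu, y in zip(hu_values, labels) if y]
    neg = [hu for hu, y in zip(hu_values, labels) if not y]
    wins = sum(1.0 if p < n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_threshold(hu_values, labels, candidates):
    """Return the candidate cutoff maximizing Youden's J
    (recall + specificity - 1)."""
    def youden_j(t):
        m = metrics(*confusion(hu_values, labels, t))
        return m["recall"] + m["specificity"] - 1
    return max(candidates, key=youden_j)


def synthetic_cohort(seed=0):
    """ASSUMPTION: synthetic data only. Draws 59 osteoporosis and 113
    non-osteoporosis HU values from normal distributions using the group
    means and SDs reported in the abstract."""
    rng = random.Random(seed)
    hu = ([rng.gauss(105.1, 47.5) for _ in range(59)]
          + [rng.gauss(140.1, 61.3) for _ in range(113)])
    labels = [True] * 59 + [False] * 113
    return hu, labels


if __name__ == "__main__":
    hu, labels = synthetic_cohort()
    cutoff = best_threshold(hu, labels, range(60, 181, 5))
    print("cutoff (HU):", cutoff)
    print("metrics:", metrics(*confusion(hu, labels, cutoff)))
    print("AUC:", round(auc(hu, labels), 3))
```

Because roughly two thirds of the cohort is non-osteoporotic, accuracy alone can look acceptable even when recall is low, which is why the sketch reports F1 and recall alongside it.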