Artificial Intelligence for Opportunistic Screening for Osteoporosis and Spine Fractures Using Computed Tomography: A Systematic Review and Meta-Analysis.
Authors
Affiliations (3)
- Department of Radiology.
- Department of Imaging Physics.
- Department of Epidemiology, MD Anderson Cancer Center, Houston, TX.
Abstract
Osteoporosis is a major global health burden and often remains undiagnosed until fragility fractures occur. Dual-energy x-ray absorptiometry (DXA) is underutilized in routine practice. Computed tomography (CT) scans obtained for other indications offer an opportunity for opportunistic bone quality assessment without additional radiation exposure. This systematic review and meta-analysis evaluated: (i) the overall diagnostic performance [area under the receiver operating characteristic curve (AUC)] of artificial intelligence (AI) algorithms for osteoporosis/nonosteoporosis screening using CT; (ii) AUC for fracture/nonfracture screening; (iii) the most frequently studied body part; and (iv) the predominant machine learning (ML) and deep learning (DL) algorithms. Following PRISMA guidelines, 2 reviewers searched PubMed, Web of Science, Scopus, and Google Scholar for English-language studies (2011 to 2025). Eligible studies used DXA as the reference standard and reported diagnostic performance metrics. Studies that lacked patient-level AI assessment, used alternative reference standards, or applied incorrect analysis methods were excluded. Risk of bias was assessed using QUADAS-2. A random-effects meta-analysis pooled sensitivity, specificity, and AUC. Subgroup analyses were conducted by algorithm type and study size. Heterogeneity was quantified using the I² and Q statistics. Of 1258 screened, 31 studies were included [26/31 (83.87%) osteoporosis screening; 5/31 (16.13%) fracture screening]. For osteoporosis screening, the abdomen/lumbar spine (L-spine) was the most frequently evaluated body part [14/26 (53.85%) studies]. Within this subgroup, osteoporosis/nonosteoporosis (normal/osteopenia) [5/14 (35.71%) studies] was the most common outcome. For fracture screening, the abdomen/L-spine was also the most frequently assessed body part [4/5 (80%) studies]. Within this subgroup, fracture/nonfracture [4/4 (100%)] was the most common outcome. 
The pooled AUC for abdomen/L-spine osteoporosis/nonosteoporosis screening was 0.931 (95% CI: 0.926-0.936), and that for abdomen/L-spine fracture/nonfracture screening was 0.863 (95% CI: 0.73-0.936). For osteoporosis screening, ML was used in 21/26 (80.77%) studies and DL in 4/26 (15.38%) studies. Support vector machine (SVM) was the most common ML algorithm (13/21, 61.90%), while custom convolutional neural networks (CNNs) (3/4, 75%) predominated in DL. For fracture screening, ML was used in 3/5 (60%) studies and DL in 1/5 (20%) studies. SVM was the most common ML algorithm (3/5, 60%), while a custom CNN (1/1, 100%) predominated in DL. Heterogeneity was lower for abdomen/L-spine osteoporosis/nonosteoporosis screening (I²=47.24%) than for fracture/nonfracture screening (I²=64.8%). AI algorithms applied to abdomen/L-spine CT demonstrate excellent performance for opportunistic osteoporosis/nonosteoporosis screening and fracture/nonfracture screening, supporting their integration into routine CT interpretation to improve early detection and prevention.
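
The pooling and heterogeneity statistics named in the abstract can be illustrated with a minimal sketch. The abstract does not say which random-effects estimator or which scale was used, so the DerSimonian-Laird estimator, pooling on the raw AUC scale, the function name `pool_random_effects`, and the input values below are all assumptions made for illustration; they are not the authors' method or data.

```python
import math


def pool_random_effects(estimates, variances):
    """Pool per-study estimates with a DerSimonian-Laird random-effects model.

    Assumption: this estimator is chosen for illustration only; the abstract
    does not specify which one the review used.
    Returns the pooled estimate, a 95% CI, Cochran's Q, I^2 (%), and tau^2.
    """
    k = len(estimates)
    # Fixed-effect (inverse-variance) weights, needed to compute Cochran's Q.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1

    # Between-study variance tau^2 (truncated at zero).
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # I^2: share of total variability attributed to heterogeneity, in percent.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    return {
        "pooled": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "Q": q,
        "I2": i2,
        "tau2": tau2,
    }


if __name__ == "__main__":
    # Hypothetical per-study AUCs and variances, not taken from the review.
    aucs = [0.92, 0.94, 0.90, 0.95]
    variances = [0.0004, 0.0003, 0.0006, 0.0005]
    print(pool_random_effects(aucs, variances))
```

In practice, bounded measures such as AUC, sensitivity, and specificity are often pooled on a transformed scale (for example, logit) and back-transformed; the raw scale is used here only to keep the sketch short.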