Deep learning in the diagnosis, staging, and prognosis of osteonecrosis of the femoral head: a systematic review.
Authors
Affiliations (5)
- Musculoskeletal Injuries Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
- Advanced Diagnostic & Interventional Radiology Research Center, Tehran University of Medical Sciences, Tehran, Iran.
- Center for Research and Training in Skin Diseases and Leprosy, Tehran University of Medical Sciences, Reza zandi, Tehran, MD, Iran.
- Musculoskeletal Injuries Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
- Musculoskeletal Injuries Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
Abstract
Osteonecrosis of the femoral head (ONFH) is a progressive ischemic bone disorder that often leads to structural collapse and early arthroplasty in adults. Early detection and accurate staging are essential for joint-preserving management. Deep learning (DL) has shown considerable promise in musculoskeletal imaging; however, the methodological quality of ONFH-related DL models and how well they translate to clinical practice remain unclear. This systematic review synthesized current evidence on DL applications for ONFH diagnosis, staging, segmentation, and prognosis across imaging modalities, with emphasis on performance, quality, and explainability. Following PRISMA 2020 guidelines, PubMed, Embase, Scopus, and Web of Science were searched on 18 August 2025. Studies applying DL to MRI, CT, or radiographs for ONFH diagnosis, segmentation, staging, or prediction were included. Two reviewers independently conducted screening and data extraction. Methodological quality was appraised using the Must AI Criteria-10 (MAIC-10). Due to heterogeneity, findings were narratively synthesized by imaging modality and analytical task. Among 301 records, 26 studies met inclusion criteria (1993-2025). MRI was the most used modality (n = 14), followed by radiography (n = 7), CT (n = 2), and multimodal frameworks (n = 3). Diagnostic AUCs typically ranged from 0.89 to 0.98, while necrotic-lesion segmentation yielded Dice scores between 0.57 and 0.75. Anatomical segmentation (femur or femoral head) achieved much higher Dice values (0.90-0.99). MRI-based models achieved the highest sensitivity for early and pre-collapse detection, while radiograph-based models showed strong specificity and scalability on large datasets (up to 59,372 images). CT and multimodal pipelines improved anatomical delineation, volumetric mapping, and cross-domain precision. Predictive models integrating imaging and clinical data achieved AUCs of 0.85-0.95 for outcome or collapse prediction.
The mean MAIC-10 score was 6.27/10, indicating moderate methodological quality, with consistent strengths in clinical relevance but limited external validation, explainability and uncertainty quantification. DL shows promising internal diagnostic and staging performance for ONFH and growing potential in prognostic modeling. However, most studies remain retrospective and single-center, limited by small and heterogeneous datasets. Future research should prioritize multicenter validation, standardized reporting, and uncertainty estimation to advance clinically explainable and generalizable DL tools for musculoskeletal imaging.