Page 1 of 42417 results

Artificial Intelligence to Detect Developmental Dysplasia of Hip: A Systematic Review.

Bhavsar S, Gowda BB, Bhavsar M, Patole S, Rao S, Rath C

PubMed · Sep 28, 2025
Deep learning (DL), a branch of artificial intelligence (AI), has been applied to diagnose developmental dysplasia of the hip (DDH) on pelvic radiographs and ultrasound (US) images. This technology could assist early screening, enable timely intervention and improve cost-effectiveness. We conducted a systematic review to evaluate the diagnostic accuracy of DL algorithms in detecting DDH. PubMed, Medline, EMBASE, EMCARE, ClinicalTrials.gov, IEEE Xplore and the Cochrane Library were searched in October 2024. Prospective and retrospective cohort studies that included children (< 16 years) at risk of, or suspected to have, DDH and that applied AI to hip US or X-ray images were included. The review followed the guidelines of the Cochrane Collaboration Diagnostic Test Accuracy Working Group, and risk of bias was assessed using the QUADAS-2 tool. Twenty-three studies met the inclusion criteria: 15 (n = 8315) evaluated DDH on US images and eight (n = 7091) on pelvic radiographs. The area under the curve of the included studies ranged from 0.80 to 0.99 for pelvic radiographs and from 0.90 to 0.99 for US images. Sensitivity and specificity for detecting DDH on radiographs ranged from 92.86% to 100% and from 95.65% to 99.82%, respectively; for US images, sensitivity ranged from 86.54% to 100% and specificity from 62.5% to 100%. AI demonstrated effectiveness comparable to that of physicians in detecting DDH; however, limited evaluation on external datasets restricts its generalisability. Further research incorporating diverse datasets and real-world applications is needed to assess its broader clinical impact on DDH diagnosis.

Quantifying 3D foot and ankle alignment using an AI-driven framework: a pilot study.

Huysentruyt R, Audenaert E, Van den Borre I, Pižurica A, Duquesne K

PubMed · Sep 27, 2025
Accurate assessment of foot and ankle alignment through clinical measurements is essential for diagnosing deformities, planning treatment, and monitoring outcomes. Traditional 2D radiographs fail to fully represent the 3D complexity of the foot and ankle, whereas weight-bearing CT (WBCT) provides a 3D view of bone alignment under physiological loading. Nevertheless, manual landmark identification on WBCT remains time-intensive and prone to variability. This study presents a novel AI framework that automates foot and ankle alignment assessment via deep learning landmark detection. By training 3D U-Net models to predict heatmaps for 22 anatomical landmarks directly from WBCT images, our approach eliminates the need for segmentation and iterative mesh registration. A small dataset of 74 orthopedic patients, including foot deformity cases such as pes cavus and planovalgus, was used to develop and evaluate the model in a clinically relevant population. The mean absolute error was assessed for each landmark and each angle using fivefold cross-validation. Mean absolute distance errors ranged from 1.00 mm for the proximal head center of the first phalanx to 1.88 mm for the lowest point of the calcaneus. Automated clinical measurements derived from these landmarks achieved mean absolute errors between 0.91° for the hindfoot angle and 2.90° for the Böhler angle. The heatmap-based AI approach enables automated foot and ankle alignment assessment from WBCT imaging, achieving accuracies comparable to the manual inter-rater variability reported in previous studies. This novel AI-driven method represents a potentially valuable approach for evaluating foot and ankle morphology; however, as an exploratory study it requires further evaluation on larger datasets to establish real clinical applicability.
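The heatmap step described above can be sketched minimally: the network emits one activation volume per landmark, and the landmark position is read off the heatmap peak, scaled by the voxel spacing. This is a generic illustration with a synthetic Gaussian blob standing in for a network output; the argmax decoding and the spacing values are assumptions, not the authors' implementation.

```python
import numpy as np

def heatmap_to_landmark(heatmap, spacing_mm):
    """Read a landmark position (in mm) off the peak of one predicted heatmap.

    heatmap: (D, H, W) activation volume for a single landmark.
    spacing_mm: voxel spacing (dz, dy, dx) in millimetres.
    """
    idx = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return np.asarray(idx) * np.asarray(spacing_mm)

# Toy stand-in for a network output: a Gaussian blob centred at voxel (10, 12, 8).
zz, yy, xx = np.mgrid[0:32, 0:32, 0:32]
hm = np.exp(-((zz - 10) ** 2 + (yy - 12) ** 2 + (xx - 8) ** 2) / 8.0)
print(heatmap_to_landmark(hm, spacing_mm=(0.5, 0.5, 0.5)))  # → [5. 6. 4.]
```

A subpixel refinement (e.g. a centre-of-mass around the peak) is a common extension when millimetre-level accuracy matters.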

Radiomics-based machine learning model integrating preoperative vertebral computed tomography and clinical features to predict cage subsidence after single-level anterior cervical discectomy and fusion with a zero-profile anchored spacer.

Zheng B, Yu P, Ma K, Zhu Z, Liang Y, Liu H

PubMed · Sep 26, 2025
To develop a machine-learning model that combines preoperative vertebral-body CT radiomics with clinical data to predict cage subsidence after single-level ACDF with a zero-profile anchored spacer (Zero-P). We retrospectively reviewed 253 patients (2016-2023). Subsidence was defined as ≥ 3 mm loss of fused-segment height at final follow-up. Patients were split 8:2 into a training set (n = 202; 39 subsidence) and an independent test set (n = 51; 14 subsidence). Vertebral bodies adjacent to the target level were segmented on preoperative CT, and high-throughput radiomic features were extracted with PyRadiomics. Features were z-score-normalized, then reduced by variance, correlation and LASSO filtering. Age, vertebral Hounsfield units (HU) and T1-slope entered a clinical model. Eight classifiers were tuned by cross-validation; performance was assessed by AUC and related metrics, with thresholds optimized on the training cohort. Subsidence patients were older and had lower HU and higher T1-slope (all P < 0.05). LASSO retained 11 radiomic features. In the independent test set, the clinical model had limited discrimination (AUC 0.595). The radiomics model improved performance (AUC 0.775; sensitivity 100%; specificity 60%). The combined model performed best (AUC 0.813; sensitivity 80%; specificity 80%) and surpassed both single-source models (P < 0.05). A preoperative model integrating CT-based radiomic signatures with key clinical variables predicts cage subsidence after ACDF with good accuracy. This tool may facilitate individualized risk stratification and guide strategies, such as endplate protection, implant choice and bone-quality optimization, to mitigate subsidence risk. Multicentre prospective validation is warranted.
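The normalisation and variance/correlation reduction steps that precede LASSO in pipelines like the one above can be sketched with NumPy. The thresholds and the tie-breaking rule (keep the earlier of two correlated columns) are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def prefilter_features(X, var_tol=1e-8, corr_tol=0.95):
    """z-score features, drop near-constant columns, then drop one column of
    each highly correlated pair -- the reduction steps that precede LASSO."""
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)   # z-score normalisation
    X = X[:, X.std(axis=0) > var_tol]                    # variance filter
    corr = np.abs(np.corrcoef(X, rowvar=False))
    drop = set()
    for i in range(corr.shape[0]):
        for j in range(i + 1, corr.shape[1]):
            if i not in drop and j not in drop and corr[i, j] > corr_tol:
                drop.add(j)                              # keep the earlier column
    keep = [c for c in range(X.shape[1]) if c not in drop]
    return X[:, keep]

rng = np.random.default_rng(0)
base = rng.normal(size=(100, 3))                         # three informative features
X = np.column_stack([base, base[:, 0], np.ones(100)])    # plus a duplicate and a constant
print(prefilter_features(X).shape)  # → (100, 3)
```

The surviving matrix would then feed an L1-penalised model (e.g. LASSO logistic regression) for the final selection of the 11 retained features.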

Performance of artificial intelligence in automated measurement of patellofemoral joint parameters: a systematic review.

Zhan H, Zhao Z, Liang Q, Zheng J, Zhang L

PubMed · Sep 26, 2025
The evaluation of patellofemoral joint parameters is essential for diagnosing patellar dislocation, yet manual measurements show poor reproducibility and vary substantially with clinician expertise. This systematic review aimed to evaluate the performance of artificial intelligence (AI) models in automatically measuring patellofemoral joint parameters. A comprehensive literature search of PubMed, Web of Science, Cochrane Library, and Embase databases was conducted from database inception through June 15, 2025. Two investigators independently performed study screening and data extraction, with methodological quality assessment based on the modified MINORS checklist. A narrative review was conducted to summarize the findings of the included studies. A total of 19 studies comprising 10,490 patients met the inclusion criteria, with a mean age of 51.3 years and a mean female proportion of 56.8%. Among these, six studies developed AI models based on radiographic series, nine on CT imaging, and four on MRI. The results demonstrated excellent reliability, with intraclass correlation coefficients (ICCs) ranging from 0.900 to 0.940 for femoral anteversion angle, 0.910 to 0.920 for trochlear groove depth and 0.930 to 0.950 for tibial tuberosity-trochlear groove distance. Good reliability was also observed for patellar height (ICCs 0.880-0.985), sulcus angle (ICCs 0.878-0.980), and patellar tilt angle (ICCs 0.790-0.990). Notably, the AI system successfully detected trochlear dysplasia, achieving 88% accuracy, 79% sensitivity, 96% specificity, and an AUC of 0.88. AI-based measurement of patellofemoral joint parameters demonstrates methodological robustness and operational efficiency, showing strong agreement with expert manual measurements. To further establish clinical utility, multicenter prospective studies incorporating rigorous external validation protocols are needed; such validation would strengthen the models' generalizability and facilitate integration into clinical decision support systems. This systematic review was registered in PROSPERO (CRD420251075068).
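The ICCs quoted above quantify agreement between automated and manual measurements. As an illustration, a common variant, ICC(2,1) (two-way random effects, absolute agreement, single rater), can be computed from the ANOVA mean squares; the review does not specify which ICC form each study used, so this particular form is an assumption.

```python
import numpy as np

def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    ratings: (n_subjects, k_raters) array of measurements."""
    n, k = ratings.shape
    grand = ratings.mean()
    ss_rows = k * ((ratings.mean(axis=1) - grand) ** 2).sum()   # between-subject
    ss_cols = n * ((ratings.mean(axis=0) - grand) ** 2).sum()   # between-rater
    ss_err = ((ratings - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                                     # subjects mean square
    msc = ss_cols / (k - 1)                                     # raters mean square
    mse = ss_err / ((n - 1) * (k - 1))                          # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

x = np.arange(10.0)
print(round(icc2_1(np.column_stack([x, x])), 3))  # two identical raters → 1.0
```

Values above roughly 0.90, as reported for most parameters in the review, are conventionally read as excellent reliability.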

MRI grading of lumbar disc herniation based on AFFM-YOLOv8 system.

Wang Y, Yang Z, Cai S, Wu W, Wu W

PubMed · Sep 25, 2025
Magnetic resonance imaging (MRI) is the clinical gold standard for diagnosing lumbar disc herniation (LDH). This multicenter study aimed to develop and clinically validate a deep learning (DL) model that uses axial T2-weighted lumbar MRI sequences to automate LDH detection according to the Michigan State University (MSU) morphological classification criteria. A total of 8428 patients (100000 axial lumbar MRIs) were analyzed, with spinal surgeons annotating the datasets per the MSU criteria, which classify LDH into 11 subtypes based on morphology and neural compression severity. A DL architecture integrating adaptive multi-scale feature fusion, termed AFFM-YOLOv8, was developed. Model performance was validated against radiologists' annotations using accuracy, precision, recall, F1-score, and Cohen's κ (95% confidence intervals). The proposed model demonstrated superior diagnostic performance with a 91.01% F1-score (a 3.05% improvement over baseline) and a 3% recall enhancement across all evaluation metrics. For surgical indication prediction, strong inter-rater agreement was achieved with senior surgeons (κ = 0.91, 95% CI 90.6-91.4) and residents (κ = 0.89, 95% CI 88.5-89.4), reaching consensus levels comparable to expert-to-expert agreement (senior surgeons: κ = 0.89; residents: κ = 0.87). This study establishes a DL framework for automated LDH diagnosis using large-scale axial MRI datasets. The model achieves clinician-level accuracy in MSU-compliant classification, addressing key limitations of prior binary classification systems. By providing granular spatial and morphological insights, this tool holds promise for standardizing LDH assessment and reducing diagnostic delays in resource-constrained settings.
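The κ values above are Cohen's kappa, which corrects raw agreement between two raters for agreement expected by chance. A minimal sketch (the labels and values below are toy data, not the study's):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa: agreement between two raters corrected for chance."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n                 # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[label] * cb[label] for label in ca) / n ** 2   # chance agreement
    return (po - pe) / (1 - pe)

# Two raters label six cases as surgical (1) or non-surgical (0).
print(round(cohens_kappa([1, 0, 1, 1, 0, 1], [1, 0, 1, 0, 0, 1]), 3))  # → 0.667
```

By the usual Landis-Koch bands, κ above 0.80 (as reported for model-surgeon agreement) is read as almost perfect agreement.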

AI demonstrates comparable diagnostic performance to radiologists in MRI detection of anterior cruciate ligament tears: a systematic review and meta-analysis.

Gill SS, Haq T, Zhao Y, Ristic M, Amiras D, Gupte CM

PubMed · Sep 25, 2025
Anterior cruciate ligament (ACL) injuries are among the most common knee injuries, affecting 1 in 3500 people annually. With rising rates of ACL tears, particularly in children, timely diagnosis is critical. This study evaluates the effectiveness of artificial intelligence (AI) in diagnosing and classifying ACL tears on MRI through a systematic review and meta-analysis, comparing AI performance with clinicians and assessing radiomic and non-radiomic models. Major databases were searched for AI models diagnosing ACL tears on MRI. Thirty-six studies, representing 52 models, were included. Accuracy, sensitivity, and specificity metrics were extracted, and pooled estimates were calculated using a random-effects model. Subgroup analyses compared MRI sequences, ground truths, AI versus clinician performance, and radiomic versus non-radiomic models. This study was conducted in line with Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) protocols. AI demonstrated strong diagnostic performance, with pooled accuracy, sensitivity, and specificity of 87.37%, 90.73%, and 91.34%, respectively; classification models achieved pooled metrics of 90.46%, 88.68%, and 94.08%. Radiomic models outperformed non-radiomic models, and AI demonstrated comparable performance to clinicians in key metrics. Three-dimensional (3D) proton density fat suppression (PDFS) sequences with < 2 mm slice thickness yielded the most promising results, despite small sample sizes, with arthroscopy favoured as the reference standard. Despite high heterogeneity (I² > 90%), AI models demonstrate diagnostic performance comparable to clinicians and may serve as valuable adjuncts in ACL tear detection, pending prospective validation. However, substantial heterogeneity and limited interpretability remain key challenges. Further research and standardised evaluation frameworks are needed to support clinical integration.
Question: Is AI effective and accurate in diagnosing and classifying anterior cruciate ligament (ACL) tears on MRI?
Findings: AI demonstrated high accuracy (87.37%), sensitivity (90.73%), and specificity (91.34%) in ACL tear diagnosis, matching or surpassing clinicians. Radiomic models outperformed non-radiomic approaches.
Clinical relevance: AI can enhance the accuracy of ACL tear diagnosis, reducing misdiagnoses and supporting clinicians, especially in resource-limited settings. Its integration into clinical workflows may streamline MRI interpretation, reduce diagnostic delays, and improve patient outcomes by optimising management.
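Random-effects pooling of per-study estimates, as used for the pooled accuracy, sensitivity, and specificity figures above, is typically done with the DerSimonian-Laird estimator; the abstract does not name the estimator, so this choice and the example numbers are assumptions.

```python
import math

def pooled_random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling of per-study estimates.
    Returns the pooled effect and its standard error."""
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)         # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star))

# Three hypothetical per-study sensitivities with their sampling variances.
est, se = pooled_random_effects([0.87, 0.92, 0.90], [0.0012, 0.0008, 0.0010])
print(round(est, 3))  # → 0.9
```

When between-study heterogeneity is high (I² > 90%, as reported here), tau² dominates the weights, so each study contributes more nearly equally and the pooled confidence interval widens.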

A Deep Learning-Based Fully Automated Vertebra Segmentation and Labeling Workflow.

Lu H, Liu M, Yu K, Fang Y, Zhao J, Shi Y

PubMed · Sep 25, 2025
<b>Aims/Background</b> Spinal disorders, such as herniated discs and scoliosis, are highly prevalent conditions with rising incidence in the aging global population. Accurate analysis of spinal anatomical structures is a critical prerequisite for achieving high-precision positioning with surgical navigation robots. However, traditional manual segmentation methods are limited by issues such as low efficiency and poor consistency. This work aims to develop a fully automated deep learning-based vertebral segmentation and labeling workflow to provide efficient and accurate preoperative analysis support for spine surgery navigation robots. <b>Methods</b> In the localization stage, the You Only Look Once version 7 (YOLOv7) network was utilized to predict the bounding boxes of individual vertebrae on computed tomography (CT) sagittal slices, transforming the 3D localization problem into a 2D one. Subsequently, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) clustering algorithm was employed to aggregate the 2D detection results into 3D vertebral centers. This approach significantly reduces inference time and enhances localization accuracy. In the segmentation stage, a 3D U-Net model integrated with an attention mechanism was trained using the region of interest (ROI) based on the vertebral center as input, effectively extracting the 3D structural features of vertebrae to achieve precise segmentation. In the labeling stage, a vertebra labeling network was trained by combining deep learning architectures-ResNet and Transformer, which are capable of extracting rich intervertebral features, to obtain the final labeling results through post-processing based on positional logic analysis. To verify the effectiveness of this workflow, experiments were conducted on a dataset comprising 106 spinal CT datasets sourced from various devices, covering a wide range of clinical scenarios. 
<b>Results</b> The method performed well across the three key tasks of localization, segmentation, and labeling, with a Mean Localization Error (MLE) of 1.42 mm. Segmentation accuracy metrics included a Dice Similarity Coefficient (DSC) of 0.968 ± 0.014, Intersection over Union (IoU) of 0.879 ± 0.018, Pixel Accuracy (PA) of 0.988 ± 0.005, mean symmetric distance (MSD) of 1.09 ± 0.19 mm, and Hausdorff Distance (HD) of 5.42 ± 2.05 mm. Vertebra labeling accuracy reached 94.36%. <b>Conclusion</b> These quantitative assessments and visualizations confirm the effectiveness of our method across vertebra localization, segmentation and labeling, indicating its potential for deployment in spinal surgery navigation robots to provide accurate and efficient preoperative analysis and navigation support for spinal surgeries.
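The aggregation step, clustering per-slice 2D detections into 3D vertebral centres, can be sketched with a simple greedy density grouping. This is a simplified stand-in for the DBSCAN step named in the abstract, and the `eps`/`min_pts` values and coordinate layout are illustrative assumptions.

```python
import numpy as np

def aggregate_centers(dets, eps=10.0, min_pts=3):
    """Greedily group per-slice 2D detection centres into 3D vertebral centres
    (a simplified stand-in for DBSCAN; eps and min_pts are illustrative).

    dets: (N, 3) array of box centres, one row per sagittal-slice detection.
    """
    dets = np.asarray(dets, dtype=float)
    unassigned = list(range(len(dets)))
    centers = []
    while unassigned:
        cluster = [unassigned.pop(0)]                    # seed a new cluster
        grew = True
        while grew:                                      # density-reachable expansion
            grew = False
            for i in list(unassigned):
                if min(np.linalg.norm(dets[i] - dets[j]) for j in cluster) < eps:
                    cluster.append(i)
                    unassigned.remove(i)
                    grew = True
        if len(cluster) >= min_pts:                      # discard sparse noise
            centers.append(dets[cluster].mean(axis=0))
    return np.array(centers)

a = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
dets = np.vstack([a, a + 100.0, [[50.0, 50.0, 50.0]]])   # two vertebrae + one spurious box
print(aggregate_centers(dets).shape)  # → (2, 3)
```

The `min_pts` filter is what gives this scheme (and DBSCAN proper) its robustness to isolated false-positive boxes on single slices.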

Revolutionizing Precise Low Back Pain Diagnosis via Contrastive Learning

Thanh Binh Le, Hoang Nhat Khang Vo, Tan-Ha Mai, Trong Nhan Phan

arXiv preprint · Sep 25, 2025
Low back pain affects millions worldwide, driving the need for robust diagnostic models that can jointly analyze complex medical images and accompanying text reports. We present LumbarCLIP, a novel multimodal framework that leverages contrastive language-image pretraining to align lumbar spine MRI scans with corresponding radiological descriptions. Built upon a curated dataset containing axial MRI views paired with expert-written reports, LumbarCLIP integrates vision encoders (ResNet-50, Vision Transformer, Swin Transformer) with a BERT-based text encoder to extract dense representations. These are projected into a shared embedding space via learnable projection heads, configurable as linear or non-linear, and normalized to facilitate stable contrastive training using a soft CLIP loss. Our model achieves state-of-the-art performance on downstream classification, reaching up to 95.00% accuracy and 94.75% F1-score on the test set, despite inherent class imbalance. Extensive ablation studies demonstrate that linear projection heads yield more effective cross-modal alignment than non-linear variants. LumbarCLIP offers a promising foundation for automated musculoskeletal diagnosis and clinical decision support.
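The contrastive objective described above can be sketched as the standard symmetric CLIP loss over a batch of projected, normalised embeddings; matched scan/report pairs sit on the diagonal of the similarity matrix. The paper's "soft" variant may smooth the targets, so this hard-target version and the temperature value are assumptions.

```python
import numpy as np

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric CLIP-style contrastive loss; matched pairs are the diagonal."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature                   # (B, B) similarity matrix

    def xent_diag(l):                                    # cross-entropy vs. identity targets
        l = l - l.max(axis=1, keepdims=True)
        p = np.exp(l) / np.exp(l).sum(axis=1, keepdims=True)
        return -np.log(np.diag(p)).mean()

    return 0.5 * (xent_diag(logits) + xent_diag(logits.T))

pairs = np.eye(4)                                        # perfectly aligned toy batch
print(clip_loss(pairs, pairs) < clip_loss(pairs, pairs[::-1]))  # → True
```

Minimising this loss pulls each scan embedding toward its own report and away from the other reports in the batch, which is what makes the shared space useful for the downstream classifier.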

Proof-of-concept comparison of an artificial intelligence-based bone age assessment tool with Greulich-Pyle and Tanner-Whitehouse version 2 methods in a pediatric cohort.

Marinelli L, Lo Mastro A, Grassi F, Berritto D, Russo A, Patanè V, Festa A, Grassi E, Grandone A, Nasto LA, Pola E, Reginelli A

PubMed · Sep 25, 2025
Bone age assessment is essential in evaluating pediatric growth disorders. Artificial intelligence (AI) systems offer potential improvements in accuracy and reproducibility compared to traditional methods. To compare the performance of a commercially available AI-based software (BoneView BoneAge, Gleamer, Paris, France) against two human-assessed methods, the Greulich-Pyle (GP) atlas and Tanner-Whitehouse version 2 (TW2), in a pediatric population. This proof-of-concept study included 203 pediatric patients (mean age, 9.0 years; range, 2.0-17.0 years) who underwent hand and wrist radiographs for suspected endocrine or growth-related conditions. After excluding technically inadequate images, 157 cases were analyzed using the AI and GP-assessed methods. A subset of 35 patients was also evaluated using the TW2 method by a pediatric endocrinologist. Performance was measured using mean absolute error (MAE), root mean square error (RMSE), bias, and Pearson's correlation coefficient, with chronological age as the reference. The AI model achieved an MAE of 1.38 years, comparable to the radiologist's GP-based estimate (MAE, 1.30 years), and superior to TW2 (MAE, 2.86 years). RMSE values were 1.75 years, 1.80 years, and 3.88 years, respectively. AI showed minimal bias (-0.05 years), while TW2-based assessments systematically underestimated bone age (bias, -2.63 years). Strong correlations with chronological age were observed for AI (r=0.857) and GP (r=0.894), but not for TW2 (r=0.490). BoneView demonstrated accuracy comparable to the radiologist-assessed GP method and outperformed TW2 assessments in this cohort. AI-based systems may enhance consistency in pediatric bone age estimation but require careful validation, especially in ethnically diverse populations.
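The three error metrics used above differ in what they penalise: MAE averages absolute errors, RMSE weights large errors more, and bias (the mean signed error) reveals systematic over- or under-estimation such as TW2's. A minimal sketch with toy values:

```python
import math

def agreement_metrics(pred, ref):
    """MAE, RMSE and bias (mean signed error) of predicted vs. reference ages."""
    errs = [p - r for p, r in zip(pred, ref)]
    mae = sum(abs(e) for e in errs) / len(errs)
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    bias = sum(errs) / len(errs)
    return mae, rmse, bias

# Toy: two bone-age estimates against chronological age.
print(agreement_metrics([10.0, 8.0], [9.0, 9.0]))  # → (1.0, 1.0, 0.0)
```

Note how bias can be zero while MAE is large: symmetric over- and under-estimates cancel in the signed mean, which is why the study reports both.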

In-context learning enables large language models to achieve human-level performance in spinal instability neoplastic score classification from synthetic CT and MRI reports.

Russe MF, Reisert M, Fink A, Hohenhaus M, Nakagawa JM, Wilpert C, Simon CP, Kotter E, Urbach H, Rau A

PubMed · Sep 24, 2025
To assess the performance of state-of-the-art large language models in classifying vertebral metastasis stability using the Spinal Instability Neoplastic Score (SINS) compared to human experts, and to evaluate the impact of task-specific refinement including in-context learning on their performance. This retrospective study analyzed 100 synthetic CT and MRI reports encompassing a broad range of SINS scores. Four human experts (two radiologists and two neurosurgeons) and four large language models (Mistral, Claude, GPT-4 turbo, and GPT-4o) evaluated the reports. Large language models were tested in both generic form and with task-specific refinement. Performance was assessed based on correct SINS category assignment and attributed SINS points. Human experts demonstrated high median performance in SINS classification (98.5% correct) and points calculation (92% correct), with a median point offset of 0 [0-0]. Generic large language models performed poorly with 26-63% correct category and 4-15% correct SINS points allocation. In-context learning significantly improved chatbot performance to near-human levels (96-98/100 correct for classification, 86-95/100 for scoring, no significant difference to human experts). Refined large language models performed 71-85% better in SINS points allocation. In-context learning enables state-of-the-art large language models to perform at near-human expert levels in SINS classification, offering potential for automating vertebral metastasis stability assessment. The poor performance of generic large language models highlights the importance of task-specific refinement in medical applications of artificial intelligence.
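In-context learning here amounts to prepending worked SINS examples to the report before querying the model. A minimal sketch of the prompt assembly; the function name, instruction wording, and example values are all hypothetical, not the authors' actual prompt.

```python
def build_sins_prompt(examples, report):
    """Assemble a few-shot (in-context) prompt: worked SINS examples are
    prepended before the new report to be scored."""
    parts = ["Score the following spine report with the Spinal Instability "
             "Neoplastic Score (SINS) and state the stability category."]
    for example_report, score, category in examples:
        parts.append(f"Report: {example_report}\nSINS: {score} ({category})")
    parts.append(f"Report: {report}\nSINS:")
    return "\n\n".join(parts)

# One illustrative worked example, then the report the model should score.
shots = [("Lytic lesion of L3 with vertebral body collapse and mechanical pain.",
          13, "unstable")]
prompt = build_sins_prompt(shots, "Blastic lesion of T7, no collapse, no pain.")
print(prompt.endswith("SINS:"))  # → True
```

The resulting string would be sent to the chatbot as a single message; the worked examples are what lifted the models from 26-63% to near-human accuracy in the study.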