Page 24 of 33328 results

Deep learning-enhanced zero echo time MRI for glenohumeral assessment in shoulder instability: a comparative study with CT.

Carretero-Gómez L, Fung M, Wiesinger F, Carl M, McKinnon G, de Arcos J, Mandava S, Arauz S, Sánchez-Lacalle E, Nagrani S, López-Alcorocho JM, Rodríguez-Íñigo E, Malpica N, Padrón M

PubMed · Jun 1, 2025
To evaluate image quality and lesion conspicuity of zero echo time (ZTE) MRI reconstructed with deep learning (DL)-based algorithm versus conventional reconstruction and to assess DL ZTE performance against CT for bone loss measurements in shoulder instability. Forty-four patients (9 females; 33.5 ± 15.65 years) with symptomatic anterior glenohumeral instability and no previous shoulder surgery underwent ZTE MRI and CT on the same day. ZTE images were reconstructed with conventional and DL methods and post-processed for CT-like contrast. Two musculoskeletal radiologists, blinded to the reconstruction method, independently evaluated 20 randomized MR ZTE datasets with and without DL-enhancement for perceived signal-to-noise ratio, resolution, and lesion conspicuity at humerus and glenoid using a 4-point Likert scale. Inter-reader reliability was assessed using weighted Cohen's kappa (K). An ordinal logistic regression model analyzed Likert scores, with the reconstruction method (DL-enhanced vs. conventional) as the predictor. Glenoid track (GT) and Hill-Sachs interval (HSI) measurements were performed by another radiologist on both DL ZTE and CT datasets. Intermodal agreement was assessed through intraclass correlation coefficients (ICCs) and Bland-Altman analysis. DL ZTE MR bone images scored higher than conventional ZTE across all items, with significantly improved perceived resolution (odds ratio (OR) = 7.67, p = 0.01) and glenoid lesion conspicuity (OR = 25.12, p = 0.01), with substantial inter-rater agreement (K = 0.61 (0.38-0.83) to 0.77 (0.58-0.95)). Inter-modality assessment showed almost perfect agreement between DL ZTE MR and CT for all bone measurements (overall ICC = 0.99 (0.97-0.99)), with mean differences of 0.08 (- 0.80 to 0.96) mm for GT and - 0.07 (- 1.24 to 1.10) mm for HSI. 
DL-based reconstruction enhances ZTE MRI quality for glenohumeral assessment, offering osseous evaluation and quantification equivalent to gold-standard CT, potentially simplifying the preoperative workflow and reducing CT radiation exposure.
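The inter-modality agreement above rests on Bland-Altman analysis: the bias is the mean of the paired differences between the two methods, and the 95% limits of agreement are bias ± 1.96 SD. A minimal sketch, with illustrative glenoid track values rather than the study's data:

```python
import statistics

def bland_altman(method_a, method_b):
    """Return bias (mean paired difference) and 95% limits of agreement."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Hypothetical paired glenoid track measurements (mm): DL ZTE MRI vs. CT
mri = [24.1, 25.3, 23.8, 26.0, 24.7]
ct = [24.0, 25.1, 24.2, 25.8, 24.5]
bias, (lower, upper) = bland_altman(mri, ct)
```

A bias near zero with narrow limits, as reported for GT and HSI above, indicates the two modalities can be used interchangeably for these measurements.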

Regions of interest in opportunistic computed tomography-based screening for osteoporosis: impact on short-term in vivo precision.

Park J, Kim Y, Hong S, Chee CG, Lee E, Lee JW

PubMed · Jun 1, 2025
To determine an optimal region of interest (ROI) for opportunistic screening of osteoporosis in terms of short-term in vivo diagnostic precision. We included patients who underwent two CT scans and one dual-energy X-ray absorptiometry scan within a month in 2022. Deep-learning software automatically measured the attenuation in L1 using 54 ROIs (three slice thicknesses × six shapes × three intravertebral levels). To identify factors associated with a lower attenuation difference between the two CT scans, mixed-effect model analysis was performed with ROI-level (slice thickness, shape, intravertebral level) and patient-level (age, sex, patient diameter, change in CT machine) factors. The root-mean-square standard deviation (RMSSD) and area under the receiver-operating-characteristic curve (AUROC) were calculated. In total, 73 consecutive patients (mean age ± standard deviation, 69 ± 9 years, 38 women) were included. A lower attenuation difference was observed in ROIs in images with slice thicknesses of 1 and 3 mm than in images with a slice thickness of 5 mm (p < .001), in large and elliptical ROIs (p = .007 and < .001, respectively), and in mid- or cranial-level ROIs than in caudal-level ROIs (p < .001). No patient-level factors were significantly associated with the attenuation difference. Large, elliptical ROIs placed at the mid-level of L1 on images with 1- or 3-mm slice thicknesses yielded RMSSDs of 12.4-12.5 HU and AUROCs of 0.90. The largest possible regions of interest drawn in the mid-level trabecular portion of the L1 vertebra on thin-slice images may yield improvements in the precision of opportunistic screening for osteoporosis via CT.
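The precision metric above, RMSSD, pools per-patient variability from the repeat scans; with two scans per subject, the per-patient variance reduces to half the squared difference (the convention used in densitometry precision studies). A minimal sketch with illustrative HU pairs, not the study's data:

```python
import math

def rmssd(pairs):
    """Root-mean-square SD over repeat-scan pairs: with two scans per
    patient, the per-patient variance is (x1 - x2)**2 / 2."""
    variances = [(x1 - x2) ** 2 / 2 for x1, x2 in pairs]
    return math.sqrt(sum(variances) / len(variances))

# Hypothetical paired L1 attenuation measurements (HU) from two CT scans
scan_pairs = [(142.0, 138.5), (120.3, 122.1), (155.7, 150.2)]
precision_hu = rmssd(scan_pairs)
```

A smaller RMSSD means a smaller least significant change, which is why the study favors the ROI configuration minimizing it.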

Automatic 3-dimensional analysis of posterosuperior full-thickness rotator cuff tear size on magnetic resonance imaging.

Hess H, Gussarow P, Rojas JT, Zumstein MA, Gerber K

PubMed · Jun 1, 2025
Tear size and shape are known to prognosticate the efficacy of surgical rotator cuff (RC) repair; however, current manual measurements on magnetic resonance images (MRIs) exhibit high interobserver variabilities and exclude 3-dimensional (3D) morphologic information. This study aimed to develop algorithms for automatic 3D analyses of posterosuperior full-thickness RC tears to enable efficient and precise tear evaluation and 3D tear visualization. A deep-learning network for automatic segmentation of the tear region in coronal and sagittal multicenter MRI was trained with manually segmented (consensus of 3 experts) proton density- and T2-weighted MRI of shoulders with full-thickness posterosuperior tears (n = 200). Algorithms for automatic measurement of tendon retraction, tear width, tear area, and automatic Patte classification considering the 3D morphology of the shoulder were implemented and evaluated against manual segmentation (n = 59). Automatic Patte classification was calculated using automatically segmented humerus and scapula on T1-weighted MRI of the same shoulders. Tears were automatically segmented, enabling 3D visualization of the tear, with a mean Dice coefficient of 0.58 ± 0.21 compared to an interobserver variability of 0.46 ± 0.21. The mean absolute errors of automatic tendon retraction and tear width measurements (4.98 ± 4.49 mm and 3.88 ± 3.18 mm) were lower than the interobserver variabilities (5.42 ± 7.09 mm and 5.92 ± 1.02 mm). The correlations of all measurements performed on automatic tear segmentations compared with those on consensus segmentations were higher than the interobserver correlation. Automatic Patte classification achieved a Cohen kappa value of 0.62, compared with the interobserver variability of 0.56. Retraction calculated using standard linear measures underestimated the tear size relative to measurements considering the curved shape of the humeral head, especially for larger tears.
Even on highly heterogeneous data, the proposed algorithms demonstrated the feasibility of automating tear size analysis and enabling automatic 3D visualization of the tear situation. The presented algorithms standardize cross-center tear analyses and enable the calculation of additional metrics, potentially improving the predictive power of image-based tear measurements for the outcome of surgical treatments, thus aiding in RC tear diagnosis, treatment decision, and planning.
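The Dice coefficient reported above measures overlap between the automatic and consensus segmentations: twice the intersection divided by the summed mask sizes. A minimal sketch on flat binary masks (the mask values are illustrative):

```python
def dice(pred, ref):
    """Dice similarity: 2*|A∩B| / (|A| + |B|) for binary masks."""
    inter = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(pred) + sum(ref)
    return 2 * inter / total if total else 1.0

# Hypothetical flattened tear masks: model prediction vs. expert consensus
pred = [1, 1, 0, 1, 0, 0]
ref = [1, 0, 0, 1, 1, 0]
score = dice(pred, ref)
```

Comparing the model's Dice against the interobserver Dice, as the study does, contextualizes 0.58 as exceeding human-vs-human agreement rather than falling short of perfection.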

The role of deep learning in diagnostic imaging of spondyloarthropathies: a systematic review.

Omar M, Watad A, McGonagle D, Soffer S, Glicksberg BS, Nadkarni GN, Klang E

PubMed · Jun 1, 2025
Diagnostic imaging is an integral part of identifying spondyloarthropathies (SpA), yet the interpretation of these images can be challenging. This review evaluated the use of deep learning models to enhance the diagnostic accuracy of SpA imaging. Following PRISMA guidelines, we systematically searched major databases up to February 2024, focusing on studies that applied deep learning to SpA imaging. Performance metrics, model types, and diagnostic tasks were extracted and analyzed. Study quality was assessed using QUADAS-2. We analyzed 21 studies employing deep learning in SpA imaging diagnosis across MRI, CT, and X-ray modalities. These models, particularly advanced CNNs and U-Nets, demonstrated high accuracy in diagnosing SpA, differentiating arthritis forms, and assessing disease progression. Performance metrics frequently surpassed traditional methods, with some models achieving AUCs up to 0.98 and matching expert radiologist performance. This systematic review underscores the effectiveness of deep learning in SpA imaging diagnostics across MRI, CT, and X-ray modalities. The studies reviewed demonstrated high diagnostic accuracy. However, the presence of small sample sizes in some studies highlights the need for more extensive datasets and further prospective and external validation to enhance the generalizability of these AI models.
Question: How can deep learning models improve diagnostic accuracy in imaging for spondyloarthropathies (SpA), addressing challenges in early detection and differentiation from other forms of arthritis?
Findings: Deep learning models, especially CNNs and U-Nets, showed high accuracy in SpA imaging across MRI, CT, and X-ray, often matching or surpassing expert radiologists.
Clinical relevance: Deep learning models can enhance diagnostic precision in SpA imaging, potentially reducing diagnostic delays and improving treatment decisions, but further validation on larger datasets is required for clinical integration.

Age-dependent changes in CT vertebral attenuation values in opportunistic screening for osteoporosis: a nationwide multi-center study.

Kim Y, Kim HY, Lee S, Hong S, Lee JW

PubMed · Jun 1, 2025
To examine how vertebral attenuation changes with aging, and to establish age-adjusted CT attenuation value cutoffs for diagnosing osteoporosis. This multi-center retrospective study included 11,246 patients (mean age ± standard deviation, 50 ± 13 years; 7139 men) who underwent CT and dual-energy X-ray absorptiometry (DXA) in six health-screening centers between 2022 and 2023. Using deep-learning-based software, attenuation values of L1 vertebral bodies were measured. Segmented linear regression in women and simple linear regression in men were used to assess how attenuation values change with aging. A multivariable linear regression analysis was performed to determine whether age is associated with CT attenuation values independently of the DXA T-score. Age-adjusted cutoffs targeting either 90% sensitivity or 90% specificity were derived using quantile regression. Performance of both age-adjusted and age-unadjusted cutoffs was measured, where the target sensitivity or specificity was considered achieved if a 95% confidence interval encompassed 90%. While attenuation values declined consistently with age in men, they declined abruptly in women aged > 42 years. Such decline occurred independently of the DXA T-score (p < 0.001). Age adjustment seemed critical for age ≥ 65 years, where the age-adjusted cutoffs achieved the target (sensitivity of 91.5% (86.3-95.2%) when targeting 90% sensitivity and specificity of 90.0% (88.3-91.6%) when targeting 90% specificity), but age-unadjusted cutoffs did not (95.5% (91.2-98.0%) and 73.8% (71.4-76.1%), respectively). Age-adjusted cutoffs provided a more reliable diagnosis of osteoporosis than age-unadjusted cutoffs since vertebral attenuation values decrease with age, regardless of DXA T-scores.
Question: How does vertebral CT attenuation change with age?
Findings: Independent of dual-energy X-ray absorptiometry T-score, vertebral attenuation values on CT declined at a constant rate in men and abruptly in women over 42 years of age.
Clinical relevance: Age adjustments are needed in opportunistic osteoporosis screening, especially among the elderly.
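Conceptually, a cutoff targeting 90% sensitivity must sit at or above 90% of attenuation values in the DXA-confirmed osteoporotic group; the study derives this smoothly over age with quantile regression. A simplified, age-binned sketch, where the rule "call osteoporosis when HU <= cutoff" and all values are assumptions for illustration:

```python
import math

def cutoff_for_sensitivity(osteoporotic_hu, target=0.90):
    """Smallest cutoff (osteoporosis called when HU <= cutoff) that
    reaches at least `target` sensitivity in this stratum."""
    vals = sorted(osteoporotic_hu)
    k = math.ceil(target * len(vals))  # need at least k cases at/below cutoff
    return vals[k - 1]

# Hypothetical L1 attenuation (HU) of osteoporotic patients aged >= 65
hu_65_plus = [62, 71, 55, 80, 68, 74, 59, 66, 90, 77]
cutoff = cutoff_for_sensitivity(hu_65_plus)
sensitivity = sum(v <= cutoff for v in hu_65_plus) / len(hu_65_plus)
```

Because attenuation falls with age independently of T-score, a single all-ages cutoff drifts off-target in the elderly, which is the failure mode the age-adjusted cutoffs correct.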

Comparative diagnostic accuracy of ChatGPT-4 and machine learning in differentiating spinal tuberculosis and spinal tumors.

Hu X, Xu D, Zhang H, Tang M, Gao Q

PubMed · Jun 1, 2025
In clinical practice, distinguishing between spinal tuberculosis (STB) and spinal tumors (ST) poses a significant diagnostic challenge. The application of AI-driven large language models (LLMs) shows great potential for improving the accuracy of this differential diagnosis. To evaluate the performance of various machine learning models and ChatGPT-4 in distinguishing between STB and ST. A retrospective cohort study. A total of 143 STB cases and 153 ST cases admitted to Xiangya Hospital Central South University, from January 2016 to June 2023 were collected. This study incorporates basic patient information, standard laboratory results, serum tumor markers, and comprehensive imaging records, including Magnetic Resonance Imaging (MRI) and Computed Tomography (CT), for individuals diagnosed with STB and ST. Machine learning techniques and ChatGPT-4 were utilized to distinguish between STB and ST separately. Six distinct machine learning models, along with ChatGPT-4, were employed to evaluate their differential diagnostic effectiveness. Among the 6 machine learning models, the Gradient Boosting Machine (GBM) algorithm model demonstrated the highest differential diagnostic efficiency. In the training cohort, the GBM model achieved a sensitivity of 98.84% and a specificity of 100.00% in distinguishing STB from ST. In the testing cohort, its sensitivity was 98.25%, and specificity was 91.80%. ChatGPT-4 exhibited a sensitivity of 70.37% and a specificity of 90.65% for differential diagnosis. In single-question cases, ChatGPT-4's sensitivity and specificity were 71.67% and 92.55%, respectively, while in re-questioning cases, they were 44.44% and 76.92%. The GBM model demonstrates significant value in the differential diagnosis of STB and ST, whereas the diagnostic performance of ChatGPT-4 remains suboptimal.
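The sensitivities and specificities reported above follow from confusion-matrix counts; a minimal sketch (treating STB as the positive class is an assumption, and the labels are illustrative, not the cohort's data):

```python
def sens_spec(y_true, y_pred):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP).
    Here 1 = STB (assumed positive class), 0 = ST."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical model outputs over eight cases
truth = [1, 1, 1, 1, 0, 0, 0, 0]
preds = [1, 1, 1, 0, 0, 0, 1, 0]
sens, spec = sens_spec(truth, preds)
```

Reporting both on a held-out testing cohort, as the study does for the GBM model, guards against the near-perfect training-set figures overstating real performance.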

Generative artificial intelligence enables the generation of bone scintigraphy images and improves generalization of deep learning models in data-constrained environments.

Haberl D, Ning J, Kluge K, Kumpf K, Yu J, Jiang Z, Constantino C, Monaci A, Starace M, Haug AR, Calabretta R, Camoni L, Bertagna F, Mascherbauer K, Hofer F, Albano D, Sciagra R, Oliveira F, Costa D, Nitsche C, Hacker M, Spielvogel CP

PubMed · Jun 1, 2025
Advancements of deep learning in medical imaging are often constrained by the limited availability of large, annotated datasets, resulting in underperforming models when deployed under real-world conditions. This study investigated a generative artificial intelligence (AI) approach to create synthetic medical images, taking the example of bone scintigraphy scans, to increase the data diversity of small-scale datasets for more effective model training and improved generalization. We trained a generative model on 99mTc-bone scintigraphy scans from 9,170 patients in one center to generate high-quality and fully anonymized annotated scans of patients representing two distinct disease patterns: (i) abnormal uptake indicative of bone metastases and (ii) cardiac uptake indicative of cardiac amyloidosis. A blinded reader study was performed to assess the clinical validity and quality of the generated data. We investigated the added value of the generated data by augmenting an independent small single-center dataset with synthetic data and by training a deep learning model to detect abnormal uptake in a downstream classification task. We tested this model on 7,472 scans from 6,448 patients across four external sites in a cross-tracer and cross-scanner setting and associated the resulting model predictions with clinical outcomes. The clinical value and high quality of the synthetic imaging data were confirmed by four readers, who were unable to distinguish synthetic scans from real scans (average accuracy: 0.48 [95% CI 0.46-0.51]), disagreeing in 239 (60%) of 400 cases (Fleiss' kappa: 0.18). Adding synthetic data to the training set improved model performance by a mean (± SD) of 33 (± 10)% AUC (p < 0.0001) for detecting abnormal uptake indicative of bone metastases and by 5 (± 4)% AUC (p < 0.0001) for detecting uptake indicative of cardiac amyloidosis across both internal and external testing cohorts, compared to models without synthetic training data.
Patients with predicted abnormal uptake had adverse clinical outcomes (log-rank: p < 0.0001). Generative AI enables the targeted generation of bone scintigraphy images representing different clinical conditions. Our findings point to the potential of synthetic data to overcome challenges in data sharing and in developing reliable and prognostic deep learning models in data-limited environments.
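The reader study's chance-corrected agreement is Fleiss' kappa, computed from per-case counts of how many readers chose each label ("real" vs. "synthetic"). A minimal sketch; the toy vote matrix is illustrative, not the study's ratings:

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for an N x k matrix: counts[i][j] = number of the
    n raters who assigned item i to category j."""
    N, n, k = len(counts), sum(counts[0]), len(counts[0])
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    # mean per-item agreement, then expected agreement by chance
    P_bar = sum((sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts) / N
    P_e = sum(p * p for p in p_j)
    return (P_bar - P_e) / (1 - P_e)

# Hypothetical 4-reader votes (real, synthetic) over five scans
votes = [[4, 0], [2, 2], [1, 3], [3, 1], [0, 4]]
kappa = fleiss_kappa(votes)
```

A kappa near zero, as the 0.18 above, means the four readers agreed little beyond chance, supporting the claim that synthetic scans were indistinguishable from real ones.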

Managing class imbalance in the training of a large language model to predict patient selection for total knee arthroplasty: Results from the Artificial intelligence to Revolutionise the patient Care pathway in Hip and knEe aRthroplastY (ARCHERY) project.

Farrow L, Anderson L, Zhong M

PubMed · Jun 1, 2025
This study set out to test the efficacy of different techniques used to manage class imbalance, a type of data bias, in the application of a large language model (LLM) to predict patient selection for total knee arthroplasty (TKA). This study utilised data from the Artificial Intelligence to Revolutionise the Patient Care Pathway in Hip and Knee Arthroplasty (ARCHERY) project (ISRCTN18398037). Data included the pre-operative radiology reports of patients referred to secondary care for knee-related complaints from within the North of Scotland. A clinically based LLM (GatorTron) was trained to predict selection for TKA. Three methods for managing class imbalance were assessed: a standard model, class weighting, and majority-class undersampling. A total of 7707 individual knee radiology reports were included (dated from 2015 to 2022). The mean text length was 74 words (range 26-275). Only 910/7707 (11.8%) patients underwent TKA surgery (the designated 'minority class'). The class-weighting technique performed better for minority-class discrimination and calibration than the other two techniques (recall 0.61/AUROC 0.73 for class weighting, compared with 0.54/0.70 and 0.59/0.72 for the standard model and majority-class undersampling, respectively). There was also significant data loss for majority-class undersampling when compared with class weighting. Class weighting appears to provide the optimal method of training an LLM to perform analytical tasks on free-text clinical information in the face of significant data bias ('class imbalance'). Such knowledge is an important consideration in the development of high-performance clinical AI models within Trauma and Orthopaedics.
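Class weighting counteracts the 11.8% minority prevalence by up-weighting minority-class examples in the training loss, typically with inverse-frequency ("balanced") weights of the form n_samples / (n_classes * class_count). A minimal sketch using the cohort's counts (the weighting formula is the common convention, assumed rather than stated in the abstract):

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

# 910 TKA ('minority class') vs. 6797 non-TKA reports, as in the cohort
labels = [1] * 910 + [0] * 6797
weights = balanced_class_weights(labels)
```

Unlike majority-class undersampling, this keeps every report in the training set, which is the data-loss advantage the study observes.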

Predictive validity of consensus-based MRI definition of osteoarthritis plus radiographic osteoarthritis for the progression of knee osteoarthritis: A longitudinal cohort study.

Xing X, Wang Y, Zhu J, Shen Z, Cicuttini F, Jones G, Aitken D, Cai G

PubMed · Jun 1, 2025
Our previous study showed that magnetic resonance imaging (MRI)-defined tibiofemoral osteoarthritis (MRI-OA), based on a Delphi approach, in combination with radiographic OA (ROA) had strong predictive validity for the progression of knee OA. This study aimed to compare whether the combination using traditional prediction models was superior to Light Gradient Boosting Machine (LightGBM) models. Data were from the Tasmanian Older Adult Cohort. A radiograph and 1.5T MRI of the right knee were performed. Tibial cartilage volume was measured at baseline, 2.6 and 10.7 years. Knee pain and function were assessed at baseline, 2.6, 5.1, and 10.7 years. Right-sided total knee replacement (TKR) was assessed over 13.5 years. The area under the curve (AUC) was used to compare the predictive validity of logistic regression with the LightGBM algorithm. For significantly imbalanced outcomes, the area under the precision-recall curve (AUC-PR) was used. 574 participants (mean 62 years, 49% female) were included. Overall, LightGBM showed clinically acceptable predictive performance for all outcomes but TKR. For knee pain and function, LightGBM showed better predictive performance than the logistic regression model (AUC: 0.731-0.912 vs 0.627-0.755). Similar results were found for tibial cartilage loss over 2.6 (AUC: 0.845 vs 0.701, p < 0.001) and 10.7 years (AUC: 0.845 vs 0.753, p = 0.016). For TKR, which exhibited significant class imbalance, both algorithms performed poorly (AUC-PR: 0.647 vs 0.610). Compared to logistic regression combining MRI-OA, ROA, and common covariates, LightGBM offers valuable insights that can inform early risk identification and targeted prevention strategies.
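The AUC used above to compare models has a rank interpretation: it is the probability that a randomly chosen progressor receives a higher predicted risk than a randomly chosen non-progressor (the Mann-Whitney statistic). A minimal sketch with illustrative risk scores:

```python
def roc_auc(pos_scores, neg_scores):
    """AUC via the Mann-Whitney statistic; ties count one half."""
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos_scores for q in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical predicted risks: progressors vs. non-progressors
auc = roc_auc([0.9, 0.7, 0.6], [0.4, 0.6, 0.2])
```

For rare outcomes such as TKR, AUC can look optimistic because true negatives dominate, which is why the study switches to AUC-PR there; AUC-PR ignores true negatives and so tracks performance on the rare class.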

Kellgren-Lawrence grading of knee osteoarthritis using deep learning: Diagnostic performance with external dataset and comparison with four readers.

Vaattovaara E, Panfilov E, Tiulpin A, Niinimäki T, Niinimäki J, Saarakkala S, Nevalainen MT

PubMed · Jun 1, 2025
To evaluate the performance of a deep learning (DL) model in an external dataset to assess radiographic knee osteoarthritis using Kellgren-Lawrence (KL) grades against versatile human readers. Two hundred eight knee anteroposterior conventional radiographs (CRs) were included in this retrospective study. Four readers (three radiologists, one orthopedic surgeon) assessed the KL grades, and a consensus grade was derived as their mean. The DL model was trained using all the CRs from the Multicenter Osteoarthritis Study (MOST), validated on the Osteoarthritis Initiative (OAI) dataset, and then tested on our external dataset. To assess the agreement between the graders, Cohen's quadratic kappa (k) with 95% confidence intervals was used. Diagnostic performance was measured using confusion matrices and receiver operating characteristic (ROC) analyses. The multiclass (KL grades from 0 to 4) diagnostic performance of the DL model was multifaceted: sensitivities were between 0.372 and 1.000, specificities 0.691-0.974, PPVs 0.227-0.879, NPVs 0.622-1.000, and AUCs 0.786-0.983. The overall balanced accuracy was 0.693, AUC 0.886, and kappa 0.820. If only dichotomous KL grading (i.e. KL0-1 vs. KL2-4) was utilized, superior metrics were seen with an overall balanced accuracy of 0.902 and AUC of 0.967. A substantial agreement between each reader and the DL model was found: the inter-rater agreement was 0.737 [0.685-0.790] for the radiology resident, 0.761 [0.707-0.816] for the musculoskeletal radiology fellow, 0.802 [0.761-0.843] for the senior musculoskeletal radiologist, and 0.818 [0.775-0.860] for the orthopedic surgeon. In an external dataset, our DL model can grade knee osteoarthritis with diagnostic accuracy comparable to highly experienced human readers.
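Cohen's quadratic kappa, used above for grader agreement, penalizes disagreements by the squared grade distance, so mistaking KL0 for KL4 costs far more than KL2 for KL3. A minimal sketch over KL grades 0-4 (the grade lists are illustrative):

```python
def quadratic_weighted_kappa(a, b, k=5):
    """Cohen's kappa with quadratic weights for ordinal grades 0..k-1."""
    n = len(a)
    obs = [[0.0] * k for _ in range(k)]  # observed joint proportions
    for x, y in zip(a, b):
        obs[x][y] += 1 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    num = sum((i - j) ** 2 * obs[i][j] for i in range(k) for j in range(k))
    den = sum((i - j) ** 2 * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - num / den if den else 1.0

# Hypothetical KL grades: DL model vs. senior radiologist
model = [0, 1, 2, 2, 3, 4, 1, 0]
reader = [0, 1, 2, 3, 3, 4, 2, 0]
kappa = quadratic_weighted_kappa(model, reader)
```

The model-vs-reader kappas of 0.74-0.82 above thus indicate that most model errors are adjacent-grade disagreements, the same kind human readers make with each other.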