Latest Papers on Radiology AI. Tags: Musculoskeletal

The imaging crisis in axial spondyloarthritis.

Diekhoff T, Poddubnyy D

•papers•May 16 2025

Imaging holds a pivotal yet contentious role in the early diagnosis of axial spondyloarthritis. Although MRI has enhanced our ability to detect early inflammatory changes, particularly bone marrow oedema in the sacroiliac joints, the poor specificity of this finding introduces a substantial risk of overdiagnosis. The well-intentioned push by rheumatologists towards earlier intervention could inadvertently lead to the misclassification of mechanical or degenerative conditions (eg, osteitis condensans ilii) as inflammatory disease, especially in the absence of structural lesions. Diagnostic uncertainty is further fuelled by anatomical variability, sex differences, and suboptimal imaging protocols. Current strategies, such as quantifying bone marrow oedema, analysing its distribution patterns, and integrating clinical and laboratory data, offer partial guidance for avoiding overdiagnosis but fall short of resolving the core diagnostic dilemma. Emerging imaging technologies, including high-resolution sequences, quantitative MRI, radiomics, and artificial intelligence, could improve diagnostic precision, but these tools remain exploratory. This Viewpoint underscores the need for a shift in imaging approaches, recognising that although timely diagnosis and treatment are essential to prevent long-term structural damage, robust and reliable imaging criteria are also needed. Without such advances, the imaging field risks repeating past missteps seen in other rheumatological conditions.

MRI Classification Musculoskeletal Review Concept Academic Lab

Artificial intelligence-guided distal radius fracture detection on plain radiographs in comparison with human raters.

Ramadanov N, John P, Hable R, Schreyer AG, Shabo S, Prill R, Salzmann M

•papers•May 16 2025

The aim of this study was to compare the performance of artificial intelligence (AI) in detecting distal radius fractures (DRFs) on plain radiographs with the performance of human raters. We retrospectively analysed all wrist radiographs taken in our hospital since the introduction of AI-guided fracture detection, from 11 September 2023 to 10 September 2024. The ground truth was defined by the radiological report of a board-certified radiologist based solely on conventional radiographs. The following parameters were calculated: true positives (TP), true negatives (TN), false positives (FP), false negatives (FN), accuracy (%), Cohen's kappa coefficient, F1 score, sensitivity (%), specificity (%), and the Youden index (J statistic). In total, 1145 plain radiographs of the wrist were taken during this period. The mean age of the included patients was 46.6 years (± 27.3), ranging from 2 to 99 years, and 59.0% were female. According to the ground truth, 225 of the 556 anteroposterior (AP) radiographs (40.5%) and 240 of the 589 lateral radiographs (40.7%) showed a DRF. On AP radiographs, the AI system achieved the following results: accuracy (%): 95.90; Cohen's kappa: 0.913; F1 score: 0.947; sensitivity (%): 92.02; specificity (%): 98.45; Youden index: 90.47. The orthopedic surgeon achieved a sensitivity of 91.5%, a specificity of 97.8%, an overall accuracy of 95.1%, an F1 score of 0.943, and a Cohen's kappa of 0.901. These results were comparable to those of the AI model. AI-guided detection of DRF demonstrated diagnostic performance nearly identical to that of an experienced orthopedic surgeon across all key metrics. The marginal differences observed in sensitivity and specificity suggest that AI can reliably support clinical fracture assessment based solely on conventional radiographs.

X-Ray Detection Musculoskeletal Retrospective Clinical Clinical Pilot Academic Lab
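
All of the metrics listed in the DRF study derive from the four confusion-matrix counts. The sketch below shows the standard formulas; the counts in the usage line are hypothetical and are not the study's data. The Youden index is computed on a 0 to 1 scale here, so the study's value of 90.47 corresponds to J × 100.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # fracture present, flagged
    tn: int  # fracture absent, not flagged
    fp: int  # fracture absent, flagged
    fn: int  # fracture present, missed


def binary_metrics(c: Confusion) -> dict:
    """Standard diagnostic-accuracy metrics from a 2x2 confusion matrix."""
    n = c.tp + c.tn + c.fp + c.fn
    sensitivity = c.tp / (c.tp + c.fn)   # recall on fracture-positive cases
    specificity = c.tn / (c.tn + c.fp)   # recall on fracture-negative cases
    precision = c.tp / (c.tp + c.fp)
    accuracy = (c.tp + c.tn) / n
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: observed agreement corrected for the agreement expected
    # by chance from the marginal totals of the rater and the reference.
    p_yes = ((c.tp + c.fp) / n) * ((c.tp + c.fn) / n)
    p_no = ((c.tn + c.fn) / n) * ((c.tn + c.fp) / n)
    p_e = p_yes + p_no
    kappa = (accuracy - p_e) / (1 - p_e)
    youden = sensitivity + specificity - 1  # J statistic, range [-1, 1]
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "kappa": kappa,
        "youden": youden,
    }


# Hypothetical counts for illustration only.
print(binary_metrics(Confusion(tp=90, tn=80, fp=20, fn=10)))
```

With these counts, chance agreement is 0.5, so an accuracy of 0.85 yields a kappa of 0.7; kappa is always lower than raw accuracy whenever chance agreement is non-zero.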

Impact of test set composition on AI performance in pediatric wrist fracture detection in X-rays.

Till T, Scherkl M, Stranger N, Singer G, Hankel S, Flucher C, Hržić F, Štajduhar I, Tschauner S

•papers•May 16 2025

To evaluate how different test set sampling strategies (random selection and balanced sampling) affect the performance of artificial intelligence (AI) models in pediatric wrist fracture detection on radiographs, and to highlight the need for standardization in test set design. This retrospective study used the open-source GRAZPEDWRI-DX dataset of 6091 pediatric wrist radiographs. Two test sets, each containing 4588 images, were constructed: one using a balanced approach based on case difficulty, projection type, and fracture presence, and the other by random selection. EfficientNet and YOLOv11 models were trained and validated on 18,762 radiographs and tested on both sets. Binary classification and object detection tasks were evaluated using metrics such as precision, recall, F1 score, AP50, and AP50-95. Statistical comparisons between test sets were performed using nonparametric tests. Performance metrics decreased significantly in the balanced test set, which contained more challenging cases. For example, the precision of the YOLOv11 models decreased from 0.95 in the random set to 0.83 in the balanced set. Similar trends were observed for recall, accuracy, and F1 score, indicating that models trained on easy-to-recognize cases performed poorly on more complex ones. These results were consistent across all model variants tested. AI models for pediatric wrist fracture detection exhibit reduced performance when tested on balanced datasets containing more difficult cases, compared with randomly selected cases. This highlights the importance of constructing representative and standardized test sets that account for clinical complexity to ensure robust AI performance in real-world settings.
Question: Do sampling strategies based on case complexity influence the performance of deep learning models in fracture detection?
Findings: AI performance in pediatric wrist fracture detection drops significantly when tested on balanced datasets with more challenging cases, compared with randomly selected cases.
Clinical relevance: Without standardized and validated test datasets for AI that reflect clinical complexity, performance metrics may be overestimated, limiting the utility of AI in real-world settings.

X-Ray Detection Musculoskeletal Retrospective Clinical In Silico Academic Lab Open Dataset
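
The contrast between the two test sets can be illustrated with a small sampling sketch. The abstract names the balancing criteria (case difficulty, projection type, fracture presence) but not the exact procedure, so equal allocation across every combination of those three factors is an assumption here, and the record fields and `make_pool` helper are hypothetical.

```python
import random
from collections import defaultdict
from itertools import product


def random_test_set(records, n, seed=0):
    """Simple random sample: inherits whatever case mix the pool happens to have."""
    return random.Random(seed).sample(records, n)


def balanced_test_set(records, per_stratum, seed=0):
    """Equal allocation across (difficulty, projection, fracture) strata (assumed scheme)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for r in records:
        strata[(r["difficulty"], r["projection"], r["fracture"])].append(r)
    chosen = []
    for key in sorted(strata):
        pool = strata[key]
        if len(pool) < per_stratum:
            raise ValueError(f"stratum {key} has only {len(pool)} images")
        chosen.extend(rng.sample(pool, per_stratum))
    return chosen


def make_pool():
    """Hypothetical pool: 90% 'easy' and 10% 'hard' cases, split evenly by projection/fracture."""
    pool = []
    for difficulty, copies in (("easy", 450), ("hard", 50)):
        for projection, fracture in product(["ap", "lateral"], [True, False]):
            for _ in range(copies):
                pool.append({"id": len(pool), "difficulty": difficulty,
                             "projection": projection, "fracture": fracture})
    return pool


pool = make_pool()
balanced = balanced_test_set(pool, per_stratum=25)
rand = random_test_set(pool, n=len(balanced))
# Balanced set: 8 strata x 25 images, so exactly half are "hard" cases.
print(sum(r["difficulty"] == "hard" for r in balanced), len(balanced))
# Random set: about 10% "hard" cases in expectation, mirroring the pool.
print(sum(r["difficulty"] == "hard" for r in rand), len(rand))
```

Because the random sample mirrors a pool dominated by easy cases, metrics measured on it mostly reflect easy-case performance, which is the overestimation effect the study reports.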

Segmentation of the thoracolumbar fascia in ultrasound imaging: a deep learning approach.

Bonaldi L, Pirri C, Giordani F, Fontanella CG, Stecco C, Uccheddu F

•papers•May 15 2025

Only in recent years has it been demonstrated that the thoracolumbar fascia is involved in low back pain (LBP), highlighting its implications for treatment. Ultrasound examination offers an easily accessible, non-invasive way to investigate the fascia in real time; however, to be reliable, it must overcome challenges related to machine configuration and operator experience. The limited understanding of the fascial system, combined with the sensitivity of results to ultrasound acquisition settings, has created a gap that makes effective evaluation of the fascia difficult in clinical routine. The present work aims to fill this gap by investigating the effectiveness of a deep learning approach for segmenting the thoracolumbar fascia in ultrasound images. A total of 538 ultrasound images of the thoracolumbar fascia of subjects with LBP were used to train and test a deep learning network. An additional test set (Test set 2) was collected from another center, operator, machine manufacturer, patient cohort, and protocol to improve the generalizability of the study. A U-Net-based architecture segmented these structures with a final training accuracy of 0.99 and a validation accuracy of 0.91. On a test set of 87 images not included in the training set, prediction accuracy reached 0.94, with a mean intersection over union (IoU) of 0.82 and a Dice score of 0.76. Test set 2 yielded higher values for these latter metrics. The validity of the predictions was also verified and confirmed by two expert clinicians. Automatic identification of the thoracolumbar fascia has shown promise for thoroughly investigating alterations of the fascia and targeting personalized rehabilitation based on each patient's specific scenario.

Ultrasound Segmentation Musculoskeletal Retrospective Clinical In Silico Academic Lab
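
The fascia study reports accuracy, IoU and Dice side by side. The sketch below shows how each is computed for a single binary mask pair; whether the reported accuracy is pixel-wise is an assumption, as are the function name and the toy masks. For any single mask pair, Dice = 2·IoU / (1 + IoU), so the two overlap scores are not independent; averaging schemes (per image, per class, with or without background) can change how the reported means compare.

```python
import numpy as np


def segmentation_scores(pred: np.ndarray, target: np.ndarray) -> dict:
    """Pixel accuracy, IoU and Dice for one binary mask pair (1 = fascia, 0 = background)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    size_sum = pred.sum() + target.sum()
    return {
        # Pixel accuracy counts background pixels too, so a thin structure
        # such as a fascial band can score high even when overlap is modest.
        "pixel_accuracy": float((pred == target).mean()),
        # An empty prediction with an empty target is treated as a perfect match.
        "iou": float(inter / union) if union else 1.0,
        "dice": float(2 * inter / size_sum) if size_sum else 1.0,
    }


# Toy 10x10 image: the target band covers half of the predicted band.
pred = np.zeros((10, 10), dtype=np.uint8)
target = np.zeros((10, 10), dtype=np.uint8)
pred[0, :] = 1
target[0, :5] = 1
print(segmentation_scores(pred, target))
```

In this toy case, 95 of 100 pixels agree, yet IoU is only 0.5 and Dice about 0.67, which shows why overlap metrics are more informative than pixel accuracy for thin structures.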

From error to prevention of wrong-level spine surgery: a review.

Javadnia P, Gohari H, Salimi N, Alimohammadi E

•papers•May 15 2025

Wrong-level spine surgery remains a significant concern, with devastating consequences for patients and healthcare systems alike. This comprehensive review analyzes the existing literature on wrong-level spine surgery, identifying key factors that contribute to these errors and exploring advanced strategies and technologies designed to prevent them. A systematic literature search was conducted across multiple databases, including PubMed, Scopus, EMBASE, and CINAHL. The selection criteria focused on preclinical and clinical studies that specifically addressed wrong-site and wrong-level surgery in spine procedures. The findings reveal a range of contributing factors, including communication failures, inadequate preoperative planning, and insufficient surgical protocols. The review emphasizes the critical role of innovative technologies, such as artificial intelligence, advanced imaging techniques, and surgical navigation systems, alongside established safety protocols such as digital checklists and simulation training, in enhancing surgical accuracy and preventing errors. In conclusion, integrating advanced technologies with systematic safety protocols is instrumental in reducing the incidence of wrong-level spine surgery. This review underscores the importance of continuous education and the adoption of innovative solutions to foster a culture of safety and improve surgical outcomes. By addressing the multifaceted challenges associated with these errors, the field can work towards minimizing their occurrence and enhancing patient care.

Mixed Modality Detection Musculoskeletal Review Concept Academic Lab

Performance of Artificial Intelligence in Diagnosing Lumbar Spinal Stenosis: A Systematic Review and Meta-Analysis.

Yang X, Zhang Y, Li Y, Wu Z

•papers•May 15 2025

The present study followed the reporting guidelines for systematic reviews and meta-analyses. We conducted this study to review the diagnostic value of artificial intelligence (AI) for various types of lumbar spinal stenosis (LSS) and for the level of stenosis, offering evidence-based support for the development of smart diagnostic tools. AI is currently used for image processing in clinical practice, and in recent years some studies have explored AI techniques for identifying the severity of LSS. Nevertheless, structured data demonstrating its effectiveness remain scarce. Four databases (PubMed, Cochrane, Embase, and Web of Science) were searched up to March 2024 for original studies that used deep learning (DL) or machine learning (ML) models to diagnose LSS. The risk of bias of the included studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool, a quality evaluation tool for diagnostic accuracy research. The accuracy in the validation set was extracted for meta-analysis, which was performed in R 4.4.0. A total of 48 articles were included, with an overall accuracy of 0.885 (95% CI: 0.860-0.907) for dichotomous tasks: 0.892 (95% CI: 0.867-0.915) for DL and 0.833 (95% CI: 0.760-0.895) for ML. The overall accuracy for LSS was 0.895 (95% CI: 0.858-0.927), with 0.912 (95% CI: 0.873-0.944) for DL and 0.843 (95% CI: 0.766-0.907) for ML. The overall accuracy for central canal stenosis was 0.875 (95% CI: 0.821-0.920), with 0.881 (95% CI: 0.829-0.925) for DL and 0.733 (95% CI: 0.541-0.877) for ML. The overall accuracy for neural foramen stenosis was 0.893 (95% CI: 0.851-0.928).
In polytomous tasks, the accuracy was 0.936 (95% CI: 0.895-0.967) for no LSS, 0.503 (95% CI: 0.391-0.614) for mild LSS, 0.512 (95% CI: 0.336-0.688) for moderate LSS, and 0.860 (95% CI: 0.733-0.954) for severe LSS. AI is highly valuable for diagnosing LSS. However, further external validation is needed to refine the analysis across stenosis categories and to improve diagnostic accuracy for mild to moderate stenosis.

Mixed Modality Classification Musculoskeletal Meta Analysis In Silico Academic Lab
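
The LSS meta-analysis pools per-study validation accuracies into a single estimate with a 95% CI, but the abstract does not state the pooling model. The sketch below uses one common choice, DerSimonian-Laird random-effects pooling on the logit scale. This is an assumption, not the authors' method; they may have used a different transform (for example Freeman-Tukey) or estimator. Study counts in the usage line are hypothetical.

```python
import math


def _logit(p: float) -> float:
    return math.log(p / (1 - p))


def _inv_logit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def pool_proportions_dl(studies, z: float = 1.959964):
    """Random-effects (DerSimonian-Laird) pooling of per-study proportions.

    studies: list of (events, total) pairs, e.g. (correctly classified, validation cases).
    Works on the logit scale and back-transforms the estimate and its CI.
    """
    ys, vs = [], []
    for events, total in studies:
        # 0.5 continuity correction only when a study has 0% or 100% accuracy.
        if events == 0 or events == total:
            events, total = events + 0.5, total + 1.0
        ys.append(_logit(events / total))
        vs.append(1 / events + 1 / (total - events))  # approx. variance of logit(p)

    w = [1 / v for v in vs]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    df = len(ys) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance

    w_re = [1 / (v + tau2) for v in vs]
    mu = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {
        "pooled": _inv_logit(mu),
        "ci_low": _inv_logit(mu - z * se),
        "ci_high": _inv_logit(mu + z * se),
        "tau2": tau2,
    }


# Hypothetical studies: (correct, total) in each validation set.
print(pool_proportions_dl([(50, 100), (90, 100)]))
```

Back-transforming from the logit scale produces asymmetric intervals around the pooled proportion, a pattern also visible in several of the CIs reported above. Strong heterogeneity (large tau²) widens the interval considerably.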

Artificial intelligence algorithm improves radiologists' bone age assessment accuracy.

Chang TY, Chou TY, Jen IA, Yuh YS

•papers•May 15 2025

Artificial intelligence (AI) algorithms can provide rapid and precise radiographic bone age (BA) assessment. This study assessed the effects of an AI algorithm on the BA assessment performance of radiologists and evaluated how automation bias could affect them. In this prospective randomized crossover study, six radiologists with varying levels of experience (senior, mid-level, and junior) assessed cases from a test set of 200 standard BA radiographs. The test set was divided equally into two subsets, datasets A and B. Each radiologist assessed BA independently without AI assistance (A- B-) and with AI assistance (A+ B+). The mean of the assessments made by two experts served as the ground truth for accuracy assessment; we calculated the mean absolute difference (MAD) between the radiologists' BA predictions and ground-truth BA and evaluated the proportion of estimates whose absolute difference exceeded one year. Additionally, we compared the radiologists' performance with early AI assistance against their performance with delayed AI assistance; the radiologists were allowed to reject AI interpretations. The overall accuracy of senior, mid-level, and junior radiologists improved significantly with AI assistance (without vs. with AI: MAD 0.74 vs. 0.46 years, p < 0.001; proportion of assessments with an error exceeding 1 year: 24.0% vs. 8.4%, p < 0.001). The proportion of BA predictions that became more accurate with AI assistance (16.8%) was significantly higher than the proportion that became less accurate (2.3%; p < 0.001). No consistent timing effect was observed between early and delayed AI assistance. Most disagreements between radiologists and AI occurred for images of patients aged ≤8 years, and senior radiologists disagreed with the AI more often than other radiologists did. The AI algorithm improved the BA assessment accuracy of radiologists across experience levels.
Less experienced radiologists were more prone to automation bias.

X-Ray Classification Musculoskeletal Prospective Clinical Pilot Academic Lab
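
The bone age study scores each reader against the mean of two experts and reports both the MAD and the share of estimates off by more than one year. A minimal sketch of those two summaries follows; the function name and the values in the usage line are hypothetical.

```python
from statistics import fmean


def bone_age_error_summary(predicted, expert1, expert2, threshold_years=1.0):
    """MAD against the two-expert mean, plus the share of estimates off by more than the threshold."""
    if not predicted or not (len(predicted) == len(expert1) == len(expert2)):
        raise ValueError("need equal-length, non-empty inputs")
    truth = [(a + b) / 2 for a, b in zip(expert1, expert2)]   # ground truth = expert mean
    abs_err = [abs(p - t) for p, t in zip(predicted, truth)]  # per-case absolute error (years)
    return {
        "mad_years": fmean(abs_err),
        "share_over_threshold": sum(e > threshold_years for e in abs_err) / len(abs_err),
    }


# Hypothetical bone ages in years for three radiographs.
print(bone_age_error_summary([10.0, 8.0, 12.5], [10.0, 9.5, 11.0], [10.0, 8.5, 11.0]))
```

Here the expert means are 10, 9 and 11 years, so the absolute errors are 0, 1.0 and 1.5 years; only the last exceeds the strict one-year threshold. Running the same function on unaided and AI-aided readings gives the paired comparison the study reports.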

Measuring the severity of knee osteoarthritis with an aberration-free fast line scanning Raman imaging system.

Jiao C, Ye J, Liao J, Li J, Liang J, He S

•papers•May 15 2025

Osteoarthritis (OA) is a major cause of disability worldwide, with symptoms such as joint pain, limited functionality, and decreased quality of life, potentially leading to deformity and irreversible damage. Chemical changes in joint tissues precede imaging alterations, making early diagnosis challenging for conventional methods such as X-rays. Although Raman imaging provides detailed chemical information, it is time-consuming. This paper aims to achieve rapid osteoarthritis diagnosis and grading using a self-developed Raman imaging system combined with deep learning denoising and acceleration algorithms. Our self-developed aberration-corrected line-scanning confocal Raman imaging device acquires a line of Raman spectra (hundreds of points) per scan using a galvanometer or displacement stage, achieving spatial and spectral resolutions of 2 μm and 0.2 nm, respectively. Deep learning algorithms increase imaging speed more than fourfold through effective spectrum denoising and signal-to-noise ratio (SNR) improvement. By leveraging the denoising capabilities of deep learning, we can acquire high-quality Raman spectral data with a reduced integration time, thereby accelerating the imaging process. Experiments on the tibial plateau of osteoarthritis patients compared three excitation wavelengths (532, 671, and 785 nm), and 671 nm was chosen for optimal SNR and minimal fluorescence. Machine learning algorithms achieved 98% accuracy in distinguishing articular from calcified cartilage and 97% accuracy in differentiating osteoarthritis grades I to IV. Our fast Raman imaging system, combining an aberration-corrected line-scanning confocal Raman imager with deep learning denoising, offers improved imaging speed and enhanced spectral and spatial resolutions. 
It enables rapid, label-free detection of osteoarthritis severity and can identify early compositional changes before clinical imaging, allowing precise grading and tailored treatment, thus advancing orthopedic diagnostics and improving patient outcomes.

OCT Classification Musculoskeletal Retrospective Clinical In Silico Academic Lab Breakthrough
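
Why denoising translates into speed can be seen from an idealized noise model. Assuming the spectra are shot-noise limited (ignoring read noise, dark current, and fluorescence background, which real measurements include), the Raman signal S in counts grows linearly with integration time t while its noise grows with the square root of the signal. Under that assumption, a fourfold shorter integration time halves the SNR, so a denoiser that restores roughly a factor of two in SNR would permit roughly fourfold faster acquisition at comparable spectral quality. This is an illustrative model, not the authors' analysis.

```latex
S \propto t, \qquad \sigma_{\text{shot}} = \sqrt{S} \propto \sqrt{t}
\quad\Longrightarrow\quad
\mathrm{SNR}(t) = \frac{S}{\sigma_{\text{shot}}} = \sqrt{S} \propto \sqrt{t}

\frac{\mathrm{SNR}(t/k)}{\mathrm{SNR}(t)} = \frac{1}{\sqrt{k}},
\qquad k = 4 \;\Longrightarrow\; \frac{\mathrm{SNR}(t/4)}{\mathrm{SNR}(t)} = \frac{1}{2}
```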

Metal Suppression Magnetic Resonance Imaging Techniques in Orthopaedic and Spine Surgery.

Ziegeler K, Yoon D, Hoff M, Theologis AA

•papers•May 15 2025

Implantation of metallic instrumentation is the mainstay of a variety of orthopaedic and spine surgeries. Postoperatively, imaging of the soft tissues around these implants is commonly required to assess for persistent, recurrent, and/or new pathology (ie, instrumentation loosening, particle disease, infection, neural compression); visualization of these pathologies often requires the superior soft-tissue contrast of magnetic resonance imaging (MRI). As susceptibility artifacts from ferromagnetic implants can result in unacceptable image quality, specialized MRI approaches are often necessary to provide accurate imaging. This article provides a comprehensive review of common artifacts encountered in orthopaedic MRI, including comparisons of artifacts from different metallic alloys and of common nonproprietary and proprietary MR metal artifact reduction methods. The newest metal artifact suppression imaging technology and future directions (ie, deep learning/artificial intelligence) in this important field are also considered.

MRI Reconstruction Musculoskeletal Review Concept Academic Lab

Comparison of lumbar disc degeneration grading between deep learning model SpineNet and radiologist: a longitudinal study with a 14-year follow-up.

Murto N, Lund T, Kautiainen H, Luoma K, Kerttula L

•papers•May 15 2025

To assess the agreement between lumbar disc degeneration (DD) grading by the convolutional neural network model SpineNet and a radiologist's visual grading. In a 14-year follow-up MRI study involving 19 male volunteers, lumbar DD was assessed by SpineNet and two radiologists using the Pfirrmann classification at baseline (age 37) and after 14 years (age 51). Pfirrmann summary scores (PSS) were calculated by summing individual disc grades. Agreement between the first radiologist and SpineNet was analyzed, and the second radiologist's grading was used to assess inter-observer agreement. Significant differences were observed between the Pfirrmann grades and PSS assigned by the radiologist and by SpineNet at both time points. Compared with the radiologists, SpineNet assigned grade 1 to several discs and grade 5 to more discs. The concordance correlation coefficients (CCC) of PSS between the radiologist and SpineNet were 0.54 (95% CI: 0.28 to 0.79) at baseline and 0.54 (0.27 to 0.80) at follow-up, and the average kappa (κ) values were 0.74 (0.68 to 0.81) at baseline and 0.68 (0.58 to 0.77) at follow-up. The CCC of PSS between the radiologists was 0.83 (0.69 to 0.97) at baseline and 0.78 (0.61 to 0.95) at follow-up, with κ values ranging from 0.73 to 0.96. We found fair to substantial agreement in DD grading between SpineNet and the radiologist, albeit with notable discrepancies. These findings indicate that AI-based systems such as SpineNet hold promise as complementary tools in radiological evaluation, including in longitudinal studies, but emphasize the need for ongoing refinement of AI algorithms.

MRI Classification Musculoskeletal Retrospective Clinical In Silico Academic Lab
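
The SpineNet study compares Pfirrmann summary scores with Lin's concordance correlation coefficient, which, unlike Pearson's r, penalizes systematic offsets between raters. A minimal sketch of both quantities follows; it uses the population (1/n) moments of Lin's original definition, and the function names and inputs are hypothetical rather than taken from the study's software.

```python
from statistics import fmean


def pfirrmann_summary(grades):
    """PSS for one subject: sum of per-disc Pfirrmann grades (each 1-5)."""
    if any(g not in range(1, 6) for g in grades):
        raise ValueError("Pfirrmann grades must be integers 1-5")
    return sum(grades)


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient using population (1/n) moments."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # The (mx - my)**2 term lowers agreement when one rater is systematically higher.
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Hypothetical PSS values for three subjects: the second rater is always 1 point higher.
print(pfirrmann_summary([2, 3, 3, 4, 5]))
print(lins_ccc([1, 2, 3], [2, 3, 4]))
```

In the second call the two raters are perfectly correlated (Pearson's r = 1), yet the constant one-point offset pulls the CCC down to 4/7 ≈ 0.57. This is why a systematic grading shift, such as the one SpineNet showed at grades 1 and 5, lowers CCC even when the rank order largely agrees.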
