Page 16 of 19190 results

The imaging crisis in axial spondyloarthritis.

Diekhoff T, Poddubnyy D

PubMed · papers · May 16 2025
Imaging holds a pivotal yet contentious role in the early diagnosis of axial spondyloarthritis. Although MRI has enhanced our ability to detect early inflammatory changes, particularly bone marrow oedema in the sacroiliac joints, the poor specificity of this finding introduces a substantial risk of overdiagnosis. The well-intentioned push by rheumatologists towards earlier intervention could inadvertently lead to the misclassification of mechanical or degenerative conditions (eg, osteitis condensans ilii) as inflammatory disease, especially in the absence of structural lesions. Diagnostic uncertainty is further fuelled by anatomical variability, sex differences, and suboptimal imaging protocols. Current strategies, such as quantifying bone marrow oedema, analysing its distribution patterns, and integrating clinical and laboratory data, offer partial guidance for avoiding overdiagnosis but fall short of resolving the core diagnostic dilemma. Emerging imaging technologies, including high-resolution sequences, quantitative MRI, radiomics, and artificial intelligence, could improve diagnostic precision, but these tools remain exploratory. This Viewpoint underscores the need for a shift in imaging approaches, recognising that although timely diagnosis and treatment are essential to prevent long-term structural damage, robust and reliable imaging criteria are also needed. Without such advances, the imaging field risks repeating past missteps seen in other rheumatological conditions.

Artificial intelligence-guided distal radius fracture detection on plain radiographs in comparison with human raters.

Ramadanov N, John P, Hable R, Schreyer AG, Shabo S, Prill R, Salzmann M

PubMed · papers · May 16 2025
The aim of this study was to compare the performance of artificial intelligence (AI) in detecting distal radius fractures (DRFs) on plain radiographs with the performance of human raters. We retrospectively analysed all wrist radiographs taken in our hospital since the introduction of AI-guided fracture detection from 11 September 2023 to 10 September 2024. The ground truth was defined by the radiological report of a board-certified radiologist based solely on conventional radiographs. The following parameters were calculated: True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN), accuracy (%), Cohen's Kappa coefficient, F1 score, sensitivity (%), specificity (%), Youden Index (J Statistic). In total 1145 plain radiographs of the wrist were taken between 11 September 2023 and 10 September 2024. The mean age of the included patients was 46.6 years (± 27.3), ranging from 2 to 99 years and 59.0% were female. According to the ground truth, of the 556 anteroposterior (AP) radiographs, 225 cases (40.5%) had a DRF, and of the 589 lateral view radiographs, 240 cases (40.7%) had a DRF. The AI system showed the following results on AP radiographs: accuracy (%): 95.90; Cohen's Kappa: 0.913; F1 score: 0.947; sensitivity (%): 92.02; specificity (%): 98.45; Youden Index: 90.47. The orthopedic surgeon achieved a sensitivity of 91.5%, specificity of 97.8%, an overall accuracy of 95.1%, F1 score of 0.943, and Cohen's kappa of 0.901. These results were comparable to those of the AI model. AI-guided detection of DRF demonstrated diagnostic performance nearly identical to that of an experienced orthopedic surgeon across all key metrics. The marginal differences observed in sensitivity and specificity suggest that AI can reliably support clinical fracture assessment based solely on conventional radiographs.
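
The metrics listed in this abstract all follow from the four confusion-matrix counts. The sketch below shows the standard formulas in Python; the counts in the usage lines are made-up illustration values, not the study's data, and the names `ConfusionCounts` and `diagnostic_metrics` are hypothetical. Note that the abstract reports the Youden Index in percentage points (92.02 + 98.45 - 100 = 90.47), whereas the sketch returns it as a fraction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of a binary rater (AI or human) against the ground truth."""
    tp: int  # fracture present, rater says fracture
    tn: int  # no fracture, rater says no fracture
    fp: int  # no fracture, rater says fracture
    fn: int  # fracture present, rater says no fracture


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return the standard metrics as fractions in [0, 1] (kappa and Youden may be negative)."""
    n = c.tp + c.tn + c.fp + c.fn
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    accuracy = (c.tp + c.tn) / n
    precision = c.tp / (c.tp + c.fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_both_positive = ((c.tp + c.fp) / n) * ((c.tp + c.fn) / n)
    p_both_negative = ((c.tn + c.fn) / n) * ((c.tn + c.fp) / n)
    p_chance = p_both_positive + p_both_negative
    kappa = (accuracy - p_chance) / (1 - p_chance)

    # Youden's J statistic.
    youden = sensitivity + specificity - 1

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "kappa": kappa,
        "youden": youden,
    }


# Illustrative counts only (not taken from the study).
example = diagnostic_metrics(ConfusionCounts(tp=90, tn=95, fp=5, fn=10))
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Computing every metric from one set of counts makes it easy to check that reported values are mutually consistent.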

Diagnostic challenges of carpal tunnel syndrome in patients with congenital thenar hypoplasia: a comprehensive review.

Naghizadeh H, Salkhori O, Akrami S, Khabiri SS, Arabzadeh A

PubMed · papers · May 16 2025
Carpal Tunnel Syndrome (CTS) is the most common entrapment neuropathy, frequently presenting with pain, numbness, and muscle weakness due to median nerve compression. However, diagnosing CTS becomes particularly challenging in patients with Congenital Thenar Hypoplasia (CTH), a rare congenital anomaly characterized by underdeveloped thenar muscles. The overlapping symptoms of CTH and CTS, such as thumb weakness, impaired hand function, and thenar muscle atrophy, can obscure the identification of median nerve compression. This review highlights the diagnostic complexities arising from this overlap and evaluates existing clinical, imaging, and electrophysiological assessment methods. While traditional diagnostic tests, including Phalen's and Tinel's signs, exhibit limited sensitivity in CTH patients, advanced imaging modalities like ultrasonography (US), magnetic resonance imaging (MRI), and diffusion tensor imaging (DTI) provide valuable insights into structural abnormalities. Additionally, emerging technologies such as artificial intelligence (AI) enhance diagnostic precision by automating imaging analysis and identifying subtle nerve alterations. Combining clinical history, functional assessments, and advanced imaging, an interdisciplinary approach is critical to differentiate between CTH-related anomalies and CTS accurately. This comprehensive review underscores the need for tailored diagnostic protocols to improve early detection, personalised management, and outcomes for this unique patient population.

Impact of test set composition on AI performance in pediatric wrist fracture detection in X-rays.

Till T, Scherkl M, Stranger N, Singer G, Hankel S, Flucher C, Hržić F, Štajduhar I, Tschauner S

PubMed · papers · May 16 2025
The aim was to evaluate how different test set sampling strategies (random selection and balanced sampling) affect the performance of artificial intelligence (AI) models in pediatric wrist fracture detection using radiographs, aiming to highlight the need for standardization in test set design. This retrospective study utilized the open-sourced GRAZPEDWRI-DX dataset of 6091 pediatric wrist radiographs. Two test sets, each containing 4588 images, were constructed: one using a balanced approach based on case difficulty, projection type, and fracture presence and the other a random selection. EfficientNet and YOLOv11 models were trained and validated on 18,762 radiographs and tested on both sets. Binary classification and object detection tasks were evaluated using metrics such as precision, recall, F1 score, AP50, and AP50-95. Statistical comparisons between test sets were performed using nonparametric tests. Performance metrics significantly decreased in the balanced test set with more challenging cases. For example, the precision for YOLOv11 models decreased from 0.95 in the random set to 0.83 in the balanced set. Similar trends were observed for recall, accuracy, and F1 score, indicating that models trained on easy-to-recognize cases performed poorly on more complex ones. These results were consistent across all model variants tested. AI models for pediatric wrist fracture detection exhibit reduced performance when tested on balanced datasets containing more difficult cases, compared to randomly selected cases. This highlights the importance of constructing representative and standardized test sets that account for clinical complexity to ensure robust AI performance in real-world settings.
Question: Do different sampling strategies based on samples' complexity have an influence on deep learning models' performance in fracture detection?
Findings: AI performance in pediatric wrist fracture detection significantly drops when tested on balanced datasets with more challenging cases, compared to randomly selected cases.
Clinical relevance: Without standardized and validated test datasets for AI that reflect clinical complexities, performance metrics may be overestimated, limiting the utility of AI in real-world settings.
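
The contrast between a random and a balanced test set can be sketched in a few lines of Python. This is only an illustration of the general idea, assuming each case is a dictionary with hypothetical keys `difficulty`, `projection`, and `fracture`; the abstract does not describe the authors' actual sampling procedure, and the synthetic cases below are not the GRAZPEDWRI-DX data.

```python
import random
from collections import defaultdict


def random_test_set(cases, size, seed=0):
    """Draw cases uniformly at random, so common (easy) cases dominate."""
    rng = random.Random(seed)
    return rng.sample(cases, size)


def balanced_test_set(cases, size,
                      keys=("difficulty", "projection", "fracture"), seed=0):
    """Draw the same number of cases from every combination of the given keys."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for case in cases:
        strata[tuple(case[k] for k in keys)].append(case)
    per_stratum = size // len(strata)
    selected = []
    for key in sorted(strata):
        members = strata[key]
        # A stratum with too few cases contributes everything it has.
        selected.extend(rng.sample(members, min(per_stratum, len(members))))
    return selected


# Synthetic pool (assumption for illustration): 90% easy cases, 10% hard cases.
demo_cases = []
for difficulty, count in (("easy", 225), ("hard", 25)):
    for projection in ("ap", "lateral"):
        for fracture in (True, False):
            demo_cases.extend(
                {"difficulty": difficulty, "projection": projection, "fracture": fracture}
                for _ in range(count)
            )

random_set = random_test_set(demo_cases, 160)
balanced_set = balanced_test_set(demo_cases, 160)
hard_in_random = sum(c["difficulty"] == "hard" for c in random_set)
hard_in_balanced = sum(c["difficulty"] == "hard" for c in balanced_set)
print(f"hard cases: random={hard_in_random}, balanced={hard_in_balanced}")
```

In this toy pool the balanced set is half hard cases by construction, while the random set mirrors the pool's mostly easy composition, which is the kind of shift the abstract associates with lower measured performance.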

High-Performance Prompting for LLM Extraction of Compression Fracture Findings from Radiology Reports.

Kanani MM, Monawer A, Brown L, King WE, Miller ZD, Venugopal N, Heagerty PJ, Jarvik JG, Cohen T, Cross NM

PubMed · papers · May 16 2025
Extracting information from radiology reports can provide critical data to empower many radiology workflows. For spinal compression fractures, these data can facilitate evidence-based care for at-risk populations. Manual extraction from free-text reports is laborious and error-prone. Large language models (LLMs) have shown promise; however, fine-tuning strategies to optimize performance in specific tasks can be resource intensive. A variety of prompting strategies have achieved similar results with fewer demands. Our study pioneers the use of Meta's Llama 3.1, together with prompt-based strategies, for automated extraction of compression fractures from free-text radiology reports, outputting structured data without model training. We tested performance on a time-based sample of CT exams covering the spine from 2/20/2024 to 2/22/2024 acquired across our healthcare enterprise (637 anonymized reports, age 18-102, 47% female). Ground truth annotations were manually generated and compared against the performance of three models (Llama 3.1 70B, Llama 3.1 8B, and Vicuna 13B) with nine different prompting configurations for a total of 27 model/prompt experiments. The highest F1 score (0.91) was achieved by the 70B Llama 3.1 model when provided with a radiologist-written background, with similar results when the background was written by a separate LLM (0.86). The addition of few-shot examples to these prompts had variable impact on F1 measurements (0.89 and 0.84, respectively). Comparable ROC-AUC and PR-AUC performance was observed. Our work demonstrated that an open-weights LLM excelled at extracting compression fracture findings from free-text radiology reports using prompt-based techniques without requiring extensive manually labeled examples for model training.

Automated CT segmentation for lower extremity tissues in lymphedema evaluation using deep learning.

Na S, Choi SJ, Ko Y, Urooj B, Huh J, Cha S, Jung C, Cheon H, Jeon JY, Kim KW

PubMed · papers · May 16 2025
Clinical assessment of lymphedema, particularly for lymphedema severity and fluid-fibrotic lesions, remains challenging with traditional methods. We aimed to develop and validate a deep learning segmentation tool for automated tissue component analysis in lower extremity CT scans. For the development dataset, lower extremity CT venography scans were collected from 118 patients with gynecologic cancers for algorithm training. Reference standards were created by segmentation of fat, muscle, and fluid-fibrotic tissue components using 3D Slicer. A deep learning model based on the Unet++ architecture with an EfficientNet-B7 encoder was developed and trained. Segmentation accuracy of the deep learning model was validated in an internal validation set (n = 10) and an external validation set (n = 10) using Dice similarity coefficient (DSC) and volumetric similarity (VS). A graphical user interface (GUI) tool was developed for the visualization of the segmentation results. Our deep learning algorithm achieved high segmentation accuracy. Mean DSCs for each component and all components ranged from 0.945 to 0.999 in the internal validation set and 0.946 to 0.999 in the external validation set. Similar performance was observed in the VS, with mean VSs for all components ranging from 0.97 to 0.999. In volumetric analysis, mean volumes of the entire leg and each component did not differ significantly between reference standard and deep learning measurements (p > 0.05). Our GUI displays lymphedema mapping, highlighting segmented fat, muscle, and fluid-fibrotic components in the entire leg. Our deep learning algorithm provides an automated segmentation tool enabling accurate segmentation, volume measurement of tissue components, and lymphedema mapping.
Question: Clinical assessment of lymphedema remains challenging, particularly for tissue segmentation and quantitative severity evaluation.
Findings: A deep learning algorithm achieved DSCs > 0.95 and VS > 0.97 for fat, muscle, and fluid-fibrotic components in internal and external validation datasets.
Clinical relevance: The developed deep learning tool accurately segments and quantifies lower extremity tissue components on CT scans, enabling automated lymphedema evaluation and mapping with high segmentation accuracy.
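
The two validation metrics named here can be sketched as follows, using their commonly cited definitions: DSC = 2|A∩B| / (|A| + |B|) and VS = 1 - | |A| - |B| | / (|A| + |B|). The abstract does not say how the authors implemented them, so the function name and the flattened 0/1 mask representation are assumptions made for this illustration.

```python
def dice_and_volumetric_similarity(reference, prediction):
    """Compute (DSC, VS) for two equally long sequences of 0/1 voxel labels."""
    if len(reference) != len(prediction):
        raise ValueError("masks must have the same number of voxels")

    ref_count = sum(1 for r in reference if r)
    pred_count = sum(1 for p in prediction if p)
    overlap = sum(1 for r, p in zip(reference, prediction) if r and p)

    total = ref_count + pred_count
    if total == 0:
        # Both masks are empty: treat as perfect agreement.
        return 1.0, 1.0

    dsc = 2 * overlap / total
    vs = 1 - abs(ref_count - pred_count) / total
    return dsc, vs


# Toy masks (illustration only): same volume, shifted by one voxel.
reference_mask = [1, 1, 1, 1, 0, 0, 0, 0]
predicted_mask = [0, 1, 1, 1, 1, 0, 0, 0]
dsc, vs = dice_and_volumetric_similarity(reference_mask, predicted_mask)
print(f"DSC={dsc:.2f}, VS={vs:.2f}")  # DSC=0.75, VS=1.00
```

The toy example shows why the two are reported together: VS only compares volumes, so it can be perfect even when the overlap measured by DSC is not.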

Comparison of lumbar disc degeneration grading between deep learning model SpineNet and radiologist: a longitudinal study with a 14-year follow-up.

Murto N, Lund T, Kautiainen H, Luoma K, Kerttula L

PubMed · papers · May 15 2025
To assess the agreement between lumbar disc degeneration (DD) grading by the convolutional neural network model SpineNet and radiologist's visual grading. In a 14-year follow-up MRI study involving 19 male volunteers, lumbar DD was assessed by SpineNet and two radiologists using the Pfirrmann classification at baseline (age 37) and after 14 years (age 51). Pfirrmann summary scores (PSS) were calculated by summing individual disc grades. The agreement between the first radiologist and SpineNet was analyzed, with the second radiologist's grading used for inter-observer agreement. Significant differences were observed in the Pfirrmann grades and PSS assigned by the radiologist and SpineNet at both time points. SpineNet assigned Pfirrmann grade 1 to several discs and grade 5 to more discs compared to the radiologists. The concordance correlation coefficients (CCC) of PSS between the radiologist and SpineNet were 0.54 (95% CI: 0.28 to 0.79) at baseline and 0.54 (0.27 to 0.80) at follow-up. The average kappa (κ) values were 0.74 (0.68 to 0.81) at baseline and 0.68 (0.58 to 0.77) at follow-up. CCC of PSS between the radiologists was 0.83 (0.69 to 0.97) at baseline and 0.78 (0.61 to 0.95) at follow-up, with κ values ranging from 0.73 to 0.96. We found fair to substantial agreement in DD grading between SpineNet and the radiologist, albeit with notable discrepancies. These findings indicate that AI-based systems like SpineNet hold promise as complementary tools in radiological evaluation, including in longitudinal studies, but emphasize the need for ongoing refinement of AI algorithms.
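
The summary score and the concordance correlation coefficient can be sketched as below. The CCC follows Lin's usual definition, 2·cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))²), with population (1/n) moments; whether the study used exactly this estimator is an assumption, and the function names and scores in the usage lines are hypothetical.

```python
def pfirrmann_summary_score(disc_grades):
    """Sum the Pfirrmann grades (1 to 5) of one subject's lumbar discs."""
    for grade in disc_grades:
        if grade not in (1, 2, 3, 4, 5):
            raise ValueError(f"invalid Pfirrmann grade: {grade!r}")
    return sum(disc_grades)


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient between two raters' scores."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("CCC is undefined when both raters give one identical constant score")
    return 2 * cov_xy / denominator


# Made-up summary scores: rater B is always 2 points above rater A.
rater_a = [10, 12, 14, 16]
rater_b = [12, 14, 16, 18]
print(pfirrmann_summary_score([2, 2, 3, 4, 3]))           # 14
print(round(concordance_correlation(rater_a, rater_b), 3))  # 0.714
```

A constant offset leaves the Pearson correlation at 1 but lowers the CCC, which is why the CCC is the natural choice when one rater (here, the model) systematically grades higher or lower than the other.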

Accuracy and Reliability of Multimodal Imaging in Diagnosing Knee Sports Injuries.

Zhu D, Zhang Z, Li W

PubMed · papers · May 15 2025
Due to differences in subjective experience and professional level among doctors, as well as inconsistent diagnostic criteria, there are issues with the accuracy and reliability of single imaging diagnosis results for knee joint injuries. To address these issues, magnetic resonance imaging (MRI), computed tomography (CT) and ultrasound (US) are adopted in this article for ensemble learning, and deep learning (DL) is combined for automatic analysis. By steps such as image enhancement, noise elimination, and tissue segmentation, the quality of image data is improved, and then convolutional neural networks (CNN) are used to automatically identify and classify injury types. The experimental results show that the DL model exhibits high sensitivity and specificity in the diagnosis of different types of injuries, such as anterior cruciate ligament tear, meniscus injury, cartilage injury, and fracture. The diagnostic accuracy of anterior cruciate ligament tear exceeds 90%, and the highest diagnostic accuracy of cartilage injury reaches 95.80%. In addition, compared with traditional manual image interpretation, the DL model has significant advantages in time efficiency, with a significant reduction in average interpretation time per case. The diagnostic consistency experiment shows that the DL model has high consistency with doctors' diagnosis results, with an overall error rate of less than 2%. The model has high accuracy and strong generalization ability when dealing with different types of joint injuries. These data indicate that combining multiple imaging technologies and the DL algorithm can effectively improve the accuracy and efficiency of diagnosing sports injuries of knee joints.

Artificial intelligence algorithm improves radiologists' bone age assessment accuracy.

Chang TY, Chou TY, Jen IA, Yuh YS

PubMed · papers · May 15 2025
Artificial intelligence (AI) algorithms can provide rapid and precise radiographic bone age (BA) assessment. This study assessed the effects of an AI algorithm on the BA assessment performance of radiologists, and evaluated how automation bias could affect radiologists. In this prospective randomized crossover study, six radiologists with varying levels of experience (senior, mid-level, and junior) assessed cases from a test set of 200 standard BA radiographs. The test set was equally divided into two subsets: datasets A and B. Each radiologist assessed BA independently without AI assistance (A- B-) and with AI assistance (A+ B+). We used the mean of assessments made by two experts as the ground truth for accuracy assessment; subsequently, we calculated the mean absolute difference (MAD) between the radiologists' BA predictions and ground-truth BA and evaluated the proportion of estimates for which the MAD exceeded one year. Additionally, we compared the radiologists' performance under conditions of early AI assistance with their performance under conditions of delayed AI assistance; the radiologists were allowed to reject AI interpretations. The overall accuracy of senior, mid-level, and junior radiologists was significantly better with AI assistance than without AI assistance (MAD: 0.74 vs. 0.46 years, p < 0.001; proportion of assessments for which MAD exceeded 1 year: 24.0% vs. 8.4%, p < 0.001). The proportion of improved BA predictions with AI assistance (16.8%) was significantly higher than that of less accurate predictions with AI assistance (2.3%; p < 0.001). No consistent timing effect was observed between conditions of early and delayed AI assistance. Most disagreements between radiologists and AI occurred over images for patients aged ≤8 years. Senior radiologists had more disagreements than other radiologists. The AI algorithm improved the BA assessment accuracy of radiologists with varying experience levels. Less experienced radiologists were more prone to automation bias.
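
The accuracy measures described in this abstract can be sketched as follows: the ground truth is the mean of two expert readings, the MAD is the mean of the absolute differences, and the second measure is the share of individual estimates that are off by more than one year. The function name and all ages below are hypothetical illustration values, and reading the "proportion" as a per-estimate absolute difference is an interpretation of the abstract's wording.

```python
def bone_age_errors(reader_ages, expert_a_ages, expert_b_ages, threshold_years=1.0):
    """Return (MAD in years, proportion of estimates off by more than the threshold)."""
    if not (len(reader_ages) == len(expert_a_ages) == len(expert_b_ages)):
        raise ValueError("all three lists must cover the same cases")
    if not reader_ages:
        raise ValueError("at least one case is required")

    # Ground truth per case: mean of the two expert assessments.
    ground_truth = [(a + b) / 2 for a, b in zip(expert_a_ages, expert_b_ages)]
    abs_diffs = [abs(r - g) for r, g in zip(reader_ages, ground_truth)]

    mad = sum(abs_diffs) / len(abs_diffs)
    proportion_over = sum(d > threshold_years for d in abs_diffs) / len(abs_diffs)
    return mad, proportion_over


# Made-up bone ages in years for four cases.
expert_a = [5.0, 8.0, 12.0, 10.0]
expert_b = [5.5, 8.0, 11.0, 10.5]
reader = [5.25, 9.5, 11.0, 10.0]
mad, over_one_year = bone_age_errors(reader, expert_a, expert_b)
print(f"MAD={mad:.4f} years, proportion over 1 year={over_one_year:.2f}")
# MAD=0.5625 years, proportion over 1 year=0.25
```

Running the same function on a reader's unaided and AI-assisted estimates would give the paired values that the study compares.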

From error to prevention of wrong-level spine surgery: a review.

Javadnia P, Gohari H, Salimi N, Alimohammadi E

PubMed · papers · May 15 2025
Wrong-level spine surgery remains a significant concern in spine surgery, leading to devastating consequences for patients and healthcare systems alike. This comprehensive review aims to analyze the existing literature on wrong-level spine surgery, identifying key factors that contribute to these errors and exploring advanced strategies and technologies designed to prevent them. A systematic literature search was conducted across multiple databases, including PubMed, Scopus, EMBASE, and CINAHL. The selection criteria focused on preclinical and clinical studies that specifically addressed wrong-site and wrong-level surgeries in the context of spine surgery. The findings reveal a range of contributing factors to wrong-level spine surgeries, including communication failures, inadequate preoperative planning, and insufficient surgical protocols. The review emphasizes the critical role of innovative technologies (such as artificial intelligence, advanced imaging techniques, and surgical navigation systems) alongside established safety protocols like digital checklists and simulation training in enhancing surgical accuracy and preventing errors. In conclusion, integrating advanced technologies and systematic safety protocols is instrumental in reducing the incidence of wrong-level spine surgeries. This review underscores the importance of continuous education and the adoption of innovative solutions to foster a culture of safety and improve surgical outcomes. By addressing the multifaceted challenges associated with these errors, the field can work towards minimizing their occurrence and enhancing patient care.