
Detection of carotid artery calcifications using artificial intelligence in dental radiographs: a systematic review and meta-analysis.

Arzani S, Soltani P, Karimi A, Yazdi M, Ayoub A, Khurshid Z, Galderisi D, Devlin H

pubmed · May 19 2025
Carotid artery calcifications are important markers of cardiovascular health, often associated with atherosclerosis and a higher risk of stroke. Recent research shows that dental radiographs can help identify these calcifications, allowing for earlier detection of vascular disease. Advances in artificial intelligence (AI) have improved the detection of carotid calcifications in dental images, making AI a potentially useful screening tool. This systematic review and meta-analysis aimed to evaluate how accurately AI methods identify carotid calcifications in dental radiographs. A systematic search for studies on AI algorithms used to detect carotid calcifications in dental radiographs was conducted in PubMed, Scopus, Embase, and Web of Science. Two independent reviewers collected data on study aims, imaging techniques, and statistical measures such as sensitivity and specificity. A random-effects meta-analysis was performed, and risk of bias was evaluated with the QUADAS-2 tool. Nine studies were suitable for qualitative analysis, and five provided data for quantitative analysis. These studies assessed AI algorithms using cone beam computed tomography (n = 3) and panoramic radiographs (n = 6). Sensitivity in the included studies ranged from 0.67 to 0.98, and specificity ranged from 0.85 to 0.99. Pooling one AI method per study yielded an overall sensitivity of 0.92 [95% CI 0.81 to 0.97] and specificity of 0.96 [95% CI 0.92 to 0.97]. The high sensitivity and specificity indicate that AI methods could serve as effective screening tools, supporting earlier detection of stroke and related cardiovascular risks. Trial registration: not applicable.
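The random-effects pooling this abstract describes can be sketched in a few lines. The snippet below is an illustrative DerSimonian-Laird pooling of proportions (e.g. per-study sensitivities) on the logit scale; it is not the authors' code, and the function name and inputs are assumptions for demonstration.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

def random_effects_pool(props, ns):
    """DerSimonian-Laird random-effects pooling of proportions on the logit scale.

    props: per-study proportions (e.g. sensitivities); ns: per-study sample sizes.
    Returns the pooled proportion back-transformed to the [0, 1] scale.
    """
    ys = [logit(p) for p in props]
    # approximate within-study variance of a logit-transformed proportion
    vs = [1 / (n * p * (1 - p)) for p, n in zip(props, ns)]
    ws = [1 / v for v in vs]
    y_fixed = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    # Cochran's Q heterogeneity statistic and the DL between-study variance tau^2
    q = sum(w * (y - y_fixed) ** 2 for w, y in zip(ws, ys))
    df = len(ys) - 1
    c = sum(ws) - sum(w * w for w in ws) / sum(ws)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    ws_re = [1 / (v + tau2) for v in vs]
    y_re = sum(w * y for w, y in zip(ws_re, ys)) / sum(ws_re)
    return inv_logit(y_re)
```

With identical study estimates the pooled value reproduces them exactly; with heterogeneous studies it lands between the extremes, weighted by precision.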

Accuracy of segment anything model for classification of vascular stenosis in digital subtraction angiography.

Navasardyan V, Katz M, Goertz L, Zohranyan V, Navasardyan H, Shahzadi I, Kröger JR, Borggrefe J

pubmed · May 19 2025
This retrospective study evaluates the diagnostic performance of an optimized comprehensive multi-stage framework based on the Segment Anything Model (SAM), which we named Dr-SAM, for detecting and grading vascular stenosis in the abdominal aorta and iliac arteries using digital subtraction angiography (DSA). A total of 100 DSA examinations were conducted on 100 patients. The infrarenal abdominal aorta (AAI), common iliac arteries (CIA), and external iliac arteries (EIA) were independently evaluated by two experienced radiologists using a standardized 5-point grading scale. Dr-SAM analyzed the same DSA images, and its assessments were compared with the average stenosis grading provided by the radiologists. Diagnostic accuracy was evaluated using Cohen's kappa, specificity, sensitivity, and Wilcoxon signed-rank tests. Interobserver agreement between radiologists, which established the reference standard, was strong (Cohen's kappa: CIA right = 0.95, CIA left = 0.94, EIA right = 0.98, EIA left = 0.98, AAI = 0.79). Dr-SAM showed high agreement with radiologist consensus for CIA (κ = 0.93 right, 0.91 left), moderate agreement for EIA (κ = 0.79 right, 0.76 left), and fair agreement for AAI (κ = 0.70). Dr-SAM demonstrated excellent specificity (up to 1.0) and robust sensitivity (0.67-0.83). Wilcoxon tests revealed no significant differences between Dr-SAM and radiologist grading (p > 0.05). Dr-SAM proved to be an accurate and efficient tool for vascular assessment, with the potential to streamline diagnostic workflows and reduce variability in stenosis grading. Its ability to deliver rapid and consistent evaluations may contribute to earlier detection of disease and the optimization of treatment strategies. Further studies are needed to confirm these findings in prospective settings and to enhance its capabilities, particularly in the detection of occlusions.
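Cohen's kappa, the agreement measure used throughout this study, is simple to compute from two raters' gradings. The sketch below is a minimal pure-Python illustration; the function name and inputs are hypothetical, not taken from the paper.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters' categorical gradings
    (e.g. a 5-point stenosis scale): chance-corrected agreement."""
    assert len(rater1) == len(rater2)
    n = len(rater1)
    # observed agreement: fraction of cases where the raters match
    po = sum(a == b for a, b in zip(rater1, rater2)) / n
    # expected agreement under independence, from each rater's marginals
    c1, c2 = Counter(rater1), Counter(rater2)
    pe = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    return (po - pe) / (1 - pe)
```

Perfect agreement yields 1.0; agreement no better than chance yields approximately 0.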

CorBenchX: Large-Scale Chest X-Ray Error Dataset and Vision-Language Model Benchmark for Report Error Correction

Jing Zou, Qingqiu Li, Chenyu Lian, Lihao Liu, Xiaohan Yan, Shujun Wang, Jing Qin

arxiv preprint · May 17 2025
AI-driven models have shown great promise in detecting errors in radiology reports, yet the field lacks a unified benchmark for rigorous evaluation of error detection and subsequent correction. To address this gap, we introduce CorBenchX, a comprehensive suite for automated error detection and correction in chest X-ray reports, designed to advance AI-assisted quality control in clinical practice. We first synthesize a large-scale dataset of 26,326 chest X-ray error reports by injecting clinically common errors via prompting DeepSeek-R1, with each corrupted report paired with its original text, error type, and a human-readable description. Leveraging this dataset, we benchmark both open- and closed-source vision-language models (e.g., InternVL, Qwen-VL, GPT-4o, o4-mini, and Claude-3.7) for error detection and correction under zero-shot prompting. Among these models, o4-mini achieves the best performance, with 50.6% detection accuracy and correction scores of BLEU 0.853, ROUGE 0.924, BERTScore 0.981, SembScore 0.865, and CheXbertF1 0.954, yet still falls below clinical-level accuracy, highlighting the challenge of precise report correction. To advance the state of the art, we propose a multi-step reinforcement learning (MSRL) framework that optimizes a multi-objective reward combining format compliance, error-type accuracy, and BLEU similarity. Applying MSRL to QwenVL2.5-7B, the top open-source model in our benchmark, yields improvements of 38.3% in single-error detection precision and 5.2% in single-error correction over the zero-shot baseline.
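The multi-objective reward that MSRL optimizes can be sketched as a weighted sum of its three terms. The snippet below is purely illustrative: the weights, the "ends with a period" format check, and the unigram-overlap stand-in for BLEU are all assumptions, not the paper's actual reward.

```python
def msrl_style_reward(pred_report, ref_report, pred_error_type, true_error_type,
                      w_format=0.2, w_type=0.3, w_sim=0.5):
    """Sketch of a multi-objective reward in the spirit of MSRL: a weighted
    sum of format compliance, error-type accuracy, and text similarity.
    All weights and component definitions here are illustrative choices."""
    # format compliance: here simply "non-empty and ends with a period"
    r_format = 1.0 if pred_report.strip().endswith(".") else 0.0
    # error-type accuracy: exact match of the predicted error category
    r_type = 1.0 if pred_error_type == true_error_type else 0.0
    # unigram-overlap proxy standing in for BLEU similarity
    pred_tok, ref_tok = set(pred_report.split()), set(ref_report.split())
    r_sim = len(pred_tok & ref_tok) / max(1, len(ref_tok))
    return w_format * r_format + w_type * r_type + w_sim * r_sim
```

A perfect correction with the right error type scores 1.0 under these weights; mislabeling the error type alone costs exactly the type weight.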

Prediction of cervical spondylotic myelopathy from a plain radiograph using deep learning with convolutional neural networks.

Tachi H, Kokabu T, Suzuki H, Ishikawa Y, Yabu A, Yanagihashi Y, Hyakumachi T, Shimizu T, Endo T, Ohnishi T, Ukeba D, Sudo H, Yamada K, Iwasaki N

pubmed · May 17 2025
This study aimed to develop deep learning algorithms (DLAs) utilising convolutional neural networks (CNNs) to classify cervical spondylotic myelopathy (CSM) and cervical spondylotic radiculopathy (CSR) from plain cervical spine radiographs. Data from 300 patients (150 with CSM and 150 with CSR) were used for internal validation (IV) with a five-fold cross-validation strategy. Additionally, 100 patients (50 with CSM and 50 with CSR) were included in the external validation (EV). Two DLAs were trained using CNNs on plain radiographs of C3-C6: one for the binary classification of CSM and CSR, and one for prediction of the spinal canal area rate measured on magnetic resonance imaging. Model performance was evaluated on the external data using metrics such as area under the curve (AUC), accuracy, and likelihood ratios. For the binary classification, the AUC ranged from 0.84 to 0.96, with accuracy between 78% and 95% during IV. In the EV, the AUC and accuracy were 0.96 and 90%, respectively. For the spinal canal area rate, correlation coefficients during five-fold cross-validation ranged from 0.57 to 0.64, with a mean correlation of 0.61 observed in the EV. DLAs developed with CNNs demonstrated promising accuracy for classifying CSM and CSR from plain radiographs. These algorithms have the potential to assist non-specialists in identifying patients who require further evaluation or referral to spine specialists, thereby reducing delays in the diagnosis and treatment of CSM.
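The correlation coefficients reported for the spinal canal area rate are standard Pearson correlations between predicted and MRI-measured values; a minimal sketch (function name and inputs illustrative):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient, e.g. between predicted and
    MRI-measured spinal canal area rates."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```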

Artificial intelligence-guided distal radius fracture detection on plain radiographs in comparison with human raters.

Ramadanov N, John P, Hable R, Schreyer AG, Shabo S, Prill R, Salzmann M

pubmed · May 16 2025
The aim of this study was to compare the performance of artificial intelligence (AI) in detecting distal radius fractures (DRFs) on plain radiographs with the performance of human raters. We retrospectively analysed all wrist radiographs taken in our hospital since the introduction of AI-guided fracture detection from 11 September 2023 to 10 September 2024. The ground truth was defined by the radiological report of a board-certified radiologist based solely on conventional radiographs. The following parameters were calculated: True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN), accuracy (%), Cohen's Kappa coefficient, F1 score, sensitivity (%), specificity (%), Youden Index (J Statistic). In total 1145 plain radiographs of the wrist were taken between 11 September 2023 and 10 September 2024. The mean age of the included patients was 46.6 years (± 27.3), ranging from 2 to 99 years and 59.0% were female. According to the ground truth, of the 556 anteroposterior (AP) radiographs, 225 cases (40.5%) had a DRF, and of the 589 lateral view radiographs, 240 cases (40.7%) had a DRF. The AI system showed the following results on AP radiographs: accuracy (%): 95.90; Cohen's Kappa: 0.913; F1 score: 0.947; sensitivity (%): 92.02; specificity (%): 98.45; Youden Index: 90.47. The orthopedic surgeon achieved a sensitivity of 91.5%, specificity of 97.8%, an overall accuracy of 95.1%, F1 score of 0.943, and Cohen's kappa of 0.901. These results were comparable to those of the AI model. AI-guided detection of DRF demonstrated diagnostic performance nearly identical to that of an experienced orthopedic surgeon across all key metrics. The marginal differences observed in sensitivity and specificity suggest that AI can reliably support clinical fracture assessment based solely on conventional radiographs.
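Every metric in this abstract derives from the four confusion counts. The sketch below shows the standard definitions; the example counts in the test are hypothetical, not the study's actual counts.

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Standard diagnostic metrics from confusion-matrix counts:
    sensitivity, specificity, accuracy, F1, and the Youden index."""
    sens = tp / (tp + fn)                      # true positive rate
    spec = tn / (tn + fp)                      # true negative rate
    acc = (tp + tn) / (tp + tn + fp + fn)      # overall accuracy
    prec = tp / (tp + fp)                      # positive predictive value
    f1 = 2 * prec * sens / (prec + sens)       # harmonic mean of precision/recall
    youden = sens + spec - 1                   # J statistic
    return {"sensitivity": sens, "specificity": spec,
            "accuracy": acc, "f1": f1, "youden": youden}
```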

From Embeddings to Accuracy: Comparing Foundation Models for Radiographic Classification

Xue Li, Jameson Merkow, Noel C. F. Codella, Alberto Santamaria-Pang, Naiteek Sangani, Alexander Ersoy, Christopher Burt, John W. Garrett, Richard J. Bruce, Joshua D. Warner, Tyler Bradshaw, Ivan Tarapov, Matthew P. Lungren, Alan B. McMillan

arxiv preprint · May 16 2025
Foundation models, pretrained on extensive datasets, have significantly advanced machine learning by providing robust and transferable embeddings applicable to various domains, including medical imaging diagnostics. This study evaluates the utility of embeddings derived from both general-purpose and medical domain-specific foundation models for training lightweight adapter models in multi-class radiography classification, focusing specifically on tube placement assessment. A dataset comprising 8842 radiographs classified into seven distinct categories was employed to extract embeddings using six foundation models: DenseNet121, BiomedCLIP, Med-Flamingo, MedImageInsight, Rad-DINO, and CXR-Foundation. Adapter models were subsequently trained using classical machine learning algorithms. Among these combinations, MedImageInsight embeddings paired with a support vector machine adapter yielded the highest mean area under the curve (mAUC) at 93.8%, followed closely by Rad-DINO (91.1%) and CXR-Foundation (89.0%). In comparison, BiomedCLIP and DenseNet121 exhibited moderate performance with mAUC scores of 83.0% and 81.8%, respectively, whereas Med-Flamingo delivered the lowest performance at 75.1%. Notably, most adapter models demonstrated computational efficiency, achieving training within one minute and inference within seconds on CPU, underscoring their practicality for clinical applications. Furthermore, fairness analyses on adapters trained on MedImageInsight-derived embeddings indicated minimal disparities, with gender differences in performance within 2% and standard deviations across age groups not exceeding 3%. These findings confirm that foundation model embeddings, especially those from MedImageInsight, facilitate accurate, computationally efficient, and equitable diagnostic classification using lightweight adapters for radiographic image analysis.
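The adapter-on-frozen-embeddings recipe can be sketched without any deep learning framework. The paper used classical ML adapters such as a support vector machine; the nearest-centroid classifier below is a deliberately simpler stand-in just to show the pattern, and all names are illustrative.

```python
def train_centroid_adapter(embeddings, labels):
    """Train a minimal 'adapter' on frozen foundation-model embeddings.
    A nearest-centroid classifier stands in for the paper's classical
    ML adapters (e.g. SVM); embeddings are plain lists of floats."""
    sums, counts = {}, {}
    for emb, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(emb)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], emb)]
        counts[label] += 1
    # centroid = mean embedding per class
    return {lbl: [s / counts[lbl] for s in vec] for lbl, vec in sums.items()}

def predict(centroids, emb):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda lbl: dist2(centroids[lbl], emb))
```

The key property the paper exploits is that the backbone is never fine-tuned, so the adapter trains in seconds on CPU.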

CheX-DS: Improving Chest X-ray Image Classification with Ensemble Learning Based on DenseNet and Swin Transformer

Xinran Li, Yu Liu, Xiujuan Xu, Xiaowei Zhao

arxiv preprint · May 16 2025
The automatic diagnosis of chest diseases is a popular and challenging task. Most current methods are based on convolutional neural networks (CNNs), which focus on local features while neglecting global features. Recently, self-attention mechanisms have been introduced into computer vision, demonstrating superior performance. This paper therefore proposes an effective model, CheX-DS, for classifying long-tailed multi-label chest X-ray data. The model combines DenseNet, a CNN that performs well in medical imaging, with the recently popular Swin Transformer, using ensemble deep learning to leverage the strengths of both CNNs and Transformers. The loss function of CheX-DS combines weighted binary cross-entropy loss with asymmetric loss, effectively addressing data imbalance. The NIH ChestX-ray14 dataset is used to evaluate the model's effectiveness. The model outperforms previous studies with an excellent average AUC score of 83.76%, demonstrating its superior performance.
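The combined loss can be sketched for a single label as a weighted sum of weighted BCE and an asymmetric-loss term. This is a hedged illustration: the mixing weight, positive-class weight, and asymmetric focusing parameters below are illustrative defaults, not the values used in CheX-DS.

```python
import math

def combined_loss(p, y, pos_weight=2.0, alpha=0.5,
                  gamma_pos=0.0, gamma_neg=4.0, clip=0.05):
    """Sketch of a CheX-DS-style loss for one label: alpha-weighted sum of
    weighted binary cross-entropy and asymmetric loss (ASL). p is the
    predicted probability, y the 0/1 target; hyperparameters illustrative."""
    eps = 1e-8
    # weighted BCE: up-weight the (rare) positive class
    wbce = -(pos_weight * y * math.log(p + eps)
             + (1 - y) * math.log(1 - p + eps))
    # asymmetric loss: shift easy negatives' probability and focus on hard ones
    p_neg = max(p - clip, 0.0)
    asl = -(y * ((1 - p) ** gamma_pos) * math.log(p + eps)
            + (1 - y) * (p_neg ** gamma_neg) * math.log(1 - p_neg + eps))
    return alpha * wbce + (1 - alpha) * asl
```

Summed over the 14 ChestX-ray14 labels, this penalizes confident misses on rare positives far more than easy negatives.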

Impact of test set composition on AI performance in pediatric wrist fracture detection in X-rays.

Till T, Scherkl M, Stranger N, Singer G, Hankel S, Flucher C, Hržić F, Štajduhar I, Tschauner S

pubmed · May 16 2025
To evaluate how different test set sampling strategies, random selection and balanced sampling, affect the performance of artificial intelligence (AI) models in pediatric wrist fracture detection on radiographs, aiming to highlight the need for standardization in test set design. This retrospective study utilized the open-source GRAZPEDWRI-DX dataset of 6091 pediatric wrist radiographs. Two test sets, each containing 4588 images, were constructed: one using a balanced approach based on case difficulty, projection type, and fracture presence, and the other using random selection. EfficientNet and YOLOv11 models were trained and validated on 18,762 radiographs and tested on both sets. Binary classification and object detection tasks were evaluated using metrics such as precision, recall, F1 score, AP50, and AP50-95. Statistical comparisons between test sets were performed using nonparametric tests. Performance metrics decreased significantly on the balanced test set with more challenging cases. For example, the precision of the YOLOv11 models decreased from 0.95 on the random set to 0.83 on the balanced set. Similar trends were observed for recall, accuracy, and F1 score, indicating that models trained on easy-to-recognize cases performed poorly on more complex ones. These results were consistent across all model variants tested. AI models for pediatric wrist fracture detection exhibit reduced performance when tested on balanced datasets containing more difficult cases, compared to randomly selected cases. This highlights the importance of constructing representative and standardized test sets that account for clinical complexity to ensure robust AI performance in real-world settings. Question: Do sampling strategies based on sample complexity influence deep learning models' performance in fracture detection? Findings: AI performance in pediatric wrist fracture detection drops significantly when tested on balanced datasets with more challenging cases, compared to randomly selected cases. Clinical relevance: Without standardized and validated test datasets for AI that reflect clinical complexities, performance metrics may be overestimated, limiting the utility of AI in real-world settings.
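The balanced sampling strategy the study contrasts with random selection can be sketched as stratified draws over (difficulty, projection, fracture) combinations. The dict keys below are illustrative, not the GRAZPEDWRI-DX schema.

```python
import random

def balanced_test_set(cases, per_stratum, seed=0):
    """Draw a test set balanced over (difficulty, projection, fracture)
    strata, in contrast to plain random selection. 'cases' are dicts;
    the key names here are illustrative."""
    rng = random.Random(seed)
    # group cases by stratum
    strata = {}
    for case in cases:
        key = (case["difficulty"], case["projection"], case["fracture"])
        strata.setdefault(key, []).append(case)
    # take the same number of cases from every stratum
    picked = []
    for key, group in sorted(strata.items()):
        rng.shuffle(group)
        picked.extend(group[:per_stratum])
    return picked
```

A random test set, by contrast, inherits the dataset's skew toward easy cases, which is exactly what the study argues inflates reported metrics.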

Artificial intelligence in dentistry: awareness among dentists and computer scientists.

Costa ED, Vieira MA, Ambrosano GMB, Gaêta-Araujo H, Carneiro JA, Zancan BAG, Scaranti A, Macedo AA, Tirapelli C

pubmed · May 16 2025
For clinical application of artificial intelligence (AI) in dentistry, collaboration with computer scientists is necessary. This study aimed to evaluate the knowledge of dentists and computer scientists regarding the use of AI in dentistry, especially in dentomaxillofacial radiology. A total of 610 participants (374 dentists and 236 computer scientists) took part in a survey about AI in dentistry and radiographic imaging. Response options used a Likert agreement/disagreement scale. Descriptive analyses of agreement scores were performed using quartiles (minimum, first quartile, median, third quartile, and maximum). The non-parametric Mann-Whitney test was used to compare response scores between the two categories (α = 5%). Dentist academics had higher agreement scores for the statements "knowing the applications of AI in dentistry", "dentists taking the lead in AI research", "AI education should be part of teaching", "AI can increase the price of dental services", "AI can lead to errors in radiographic diagnosis", "AI can negatively interfere with the choice of Radiology specialty", "AI can cause a reduction in the employment of radiologists", and "patient data can be hacked using AI" (p < 0.05). Computer scientists had higher agreement scores for the statements "having knowledge in AI" and "AI's potential to speed up and improve radiographic diagnosis". Although dentists acknowledge the potential benefits of AI in dentistry, they remain skeptical about its use and consider it important to integrate AI into the dental education curriculum. Computer scientists, on the other hand, confirm technical expertise in AI and recognize its potential in dentomaxillofacial radiology.
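The Mann-Whitney comparison of Likert scores between the two groups rests on the U statistic, which for small samples can be computed directly from pairwise comparisons; a minimal sketch (names illustrative, ties counted as half-wins):

```python
def mann_whitney_u(group_a, group_b):
    """Mann-Whitney U statistic for two groups of ordinal scores
    (e.g. Likert agreement ratings): the number of (a, b) pairs with
    a > b, counting ties as 0.5. Pairwise form; fine for small samples."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u
```

In practice one would use `scipy.stats.mannwhitneyu`, which also returns the p-value; the pairwise form above just makes the statistic's definition explicit.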

CheXGenBench: A Unified Benchmark For Fidelity, Privacy and Utility of Synthetic Chest Radiographs

Raman Dutt, Pedro Sanchez, Yongchen Yao, Steven McDonagh, Sotirios A. Tsaftaris, Timothy Hospedales

arxiv preprint · May 15 2025
We introduce CheXGenBench, a rigorous and multifaceted evaluation framework for synthetic chest radiograph generation that simultaneously assesses fidelity, privacy risks, and clinical utility across state-of-the-art text-to-image generative models. Despite rapid advancements in generative AI for real-world imagery, medical domain evaluations have been hindered by methodological inconsistencies, outdated architectural comparisons, and disconnected assessment criteria that rarely address the practical clinical value of synthetic samples. CheXGenBench overcomes these limitations through standardised data partitioning and a unified evaluation protocol comprising over 20 quantitative metrics that systematically analyse generation quality, potential privacy vulnerabilities, and downstream clinical applicability across 11 leading text-to-image architectures. Our results reveal critical inefficiencies in the existing evaluation protocols, particularly in assessing generative fidelity, leading to inconsistent and uninformative comparisons. Our framework establishes a standardised benchmark for the medical AI community, enabling objective and reproducible comparisons while facilitating seamless integration of both existing and future generative models. Additionally, we release a high-quality, synthetic dataset, SynthCheX-75K, comprising 75K radiographs generated by the top-performing model (Sana 0.6B) in our benchmark to support further research in this critical domain. Through CheXGenBench, we establish a new state-of-the-art and release our framework, models, and SynthCheX-75K dataset at https://raman1121.github.io/CheXGenBench/