Latest Papers on Radiology AI. Tags: Detection

Meta-analysis of AI-based pulmonary embolism detection: How reliable are deep learning models?

Lanza E, Ammirabile A, Francone M

•papers•May 23 2025

Deep learning (DL)-based methods show promise in detecting pulmonary embolism (PE) on CT pulmonary angiography (CTPA), potentially improving diagnostic accuracy and workflow efficiency. This meta-analysis aimed to (1) determine pooled performance estimates of DL algorithms for PE detection; and (2) compare the diagnostic efficacy of convolutional neural network (CNN)- versus U-Net-based architectures. Following PRISMA guidelines, we searched PubMed and EMBASE through April 15, 2025 for English-language studies (2010-2025) reporting DL models for PE detection with extractable 2 × 2 data or performance metrics. True/false positives and negatives were reconstructed when necessary under an assumed 50% PE prevalence (with 0.5 continuity correction). We approximated AUROC as the mean of sensitivity and specificity if not directly reported. Sensitivity, specificity, accuracy, PPV and NPV were pooled using a DerSimonian-Laird random-effects model with Freeman-Tukey transformation; AUROC values were combined via a fixed-effect inverse-variance approach. Heterogeneity was assessed by Cochran's Q and I². Subgroup analyses contrasted CNN versus U-Net models. Twenty-four studies (n = 22,984 patients) met inclusion criteria. Pooled estimates were: AUROC 0.895 (95% CI: 0.874-0.917), sensitivity 0.894 (0.856-0.923), specificity 0.871 (0.831-0.903), accuracy 0.857 (0.833-0.882), PPV 0.832 (0.794-0.869) and NPV 0.902 (0.874-0.929). Between-study heterogeneity was high (I² ≈ 97% for sensitivity/specificity). U-Net models exhibited higher sensitivity (0.899 vs 0.893) and CNN models higher specificity (0.926 vs 0.900); subgroup Q-tests confirmed significant differences for both sensitivity (p = 0.0002) and specificity (p < 0.001). DL algorithms demonstrate high diagnostic accuracy for PE detection on CTPA, with complementary strengths: U-Net architectures excel in true-positive identification, whereas CNNs yield fewer false positives. However, marked heterogeneity underscores the need for standardized, prospective validation before routine clinical implementation.
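
The pooling steps named in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the back-transformation uses the simple approximation sin²(t/2) rather than an exact inversion, and confidence intervals are omitted. It shows how 2 × 2 cells might be rebuilt under the assumed 50% prevalence with a 0.5 continuity correction, how AUROC is approximated from sensitivity and specificity, and how a DerSimonian-Laird estimate is formed on Freeman-Tukey transformed proportions.

```python
import math


def reconstruct_2x2(sens, spec, n, prevalence=0.5, cc=0.5):
    """Rebuild (TP, FN, FP, TN) from reported sensitivity and specificity.

    Assumes a fixed prevalence (50% by default) and adds a continuity
    correction of `cc` to every cell.
    """
    n_pos = n * prevalence
    n_neg = n - n_pos
    tp = sens * n_pos + cc
    fn = (1 - sens) * n_pos + cc
    fp = (1 - spec) * n_neg + cc
    tn = spec * n_neg + cc
    return tp, fn, fp, tn


def approx_auroc(sens, spec):
    """AUROC stand-in used when a study reports no AUROC."""
    return (sens + spec) / 2


def freeman_tukey(x, n):
    """Double-arcsine transform of x events out of n, with its variance."""
    t = math.asin(math.sqrt(x / (n + 1))) + math.asin(math.sqrt((x + 1) / (n + 1)))
    return t, 1 / (n + 0.5)


def back_transform(t):
    """Approximate inverse of the double-arcsine transform (assumption)."""
    return math.sin(t / 2) ** 2


def dersimonian_laird(effects, variances):
    """Random-effects pooling; returns (pooled, tau^2, Cochran's Q, I^2)."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, tau2, q, i2


def pool_proportion(counts):
    """counts: list of (events, total) pairs, one pair per study."""
    transformed = [freeman_tukey(x, n) for x, n in counts]
    pooled_t, _, _, i2 = dersimonian_laird(
        [t for t, _ in transformed], [v for _, v in transformed]
    )
    return back_transform(pooled_t), i2
```

For a study that reports only sensitivity and specificity, `reconstruct_2x2` yields the cells, and `(tp, tp + fn)` can then be passed to `pool_proportion` as that study's sensitivity counts. An I² near 97%, as reported here, means almost all observed variation reflects between-study differences rather than sampling error.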

CT Detection Chest Meta Analysis In Silico Academic Lab

Optimizing the power of AI for fracture detection: from blind spots to breakthroughs.

Behzad S, Eibschutz L, Lu MY, Gholamrezanezhad A

•papers•May 23 2025

Artificial Intelligence (AI) is increasingly being integrated into the field of musculoskeletal (MSK) radiology, from research methods to routine clinical practice. Within the field of fracture detection, AI is allowing for precision and speed previously unimaginable. Yet, AI's decision-making processes are sometimes fraught with deficiencies, undermining trust, hindering accountability, and compromising diagnostic precision. To make AI a trusted ally for radiologists, we recommend incorporating clinical history, rationalizing AI decisions with explainable AI (XAI) techniques, increasing the variety and scale of training data to approach the complexity of a clinical situation, and fostering active interaction between clinicians and developers. By bridging these gaps, the true potential of AI can be unlocked, enhancing patient outcomes and fundamentally transforming radiology through a harmonious integration of human expertise and intelligent technology. In this article, we aim to examine the factors contributing to AI inaccuracies and offer recommendations to address these challenges, benefiting both radiologists and developers striving to improve future algorithms.

X-Ray Detection Musculoskeletal Review Clinical Pilot Academic Lab Ethics

SD-MAD: Sign-Driven Few-shot Multi-Anomaly Detection in Medical Images

Kaiyu Guo, Tan Pan, Chen Jiang, Zijian Wang, Brian C. Lovell, Limei Han, Yuan Cheng, Mahsa Baktashmotlagh

•preprint•May 22 2025

Medical anomaly detection (AD) is crucial for early clinical intervention, yet it faces challenges due to limited access to high-quality medical imaging data, caused by privacy concerns and data silos. Few-shot learning has emerged as a promising approach to alleviate these limitations by leveraging the large-scale prior knowledge embedded in vision-language models (VLMs). Recent advancements in few-shot medical AD have treated normal and abnormal cases as a one-class classification problem, often overlooking the distinction among multiple anomaly categories. Thus, in this paper, we propose a framework tailored for few-shot medical anomaly detection in the scenario where the identification of multiple anomaly categories is required. To capture the detailed radiological signs of medical anomaly categories, our framework incorporates diverse textual descriptions for each category generated by a large language model, under the assumption that different anomalies in medical images may share common radiological signs in each category. Specifically, we introduce SD-MAD, a two-stage Sign-Driven few-shot Multi-Anomaly Detection framework: (i) radiological signs are aligned with anomaly categories by amplifying inter-anomaly discrepancy; (ii) aligned signs are further selected to mitigate the under-fitting and uncertain-sample issues caused by limited medical data, employing an automatic sign selection strategy at inference. Moreover, we propose three protocols to comprehensively quantify the performance of multi-anomaly detection. Extensive experiments illustrate the effectiveness of our method.

Mixed Modality Detection Methodology In Silico Academic Lab GenAI

Customized GPT-4V(ision) for radiographic diagnosis: can large language model detect supernumerary teeth?

Aşar EM, İpek İ, Bilge K

•papers•May 21 2025

With the growing capabilities of language models like ChatGPT to process text and images, this study evaluated their accuracy in detecting supernumerary teeth on periapical radiographs. A customized GPT-4V model (CGPT-4V) was also developed to assess whether domain-specific training could improve diagnostic performance compared to standard GPT-4V and GPT-4o models. One hundred eighty periapical radiographs (90 with and 90 without supernumerary teeth) were evaluated using GPT-4V, GPT-4o, and a fine-tuned CGPT-4V model. Each image was assessed separately with the standardized prompt "Are there any supernumerary teeth in the radiograph above?" to avoid contextual bias. Three dental experts scored the responses using a three-point Likert scale for positive cases and a binary scale for negatives. Chi-square tests and ROC analysis were used to compare model performances (p < 0.05). Among the three models, CGPT-4V exhibited the highest accuracy, detecting supernumerary teeth correctly in 91% of cases, compared to 77% for GPT-4o and 63% for GPT-4V. The CGPT-4V model also demonstrated a significantly lower false positive rate (16%) than GPT-4V (42%). A statistically significant difference was found between CGPT-4V and GPT-4o (p < 0.001), while no significant difference was observed between GPT-4V and CGPT-4V or between GPT-4V and GPT-4o. Additionally, CGPT-4V successfully identified multiple supernumerary teeth in radiographs where present. These findings highlight the diagnostic potential of customized GPT models in dental radiology. Future research should focus on multicenter validation, seamless clinical integration, and cost-effectiveness to support real-world implementation.

X-Ray Detection Retrospective Clinical In Silico Academic Lab GenAI

Lung Nodule-SSM: Self-Supervised Lung Nodule Detection and Classification in Thoracic CT Images

Muniba Noreen, Furqan Shaukat

•preprint•May 21 2025

Lung cancer remains among the deadliest types of cancer in recent decades, and early lung nodule detection is crucial for improving patient outcomes. The limited availability of annotated medical imaging data remains a bottleneck in developing accurate computer-aided diagnosis (CAD) systems. Self-supervised learning can help leverage large amounts of unlabeled data to develop more robust CAD systems. With the recent advent of transformer-based architectures and their ability to generalize to unseen tasks, there has been an effort within the healthcare community to adapt them to various medical downstream tasks. Thus, we propose a novel "LungNodule-SSM" method, which utilizes self-supervised learning with DINOv2 as a backbone to enhance lung nodule detection and classification without annotated data. Our methodology has two stages: first, the DINOv2 model is pre-trained on unlabeled CT scans to learn robust feature representations; second, these features are fine-tuned using transformer-based architectures for lesion-level detection and accurate lung nodule diagnosis. The proposed method has been evaluated on the challenging LUNA 16 dataset, consisting of 888 CT scans, and compared with SOTA methods. Our experimental results show the superiority of our proposed method with an accuracy of 98.37%, demonstrating its effectiveness in lung nodule detection. The source code, datasets, and pre-processed data can be accessed using the link: https://github.com/EMeRALDsNRPU/Lung-Nodule-SSM-Self-Supervised-Lung-Nodule-Detection-and-Classification/tree/main

CT Detection Chest Methodology In Silico Academic Lab Open Code

Artificial Intelligence and Musculoskeletal Surgical Applications.

Oettl FC, Zsidai B, Oeding JF, Samuelsson K

•papers•May 20 2025

Artificial intelligence (AI) has emerged as a transformative force in orthopedic surgery. Potentially encompassing pre-, intra-, and postoperative processes, it can process complex medical imaging, provide real-time surgical guidance, and analyze large datasets for outcome prediction and optimization. AI has shown improvements in surgical precision, efficiency, and patient outcomes across orthopedic subspecialties, and large language models and agentic AI systems are expanding AI utility beyond surgical applications into areas such as clinical documentation, patient education, and autonomous decision support. The successful implementation of AI in orthopedic surgery requires careful attention to validation, regulatory compliance, and healthcare system integration. As these technologies continue to advance, maintaining the balance between innovation and patient safety remains crucial, with the ultimate goal of achieving more personalized, efficient, and equitable healthcare delivery while preserving the essential role of human clinical judgment. This review examines the current landscape and future trajectory of AI applications in orthopedic surgery, highlighting both technological advances and their clinical impact. Studies have suggested that AI-assisted procedures achieve higher accuracy and better functional outcomes compared to conventional methods, while reducing operative times and complications. However, these technologies are designed to augment rather than replace clinical expertise, serving as sophisticated tools to enhance surgeons' capabilities and improve patient care.

Mixed Modality Detection Musculoskeletal Review In Silico Academic Lab

NOVA: A Benchmark for Anomaly Localization and Clinical Reasoning in Brain MRI

Cosmin I. Bercea, Jun Li, Philipp Raffler, Evamaria O. Riedel, Lena Schmitzer, Angela Kurz, Felix Bitzer, Paula Roßmüller, Julian Canisius, Mirjam L. Beyrle, Che Liu, Wenjia Bai, Bernhard Kainz, Julia A. Schnabel, Benedikt Wiestler

•preprint•May 20 2025

In many real-world applications, deployed models encounter inputs that differ from the data seen during training. Out-of-distribution detection identifies whether an input stems from an unseen distribution, while open-world recognition flags such inputs to ensure the system remains robust as ever-emerging, previously unknown categories appear and must be addressed without retraining. Foundation and vision-language models are pre-trained on large and diverse datasets with the expectation of broad generalization across domains, including medical imaging. However, benchmarking these models on test sets with only a few common outlier types silently collapses the evaluation back to a closed-set problem, masking failures on rare or truly novel conditions encountered in clinical use. We therefore present NOVA, a challenging, real-life evaluation-only benchmark of ~900 brain MRI scans that span 281 rare pathologies and heterogeneous acquisition protocols. Each case includes rich clinical narratives and double-blinded expert bounding-box annotations. Together, these enable joint assessment of anomaly localisation, visual captioning, and diagnostic reasoning. Because NOVA is never used for training, it serves as an extreme stress-test of out-of-distribution generalisation: models must bridge a distribution gap both in sample appearance and in semantic space. Baseline results with leading vision-language models (GPT-4o, Gemini 2.0 Flash, and Qwen2.5-VL-72B) reveal substantial performance drops across all tasks, establishing NOVA as a rigorous testbed for advancing models that can detect, localize, and reason about truly unknown anomalies.

MRI Detection Neurological Dataset Release In Silico Academic Lab Open Dataset Benchmark SOTA

Detection of carotid artery calcifications using artificial intelligence in dental radiographs: a systematic review and meta-analysis.

Arzani S, Soltani P, Karimi A, Yazdi M, Ayoub A, Khurshid Z, Galderisi D, Devlin H

•papers•May 19 2025

Carotid artery calcifications are important markers of cardiovascular health, often associated with atherosclerosis and a higher risk of stroke. Recent research shows that dental radiographs can help identify these calcifications, allowing for earlier detection of vascular diseases. Advances in artificial intelligence (AI) have improved the ability to detect carotid calcifications in dental images, making it a useful screening tool. This systematic review and meta-analysis aimed to evaluate how accurately AI methods can identify carotid calcifications in dental radiographs. A systematic search in databases including PubMed, Scopus, Embase, and Web of Science for studies on AI algorithms used to detect carotid calcifications in dental radiographs was conducted. Two independent reviewers collected data on study aims, imaging techniques, and statistical measures such as sensitivity and specificity. A meta-analysis using random effects was performed, and the risk of bias was evaluated with the QUADAS-2 tool. Nine studies were suitable for qualitative analysis, while five provided data for quantitative analysis. These studies assessed AI algorithms using cone beam computed tomography (n = 3) and panoramic radiographs (n = 6). The sensitivity of the included studies ranged from 0.67 to 0.98 and specificity varied between 0.85 and 0.99. The overall effect size, by considering only one AI method in each study, resulted in a sensitivity of 0.92 [95% CI 0.81 to 0.97] and a specificity of 0.96 [95% CI 0.92 to 0.97]. The high sensitivity and specificity indicate that AI methods could be effective screening tools, enhancing the early detection of stroke and related cardiovascular risks.

X-Ray Detection Vascular Meta Analysis In Silico Academic Lab

The Role of Machine Learning to Detect Occult Neck Lymph Node Metastases in Early-Stage (T1-T2/N0) Oral Cavity Carcinomas.

Troise S, Ugga L, Esposito M, Positano M, Elefante A, Capasso S, Cuocolo R, Merola R, Committeri U, Abbate V, Bonavolontà P, Nocini R, Dell'Aversana Orabona G

•papers•May 19 2025

Oral cavity carcinomas (OCCs) represent roughly 50% of all head and neck cancers. The risk of occult neck metastases for early-stage OCCs ranges from 15% to 35%, hence the need to develop tools that can support the detection of these neck metastases. Machine learning and radiomic features are emerging as effective tools in this field. Thus, the aim of this study is to demonstrate the effectiveness of radiomic features to predict the risk of occult neck metastases in early-stage (T1-T2/N0) OCCs. This was a retrospective, single-institution study (Maxillo-facial Surgery Unit, University of Naples Federico II) of 75 patients surgically treated for early-stage OCC. For all patients, TNM data, in particular pN status after histopathological examination, were obtained, and radiomic features were extracted from MRI. Fifty-six patients were confirmed N0 after surgery, while 19 were pN+. The radiomic features, extracted by a machine-learning algorithm, exhibited the ability to preoperatively discriminate occult neck metastases with a sensitivity of 78%, specificity of 83%, an AUC of 86%, accuracy of 80%, and a positive predictive value (PPV) of 63%. Our results seem to confirm that radiomic features, extracted by machine learning methods, are effective tools in detecting occult neck metastases in early-stage OCCs. The clinical relevance of this study is that radiomics could be used routinely as a preoperative tool to support diagnosis and to help surgeons in the surgical decision-making process, particularly regarding surgical indications for neck lymph node treatment.
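
The metrics quoted in this abstract all derive from a 2 × 2 confusion matrix, and a short Python sketch makes their definitions explicit. The abstract does not report the matrix itself, so the cell counts below are hypothetical: they respect only the stated split of 19 pN+ and 56 N0 patients and are not expected to reproduce every published figure.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Return standard accuracy measures for a 2 x 2 confusion matrix."""
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),   # share of pN+ patients flagged
        "specificity": tn / (tn + fp),   # share of N0 patients cleared
        "ppv": tp / (tp + fp),           # flagged patients who are truly pN+
        "npv": tn / (tn + fn),           # cleared patients who are truly N0
        "accuracy": (tp + tn) / total,
    }


# HYPOTHETICAL cells: 19 pN+ patients (tp + fn) and 56 N0 patients (fp + tn).
# The split within each group is an assumption made for illustration only.
example = diagnostic_metrics(tp=15, fn=4, fp=9, tn=47)

for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

With roughly one patient in four being pN+, even a modest number of false positives pulls PPV well below sensitivity and specificity, which is consistent with the pattern reported above.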

MRI Detection Retrospective Clinical In Silico Academic Lab

Patient-Specific Autoregressive Models for Organ Motion Prediction in Radiotherapy

Yuxiang Lai, Jike Zhong, Vanessa Su, Xiaofeng Yang

•preprint•May 17 2025

Radiotherapy often involves a prolonged treatment period. During this time, patients may experience organ motion due to breathing and other physiological factors. Predicting and modeling this motion before treatment is crucial for ensuring precise radiation delivery. However, existing pre-treatment organ motion prediction methods primarily rely on deformation analysis using principal component analysis (PCA), which is highly dependent on registration quality and struggles to capture periodic temporal dynamics for motion modeling. In this paper, we observe that organ motion prediction closely resembles an autoregressive process, a technique widely used in natural language processing (NLP). Autoregressive models predict the next token based on previous inputs, naturally aligning with our objective of predicting future organ motion phases. Building on this insight, we reformulate organ motion prediction as an autoregressive process to better capture patient-specific motion patterns. Specifically, we acquire 4D CT scans for each patient before treatment, with each sequence comprising multiple 3D CT phases. These phases are fed into the autoregressive model to predict future phases based on prior phase motion patterns. We evaluate our method on a real-world test set of 4D CT scans from 50 patients who underwent radiotherapy at our institution and a public dataset containing 4D CT scans from 20 patients (some with multiple scans), totaling over 1,300 3D CT phases. The performance in predicting the motion of the lung and heart surpasses existing benchmarks, demonstrating its effectiveness in capturing motion dynamics from CT images. These results highlight the potential of our method to improve pre-treatment planning in radiotherapy, enabling more precise and adaptive radiation delivery.
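
The autoregressive idea can be illustrated on a toy problem. The sketch below is not the authors' model, which operates on 3D CT phases; it swaps in a classical linear AR(p) fit on a one-dimensional surrogate signal (for example, a hypothetical diaphragm displacement per breathing phase) and then rolls the model forward, feeding each prediction back in as input. All function names are illustrative assumptions.

```python
import math


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ar(series, order):
    """Least-squares fit of x[t] = sum_k coeffs[k] * x[t - 1 - k]."""
    rows = [[series[t - 1 - k] for k in range(order)]
            for t in range(order, len(series))]
    targets = [series[t] for t in range(order, len(series))]
    # Normal equations: (X^T X) a = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(order)]
           for i in range(order)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(order)]
    return solve_linear(xtx, xty)


def predict_ahead(history, coeffs, steps):
    """Roll the model forward, reusing each prediction as the next input."""
    order = len(coeffs)
    window = list(history[-order:])
    preds = []
    for _ in range(steps):
        nxt = sum(coeffs[k] * window[-1 - k] for k in range(order))
        preds.append(nxt)
        window = window[1:] + [nxt]
    return preds


# Toy breathing trace: 10 phases per cycle, 4 cycles (synthetic data).
omega = 2 * math.pi / 10
trace = [math.sin(omega * t) for t in range(40)]
coeffs = fit_ar(trace[:30], order=2)
forecast = predict_ahead(trace[:30], coeffs, steps=10)
```

A pure sinusoid obeys a two-term recurrence exactly, so the toy forecast tracks the held-out cycle closely. Real respiratory motion is only quasi-periodic, which is one reason a learned, patient-specific model may be preferred over a fixed linear recurrence.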

CT Detection Chest Retrospective Clinical In Silico Academic Lab Benchmark SOTA
