Latest Papers on Radiology AI. Tags: Benchmark SOTA

Artificial Intelligence in Prenatal Ultrasound: A Systematic Review of Diagnostic Tools for Detecting Congenital Anomalies

Dunne, J., Kumarasamy, C., Belay, D. G., Betran, A. P., Gebremedhin, A. T., Mengistu, S., Nyadanu, S. D., Roy, A., Tessema, G., Tigest, T., Pereira, G.

•preprint•Jul 5 2025

BackgroundArtificial intelligence (AI) has potentially shown promise in interpreting ultrasound imaging through flexible pattern recognition and algorithmic learning, but implementation in clinical practice remains limited. This study aimed to investigate the current application of AI in prenatal ultrasounds to identify congenital anomalies, and to synthesise challenges and opportunities for the advancement of AI-assisted ultrasound diagnosis. This comprehensive analysis addresses the clinical translation gap between AI performance metrics and practical implementation in prenatal care. MethodsSystematic searches were conducted in eight electronic databases (CINAHL Plus, Ovid/EMBASE, Ovid/MEDLINE, ProQuest, PubMed, Scopus, Web of Science and Cochrane Library) and Google Scholar from inception to May 2025. Studies were included if they applied an AI-assisted ultrasound diagnostic tool to identify a congenital anomaly during pregnancy. This review adhered to PRISMA guidelines for systematic reviews. We evaluated study quality using the Checklist for Artificial Intelligence in Medical Imaging (CLAIM) guidelines. FindingsOf 9,918 records, 224 were identified for full-text review and 20 met the inclusion criteria. The majority of studies (11/20, 55%) were conducted in China, with most published after 2020 (16/20, 80%). All AI models were developed as an assistive tool for anomaly detection or classification. Most models (85%) focused on single-organ systems: heart (35%), brain/cranial (30%), or facial features (20%), while three studies (15%) attempted multi-organ anomaly detection. Fifty percent of the included studies reported exceptionally high model performance, with both sensitivity and specificity exceeding 0.95, with AUC-ROC values ranging from 0.91 to 0.97. Most studies (75%) lacked external validation, with internal validation often limited to small training and testing datasets. InterpretationWhile AI applications in prenatal ultrasound showed potential, current evidence indicates significant limitations in their practical implementation. Much work is required to optimise their application, including the external validation of diagnostic models with clinical utility to have real-world implications. Future research should prioritise larger-scale multi-centre studies, developing multi-organ anomaly detection capabilities rather than the current single-organ focus, and robust evaluation of AI tools in real-world clinical settings.

Ultrasound Detection Review In Silico Academic Lab Benchmark SOTA

Causal-SAM-LLM: Large Language Models as Causal Reasoners for Robust Medical Segmentation

Tao Tang, Shijie Xu, Yiting Wu, Zhixiang Lu

•preprint•Jul 4 2025

The clinical utility of deep learning models for medical image segmentation is severely constrained by their inability to generalize to unseen domains. This failure is often rooted in the models learning spurious correlations between anatomical content and domain-specific imaging styles. To overcome this fundamental challenge, we introduce Causal-SAM-LLM, a novel framework that elevates Large Language Models (LLMs) to the role of causal reasoners. Our framework, built upon a frozen Segment Anything Model (SAM) encoder, incorporates two synergistic innovations. First, Linguistic Adversarial Disentanglement (LAD) employs a Vision-Language Model to generate rich, textual descriptions of confounding image styles. By training the segmentation model's features to be contrastively dissimilar to these style descriptions, it learns a representation robustly purged of non-causal information. Second, Test-Time Causal Intervention (TCI) provides an interactive mechanism where an LLM interprets a clinician's natural language command to modulate the segmentation decoder's features in real-time, enabling targeted error correction. We conduct an extensive empirical evaluation on a composite benchmark from four public datasets (BTCV, CHAOS, AMOS, BraTS), assessing generalization under cross-scanner, cross-modality, and cross-anatomy settings. Causal-SAM-LLM establishes a new state of the art in out-of-distribution (OOD) robustness, improving the average Dice score by up to 6.2 points and reducing the Hausdorff Distance by 15.8 mm over the strongest baseline, all while using less than 9% of the full model's trainable parameters. Our work charts a new course for building robust, efficient, and interactively controllable medical AI systems.

Mixed Modality Segmentation Methodology In Silico Academic Lab Benchmark SOTA

SAMed-2: Selective Memory Enhanced Medical Segment Anything Model

Zhiling Yan, Sifan Song, Dingjie Song, Yiwei Li, Rong Zhou, Weixiang Sun, Zhennong Chen, Sekeun Kim, Hui Ren, Tianming Liu, Quanzheng Li, Xiang Li, Lifang He, Lichao Sun

•preprint•Jul 4 2025

Recent "segment anything" efforts show promise by learning from large-scale data, but adapting such models directly to medical images remains challenging due to the complexity of medical data, noisy annotations, and continual learning requirements across diverse modalities and anatomical structures. In this work, we propose SAMed-2, a new foundation model for medical image segmentation built upon the SAM-2 architecture. Specifically, we introduce a temporal adapter into the image encoder to capture image correlations and a confidence-driven memory mechanism to store high-certainty features for later retrieval. This memory-based strategy counters the pervasive noise in large-scale medical datasets and mitigates catastrophic forgetting when encountering new tasks or modalities. To train and evaluate SAMed-2, we curate MedBank-100k, a comprehensive dataset spanning seven imaging modalities and 21 medical segmentation tasks. Our experiments on both internal benchmarks and 10 external datasets demonstrate superior performance over state-of-the-art baselines in multi-task scenarios. The code is available at: https://github.com/ZhilingYan/Medical-SAM-Bench.

Mixed Modality Segmentation Methodology In Silico Academic Lab Open Code Open Dataset Benchmark SOTA

Improving risk assessment of local failure in brain metastases patients using vision transformers - A multicentric development and validation study.

Erdur AC, Scholz D, Nguyen QM, Buchner JA, Mayinger M, Christ SM, Brunner TB, Wittig A, Zimmer C, Meyer B, Guckenberger M, Andratschke N, El Shafie RA, Debus JU, Rogers S, Riesterer O, Schulze K, Feldmann HJ, Blanck O, Zamboglou C, Bilger-Z A, Grosu AL, Wolff R, Eitz KA, Combs SE, Bernhardt D, Wiestler B, Rueckert D, Peeken JC

•papers•Jul 4 2025

This study investigates the use of Vision Transformers (ViTs) to predict Freedom from Local Failure (FFLF) in patients with brain metastases using pre-operative MRI scans. The goal is to develop a model that enhances risk stratification and informs personalized treatment strategies. Within the AURORA retrospective trial, patients (n = 352) who received surgical resection followed by post-operative stereotactic radiotherapy (SRT) were collected from seven hospitals. We trained our ViT for the direct image-to-risk task on T1-CE and FLAIR sequences and combined clinical features along the way. We employed segmentation-guided image modifications, model adaptations, and specialized patient sampling strategies during training. The model was evaluated with five-fold cross-validation and ensemble learning across all validation runs. An external, international test cohort (n = 99) within the dataset was used to assess the generalization capabilities of the model, and saliency maps were generated for explainability analysis. We achieved a competent C-Index score of 0.7982 on the test cohort, surpassing all clinical, CNN-based, and hybrid baselines. Kaplan-Meier analysis showed significant FFLF risk stratification. Saliency maps focusing on the BM core confirmed that model explanations aligned with expert observations. Our ViT-based model offers a potential for personalized treatment strategies and follow-up regimens in patients with brain metastases. It provides an alternative to radiomics as a robust, automated tool for clinical workflows, capable of improving patient outcomes through effective risk assessment and stratification.

MRI Classification Neurological Retrospective Clinical In Silico Academic Lab Benchmark SOTA

A preliminary attempt to harmonize using physics-constrained deep neural networks for multisite and multiscanner MRI datasets (PhyCHarm).

Lee G, Ye DH, Oh SH

•papers•Jul 4 2025

In magnetic resonance imaging (MRI), variations in scan parameters and scanner specifications can result in differences in image appearance. To minimize these differences, harmonization in MRI has been suggested as a crucial image processing technique. In this study, we developed an MR physics-based harmonization framework, Physics-Constrained Deep Neural Network for multisite and multiscanner Harmonization (PhyCHarm). PhyCHarm includes two deep neural networks: (1) the Quantitative Maps Generator to generate T1- and M0-maps and (2) the Harmonization Network. We used an open dataset consisting of 3T MP2RAGE images from 50 healthy individuals for the Quantitative Maps Generator and a traveling dataset consisting of 3T T1w images from 9 healthy individuals for the Harmonization Network. PhyCHarm was evaluated using the structural similarity index measure (SSIM), peak signal-to-noise ratio (PSNR), and normalized-root-mean square error (NRMSE) for the Quantitative Maps Generator, and using SSIM, PSNR, and volumetric analysis for the Harmonization network, respectively. PhyCHarm demonstrated increased SSIM and PSNR, the highest Dice score in the FSL FAST segmentation results for gray and white matter compared to U-Net, Pix2Pix, CALAMITI, and HarmonizingFlows. PhyCHarm showed a greater reduction in volume differences after harmonization for gray and white matter than U-Net, Pix2Pix, CALAMITI, or HarmonizingFlows. As an initial step toward developing advanced harmonization techniques, we investigated the applicability of physics-based constraints within a supervised training strategy. The proposed physics constraints could be integrated with unsupervised methods, paving the way for more sophisticated harmonization qualities.

MRI Image Synthesis Neurological Methodology In Silico Academic Lab Benchmark SOTA

Disease Classification of Pulmonary Xenon Ventilation MRI Using Artificial Intelligence.

Matheson AM, Bdaiwi AS, Willmering MM, Hysinger EB, McCormack FX, Walkup LL, Cleveland ZI, Woods JC

•papers•Jul 4 2025

Hyperpolarized 129Xenon magnetic resonance imaging (MRI) measures the extent of lung ventilation by ventilation defect percent (VDP), but VDP alone cannot distinguish between diseases. Prior studies have reported anecdotal evidence of disease-specific defect patterns such as wedge-shaped defects in asthma and polka-dot defects in lymphangioleiomyomatosis (LAM). Neural network artificial intelligence can evaluate image shapes and textures to classify images, but this has not been attempted in xenon MRI. We hypothesized that an artificial intelligence network trained on ventilation MRI could classify diseases based on spatial patterns in lung MR images alone. Xenon MRI data in six pulmonary conditions (control, asthma, bronchiolitis obliterans syndrome, bronchopulmonary dysplasia, cystic fibrosis, LAM) were used to train convolutional neural networks. Network performance was assessed with top-1 and top-2 accuracy, recall, precision, and one-versus-all area under the curve (AUC). Gradient class-activation-mapping (Grad-CAM) was used to visualize what parts of the images were important for classification. Training/testing data were collected from 262 participants. The top performing network (VGG-16) had top-1 accuracy=56%, top-2 accuracy=78%, recall=.30, precision=.70, and AUC=.85. The network performed better on larger classes (top-1 accuracy: control=62% [n=57], CF=67% [n=85], LAM=69% [n=61]) and outperformed human observers (human top-1 accuracy=40%, network top-1 accuracy=61% on a single training fold). We developed an artificial intelligence tool that could classify disease from xenon ventilation images alone that outperformed human observers. This suggests that xenon images have additional, disease-specific information that could be useful for cases that are clinically challenging or for disease phenotyping.

MRI Classification Chest Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Quantitative CT Imaging in Chronic Obstructive Pulmonary Disease.

Park S, Lee SM, Hwang HJ, Oh SY, Choe J, Seo JB

•papers•Jul 4 2025

Chronic obstructive pulmonary disease (COPD) is a highly heterogeneous condition characterized by diverse pulmonary and extrapulmonary manifestations. Efforts to quantify its various components using CT imaging have advanced, aiming for more precise, objective, and reproducible assessment and management. Beyond emphysema and small airway disease, the two major components of COPD, CT quantification enables the evaluation of pulmonary vascular alteration, ventilation-perfusion mismatches, fissure completeness, and extrapulmonary features such as altered body composition, osteoporosis, and atherosclerosis. Recent advancements, including the application of deep learning techniques, have facilitated fully automated segmentation and quantification of CT parameters, while innovations such as image standardization hold promise for enhancing clinical applicability. Numerous studies have reported associations between quantitative CT parameters and clinical or physiologic outcomes in patients with COPD. However, barriers remain to the routine implementation of these technologies in clinical practice. This review highlights recent research on COPD quantification, explores advances in technology, and also discusses current challenges and potential solutions for improving quantification methods.

CT Segmentation Chest Review In Silico Academic Lab Benchmark SOTA GenAI

Novel CAC Dispersion and Density Score to Predict Myocardial Infarction and Cardiovascular Mortality.

Huangfu G, Ihdayhid AR, Kwok S, Konstantopoulos J, Niu K, Lu J, Smallbone H, Figtree GA, Chow CK, Dembo L, Adler B, Hamilton-Craig C, Grieve SM, Chan MTV, Butler C, Tandon V, Nagele P, Woodard PK, Mrkobrada M, Szczeklik W, Aziz YFA, Biccard B, Devereaux PJ, Sheth T, Dwivedi G, Chow BJW

•papers•Jul 4 2025

Coronary artery calcification (CAC) provides robust prediction for major adverse cardiovascular events (MACE), but current techniques disregard plaque distribution and protective effects of high CAC density. We investigated whether a novel CAC-dispersion and density (CAC-DAD) score will exhibit superior prognostic value compared with the Agatston score (AS) for MACE prediction. We conducted a multicenter, retrospective, cross-sectional study of 961 patients (median age, 67 years; 61% male) who underwent cardiac computed tomography for cardiovascular or perioperative risk assessment. Blinded analyzers applied deep learning algorithms to noncontrast scans to calculate the CAC-DAD score, which adjusts for the spatial distribution of CAC and assigns a protective weight factor for lesions with ≥1000 Hounsfield units. Associations were assessed using frailty regression. Over a median follow-up of 30 (30-460) days, 61 patients experienced MACE (nonfatal myocardial infarction or cardiovascular mortality). An elevated CAC-DAD score (≥2050 based on optimal cutoff) captured more MACE than AS ≥400 (74% versus 57%; P=0.002). Univariable analysis revealed that an elevated CAC-DAD score, AS ≥400 and AS ≥100, age, diabetes, hypertension, and statin use predicted MACE. On multivariable analysis, only the CAC-DAD score (hazard ratio, 2.57 [95% CI, 1.43-4.61]; P=0.002), age, statins, and diabetes remained significant. The inclusion of the CAC-DAD score in a predictive model containing demographic factors and AS improved the C statistic from 0.61 to 0.66 (P=0.008). The fully automated CAC-DAD score improves MACE prediction compared with the AS. Patients with a high CAC-DAD score, including those with a low AS, may be at higher risk and warrant intensification of their preventative therapies.

CT Segmentation Cardiac Retrospective Clinical In Silico Academic Lab Benchmark SOTA

A tailored deep learning approach for early detection of oral cancer using a 19-layer CNN on clinical lip and tongue images.

Liu P, Bagi K

•papers•Jul 4 2025

Early and accurate detection of oral cancer plays a pivotal role in improving patient outcomes. This research introduces a custom-designed, 19-layer convolutional neural network (CNN) for the automated diagnosis of oral cancer using clinical images of the lips and tongue. The methodology integrates advanced preprocessing steps, including min-max normalization and histogram-based contrast enhancement, to optimize image features critical for reliable classification. The model is extensively validated on the publicly available Oral Cancer (Lips and Tongue) Images (OCI) dataset, which is divided into 80% training and 20% testing subsets. Comprehensive performance evaluation employs established metrics-accuracy, sensitivity, specificity, precision, and F1-score. Our CNN architecture achieved an accuracy of 99.54%, sensitivity of 95.73%, specificity of 96.21%, precision of 96.34%, and F1-score of 96.03%, demonstrating substantial improvements over prominent transfer learning benchmarks, including SqueezeNet, AlexNet, Inception, VGG19, and ResNet50, all tested under identical experimental protocols. The model's robust performance, efficient computation, and high reliability underline its practicality for clinical application and support its superiority over existing approaches. This study provides a reproducible pipeline and a new reference point for deep learning-based oral cancer detection, facilitating translation into real-world healthcare environments and promising enhanced diagnostic confidence.

X-Ray Classification Methodology In Silico Academic Lab Benchmark SOTA Reproducibility

Recent Advances in Applying Machine Learning to Proton Radiotherapy.

Wildman VL, Wynne J, Momin S, Kesarwala AH, Yang X

•papers•Jul 3 2025

In radiation oncology, precision and timeliness of both planning and treatment are paramount values of patient care. Machine learning has increasingly been applied to various aspects of photon radiotherapy to reduce manual error and improve the efficiency of clinical decision making; however, applications to proton therapy remain an emerging field in comparison. This systematic review aims to comprehensively cover all current and potential applications of machine learning to the proton therapy clinical workflow, an area that has not been extensively explored in literature. PubMed and Embase were utilized to identify studies pertinent to machine learning in proton therapy between 2019 to 2024. An initial search on PubMed was made with the search strategy "'proton therapy', 'machine learning', 'deep learning'". A subsequent search on Embase was made with "("proton therapy") AND ("machine learning" OR "deep learning")". In total, 38 relevant studies have been summarized and incorporated. It is observed that U-Net architectures are prevalent in the patient pre-screening process, while convolutional neural networks play an important role in dose and range prediction. Both image quality improvement and transformation between modalities to decrease extraneous radiation are popular targets of various models. To adaptively improve treatments, advanced architectures such as general deep inception or deep cascaded convolution neural networks improve online dose verification and range monitoring. With the rising clinical usage of proton therapy, machine learning models have been increasingly proposed to facilitate both treatment and discovery. Significantly improving patient screening, planning, image quality, and dose and range calculation, machine learning is advancing the precision and personalization of proton therapy.

Mixed Modality Image Synthesis Review In Silico Academic Lab Benchmark SOTA

Filter Papers

Tags

Artificial Intelligence in Prenatal Ultrasound: A Systematic Review of Diagnostic Tools for Detecting Congenital Anomalies

Causal-SAM-LLM: Large Language Models as Causal Reasoners for Robust Medical Segmentation

SAMed-2: Selective Memory Enhanced Medical Segment Anything Model

Improving risk assessment of local failure in brain metastases patients using vision transformers - A multicentric development and validation study.

A preliminary attempt to harmonize using physics-constrained deep neural networks for multisite and multiscanner MRI datasets (PhyCHarm).

Disease Classification of Pulmonary Xenon Ventilation MRI Using Artificial Intelligence.

Quantitative CT Imaging in Chronic Obstructive Pulmonary Disease.

Novel CAC Dispersion and Density Score to Predict Myocardial Infarction and Cardiovascular Mortality.

A tailored deep learning approach for early detection of oral cancer using a 19-layer CNN on clinical lip and tongue images.

Recent Advances in Applying Machine Learning to Proton Radiotherapy.

Ready to Sharpen Your Edge?