Latest Papers on Radiology AI.

Beyond Diagnosis: Evaluating Multimodal LLMs for Pathology Localization in Chest Radiographs

Advait Gosai, Arun Kavishwar, Stephanie L. McNamara, Soujanya Samineni, Renato Umeton, Alexander Chowdhury, William Lotter

•preprint•Sep 22 2025

Recent work has shown promising performance of frontier large language models (LLMs) and their multimodal counterparts in medical quizzes and diagnostic tasks, highlighting their potential for broad clinical utility given their accessible, general-purpose nature. However, beyond diagnosis, a fundamental aspect of medical image interpretation is the ability to localize pathological findings. Evaluating localization not only has clinical and educational relevance but also provides insight into a model's spatial understanding of anatomy and disease. Here, we systematically assess two general-purpose MLLMs (GPT-4 and GPT-5) and a domain-specific model (MedGemma) in their ability to localize pathologies on chest radiographs, using a prompting pipeline that overlays a spatial grid and elicits coordinate-based predictions. Averaged across nine pathologies in the CheXlocalize dataset, GPT-5 exhibited a localization accuracy of 49.7%, followed by GPT-4 (39.1%) and MedGemma (17.7%), all lower than a task-specific CNN baseline (59.9%) and a radiologist benchmark (80.1%). Despite modest performance, error analysis revealed that GPT-5's predictions were largely in anatomically plausible regions, just not always precisely localized. GPT-4 performed well on pathologies with fixed anatomical locations, but struggled with spatially variable findings and exhibited anatomically implausible predictions more frequently. MedGemma demonstrated the lowest performance on all pathologies, showing limited capacity to generalize to this novel task. Our findings highlight both the promise and limitations of current MLLMs in medical imaging and underscore the importance of integrating them with task-specific tools for reliable use.

X-Ray Segmentation Chest Retrospective Clinical In Silico Academic Lab Benchmark SOTA

MRN: Harnessing 2D Vision Foundation Models for Diagnosing Parkinson's Disease with Limited 3D MR Data

Ding Shaodong, Liu Ziyang, Zhou Yijun, Liu Tao

•preprint•Sep 22 2025

The automatic diagnosis of Parkinson's disease is in high clinical demand due to its prevalence and the importance of targeted treatment. Current clinical practice often relies on diagnostic biomarkers in QSM and NM-MRI images. However, the lack of large, high-quality datasets makes training diagnostic models from scratch prone to overfitting. Adapting pre-trained 3D medical models is also challenging, as the diversity of medical imaging leads to mismatches in voxel spacing and modality between pre-training and fine-tuning data. In this paper, we address these challenges by leveraging 2D vision foundation models (VFMs). Specifically, we crop multiple key ROIs from NM and QSM images, process each ROI through separate branches to compress the ROI into a token, and then combine these tokens into a unified patient representation for classification. Within each branch, we use 2D VFMs to encode axial slices of the 3D ROI volume and fuse them into the ROI token, guided by an auxiliary segmentation head that steers the feature extraction toward specific brain nuclei. Additionally, we introduce multi-ROI supervised contrastive learning, which improves diagnostic performance by pulling together representations of patients from the same class while pushing away those from different classes. Our approach achieved first place in the MICCAI 2025 PDCADxFoundation challenge, with an accuracy of 86.0% trained on a dataset of only 300 labeled QSM and NM-MRI scans, outperforming the second-place method by 5.5%.These results highlight the potential of 2D VFMs for clinical analysis of 3D MR images.

MRI Classification Neurological Methodology In Silico Academic Lab Benchmark SOTA

Conditional Diffusion Models for CT Image Synthesis from CBCT: A Systematic Review

Alzahra Altalib, Chunhui Li, Alessandro Perelli

•preprint•Sep 22 2025

Objective: Cone-beam computed tomography (CBCT) provides a low-dose imaging alternative to conventional CT, but suffers from noise, scatter, and artifacts that degrade image quality. Synthetic CT (sCT) aims to translate CBCT to high-quality CT-like images for improved anatomical accuracy and dosimetric precision. Although deep learning approaches have shown promise, they often face limitations in generalizability and detail preservation. Conditional diffusion models (CDMs), with their iterative refinement process, offers a novel solution. This review systematically examines the use of CDMs for CBCT-to-sCT synthesis. Methods: A systematic search was conducted in Web of Science, Scopus, and Google Scholar for studies published between 2013 and 2024. Inclusion criteria targeted works employing conditional diffusion models specifically for sCT generation. Eleven relevant studies were identified and analyzed to address three questions: (1) What conditional diffusion methods are used? (2) How do they compare to conventional deep learning in accuracy? (3) What are their clinical implications? Results: CDMs incorporating anatomical priors and spatial-frequency features demonstrated improved structural preservation and noise robustness. Energy-guided and hybrid latent models enabled enhanced dosimetric accuracy and personalized image synthesis. Across studies, CDMs consistently outperformed traditional deep learning models in noise suppression and artefact reduction, especially in challenging cases like lung imaging and dual-energy CT. Conclusion: Conditional diffusion models show strong potential for generalized, accurate sCT generation from CBCT. However, clinical adoption remains limited. Future work should focus on scalability, real-time inference, and integration with multi-modal imaging to enhance clinical relevance.

CT Image Synthesis Review In Silico Academic Lab GenAI Benchmark SOTA

Automated Labeling of Intracranial Arteries with Uncertainty Quantification Using Deep Learning

Javier Bisbal, Patrick Winter, Sebastian Jofre, Aaron Ponce, Sameer A. Ansari, Ramez Abdalla, Michael Markl, Oliver Welin Odeback, Sergio Uribe, Cristian Tejos, Julio Sotelo, Susanne Schnell, David Marlevi

•preprint•Sep 22 2025

Accurate anatomical labeling of intracranial arteries is essential for cerebrovascular diagnosis and hemodynamic analysis but remains time-consuming and subject to interoperator variability. We present a deep learning-based framework for automated artery labeling from 3D Time-of-Flight Magnetic Resonance Angiography (3D ToF-MRA) segmentations (n=35), incorporating uncertainty quantification to enhance interpretability and reliability. We evaluated three convolutional neural network architectures: (1) a UNet with residual encoder blocks, reflecting commonly used baselines in vascular labeling; (2) CS-Net, an attention-augmented UNet incorporating channel and spatial attention mechanisms for enhanced curvilinear structure recognition; and (3) nnUNet, a self-configuring framework that automates preprocessing, training, and architectural adaptation based on dataset characteristics. Among these, nnUNet achieved the highest labeling performance (average Dice score: 0.922; average surface distance: 0.387 mm), with improved robustness in anatomically complex vessels. To assess predictive confidence, we implemented test-time augmentation (TTA) and introduced a novel coordinate-guided strategy to reduce interpolation errors during augmented inference. The resulting uncertainty maps reliably indicated regions of anatomical ambiguity, pathological variation, or manual labeling inconsistency. We further validated clinical utility by comparing flow velocities derived from automated and manual labels in co-registered 4D Flow MRI datasets, observing close agreement with no statistically significant differences. Our framework offers a scalable, accurate, and uncertainty-aware solution for automated cerebrovascular labeling, supporting downstream hemodynamic analysis and facilitating clinical integration.

MRI Segmentation Neurological Methodology In Silico Academic Lab GenAI

Visual Instruction Pretraining for Domain-Specific Foundation Models

Yuxuan Li, Yicheng Zhang, Wenhao Tang, Yimian Dai, Ming-Ming Cheng, Xiang Li, Jian Yang

•preprint•Sep 22 2025

Modern computer vision is converging on a closed loop in which perception, reasoning and generation mutually reinforce each other. However, this loop remains incomplete: the top-down influence of high-level reasoning on the foundational learning of low-level perceptual features is not yet underexplored. This paper addresses this gap by proposing a new paradigm for pretraining foundation models in downstream domains. We introduce Visual insTruction Pretraining (ViTP), a novel approach that directly leverages reasoning to enhance perception. ViTP embeds a Vision Transformer (ViT) backbone within a Vision-Language Model and pretrains it end-to-end using a rich corpus of visual instruction data curated from target downstream domains. ViTP is powered by our proposed Visual Robustness Learning (VRL), which compels the ViT to learn robust and domain-relevant features from a sparse set of visual tokens. Extensive experiments on 16 challenging remote sensing and medical imaging benchmarks demonstrate that ViTP establishes new state-of-the-art performance across a diverse range of downstream tasks. The code is available at github.com/zcablii/ViTP.

Mixed Modality Classification Methodology In Silico Academic Lab Benchmark SOTA Open Code

Medical AI Consensus: A Multi-Agent Framework for Radiology Report Generation and Evaluation

Ahmed T. Elboardy, Ghada Khoriba, Essam A. Rashed

•preprint•Sep 22 2025

Automating radiology report generation poses a dual challenge: building clinically reliable systems and designing rigorous evaluation protocols. We introduce a multi-agent reinforcement learning framework that serves as both a benchmark and evaluation environment for multimodal clinical reasoning in the radiology ecosystem. The proposed framework integrates large language models (LLMs) and large vision models (LVMs) within a modular architecture composed of ten specialized agents responsible for image analysis, feature extraction, report generation, review, and evaluation. This design enables fine-grained assessment at both the agent level (e.g., detection and segmentation accuracy) and the consensus level (e.g., report quality and clinical relevance). We demonstrate an implementation using chatGPT-4o on public radiology datasets, where LLMs act as evaluators alongside medical radiologist feedback. By aligning evaluation protocols with the LLM development lifecycle, including pretraining, finetuning, alignment, and deployment, the proposed benchmark establishes a path toward trustworthy deviance-based radiology report generation.

Mixed Modality LLM Radiology Report Methodology In Silico Academic Lab Benchmark SOTA Open Code GenAI

Path-Weighted Integrated Gradients for Interpretable Dementia Classification

Firuz Kamalov, Mohmad Al Falasi, Fadi Thabtah

•preprint•Sep 22 2025

Integrated Gradients (IG) is a widely used attribution method in explainable artificial intelligence (XAI). In this paper, we introduce Path-Weighted Integrated Gradients (PWIG), a generalization of IG that incorporates a customizable weighting function into the attribution integral. This modification allows for targeted emphasis along different segments of the path between a baseline and the input, enabling improved interpretability, noise mitigation, and the detection of path-dependent feature relevance. We establish its theoretical properties and illustrate its utility through experiments on a dementia classification task using the OASIS-1 MRI dataset. Attribution maps generated by PWIG highlight clinically meaningful brain regions associated with various stages of dementia, providing users with sharp and stable explanations. The results suggest that PWIG offers a flexible and theoretically grounded approach for enhancing attribution quality in complex predictive models.

MRI Classification Neurological Methodology In Silico GenAI

Exploring Machine Learning Models for Physical Dose Calculation in Carbon Ion Therapy Using Heterogeneous Imaging Data - A Proof of Concept Study

Miriam Schwarze, Hui Khee Looe, Björn Poppe, Pichaya Tappayuthpijarn, Leo Thomas, Hans Rabus

•preprint•Sep 22 2025

Background: Accurate and fast dose calculation is essential for optimizing carbon ion therapy. Existing machine learning (ML) models have been developed for other radiotherapy modalities. They use patient data with uniform CT imaging properties. Purpose: This study investigates the application of several ML models for physical dose calculation in carbon ion therapy and compares their ability to generalize to CT data with varying resolutions. Among the models examined is a Diffusion Model, which is tested for the first time for the calculation of physical dose distributions. Methods: A dataset was generated using publicly available CT images of the head and neck region. Monoenergetic carbon ion beams were simulated at various initial energies using Geant4 simulation software. A U-Net architecture was developed for dose prediction based on distributions of material density in patients and of absorbed dose in water. It was trained as a Generative Adversarial Network (GAN) generator, a Diffusion Model noise estimator, and as a standalone network. Their performances were compared with two models from literature. Results: All models produced dose distributions deviating by less than 2% from that obtained by a full Monte Carlo simulation, even for a patient not seen during training. Dose calculation time on a GPU was in the range of 3 ms to 15 s. The resource-efficient U-Net appears to perform comparably to the more computationally intensive GAN and Diffusion Model. Conclusion: This study demonstrates that ML models can effectively balance accuracy and speed for physical dose calculation in carbon ion therapy. Using the computationally efficient U-Net can help conserve resources. The generalizability of the models to different CT image resolutions enables the use for different patients without extensive retraining.

CT Registration Neurological Methodology In Silico Academic Lab GenAI

Comprehensive Assessment of Tumor Stromal Heterogeneity in Bladder Cancer by Deep Learning and Habitat Radiomics.

Du Y, Sui Y, Tao Y, Cao J, Jiang X, Yu J, Wang B, Wang Y, Li H

•papers•Sep 22 2025

Tumor stromal heterogeneity plays a pivotal role in bladder cancer progression. The tumor-stroma ratio (TSR) is a key pathological marker reflecting stromal heterogeneity. This study aimed to develop a preoperative, CT-based machine learning model for predicting TSR in bladder cancer, comparing various radiomic approaches, and evaluating their utility in prognostic assessment and immunotherapy response prediction. A total of 477 bladder urothelial carcinoma patients from two centers were retrospectively included. Tumors were segmented on preoperative contrast-enhanced CT, and radiomic features were extracted. K-means clustering was used to divide tumors into subregions. Radiomics models were constructed: a conventional model (Intra), a multi-subregion model (Habitat), and single-subregion models (HabitatH1/H2/H3). A deep transfer learning model (DeepL) based on the largest tumor cross-section was also developed. Model performance was evaluated in training, testing, and external validation cohorts, and associations with recurrence-free survival, CD8+ T cell infiltration, and immunotherapy response were analyzed. The HabitatH1 model demonstrated robust diagnostic performance with favorable calibration and clinical utility. The DeepL model surpassed all radiomics models in predictive accuracy. A nomogram combining DeepL and clinical variables effectively predicted recurrence-free survival, CD8+ T cell infiltration, and immunotherapy response. Imaging-predicted TSR showed significant associations with the tumor immune microenvironment and treatment outcomes. CT-based habitat radiomics and deep learning models enable non-invasive, quantitative assessment of TSR in bladder cancer. The DeepL model provides superior diagnostic and prognostic value, supporting personalized treatment decisions and prediction of immunotherapy response.

CT Classification Abdominal Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Exploring transfer learning techniques for classifying Alzheimer's disease with rs-fMRI.

Abbasabadi S, Fattahi P, Shiri M

•papers•Sep 22 2025

Alzheimer's disease, the most prevalent form of dementia, leads to a fatal progression after progressively destroying memory at each stage. This irreversible disease appears more frequently in older populations. Even though research on Alzheimer's disease has risen over the past few years, the intricacy of brain structure and function creates challenges for accurate disease diagnosis. As a neuroimaging technology, resting-state functional magnetic resonance imaging enables researchers to study debilitating neural diseases while scanning the brain. The research investigates resting-state functional magnetic resonance imaging approaches and deep learning methods to distinguish between Alzheimer's patients and normal individuals. resting-state functional magnetic resonance imaging of 97 participants is obtained from the Alzheimer's disease neuroimaging initiative database, with 56 participants classified in the Alzheimer's disease group and 41 in the normal control group. Extensive preprocessing is applied to the resting-state functional magnetic resonance imaging data before classification. Using transfer learning, classification between the normal control and Alzheimer's disease groups is conducted with proposed VGG19, AlexNet, and ResNet50 algorithms; the classification accuracy of them is 96.91 %, 98.71 %, and 98.20 %, respectively. For evaluation, precision, recall, and F1-score are utilized as additional assessment metrics. The AlexNet model exhibits higher accuracy than the other models and outperforms them in other evaluation metrics, including precision, recall, and F1-score. While AlexNet achieves the highest overall classification performance, ResNet50 demonstrates superior interpretability through Grad-CAM visualizations, producing more anatomically focused and clinically meaningful attention maps.

MRI Classification Neurological Retrospective Clinical In Silico

Filter Papers

Tags

Beyond Diagnosis: Evaluating Multimodal LLMs for Pathology Localization in Chest Radiographs

MRN: Harnessing 2D Vision Foundation Models for Diagnosing Parkinson's Disease with Limited 3D MR Data

Conditional Diffusion Models for CT Image Synthesis from CBCT: A Systematic Review

Automated Labeling of Intracranial Arteries with Uncertainty Quantification Using Deep Learning

Visual Instruction Pretraining for Domain-Specific Foundation Models

Medical AI Consensus: A Multi-Agent Framework for Radiology Report Generation and Evaluation

Path-Weighted Integrated Gradients for Interpretable Dementia Classification

Exploring Machine Learning Models for Physical Dose Calculation in Carbon Ion Therapy Using Heterogeneous Imaging Data - A Proof of Concept Study

Comprehensive Assessment of Tumor Stromal Heterogeneity in Bladder Cancer by Deep Learning and Habitat Radiomics.

Exploring transfer learning techniques for classifying Alzheimer's disease with rs-fMRI.

Ready to Sharpen Your Edge?