Sort by:
Page 5 of 1201199 results

Beyond Diagnosis: Evaluating Multimodal LLMs for Pathology Localization in Chest Radiographs

Advait Gosai, Arun Kavishwar, Stephanie L. McNamara, Soujanya Samineni, Renato Umeton, Alexander Chowdhury, William Lotter

arxiv logopreprintSep 22 2025
Recent work has shown promising performance of frontier large language models (LLMs) and their multimodal counterparts in medical quizzes and diagnostic tasks, highlighting their potential for broad clinical utility given their accessible, general-purpose nature. However, beyond diagnosis, a fundamental aspect of medical image interpretation is the ability to localize pathological findings. Evaluating localization not only has clinical and educational relevance but also provides insight into a model's spatial understanding of anatomy and disease. Here, we systematically assess two general-purpose MLLMs (GPT-4 and GPT-5) and a domain-specific model (MedGemma) in their ability to localize pathologies on chest radiographs, using a prompting pipeline that overlays a spatial grid and elicits coordinate-based predictions. Averaged across nine pathologies in the CheXlocalize dataset, GPT-5 exhibited a localization accuracy of 49.7%, followed by GPT-4 (39.1%) and MedGemma (17.7%), all lower than a task-specific CNN baseline (59.9%) and a radiologist benchmark (80.1%). Despite modest performance, error analysis revealed that GPT-5's predictions were largely in anatomically plausible regions, just not always precisely localized. GPT-4 performed well on pathologies with fixed anatomical locations, but struggled with spatially variable findings and exhibited anatomically implausible predictions more frequently. MedGemma demonstrated the lowest performance on all pathologies, showing limited capacity to generalize to this novel task. Our findings highlight both the promise and limitations of current MLLMs in medical imaging and underscore the importance of integrating them with task-specific tools for reliable use.

MRN: Harnessing 2D Vision Foundation Models for Diagnosing Parkinson's Disease with Limited 3D MR Data

Ding Shaodong, Liu Ziyang, Zhou Yijun, Liu Tao

arxiv logopreprintSep 22 2025
The automatic diagnosis of Parkinson's disease is in high clinical demand due to its prevalence and the importance of targeted treatment. Current clinical practice often relies on diagnostic biomarkers in QSM and NM-MRI images. However, the lack of large, high-quality datasets makes training diagnostic models from scratch prone to overfitting. Adapting pre-trained 3D medical models is also challenging, as the diversity of medical imaging leads to mismatches in voxel spacing and modality between pre-training and fine-tuning data. In this paper, we address these challenges by leveraging 2D vision foundation models (VFMs). Specifically, we crop multiple key ROIs from NM and QSM images, process each ROI through separate branches to compress the ROI into a token, and then combine these tokens into a unified patient representation for classification. Within each branch, we use 2D VFMs to encode axial slices of the 3D ROI volume and fuse them into the ROI token, guided by an auxiliary segmentation head that steers the feature extraction toward specific brain nuclei. Additionally, we introduce multi-ROI supervised contrastive learning, which improves diagnostic performance by pulling together representations of patients from the same class while pushing away those from different classes. Our approach achieved first place in the MICCAI 2025 PDCADxFoundation challenge, with an accuracy of 86.0% trained on a dataset of only 300 labeled QSM and NM-MRI scans, outperforming the second-place method by 5.5%.These results highlight the potential of 2D VFMs for clinical analysis of 3D MR images.

Conditional Diffusion Models for CT Image Synthesis from CBCT: A Systematic Review

Alzahra Altalib, Chunhui Li, Alessandro Perelli

arxiv logopreprintSep 22 2025
Objective: Cone-beam computed tomography (CBCT) provides a low-dose imaging alternative to conventional CT, but suffers from noise, scatter, and artifacts that degrade image quality. Synthetic CT (sCT) aims to translate CBCT to high-quality CT-like images for improved anatomical accuracy and dosimetric precision. Although deep learning approaches have shown promise, they often face limitations in generalizability and detail preservation. Conditional diffusion models (CDMs), with their iterative refinement process, offers a novel solution. This review systematically examines the use of CDMs for CBCT-to-sCT synthesis. Methods: A systematic search was conducted in Web of Science, Scopus, and Google Scholar for studies published between 2013 and 2024. Inclusion criteria targeted works employing conditional diffusion models specifically for sCT generation. Eleven relevant studies were identified and analyzed to address three questions: (1) What conditional diffusion methods are used? (2) How do they compare to conventional deep learning in accuracy? (3) What are their clinical implications? Results: CDMs incorporating anatomical priors and spatial-frequency features demonstrated improved structural preservation and noise robustness. Energy-guided and hybrid latent models enabled enhanced dosimetric accuracy and personalized image synthesis. Across studies, CDMs consistently outperformed traditional deep learning models in noise suppression and artefact reduction, especially in challenging cases like lung imaging and dual-energy CT. Conclusion: Conditional diffusion models show strong potential for generalized, accurate sCT generation from CBCT. However, clinical adoption remains limited. Future work should focus on scalability, real-time inference, and integration with multi-modal imaging to enhance clinical relevance.

Automated Labeling of Intracranial Arteries with Uncertainty Quantification Using Deep Learning

Javier Bisbal, Patrick Winter, Sebastian Jofre, Aaron Ponce, Sameer A. Ansari, Ramez Abdalla, Michael Markl, Oliver Welin Odeback, Sergio Uribe, Cristian Tejos, Julio Sotelo, Susanne Schnell, David Marlevi

arxiv logopreprintSep 22 2025
Accurate anatomical labeling of intracranial arteries is essential for cerebrovascular diagnosis and hemodynamic analysis but remains time-consuming and subject to interoperator variability. We present a deep learning-based framework for automated artery labeling from 3D Time-of-Flight Magnetic Resonance Angiography (3D ToF-MRA) segmentations (n=35), incorporating uncertainty quantification to enhance interpretability and reliability. We evaluated three convolutional neural network architectures: (1) a UNet with residual encoder blocks, reflecting commonly used baselines in vascular labeling; (2) CS-Net, an attention-augmented UNet incorporating channel and spatial attention mechanisms for enhanced curvilinear structure recognition; and (3) nnUNet, a self-configuring framework that automates preprocessing, training, and architectural adaptation based on dataset characteristics. Among these, nnUNet achieved the highest labeling performance (average Dice score: 0.922; average surface distance: 0.387 mm), with improved robustness in anatomically complex vessels. To assess predictive confidence, we implemented test-time augmentation (TTA) and introduced a novel coordinate-guided strategy to reduce interpolation errors during augmented inference. The resulting uncertainty maps reliably indicated regions of anatomical ambiguity, pathological variation, or manual labeling inconsistency. We further validated clinical utility by comparing flow velocities derived from automated and manual labels in co-registered 4D Flow MRI datasets, observing close agreement with no statistically significant differences. Our framework offers a scalable, accurate, and uncertainty-aware solution for automated cerebrovascular labeling, supporting downstream hemodynamic analysis and facilitating clinical integration.

Visual Instruction Pretraining for Domain-Specific Foundation Models

Yuxuan Li, Yicheng Zhang, Wenhao Tang, Yimian Dai, Ming-Ming Cheng, Xiang Li, Jian Yang

arxiv logopreprintSep 22 2025
Modern computer vision is converging on a closed loop in which perception, reasoning and generation mutually reinforce each other. However, this loop remains incomplete: the top-down influence of high-level reasoning on the foundational learning of low-level perceptual features is not yet underexplored. This paper addresses this gap by proposing a new paradigm for pretraining foundation models in downstream domains. We introduce Visual insTruction Pretraining (ViTP), a novel approach that directly leverages reasoning to enhance perception. ViTP embeds a Vision Transformer (ViT) backbone within a Vision-Language Model and pretrains it end-to-end using a rich corpus of visual instruction data curated from target downstream domains. ViTP is powered by our proposed Visual Robustness Learning (VRL), which compels the ViT to learn robust and domain-relevant features from a sparse set of visual tokens. Extensive experiments on 16 challenging remote sensing and medical imaging benchmarks demonstrate that ViTP establishes new state-of-the-art performance across a diverse range of downstream tasks. The code is available at github.com/zcablii/ViTP.

Medical AI Consensus: A Multi-Agent Framework for Radiology Report Generation and Evaluation

Ahmed T. Elboardy, Ghada Khoriba, Essam A. Rashed

arxiv logopreprintSep 22 2025
Automating radiology report generation poses a dual challenge: building clinically reliable systems and designing rigorous evaluation protocols. We introduce a multi-agent reinforcement learning framework that serves as both a benchmark and evaluation environment for multimodal clinical reasoning in the radiology ecosystem. The proposed framework integrates large language models (LLMs) and large vision models (LVMs) within a modular architecture composed of ten specialized agents responsible for image analysis, feature extraction, report generation, review, and evaluation. This design enables fine-grained assessment at both the agent level (e.g., detection and segmentation accuracy) and the consensus level (e.g., report quality and clinical relevance). We demonstrate an implementation using chatGPT-4o on public radiology datasets, where LLMs act as evaluators alongside medical radiologist feedback. By aligning evaluation protocols with the LLM development lifecycle, including pretraining, finetuning, alignment, and deployment, the proposed benchmark establishes a path toward trustworthy deviance-based radiology report generation.

Path-Weighted Integrated Gradients for Interpretable Dementia Classification

Firuz Kamalov, Mohmad Al Falasi, Fadi Thabtah

arxiv logopreprintSep 22 2025
Integrated Gradients (IG) is a widely used attribution method in explainable artificial intelligence (XAI). In this paper, we introduce Path-Weighted Integrated Gradients (PWIG), a generalization of IG that incorporates a customizable weighting function into the attribution integral. This modification allows for targeted emphasis along different segments of the path between a baseline and the input, enabling improved interpretability, noise mitigation, and the detection of path-dependent feature relevance. We establish its theoretical properties and illustrate its utility through experiments on a dementia classification task using the OASIS-1 MRI dataset. Attribution maps generated by PWIG highlight clinically meaningful brain regions associated with various stages of dementia, providing users with sharp and stable explanations. The results suggest that PWIG offers a flexible and theoretically grounded approach for enhancing attribution quality in complex predictive models.

Exploring Machine Learning Models for Physical Dose Calculation in Carbon Ion Therapy Using Heterogeneous Imaging Data - A Proof of Concept Study

Miriam Schwarze, Hui Khee Looe, Björn Poppe, Pichaya Tappayuthpijarn, Leo Thomas, Hans Rabus

arxiv logopreprintSep 22 2025
Background: Accurate and fast dose calculation is essential for optimizing carbon ion therapy. Existing machine learning (ML) models have been developed for other radiotherapy modalities. They use patient data with uniform CT imaging properties. Purpose: This study investigates the application of several ML models for physical dose calculation in carbon ion therapy and compares their ability to generalize to CT data with varying resolutions. Among the models examined is a Diffusion Model, which is tested for the first time for the calculation of physical dose distributions. Methods: A dataset was generated using publicly available CT images of the head and neck region. Monoenergetic carbon ion beams were simulated at various initial energies using Geant4 simulation software. A U-Net architecture was developed for dose prediction based on distributions of material density in patients and of absorbed dose in water. It was trained as a Generative Adversarial Network (GAN) generator, a Diffusion Model noise estimator, and as a standalone network. Their performances were compared with two models from literature. Results: All models produced dose distributions deviating by less than 2% from that obtained by a full Monte Carlo simulation, even for a patient not seen during training. Dose calculation time on a GPU was in the range of 3 ms to 15 s. The resource-efficient U-Net appears to perform comparably to the more computationally intensive GAN and Diffusion Model. Conclusion: This study demonstrates that ML models can effectively balance accuracy and speed for physical dose calculation in carbon ion therapy. Using the computationally efficient U-Net can help conserve resources. The generalizability of the models to different CT image resolutions enables the use for different patients without extensive retraining.

CPT-4DMR: Continuous sPatial-Temporal Representation for 4D-MRI Reconstruction

Xinyang Wu, Muheng Li, Xia Li, Orso Pusterla, Sairos Safai, Philippe C. Cattin, Antony J. Lomax, Ye Zhang

arxiv logopreprintSep 22 2025
Four-dimensional MRI (4D-MRI) is an promising technique for capturing respiratory-induced motion in radiation therapy planning and delivery. Conventional 4D reconstruction methods, which typically rely on phase binning or separate template scans, struggle to capture temporal variability, complicate workflows, and impose heavy computational loads. We introduce a neural representation framework that considers respiratory motion as a smooth, continuous deformation steered by a 1D surrogate signal, completely replacing the conventional discrete sorting approach. The new method fuses motion modeling with image reconstruction through two synergistic networks: the Spatial Anatomy Network (SAN) encodes a continuous 3D anatomical representation, while a Temporal Motion Network (TMN), guided by Transformer-derived respiratory signals, produces temporally consistent deformation fields. Evaluation using a free-breathing dataset of 19 volunteers demonstrates that our template- and phase-free method accurately captures both regular and irregular respiratory patterns, while preserving vessel and bronchial continuity with high anatomical fidelity. The proposed method significantly improves efficiency, reducing the total processing time from approximately five hours required by conventional discrete sorting methods to just 15 minutes of training. Furthermore, it enables inference of each 3D volume in under one second. The framework accurately reconstructs 3D images at any respiratory state, achieves superior performance compared to conventional methods, and demonstrates strong potential for application in 4D radiation therapy planning and real-time adaptive treatment.

Neural Network-Driven Direct CBCT-Based Dose Calculation for Head-and-Neck Proton Treatment Planning

Muheng Li, Evangelia Choulilitsa, Lisa Fankhauser, Francesca Albertini, Antony Lomax, Ye Zhang

arxiv logopreprintSep 22 2025
Accurate dose calculation on cone beam computed tomography (CBCT) images is essential for modern proton treatment planning workflows, particularly when accounting for inter-fractional anatomical changes in adaptive treatment scenarios. Traditional CBCT-based dose calculation suffers from image quality limitations, requiring complex correction workflows. This study develops and validates a deep learning approach for direct proton dose calculation from CBCT images using extended Long Short-Term Memory (xLSTM) neural networks. A retrospective dataset of 40 head-and-neck cancer patients with paired planning CT and treatment CBCT images was used to train an xLSTM-based neural network (CBCT-NN). The architecture incorporates energy token encoding and beam's-eye-view sequence modelling to capture spatial dependencies in proton dose deposition patterns. Training utilized 82,500 paired beam configurations with Monte Carlo-generated ground truth doses. Validation was performed on 5 independent patients using gamma analysis, mean percentage dose error assessment, and dose-volume histogram comparison. The CBCT-NN achieved gamma pass rates of 95.1 $\pm$ 2.7% using 2mm/2% criteria. Mean percentage dose errors were 2.6 $\pm$ 1.4% in high-dose regions ($>$90% of max dose) and 5.9 $\pm$ 1.9% globally. Dose-volume histogram analysis showed excellent preservation of target coverage metrics (Clinical Target Volume V95% difference: -0.6 $\pm$ 1.1%) and organ-at-risk constraints (parotid mean dose difference: -0.5 $\pm$ 1.5%). Computation time is under 3 minutes without sacrificing Monte Carlo-level accuracy. This study demonstrates the proof-of-principle of direct CBCT-based proton dose calculation using xLSTM neural networks. The approach eliminates traditional correction workflows while achieving comparable accuracy and computational efficiency suitable for adaptive protocols.
Page 5 of 1201199 results
Show
per page

Ready to Sharpen Your Edge?

Join hundreds of your peers who rely on RadAI Slice. Get the essential weekly briefing that empowers you to navigate the future of radiology.

We respect your privacy. Unsubscribe at any time.