
Medical AI Consensus: A Multi-Agent Framework for Radiology Report Generation and Evaluation

Ahmed T. Elboardy, Ghada Khoriba, Essam A. Rashed

arXiv preprint · Sep 22, 2025
Automating radiology report generation poses a dual challenge: building clinically reliable systems and designing rigorous evaluation protocols. We introduce a multi-agent reinforcement learning framework that serves as both a benchmark and an evaluation environment for multimodal clinical reasoning in the radiology ecosystem. The proposed framework integrates large language models (LLMs) and large vision models (LVMs) within a modular architecture composed of ten specialized agents responsible for image analysis, feature extraction, report generation, review, and evaluation. This design enables fine-grained assessment at both the agent level (e.g., detection and segmentation accuracy) and the consensus level (e.g., report quality and clinical relevance). We demonstrate an implementation using ChatGPT-4o on public radiology datasets, where LLMs act as evaluators alongside feedback from medical radiologists. By aligning evaluation protocols with the LLM development lifecycle, including pretraining, finetuning, alignment, and deployment, the proposed benchmark establishes a path toward trustworthy, evidence-based radiology report generation.
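To make the agent-level versus consensus-level split concrete, here is a minimal, hypothetical sketch of such a modular pipeline in Python: each agent reads and extends a shared state, and a final evaluator scores the resulting report. The role names and the stubbed model call are illustrative assumptions, not the paper's actual ten agents or prompts.

```python
# Minimal sketch of a modular multi-agent report pipeline (hypothetical roles,
# stubbed model call); not the paper's actual agent set.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Agent:
    name: str
    run: Callable[[Dict], Dict]  # takes shared state, returns its contribution

def stub_model(prompt: str) -> str:
    """Placeholder for an LLM/LVM call (e.g., a GPT-4o API request)."""
    return f"[{prompt[:40]}...] -> draft output"

def image_analysis(state): return {"findings": stub_model("analyze image " + state["image_id"])}
def report_generation(state): return {"report": stub_model("write report from " + state["findings"])}
def review(state): return {"review": stub_model("critique report: " + state["report"])}
def evaluation(state): return {"score": 0.9}  # e.g., an LLM-as-judge rubric score

def run_pipeline(image_id: str, agents: List[Agent]) -> Dict:
    state = {"image_id": image_id}
    for agent in agents:
        state.update(agent.run(state))  # each agent reads and extends the shared state
    return state

agents = [Agent("analysis", image_analysis), Agent("generator", report_generation),
          Agent("reviewer", review), Agent("evaluator", evaluation)]
print(run_pipeline("cxr_001", agents))
```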

Learning Contrastive Multimodal Fusion with Improved Modality Dropout for Disease Detection and Prediction

Yi Gu, Kuniaki Saito, Jiaxin Ma

arXiv preprint · Sep 22, 2025
As medical diagnoses increasingly leverage multimodal data, machine learning models are expected to effectively fuse heterogeneous information while remaining robust to missing modalities. In this work, we propose a novel multimodal learning framework that integrates enhanced modality dropout and contrastive learning to address real-world limitations such as modality imbalance and missingness. Our approach introduces learnable modality tokens to improve missingness-aware fusion of modalities and augments conventional unimodal contrastive objectives with fused multimodal representations. We validate our framework on large-scale clinical datasets for disease detection and prediction tasks, encompassing both visual and tabular modalities. Experimental results demonstrate that our method achieves state-of-the-art performance, particularly in challenging and practical scenarios where only a single modality is available. Furthermore, we show its adaptability through successful integration with a recent CT foundation model. Our findings highlight the effectiveness, efficiency, and generalizability of our approach for multimodal learning, offering a scalable, low-cost solution with significant potential for real-world clinical applications. The code is available at https://github.com/omron-sinicx/medical-modality-dropout.
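As a rough illustration of the learnable-modality-token idea, the PyTorch sketch below substitutes a learned token wherever a modality is missing or dropped during training; the two-modality (image + tabular) setup, dimensions, and fusion MLP are assumptions, not the authors' implementation.

```python
# Sketch of missingness-aware fusion with learnable modality tokens (illustrative only).
import torch
import torch.nn as nn

class TokenFusion(nn.Module):
    def __init__(self, dims=(512, 64), d=128, n_modalities=2, p_drop=0.3):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(dim, d) for dim in dims])
        # One learnable token per modality, used in place of missing or dropped inputs.
        self.tokens = nn.Parameter(torch.randn(n_modalities, d) * 0.02)
        self.fuse = nn.Sequential(nn.Linear(n_modalities * d, d), nn.ReLU(), nn.Linear(d, d))
        self.p_drop = p_drop

    def forward(self, feats, present):
        # feats: list of (B, dim_m) tensors; present: (B, M), 1 if the modality is observed.
        B = present.size(0)
        if self.training:  # modality dropout: randomly hide observed modalities
            keep = (torch.rand_like(present.float()) > self.p_drop).float()
            present = present * keep
        embs = []
        for m, f in enumerate(feats):
            e = self.proj[m](f)
            mask = present[:, m:m + 1].float()
            embs.append(mask * e + (1 - mask) * self.tokens[m].expand(B, -1))
        return self.fuse(torch.cat(embs, dim=-1))  # fused multimodal representation

model = TokenFusion()
img, tab = torch.randn(4, 512), torch.randn(4, 64)
present = torch.tensor([[1, 1], [1, 0], [0, 1], [1, 1]])
print(model([img, tab], present).shape)  # torch.Size([4, 128])
```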

Visual Instruction Pretraining for Domain-Specific Foundation Models

Yuxuan Li, Yicheng Zhang, Wenhao Tang, Yimian Dai, Ming-Ming Cheng, Xiang Li, Jian Yang

arXiv preprint · Sep 22, 2025
Modern computer vision is converging on a closed loop in which perception, reasoning, and generation mutually reinforce each other. However, this loop remains incomplete: the top-down influence of high-level reasoning on the foundational learning of low-level perceptual features remains underexplored. This paper addresses this gap by proposing a new paradigm for pretraining foundation models in downstream domains. We introduce Visual insTruction Pretraining (ViTP), a novel approach that directly leverages reasoning to enhance perception. ViTP embeds a Vision Transformer (ViT) backbone within a Vision-Language Model and pretrains it end-to-end using a rich corpus of visual instruction data curated from target downstream domains. ViTP is powered by our proposed Visual Robustness Learning (VRL), which compels the ViT to learn robust and domain-relevant features from a sparse set of visual tokens. Extensive experiments on 16 challenging remote sensing and medical imaging benchmarks demonstrate that ViTP establishes new state-of-the-art performance across a diverse range of downstream tasks. The code is available at https://github.com/zcablii/ViTP.
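The snippet below sketches one way to expose the language model to only a sparse subset of visual tokens, in the spirit of VRL; the random selection rule and keep ratio are assumptions, as the paper's exact mechanism is not reproduced here.

```python
# Sketch of training-time sparse visual-token selection (assumed random selection).
import torch

def sparse_visual_tokens(patch_tokens: torch.Tensor, keep_ratio: float = 0.25) -> torch.Tensor:
    """Randomly keep a sparse subset of ViT patch tokens per sample."""
    B, N, D = patch_tokens.shape
    k = max(1, int(N * keep_ratio))
    idx = torch.rand(B, N, device=patch_tokens.device).argsort(dim=1)[:, :k]  # k random indices
    return patch_tokens.gather(1, idx.unsqueeze(-1).expand(-1, -1, D))

tokens = torch.randn(2, 196, 768)      # e.g., 14x14 patches from a ViT backbone
sparse = sparse_visual_tokens(tokens)  # (2, 49, 768), passed on to the language model
print(sparse.shape)
```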

Uncertainty-Supervised Interpretable and Robust Evidential Segmentation

Yuzhu Li, An Sui, Fuping Wu, Xiahai Zhuang

arXiv preprint · Sep 21, 2025
Uncertainty estimation has been widely studied in medical image segmentation as a tool to provide reliability, particularly in deep learning approaches. However, previous methods generally lack effective supervision in uncertainty estimation, leading to low interpretability and robustness of the predictions. In this work, we propose a self-supervised approach to guide the learning of uncertainty. Specifically, we introduce three principles about the relationships between the uncertainty and the image gradients around boundaries and noise. Based on these principles, two uncertainty supervision losses are designed. These losses enhance the alignment between model predictions and human interpretation. Accordingly, we introduce novel quantitative metrics for evaluating the interpretability and robustness of uncertainty. Experimental results demonstrate that compared to state-of-the-art approaches, the proposed method can achieve competitive segmentation performance and superior results in out-of-distribution (OOD) scenarios while significantly improving the interpretability and robustness of uncertainty estimation. Code is available via https://github.com/suiannaius/SURE.
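One way such gradient-based supervision of uncertainty could look in practice is sketched below in PyTorch: within a predicted boundary band, the uncertainty map is pulled toward the inverse of the normalized image-gradient magnitude, so crisp boundaries look certain and weak ones uncertain. The boundary mask and target definition are illustrative assumptions, not the paper's two losses.

```python
# Illustrative sketch of supervising uncertainty with image gradients (not the paper's losses).
import torch
import torch.nn.functional as F

def image_gradient_magnitude(img: torch.Tensor) -> torch.Tensor:
    """Finite-difference gradient magnitude of a (B, 1, H, W) image."""
    dx = F.pad(img[:, :, :, 1:] - img[:, :, :, :-1], (0, 1))
    dy = F.pad(img[:, :, 1:, :] - img[:, :, :-1, :], (0, 0, 0, 1))
    return torch.sqrt(dx ** 2 + dy ** 2 + 1e-8)

def uncertainty_gradient_loss(uncertainty, img, probs, band=0.1):
    """Within the predicted boundary band, pull uncertainty toward (1 - gradient magnitude)."""
    grad = image_gradient_magnitude(img)
    grad = grad / (grad.amax(dim=(2, 3), keepdim=True) + 1e-8)        # normalize per image
    boundary = ((probs > 0.5 - band) & (probs < 0.5 + band)).float()  # predicted boundary band
    target = 1.0 - grad
    return (boundary * (uncertainty - target) ** 2).sum() / (boundary.sum() + 1e-8)

img = torch.rand(2, 1, 64, 64)
probs = torch.rand(2, 1, 64, 64)        # foreground probabilities from a segmentation model
uncertainty = torch.rand(2, 1, 64, 64)  # e.g., predictive entropy normalized to [0, 1]
print(uncertainty_gradient_loss(uncertainty, img, probs))
```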

Echo-Path: Pathology-Conditioned Echo Video Generation

Kabir Hamzah Muhammad, Marawan Elbatel, Yi Qin, Xiaomeng Li

arXiv preprint · Sep 21, 2025
Cardiovascular diseases (CVDs) remain the leading cause of mortality globally, and echocardiography is critical for diagnosis of both common and congenital cardiac conditions. However, echocardiographic data for certain pathologies are scarce, hindering the development of robust automated diagnosis models. In this work, we propose Echo-Path, a novel generative framework to produce echocardiogram videos conditioned on specific cardiac pathologies. Echo-Path can synthesize realistic ultrasound video sequences that exhibit targeted abnormalities, focusing here on atrial septal defect (ASD) and pulmonary arterial hypertension (PAH). Our approach introduces a pathology-conditioning mechanism into a state-of-the-art echo video generator, allowing the model to learn and control disease-specific structural and motion patterns in the heart. Quantitative evaluation demonstrates that the synthetic videos achieve low distribution distances, indicating high visual fidelity. Clinically, the generated echoes exhibit plausible pathology markers. Furthermore, classifiers trained on our synthetic data generalize well to real data and, when used to augment real training sets, improve downstream diagnosis of ASD and PAH by 7% and 8%, respectively. Code, weights, and the dataset are available at https://github.com/Marshall-mk/EchoPathv1.
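As a hedged illustration of pathology conditioning, the sketch below injects a pathology class embedding alongside the timestep embedding via FiLM-style modulation in a stand-in denoiser; the architecture and embedding sizes are placeholders, not the Echo-Path generator.

```python
# Sketch of pathology conditioning for a video diffusion denoiser (stand-in architecture).
import torch
import torch.nn as nn

class PathologyConditionedDenoiser(nn.Module):
    def __init__(self, n_classes=3, d_cond=256, channels=1):
        super().__init__()
        self.class_emb = nn.Embedding(n_classes, d_cond)  # e.g., {normal, ASD, PAH}
        self.time_mlp = nn.Sequential(nn.Linear(1, d_cond), nn.SiLU(), nn.Linear(d_cond, d_cond))
        self.to_scale_shift = nn.Linear(d_cond, 2 * channels)
        self.net = nn.Conv3d(channels, channels, kernel_size=3, padding=1)  # stand-in for a video U-Net

    def forward(self, x_t, t, pathology):
        # x_t: (B, C, T, H, W) noisy video; t: (B,) timesteps; pathology: (B,) class ids.
        cond = self.time_mlp(t.float().unsqueeze(-1)) + self.class_emb(pathology)
        scale, shift = self.to_scale_shift(cond).chunk(2, dim=-1)  # FiLM-style modulation
        h = x_t * (1 + scale[:, :, None, None, None]) + shift[:, :, None, None, None]
        return self.net(h)  # predicted noise

model = PathologyConditionedDenoiser()
x = torch.randn(2, 1, 8, 64, 64)
eps_hat = model(x, torch.randint(0, 1000, (2,)), torch.tensor([1, 2]))  # condition on ASD, PAH
print(eps_hat.shape)
```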

Diffusion-based arbitrary-scale magnetic resonance image super-resolution via progressive k-space reconstruction and denoising.

Wang J, Shi Z, Gu X, Yang Y, Sun J

PubMed paper · Sep 20, 2025
Acquiring high-resolution magnetic resonance (MR) images is challenging due to constraints such as hardware limitations and acquisition times. Super-resolution (SR) techniques offer a potential solution to enhance MR image quality without changing the magnetic resonance imaging (MRI) hardware. However, typical SR methods are designed for fixed upsampling scales and often produce over-smoothed images that lack fine textures and edge details. To address these issues, we propose a unified diffusion-based framework for arbitrary-scale in-plane MR image SR, dubbed the Progressive Reconstruction and Denoising Diffusion Model (PRDDiff). Specifically, the forward diffusion process of PRDDiff gradually masks out high-frequency components and adds Gaussian noise to simulate the downsampling process in MRI. To reverse this process, we propose an Adaptive Resolution Restoration Network (ARRNet), which introduces a current step corresponding to the resolution of the input MR image and an ending step corresponding to the target resolution. This design guides the ARRNet to recover the clean MR image at the target resolution from the input MR image. The SR process starts from an MR image at the initial resolution and gradually enhances it to higher resolutions by progressively reconstructing high-frequency components and removing noise based on the MR image recovered by the ARRNet. Furthermore, we design a multi-stage SR strategy that incrementally enhances resolution through multiple sequential stages to further improve recovery accuracy. Each stage uses a set number of sampling steps from PRDDiff, guided by a specific ending step, to recover details pertinent to the predefined intermediate resolution. We conduct extensive experiments on the fastMRI knee dataset, the fastMRI brain dataset, our real-collected LR-HR brain dataset, and a clinical pediatric cerebral palsy (CP) dataset, including T1-weighted and T2-weighted images for the brain and proton density-weighted images for the knee. The results demonstrate that PRDDiff outperforms previous MR image super-resolution methods in terms of reconstruction accuracy, generalization, downstream lesion segmentation accuracy, and CP classification performance. The code is publicly available at https://github.com/Jiazhen-Wang/PRDDiff-main.
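The intuition behind the forward process, masking out high-frequency k-space components and adding Gaussian noise, can be sketched as follows; the centered low-pass mask and noise levels are illustrative assumptions, not PRDDiff's actual schedule.

```python
# Sketch of a k-space low-pass + Gaussian-noise degradation step (illustrative schedule).
import torch

def forward_degrade(img: torch.Tensor, keep_frac: float, sigma: float) -> torch.Tensor:
    """img: (H, W) real-valued MR slice. Keep only the central keep_frac of k-space, add noise."""
    H, W = img.shape
    k = torch.fft.fftshift(torch.fft.fft2(img))
    h, w = int(H * keep_frac / 2), int(W * keep_frac / 2)
    mask = torch.zeros_like(k.real)
    mask[H // 2 - h:H // 2 + h, W // 2 - w:W // 2 + w] = 1.0   # central low-frequency block
    low = torch.fft.ifft2(torch.fft.ifftshift(k * mask)).real  # image with high frequencies removed
    return low + sigma * torch.randn_like(low)

x0 = torch.rand(128, 128)
# Later diffusion steps keep fewer frequencies and add more noise, mimicking lower resolutions.
x_mid = forward_degrade(x0, keep_frac=0.5, sigma=0.05)
x_late = forward_degrade(x0, keep_frac=0.25, sigma=0.1)
print(x_mid.shape, x_late.shape)
```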

A Novel Metric for Detecting Memorization in Generative Models for Brain MRI Synthesis

Antonio Scardace, Lemuel Puglisi, Francesco Guarnera, Sebastiano Battiato, Daniele Ravì

arXiv preprint · Sep 20, 2025
Deep generative models have emerged as a transformative tool in medical imaging, offering substantial potential for synthetic data generation. However, recent empirical studies highlight a critical vulnerability: these models can memorize sensitive training data, posing significant risks of unauthorized patient information disclosure. Detecting memorization in generative models remains particularly challenging, necessitating scalable methods capable of identifying training data leakage across large sets of generated samples. In this work, we propose DeepSSIM, a novel self-supervised metric for quantifying memorization in generative models. DeepSSIM is trained to: i) project images into a learned embedding space and ii) force the cosine similarity between embeddings to match the ground-truth SSIM (Structural Similarity Index) scores computed in the image space. To capture domain-specific anatomical features, training incorporates structure-preserving augmentations, allowing DeepSSIM to estimate similarity reliably without requiring precise spatial alignment. We evaluate DeepSSIM in a case study involving synthetic brain MRI data generated by a Latent Diffusion Model (LDM) trained under memorization-prone conditions, using 2,195 MRI scans from two publicly available datasets (IXI and CoRR). Compared to state-of-the-art memorization metrics, DeepSSIM achieves superior performance, improving F1 scores by an average of +52.03% over the best existing method. Code and data of our approach are publicly available at the following link: https://github.com/brAIn-science/DeepSSIM.
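A minimal sketch of the stated training objective, regressing embedding cosine similarity onto image-space SSIM, might look like the following; the encoder and the assumption of precomputed SSIM targets are placeholders, not the released DeepSSIM model.

```python
# Sketch of the DeepSSIM-style objective: cosine similarity of embeddings regresses SSIM.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(  # stand-in for the actual embedding network
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 128),
)

def deepssim_loss(x1, x2, ssim_target):
    """x1, x2: (B, 1, H, W) image pairs; ssim_target: (B,) SSIM precomputed in image space."""
    z1, z2 = encoder(x1), encoder(x2)
    cos = F.cosine_similarity(z1, z2, dim=-1)
    return F.mse_loss(cos, ssim_target)

x1, x2 = torch.rand(4, 1, 64, 64), torch.rand(4, 1, 64, 64)
ssim_target = torch.rand(4)  # placeholder for precomputed SSIM scores
print(deepssim_loss(x1, x2, ssim_target))
```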

MUSCLE: A New Perspective to Multi-scale Fusion for Medical Image Classification based on the Theory of Evidence.

Qiu J, Cao J, Huang Y, Zhu Z, Wang F, Lu C, Li Y, Zheng Y

PubMed paper · Sep 19, 2025
In the field of medical image analysis, medical image classification is one of the most fundamental and critical tasks. Current research often relies on off-the-shelf backbone networks from computer vision, hoping to achieve satisfactory classification performance for medical images. However, given the characteristics of medical images, such as scattered distribution and varying sizes of lesions, features extracted at a single scale from existing backbones often fail to support accurate medical image classification. To this end, we propose a novel multi-scale learning paradigm, namely MUlti-SCale Learning with trusted Evidences (MUSCLE), which extracts and integrates features from different scales based on the theory of evidence, generating a more comprehensive feature representation for the medical image classification task. In particular, the proposed MUSCLE first estimates the uncertainties of features extracted from different scales/stages of the classification backbone as evidences, and accordingly forms opinions on feature trustworthiness via a set of evidential deep neural networks. These opinions on different scales of features are then ensembled to yield an aggregated opinion, which can be used to adaptively tune the weights of multi-scale features for scattered and size-varying lesions, and consequently improve the network's capacity for accurate medical image classification. Our MUSCLE paradigm has been evaluated on five publicly available medical image datasets. The experimental results show that the proposed MUSCLE not only improves the accuracy of the original backbone network, but also enhances the reliability and interpretability of model decisions with trusted evidences (https://github.com/Q4CS/MUSCLE).
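To illustrate the evidential ingredients, the sketch below derives Dirichlet evidence and a subjective-logic uncertainty per scale and combines scales with confidence weights; the heads, dimensions, and the simple weighted averaging are assumptions, and the paper's opinion-combination rule may differ.

```python
# Sketch of evidential multi-scale fusion (assumed combination rule; not the paper's exact scheme).
import torch
import torch.nn as nn
import torch.nn.functional as F

class EvidentialFusion(nn.Module):
    def __init__(self, scale_dims=(256, 512, 1024), n_classes=4):
        super().__init__()
        self.heads = nn.ModuleList([nn.Linear(d, n_classes) for d in scale_dims])
        self.n_classes = n_classes

    def forward(self, scale_feats):
        alphas, uncertainties = [], []
        for head, f in zip(self.heads, scale_feats):
            evidence = F.softplus(head(f))                            # non-negative evidence per class
            alpha = evidence + 1.0                                    # Dirichlet concentration parameters
            u = self.n_classes / alpha.sum(-1, keepdim=True)          # subjective-logic uncertainty
            alphas.append(alpha)
            uncertainties.append(u)
        # Weight each scale by its confidence (1 - u), then renormalize.
        w = torch.cat([1 - u for u in uncertainties], dim=-1)
        w = w / w.sum(-1, keepdim=True)
        probs = torch.stack([a / a.sum(-1, keepdim=True) for a in alphas], dim=1)  # (B, S, C)
        return (w.unsqueeze(-1) * probs).sum(dim=1)                   # aggregated class probabilities

model = EvidentialFusion()
feats = [torch.randn(2, 256), torch.randn(2, 512), torch.randn(2, 1024)]
print(model(feats).shape)  # torch.Size([2, 4])
```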

SLaM-DiMM: Shared Latent Modeling for Diffusion Based Missing Modality Synthesis in MRI

Bhavesh Sandbhor, Bheeshm Sharma, Balamurugan Palaniappan

arXiv preprint · Sep 19, 2025
Brain MRI scans are commonly acquired in four modalities: T1-weighted with and without contrast enhancement (T1ce and T1w), T2-weighted imaging (T2w), and FLAIR. Leveraging complementary information from these different modalities enables models to learn richer, more discriminative features for understanding brain anatomy, which can be used in downstream tasks such as anomaly detection. However, in clinical practice, not all MRI modalities are always available, for various reasons. This makes missing modality generation a critical challenge in medical image analysis. In this paper, we propose SLaM-DiMM, a novel missing modality generation framework that harnesses the power of diffusion models to synthesize any of the four target MRI modalities from the other available modalities. Our approach not only generates high-fidelity images but also ensures structural coherence across the depth of the volume through a dedicated coherence enhancement mechanism. Qualitative and quantitative evaluations on the BraTS-Lighthouse-2025 Challenge dataset demonstrate the effectiveness of the proposed approach in synthesizing anatomically plausible and structurally consistent results. Code is available at https://github.com/BheeshmSharma/SLaM-DiMM-MICCAI-BraTS-Challenge-2025.
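A simple way to condition a denoiser on whichever modalities are available is channel concatenation with missing channels zeroed, sketched below; this only illustrates the conditioning interface and does not reproduce SLaM-DiMM's shared-latent modeling or coherence mechanism.

```python
# Sketch of conditioning a diffusion denoiser on available MRI modalities (interface only).
import torch
import torch.nn as nn

class ConditionalDenoiser(nn.Module):
    def __init__(self, n_cond=3):
        super().__init__()
        self.net = nn.Sequential(  # stand-in for a U-Net style denoiser
            nn.Conv2d(1 + n_cond, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, x_t, cond, available):
        # x_t: (B, 1, H, W) noisy target modality; cond: (B, n_cond, H, W) other modalities;
        # available: (B, n_cond), 1 if a conditioning modality is present, else 0.
        cond = cond * available[:, :, None, None]       # zero out modalities that are missing
        return self.net(torch.cat([x_t, cond], dim=1))  # predicted noise for the target modality

model = ConditionalDenoiser()
x_t = torch.randn(2, 1, 64, 64)
cond = torch.randn(2, 3, 64, 64)
available = torch.tensor([[1., 1., 0.], [1., 0., 0.]])
print(model(x_t, cond, available).shape)
```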

Deep Feedback Models

David Calhas, Arlindo L. Oliveira

arXiv preprint · Sep 19, 2025
Deep Feedback Models (DFMs) are a new class of stateful neural networks that combine bottom-up input with high-level representations over time. This feedback mechanism introduces dynamics into otherwise static architectures, enabling DFMs to iteratively refine their internal state and mimic aspects of biological decision making. We model this process as a differential equation solved through a recurrent neural network, stabilized via exponential decay to ensure convergence. To evaluate their effectiveness, we measure DFMs under two key conditions: robustness to noise and generalization with limited data. In both object recognition and segmentation tasks, DFMs consistently outperform their feedforward counterparts, particularly in low-data or high-noise regimes. In addition, DFMs transfer to medical imaging settings while remaining robust to various types of noise corruption. These findings highlight the importance of feedback in achieving stable, robust, and generalizable learning. Code is available at https://github.com/DCalhas/deep_feedback_models.
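The described update, a differential equation solved by a recurrent network and stabilized via exponential decay, can be sketched as an Euler-discretized loop; the step size, decay rate, and layer sizes below are illustrative assumptions, not the DFM architecture itself.

```python
# Sketch of a feedback update as an Euler-discretized ODE with exponential decay (illustrative).
import torch
import torch.nn as nn

class FeedbackCell(nn.Module):
    def __init__(self, d_in=64, d_state=128, decay=1.0, dt=0.1, steps=10):
        super().__init__()
        self.inp = nn.Linear(d_in, d_state)
        self.rec = nn.Linear(d_state, d_state)
        self.decay, self.dt, self.steps = decay, dt, steps

    def forward(self, x):
        h = torch.zeros(x.size(0), self.rec.in_features, device=x.device)
        for _ in range(self.steps):
            # dh/dt = -decay * h + f(x, h): the decay term pulls the state toward a fixed point.
            dh = -self.decay * h + torch.tanh(self.inp(x) + self.rec(h))
            h = h + self.dt * dh
        return h  # refined state after iterative feedback

cell = FeedbackCell()
print(cell(torch.randn(4, 64)).shape)  # torch.Size([4, 128])
```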