
Beyond the type 1 pattern: comprehensive risk stratification in Brugada syndrome.

Kan KY, Van Wyk A, Paterson T, Ninan N, Lysyganicz P, Tyagi I, Bhasi Lizi R, Boukrid F, Alfaifi M, Mishra A, Katraj SVK, Pooranachandran V

PubMed · Aug 6, 2025
Brugada Syndrome (BrS) is an inherited cardiac ion channelopathy associated with an elevated risk of sudden cardiac death, particularly due to ventricular arrhythmias in structurally normal hearts. Affecting approximately 1 in 2,000 individuals, BrS is most prevalent among middle-aged males of Asian descent. Although diagnosis is based on the presence of a Type 1 electrocardiographic (ECG) pattern, either spontaneous or induced, accurately stratifying risk in asymptomatic and borderline patients remains a major clinical challenge. This review explores current and emerging approaches to BrS risk stratification, focusing on electrocardiographic, electrophysiological, imaging, and computational markers. Non-invasive ECG indicators such as the β-angle, fragmented QRS, S wave in lead I, early repolarisation, aVR sign, and transmural dispersion of repolarisation have demonstrated predictive value for arrhythmic events. Adjunctive tools like signal-averaged ECG, Holter monitoring, and exercise stress testing enhance diagnostic yield by capturing dynamic electrophysiological changes. In parallel, imaging modalities, particularly speckle-tracking echocardiography and cardiac magnetic resonance, have revealed subclinical structural abnormalities in the right ventricular outflow tract and atria, challenging the paradigm of BrS as a purely electrical disorder. Invasive electrophysiological studies and substrate mapping have further clarified the anatomical basis of arrhythmogenesis, while risk scoring systems (e.g., Sieira, BRUGADA-RISK, PAT) and machine learning models offer new avenues for personalised risk assessment. Together, these advances underscore the importance of an integrated, multimodal approach to BrS risk stratification. Optimising these strategies is essential to guide implantable cardioverter-defibrillator decisions and improve outcomes in patients vulnerable to life-threatening arrhythmias.

On the effectiveness of multimodal privileged knowledge distillation in two vision transformer based diagnostic applications

Simon Baur, Alexandra Benova, Emilio Dolgener Cantú, Jackie Ma

arXiv preprint · Aug 6, 2025
Deploying deep learning models in clinical practice often requires leveraging multiple data modalities, such as images, text, and structured data, to achieve robust and trustworthy decisions. However, not all modalities are always available at inference time. In this work, we propose multimodal privileged knowledge distillation (MMPKD), a training strategy that utilizes additional modalities available solely during training to guide a unimodal vision model. Specifically, we used a text-based teacher model for chest radiographs (MIMIC-CXR) and a tabular metadata-based teacher model for mammography (CBIS-DDSM) to distill knowledge into a vision transformer student model. We show that MMPKD can improve the zero-shot ability of the resulting attention maps to localize regions of interest (ROIs) in input images, although, contrary to what prior research suggested, this effect does not generalize across domains.
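
For orientation, here is a minimal sketch of a privileged-distillation objective of the kind MMPKD describes, assuming the common cross-entropy-plus-KL formulation with a temperature; the paper does not specify its exact loss terms or weighting, so `alpha` and `T` are illustrative assumptions.

```python
import torch.nn.functional as F

def mmpkd_loss(student_logits, teacher_logits, labels, alpha=0.5, T=2.0):
    """Supervised cross-entropy plus a KL term pulling the vision student
    toward the privileged (text/tabular) teacher. alpha and T are assumed."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients softened by the temperature
    return alpha * ce + (1.0 - alpha) * kd
```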

Foundation models for radiology-the position of the AI for Health Imaging (AI4HI) network.

de Almeida JG, Alberich LC, Tsakou G, Marias K, Tsiknakis M, Lekadir K, Marti-Bonmati L, Papanikolaou N

PubMed · Aug 6, 2025
Foundation models are large models trained on big data that can be adapted to downstream tasks. In radiology, these models can potentially address several gaps in fairness and generalization, as they can be trained on massive datasets without labelled data and then adapted to tasks for which only a small number of annotated examples are available. This eases one of the limiting bottlenecks in clinical model construction, data annotation, as these models can be trained through a variety of techniques that require little more than radiological images, with or without their corresponding radiological reports. However, foundation models may be insufficient on their own: they are affected, albeit to a smaller extent than traditional supervised learning approaches, by the same issues that lead to underperforming models, such as a lack of transparency/explainability and biases. To address these issues, we advocate that the development of foundation models should not only be pursued but also accompanied by a decentralized clinical validation and continuous training framework. This does not guarantee the resolution of the problems associated with foundation models, but it enables developers, clinicians and patients to know when, how and why models should be updated, creating a clinical AI ecosystem that is better able to serve all stakeholders. CRITICAL RELEVANCE STATEMENT: Foundation models may mitigate issues like bias and poor generalization in radiology AI, but challenges persist. We propose a decentralized, cross-institutional framework for continuous validation and training to enhance model reliability, safety, and clinical utility. KEY POINTS: Foundation models trained on large datasets reduce annotation burdens and improve fairness and generalization in radiology. Despite improvements, they still face challenges such as limited transparency, explainability, and residual biases. A decentralized, cross-institutional framework for clinical validation and continuous training can strengthen reliability and inclusivity in clinical AI.

R2GenKG: Hierarchical Multi-modal Knowledge Graph for LLM-based Radiology Report Generation

Futian Wang, Yuhan Qiao, Xiao Wang, Fuling Wang, Yuxiang Zhang, Dengdi Sun

arXiv preprint · Aug 5, 2025
X-ray medical report generation is one of the important applications of artificial intelligence in healthcare. With the support of large foundation models, the quality of medical report generation has improved significantly. However, challenges such as hallucination and weak disease-diagnostic capability still persist. In this paper, we first construct a large-scale multi-modal medical knowledge graph (termed M3KG) from ground-truth medical reports using GPT-4o. It contains 2477 entities, 3 kinds of relations, 37424 triples, and 6943 disease-aware vision tokens for the CheXpert Plus dataset. We then sample it to obtain multi-granularity semantic graphs and use an R-GCN encoder for feature extraction. For the input X-ray image, we adopt the Swin-Transformer to extract vision features, which interact with the graph knowledge via cross-attention. The vision tokens are fed into a Q-Former, and the disease-aware vision tokens are retrieved using another cross-attention. Finally, we adopt a large language model to map the semantic knowledge graph, the input X-ray image, and the disease-aware vision tokens into language descriptions. Extensive experiments on multiple datasets validate the effectiveness of our proposed knowledge graph and X-ray report generation framework. The source code of this paper will be released at https://github.com/Event-AHU/Medical_Image_Analysis.
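
As a rough illustration of the vision-knowledge interaction step, the following hedged sketch lets vision tokens attend to knowledge-graph node features via standard cross-attention; the dimensions, module choices, and variable names are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

d_model = 768  # assumed embedding width shared by both modalities
cross_attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=8, batch_first=True)

vision_tokens = torch.randn(2, 196, d_model)  # e.g. Swin-Transformer patch features
graph_nodes = torch.randn(2, 64, d_model)     # e.g. R-GCN entity embeddings

# Each vision token queries the knowledge-graph nodes.
fused, attn_weights = cross_attn(query=vision_tokens, key=graph_nodes, value=graph_nodes)
```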

Augmenting Continual Learning of Diseases with LLM-Generated Visual Concepts

Jiantao Tan, Peixian Ma, Kanghao Chen, Zhiming Dai, Ruixuan Wang

arXiv preprint · Aug 5, 2025
Continual learning is essential for medical image classification systems to adapt to dynamically evolving clinical environments. The integration of multimodal information can significantly enhance continual learning of image classes. However, while existing approaches do utilize textual modality information, they rely solely on simplistic templates containing only a class name, thereby neglecting richer semantic information. To address these limitations, we propose a novel framework that harnesses visual concepts generated by large language models (LLMs) as discriminative semantic guidance. Our method dynamically constructs a visual concept pool with a similarity-based filtering mechanism to prevent redundancy. Then, to integrate the concepts into the continual learning process, we employ a cross-modal image-concept attention module coupled with an attention loss. Through attention, the module can leverage the semantic knowledge of relevant visual concepts and produce class-representative fused features for classification. Experiments on medical and natural image datasets show that our method achieves state-of-the-art performance, demonstrating its effectiveness and superiority. We will release the code publicly.
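
A minimal sketch of the similarity-based filtering that keeps the concept pool free of redundancy might look as follows, assuming cosine similarity over concept embeddings; the threshold and data layout are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def add_concept(pool: torch.Tensor, candidate: torch.Tensor, thresh: float = 0.9):
    """pool: (N, D) existing concept embeddings; candidate: (D,).
    Admit the candidate only if it is sufficiently novel."""
    if pool.numel() == 0:
        return candidate.unsqueeze(0)
    sims = F.cosine_similarity(pool, candidate.unsqueeze(0), dim=-1)
    if sims.max() < thresh:  # nothing already in the pool is too similar
        return torch.cat([pool, candidate.unsqueeze(0)], dim=0)
    return pool  # redundant concept: discard
```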

Towards a zero-shot low-latency navigation for open surgery augmented reality applications.

Schwimmbeck M, Khajarian S, Auer C, Wittenberg T, Remmele S

PubMed · Aug 5, 2025
Augmented reality (AR) enhances surgical navigation by superimposing visible anatomical structures with three-dimensional virtual models using head-mounted displays (HMDs). In particular, interventions such as open liver surgery can benefit from AR navigation, as it aids in identifying and distinguishing tumors and risk structures. However, there is a lack of automatic and markerless methods that are robust against real-world challenges, such as partial occlusion and organ motion. We introduce a novel multi-device approach for automatic live navigation in open liver surgery that enhances the visualization and interaction capabilities of a HoloLens 2 HMD through precise and reliable registration using an Intel RealSense RGB-D camera. The intraoperative RGB-D segmentation and the preoperative CT data are utilized to register a virtual liver model to the target anatomy. An AR-prompted Segment Anything Model (SAM) enables robust segmentation of the liver in situ without the need for additional training data. To mitigate algorithmic latency, Double Exponential Smoothing (DES) is applied to forecast registration results. We conducted a phantom study for open liver surgery, investigating various scenarios of liver motion, viewpoints, and occlusion. The mean registration errors (TRE 8.31-18.78 mm) are comparable to those reported in prior work, while our approach demonstrates high success rates even for high occlusion factors and strong motion. Using forecasting, we bypassed the algorithmic latency of 79.8 ms per frame, with median forecasting errors below 2 mm in translation and 1.5 degrees between the quaternions. To our knowledge, this is the first work to approach markerless in situ visualization by combining a multi-device method with forecasting and a foundation model for segmentation and tracking. This enables a more reliable and precise AR registration of surgical targets with low latency. Our approach can be applied to other surgical applications and AR hardware with minimal effort.
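
Double Exponential Smoothing itself is standard; a minimal sketch of a one-step-ahead forecast for a single pose parameter is shown below. The smoothing coefficients are assumptions, and a real system would forecast rotation on quaternions rather than a scalar series.

```python
def des_forecast(series, alpha=0.5, beta=0.3, horizon=1):
    """Holt's double exponential smoothing: smooth a level and a trend,
    then extrapolate `horizon` steps past the last observation."""
    level, trend = series[0], series[1] - series[0]
    for x in series[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + horizon * trend

# e.g. forecast the next x-translation (mm) of the registered liver model
print(des_forecast([10.0, 10.4, 10.9, 11.5, 12.2]))  # extrapolates the trend
```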

Real-time 3D US-CT fusion-based semi-automatic puncture robot system: clinical evaluation.

Nakayama M, Zhang B, Kuromatsu R, Nakano M, Noda Y, Kawaguchi T, Li Q, Maekawa Y, Fujie MG, Sugano S

PubMed · Aug 5, 2025
Conventional systems supporting percutaneous radiofrequency ablation (PRFA) have faced difficulties in ensuring safe and accurate puncture due to issues inherent to the medical images used and to organ displacement caused by patients' respiration. To address this problem, this study proposes a semi-automatic puncture robot system that integrates real-time ultrasound (US) images with computed tomography (CT) images. The purpose of this paper is to evaluate the system's usefulness through a pilot clinical experiment involving participants. For the clinical experiment using the proposed system, an improved U-net model based on fivefold cross-validation was constructed. Following the workflow of the proposed system, the model was trained on US images of patients acquired with the robotic arm. The average Dice coefficient over the entire validation dataset was 0.87, so the model was implemented in the robotic system. A clinical experiment was then conducted using the robotic system equipped with the developed AI model on five adult male and female participants. In the 3D US-CT fusion process, the centroid distances between the point clouds from each modality were evaluated, taking the blood vessel centerline to represent the overall structural position. The centroid distances showed a minimum of 0.38 mm, a maximum of 4.81 mm, and an average of 1.97 mm. Although the five participants had different CP classifications and the derived US images exhibited individual variability, all centroid distances satisfied the 5.00 mm ablation margin considered in PRFA, suggesting the potential accuracy and utility of the robotic system for puncture navigation. Additionally, the results suggested the potential generalization performance of the AI model trained with data acquired according to the robotic system's workflow.
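
The Dice coefficient reported above follows the standard definition; a minimal sketch for binary masks (not the authors' code) is:

```python
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary segmentation masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)
```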

Controllable Mask Diffusion Model for medical annotation synthesis with semantic information extraction.

Heo C, Jung J

PubMed · Aug 5, 2025
Medical segmentation, a prominent task in medical image analysis using artificial intelligence, plays a crucial role in computer-aided diagnosis and depends heavily on the quality of the training data. However, the availability of sufficient data is constrained by the strict privacy regulations associated with medical data. To mitigate this issue, research on data augmentation has gained significant attention. Medical segmentation tasks require paired datasets consisting of medical images and annotation images, also known as mask images, which represent lesion areas or radiological information within the medical images. Consequently, it is essential to apply data augmentation to both image types. This study proposes a Controllable Mask Diffusion Model, a novel approach capable of controlling and generating new masks. The model leverages the binary structure of the mask to extract semantic information, namely the mask's size, location, and count, which is then applied as multi-conditional input to a diffusion model via a regressor. Through the regressor, newly generated masks conform to the input semantic information, enabling input-driven controllable generation. Additionally, a technique that analyzes correlations within the semantic information was devised for large-scale data synthesis. The generative capacity of the proposed model was evaluated against real datasets, and the model's ability to control and generate new masks from previously unseen semantic information was confirmed. Furthermore, the practical applicability of the model was demonstrated by augmenting training data with the generated data, applying it to segmentation tasks, and comparing performance with and without augmentation. Experiments were conducted on single-label and multi-label masks, yielding superior results for both types. This demonstrates the potential applicability of this study to various areas within the medical field.
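
One plausible way to extract the three conditioning signals (size, location, count) from a binary mask is connected-component analysis, sketched below; the authors' exact extraction procedure may differ, so this is an assumption-laden illustration.

```python
import numpy as np
from scipy import ndimage

def mask_semantics(mask: np.ndarray) -> dict:
    """mask: 2D binary array. Return per-lesion size, centroid, and count."""
    labeled, count = ndimage.label(mask)  # connected components
    idx = range(1, count + 1)
    sizes = ndimage.sum(mask, labeled, index=idx)            # pixels per lesion
    centroids = ndimage.center_of_mass(mask, labeled, index=idx)
    return {"count": count, "sizes": list(sizes), "locations": centroids}
```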

Point-Based Shape Representation Generation with a Correspondence-Preserving Diffusion Model

Shen Zhu, Yinzhu Jin, Ifrah Zawar, P. Thomas Fletcher

arXiv preprint · Aug 5, 2025
We propose a diffusion model designed to generate point-based shape representations with correspondences. Traditional statistical shape models have considered point correspondences extensively, but current deep generative methods focus on unordered point clouds instead and do not address correspondences between generated shapes. This work formulates a diffusion model capable of generating realistic point-based shape representations that preserve the point correspondences present in the training data. Using shape representation data with correspondences derived from the Open Access Series of Imaging Studies 3 (OASIS-3), we demonstrate that our correspondence-preserving model generates point-based hippocampal shape representations that are highly realistic compared to existing methods. We further demonstrate applications of our generative model in downstream tasks, such as conditional generation of healthy and AD subjects and prediction of morphological changes in disease progression via counterfactual generation.
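
The key property, preserved point ordering, can be seen in the standard DDPM forward process applied row-wise to an ordered (N, 3) point array: noise is added per coordinate without permuting rows, so index i always denotes the same anatomical point. A minimal sketch, with an assumed noise schedule:

```python
import torch

def q_sample(x0: torch.Tensor, t: int, betas: torch.Tensor) -> torch.Tensor:
    """Standard DDPM forward step q(x_t | x_0), applied row-wise to an
    ordered (N, 3) array so point index i keeps its meaning."""
    a_bar = torch.cumprod(1.0 - betas, dim=0)[t]  # cumulative alpha at step t
    noise = torch.randn_like(x0)
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

betas = torch.linspace(1e-4, 0.02, 1000)  # assumed linear schedule
noisy = q_sample(torch.randn(1024, 3), t=500, betas=betas)
```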

A Multi-Agent System for Complex Reasoning in Radiology Visual Question Answering

Ziruo Yi, Jinyu Liu, Ting Xiao, Mark V. Albert

arXiv preprint · Aug 4, 2025
Radiology visual question answering (RVQA) provides precise answers to questions about chest X-ray images, alleviating radiologists' workload. While recent methods based on multimodal large language models (MLLMs) and retrieval-augmented generation (RAG) have shown promising progress in RVQA, they still face challenges in factual accuracy, hallucinations, and cross-modal misalignment. We introduce a multi-agent system (MAS) designed to support complex reasoning in RVQA, with specialized agents for context understanding, multimodal reasoning, and answer validation. We evaluate our system on a challenging RVQA set curated via model disagreement filtering, comprising consistently hard cases across multiple MLLMs. Extensive experiments demonstrate the superiority and effectiveness of our system over strong MLLM baselines, with a case study illustrating its reliability and interpretability. This work highlights the potential of multi-agent approaches to support explainable and trustworthy clinical AI applications that require complex reasoning.
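
Model-disagreement filtering can be sketched simply: keep a question only when the candidate MLLMs fail to converge on an answer. The agreement rule below (string-normalized majority vote with a minimum count) is an assumption, not the authors' criterion.

```python
from collections import Counter

def is_hard_case(answers: list[str], min_agree: int = 2) -> bool:
    """answers: one predicted answer per MLLM for the same question.
    Hard case = no normalized answer reaches the agreement threshold."""
    votes = Counter(a.strip().lower() for a in answers)
    return votes.most_common(1)[0][1] < min_agree
```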