Sort by:
Page 27 of 78779 results

R2GenKG: Hierarchical Multi-modal Knowledge Graph for LLM-based Radiology Report Generation

Futian Wang, Yuhan Qiao, Xiao Wang, Fuling Wang, Yuxiang Zhang, Dengdi Sun

arxiv logopreprintAug 5 2025
X-ray medical report generation is one of the important applications of artificial intelligence in healthcare. With the support of large foundation models, the quality of medical report generation has significantly improved. However, challenges such as hallucination and weak disease diagnostic capability still persist. In this paper, we first construct a large-scale multi-modal medical knowledge graph (termed M3KG) based on the ground truth medical report using the GPT-4o. It contains 2477 entities, 3 kinds of relations, 37424 triples, and 6943 disease-aware vision tokens for the CheXpert Plus dataset. Then, we sample it to obtain multi-granularity semantic graphs and use an R-GCN encoder for feature extraction. For the input X-ray image, we adopt the Swin-Transformer to extract the vision features and interact with the knowledge using cross-attention. The vision tokens are fed into a Q-former and retrieved the disease-aware vision tokens using another cross-attention. Finally, we adopt the large language model to map the semantic knowledge graph, input X-ray image, and disease-aware vision tokens into language descriptions. Extensive experiments on multiple datasets fully validated the effectiveness of our proposed knowledge graph and X-ray report generation framework. The source code of this paper will be released on https://github.com/Event-AHU/Medical_Image_Analysis.

Augmenting Continual Learning of Diseases with LLM-Generated Visual Concepts

Jiantao Tan, Peixian Ma, Kanghao Chen, Zhiming Dai, Ruixuan Wang

arxiv logopreprintAug 5 2025
Continual learning is essential for medical image classification systems to adapt to dynamically evolving clinical environments. The integration of multimodal information can significantly enhance continual learning of image classes. However, while existing approaches do utilize textual modality information, they solely rely on simplistic templates with a class name, thereby neglecting richer semantic information. To address these limitations, we propose a novel framework that harnesses visual concepts generated by large language models (LLMs) as discriminative semantic guidance. Our method dynamically constructs a visual concept pool with a similarity-based filtering mechanism to prevent redundancy. Then, to integrate the concepts into the continual learning process, we employ a cross-modal image-concept attention module, coupled with an attention loss. Through attention, the module can leverage the semantic knowledge from relevant visual concepts and produce class-representative fused features for classification. Experiments on medical and natural image datasets show our method achieves state-of-the-art performance, demonstrating the effectiveness and superiority of our method. We will release the code publicly.

Controllable Mask Diffusion Model for medical annotation synthesis with semantic information extraction.

Heo C, Jung J

pubmed logopapersAug 5 2025
Medical segmentation, a prominent task in medical image analysis utilizing artificial intelligence, plays a crucial role in computer-aided diagnosis and depends heavily on the quality of the training data. However, the availability of sufficient data is constrained by strict privacy regulations associated with medical data. To mitigate this issue, research on data augmentation has gained significant attention. Medical segmentation tasks require paired datasets consisting of medical images and annotation images, also known as mask images, which represent lesion areas or radiological information within the medical images. Consequently, it is essential to apply data augmentation to both image types. This study proposes a Controllable Mask Diffusion Model, a novel approach capable of controlling and generating new masks. This model leverages the binary structure of the mask to extract semantic information, namely, the mask's size, location, and count, which is then applied as multi-conditional input to a diffusion model via a regressor. Through the regressor, newly generated masks conform to the input semantic information, thereby enabling input-driven controllable generation. Additionally, a technique that analyzes correlation within semantic information was devised for large-scale data synthesis. The generative capacity of the proposed model was evaluated against real datasets, and the model's ability to control and generate new masks based on previously unseen semantic information was confirmed. Furthermore, the practical applicability of the model was demonstrated by augmenting the data with the generated data, applying it to segmentation tasks, and comparing the performance with and without augmentation. Additionally, experiments were conducted on single-label and multi-label masks, yielding superior results for both types. This demonstrates the potential applicability of this study to various areas within the medical field.

Real-time 3D US-CT fusion-based semi-automatic puncture robot system: clinical evaluation.

Nakayama M, Zhang B, Kuromatsu R, Nakano M, Noda Y, Kawaguchi T, Li Q, Maekawa Y, Fujie MG, Sugano S

pubmed logopapersAug 5 2025
Conventional systems supporting percutaneous radiofrequency ablation (PRFA) have faced difficulties in ensuring safe and accurate puncture due to issues inherent to the medical images used and organ displacement caused by patients' respiration. To address this problem, this study proposes a semi-automatic puncture robot system that integrates real-time ultrasound (US) images with computed tomography (CT) images. The purpose of this paper is to evaluate the system's usefulness through a pilot clinical experiment involving participants. For the clinical experiment using the proposed system, an improved U-net model based on fivefold cross-validation was constructed. Following the workflow of the proposed system, the model was trained using US images acquired from patients with robotic arms. The average Dice coefficient for the entire validation dataset was confirmed to be 0.87. Therefore, the model was implemented in the robotic system and applied to clinical experiment. A clinical experiment was conducted using the robotic system equipped with the developed AI model on five adult male and female participants. The centroid distances between the point clouds from each modality were evaluated in the 3D US-CT fusion process, assuming the blood vessel centerline represents the overall structural position. The results of the centroid distances showed a minimum value of 0.38 mm, a maximum value of 4.81 mm, and an average of 1.97 mm. Although the five participants had different CP classifications and the derived US images exhibited individual variability, all centroid distances satisfied the ablation margin of 5.00 mm considered in PRFA, suggesting the potential accuracy and utility of the robotic system for puncture navigation. Additionally, the results suggested the potential generalization performance of the AI model trained with data acquired according to the robotic system's workflow.

S-RRG-Bench: Structured Radiology Report Generation with Fine-Grained Evaluation Framework

Yingshu Li, Yunyi Liu, Zhanyu Wang, Xinyu Liang, Lingqiao Liu, Lei Wang, Luping Zhou

arxiv logopreprintAug 4 2025
Radiology report generation (RRG) for diagnostic images, such as chest X-rays, plays a pivotal role in both clinical practice and AI. Traditional free-text reports suffer from redundancy and inconsistent language, complicating the extraction of critical clinical details. Structured radiology report generation (S-RRG) offers a promising solution by organizing information into standardized, concise formats. However, existing approaches often rely on classification or visual question answering (VQA) pipelines that require predefined label sets and produce only fragmented outputs. Template-based approaches, which generate reports by replacing keywords within fixed sentence patterns, further compromise expressiveness and often omit clinically important details. In this work, we present a novel approach to S-RRG that includes dataset construction, model training, and the introduction of a new evaluation framework. We first create a robust chest X-ray dataset (MIMIC-STRUC) that includes disease names, severity levels, probabilities, and anatomical locations, ensuring that the dataset is both clinically relevant and well-structured. We train an LLM-based model to generate standardized, high-quality reports. To assess the generated reports, we propose a specialized evaluation metric (S-Score) that not only measures disease prediction accuracy but also evaluates the precision of disease-specific details, thus offering a clinically meaningful metric for report quality that focuses on elements critical to clinical decision-making and demonstrates a stronger alignment with human assessments. Our approach highlights the effectiveness of structured reports and the importance of a tailored evaluation metric for S-RRG, providing a more clinically relevant measure of report quality.

Evaluating acute image ordering for real-world patient cases via language model alignment with radiological guidelines.

Yao MS, Chae A, Saraiya P, Kahn CE, Witschey WR, Gee JC, Sagreiya H, Bastani O

pubmed logopapersAug 4 2025
Diagnostic imaging studies are increasingly important in the management of acutely presenting patients. However, ordering appropriate imaging studies in the emergency department is a challenging task with a high degree of variability among healthcare providers. To address this issue, recent work has investigated whether generative AI and large language models can be leveraged to recommend diagnostic imaging studies in accordance with evidence-based medical guidelines. However, it remains challenging to ensure that these tools can provide recommendations that correctly align with medical guidelines, especially given the limited diagnostic information available in acute care settings. In this study, we introduce a framework to intelligently leverage language models by recommending imaging studies for patient cases that align with the American College of Radiology's Appropriateness Criteria, a set of evidence-based guidelines. To power our experiments, we introduce RadCases, a dataset of over 1500 annotated case summaries reflecting common patient presentations, and apply our framework to enable state-of-the-art language models to reason about appropriate imaging choices. Using our framework, state-of-the-art language models achieve accuracy comparable to clinicians in ordering imaging studies. Furthermore, we demonstrate that our language model-based pipeline can be used as an intelligent assistant by clinicians to support image ordering workflows and improve the accuracy of acute image ordering according to the American College of Radiology's Appropriateness Criteria. Our work demonstrates and validates a strategy to leverage AI-based software to improve trustworthy clinical decision-making in alignment with expert evidence-based guidelines.

A Multi-Agent System for Complex Reasoning in Radiology Visual Question Answering

Ziruo Yi, Jinyu Liu, Ting Xiao, Mark V. Albert

arxiv logopreprintAug 4 2025
Radiology visual question answering (RVQA) provides precise answers to questions about chest X-ray images, alleviating radiologists' workload. While recent methods based on multimodal large language models (MLLMs) and retrieval-augmented generation (RAG) have shown promising progress in RVQA, they still face challenges in factual accuracy, hallucinations, and cross-modal misalignment. We introduce a multi-agent system (MAS) designed to support complex reasoning in RVQA, with specialized agents for context understanding, multimodal reasoning, and answer validation. We evaluate our system on a challenging RVQA set curated via model disagreement filtering, comprising consistently hard cases across multiple MLLMs. Extensive experiments demonstrate the superiority and effectiveness of our system over strong MLLM baselines, with a case study illustrating its reliability and interpretability. This work highlights the potential of multi-agent approaches to support explainable and trustworthy clinical AI applications that require complex reasoning.

An integrated predictive model for Alzheimer's disease progression from cognitively normal subjects using generated MRI and interpretable AI.

Aghaei A, Moghaddam ME

pubmed logopapersAug 4 2025
Alzheimer's disease (AD) is a progressive neurodegenerative disorder that begins with subtle cognitive changes and advances to severe impairment. Early diagnosis is crucial for effective intervention and management. In this study, we propose an integrated framework that leverages ensemble transfer learning, generative modeling, and automatic ROI extraction techniques to predict the progression of Alzheimer's disease from cognitively normal (CN) subjects. Using the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, we employ a three-stage process: (1) estimating the probability of transitioning from CN to mild cognitive impairment (MCI) using ensemble transfer learning, (2) generating future MRI images using Transformer-based Generative Adversarial Network (ViT-GANs) to simulate disease progression after two years, and (3) predicting AD using a 3D convolutional neural network (CNN) with calibrated probabilities using isotonic regression and interpreting critical regions of interest (ROIs) with Gradient-weighted Class Activation Mapping (Grad-CAM). However, the proposed method has generality and may work when sufficient data for simulating brain changes after three years or more is available; in the training phase, regarding available data, brain changes after 2 years have been considered. Our approach addresses the challenge of limited longitudinal data by creating high-quality synthetic images and improving model transparency by identifying key brain regions involved in disease progression. The proposed method demonstrates high accuracy and F1-score, 0.85 and 0.86, respectively, in CN to AD prediction up to 10 years, offering a potential tool for early diagnosis and personalized intervention strategies in Alzheimer's disease.

Development and Validation of an Explainable MRI-Based Habitat Radiomics Model for Predicting p53-Abnormal Endometrial Cancer: A Multicentre Feasibility Study.

Jin W, Zhang H, Ning Y, Chen X, Zhang G, Li H, Zhang H

pubmed logopapersAug 4 2025
We developed an MRI-based habitat radiomics model (HRM) to predict p53-abnormal (p53abn) molecular subtypes of endometrial cancer (EC). Patients with pathologically confirmed EC were retrospectively enrolled from three hospitals and categorized into a training cohort (n = 270), test cohort 1 (n = 70), and test cohort 2 (n = 154). The tumour was divided into habitat sub-regions using diffusion-weighted imaging (DWI) and contrast-enhanced (CE) images with the K-means algorithm. Radiomics features were extracted from T1-weighted imaging (T1WI), T2-weighted imaging (T2WI), DWI, and CE images. Three machine learning classifiers-logistic regression, support vector machines, and random forests-were applied to develop predictive models for p53abn EC. Model performance was validated using receiver operating characteristic (ROC) curves, and the model with the best predictive performance was selected as the HRM. A whole-region radiomics model (WRM) was also constructed, and a clinical model (CM) with five clinical features was developed. The SHApley Additive ExPlanations (SHAP) method was used to explain the outputs of the models. DeLong's test evaluated and compared the performance across the cohorts. A total of 1920 habitat radiomics features were considered. Eight features were selected for the HRM, ten for the WRM, and three clinical features for the CM. The HRM achieved the highest AUC: 0.855 (training), 0.769 (test1), and 0.766 (test2). The AUCs of the WRM were 0.707 (training), 0.703 (test1), and 0.738 (test2). The AUCs of the CM were 0.709 (training), 0.641 (test1), and 0.665 (test2). The MRI-based HRM successfully predicted p53abn EC. The results indicate that habitat combined with machine learning, radiomics, and SHAP can effectively predict p53abn EC, providing clinicians with intuitive insights and interpretability regarding the impact of risk factors in the model.

Accurate and Interpretable Postmenstrual Age Prediction via Multimodal Large Language Model

Qifan Chen, Jin Cui, Cindy Duan, Yushuo Han, Yifei Shi

arxiv logopreprintAug 4 2025
Accurate estimation of postmenstrual age (PMA) at scan is crucial for assessing neonatal development and health. While deep learning models have achieved high accuracy in predicting PMA from brain MRI, they often function as black boxes, offering limited transparency and interpretability in clinical decision support. In this work, we address the dual challenge of accuracy and interpretability by adapting a multimodal large language model (MLLM) to perform both precise PMA prediction and clinically relevant explanation generation. We introduce a parameter-efficient fine-tuning (PEFT) strategy using instruction tuning and Low-Rank Adaptation (LoRA) applied to the Qwen2.5-VL-7B model. The model is trained on four 2D cortical surface projection maps derived from neonatal MRI scans. By employing distinct prompts for training and inference, our approach enables the MLLM to handle a regression task during training and generate clinically relevant explanations during inference. The fine-tuned model achieves a low prediction error with a 95 percent confidence interval of 0.78 to 1.52 weeks, while producing interpretable outputs grounded in developmental features, marking a significant step toward transparent and trustworthy AI systems in perinatal neuroscience.
Page 27 of 78779 results
Show
per page

Ready to Sharpen Your Edge?

Join hundreds of your peers who rely on RadAI Slice. Get the essential weekly briefing that empowers you to navigate the future of radiology.

We respect your privacy. Unsubscribe at any time.