Sort by:
Page 12 of 58575 results

RealDeal: Enhancing Realism and Details in Brain Image Generation via Image-to-Image Diffusion Models

Shen Zhu, Yinzhu Jin, Tyler Spears, Ifrah Zawar, P. Thomas Fletcher

arxiv logopreprintJul 24 2025
We propose image-to-image diffusion models that are designed to enhance the realism and details of generated brain images by introducing sharp edges, fine textures, subtle anatomical features, and imaging noise. Generative models have been widely adopted in the biomedical domain, especially in image generation applications. Latent diffusion models achieve state-of-the-art results in generating brain MRIs. However, due to latent compression, generated images from these models are overly smooth, lacking fine anatomical structures and scan acquisition noise that are typically seen in real images. This work formulates the realism enhancing and detail adding process as image-to-image diffusion models, which refines the quality of LDM-generated images. We employ commonly used metrics like FID and LPIPS for image realism assessment. Furthermore, we introduce new metrics to demonstrate the realism of images generated by RealDeal in terms of image noise distribution, sharpness, and texture.

Interpretable AI Framework for Secure and Reliable Medical Image Analysis in IoMT Systems.

Matthew UO, Rosa RL, Saadi M, Rodriguez DZ

pubmed logopapersJul 23 2025
The integration of artificial intelligence (AI) into medical image analysis has transformed healthcare, offering unprecedented precision in diagnosis, treatment planning, and disease monitoring. However, its adoption within the Internet of Medical Things (IoMT) raises significant challenges related to transparency, trustworthiness, and security. This paper introduces a novel Explainable AI (XAI) framework tailored for Medical Cyber-Physical Systems (MCPS), addressing these challenges by combining deep neural networks with symbolic knowledge reasoning to deliver clinically interpretable insights. The framework incorporates an Enhanced Dynamic Confidence-Weighted Attention (Enhanced DCWA) mechanism, which improves interpretability and robustness by dynamically refining attention maps through adaptive normalization and multi-level confidence weighting. Additionally, a Resilient Observability and Detection Engine (RODE) leverages sparse observability principles to detect and mitigate adversarial threats, ensuring reliable performance in dynamic IoMT environments. Evaluations conducted on benchmark datasets, including CheXpert, RSNA Pneumonia Detection Challenge, and NIH Chest X-ray Dataset, demonstrate significant advancements in classification accuracy, adversarial robustness, and explainability. The framework achieves a 15% increase in lesion classification accuracy, a 30% reduction in robustness loss, and a 20% improvement in the Explainability Index compared to state-of-the-art methods.

Back to the Future-Cardiovascular Imaging From 1966 to Today and Tomorrow.

Wintersperger BJ, Alkadhi H, Wildberger JE

pubmed logopapersJul 23 2025
This article, on the 60th anniversary of the journal Investigative Radiology, a journal dedicated to cutting-edge imaging technology, discusses key historical milestones in CT and MRI technology, as well as the ongoing advancement of contrast agent development for cardiovascular imaging over the past decades. It specifically highlights recent developments and the current state-of-the-art technology, including photon-counting detector CT and artificial intelligence, which will further push the boundaries of cardiovascular imaging. What were once ideas and visions have become today's clinical reality for the benefit of patients, and imaging technology will continue to evolve and transform modern medicine.

Hi ChatGPT, I am a Radiologist, How can you help me?

Bellini D, Ferrari R, Vicini S, Rengo M, Saletti CL, Carbone I

pubmed logopapersJul 23 2025
This review paper explores the integration of ChatGPT, a generative AI model developed by OpenAI, into radiological practices, focusing on its potential to enhance the operational efficiency of radiologists. ChatGPT operates on the GPT architecture, utilizing advanced machine learning techniques, including unsupervised pre-training and reinforcement learning, to generate human-like text responses. While AI applications in radiology predominantly focus on imaging acquisition, reconstruction, and interpretation-commonly embedded directly within hardware-the accessibility and functional breadth of ChatGPT make it a unique tool. This interview-based review should not be intended as a detailed evaluation of all ChatGPT features. Instead, it aims to test its utility in everyday radiological tasks through real-world examples. ChatGPT demonstrated strong capabilities in structuring radiology reports according to international guidelines (e.g., PI-RADS, CT reporting for diverticulitis), designing a complete research protocol, and performing advanced statistical analysis from Excel datasets, including ROC curve generation and intergroup comparison. Although not capable of directly interpreting DICOM images, ChatGPT provided meaningful assistance in image post-processing and interpretation when images were converted to standard formats. These findings highlight its current strengths and limitations as a supportive tool for radiologists.

MaskedCLIP: Bridging the Masked and CLIP Space for Semi-Supervised Medical Vision-Language Pre-training

Lei Zhu, Jun Zhou, Rick Siow Mong Goh, Yong Liu

arxiv logopreprintJul 23 2025
Foundation models have recently gained tremendous popularity in medical image analysis. State-of-the-art methods leverage either paired image-text data via vision-language pre-training or unpaired image data via self-supervised pre-training to learn foundation models with generalizable image features to boost downstream task performance. However, learning foundation models exclusively on either paired or unpaired image data limits their ability to learn richer and more comprehensive image features. In this paper, we investigate a novel task termed semi-supervised vision-language pre-training, aiming to fully harness the potential of both paired and unpaired image data for foundation model learning. To this end, we propose MaskedCLIP, a synergistic masked image modeling and contrastive language-image pre-training framework for semi-supervised vision-language pre-training. The key challenge in combining paired and unpaired image data for learning a foundation model lies in the incompatible feature spaces derived from these two types of data. To address this issue, we propose to connect the masked feature space with the CLIP feature space with a bridge transformer. In this way, the more semantic specific CLIP features can benefit from the more general masked features for semantic feature extraction. We further propose a masked knowledge distillation loss to distill semantic knowledge of original image features in CLIP feature space back to the predicted masked image features in masked feature space. With this mutually interactive design, our framework effectively leverages both paired and unpaired image data to learn more generalizable image features for downstream tasks. Extensive experiments on retinal image analysis demonstrate the effectiveness and data efficiency of our method.

Synthetic data trained open-source language models are feasible alternatives to proprietary models for radiology reporting.

Pandita A, Keniston A, Madhuripan N

pubmed logopapersJul 23 2025
The study assessed the feasibility of using synthetic data to fine-tune various open-source LLMs for free text to structured data conversation in radiology, comparing their performance with GPT models. A training set of 3000 synthetic thyroid nodule dictations was generated to train six open-source models (Starcoderbase-1B, Starcoderbase-3B, Mistral-7B, Llama-3-8B, Llama-2-13B, and Yi-34B). ACR TI-RADS template was the target model output. The model performance was tested on 50 thyroid nodule dictations from MIMIC-III patient dataset and compared against 0-shot, 1-shot, and 5-shot performance of GPT-3.5 and GPT-4. GPT-4 5-shot and Yi-34B showed the highest performance with no statistically significant difference between the models. Various open models outperformed GPT models with statistical significance. Overall, models trained with synthetic data showed performance comparable to GPT models in structured text conversion in our study. Given privacy preserving advantages, open LLMs can be utilized as a viable alternative to proprietary GPT models.

Interpretable Deep Learning Approaches for Reliable GI Image Classification: A Study with the HyperKvasir Dataset

Wahid, S. B., Rothy, Z. T., News, R. K., Rieyan, S. A.

medrxiv logopreprintJul 23 2025
Deep learning has emerged as a promising tool for automating gastrointestinal (GI) disease diagnosis. However, multi-class GI disease classification remains underexplored. This study addresses this gap by presenting a framework that uses advanced models like InceptionNetV3 and ResNet50, combined with boosting algorithms (XGB, LGBM), to classify lower GI abnormalities. InceptionNetV3 with XGB achieved the best recall of 0.81 and an F1 score of 0.90. To assist clinicians in understanding model decisions, the Grad-CAM technique, a form of explainable AI, was employed to highlight the critical regions influencing predictions, fostering trust in these systems. This approach significantly improves both the accuracy and reliability of GI disease diagnosis.

Role of Brain Age Gap as a Mediator in the Relationship Between Cognitive Impairment Risk Factors and Cognition.

Tan WY, Huang X, Huang J, Robert C, Cui J, Chen CPLH, Hilal S

pubmed logopapersJul 22 2025
Cerebrovascular disease (CeVD) and cognitive impairment risk factors contribute to cognitive decline, but the role of brain age gap (BAG) in mediating this relationship remains unclear, especially in Southeast Asian populations. This study investigated the influence of cognitive impairment risk factors on cognition and examined how BAG mediates this relationship, particularly in individuals with varying CeVD burden. This cross-sectional study analyzed Singaporean community and memory clinic participants. Cognitive impairment risk factors were assessed using the Cognitive Impairment Scoring System (CISS), encompassing 11 sociodemographic and vascular factors. Cognition was assessed through a neuropsychological battery, evaluating global cognition and 6 cognitive domains: executive function, attention, memory, language, visuomotor speed, and visuoconstruction. Brain age was derived from structural MRI features using ensemble machine learning model. Propensity score matching balanced risk profiles between model training and the remaining sample. Structural equation modeling examined the mediation effect of BAG on CISS-cognition relationship, stratified by CeVD burden (high: CeVD+, low: CeVD-). The study included 1,437 individuals without dementia, with 646 in the matched sample (mean age 66.4 ± 6.0 years, 47% female, 60% with no cognitive impairment). Higher CISS was consistently associated with poorer cognitive performance across all domains, with the strongest negative associations in visuomotor speed (β = -2.70, <i>p</i> < 0.001) and visuoconstruction (β = -3.02, <i>p</i> < 0.001). Among the CeVD+ group, BAG significantly mediated the relationship between CISS and global cognition (proportion mediated: 19.95%, <i>p</i> = 0.01), with the strongest mediation effects in executive function (34.1%, <i>p</i> = 0.03) and language (26.6%, <i>p</i> = 0.008). BAG also mediated the relationship between CISS and memory (21.1%) and visuoconstruction (14.4%) in the CeVD+ group, but these effects diminished after statistical adjustments. Our findings suggest that BAG is a key intermediary linking cognitive impairment risk factors to cognitive function, particularly in individuals with high CeVD burden. This mediation effect is domain-specific, with executive function, language, and visuoconstruction being the most vulnerable to accelerated brain aging. Limitations of this study include the cross-sectional design, limiting causal inference, and the focus on Southeast Asian populations, limiting generalizability. Future longitudinal studies should verify these relationships and explore additional factors not captured in our model.

AgentMRI: A Vison Language Model-Powered AI System for Self-regulating MRI Reconstruction with Multiple Degradations.

Sajua GA, Akhib M, Chang Y

pubmed logopapersJul 22 2025
Artificial intelligence (AI)-driven autonomous agents are transforming multiple domains by integrating reasoning, decision-making, and task execution into a unified framework. In medical imaging, such agents have the potential to change workflows by reducing human intervention and optimizing image quality. In this paper, we introduce the AgentMRI. It is an AI-driven system that leverages vision language models (VLMs) for fully autonomous magnetic resonance imaging (MRI) reconstruction in the presence of multiple degradations. Unlike traditional MRI correction or reconstruction methods, AgentMRI does not rely on manual intervention for post-processing or does not rely on fixed correction models. Instead, it dynamically detects MRI corruption and then automatically selects the best correction model for image reconstruction. The framework uses a multi-query VLM strategy to ensure robust corruption detection through consensus-based decision-making and confidence-weighted inference. AgentMRI automatically chooses deep learning models that include MRI reconstruction, motion correction, and denoising models. We evaluated AgentMRI in both zero-shot and fine-tuned settings. Experimental results on a comprehensive brain MRI dataset demonstrate that AgentMRI achieves an average of 73.6% accuracy in zero-shot and 95.1% accuracy for fine-tuned settings. Experiments show that it accurately executes the reconstruction process without human intervention. AgentMRI eliminates manual intervention and introduces a scalable and multimodal AI framework for autonomous MRI processing. This work may build a significant step toward fully autonomous and intelligent MR image reconstruction systems.

AURA: A Multi-Modal Medical Agent for Understanding, Reasoning & Annotation

Nima Fathi, Amar Kumar, Tal Arbel

arxiv logopreprintJul 22 2025
Recent advancements in Large Language Models (LLMs) have catalyzed a paradigm shift from static prediction systems to agentic AI agents capable of reasoning, interacting with tools, and adapting to complex tasks. While LLM-based agentic systems have shown promise across many domains, their application to medical imaging remains in its infancy. In this work, we introduce AURA, the first visual linguistic explainability agent designed specifically for comprehensive analysis, explanation, and evaluation of medical images. By enabling dynamic interactions, contextual explanations, and hypothesis testing, AURA represents a significant advancement toward more transparent, adaptable, and clinically aligned AI systems. We highlight the promise of agentic AI in transforming medical image analysis from static predictions to interactive decision support. Leveraging Qwen-32B, an LLM-based architecture, AURA integrates a modular toolbox comprising: (i) a segmentation suite with phase grounding, pathology segmentation, and anatomy segmentation to localize clinically meaningful regions; (ii) a counterfactual image-generation module that supports reasoning through image-level explanations; and (iii) a set of evaluation tools including pixel-wise difference-map analysis, classification, and advanced state-of-the-art components to assess diagnostic relevance and visual interpretability.
Page 12 of 58575 results
Show
per page

Ready to Sharpen Your Edge?

Join hundreds of your peers who rely on RadAI Slice. Get the essential weekly briefing that empowers you to navigate the future of radiology.

We respect your privacy. Unsubscribe at any time.