
Perceptual Evaluation of GANs and Diffusion Models for Generating X-rays

Gregory Schuit, Denis Parra, Cecilia Besa

arXiv preprint, Aug 10 2025
Generative image models have achieved remarkable progress in both natural and medical imaging. In the medical context, these techniques offer a potential solution to data scarcity, especially for low-prevalence anomalies that impair the performance of AI-driven diagnostic and segmentation tools. However, questions remain regarding the fidelity and clinical utility of synthetic images, since poor generation quality can undermine model generalizability and trust. In this study, we evaluate the effectiveness of state-of-the-art generative models, Generative Adversarial Networks (GANs) and Diffusion Models (DMs), for synthesizing chest X-rays conditioned on four abnormalities: Atelectasis (AT), Lung Opacity (LO), Pleural Effusion (PE), and Enlarged Cardiac Silhouette (ECS). Using a benchmark composed of real images from the MIMIC-CXR dataset and synthetic images from both GANs and DMs, we conducted a reader study with three radiologists of varied experience. Participants were asked to distinguish real from synthetic images and to assess the consistency between visual features and the target abnormality. Our results show that while DMs generate more visually realistic images overall, GANs can achieve better accuracy for specific conditions, such as the absence of ECS. We further identify visual cues radiologists use to detect synthetic images, offering insights into the perceptual gaps in current models. These findings underscore the complementary strengths of GANs and DMs and point to the need for further refinement to ensure generative models can reliably augment training datasets for AI diagnostic systems.
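
As a rough illustration of the reader-study analysis (not the authors' code), the sketch below tallies per-condition accuracy from hypothetical real-vs-synthetic judgments; the column names and data layout are assumptions.

```python
# Minimal sketch: per-condition reader accuracy for a real-vs-synthetic
# discrimination study. Data layout and column names are illustrative only.
import pandas as pd

# Hypothetical long-format reader responses: one row per (reader, image) rating.
ratings = pd.DataFrame({
    "reader":    ["R1", "R1", "R2", "R2", "R3", "R3"],
    "condition": ["ECS", "ECS", "PE", "PE", "AT", "AT"],
    "source":    ["real", "dm", "gan", "real", "dm", "gan"],   # ground truth
    "judged_synthetic": [False, True, True, False, False, True],
})

# A response is correct when "judged synthetic" matches the ground-truth source.
ratings["correct"] = ratings["judged_synthetic"] == (ratings["source"] != "real")

# Accuracy broken down by abnormality and generator family.
accuracy = ratings.groupby(["condition", "source"])["correct"].mean()
print(accuracy)
```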

Prediction of hematoma changes in spontaneous intracerebral hemorrhage using a Transformer-based generative adversarial network to generate follow-up CT images.

Feng C, Jiang C, Hu C, Kong S, Ye Z, Han J, Zhong K, Yang T, Yin H, Lao Q, Ding Z, Shen D, Shen Q

PubMed, Aug 10 2025
To visualize and assess hematoma growth trends by generating follow-up CT images within 24 h based on baseline CT images of spontaneous intracerebral hemorrhage (sICH) using Transformer-integrated Generative Adversarial Networks (GAN). Patients with sICH were retrospectively recruited from two medical centers. The imaging data included baseline non-contrast CT scans taken after onset and follow-up imaging within 24 h. In the test set, the peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM) were utilized to quantitatively assess the quality of the predicted images. Pearson's correlation analysis was performed to assess the agreement of semantic features and geometric properties of hematomas between true follow-up CT images and the predicted images. The consistency of hematoma expansion prediction between true and generated images was further examined. The PSNR of the predicted images was 26.73 ± 1.11, and the SSIM was 91.23 ± 1.10. The Pearson correlation coefficients (r) with 95% confidence intervals (CI) for irregularity, satellite sign number, intraventricular or subarachnoid hemorrhage, midline shift, edema expansion, mean CT value, maximum cross-sectional area, and hematoma volume between the predicted and true follow-up images were as follows: 0.94 (0.91, 0.96), 0.87 (0.81, 0.91), 0.86 (0.80, 0.91), 0.89 (0.84, 0.92), 0.91 (0.87, 0.94), 0.78 (0.68, 0.84), 0.94 (0.91, 0.96), and 0.94 (0.91, 0.96), respectively. The correlation coefficient (r) for predicting hematoma expansion between predicted and true follow-up images was 0.86 (95% CI: 0.79, 0.90; P < 0.001). The model constructed using a GAN integrated with Transformer modules can accurately visualize early hematoma changes in sICH.
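
A minimal sketch of the evaluation pipeline described above, assuming standard definitions of PSNR, SSIM, and Pearson correlation; the arrays are synthetic stand-ins, not study data.

```python
# PSNR/SSIM for generated follow-up CTs and Pearson r for a hematoma property.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
true_ct = rng.random((64, 64)).astype(np.float32)   # stand-in follow-up slice
pred_ct = np.clip(true_ct + 0.05 * rng.standard_normal((64, 64)), 0, 1).astype(np.float32)

psnr = peak_signal_noise_ratio(true_ct, pred_ct, data_range=1.0)
ssim = structural_similarity(true_ct, pred_ct, data_range=1.0)

# Agreement of a hematoma property (e.g. volume in mL) across a toy test set.
true_vol = np.array([12.3, 30.1, 8.7, 45.2, 20.4])
pred_vol = np.array([13.0, 28.9, 9.5, 43.8, 21.1])
r, p = pearsonr(true_vol, pred_vol)

print(f"PSNR={psnr:.2f} dB  SSIM={ssim:.3f}  volume r={r:.2f} (p={p:.3g})")
```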

DiffUS: Differentiable Ultrasound Rendering from Volumetric Imaging

Noe Bertramo, Gabriel Duguey, Vivek Gopalakrishnan

arXiv preprint, Aug 9 2025
Intraoperative ultrasound imaging provides real-time guidance during numerous surgical procedures, but its interpretation is complicated by noise, artifacts, and poor alignment with high-resolution preoperative MRI/CT scans. To bridge the gap between preoperative planning and intraoperative guidance, we present DiffUS, a physics-based, differentiable ultrasound renderer that synthesizes realistic B-mode images from volumetric imaging. DiffUS first converts 3D MRI scans into acoustic impedance volumes using a machine learning approach. Next, we simulate ultrasound beam propagation using ray tracing with coupled reflection-transmission equations. DiffUS formulates wave propagation as a sparse linear system that captures multiple internal reflections. Finally, we reconstruct B-mode images via depth-resolved echo extraction across a fan-shaped acquisition geometry, incorporating realistic artifacts including speckle noise and depth-dependent degradation. DiffUS is entirely implemented as differentiable tensor operations in PyTorch, enabling gradient-based optimization for downstream applications such as slice-to-volume registration and volumetric reconstruction. Evaluation on the ReMIND dataset demonstrates DiffUS's ability to generate anatomically accurate ultrasound images from brain MRI data.
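
To make the rendering idea concrete, here is a hedged sketch of one ingredient: per-interface reflection coefficients computed differentiably from an impedance profile in PyTorch. It ignores the coupled multiple-reflection linear system that DiffUS actually solves.

```python
# Illustrative sketch using the standard acoustic reflection relation; not the
# DiffUS implementation. Given an impedance profile along one ray, compute
# differentiable per-interface reflection coefficients.
import torch

def reflection_coefficients(impedance: torch.Tensor) -> torch.Tensor:
    """impedance: (depth,) acoustic impedance samples along a ray."""
    z1, z2 = impedance[:-1], impedance[1:]
    # Intensity reflection coefficient at each interface: ((Z2 - Z1) / (Z2 + Z1))^2.
    return ((z2 - z1) / (z2 + z1 + 1e-8)) ** 2

# Toy impedance profile (requires_grad so echoes are differentiable w.r.t. it).
z = torch.tensor([1.5, 1.5, 1.7, 1.6, 7.8], requires_grad=True)  # soft tissue -> bone
r = reflection_coefficients(z)

# A crude echo model: energy returned from interface i is R_i times the
# transmitted fraction of all shallower interfaces (multiple reflections omitted).
transmit = torch.cumprod(torch.cat([torch.ones(1), 1 - r[:-1]]), dim=0)
echo = r * transmit
echo.sum().backward()          # gradients flow back to the impedance volume
print(echo.detach(), z.grad)
```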

Deep learning-based image enhancement for improved black blood imaging in brain metastasis.

Oh G, Paik S, Jo SW, Choi HJ, Yoo RE, Choi SH

PubMed, Aug 8 2025
To evaluate the utility of deep learning (DL)-based image enhancement for improving the image quality and diagnostic performance of 3D contrast-enhanced T1-weighted black blood (BB) MR imaging for brain metastases. This retrospective study included 126 patients with and 121 patients without brain metastasis who underwent 3-T MRI examinations. Commercially available DL-based MR image enhancement software was utilized for image post-processing. The signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR) of enhancing lesions were measured. For qualitative assessment and diagnostic performance evaluation, two radiologists graded the overall image quality, noise, and artifacts of each image and the conspicuity of visible lesions. The Wilcoxon signed-rank test and regression analyses with generalized estimating equations (GEEs) were used for statistical analysis. For MR images that were not previously processed using other DL-based methods, SNR and CNR were higher in the DL-enhanced images than in the standard images (438.3 vs. 661.1, p < 0.01; 173.9 vs. 223.5, p < 0.01). Overall image quality and noise were improved in the DL images (p < 0.01, proportion of score-5 ratings 38% vs. 65%; p < 0.01, 43% vs. 74%), whereas artifacts did not significantly differ (p ≥ 0.07). Sensitivity increased after post-processing from 79% to 86% (p = 0.02), especially for lesions smaller than 5 mm (69% to 78%, p = 0.03), and changes in specificity (p = 0.24) and average false-positive (FP) count (p = 0.18) were not significant. DL image enhancement improves the image quality and diagnostic performance of 3D contrast-enhanced T1-weighted BB MR imaging for the detection of small brain metastases. Question: Can deep learning (DL)-based image enhancement improve the image quality and diagnostic performance of 3D contrast-enhanced T1-weighted black blood (BB) MR imaging for brain metastases? Findings: DL-based image enhancement improved the image quality of thin-slice BB MR images and the sensitivity for brain metastasis, particularly for lesions smaller than 5 mm. Clinical relevance: DL-based image enhancement on BB images may assist in the accurate diagnosis of brain metastasis by achieving better sensitivity while maintaining comparable specificity.
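
The SNR/CNR measurements could look roughly like the following sketch, assuming conventional ROI-based definitions; the paper's exact ROI placement and formulas may differ.

```python
# Assumed ROI-based SNR/CNR definitions; values are synthetic stand-ins.
import numpy as np

def snr(lesion_roi: np.ndarray, background_roi: np.ndarray) -> float:
    # Signal-to-noise ratio: mean lesion signal over background noise std.
    return float(lesion_roi.mean() / background_roi.std())

def cnr(lesion_roi: np.ndarray, tissue_roi: np.ndarray, background_roi: np.ndarray) -> float:
    # Contrast-to-noise ratio: lesion/tissue contrast over background noise std.
    return float(abs(lesion_roi.mean() - tissue_roi.mean()) / background_roi.std())

rng = np.random.default_rng(0)
lesion = rng.normal(900, 40, size=(9, 9))        # enhancing metastasis ROI
white_matter = rng.normal(400, 30, size=(9, 9))  # normal-appearing tissue ROI
air = rng.normal(0, 10, size=(9, 9))             # background noise ROI

print(f"SNR={snr(lesion, air):.1f}  CNR={cnr(lesion, white_matter, air):.1f}")
```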

Synthesized myelin and iron stainings from 7T multi-contrast MRI via deep learning.

Pittayapong S, Hametner S, Bachrata B, Endmayr V, Bogner W, Höftberger R, Grabner G

PubMed, Aug 8 2025
Iron and myelin are key biomarkers for studying neurodegenerative and demyelinating brain diseases. Multi-contrast MRI techniques, such as R2* and QSM, are commonly used for iron assessment, with histology as the reference standard, but non-invasive myelin assessment remains challenging. To address this, we developed a deep learning model to generate iron and myelin staining images from in vivo multi-contrast MRI data, with a resolution comparable to ex vivo histology macro-scans. A cadaver head was scanned using a 7T MR scanner to acquire T1-weighted and multi-echo GRE data for R2* and QSM processing, followed by histological staining for myelin and iron. To evaluate the generalizability of the model, a second cadaver head and two in vivo MRI datasets were included. After MRI-to-histology registration in the training subject, a self-attention generative adversarial network (GAN) was trained to synthesize myelin and iron staining images using various combinations of MRI contrasts. The model achieved optimal myelin prediction when combining T1w, R2*, and QSM images. Incorporating the synthesized myelin images improved the subsequent prediction of iron staining. The generated images displayed fine details similar to those in histology data and demonstrated generalizability across healthy control subjects. Synthesized myelin images clearly differentiated myelin concentration between white and gray matter, while synthesized iron staining presented distinct patterns such as particularly high deposition in deep gray matter. This study shows that deep learning can transform MRI data into histological feature images, offering ex vivo insights from in vivo data and contributing to advancements in brain histology research.
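
A toy sketch of the multi-contrast conditioning idea, with assumed tensor shapes and a stand-in convolutional head rather than the paper's self-attention GAN.

```python
# T1w, R2*, and QSM maps are stacked as input channels; a previously synthesized
# myelin map can be appended when predicting iron staining.
import torch
import torch.nn as nn

t1w, r2star, qsm = (torch.randn(1, 1, 128, 128) for _ in range(3))
myelin_pred = torch.randn(1, 1, 128, 128)        # output of the first (myelin) model

myelin_input = torch.cat([t1w, r2star, qsm], dim=1)             # 3 channels
iron_input = torch.cat([t1w, r2star, qsm, myelin_pred], dim=1)  # 4 channels

# Stand-in generator head; the paper uses a self-attention GAN, not this toy net.
generator = nn.Sequential(
    nn.Conv2d(iron_input.shape[1], 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 1, kernel_size=3, padding=1),  # 1-channel synthetic staining map
)
print(generator(iron_input).shape)  # torch.Size([1, 1, 128, 128])
```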

MM2CT: MR-to-CT translation for multi-modal image fusion with mamba

Chaohui Gong, Zhiying Wu, Zisheng Huang, Gaofeng Meng, Zhen Lei, Hongbin Liu

arXiv preprint, Aug 7 2025
Magnetic resonance (MR)-to-computed tomography (CT) translation offers significant advantages, including the elimination of radiation exposure associated with CT scans and the mitigation of imaging artifacts caused by patient motion. Existing approaches are based on single-modality MR-to-CT translation, and multimodal fusion remains largely unexplored. To address this limitation, we introduce Multi-modal MR to CT (MM2CT), an innovative Mamba-based framework for multi-modal medical image synthesis that leverages multimodal T1- and T2-weighted MRI data. Mamba effectively overcomes the limited local receptive field of CNNs and the high computational complexity of Transformers. MM2CT leverages this advantage to retain long-range dependency modeling while integrating multi-modal MR features. Additionally, we incorporate a dynamic local convolution module and a dynamic enhancement module to improve MRI-to-CT synthesis. Experiments on a public pelvis dataset demonstrate that MM2CT achieves state-of-the-art performance in terms of Structural Similarity Index Measure (SSIM) and Peak Signal-to-Noise Ratio (PSNR). Our code is publicly available at https://github.com/Gots-ch/MM2CT.
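
The multi-modal input path might be organized roughly as below; this is a hedged sketch with assumed shapes, and the actual Mamba blocks and dynamic modules in MM2CT are not reproduced.

```python
# T1- and T2-weighted MR slices are encoded separately, fused, and flattened to
# a token sequence that a state-space (Mamba-style) block would consume.
import torch
import torch.nn as nn

t1 = torch.randn(1, 1, 128, 128)
t2 = torch.randn(1, 1, 128, 128)

encoder_t1 = nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1)
encoder_t2 = nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1)

fused = torch.cat([encoder_t1(t1), encoder_t2(t2)], dim=1)   # (1, 32, 64, 64)

# Flatten spatial positions into a (batch, length, channels) token sequence for
# a long-range sequence model; a Mamba block scans it with linear complexity.
tokens = fused.flatten(2).transpose(1, 2)                    # (1, 4096, 32)
print(tokens.shape)
```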

Generative Artificial Intelligence in Medical Imaging: Foundations, Progress, and Clinical Translation

Xuanru Zhou, Cheng Li, Shuqiang Wang, Ye Li, Tao Tan, Hairong Zheng, Shanshan Wang

arXiv preprint, Aug 7 2025
Generative artificial intelligence (AI) is rapidly transforming medical imaging by enabling capabilities such as data synthesis, image enhancement, modality translation, and spatiotemporal modeling. This review presents a comprehensive and forward-looking synthesis of recent advances in generative modeling, including generative adversarial networks (GANs), variational autoencoders (VAEs), diffusion models, and emerging multimodal foundation architectures, and evaluates their expanding roles across the clinical imaging continuum. We systematically examine how generative AI contributes to key stages of the imaging workflow, from acquisition and reconstruction to cross-modality synthesis, diagnostic support, and treatment planning. Emphasis is placed on both retrospective and prospective clinical scenarios, where generative models help address longstanding challenges such as data scarcity, standardization, and integration across modalities. To promote rigorous benchmarking and translational readiness, we propose a three-tiered evaluation framework encompassing pixel-level fidelity, feature-level realism, and task-level clinical relevance. We also identify critical obstacles to real-world deployment, including generalization under domain shift, hallucination risk, data privacy concerns, and regulatory hurdles. Finally, we explore the convergence of generative AI with large-scale foundation models, highlighting how this synergy may enable the next generation of scalable, reliable, and clinically integrated imaging systems. By charting technical progress and translational pathways, this review aims to guide future research and foster interdisciplinary collaboration at the intersection of AI, medicine, and biomedical engineering.
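
The three-tiered evaluation framework could be operationalized along these lines; the specific metrics chosen here (PSNR, a simplified Fréchet-style feature distance, Dice) are illustrative assumptions, not the review's prescribed implementations.

```python
# Toy illustration of pixel-level, feature-level, and task-level evaluation.
import numpy as np

def psnr(a, b, data_range=1.0):
    mse = np.mean((a - b) ** 2)
    return 10 * np.log10(data_range ** 2 / mse)

def frechet_gaussian(f_real, f_fake):
    # 1D Frechet distance between Gaussian fits of scalar feature embeddings;
    # the full FID uses multivariate statistics from a pretrained network.
    return (f_real.mean() - f_fake.mean()) ** 2 + (f_real.std() - f_fake.std()) ** 2

def dice(pred_mask, true_mask):
    inter = np.logical_and(pred_mask, true_mask).sum()
    return 2 * inter / (pred_mask.sum() + true_mask.sum() + 1e-8)

rng = np.random.default_rng(0)
real = rng.random((64, 64))
fake = np.clip(real + 0.03 * rng.standard_normal((64, 64)), 0, 1)

print(f"pixel PSNR   = {psnr(real, fake):.2f} dB")
print(f"feature dist = {frechet_gaussian(rng.normal(0, 1, 100), rng.normal(0.1, 1.1, 100)):.3f}")
print(f"task Dice    = {dice(real > 0.5, fake > 0.5):.3f}")
```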

MAISI-v2: Accelerated 3D High-Resolution Medical Image Synthesis with Rectified Flow and Region-specific Contrastive Loss

Can Zhao, Pengfei Guo, Dong Yang, Yucheng Tang, Yufan He, Benjamin Simon, Mason Belue, Stephanie Harmon, Baris Turkbey, Daguang Xu

arXiv preprint, Aug 7 2025
Medical image synthesis is an important topic for both clinical and research applications. Recently, diffusion models have become a leading approach in this area. Despite their strengths, many existing methods struggle with (1) limited generalizability, working only for specific body regions or voxel spacings, (2) slow inference, which is a common issue for diffusion models, and (3) weak alignment with input conditions, which is a critical issue for medical imaging. MAISI, a previously proposed framework, addresses the generalizability issue but still suffers from slow inference and limited condition consistency. In this work, we present MAISI-v2, the first accelerated 3D medical image synthesis framework that integrates rectified flow to enable fast, high-quality generation. To further enhance condition fidelity, we introduce a novel region-specific contrastive loss that increases sensitivity to regions of interest. Our experiments show that MAISI-v2 can achieve SOTA image quality with 33× acceleration for the latent diffusion model. We also conducted a downstream segmentation experiment to show that the synthetic images can be used for data augmentation. We release our code, training details, model weights, and a GUI demo to facilitate reproducibility and promote further development within the community.
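
For readers unfamiliar with rectified flow, the sketch below shows a generic flow-matching training step on toy latents; it is not the MAISI-v2 code, and the region-specific contrastive loss is omitted.

```python
# Textbook rectified-flow step: the model learns the constant velocity of the
# straight path between a noise sample and a data sample.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64 + 1, 128), nn.SiLU(), nn.Linear(128, 64))  # toy velocity net

def rectified_flow_step(x1: torch.Tensor) -> torch.Tensor:
    """x1: a batch of (latent) data vectors, shape (B, 64)."""
    x0 = torch.randn_like(x1)                       # noise endpoint
    t = torch.rand(x1.shape[0], 1)                  # uniform time in [0, 1]
    xt = (1 - t) * x0 + t * x1                      # point on the straight path
    v_target = x1 - x0                              # constant target velocity
    v_pred = model(torch.cat([xt, t], dim=1))
    return ((v_pred - v_target) ** 2).mean()        # simple MSE flow-matching loss

loss = rectified_flow_step(torch.randn(8, 64))
loss.backward()
print(float(loss))
```

Because the learned paths are nearly straight, sampling needs far fewer integration steps than standard diffusion, which is where the reported acceleration comes from.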

Clinical information prompt-driven retinal fundus image for brain health evaluation.

Tong N, Hui Y, Gou SP, Chen LX, Wang XH, Chen SH, Li J, Li XS, Wu YT, Wu SL, Wang ZC, Sun J, Lv H

PubMed, Aug 6 2025
Brain volume measurement serves as a critical approach for assessing brain health status. Considering the close biological connection between the eyes and brain, this study aims to investigate the feasibility of estimating brain volume through retinal fundus imaging integrated with clinical metadata, and to offer a cost-effective approach for assessing brain health. Based on clinical information, retinal fundus images, and neuroimaging data derived from a multicenter, population-based cohort study, the KaiLuan Study, we proposed a cross-modal correlation representation (CMCR) network to elucidate the intricate co-degenerative relationships between the eyes and brain for 755 subjects. Specifically, individual clinical information, which has been followed up for as long as 12 years, was encoded as a prompt to enhance the accuracy of brain volume estimation. Independent internal validation and external validation were performed to assess the robustness of the proposed model. Root mean square error (RMSE), peak signal-to-noise ratio (PSNR), and structural similarity index measure (SSIM) metrics were employed to quantitatively evaluate the quality of synthetic brain images derived from retinal imaging data. The proposed framework yielded average RMSE, PSNR, and SSIM values of 98.23, 35.78 dB, and 0.64, respectively, which significantly outperformed 5 other methods: multi-channel Variational Autoencoder (mcVAE), Pixel-to-Pixel (Pixel2pixel), transformer-based U-Net (TransUNet), multi-scale transformer network (MT-Net), and residual vision transformer (ResViT). The two-dimensional (2D) and three-dimensional (3D) visualization results showed that the shape and texture of the synthetic brain images generated by the proposed method most closely resembled those of actual brain images. Thus, the CMCR framework accurately captured the latent structural correlations between the fundus and the brain. The average difference between predicted and actual brain volumes was 61.36 cm³, with a relative error of 4.54%. When all of the clinical information (including age and sex, daily habits, cardiovascular factors, metabolic factors, and inflammatory factors) was encoded, the difference decreased to 53.89 cm³, with a relative error of 3.98%. Based on the synthesized brain MR images from retinal fundus images, the volumes of brain tissues could be estimated with high accuracy. This study provides an innovative, accurate, and cost-effective approach to characterize brain health status through readily accessible retinal fundus images. NCT05453877 (https://clinicaltrials.gov/).
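
The reported volume-agreement figures correspond to metrics along the lines of this sketch, with hypothetical volumes and assumed definitions of the absolute difference and relative error.

```python
# Mean absolute volume difference and mean relative error across subjects.
import numpy as np

true_vol = np.array([1180.0, 1320.5, 1250.3, 1405.2])   # hypothetical brain volumes, cm^3
pred_vol = np.array([1125.4, 1370.1, 1198.9, 1460.0])   # volumes from synthesized MR

abs_diff = np.abs(pred_vol - true_vol)
rel_error = abs_diff / true_vol

print(f"mean difference = {abs_diff.mean():.2f} cm^3, "
      f"mean relative error = {100 * rel_error.mean():.2f}%")
```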

Controllable Mask Diffusion Model for medical annotation synthesis with semantic information extraction.

Heo C, Jung J

PubMed, Aug 5 2025
Medical segmentation, a prominent task in medical image analysis utilizing artificial intelligence, plays a crucial role in computer-aided diagnosis and depends heavily on the quality of the training data. However, the availability of sufficient data is constrained by strict privacy regulations associated with medical data. To mitigate this issue, research on data augmentation has gained significant attention. Medical segmentation tasks require paired datasets consisting of medical images and annotation images, also known as mask images, which represent lesion areas or radiological information within the medical images. Consequently, it is essential to apply data augmentation to both image types. This study proposes a Controllable Mask Diffusion Model, a novel approach capable of controlling and generating new masks. This model leverages the binary structure of the mask to extract semantic information, namely, the mask's size, location, and count, which is then applied as multi-conditional input to a diffusion model via a regressor. Through the regressor, newly generated masks conform to the input semantic information, thereby enabling input-driven controllable generation. Additionally, a technique that analyzes correlation within semantic information was devised for large-scale data synthesis. The generative capacity of the proposed model was evaluated against real datasets, and the model's ability to control and generate new masks based on previously unseen semantic information was confirmed. Furthermore, the practical applicability of the model was demonstrated by augmenting the data with the generated data, applying it to segmentation tasks, and comparing the performance with and without augmentation. Additionally, experiments were conducted on single-label and multi-label masks, yielding superior results for both types. This demonstrates the potential applicability of this study to various areas within the medical field.
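
Extracting the mask-level semantic attributes (size, location, count) that condition the diffusion model could look like the following sketch; the conditioning regressor itself is not shown, and the lesion masks are toy examples.

```python
# Derive count, per-lesion size, and centroid location from a binary mask.
import numpy as np
from scipy import ndimage

mask = np.zeros((64, 64), dtype=np.uint8)
mask[10:20, 10:22] = 1       # toy lesion 1
mask[40:48, 30:36] = 1       # toy lesion 2

labeled, count = ndimage.label(mask)                 # connected-component count
labels = list(range(1, count + 1))
sizes = ndimage.sum_labels(mask, labeled, index=labels)      # pixels per lesion
centroids = ndimage.center_of_mass(mask, labeled, index=labels)

semantic = {"count": int(count),
            "sizes": sizes.tolist(),
            "locations": [tuple(round(c, 1) for c in ctr) for ctr in centroids]}
print(semantic)   # conditioning information fed to the diffusion model's regressor
```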