Sort by:
Page 10 of 19185 results

Multi-modal MRI synthesis with conditional latent diffusion models for data augmentation in tumor segmentation.

Kebaili A, Lapuyade-Lahorgue J, Vera P, Ruan S

pubmed logopapersJul 1 2025
Multimodality is often necessary for improving object segmentation tasks, especially in the case of multilabel tasks, such as tumor segmentation, which is crucial for clinical diagnosis and treatment planning. However, a major challenge in utilizing multimodality with deep learning remains: the limited availability of annotated training data, primarily due to the time-consuming acquisition process and the necessity for expert annotations. Although deep learning has significantly advanced many tasks in medical imaging, conventional augmentation techniques are often insufficient due to the inherent complexity of volumetric medical data. To address this problem, we propose an innovative slice-based latent diffusion architecture for the generation of 3D multi-modal images and their corresponding multi-label masks. Our approach enables the simultaneous generation of the image and mask in a slice-by-slice fashion, leveraging a positional encoding and a Latent Aggregation module to maintain spatial coherence and capture slice sequentiality. This method effectively reduces the computational complexity and memory demands typically associated with diffusion models. Additionally, we condition our architecture on tumor characteristics to generate a diverse array of tumor variations and enhance texture using a refining module that acts like a super-resolution mechanism, mitigating the inherent blurriness caused by data scarcity in the autoencoder. We evaluate the effectiveness of our synthesized volumes using the BRATS2021 dataset to segment the tumor with three tissue labels and compare them with other state-of-the-art diffusion models through a downstream segmentation task, demonstrating the superior performance and efficiency of our method. While our primary application is tumor segmentation, this method can be readily adapted to other modalities. Code is available here : https://github.com/Arksyd96/multi-modal-mri-and-mask-synthesis-with-conditional-slice-based-ldm.

Diffusion-driven multi-modality medical image fusion.

Qu J, Huang D, Shi Y, Liu J, Tang W

pubmed logopapersJul 1 2025
Multi-modality medical image fusion (MMIF) technology utilizes the complementarity of different modalities to provide more comprehensive diagnostic insights for clinical practice. Existing deep learning-based methods often focus on extracting the primary information from individual modalities while ignoring the correlation of information distribution across different modalities, which leads to insufficient fusion of image details and color information. To address this problem, a diffusion-driven MMIF method is proposed to leverage the information distribution relationship among multi-modality images in the latent space. To better preserve the complementary information from different modalities, a local and global network (LAGN) is suggested. Additionally, a loss strategy is designed to establish robust constraints among diffusion-generated images, original images, and fused images. This strategy supervises the training process and prevents information loss in fused images. The experimental results demonstrate that the proposed method surpasses state-of-the-art image fusion methods in terms of unsupervised metrics on three datasets: MRI/CT, MRI/PET, and MRI/SPECT images. The proposed method successfully captures rich details and color information. Furthermore, 16 doctors and medical students were invited to evaluate the effectiveness of our method in assisting clinical diagnosis and treatment.

Automated quantification of brain PET in PET/CT using deep learning-based CT-to-MR translation: a feasibility study.

Kim D, Choo K, Lee S, Kang S, Yun M, Yang J

pubmed logopapersJul 1 2025
Quantitative analysis of PET images in brain PET/CT relies on MRI-derived regions of interest (ROIs). However, the pairs of PET/CT and MR images are not always available, and their alignment is challenging if their acquisition times differ considerably. To address these problems, this study proposes a deep learning framework for translating CT of PET/CT to synthetic MR images (MR<sub>SYN</sub>) and performing automated quantitative regional analysis using MR<sub>SYN</sub>-derived segmentation. In this retrospective study, 139 subjects who underwent brain [<sup>18</sup>F]FBB PET/CT and T1-weighted MRI were included. A U-Net-like model was trained to translate CT images to MR<sub>SYN</sub>; subsequently, a separate model was trained to segment MR<sub>SYN</sub> into 95 regions. Regional and composite standardised uptake value ratio (SUVr) was calculated in [<sup>18</sup>F]FBB PET images using the acquired ROIs. For evaluation of MR<sub>SYN</sub>, quantitative measurements including structural similarity index measure (SSIM) were employed, while for MR<sub>SYN</sub>-based segmentation evaluation, Dice similarity coefficient (DSC) was calculated. Wilcoxon signed-rank test was performed for SUVrs computed using MR<sub>SYN</sub> and ground-truth MR (MR<sub>GT</sub>). Compared to MR<sub>GT</sub>, the mean SSIM of MR<sub>SYN</sub> was 0.974 ± 0.005. The MR<sub>SYN</sub>-based segmentation achieved a mean DSC of 0.733 across 95 regions. No statistical significance (P > 0.05) was found for SUVr between the ROIs from MR<sub>SYN</sub> and those from MR<sub>GT</sub>, excluding the precuneus. We demonstrated a deep learning framework for automated regional brain analysis in PET/CT with MR<sub>SYN</sub>. Our proposed framework can benefit patients who have difficulties in performing an MRI scan.

Generation of synthetic CT-like imaging of the spine from biplanar radiographs: comparison of different deep learning architectures.

Bottini M, Zanier O, Da Mutten R, Gandia-Gonzalez ML, Edström E, Elmi-Terander A, Regli L, Serra C, Staartjes VE

pubmed logopapersJul 1 2025
This study compared two deep learning architectures-generative adversarial networks (GANs) and convolutional neural networks combined with implicit neural representations (CNN-INRs)-for generating synthetic CT (sCT) images of the spine from biplanar radiographs. The aim of the study was to identify the most robust and clinically viable approach for this potential intraoperative imaging technique. A spine CT dataset of 216 training and 54 validation cases was used. Digitally reconstructed radiographs (DRRs) served as 2D inputs for training both models under identical conditions for 170 epochs. Evaluation metrics included the Structural Similarity Index Measure (SSIM), peak signal-to-noise ratio (PSNR), and cosine similarity (CS), complemented by qualitative assessments of anatomical fidelity. The GAN model achieved a mean SSIM of 0.932 ± 0.015, PSNR of 19.85 ± 1.40 dB, and CS of 0.671 ± 0.177. The CNN-INR model demonstrated a mean SSIM of 0.921 ± 0.015, PSNR of 21.96 ± 1.20 dB, and CS of 0.707 ± 0.114. Statistical analysis revealed significant differences for SSIM (p = 0.001) and PSNR (p < 0.001), while CS differences were not statistically significant (p = 0.667). Qualitative evaluations consistently favored the GAN model, which produced more anatomically detailed and visually realistic sCT images. This study demonstrated the feasibility of generating spine sCT images from biplanar radiographs using GAN and CNN-INR models. While neither model achieved clinical-grade outputs, the GAN architecture showed greater potential for generating anatomically accurate and visually realistic images. These findings highlight the promise of sCT image generation from biplanar radiographs as an innovative approach to reducing radiation exposure and improving imaging accessibility, with GANs emerging as the more promising avenue for further research and clinical integration.

Evaluation of MRI-based synthetic CT for lumbar degenerative disease: a comparison with CT.

Jiang Z, Zhu Y, Wang W, Li Z, Li Y, Zhang M

pubmed logopapersJul 1 2025
Patients with lumbar degenerative disease typically undergo preoperative MRI combined with CT scans, but this approach introduces additional ionizing radiation and examination costs. To compare the effectiveness of MRI-based synthetic CT (sCT) in displaying lumbar degenerative changes, using CT as the gold standard. This prospective study was conducted between June 2021 and September 2023. Adult patients suspected of lumbar degenerative disease were enrolled and underwent both lumbar MRI and CT scans on the same day. The MRI images were processed using a deep learning-based image synthesis method (BoneMRI) to generate sCT images. Two radiologists independently assessed and measured the display and length of osteophytes, the presence of annular calcifications, and the CT values (HU) of L1 vertebrae on both sCT and CT images. The consistency between CT and sCT in terms of imaging results was evaluated using equivalence statistical tests. The display performance of sCT images generated from MRI scans by different manufacturers and field strengths was also compared. A total of 105 participants were included (54 males and 51 females, aged 19-95 years). sCT demonstrated statistical equivalence to CT in displaying osteophytes and annular calcifications but showed poorer performance in detecting osteoporosis. The display effectiveness of sCT images synthesized from MRI scans obtained using different imaging equipment was consistent. sCT demonstrated comparable effectiveness to CT in geometric measurements of lumbar degenerative changes. However, sCT cannot independently detect osteoporosis. When combined with conventional MRI's soft tissue information, sCT offers a promising possibility for radiation-free diagnosis and preoperative planning.

Unsupervised Cardiac Video Translation Via Motion Feature Guided Diffusion Model

Swakshar Deb, Nian Wu, Frederick H. Epstein, Miaomiao Zhang

arxiv logopreprintJul 1 2025
This paper presents a novel motion feature guided diffusion model for unpaired video-to-video translation (MFD-V2V), designed to synthesize dynamic, high-contrast cine cardiac magnetic resonance (CMR) from lower-contrast, artifact-prone displacement encoding with stimulated echoes (DENSE) CMR sequences. To achieve this, we first introduce a Latent Temporal Multi-Attention (LTMA) registration network that effectively learns more accurate and consistent cardiac motions from cine CMR image videos. A multi-level motion feature guided diffusion model, equipped with a specialized Spatio-Temporal Motion Encoder (STME) to extract fine-grained motion conditioning, is then developed to improve synthesis quality and fidelity. We evaluate our method, MFD-V2V, on a comprehensive cardiac dataset, demonstrating superior performance over the state-of-the-art in both quantitative metrics and qualitative assessments. Furthermore, we show the benefits of our synthesized cine CMRs improving downstream clinical and analytical tasks, underscoring the broader impact of our approach. Our code is publicly available at https://github.com/SwaksharDeb/MFD-V2V.

A lung structure and function information-guided residual diffusion model for predicting idiopathic pulmonary fibrosis progression.

Jiang C, Xing X, Nan Y, Fang Y, Zhang S, Walsh S, Yang G, Shen D

pubmed logopapersJul 1 2025
Idiopathic Pulmonary Fibrosis (IPF) is a progressive lung disease that continuously scars and thickens lung tissue, leading to respiratory difficulties. Timely assessment of IPF progression is essential for developing treatment plans and improving patient survival rates. However, current clinical standards require multiple (usually two) CT scans at certain intervals to assess disease progression. This presents a dilemma: the disease progression is identified only after the disease has already progressed. To address this issue, a feasible solution is to generate the follow-up CT image from the patient's initial CT image to achieve early prediction of IPF. To this end, we propose a lung structure and function information-guided residual diffusion model. The key components of our model include (1) using a 2.5D generation strategy to reduce computational cost of generating 3D images with the diffusion model; (2) designing structural attention to mitigate negative impact of spatial misalignment between the two CT images on generation performance; (3) employing residual diffusion to accelerate model training and inference while focusing more on differences between the two CT images (i.e., the lesion areas); and (4) developing a CLIP-based text extraction module to extract lung function test information and further using such extracted information to guide the generation. Extensive experiments demonstrate that our method can effectively predict IPF progression and achieve superior generation performance compared to state-of-the-art methods.

In-silico CT simulations of deep learning generated heterogeneous phantoms.

Salinas CS, Magudia K, Sangal A, Ren L, Segars PW

pubmed logopapersJun 30 2025
Current virtual imaging phantoms primarily emphasize geometric&#xD;accuracy of anatomical structures. However, to enhance realism, it is also important&#xD;to incorporate intra-organ detail. Because biological tissues are heterogeneous in&#xD;composition, virtual phantoms should reflect this by including realistic intra-organ&#xD;texture and material variation.&#xD;We propose training two 3D Double U-Net conditional generative adversarial&#xD;networks (3D DUC-GAN) to generate sixteen unique textures that encompass organs&#xD;found within the torso. The model was trained on 378 CT image-segmentation&#xD;pairs taken from a publicly available dataset with 18 additional pairs reserved for&#xD;testing. Textured phantoms were generated and imaged using DukeSim, a virtual CT&#xD;simulation platform.&#xD;Results showed that the deep learning model was able to synthesize realistic&#xD;heterogeneous phantoms from a set of homogeneous phantoms. These phantoms were&#xD;compared with original CT scans and had a mean absolute difference of 46.15 ± 1.06&#xD;HU. The structural similarity index (SSIM) and peak signal-to-noise ratio (PSNR)&#xD;were 0.86 ± 0.004 and 28.62 ± 0.14, respectively. The maximum mean discrepancy&#xD;between the generated and actual distribution was 0.0016. These metrics marked&#xD;an improvement of 27%, 5.9%, 6.2%, and 28% respectively, compared to current&#xD;homogeneous texture methods. The generated phantoms that underwent a virtual&#xD;CT scan had a closer visual resemblance to the true CT scan compared to the previous&#xD;method.&#xD;The resulting heterogeneous phantoms offer a significant step toward more realistic&#xD;in silico trials, enabling enhanced simulation of imaging procedures with greater fidelity&#xD;to true anatomical variation.

VAP-Diffusion: Enriching Descriptions with MLLMs for Enhanced Medical Image Generation

Peng Huang, Junhu Fu, Bowen Guo, Zeju Li, Yuanyuan Wang, Yi Guo

arxiv logopreprintJun 30 2025
As the appearance of medical images is influenced by multiple underlying factors, generative models require rich attribute information beyond labels to produce realistic and diverse images. For instance, generating an image of skin lesion with specific patterns demands descriptions that go beyond diagnosis, such as shape, size, texture, and color. However, such detailed descriptions are not always accessible. To address this, we explore a framework, termed Visual Attribute Prompts (VAP)-Diffusion, to leverage external knowledge from pre-trained Multi-modal Large Language Models (MLLMs) to improve the quality and diversity of medical image generation. First, to derive descriptions from MLLMs without hallucination, we design a series of prompts following Chain-of-Thoughts for common medical imaging tasks, including dermatologic, colorectal, and chest X-ray images. Generated descriptions are utilized during training and stored across different categories. During testing, descriptions are randomly retrieved from the corresponding category for inference. Moreover, to make the generator robust to unseen combination of descriptions at the test time, we propose a Prototype Condition Mechanism that restricts test embeddings to be similar to those from training. Experiments on three common types of medical imaging across four datasets verify the effectiveness of VAP-Diffusion.

Towards 3D Semantic Image Synthesis for Medical Imaging

Wenwu Tang, Khaled Seyam, Bin Yang

arxiv logopreprintJun 30 2025
In the medical domain, acquiring large datasets is challenging due to both accessibility issues and stringent privacy regulations. Consequently, data availability and privacy protection are major obstacles to applying machine learning in medical imaging. To address this, our study proposes the Med-LSDM (Latent Semantic Diffusion Model), which operates directly in the 3D domain and leverages de-identified semantic maps to generate synthetic data as a method of privacy preservation and data augmentation. Unlike many existing methods that focus on generating 2D slices, Med-LSDM is designed specifically for 3D semantic image synthesis, making it well-suited for applications requiring full volumetric data. Med-LSDM incorporates a guiding mechanism that controls the 3D image generation process by applying a diffusion model within the latent space of a pre-trained VQ-GAN. By operating in the compressed latent space, the model significantly reduces computational complexity while still preserving critical 3D spatial details. Our approach demonstrates strong performance in 3D semantic medical image synthesis, achieving a 3D-FID score of 0.0054 on the conditional Duke Breast dataset and similar Dice scores (0.70964) to those of real images (0.71496). These results demonstrate that the synthetic data from our model have a small domain gap with real data and are useful for data augmentation.
Page 10 of 19185 results
Show
per page

Ready to Sharpen Your Edge?

Join hundreds of your peers who rely on RadAI Slice. Get the essential weekly briefing that empowers you to navigate the future of radiology.

We respect your privacy. Unsubscribe at any time.