Page 13 of 19185 results

Anatomy-Grounded Weakly Supervised Prompt Tuning for Chest X-ray Latent Diffusion Models

Konstantinos Vilouras, Ilias Stogiannidis, Junyu Yan, Alison Q. O'Neil, Sotirios A. Tsaftaris

arXiv preprint, Jun 12 2025
Latent Diffusion Models have shown remarkable results in text-guided image synthesis in recent years. In the domain of natural (RGB) images, recent works have shown that such models can be adapted to various vision-language downstream tasks with little to no supervision involved. In contrast, text-to-image Latent Diffusion Models remain relatively underexplored in the field of medical imaging, primarily due to limited data availability (e.g., due to privacy concerns). In this work, focusing on the chest X-ray modality, we first demonstrate that a standard text-conditioned Latent Diffusion Model has not learned to align clinically relevant information in free-text radiology reports with the corresponding areas of the given scan. Then, to alleviate this issue, we propose a fine-tuning framework to improve multi-modal alignment in a pre-trained model such that it can be efficiently repurposed for downstream tasks such as phrase grounding. Our method sets a new state-of-the-art on a standard benchmark dataset (MS-CXR), while also exhibiting robust performance on out-of-distribution data (VinDr-CXR). Our code will be made publicly available.

Score-based Generative Diffusion Models to Synthesize Full-dose FDG Brain PET from MRI in Epilepsy Patients

Jiaqi Wu, Jiahong Ouyang, Farshad Moradi, Mohammad Mehdi Khalighi, Greg Zaharchuk

arXiv preprint, Jun 12 2025
Fluorodeoxyglucose (FDG) PET to evaluate patients with epilepsy is one of the most common applications for simultaneous PET/MRI, given the need to image both brain structure and metabolism, but is suboptimal due to the radiation dose in this young population. Little work has been done synthesizing diagnostic quality PET images from MRI data or MRI data with ultralow-dose PET using advanced generative AI methods, such as diffusion models, with attention to clinical evaluations tailored for the epilepsy population. Here we compared the performance of diffusion- and non-diffusion-based deep learning models for the MRI-to-PET image translation task for epilepsy imaging using simultaneous PET/MRI in 52 subjects (40 train/2 validate/10 hold-out test). We tested three different models: 2 score-based generative diffusion models (SGM-Karras Diffusion [SGM-KD] and SGM-variance preserving [SGM-VP]) and a Transformer-Unet. We report results on standard image processing metrics as well as clinically relevant metrics, including congruency measures (Congruence Index and Congruency Mean Absolute Error) that assess hemispheric metabolic asymmetry, which is a key part of the clinical analysis of these images. The SGM-KD produced the best qualitative and quantitative results when synthesizing PET purely from T1w and T2 FLAIR images with the least mean absolute error in whole-brain specific uptake value ratio (SUVR) and highest intraclass correlation coefficient. When 1% low-dose PET images are included in the inputs, all models improve significantly and are interchangeable for quantitative performance and visual quality. In summary, SGMs hold great potential for pure MRI-to-PET translation, while all 3 model types can synthesize full-dose FDG-PET accurately using MRI and ultralow-dose PET.

Simulation-free workflow for lattice radiation therapy using deep learning predicted synthetic computed tomography: A feasibility study.

Zhu L, Yu NY, Ahmed SK, Ashman JB, Toesca DS, Grams MP, Deufel CL, Duan J, Chen Q, Rong Y

PubMed paper, Jun 12 2025
Lattice radiation therapy (LRT) is a form of spatially fractionated radiation therapy that allows increased total dose delivery aiming for improved treatment response without an increase in toxicities, commonly utilized for palliation of bulky tumors. The LRT treatment planning process is complex, while eligible patients often have an urgent need for expedited treatment start. In this study, we aimed to develop a simulation-free workflow for volumetric modulated arc therapy (VMAT)-based LRT planning via deep learning-predicted synthetic CT (sCT) to expedite treatment initiation. Two deep learning models were initially trained using a 3D U-Net architecture to generate sCT from diagnostic CTs (dCT) of the thoracic and abdominal regions using a training dataset of 50 patients. The models were then tested on an independent dataset of 15 patients using image similarity analysis assessing mean absolute error (MAE) and structural similarity index measure (SSIM) as metrics. VMAT-based LRT plans were generated based on sCT and recalculated on the planning CT (pCT) for dosimetric accuracy comparison. Differences in dose volume histogram (DVH) metrics between pCT and sCT plans were assessed using the Wilcoxon signed-rank test. The final sCT prediction model demonstrated high image similarity to pCT, with a MAE and SSIM of 38.93 ± 14.79 Hounsfield Units (HU) and 0.92 ± 0.05 for the thoracic region, and 73.60 ± 22.90 HU and 0.90 ± 0.03 for the abdominal region, respectively. There were no statistically significant differences between sCT and pCT plans in terms of organ-at-risk and target volume DVH parameters, including maximum dose (Dmax), mean dose (Dmean), dose delivered to 90% (D90%) and 50% (D50%) of target volume, except for minimum dose (Dmin) and D10%.
With demonstrated high image similarity and adequate dose agreement between sCT and pCT, our study is a proof-of-concept for using deep learning predicted sCT for a simulation-free treatment planning workflow for VMAT-based LRT.
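
For readers unfamiliar with the DVH parameters named in this abstract, the sketch below shows one common way to read them off a list of voxel doses. It is an illustration only, not the authors' pipeline: the function names, the equal-voxel-volume assumption, and the dose values are all hypothetical.

```python
import math


def dose_at_volume(voxel_doses, volume_percent):
    """Return Dx%: the minimum dose received by the hottest x% of voxels.

    Assumes every voxel has the same volume (an assumption of this sketch).
    """
    if not voxel_doses:
        raise ValueError("voxel_doses must not be empty")
    if not 0 < volume_percent <= 100:
        raise ValueError("volume_percent must be in (0, 100]")
    ranked = sorted(voxel_doses, reverse=True)  # hottest voxel first
    # Number of voxels that make up x% of the structure (at least one).
    k = max(1, math.ceil(volume_percent * len(ranked) / 100))
    return ranked[k - 1]


def dvh_summary(voxel_doses):
    """Collect the DVH parameters mentioned in the abstract for one structure."""
    return {
        "Dmax": max(voxel_doses),
        "Dmean": sum(voxel_doses) / len(voxel_doses),
        "Dmin": min(voxel_doses),
        "D90%": dose_at_volume(voxel_doses, 90),
        "D50%": dose_at_volume(voxel_doses, 50),
        "D10%": dose_at_volume(voxel_doses, 10),
    }


# Made-up target doses in Gy, purely for demonstration.
target_doses_gy = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
summary = dvh_summary(target_doses_gy)
```

Computing such a summary once on the pCT dose grid and once on the sCT dose grid gives the paired values that a test like the Wilcoxon signed-rank test would compare across patients.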

Prompt-Guided Latent Diffusion with Predictive Class Conditioning for 3D Prostate MRI Generation

Emerson P. Grabke, Masoom A. Haider, Babak Taati

arXiv preprint, Jun 11 2025
Latent diffusion models (LDM) could alleviate data scarcity challenges affecting machine learning development for medical imaging. However, medical LDM training typically relies on performance- or scientific accessibility-limiting strategies including a reliance on short-prompt text encoders, the reuse of non-medical LDMs, or a requirement for fine-tuning with large data volumes. We propose a Class-Conditioned Efficient Large Language model Adapter (CCELLA) to address these limitations. CCELLA is a novel dual-head conditioning approach that simultaneously conditions the LDM U-Net with non-medical large language model-encoded text features through cross-attention and with pathology classification through the timestep embedding. We also propose a joint loss function and a data-efficient LDM training framework. In combination, these strategies enable pathology-conditioned LDM training for high-quality medical image synthesis given limited data volume and human data annotation, improving LDM performance and scientific accessibility. Our method achieves a 3D FID score of 0.025 on a size-limited prostate MRI dataset, significantly outperforming a recent foundation model with FID 0.071. When training a classifier for prostate cancer prediction, adding synthetic images generated by our method to the training dataset improves classifier accuracy from 69% to 74%. Training a classifier solely on our method's synthetic images achieved comparable performance to training on real images alone.

CINeMA: Conditional Implicit Neural Multi-Modal Atlas for a Spatio-Temporal Representation of the Perinatal Brain

Maik Dannecker, Vasiliki Sideri-Lampretsa, Sophie Starck, Angeline Mihailov, Mathieu Milh, Nadine Girard, Guillaume Auzias, Daniel Rueckert

arXiv preprint, Jun 11 2025
Magnetic resonance imaging of fetal and neonatal brains reveals rapid neurodevelopment marked by substantial anatomical changes unfolding within days. Studying this critical stage of the developing human brain, therefore, requires accurate brain models, referred to as atlases, of high spatial and temporal resolution. To meet these demands, established traditional atlases and recently proposed deep learning-based methods rely on large and comprehensive datasets. This poses a major challenge for studying brains in the presence of pathologies for which data remains scarce. We address this limitation with CINeMA (Conditional Implicit Neural Multi-Modal Atlas), a novel framework for creating high-resolution, spatio-temporal, multimodal brain atlases, suitable for low-data settings. Unlike established methods, CINeMA operates in latent space, avoiding compute-intensive image registration and reducing atlas construction times from days to minutes. Furthermore, it enables flexible conditioning on anatomical features including gestational age (GA), birth age, and pathologies like ventriculomegaly (VM) and agenesis of the corpus callosum (ACC). CINeMA supports downstream tasks such as tissue segmentation and age prediction, while its generative properties enable synthetic data creation and anatomically informed data augmentation. Surpassing state-of-the-art methods in accuracy, efficiency, and versatility, CINeMA represents a powerful tool for advancing brain research. We release the code and atlases at https://github.com/m-dannecker/CINeMA.

MAMBO: High-Resolution Generative Approach for Mammography Images

Milica Škipina, Nikola Jovišić, Nicola Dall'Asen, Vanja Švenda, Anil Osman Tur, Slobodan Ilić, Elisa Ricci, Dubravko Ćulibrk

arXiv preprint, Jun 10 2025
Mammography is the gold standard for the detection and diagnosis of breast cancer. This procedure can be significantly enhanced with Artificial Intelligence (AI)-based software, which assists radiologists in identifying abnormalities. However, training AI systems requires large and diverse datasets, which are often difficult to obtain due to privacy and ethical constraints. To address this issue, the paper introduces MAMmography ensemBle mOdel (MAMBO), a novel patch-based diffusion approach designed to generate full-resolution mammograms. Diffusion models have shown breakthrough results in realistic image generation, yet few studies have focused on mammograms, and none have successfully generated high-resolution outputs required to capture fine-grained features of small lesions. To achieve this, MAMBO integrates separate diffusion models to capture both local and global (image-level) contexts. The contextual information is then fed into the final patch-based model, significantly aiding the noise removal process. This thoughtful design enables MAMBO to generate highly realistic mammograms of up to 3840x3840 pixels. Importantly, this approach can be used to enhance the training of classification models and extended to anomaly detection. Experiments, both numerical and radiologist validation, assess MAMBO's capabilities in image generation, super-resolution, and anomaly detection, highlighting its potential to enhance mammography analysis for more accurate diagnoses and earlier lesion detection.

A Privacy-Preserving Federated Learning Framework for Generalizable CBCT to Synthetic CT Translation in Head and Neck

Ciro Benito Raggio, Paolo Zaffino, Maria Francesca Spadea

arXiv preprint, Jun 10 2025
Shortened abstract: Cone-beam computed tomography (CBCT) has become a widely adopted modality for image-guided radiotherapy (IGRT). However, CBCT suffers from increased noise, limited soft-tissue contrast, and artifacts, resulting in unreliable Hounsfield unit values and hindering direct dose calculation. Synthetic CT (sCT) generation from CBCT addresses these issues, especially using deep learning (DL) methods. Existing approaches are limited by institutional heterogeneity, scanner-dependent variations, and data privacy regulations that prevent multi-center data sharing. To overcome these challenges, we propose a cross-silo horizontal federated learning (FL) approach for CBCT-to-sCT synthesis in the head and neck region, extending our FedSynthCT framework. A conditional generative adversarial network was collaboratively trained on data from three European medical centers in the public SynthRAD2025 challenge dataset. The federated model demonstrated effective generalization across centers, with mean absolute error (MAE) ranging from 64.38 ± 13.63 to 85.90 ± 7.10 HU, structural similarity index (SSIM) from 0.882 ± 0.022 to 0.922 ± 0.039, and peak signal-to-noise ratio (PSNR) from 32.86 ± 0.94 to 34.91 ± 1.04 dB. Notably, on an external validation dataset of 60 patients, comparable performance was achieved (MAE: 75.22 ± 11.81 HU, SSIM: 0.904 ± 0.034, PSNR: 33.52 ± 2.06 dB) without additional training, confirming robust generalization despite protocol and scanner differences and registration errors. These findings demonstrate the technical feasibility of FL for CBCT-to-sCT synthesis while preserving data privacy and offer a collaborative solution for developing generalizable models across institutions without centralized data sharing or site-specific fine-tuning.
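
The abstract does not say which aggregation rule the federated framework uses. As a rough illustration of how cross-silo training can combine models without sharing images, the sketch below applies a sample-size-weighted average in the style of FedAvg, which is an assumption here and not a description of FedSynthCT. The function name, the flat parameter lists, and all numbers are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Sample-size-weighted average of per-client parameter vectors (FedAvg-style).

    client_weights: one flat list of floats per center, all of equal length.
    client_sizes:   number of local training samples at each center.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    n_params = len(client_weights[0])
    averaged = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != n_params:
            raise ValueError("all clients must share the same parameter layout")
        for i, w in enumerate(weights):
            # Centers with more data contribute proportionally more.
            averaged[i] += w * size / total
    return averaged


# Three hypothetical centers; only parameters leave each site, never images.
center_updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
center_sizes = [100, 100, 200]
global_weights = federated_average(center_updates, center_sizes)
```

In a full training loop the server would send `global_weights` back to every center, each center would train locally for a few epochs, and the averaging step would repeat.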

APTOS-2024 challenge report: Generation of synthetic 3D OCT images from fundus photographs

Bowen Liu, Weiyi Zhang, Peranut Chotcomwongse, Xiaolan Chen, Ruoyu Chen, Pawin Pakaymaskul, Niracha Arjkongharn, Nattaporn Vongsa, Xuelian Cheng, Zongyuan Ge, Kun Huang, Xiaohui Li, Yiru Duan, Zhenbang Wang, BaoYe Xie, Qiang Chen, Huazhu Fu, Michael A. Mahr, Jiaqi Qu, Wangyiyang Chen, Shiye Wang, Yubo Tan, Yongjie Li, Mingguang He, Danli Shi, Paisan Ruamviboonsuk

arXiv preprint, Jun 9 2025
Optical Coherence Tomography (OCT) provides high-resolution, 3D, and non-invasive visualization of retinal layers in vivo, serving as a critical tool for lesion localization and disease diagnosis. However, its widespread adoption is limited by equipment costs and the need for specialized operators. In comparison, 2D color fundus photography offers faster acquisition and greater accessibility with less dependence on expensive devices. Although generative artificial intelligence has demonstrated promising results in medical image synthesis, translating 2D fundus images into 3D OCT images presents unique challenges due to inherent differences in data dimensionality and biological information between modalities. To advance generative models in the fundus-to-3D-OCT setting, the Asia Pacific Tele-Ophthalmology Society (APTOS-2024) organized a challenge titled Artificial Intelligence-based OCT Generation from Fundus Images. This paper details the challenge framework (referred to as APTOS-2024 Challenge), including the benchmark dataset; the evaluation methodology, featuring two fidelity metrics, image-based distance (pixel-level OCT B-scan similarity) and video-based distance (semantic-level volumetric consistency); and an analysis of top-performing solutions. The challenge attracted 342 participating teams, with 42 preliminary submissions and 9 finalists. Leading methodologies incorporated innovations in hybrid data preprocessing or augmentation (cross-modality collaborative paradigms), pre-training on external ophthalmic imaging datasets, integration of vision foundation models, and model architecture improvement. The APTOS-2024 Challenge is the first benchmark demonstrating the feasibility of fundus-to-3D-OCT synthesis as a potential solution for improving ophthalmic care accessibility in under-resourced healthcare settings, while helping to expedite medical research and clinical applications.

optiGAN: A Deep Learning-Based Alternative to Optical Photon Tracking in Python-Based GATE (10+).

Mummaneni G, Trigila C, Krah N, Sarrut D, Roncali E

PubMed paper, Jun 9 2025
Objective: To accelerate optical photon transport simulations in the GATE medical physics framework using a Generative Adversarial Network (GAN), while ensuring high modeling accuracy. Traditionally, detailed optical Monte Carlo methods have been the gold standard for modeling photon interactions in detectors, but their high computational cost remains a challenge. This study explores the integration of optiGAN, a GAN model, into GATE 10, the new Python-based version of the GATE medical physics simulation framework released in November 2024.
Approach: The goal of optiGAN is to accelerate optical photon transport simulations while maintaining modelling accuracy. The optiGAN model, based on a GAN architecture, was integrated into GATE 10 as a computationally efficient alternative to traditional optical Monte Carlo simulations. To ensure consistency, optical photon transport modules were implemented in GATE 10 and validated against GATE v9.3 under identical simulation conditions. Subsequently, simulations using full Monte Carlo tracking in GATE 10 were compared to those using GATE 10-optiGAN.
Main results: Validation studies confirmed that GATE 10 produces results consistent with GATE v9.3. Simulations using GATE 10-optiGAN showed over 92% similarity to Monte Carlo-based GATE 10 results, based on the Jensen-Shannon distance across multiple photon transport parameters. optiGAN successfully captured multimodal distributions of photon position, direction, and energy at the photodetector face. Simulation time analysis revealed a reduction of approximately 50% in execution time with GATE 10-optiGAN compared to full Monte Carlo simulations.
Significance: The study confirms both the fidelity of optical photon transport modeling in GATE 10 and the effective integration of deep learning-based acceleration through optiGAN. This advancement enables large-scale, high-fidelity optical simulations with significantly reduced computational cost, supporting broader applications in medical imaging and detector design.
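
The similarity figure in this abstract is based on the Jensen-Shannon distance. The sketch below shows how that distance can be computed between two histograms of a photon transport parameter. It is a generic illustration, not code from optiGAN or GATE: the bin counts are invented, and reporting similarity as one minus the distance is an assumption of this sketch.

```python
import math


def js_distance(p_counts, q_counts):
    """Jensen-Shannon distance between two histograms over the same bins.

    Uses base-2 logarithms, so the result lies in [0, 1].
    """
    if len(p_counts) != len(q_counts):
        raise ValueError("histograms must use the same bins")
    p_total, q_total = sum(p_counts), sum(q_counts)
    if p_total <= 0 or q_total <= 0:
        raise ValueError("histograms must contain at least one count")
    p = [c / p_total for c in p_counts]
    q = [c / q_total for c in q_counts]
    m = [(a + b) / 2 for a, b in zip(p, q)]  # mixture distribution

    def kl(a, b):
        # Kullback-Leibler divergence; empty bins of `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(divergence, 0.0))  # guard against tiny negatives


# Invented bin counts of photon energy at the photodetector face.
monte_carlo_hist = [5, 20, 50, 20, 5]
gan_hist = [6, 19, 48, 21, 6]
distance = js_distance(monte_carlo_hist, gan_hist)
similarity_percent = 100 * (1 - distance)  # assumed definition of similarity
```

Repeating the comparison for position, direction, and energy gives one score per parameter, which matches the abstract's description of a comparison across multiple photon transport parameters.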

SMART MRS: A Simulated MEGA-PRESS ARTifacts toolbox for GABA-edited MRS.

Bugler H, Shamaei A, Souza R, Harris AD

PubMed paper, Jun 8 2025
To create a Python-based toolbox to simulate commonly occurring artifacts for single voxel gamma-aminobutyric acid (GABA)-edited MRS data. The toolbox was designed to maximize user flexibility and contains artifact, applied, input/output (I/O), and support functions. The artifact functions can produce spurious echoes, eddy currents, nuisance peaks, line broadening, baseline contamination, linear frequency drifts, and frequency and phase shift artifacts. Applied functions combine or apply specific parameter values to produce recognizable effects such as lipid peak and motion contamination. I/O and support functions provide additional functionality to accommodate different kinds of input data (MATLAB FID-A.mat files, NIfTI-MRS files), which vary by domain (time vs. frequency), MRS data type (e.g., edited vs. non-edited) and scale. A frequency and phase correction machine learning model experiment trained on corrupted simulated data and validated on in vivo data is shown to highlight the utility of our toolbox. Data simulated from the toolbox are complementary for research applications, as demonstrated by training a frequency and phase correction deep learning model that is applied to in vivo data containing artifacts. Visual assessment also confirms the resemblance of simulated artifacts compared to artifacts found in in vivo data. Our easy to install Python artifact simulated toolbox SMART_MRS is useful to enhance the diversity and quality of existing simulated edited-MRS data and is complementary to existing MRS simulation software.
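
To make the artifact types listed in this abstract concrete, the sketch below applies a frequency and phase shift and a line broadening to a toy time-domain signal. It does not use or reproduce the SMART_MRS API: every function name, parameter, and value here is hypothetical, and the signal is a single decaying complex exponential, not real MRS data.

```python
import cmath
import math


def simulate_fid(n_points, dwell_time_s, freq_hz, t2_s):
    """Toy free induction decay: one complex exponential with T2 decay."""
    return [
        cmath.exp(2j * math.pi * freq_hz * n * dwell_time_s)
        * math.exp(-n * dwell_time_s / t2_s)
        for n in range(n_points)
    ]


def add_freq_phase_shift(fid, dwell_time_s, freq_shift_hz, phase_shift_deg):
    """Rotate each point by a linear-in-time phase plus a constant phase."""
    phase = math.radians(phase_shift_deg)
    return [
        s * cmath.exp(1j * (2 * math.pi * freq_shift_hz * n * dwell_time_s + phase))
        for n, s in enumerate(fid)
    ]


def add_line_broadening(fid, dwell_time_s, broadening_hz):
    """Multiply by an exponential decay, which widens peaks in the spectrum."""
    return [
        s * math.exp(-math.pi * broadening_hz * n * dwell_time_s)
        for n, s in enumerate(fid)
    ]


# Hypothetical acquisition values chosen only for the demonstration.
dwell = 0.0005  # seconds between samples
clean = simulate_fid(n_points=256, dwell_time_s=dwell, freq_hz=50.0, t2_s=0.1)
shifted = add_freq_phase_shift(clean, dwell, freq_shift_hz=5.0, phase_shift_deg=90.0)
broadened = add_line_broadening(clean, dwell, broadening_hz=4.0)
```

Pairs of clean and corrupted signals like these are the kind of training data a frequency and phase correction model would need, with the applied shifts serving as labels.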