Anatomy-Grounded Weakly Supervised Prompt Tuning for Chest X-ray Latent Diffusion Models

Konstantinos Vilouras, Ilias Stogiannidis, Junyu Yan, Alison Q. O'Neil, Sotirios A. Tsaftaris

arXiv preprint · Jun 12, 2025
Latent Diffusion Models have shown remarkable results in text-guided image synthesis in recent years. In the domain of natural (RGB) images, recent works have shown that such models can be adapted to various vision-language downstream tasks with little to no supervision. In contrast, text-to-image Latent Diffusion Models remain relatively underexplored in the field of medical imaging, primarily due to limited data availability (e.g., owing to privacy concerns). In this work, focusing on the chest X-ray modality, we first demonstrate that a standard text-conditioned Latent Diffusion Model has not learned to align clinically relevant information in free-text radiology reports with the corresponding areas of the given scan. To alleviate this issue, we then propose a fine-tuning framework that improves multi-modal alignment in a pre-trained model so that it can be efficiently repurposed for downstream tasks such as phrase grounding. Our method sets a new state of the art on a standard benchmark dataset (MS-CXR), while also exhibiting robust performance on out-of-distribution data (VinDr-CXR). Our code will be made publicly available.
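As a rough illustration of phrase grounding with multi-modal features (not the authors' method; all names, dimensions, and the pooling choice are hypothetical), a similarity heatmap between a text phrase embedding and a spatial feature map can be computed as follows:

```python
import torch
import torch.nn.functional as F

def grounding_heatmap(text_tokens, spatial_feats, temperature=0.07):
    # text_tokens:   (T, D) token embeddings of a report phrase
    # spatial_feats: (D, H, W) feature map from an image encoder or U-Net layer
    # returns:       (H, W) normalized similarity map over image locations
    D, H, W = spatial_feats.shape
    phrase = F.normalize(text_tokens.mean(dim=0), dim=0)         # pooled phrase embedding (D,)
    feats = F.normalize(spatial_feats.reshape(D, H * W), dim=0)  # unit-norm feature per location
    sim = (phrase @ feats) / temperature                         # (H*W,) cosine similarities
    return sim.softmax(dim=0).reshape(H, W)

# Hypothetical usage with random tensors standing in for real features.
heatmap = grounding_heatmap(torch.randn(8, 256), torch.randn(256, 32, 32))
```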

Prompt-Guided Latent Diffusion with Predictive Class Conditioning for 3D Prostate MRI Generation

Emerson P. Grabke, Masoom A. Haider, Babak Taati

arXiv preprint · Jun 11, 2025
Latent diffusion models (LDMs) could alleviate data scarcity challenges affecting machine learning development for medical imaging. However, medical LDM training typically relies on strategies that limit performance or scientific accessibility, such as short-prompt text encoders, the reuse of non-medical LDMs, or fine-tuning with large data volumes. We propose a Class-Conditioned Efficient Large Language model Adapter (CCELLA) to address these limitations. CCELLA is a novel dual-head conditioning approach that simultaneously conditions the LDM U-Net with non-medical large language model-encoded text features through cross-attention and with pathology classification through the timestep embedding. We also propose a joint loss function and a data-efficient LDM training framework. In combination, these strategies enable pathology-conditioned LDM training for high-quality medical image synthesis given limited data volume and human data annotation, improving LDM performance and scientific accessibility. Our method achieves a 3D FID score of 0.025 on a size-limited prostate MRI dataset, significantly outperforming a recent foundation model with an FID of 0.071. When training a classifier for prostate cancer prediction, adding synthetic images generated by our method to the training dataset improves classifier accuracy from 69% to 74%. Training a classifier solely on our method's synthetic images achieved performance comparable to training on real images alone.
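A minimal PyTorch sketch of dual-head conditioning of the kind described, combining a class embedding with the timestep embedding while text features enter through cross-attention; module names, dimensions, and wiring are illustrative assumptions, not the CCELLA implementation:

```python
import torch
import torch.nn as nn

class DualHeadConditioning(nn.Module):
    """Illustrative sketch: a class label is added to the timestep embedding,
    while text features are consumed by cross-attention over spatial tokens."""

    def __init__(self, num_classes=2, t_dim=256, txt_dim=768, feat_dim=256):
        super().__init__()
        self.class_emb = nn.Embedding(num_classes, t_dim)
        self.cross_attn = nn.MultiheadAttention(feat_dim, num_heads=8,
                                                kdim=txt_dim, vdim=txt_dim,
                                                batch_first=True)

    def forward(self, t_emb, class_ids, unet_tokens, text_feats):
        # Head 1: pathology class conditions the timestep embedding.
        t_emb = t_emb + self.class_emb(class_ids)                 # (B, t_dim)
        # Head 2: text features condition spatial tokens via cross-attention.
        attended, _ = self.cross_attn(unet_tokens, text_feats, text_feats)
        return t_emb, unet_tokens + attended

# Hypothetical usage with toy shapes (batch of 2, 64 spatial tokens, 77 text tokens).
cond = DualHeadConditioning()
t_emb, tokens = cond(torch.randn(2, 256), torch.tensor([0, 1]),
                     torch.randn(2, 64, 256), torch.randn(2, 77, 768))
```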

CINeMA: Conditional Implicit Neural Multi-Modal Atlas for a Spatio-Temporal Representation of the Perinatal Brain

Maik Dannecker, Vasiliki Sideri-Lampretsa, Sophie Starck, Angeline Mihailov, Mathieu Milh, Nadine Girard, Guillaume Auzias, Daniel Rueckert

arXiv preprint · Jun 11, 2025
Magnetic resonance imaging of fetal and neonatal brains reveals rapid neurodevelopment marked by substantial anatomical changes unfolding within days. Studying this critical stage of the developing human brain, therefore, requires accurate brain models, referred to as atlases, of high spatial and temporal resolution. To meet these demands, established traditional atlases and recently proposed deep learning-based methods rely on large and comprehensive datasets. This poses a major challenge for studying brains in the presence of pathologies for which data remains scarce. We address this limitation with CINeMA (Conditional Implicit Neural Multi-Modal Atlas), a novel framework for creating high-resolution, spatio-temporal, multimodal brain atlases, suitable for low-data settings. Unlike established methods, CINeMA operates in latent space, avoiding compute-intensive image registration and reducing atlas construction times from days to minutes. Furthermore, it enables flexible conditioning on anatomical features including gestational age (GA), birth age, and pathologies such as ventriculomegaly (VM) and agenesis of the corpus callosum (ACC). CINeMA supports downstream tasks such as tissue segmentation and age prediction, while its generative properties enable synthetic data creation and anatomically informed data augmentation. Surpassing state-of-the-art methods in accuracy, efficiency, and versatility, CINeMA represents a powerful tool for advancing brain research. We release the code and atlases at https://github.com/m-dannecker/CINeMA.
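A minimal sketch of a conditional implicit neural representation, assuming a plain coordinate MLP conditioned on gestational age and pathology flags; this is illustrative only and not the CINeMA architecture:

```python
import torch
import torch.nn as nn

class ConditionalINR(nn.Module):
    """Minimal conditional implicit neural atlas sketch: an MLP maps a 3D
    coordinate concatenated with a condition vector (e.g., gestational age,
    pathology flags) to an image intensity."""

    def __init__(self, cond_dim=3, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, coords, cond):
        # coords: (N, 3) normalized xyz positions, cond: (N, cond_dim)
        return self.net(torch.cat([coords, cond], dim=-1))

# Hypothetical usage: query 1024 points at GA = 30 weeks, no VM, with ACC.
model = ConditionalINR()
coords = torch.rand(1024, 3) * 2 - 1                       # points in [-1, 1]^3
cond = torch.tensor([[30.0, 0.0, 1.0]]).repeat(1024, 1)
intensities = model(coords, cond)
```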

MAMBO: High-Resolution Generative Approach for Mammography Images

Milica Škipina, Nikola Jovišić, Nicola Dall'Asen, Vanja Švenda, Anil Osman Tur, Slobodan Ilić, Elisa Ricci, Dubravko Ćulibrk

arXiv preprint · Jun 10, 2025
Mammography is the gold standard for the detection and diagnosis of breast cancer. This procedure can be significantly enhanced with Artificial Intelligence (AI)-based software, which assists radiologists in identifying abnormalities. However, training AI systems requires large and diverse datasets, which are often difficult to obtain due to privacy and ethical constraints. To address this issue, the paper introduces MAMmography ensemBle mOdel (MAMBO), a novel patch-based diffusion approach designed to generate full-resolution mammograms. Diffusion models have shown breakthrough results in realistic image generation, yet few studies have focused on mammograms, and none have successfully generated the high-resolution outputs required to capture fine-grained features of small lesions. To achieve this, MAMBO integrates separate diffusion models to capture both local and global (image-level) contexts. The contextual information is then fed into the final patch-based model, significantly aiding the noise removal process. This design enables MAMBO to generate highly realistic mammograms of up to 3840x3840 pixels. Importantly, this approach can be used to enhance the training of classification models and extended to anomaly detection. Experiments, comprising both numerical evaluation and radiologist validation, assess MAMBO's capabilities in image generation, super-resolution, and anomaly detection, highlighting its potential to enhance mammography analysis for more accurate diagnoses and earlier lesion detection.
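One generic way to assemble a full-resolution image from patch-level generations is overlap-averaged stitching; the sketch below is an illustrative assumption, not MAMBO's actual assembly step, and all names are hypothetical:

```python
import numpy as np

def stitch_patches(patches, coords, out_shape, patch_size):
    """Average overlapping generated patches into a full-resolution canvas.
    patches: list of (p, p) arrays; coords: list of (row, col) top-left corners."""
    canvas = np.zeros(out_shape, dtype=np.float64)
    weight = np.zeros(out_shape, dtype=np.float64)
    for patch, (r, c) in zip(patches, coords):
        canvas[r:r + patch_size, c:c + patch_size] += patch
        weight[r:r + patch_size, c:c + patch_size] += 1.0
    return canvas / np.maximum(weight, 1e-8)

# Hypothetical usage: 256x256 patches tiled with 50% overlap over a 1024x1024 canvas.
size, stride, full = 256, 128, 1024
coords = [(r, c) for r in range(0, full - size + 1, stride)
                 for c in range(0, full - size + 1, stride)]
patches = [np.random.rand(size, size) for _ in coords]   # stand-ins for generated patches
image = stitch_patches(patches, coords, (full, full), size)
```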

A Privacy-Preserving Federated Learning Framework for Generalizable CBCT to Synthetic CT Translation in Head and Neck

Ciro Benito Raggio, Paolo Zaffino, Maria Francesca Spadea

arXiv preprint · Jun 10, 2025
Cone-beam computed tomography (CBCT) has become a widely adopted modality for image-guided radiotherapy (IGRT). However, CBCT suffers from increased noise, limited soft-tissue contrast, and artifacts, resulting in unreliable Hounsfield unit values and hindering direct dose calculation. Synthetic CT (sCT) generation from CBCT addresses these issues, especially using deep learning (DL) methods. Existing approaches are limited by institutional heterogeneity, scanner-dependent variations, and data privacy regulations that prevent multi-center data sharing. To overcome these challenges, we propose a cross-silo horizontal federated learning (FL) approach for CBCT-to-sCT synthesis in the head and neck region, extending our FedSynthCT framework. A conditional generative adversarial network was collaboratively trained on data from three European medical centers in the public SynthRAD2025 challenge dataset. The federated model demonstrated effective generalization across centers, with mean absolute error (MAE) ranging from $64.38\pm13.63$ to $85.90\pm7.10$ HU, structural similarity index (SSIM) from $0.882\pm0.022$ to $0.922\pm0.039$, and peak signal-to-noise ratio (PSNR) from $32.86\pm0.94$ to $34.91\pm1.04$ dB. Notably, on an external validation dataset of 60 patients, comparable performance was achieved (MAE: $75.22\pm11.81$ HU, SSIM: $0.904\pm0.034$, PSNR: $33.52\pm2.06$ dB) without additional training, confirming robust generalization despite differences in protocols and scanners and the presence of registration errors. These findings demonstrate the technical feasibility of FL for CBCT-to-sCT synthesis while preserving data privacy and offer a collaborative solution for developing generalizable models across institutions without centralized data sharing or site-specific fine-tuning.
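Cross-silo horizontal FL commonly aggregates locally trained weights with a FedAvg-style average; the sketch below illustrates that general idea and is not the FedSynthCT implementation (function and variable names are hypothetical):

```python
import copy
import torch

def federated_average(client_states, client_sizes):
    """FedAvg-style aggregation: weight each client's parameters by its local
    dataset size and average them into a new global state dict."""
    total = float(sum(client_sizes))
    global_state = copy.deepcopy(client_states[0])
    for key in global_state:
        global_state[key] = sum(
            state[key] * (n / total)
            for state, n in zip(client_states, client_sizes)
        )
    return global_state

# Hypothetical usage with three centers holding different numbers of patients:
# states = [center_a.state_dict(), center_b.state_dict(), center_c.state_dict()]
# new_global = federated_average(states, client_sizes=[120, 85, 64])
# global_model.load_state_dict(new_global)
```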

APTOS-2024 challenge report: Generation of synthetic 3D OCT images from fundus photographs

Bowen Liu, Weiyi Zhang, Peranut Chotcomwongse, Xiaolan Chen, Ruoyu Chen, Pawin Pakaymaskul, Niracha Arjkongharn, Nattaporn Vongsa, Xuelian Cheng, Zongyuan Ge, Kun Huang, Xiaohui Li, Yiru Duan, Zhenbang Wang, BaoYe Xie, Qiang Chen, Huazhu Fu, Michael A. Mahr, Jiaqi Qu, Wangyiyang Chen, Shiye Wang, Yubo Tan, Yongjie Li, Mingguang He, Danli Shi, Paisan Ruamviboonsuk

arXiv preprint · Jun 9, 2025
Optical Coherence Tomography (OCT) provides high-resolution, 3D, and non-invasive visualization of retinal layers in vivo, serving as a critical tool for lesion localization and disease diagnosis. However, its widespread adoption is limited by equipment costs and the need for specialized operators. In comparison, 2D color fundus photography offers faster acquisition and greater accessibility with less dependence on expensive devices. Although generative artificial intelligence has demonstrated promising results in medical image synthesis, translating 2D fundus images into 3D OCT images presents unique challenges due to inherent differences in data dimensionality and biological information between modalities. To advance generative models in the fundus-to-3D-OCT setting, the Asia Pacific Tele-Ophthalmology Society (APTOS-2024) organized a challenge titled Artificial Intelligence-based OCT Generation from Fundus Images. This paper details the challenge framework (referred to as the APTOS-2024 Challenge), including the benchmark dataset, the evaluation methodology featuring two fidelity metrics, namely image-based distance (pixel-level OCT B-scan similarity) and video-based distance (semantic-level volumetric consistency), and an analysis of top-performing solutions. The challenge attracted 342 participating teams, with 42 preliminary submissions and 9 finalists. Leading methodologies incorporated innovations in hybrid data preprocessing or augmentation (cross-modality collaborative paradigms), pre-training on external ophthalmic imaging datasets, integration of vision foundation models, and model architecture improvements. The APTOS-2024 Challenge is the first benchmark demonstrating the feasibility of fundus-to-3D-OCT synthesis as a potential solution for improving ophthalmic care accessibility in under-resourced healthcare settings, while helping to expedite medical research and clinical applications.
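As a loose illustration of a pixel-level, per-B-scan fidelity measure (not the challenge's official metric; the choice of SSIM and the volume shapes are assumptions), one could average a similarity score over corresponding B-scans:

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def bscan_similarity(generated_vol, reference_vol):
    """Average a pixel-level similarity (here SSIM) over corresponding B-scans
    of two OCT volumes shaped (num_bscans, H, W)."""
    scores = [
        ssim(g, r, data_range=float(r.max() - r.min()) or 1.0)
        for g, r in zip(generated_vol, reference_vol)
    ]
    return float(np.mean(scores))

# Hypothetical usage with small random volumes standing in for real OCT data.
gen = np.random.rand(16, 64, 64)
ref = np.random.rand(16, 64, 64)
print(bscan_similarity(gen, ref))
```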

optiGAN: A Deep Learning-Based Alternative to Optical Photon Tracking in Python-Based GATE (10+).

Mummaneni G, Trigila C, Krah N, Sarrut D, Roncali E

PubMed paper · Jun 9, 2025
Objective: To accelerate optical photon transport simulations in the GATE medical physics framework using a Generative Adversarial Network (GAN), while ensuring high modeling accuracy. Traditionally, detailed optical Monte Carlo methods have been the gold standard for modeling photon interactions in detectors, but their high computational cost remains a challenge. This study explores the integration of optiGAN, a GAN model, into GATE 10, the new Python-based version of the GATE medical physics simulation framework released in November 2024.
Approach: The goal of optiGAN is to accelerate optical photon transport simulations while maintaining modeling accuracy. The optiGAN model, based on a GAN architecture, was integrated into GATE 10 as a computationally efficient alternative to traditional optical Monte Carlo simulations. To ensure consistency, optical photon transport modules were implemented in GATE 10 and validated against GATE v9.3 under identical simulation conditions. Subsequently, simulations using full Monte Carlo tracking in GATE 10 were compared to those using GATE 10-optiGAN.
Main results: Validation studies confirmed that GATE 10 produces results consistent with GATE v9.3. Simulations using GATE 10-optiGAN showed over 92% similarity to Monte Carlo-based GATE 10 results, based on the Jensen-Shannon distance across multiple photon transport parameters. optiGAN successfully captured multimodal distributions of photon position, direction, and energy at the photodetector face. Simulation time analysis revealed a reduction of approximately 50% in execution time with GATE 10-optiGAN compared to full Monte Carlo simulations.
Significance: The study confirms both the fidelity of optical photon transport modeling in GATE 10 and the effective integration of deep learning-based acceleration through optiGAN. This advancement enables large-scale, high-fidelity optical simulations with significantly reduced computational cost, supporting broader applications in medical imaging and detector design.
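For distribution-level validation of this kind, SciPy's Jensen-Shannon distance can compare histograms of a photon parameter produced by two simulators; the sketch below is a generic illustration, and the similarity score shown is one plausible convention rather than the study's exact formula:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def distribution_similarity(full_mc_samples, gan_samples, bins=100):
    """Compare two 1D parameter distributions (e.g., photon exit position or
    energy) via Jensen-Shannon distance and report a similarity score."""
    lo = min(full_mc_samples.min(), gan_samples.min())
    hi = max(full_mc_samples.max(), gan_samples.max())
    p, _ = np.histogram(full_mc_samples, bins=bins, range=(lo, hi), density=True)
    q, _ = np.histogram(gan_samples, bins=bins, range=(lo, hi), density=True)
    jsd = jensenshannon(p, q, base=2)   # 0 = identical, 1 = maximally different
    return 1.0 - jsd                    # one plausible similarity convention

# Hypothetical usage with synthetic samples standing in for simulation outputs.
rng = np.random.default_rng(0)
print(distribution_similarity(rng.normal(0, 1, 10_000), rng.normal(0.05, 1, 10_000)))
```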

SMART MRS: A Simulated MEGA-PRESS ARTifacts toolbox for GABA-edited MRS.

Bugler H, Shamaei A, Souza R, Harris AD

PubMed paper · Jun 8, 2025
The aim of this work was to create a Python-based toolbox that simulates commonly occurring artifacts in single-voxel gamma-aminobutyric acid (GABA)-edited MRS data. The toolbox was designed to maximize user flexibility and contains artifact, applied, input/output (I/O), and support functions. The artifact functions can produce spurious echoes, eddy currents, nuisance peaks, line broadening, baseline contamination, linear frequency drifts, and frequency and phase shift artifacts. Applied functions combine or apply specific parameter values to produce recognizable effects such as lipid peak and motion contamination. I/O and support functions provide additional functionality to accommodate different kinds of input data (MATLAB FID-A .mat files, NIfTI-MRS files), which vary by domain (time vs. frequency), MRS data type (e.g., edited vs. non-edited), and scale. To highlight the utility of the toolbox, a frequency and phase correction machine learning model was trained on corrupted simulated data and validated on in vivo data. Data simulated with the toolbox complement existing resources for research applications, as demonstrated by applying this frequency and phase correction deep learning model to in vivo data containing artifacts. Visual assessment also confirms that the simulated artifacts resemble those found in in vivo data. Our easy-to-install Python artifact simulation toolbox, SMART_MRS, is useful for enhancing the diversity and quality of existing simulated edited-MRS data and is complementary to existing MRS simulation software.
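Frequency and phase shift corruption of a time-domain FID is a standard operation (multiplication by a complex exponential); the sketch below illustrates that kind of artifact generically and does not use SMART_MRS's actual API, so the function and parameter names are hypothetical:

```python
import numpy as np

def apply_freq_phase_shift(fid, dwell_time, freq_shift_hz, phase_shift_deg):
    """Apply a frequency offset (Hz) and zero-order phase shift (degrees) to a
    complex time-domain FID, the standard corruption used when training
    frequency-and-phase-correction models."""
    t = np.arange(fid.size) * dwell_time                      # time axis in seconds
    phase = np.deg2rad(phase_shift_deg)
    return fid * np.exp(1j * (2 * np.pi * freq_shift_hz * t + phase))

# Hypothetical usage: a synthetic single-peak FID corrupted by +5 Hz and 20 degrees.
dwell = 1 / 2000.0                                            # 2 kHz spectral width
t = np.arange(2048) * dwell
fid = np.exp(2j * np.pi * 150 * t) * np.exp(-t / 0.1)         # decaying 150 Hz peak
corrupted = apply_freq_phase_shift(fid, dwell, freq_shift_hz=5.0, phase_shift_deg=20.0)
```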

Diffusion-based image translation model from low-dose chest CT to calcium scoring CT with random point sampling.

Jung JH, Lee JE, Lee HS, Yang DH, Lee JG

PubMed paper · Jun 7, 2025
Coronary artery calcium (CAC) scoring is an important method for cardiovascular risk assessment. While artificial intelligence (AI) has been applied to automate CAC scoring in calcium scoring computed tomography (CSCT), its application to low-dose computed tomography (LDCT) scans, typically used for lung cancer screening, remains challenging due to the lower image quality and higher noise levels of LDCT. This study presents a diffusion model-based method for converting LDCT to CSCT images with the aim of improving CAC scoring accuracy from LDCT scans. A conditional diffusion model was developed to generate CSCT images from LDCT by modifying the denoising diffusion implicit model (DDIM) sampling process. Two main modifications were introduced: (1) random pointing, a novel sampling technique that enhances the trajectory guidance methodology of DDIM using stochastic Gaussian noise to optimize domain adaptation, and (2) intermediate sampling, an advanced methodology that strategically injects minimal noise into LDCT images prior to sampling to maximize structural preservation. The model was trained on LDCT and CSCT images obtained from the same patients but acquired separately at different time points and patient positions, and validated on 37 test cases. The proposed method showed superior performance compared to widely used image-to-image models (CycleGAN, CUT, DCLGAN, NEGCUT) across several evaluation metrics, including peak signal-to-noise ratio (39.93 ± 0.44), Local Normalized Cross-Correlation (0.97 ± 0.01), structural similarity index (0.97 ± 0.01), and Dice similarity coefficient (0.73 ± 0.10). The modifications to the sampling process reduced the number of iterations from 1000 to 10 while maintaining image quality. Volumetric analysis indicated a stronger correlation between the calcium volumes in the enhanced CSCT images and expert-verified annotations, as compared to the original LDCT images. The proposed method effectively transforms LDCT images to CSCT images while preserving anatomical structures and calcium deposits. The reduction in sampling time and the improved preservation of calcium structures suggest that the method could be applicable for clinical use in cardiovascular risk assessment.
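For context, the standard DDIM update is shown below with an optional Gaussian noise term of the kind the described "random pointing" modification builds on; this is a generic sketch of DDIM sampling, not the authors' implementation:

```python
import torch

def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev, sigma_t=0.0):
    """One DDIM update. sigma_t = 0 gives the standard deterministic step;
    sigma_t > 0 injects stochastic Gaussian noise into the trajectory."""
    # Predict the clean image implied by the current sample and noise estimate.
    x0_pred = (x_t - (1 - alpha_bar_t).sqrt() * eps_pred) / alpha_bar_t.sqrt()
    # Direction term pointing toward x_t, shrunk to leave room for the noise term.
    dir_xt = (1 - alpha_bar_prev - sigma_t ** 2).sqrt() * eps_pred
    noise = sigma_t * torch.randn_like(x_t) if sigma_t > 0 else 0.0
    return alpha_bar_prev.sqrt() * x0_pred + dir_xt + noise

# Hypothetical usage with toy tensors; eps_pred would come from the trained U-Net.
x_t = torch.randn(1, 1, 64, 64)
eps_pred = torch.randn_like(x_t)
x_prev = ddim_step(x_t, eps_pred, torch.tensor(0.5), torch.tensor(0.7), sigma_t=0.1)
```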

Quasi-supervised MR-CT image conversion based on unpaired data.

Zhu R, Ruan Y, Li M, Qian W, Yao Y, Teng Y

PubMed paper · Jun 6, 2025
In radiotherapy planning, acquiring both magnetic resonance (MR) and computed tomography (CT) images is crucial for comprehensive evaluation and treatment. However, simultaneous acquisition of MR and CT images is time-consuming, economically expensive, and involves ionizing radiation, which poses health risks to patients. The objective of this study is to generate CT images from radiation-free MR images using a novel quasi-supervised learning framework. In this work, we propose a quasi-supervised framework to explore the underlying relationship between unpaired MR and CT images. Normalized mutual information (NMI) is employed as a similarity metric to evaluate the correspondence between MR and CT scans. To establish optimal pairings, we compute an NMI matrix across the training set and apply the Hungarian algorithm for global matching. The resulting MR-CT pairs, along with their NMI scores, are treated as prior knowledge and integrated into the training process to guide the MR-to-CT image translation model. Experimental results indicate that the proposed method significantly outperforms existing unsupervised image synthesis methods in terms of both image quality and consistency of image features during the MR-to-CT image conversion process. The generated CT images show a higher degree of accuracy and fidelity to the original MR images, ensuring better preservation of anatomical details and structural integrity. This study proposes a quasi-supervised framework that converts unpaired MR and CT images into structurally consistent pseudo-pairs, providing informative priors to enhance cross-modality image synthesis. This strategy not only improves the accuracy and reliability of MR-CT conversion, but also reduces reliance on costly and scarce paired datasets. The proposed framework offers a practical and scalable solution for real-world medical imaging applications, where paired annotations are often unavailable.
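The NMI-matrix matching described above maps directly onto SciPy and scikit-learn primitives; the sketch below illustrates the pairing step under the assumption that NMI is computed over binned intensities (helper names are hypothetical):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def nmi(img_a, img_b, bins=64):
    """Normalized mutual information between two images via binned intensities."""
    a = np.digitize(img_a.ravel(), np.histogram_bin_edges(img_a, bins))
    b = np.digitize(img_b.ravel(), np.histogram_bin_edges(img_b, bins))
    return normalized_mutual_info_score(a, b)

def match_mr_ct(mr_images, ct_images):
    """Build the NMI matrix over all MR-CT pairs and solve a global one-to-one
    assignment with the Hungarian algorithm."""
    cost = np.zeros((len(mr_images), len(ct_images)))
    for i, mr in enumerate(mr_images):
        for j, ct in enumerate(ct_images):
            cost[i, j] = -nmi(mr, ct)               # negate: Hungarian minimizes cost
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols)), -cost[rows, cols]  # pairs and their NMI scores

# Hypothetical usage with small random arrays standing in for real scans.
mrs = [np.random.rand(64, 64) for _ in range(4)]
cts = [np.random.rand(64, 64) for _ in range(4)]
pairs, scores = match_mr_ct(mrs, cts)
```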