Page 4 of 659 results

Text-to-CT Generation via 3D Latent Diffusion Model with Contrastive Vision-Language Pretraining

Daniele Molino, Camillo Maria Caruso, Filippo Ruffini, Paolo Soda, Valerio Guarrasi

arxiv logopreprintMay 31 2025
Objective: While recent advances in text-conditioned generative models have enabled the synthesis of realistic medical images, progress has been largely confined to 2D modalities such as chest X-rays. Extending text-to-image generation to volumetric Computed Tomography (CT) remains a significant challenge, due to its high dimensionality, anatomical complexity, and the absence of robust frameworks that align vision-language data in 3D medical imaging. Methods: We introduce a novel architecture for Text-to-CT generation that combines a latent diffusion model with a 3D contrastive vision-language pretraining scheme. Our approach leverages a dual-encoder CLIP-style model trained on paired CT volumes and radiology reports to establish a shared embedding space, which serves as the conditioning input for generation. CT volumes are compressed into a low-dimensional latent space via a pretrained volumetric VAE, enabling efficient 3D denoising diffusion without requiring external super-resolution stages. Results: We evaluate our method on the CT-RATE dataset and conduct a comprehensive assessment of image fidelity, clinical relevance, and semantic alignment. Our model achieves competitive performance across all tasks, significantly outperforming prior baselines for text-to-CT generation. Moreover, we demonstrate that CT scans synthesized by our framework can effectively augment real data, improving downstream diagnostic performance. Conclusion: Our results show that modality-specific vision-language alignment is a key component for high-quality 3D medical image generation. By integrating contrastive pretraining and volumetric diffusion, our method offers a scalable and controllable solution for synthesizing clinically meaningful CT volumes from text, paving the way for new applications in data augmentation, medical education, and automated clinical simulation.
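The dual-encoder conditioning described above rests on a CLIP-style symmetric contrastive objective over paired volume and report embeddings. A minimal NumPy sketch of that loss; the batch size, embedding dimension, and temperature here are illustrative, not the paper's settings:

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Matched (image, text) pairs sit on the diagonal of the similarity
    matrix; cross-entropy is applied in both directions.
    """
    # L2-normalize so dot products become cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature      # (B, B) similarity matrix
    labels = np.arange(len(logits))         # matched pairs on the diagonal

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)            # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # average of image-to-text and text-to-image cross-entropies
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
B, D = 4, 8
loss = clip_contrastive_loss(rng.normal(size=(B, D)), rng.normal(size=(B, D)))
```

For aligned pairs the loss approaches zero; for mismatched pairs it grows, which is what drives the two encoders into a shared embedding space.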

Motion-resolved parametric imaging derived from short dynamic [<sup>18</sup>F]FDG PET/CT scans.

Artesani A, van Sluis J, Providência L, van Snick JH, Slart RHJA, Noordzij W, Tsoumpas C

pubmed logopapersMay 29 2025
This study aims to assess the added value of utilizing short-dynamic whole-body PET/CT scans and implementing motion correction before quantifying metabolic rate, offering more insights into physiological processes. While this approach may not be commonly adopted, addressing motion effects is crucial due to their demonstrated potential to cause significant errors in parametric imaging. A 15-minute dynamic FDG PET acquisition protocol was utilized for four lymphoma patients undergoing therapy evaluation. Parametric imaging was obtained using a population-based input function (PBIF) derived from twelve patients with full 65-minute dynamic FDG PET acquisitions. AI-based registration methods were employed to correct misalignments both between PET and ACCT and between PET frames (PET-to-PET). Tumour characteristics were assessed using both parametric images and standardized uptake values (SUV). The motion correction process significantly reduced mismatches between images without significantly altering voxel intensity values, except for SUV<sub>max</sub>. Following the alignment of the attenuation correction map with the PET frame, an increase in SUV<sub>max</sub> in FDG-avid lymph nodes was observed, indicating its susceptibility to spatial misalignments. In contrast, the Patlak K<sub>i</sub> parameter was highly sensitive to misalignment across PET frames, which notably altered the Patlak slope. Upon completion of the motion correction process, the parametric representation revealed heterogeneous behaviour among lymph nodes compared to SUV images. Notably, a reduced volume of elevated metabolic rate was found in the mediastinal lymph nodes, in contrast with an SUV of 5 g/ml, indicating potential perfusion or inflammation. Motion-resolved short-dynamic PET can enhance the utility and reliability of parametric imaging, an aspect often overlooked in commercial software.
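The Patlak K<sub>i</sub> discussed above is the slope of a linear fit in transformed coordinates: y = C<sub>t</sub>/C<sub>p</sub> against x = ∫C<sub>p</sub>/C<sub>p</sub>. A small sketch on synthetic data; the equilibrium start time and the input function shape are illustrative assumptions, not values from the study:

```python
import numpy as np

def patlak_ki(t, cp, ct, t_start=10.0):
    """Estimate the Patlak influx constant Ki by linear regression.

    After equilibrium (t >= t_start) the plot of Ct/Cp against
    integral(Cp)/Cp is linear, with slope Ki and intercept V0.
    """
    # trapezoidal cumulative integral of the plasma input function
    cum_cp = np.concatenate(
        [[0.0], np.cumsum(np.diff(t) * 0.5 * (cp[1:] + cp[:-1]))])
    mask = t >= t_start
    x = cum_cp[mask] / cp[mask]
    y = ct[mask] / cp[mask]
    ki, v0 = np.polyfit(x, y, 1)
    return ki, v0

# synthetic tissue curve built to satisfy Ct = Ki * integral(Cp) + V0 * Cp
t = np.linspace(0, 60, 121)
cp = np.exp(-t / 20.0) + 0.1                  # decaying plasma input
cum = np.concatenate([[0.0], np.cumsum(np.diff(t) * 0.5 * (cp[1:] + cp[:-1]))])
ct = 0.05 * cum + 0.3 * cp                    # true Ki = 0.05, V0 = 0.3
ki, v0 = patlak_ki(t, cp, ct)
```

This also makes the abstract's observation concrete: any frame-to-frame misalignment perturbs C<sub>t</sub> at individual timepoints, which directly tilts the fitted slope K<sub>i</sub>.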

Cascaded 3D Diffusion Models for Whole-body 3D 18-F FDG PET/CT synthesis from Demographics

Siyeop Yoon, Sifan Song, Pengfei Jin, Matthew Tivnan, Yujin Oh, Sekeun Kim, Dufan Wu, Xiang Li, Quanzheng Li

arxiv logopreprintMay 28 2025
We propose a cascaded 3D diffusion model framework to synthesize high-fidelity 3D PET/CT volumes directly from demographic variables, addressing the growing need for realistic digital twins in oncologic imaging, virtual trials, and AI-driven data augmentation. Unlike deterministic phantoms, which rely on predefined anatomical and metabolic templates, our method employs a two-stage generative process. An initial score-based diffusion model synthesizes low-resolution PET/CT volumes from demographic variables alone, providing global anatomical structures and approximate metabolic activity. This is followed by a super-resolution residual diffusion model that refines spatial resolution. Our framework was trained on 18-F FDG PET/CT scans from the AutoPET dataset and evaluated using organ-wise volume and standardized uptake value (SUV) distributions, comparing synthetic and real data between demographic subgroups. The organ-wise comparison demonstrated strong concordance between synthetic and real images. In particular, most deviations in metabolic uptake values remained within 3-5% of the ground truth in subgroup analysis. These findings highlight the potential of cascaded 3D diffusion models to generate anatomically and metabolically accurate PET/CT images, offering a robust alternative to traditional phantoms and enabling scalable, population-informed synthetic imaging for clinical and research applications.
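The two-stage cascade can be sketched as plain data flow, with both diffusion samplers stubbed out as callables. Shapes, names, and the toy stand-in models below are assumptions for illustration only:

```python
import numpy as np

def upsample_nn(vol, factor=2):
    """Nearest-neighbour 3D upsampling to the target grid."""
    return vol.repeat(factor, 0).repeat(factor, 1).repeat(factor, 2)

def cascaded_synthesis(demographics, coarse_model, residual_model, factor=2):
    """Two-stage cascade: coarse generation, then residual refinement.

    Stage 1 maps demographic variables to a low-resolution volume with
    global anatomy; stage 2 adds high-frequency detail as a residual on
    the upsampled volume.
    """
    low_res = coarse_model(demographics)        # stage 1: global structure
    base = upsample_nn(low_res, factor)         # lift to target resolution
    refined = base + residual_model(base)       # stage 2: residual detail
    return refined

# toy stand-ins for the two diffusion models
coarse = lambda demo: np.full((4, 4, 4), demo["age"] / 100.0)
residual = lambda vol: 0.1 * np.ones_like(vol)
vol = cascaded_synthesis({"age": 50, "sex": "F"}, coarse, residual)
```

The residual formulation is what lets the second stage focus on spatial detail rather than re-generating anatomy from scratch.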

Multimodal integration of longitudinal noninvasive diagnostics for survival prediction in immunotherapy using deep learning.

Yeghaian M, Bodalal Z, van den Broek D, Haanen JBAG, Beets-Tan RGH, Trebeschi S, van Gerven MAJ

pubmed logopapersMay 26 2025
Immunotherapies have revolutionized the landscape of cancer treatments. However, our understanding of response patterns in advanced cancers treated with immunotherapy remains limited. By leveraging routinely collected noninvasive longitudinal and multimodal data with artificial intelligence, we could unlock the potential to transform immunotherapy for cancer patients, paving the way for personalized treatment approaches. In this study, we developed a novel artificial neural network architecture, multimodal transformer-based simple temporal attention (MMTSimTA) network, building upon a combination of recent successful developments. We integrated pre- and on-treatment blood measurements, prescribed medications, and CT-based volumes of organs from a large pan-cancer cohort of 694 patients treated with immunotherapy to predict mortality at 3, 6, 9, and 12 months. Different variants of our extended MMTSimTA network were implemented and compared to baseline methods, incorporating intermediate and late fusion-based integration methods. The strongest prognostic performance was demonstrated using a variant of the MMTSimTA model, with areas under the curve of 0.84 ± 0.04, 0.83 ± 0.02, 0.82 ± 0.02, and 0.81 ± 0.03 for 3-, 6-, 9-, and 12-month survival prediction, respectively. Our findings show that integrating noninvasive longitudinal data using our novel architecture yields an improved multimodal prognostic performance, especially in short-term survival prediction. Our study demonstrates that multimodal longitudinal integration of noninvasive data using deep learning may offer a promising approach for personalized prognostication in immunotherapy-treated cancer patients.
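The abstract does not specify the MMTSimTA layer at implementation level; one plausible reading of "simple temporal attention" is score-based pooling over per-visit embeddings, sketched below with illustrative names and shapes:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def simple_temporal_attention(features, w):
    """Score-based pooling over timepoints.

    features: (T, D) embeddings for T visits; w: (D,) learned scoring
    vector. Each timepoint receives a scalar score, and the output is
    the softmax-weighted mean embedding, so informative visits dominate.
    """
    scores = softmax(features @ w)    # (T,) attention weights, sum to 1
    return scores @ features          # (D,) pooled patient representation

T, D = 3, 4
feats = np.arange(T * D, dtype=float).reshape(T, D) / 10.0
pooled = simple_temporal_attention(feats, np.ones(D))
```

Because the weights form a convex combination, the pooled vector always lies within the span of the observed visits, which keeps the fused representation interpretable per timepoint.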

Joint Reconstruction of Activity and Attenuation in PET by Diffusion Posterior Sampling in Wavelet Coefficient Space

Clémentine Phung-Ngoc, Alexandre Bousse, Antoine De Paepe, Hong-Phuong Dang, Olivier Saut, Dimitris Visvikis

arxiv logopreprintMay 24 2025
Attenuation correction (AC) is necessary for accurate activity quantification in positron emission tomography (PET). Conventional reconstruction methods typically rely on attenuation maps derived from a co-registered computed tomography (CT) or magnetic resonance imaging scan. However, this additional scan may complicate the imaging workflow, introduce misalignment artifacts and increase radiation exposure. In this paper, we propose a joint reconstruction of activity and attenuation (JRAA) approach that eliminates the need for auxiliary anatomical imaging by relying solely on emission data. This framework combines a wavelet diffusion model (WDM) with diffusion posterior sampling (DPS) to reconstruct fully three-dimensional (3-D) data. Experimental results show our method outperforms maximum likelihood activity and attenuation (MLAA) and MLAA with UNet-based post-processing, and yields high-quality noise-free reconstructions across various count settings when time-of-flight (TOF) information is available. It is also able to reconstruct non-TOF data, although the reconstruction quality significantly degrades in low-count (LC) conditions, limiting its practical effectiveness in such settings. This approach represents a step towards stand-alone PET imaging by reducing the dependence on anatomical modalities while maintaining quantification accuracy, even in low-count scenarios when TOF information is available.
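Working in wavelet coefficient space presumes an invertible transform between images and subbands. A one-level 2D Haar analysis/synthesis pair (the paper's 3D transform is analogous) demonstrates the exact round trip the diffusion model relies on:

```python
import numpy as np

def haar2d(x):
    """One-level 2D Haar decomposition into (LL, LH, HL, HH) subbands."""
    a, b = x[0::2, :], x[1::2, :]                   # even / odd rows
    lo, hi = (a + b) / np.sqrt(2), (a - b) / np.sqrt(2)

    def split_cols(m):
        c, d = m[:, 0::2], m[:, 1::2]               # even / odd columns
        return (c + d) / np.sqrt(2), (c - d) / np.sqrt(2)

    ll, lh = split_cols(lo)
    hl, hh = split_cols(hi)
    return ll, lh, hl, hh

def ihaar2d(ll, lh, hl, hh):
    """Inverse of haar2d: perfect reconstruction from the four subbands."""
    def merge_cols(c, d):
        m = np.empty((c.shape[0], c.shape[1] * 2))
        m[:, 0::2] = (c + d) / np.sqrt(2)
        m[:, 1::2] = (c - d) / np.sqrt(2)
        return m

    lo, hi = merge_cols(ll, lh), merge_cols(hl, hh)
    x = np.empty((lo.shape[0] * 2, lo.shape[1]))
    x[0::2], x[1::2] = (lo + hi) / np.sqrt(2), (lo - hi) / np.sqrt(2)
    return x

rng = np.random.default_rng(1)
img = rng.normal(size=(8, 8))
rec = ihaar2d(*haar2d(img))
```

Because the transform is orthogonal and lossless, sampling in coefficient space and inverting back to image space introduces no reconstruction error of its own.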

Render-FM: A Foundation Model for Real-time Photorealistic Volumetric Rendering

Zhongpai Gao, Meng Zheng, Benjamin Planche, Anwesa Choudhuri, Terrence Chen, Ziyan Wu

arxiv logopreprintMay 22 2025
Volumetric rendering of Computed Tomography (CT) scans is crucial for visualizing complex 3D anatomical structures in medical imaging. Current high-fidelity approaches, especially neural rendering techniques, require time-consuming per-scene optimization, limiting clinical applicability due to computational demands and poor generalizability. We propose Render-FM, a novel foundation model for direct, real-time volumetric rendering of CT scans. Render-FM employs an encoder-decoder architecture that directly regresses 6D Gaussian Splatting (6DGS) parameters from CT volumes, eliminating per-scan optimization through large-scale pre-training on diverse medical data. By integrating robust feature extraction with the expressive power of 6DGS, our approach efficiently generates high-quality, real-time interactive 3D visualizations across diverse clinical CT data. Experiments demonstrate that Render-FM achieves visual fidelity comparable or superior to specialized per-scan methods while drastically reducing preparation time from nearly an hour to seconds for a single inference step. This advancement enables seamless integration into real-time surgical planning and diagnostic workflows. The project page is: https://gaozhongpai.github.io/renderfm/.

Large medical image database impact on generalizability of synthetic CT scan generation.

Boily C, Mazellier JP, Meyer P

pubmed logopapersMay 21 2025
This study systematically examines the impact of training database size on the generalizability of deep learning models for synthetic medical image generation. Specifically, we employ a Cycle-Consistency Generative Adversarial Network (CycleGAN) with softly paired data to synthesize kilovoltage computed tomography (kVCT) images from megavoltage computed tomography (MVCT) scans. Unlike previous works, which were constrained by limited data availability, our study uses an extensive database comprising 4,000 patient CT scans, an order of magnitude larger than prior research, allowing for a more rigorous assessment of the role of database size in medical image translation. We quantitatively evaluate the fidelity of the generated synthetic images using established image similarity metrics, including Mean Absolute Error (MAE) and Structural Similarity Index Measure (SSIM). Beyond assessing image quality, we investigate the model's capacity for generalization by analyzing its performance across diverse patient subgroups, considering factors such as sex, age, and anatomical region. This approach enables a more granular understanding of how dataset composition influences model robustness.
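CycleGAN's defining cycle-consistency term can be written down compactly with the adversarial losses omitted and the two generators stubbed in as callables; the function names and loss weight below are illustrative, not the study's code:

```python
import numpy as np

def cycle_consistency_loss(x_mv, y_kv, g_mv2kv, g_kv2mv, lam=10.0):
    """L1 cycle-consistency term of CycleGAN.

    g_mv2kv / g_kv2mv stand in for the MVCT->kVCT and kVCT->MVCT
    generators; translating a volume to the other domain and back
    should return the original, which is what licenses training on
    softly paired data.
    """
    cycle_x = g_kv2mv(g_mv2kv(x_mv))    # MVCT -> kVCT -> MVCT
    cycle_y = g_mv2kv(g_kv2mv(y_kv))    # kVCT -> MVCT -> kVCT
    return lam * (np.abs(cycle_x - x_mv).mean()
                  + np.abs(cycle_y - y_kv).mean())

x = np.ones((4, 4))                     # toy MVCT patch
y = np.zeros((4, 4))                    # toy kVCT patch
shift = lambda v: v + 0.5               # deliberately non-invertible pair
loss = cycle_consistency_loss(x, y, shift, shift)
```

A perfectly inverse generator pair drives this term to zero; any systematic drift in either direction shows up as a positive penalty.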

Synthesizing [<sup>18</sup>F]PSMA-1007 PET bone images from CT images with GAN for early detection of prostate cancer bone metastases: a pilot validation study.

Chai L, Yao X, Yang X, Na R, Yan W, Jiang M, Zhu H, Sun C, Dai Z, Yang X

pubmed logopapersMay 21 2025
[<sup>18</sup>F]FDG PET/CT scan combined with [<sup>18</sup>F]PSMA-1007 PET/CT scan is commonly conducted for detecting bone metastases in prostate cancer (PCa). However, it is expensive and may expose patients to more radiation hazards. This study explores deep learning (DL) techniques to synthesize [<sup>18</sup>F]PSMA-1007 PET bone images from CT bone images for the early detection of bone metastases in PCa, which may reduce additional PET/CT scans and relieve the burden on patients. We retrospectively collected paired whole-body (WB) [<sup>18</sup>F]PSMA-1007 PET/CT images from 152 patients with clinical and pathological diagnosis results, including 123 PCa and 29 cases of benign lesions. The average age of the patients was 67.48 ± 10.87 years, and the average lesion size was 8.76 ± 15.5 mm. The paired low-dose CT and PET images were preprocessed and segmented to construct the WB bone structure images. The 152 subjects were randomly stratified into training, validation, and test groups in a 92:41:19 split. Two generative adversarial network (GAN) models, Pix2pix and CycleGAN, were trained to synthesize [<sup>18</sup>F]PSMA-1007 PET bone images from paired CT bone images. The performance of the two synthesis models was evaluated using the quantitative metrics of mean absolute error (MAE), mean squared error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity index measure (SSIM), as well as the target-to-background ratio (TBR). The results of DL-based image synthesis indicated that the synthesis of [<sup>18</sup>F]PSMA-1007 PET bone images from low-dose CT bone images was highly feasible. The Pix2pix model performed better, with an SSIM of 0.97, PSNR of 44.96, MSE of 0.80, and MAE of 0.10. The TBRs of bone metastasis lesions calculated on DL-synthesized PET bone images were highly correlated with those of real PET bone images (Pearson's r > 0.90) and showed no significant differences (p > 0.05).
It is feasible to generate synthetic [<sup>18</sup>F]PSMA-1007 PET bone images from CT bone images by using DL techniques with reasonable accuracy, which can provide information for early detection of PCa bone metastases.
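The reported fidelity metrics (MAE, MSE, PSNR) and the TBR can be computed directly; SSIM, which requires local windowing, is omitted for brevity, and the image data below are synthetic:

```python
import numpy as np

def synthesis_metrics(real, synth, data_range=1.0):
    """MAE, MSE and PSNR between a real and a synthesized image."""
    err = synth - real
    mae = np.abs(err).mean()
    mse = (err ** 2).mean()
    psnr = 10 * np.log10(data_range ** 2 / mse) if mse > 0 else np.inf
    return mae, mse, psnr

def tbr(img, lesion_mask, background_mask):
    """Target-to-background ratio: mean lesion uptake / mean background."""
    return img[lesion_mask].mean() / img[background_mask].mean()

rng = np.random.default_rng(2)
real = rng.random((16, 16))
synth = np.clip(real + 0.01, 0, 1)      # near-perfect synthetic image
mae, mse, psnr = synthesis_metrics(real, synth)
```

Agreement of lesion TBR between synthetic and real images, as the abstract reports, is the clinically relevant check: it tests whether relative uptake contrast, not just pixel fidelity, is preserved.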

Exchange of Quantitative Computed Tomography Assessed Body Composition Data Using Fast Healthcare Interoperability Resources as a Necessary Step Toward Interoperable Integration of Opportunistic Screening Into Clinical Practice: Methodological Development Study.

Wen Y, Choo VY, Eil JH, Thun S, Pinto Dos Santos D, Kast J, Sigle S, Prokosch HU, Ovelgönne DL, Borys K, Kohnke J, Arzideh K, Winnekens P, Baldini G, Schmidt CS, Haubold J, Nensa F, Pelka O, Hosch R

pubmed logopapersMay 21 2025
Fast Healthcare Interoperability Resources (FHIR) is a widely used standard for storing and exchanging health care data. At the same time, image-based artificial intelligence (AI) models for quantifying relevant body structures and organs from routine computed tomography (CT)/magnetic resonance imaging scans have emerged. The missing link, simultaneously a needed step in advancing personalized medicine, is the incorporation of measurements delivered by AI models into an interoperable and standardized format. Incorporating image-based measurements and biomarkers into FHIR profiles can standardize data exchange, enabling timely, personalized treatment decisions and improving the precision and efficiency of patient care. This study aims to present the synergistic incorporation of CT-derived body organ and composition measurements with FHIR, delineating an initial paradigm for storing image-based biomarkers. This study integrated the results of the Body and Organ Analysis (BOA) model into FHIR profiles to enhance the interoperability of image-based biomarkers in radiology. The BOA model was selected as an exemplary AI model due to its ability to provide detailed body composition and organ measurements from CT scans. The FHIR profiles were developed based on 2 primary observation types: Body Composition Analysis (BCA Observation) for quantitative body composition metrics and Body Structure Observation for organ measurements. These profiles were structured to interoperate with a specially designed Diagnostic Report profile, which references the associated Imaging Study, ensuring a standardized linkage between image data and derived biomarkers. To ensure interoperability, all labels were mapped to SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms) or RadLex terminologies using specific value sets. 
The profiles were developed using FHIR Shorthand (FSH) and SUSHI, enabling efficient definition and implementation guide generation, ensuring consistency and maintainability. In this study, 4 BOA profiles, namely, Body Composition Analysis Observation, Body Structure Volume Observation, Diagnostic Report, and Imaging Study, have been presented. These FHIR profiles, which cover 104 anatomical landmarks, 8 body regions, and 8 tissues, enable the interoperable usage of the results of AI segmentation models, providing a direct link between image studies, series, and measurements. The BOA profiles provide a foundational framework for integrating AI-derived imaging biomarkers into FHIR, bridging the gap between advanced imaging analytics and standardized health care data exchange. By enabling structured, interoperable representation of body composition and organ measurements, these profiles facilitate seamless integration into clinical and research workflows, supporting improved data accessibility and interoperability. Their adaptability allows for extension to other imaging modalities and AI models, fostering a more standardized and scalable approach to using imaging biomarkers in precision medicine. This work represents a step toward enhancing the integration of AI-driven insights into digital health ecosystems, ultimately contributing to more data-driven, personalized, and efficient patient care.
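A minimal example of what emitting one such measurement as a FHIR R4 Observation looks like in code. This is a hand-rolled sketch: the coding, units, and resource ids are placeholders, not the published BOA profile bindings:

```python
import json

def make_bca_observation(patient_id, code, display, value, unit,
                         imaging_study_id):
    """Build a minimal FHIR R4 Observation for one body-composition metric.

    The published BOA profiles bind code/unit to SNOMED CT / RadLex value
    sets and UCUM units; placeholder values are used here.
    """
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"coding": [{"system": "http://snomed.info/sct",
                             "code": code, "display": display}]},
        "subject": {"reference": f"Patient/{patient_id}"},
        # link the measurement back to the imaging study it was derived from
        "derivedFrom": [{"reference": f"ImagingStudy/{imaging_study_id}"}],
        "valueQuantity": {"value": value, "unit": unit,
                          "system": "http://unitsofmeasure.org",
                          "code": unit},
    }

obs = make_bca_observation("pat-1", "000000", "Skeletal muscle volume",
                           27.4, "kg", "study-1")
payload = json.dumps(obs, indent=2)
```

The `derivedFrom` reference is the key structural idea the profiles formalize: every AI-derived number stays traceable to the imaging study it came from.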
