Page 7 of 66652 results

Radiology Report Conditional 3D CT Generation with Multi-Encoder Latent Diffusion Model

Sina Amirrajab, Zohaib Salahuddin, Sheng Kuang, Henry C. Woodruff, Philippe Lambin

arXiv preprint · Sep 18, 2025
Text-to-image latent diffusion models have recently advanced medical image synthesis, but applications to 3D CT generation remain limited. Existing approaches rely on simplified prompts, neglecting the rich semantic detail in full radiology reports, which reduces text-image alignment and clinical fidelity. We propose Report2CT, a radiology-report-conditional latent diffusion framework for synthesizing 3D chest CT volumes directly from free-text radiology reports, incorporating both the findings and impression sections using multiple text encoders. Report2CT integrates three pretrained medical text encoders (BiomedVLP-CXR-BERT, MedEmbed, and ClinicalBERT) to capture nuanced clinical context. Radiology reports and voxel-spacing information condition a 3D latent diffusion model trained on 20,000 CT volumes from the CT-RATE dataset. Model performance was evaluated using Fréchet Inception Distance (FID) for real-synthetic distributional similarity and CLIP-based metrics for semantic alignment, with additional qualitative and quantitative comparisons against the GenerateCT model. Report2CT generated anatomically consistent CT volumes with excellent visual quality and text-image alignment. Multi-encoder conditioning improved CLIP scores, indicating stronger preservation of fine-grained clinical details from the free-text radiology reports. Classifier-free guidance further enhanced alignment with only a minor trade-off in FID. We ranked first in the VLM3D Challenge at MICCAI 2025 on Text-Conditional CT Generation and achieved state-of-the-art performance across all evaluation metrics. By leveraging complete radiology reports and multi-encoder text conditioning, Report2CT advances 3D CT synthesis, producing clinically faithful and high-quality synthetic data.
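
For readers unfamiliar with classifier-free guidance, the sketch below shows the standard sampling-time blend of conditional and unconditional noise estimates that the abstract refers to; the denoiser interface, embedding sizes, and guidance weight are illustrative assumptions, not the authors' implementation.

```python
import torch

def cfg_noise_estimate(denoiser, x_t, t, report_emb, null_emb, w=2.0):
    """Blend unconditional and report-conditional noise predictions.

    `denoiser(x_t, t, cond)` is assumed to return a noise estimate with the same
    shape as `x_t`. w=0 gives the unconditional model, w=1 the conditional one,
    and w>1 pushes samples toward the text condition (stronger alignment, at
    some cost in FID, as the abstract reports).
    """
    eps_uncond = denoiser(x_t, t, null_emb)   # condition dropped (null embedding)
    eps_cond = denoiser(x_t, t, report_emb)   # full radiology-report condition
    return eps_uncond + w * (eps_cond - eps_uncond)

# Toy usage with a stand-in denoiser, just to check shapes.
dummy = lambda x, t, c: torch.zeros_like(x) + c.mean()
latent = torch.randn(1, 4, 16, 16, 16)        # illustrative latent CT volume
eps = cfg_noise_estimate(dummy, latent, t=10,
                         report_emb=torch.ones(1, 768),
                         null_emb=torch.zeros(1, 768))
```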

HybridMamba: A Dual-domain Mamba for 3D Medical Image Segmentation

Weitong Wu, Zhaohu Xing, Jing Gong, Qin Peng, Lei Zhu

arXiv preprint · Sep 18, 2025
In 3D biomedical image segmentation, Mamba exhibits superior performance because it addresses the limitations of CNNs in modeling long-range dependencies and mitigates the substantial computational overhead of Transformer-based frameworks when processing high-resolution medical volumes. However, placing undue emphasis on global context modeling may inadvertently compromise critical local structural information, leading to boundary ambiguity and regional distortion in segmentation outputs. We therefore propose HybridMamba, an architecture employing two complementary mechanisms: 1) a feature scanning strategy that progressively integrates representations from both axial-traversal and local-adaptive pathways to balance local and global representations, and 2) a gated module combining spatial-frequency analysis for comprehensive contextual modeling. In addition, we collect a multi-center CT dataset related to lung cancer. Experiments on MRI and CT datasets demonstrate that HybridMamba significantly outperforms state-of-the-art methods in 3D medical image segmentation.
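
The gated module is only described at a high level; below is a minimal sketch of one plausible gate-fusion of a global (axial-traversal) pathway with a local-adaptive pathway for 3D feature maps. The layer choices are assumptions, not the released HybridMamba code.

```python
import torch
import torch.nn as nn

class GatedFusion3D(nn.Module):
    """Voxel-wise gated blend of two 3D feature pathways (illustrative only)."""

    def __init__(self, channels: int):
        super().__init__()
        # Gate predicted from both pathways; a 1x1x1 conv keeps it lightweight.
        self.gate = nn.Sequential(
            nn.Conv3d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, global_feat: torch.Tensor, local_feat: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([global_feat, local_feat], dim=1))
        return g * global_feat + (1.0 - g) * local_feat  # convex blend per voxel

x_global = torch.randn(1, 32, 16, 64, 64)   # e.g. Mamba (global) pathway features
x_local = torch.randn(1, 32, 16, 64, 64)    # e.g. local-adaptive pathway features
fused = GatedFusion3D(32)(x_global, x_local)  # -> (1, 32, 16, 64, 64)
```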

Multimodal radiomics fusion for predicting postoperative recurrence in NSCLC patients.

Mehri-Kakavand G, Mdletshe S, Amini M, Wang A

PubMed paper · Sep 18, 2025
Postoperative recurrence in non-small cell lung cancer (NSCLC) affects up to 55% of patients, underscoring the limits of TNM staging. We assessed multimodal radiomics—positron emission tomography (PET), computed tomography (CT), and clinicopathological (CP) data—for personalized recurrence prediction. Data from 131 NSCLC patients with PET/CT imaging and CP variables were analysed. Radiomics features were extracted using PyRadiomics (1,316 PET and 1,409 CT features per tumor), with robustness testing and selection yielding 20 CT, 20 PET, and 23 CP variables. Prediction models were trained using Logistic Regression (L1, L2, Elastic Net), Random Forest, Gradient Boosting, XGBoost, and CatBoost. Nested cross-validation with SMOTE addressed class imbalance. Fusion strategies included early (feature concatenation), intermediate (stacked ensembles), and late (weighted averaging) fusion. Among single modalities, CT with Elastic Net achieved the highest cross-validated AUC (0.679, 95% CI: 0.57–0.79). Fusion improved performance: PET + CT + Clinical late fusion with Elastic Net achieved the best cross-validated AUC (0.811, 95% CI: 0.69–0.91). Out-of-fold ROC curves confirmed stronger discrimination for the fusion model (AUC = 0.836 vs. 0.741 for CT). Fusion also showed better calibration, higher net clinical benefit (decision-curve analysis), and clearer survival stratification (Kaplan–Meier). Integrating PET, CT, and CP data—particularly via late fusion with Elastic Net—enhances discrimination beyond single-modality models and supports more consistent risk stratification. These findings suggest practical potential for informing postoperative surveillance and adjuvant therapy decisions, encouraging a shift beyond TNM alone toward interpretable multimodal frameworks. External validation in larger, multicenter cohorts is warranted. The online version contains supplementary material available at 10.1007/s00432-025-06311-w.
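
As a toy illustration of the late-fusion strategy reported to perform best (weighted averaging of per-modality Elastic Net probabilities), the sketch below stands in synthetic arrays for the CT, PET, and CP feature sets; the fusion weights and hyperparameters are placeholders rather than the study's tuned values.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 131
X_ct, X_pet, X_cp = rng.normal(size=(n, 20)), rng.normal(size=(n, 20)), rng.normal(size=(n, 23))
y = rng.integers(0, 2, size=n)                      # recurrence labels (synthetic here)

def elastic_net_clf():
    return make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="elasticnet", solver="saga",
                           l1_ratio=0.5, C=1.0, max_iter=5000),
    )

probs = []
for X in (X_ct, X_pet, X_cp):
    clf = elastic_net_clf().fit(X, y)
    probs.append(clf.predict_proba(X)[:, 1])        # per-modality recurrence probability

weights = np.array([0.4, 0.4, 0.2])                 # assumed modality weights; would be tuned
fused = np.average(np.column_stack(probs), axis=1, weights=weights)
print("in-sample AUC (toy data):", roc_auc_score(y, fused))
```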

Dose reduction in 4D CT imaging: Breathing signal-guided deep learning-driven data acquisition.

Wimmert L, Gauer T, Dickmann J, Hofmann C, Sentker T, Werner R

PubMed paper · Sep 18, 2025
4D CT imaging is essential for radiotherapy planning in thoracic tumors. However, current protocols tend to acquire more projection data than is strictly necessary for reconstructing the 4D CT, potentially leading to unnecessary radiation exposure and misalignment with the ALARA (As Low As Reasonably Achievable) principle. We propose a deep learning (DL)-driven approach that uses the patient's breathing signal to guide data acquisition, aiming to acquire only the necessary projection data. This retrospective study analyzed 1,415 breathing signals from 294 patients, with a 75/25 training/validation split at patient level. Based on the signals, a DL model was trained to predict optimal beam-on events for projection data acquisition. Model testing was performed on 104 independent clinical 4D CT scans. The performance of the model was assessed by measuring the temporal alignment between predicted and optimal beam-on events. To assess the impact on the reconstructed images, each 4D dataset was reconstructed twice: (1) using all clinically acquired projections (reference) and (2) using only the model-selected projections (dose-reduced). Reference and dose-reduced images were compared using Dice coefficients for organ segmentations, deformable image registration (DIR)-based displacement fields, artifact frequency, and tumor segmentation agreement, the latter evaluated in terms of Hausdorff distance and tumor motion ranges. The proposed approach reduced beam-on time and imaging dose by a median of 29% (IQR: 24-35%), corresponding to an 11.6 mGy dose reduction for a standard 4D CT CTDIvol of 40 mGy. Temporal alignment between predicted and optimal beam-on events showed only marginal differences. Similarly, reconstructed dose-reduced images showed only minimal differences from the reference images, demonstrated by high lung and liver segmentation Dice values, small-magnitude DIR displacement fields, and unchanged artifact frequency. Minor deviations of tumor segmentation and motion ranges compared to the reference suggest only minimal impact of the proposed approach on treatment planning. The proposed DL-driven data acquisition approach can reduce radiation exposure during 4D CT imaging while preserving diagnostic quality, offering a clinically viable, ALARA-adherent solution for 4D CT imaging.
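
A minimal sketch of the kind of signal-to-beam-on prediction described above follows; the architecture, sampling rate, and decision threshold are assumptions, not the authors' model.

```python
import torch
import torch.nn as nn

class BeamOnPredictor(nn.Module):
    """Toy 1D CNN mapping a respiratory signal to per-sample beam-on probabilities."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(16, 16, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(16, 1, kernel_size=1),              # logit per time sample
        )

    def forward(self, breathing: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(breathing))         # beam-on probability in [0, 1]

signal = torch.randn(1, 1, 2500)                           # e.g. 100 s at 25 Hz (illustrative)
beam_on = BeamOnPredictor()(signal) > 0.5                  # projections acquired only here
duty_cycle = beam_on.float().mean().item()
# A beam-on fraction of about 0.71 would correspond to the reported median 29% dose reduction.
print(f"predicted beam-on fraction: {duty_cycle:.0%}")
```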

Evaluating the diagnostic accuracy of WHO-recommended treatment decision algorithms for childhood tuberculosis using an individual person dataset: a study protocol.

Olbrich L, Larsson L, Dodd PJ, Palmer M, Nguyen MHTN, d'Elbée M, Hesseling AC, Heinrich N, Zar HJ, Ntinginya NE, Khosa C, Nliwasa M, Verghese V, Bonnet M, Wobudeya E, Nduna B, Moh R, Mwanga J, Mustapha A, Breton G, Taguebue JV, Borand L, Marcy O, Chabala C, Seddon J, van der Zalm MM

PubMed paper · Sep 17, 2025
In 2022, the WHO conditionally recommended the use of treatment decision algorithms (TDAs) for treatment decision-making in children <10 years with presumptive tuberculosis (TB), aiming to decrease the substantial case detection gap and improve treatment access in high TB-incidence settings. WHO also called for external validation of these TDAs. Within the Decide-TB project (PACT ID: PACTR202407866544155, 23 July 2024), we aim to generate an individual-participant dataset (IPD) from prospective TB diagnostic accuracy cohorts (RaPaed-TB, UMOYA and two cohorts from TB-Speed). Using the IPD, we aim to: (1) assess the diagnostic accuracy of published TDAs using a set of consensus case definitions produced by the National Institutes of Health as the reference standard (confirmed and unconfirmed vs unlikely TB); (2) evaluate the added value of novel tools (including biomarkers and artificial intelligence-interpreted radiology) in the existing TDAs; (3) generate an artificial population, modelling the target population of children eligible for WHO-endorsed TDAs presenting at primary and secondary healthcare levels, and assess the diagnostic accuracy of published TDAs; and (4) identify clinical predictors of radiological disease severity in children from the study population of children with presumptive TB. This study will externally validate the first data-driven WHO TDAs in a large, well-characterised and diverse paediatric IPD derived from four large paediatric cohorts of children investigated for TB. The study has received ethical clearance for sharing secondary deidentified data from the ethics committees of the parent studies (RaPaed-TB, UMOYA and TB-Speed), and as the aims of this study were part of the parent studies' protocols, a separate approval was not necessary. Study findings will be published in peer-reviewed journals and disseminated at local, regional and international scientific meetings and conferences. This database will serve as a catalyst for the assessment of the inclusion of novel tools and the generation of an artificial population to simulate the impact of novel diagnostic pathways for TB in children at lower levels of healthcare. TDAs have the potential to close the diagnostic gap in childhood TB. Further fine-tuning of the currently available algorithms will facilitate this and improve access to care.
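
Aim (1) reduces to standard diagnostic accuracy estimation against the NIH consensus reference (confirmed and unconfirmed TB counted as positive, unlikely TB as negative); a minimal sketch with made-up toy data is shown below, not the study's analysis code.

```python
import numpy as np

def diagnostic_accuracy(tda_treat: np.ndarray, reference: np.ndarray) -> dict:
    """tda_treat: boolean TDA treat decision; reference: 'confirmed' | 'unconfirmed' | 'unlikely'."""
    positive = np.isin(reference, ["confirmed", "unconfirmed"])   # composite reference standard
    tp = np.sum(tda_treat & positive)
    fn = np.sum(~tda_treat & positive)
    tn = np.sum(~tda_treat & ~positive)
    fp = np.sum(tda_treat & ~positive)
    return {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp)}

# Toy example (five hypothetical children).
decisions = np.array([True, True, False, True, False])
reference = np.array(["confirmed", "unconfirmed", "unlikely", "unlikely", "unlikely"])
print(diagnostic_accuracy(decisions, reference))
```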

¹⁸F-FDG PET/CT-based Radiomics Analysis of Different Machine Learning Models for Predicting Pathological Highly Invasive Non-small Cell Lung Cancer.

Li Y, Shen MJ, Yi JW, Zhao QQ, Zhao QP, Hao LY, Qi JJ, Li WH, Wu XD, Zhao L, Wang Y

PubMed paper · Sep 17, 2025
This study aimed to develop and validate machine learning models integrating clinicoradiological and radiomic features from 2-[¹⁸F]-fluoro-2-deoxy-D-glucose (¹⁸F-FDG) positron emission tomography/computed tomography (PET/CT) to predict pathological high invasiveness in cT1-sized (tumor size ≤ 3 cm) non-small cell lung cancer (NSCLC). We retrospectively reviewed 1459 patients with NSCLC (633 with pathological high invasiveness and 826 with pathological non-high invasiveness) from two medical centers. Patients with cT1-sized NSCLC were included. In total, 1145 radiomic features were extracted per modality (PET and CT) from each patient. Optimal predictors were selected to construct a radiomics score (Rad-score) for the PET/CT radiomics model. A combined model incorporating significant clinicoradiological features and the Rad-score was developed. Logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGBoost) algorithms were used to train the combined model. Model performance was assessed using the area under the receiver operating characteristic (ROC) curve (AUC), calibration curves, and decision curve analysis (DCA). Shapley Additive Explanations (SHAP) was applied to visualize the prediction process. The radiomics model was built using 11 radiomic features, achieving AUCs of 0.851 (training), 0.859 (internal validation), and 0.829 (external validation). Among all models, the XGBoost combined model demonstrated the best predictive performance, with AUCs of 0.958, 0.919, and 0.903, respectively, along with good calibration and high net benefit. The XGBoost combined model showed strong performance in predicting pathological high invasiveness in cT1-sized NSCLC.
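
A hedged sketch of a combined model of this kind (a linear Rad-score concatenated with clinicoradiological variables and fed to XGBoost) is shown below; the features, labels, and hyperparameters are placeholders, not the study's pipeline.

```python
import numpy as np
import xgboost as xgb
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 500
radiomic = rng.normal(size=(n, 11))            # 11 selected radiomic features (synthetic)
coef = rng.normal(size=11)
rad_score = radiomic @ coef                    # linear Rad-score (assumed form)
clinical = rng.normal(size=(n, 4))             # hypothetical clinicoradiological variables
X = np.column_stack([clinical, rad_score])
y = (rad_score + clinical[:, 1] + rng.normal(size=n) > 0).astype(int)

model = xgb.XGBClassifier(n_estimators=200, max_depth=3,
                          learning_rate=0.05, eval_metric="auc")
model.fit(X, y)
print("in-sample AUC (toy):", roc_auc_score(y, model.predict_proba(X)[:, 1]))

# SHAP attribution (optional, requires the `shap` package):
#   import shap; shap.TreeExplainer(model).shap_values(X)
```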

MPCM-RRG: Multi-modal Prompt Collaboration Mechanism for Radiology Report Generation.

Yu Y, Huang G, Tan Z, Shi J, Li M, Pun CM, Zheng F, Ma S, Wang S, He L

PubMed paper · Sep 17, 2025
The task of medical report generation involves automatically creating descriptive text reports from medical images, with the aim of alleviating the workload of physicians and enhancing diagnostic efficiency. However, although many existing medical report generation models based on the Transformer framework consider structural information in medical images, they ignore the interference of confounding factors on these structures, which limits the model's ability to effectively capture rich and critical lesion information. Furthermore, these models often struggle to address the significant imbalance between normal and abnormal content in actual reports, leading to challenges in accurately describing abnormalities. To address these limitations, we propose the Multi-modal Prompt Collaboration Mechanism for Radiology Report Generation Model (MPCM-RRG). This model consists of three key components: the Visual Causal Prompting Module (VCP), the Textual Prompt-Guided Feature Enhancement Module (TPGF), and the Visual-Textual Semantic Consistency Module (VTSC). The VCP module uses chest X-ray masks as visual prompts and incorporates causal inference principles to help the model minimize the influence of irrelevant regions. Through causal intervention, the model can learn the causal relationships between the pathological regions in the image and the corresponding findings described in the report. The TPGF module tackles the imbalance between abnormal and normal text by integrating detailed textual prompts, which also guide the model to focus on lesion areas using a multi-head attention mechanism. The VTSC module promotes alignment between the visual and textual representations through contrastive consistency loss, fostering greater interaction and collaboration between the visual and textual prompts. Experimental results demonstrate that MPCM-RRG outperforms other methods on the IU X-ray and MIMIC-CXR datasets, highlighting its effectiveness in generating high-quality medical reports.
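
The visual-textual consistency objective is described as a contrastive loss; a minimal sketch using a standard symmetric InfoNCE formulation follows, with the caveat that the paper's exact loss may differ.

```python
import torch
import torch.nn.functional as F

def contrastive_consistency_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                                 temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE between paired visual and textual representations."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature                 # (B, B) cosine similarities
    targets = torch.arange(img.size(0), device=img.device)
    return 0.5 * (F.cross_entropy(logits, targets) +     # image-to-text direction
                  F.cross_entropy(logits.t(), targets))  # text-to-image direction

loss = contrastive_consistency_loss(torch.randn(8, 512), torch.randn(8, 512))
```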

Exploring the Capabilities of LLM Encoders for Image-Text Retrieval in Chest X-rays

Hanbin Ko, Gihun Cho, Inhyeok Baek, Donguk Kim, Joonbeom Koo, Changi Kim, Dongheon Lee, Chang Min Park

arXiv preprint · Sep 17, 2025
Vision-language pretraining has advanced image-text alignment, yet progress in radiology remains constrained by the heterogeneity of clinical reports, including abbreviations, impression-only notes, and stylistic variability. Unlike general-domain settings where more data often leads to better performance, naively scaling to large collections of noisy reports can plateau or even degrade model learning. We ask whether large language model (LLM) encoders can provide robust clinical representations that transfer across diverse styles and better guide image-text alignment. We introduce LLM2VEC4CXR, a domain-adapted LLM encoder for chest X-ray reports, and LLM2CLIP4CXR, a dual-tower framework that couples this encoder with a vision backbone. LLM2VEC4CXR improves clinical text understanding over BERT-based baselines, handles abbreviations and style variation, and achieves strong clinical alignment on report-level metrics. LLM2CLIP4CXR leverages these embeddings to boost retrieval accuracy and clinically oriented scores, with stronger cross-dataset generalization than prior medical CLIP variants. Trained on 1.6M CXR studies from public and private sources with heterogeneous and noisy reports, our models demonstrate that robustness, not scale alone, is the key to effective multimodal learning. We release models to support further research in medical image-text representation learning.
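
Retrieval accuracy in this setting is typically scored with Recall@K over cosine similarities between normalized image and report embeddings; a minimal sketch of that evaluation on random placeholder embeddings follows (it is not the released evaluation code).

```python
import numpy as np

def recall_at_k(img_emb: np.ndarray, txt_emb: np.ndarray, k: int = 5) -> float:
    """Image-to-report Recall@K; row i of each matrix is assumed to be a matched pair."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    sims = img @ txt.T                                   # (N, N) cosine similarity matrix
    ranks = np.argsort(-sims, axis=1)                    # reports sorted by similarity per image
    hits = [i in ranks[i, :k] for i in range(len(img))]  # true report within top-k?
    return float(np.mean(hits))

rng = np.random.default_rng(0)
emb_img, emb_txt = rng.normal(size=(100, 256)), rng.normal(size=(100, 256))
print("Recall@5 (random embeddings):", recall_at_k(emb_img, emb_txt, k=5))
```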

More performant and scalable: Rethinking contrastive vision-language pre-training of radiology in the LLM era

Yingtai Li, Haoran Lai, Xiaoqian Zhou, Shuai Ming, Wenxin Ma, Wei Wei, Shaohua Kevin Zhou

arXiv preprint · Sep 16, 2025
The emergence of Large Language Models (LLMs) presents unprecedented opportunities to revolutionize medical contrastive vision-language pre-training. In this paper, we show how LLMs can facilitate large-scale supervised pre-training, thereby advancing vision-language alignment. We begin by demonstrating that modern LLMs can automatically extract diagnostic labels from radiology reports with remarkable precision (>96% AUC in our experiments) without complex prompt engineering, enabling the creation of large-scale "silver-standard" datasets at a minimal cost (~$3 for 50k CT image-report pairs). Further, we find that a vision encoder trained on this "silver-standard" dataset achieves performance comparable to that of encoders trained on labels extracted by specialized BERT-based models, thereby democratizing access to large-scale supervised pre-training. Building on this foundation, we proceed to reveal that supervised pre-training fundamentally improves contrastive vision-language alignment. Our approach achieves state-of-the-art performance using only a 3D ResNet-18 with vanilla CLIP training, including 83.8% AUC for zero-shot diagnosis on CT-RATE, 77.3% AUC on RAD-ChestCT, and substantial improvements in cross-modal retrieval (MAP@50 = 53.7% for image-image, Recall@100 = 52.2% for report-image). These results demonstrate the potential of utilizing LLMs to facilitate more performant and scalable medical AI systems. Our code is available at https://github.com/SadVoxel/More-performant-and-scalable.
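
Zero-shot diagnosis with a CLIP-style model is usually scored by comparing an image embedding against positive and negative text prompts; the sketch below illustrates that scoring with placeholder embeddings and is not the authors' exact setup.

```python
import torch
import torch.nn.functional as F

def zero_shot_score(image_emb: torch.Tensor, pos_prompt_emb: torch.Tensor,
                    neg_prompt_emb: torch.Tensor) -> torch.Tensor:
    """Probability of a finding from similarity to 'finding present' vs 'no finding' prompts."""
    img = F.normalize(image_emb, dim=-1)
    pos = F.normalize(pos_prompt_emb, dim=-1)
    neg = F.normalize(neg_prompt_emb, dim=-1)
    logits = torch.stack([img @ neg, img @ pos], dim=-1)   # (B, 2): [absent, present]
    return logits.softmax(dim=-1)[..., 1]                  # probability the finding is present

# Placeholder embeddings; in practice these would come from the trained image and text encoders.
scores = zero_shot_score(torch.randn(4, 512), torch.randn(512), torch.randn(512))
```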

Data Scaling Laws for Radiology Foundation Models

Maximilian Ilse, Harshita Sharma, Anton Schwaighofer, Sam Bond-Taylor, Fernando Pérez-García, Olesya Melnichenko, Anne-Marie G. Sykes, Kelly K. Horst, Ashish Khandelwal, Maxwell Reynolds, Maria T. Wetscherek, Noel C. F. Codella, Javier Alvarez-Valle, Korfiatis Panagiotis, Valentina Salvatelli

arXiv preprint · Sep 16, 2025
Foundation vision encoders such as CLIP and DINOv2, trained on web-scale data, exhibit strong transfer performance across tasks and datasets. However, medical imaging foundation models remain constrained by smaller datasets, limiting our understanding of how data scale and pretraining paradigms affect performance in this setting. In this work, we systematically study continual pretraining of two vision encoders, MedImageInsight (MI2) and RAD-DINO, representing the two major encoder paradigms CLIP and DINOv2, on up to 3.5M chest X-rays from a single institution, holding compute and evaluation protocols constant. We evaluate on classification (radiology findings, lines and tubes), segmentation (lines and tubes), and radiology report generation. While prior work has primarily focused on tasks related to radiology findings, we include lines-and-tubes tasks to counterbalance this bias and evaluate a model's ability to extract features that preserve continuity along elongated structures. Our experiments show that MI2 scales more effectively for finding-related tasks, while RAD-DINO is stronger on tube-related tasks. Surprisingly, continually pretraining MI2 with both reports and structured labels using UniCL improves performance, underscoring the value of structured supervision at scale. We further show that for some tasks, as few as 30k in-domain samples are sufficient to surpass open-weights foundation models. These results highlight the utility of center-specific continual pretraining, enabling medical institutions to derive significant performance gains by utilizing in-domain data.
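
A common way to probe data scaling of this kind is a linear probe on frozen encoder features at increasing in-domain sample counts; the sketch below illustrates the idea on synthetic features and is an assumption, not the paper's protocol.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
feats = rng.normal(size=(5000, 256))                  # frozen-encoder features (synthetic)
labels = (feats[:, 0] + 0.5 * rng.normal(size=5000) > 0).astype(int)
test_idx = np.arange(4000, 5000)                      # held-out evaluation split

for n_train in (250, 1000, 4000):                     # stand-ins for increasing data scales
    clf = LogisticRegression(max_iter=2000).fit(feats[:n_train], labels[:n_train])
    auc = roc_auc_score(labels[test_idx], clf.predict_proba(feats[test_idx])[:, 1])
    print(f"n_train={n_train}: AUC={auc:.3f}")
```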
