Latest Papers on Radiology AI. Tags: GenAI

LD-ViCE: Latent Diffusion Model for Video Counterfactual Explanations

Payal Varshney, Adriano Lucieri, Christoph Balada, Sheraz Ahmed, Andreas Dengel

•preprint•Sep 10 2025

Video-based AI systems are increasingly adopted in safety-critical domains such as autonomous driving and healthcare. However, interpreting their decisions remains challenging due to the inherent spatiotemporal complexity of video data and the opacity of deep learning models. Existing explanation techniques often suffer from limited temporal coherence, insufficient robustness, and a lack of actionable causal insights. Current counterfactual explanation methods typically do not incorporate guidance from the target model, reducing semantic fidelity and practical utility. We introduce Latent Diffusion for Video Counterfactual Explanations (LD-ViCE), a novel framework designed to explain the behavior of video-based AI models. Compared to previous approaches, LD-ViCE reduces the computational costs of generating explanations by operating in latent space using a state-of-the-art diffusion model, while producing realistic and interpretable counterfactuals through an additional refinement step. Our experiments demonstrate the effectiveness of LD-ViCE across three diverse video datasets, including EchoNet-Dynamic (cardiac ultrasound), FERV39k (facial expression), and Something-Something V2 (action recognition). LD-ViCE outperforms a recent state-of-the-art method, achieving an increase in R2 score of up to 68% while reducing inference time by half. Qualitative analysis confirms that LD-ViCE generates semantically meaningful and temporally coherent explanations, offering valuable insights into the target model behavior. LD-ViCE represents a valuable step toward the trustworthy deployment of AI in safety-critical domains.

Ultrasound Image Synthesis Cardiac Methodology In Silico Academic Lab GenAI

RoentMod: A Synthetic Chest X-Ray Modification Model to Identify and Correct Image Interpretation Model Shortcuts

Lauren H. Cooke, Matthias Jung, Jan M. Brendel, Nora M. Kerkovits, Borek Foldyna, Michael T. Lu, Vineet K. Raghu

•preprint•Sep 10 2025

Chest radiographs (CXRs) are among the most common tests in medicine. Automated image interpretation may reduce radiologists\' workload and expand access to diagnostic expertise. Deep learning multi-task and foundation models have shown strong performance for CXR interpretation but are vulnerable to shortcut learning, where models rely on spurious and off-target correlations rather than clinically relevant features to make decisions. We introduce RoentMod, a counterfactual image editing framework that generates anatomically realistic CXRs with user-specified, synthetic pathology while preserving unrelated anatomical features of the original scan. RoentMod combines an open-source medical image generator (RoentGen) with an image-to-image modification model without requiring retraining. In reader studies with board-certified radiologists and radiology residents, RoentMod-produced images appeared realistic in 93\% of cases, correctly incorporated the specified finding in 89-99\% of cases, and preserved native anatomy comparable to real follow-up CXRs. Using RoentMod, we demonstrate that state-of-the-art multi-task and foundation models frequently exploit off-target pathology as shortcuts, limiting their specificity. Incorporating RoentMod-generated counterfactual images during training mitigated this vulnerability, improving model discrimination across multiple pathologies by 3-19\% AUC in internal validation and by 1-11\% for 5 out of 6 tested pathologies in external testing. These findings establish RoentMod as a broadly applicable tool for probing and correcting shortcut learning in medical AI. By enabling controlled counterfactual interventions, RoentMod enhances the robustness and interpretability of CXR interpretation models and provides a generalizable strategy for improving foundation models in medical imaging.

X-Ray Image Synthesis Chest Methodology In Silico Academic Lab GenAI Benchmark SOTA Open Code

Leveraging GPT-4o for Automated Extraction and Categorization of CAD-RADS Features From Free-Text Coronary CT Angiography Reports: Diagnostic Study.

Chen Y, Dong M, Sun J, Meng Z, Yang Y, Muhetaier A, Li C, Qin J

•papers•Sep 10 2025

Despite the Coronary Artery Reporting and Data System (CAD-RADS) providing a standardized approach, radiologists continue to favor free-text reports. This preference creates significant challenges for data extraction and analysis in longitudinal studies, potentially limiting large-scale research and quality assessment initiatives. To evaluate the ability of the generative pre-trained transformer (GPT)-4o model to convert real-world coronary computed tomography angiography (CCTA) free-text reports into structured data and automatically identify CAD-RADS categories and P categories. This retrospective study analyzed CCTA reports from January 2024 and July 2024. A subset of 25 reports was used for prompt engineering to instruct the large language models (LLMs) in extracting CAD-RADS categories, P categories, and the presence of myocardial bridges and noncalcified plaques. Reports were processed using the GPT-4o API (application programming interface) and custom Python scripts. The ground truth was established by radiologists based on the CAD-RADS 2.0 guidelines. Model performance was assessed using accuracy, sensitivity, specificity, and F1-score. Intrarater reliability was assessed using Cohen κ coefficient. Among 999 patients (median age 66 y, range 58-74; 650 males), CAD-RADS categorization showed accuracy of 0.98-1.00 (95% CI 0.9730-1.0000), sensitivity of 0.95-1.00 (95% CI 0.9191-1.0000), specificity of 0.98-1.00 (95% CI 0.9669-1.0000), and F1-score of 0.96-1.00 (95% CI 0.9253-1.0000). P categories demonstrated accuracy of 0.97-1.00 (95% CI 0.9569-0.9990), sensitivity from 0.90 to 1.00 (95% CI 0.8085-1.0000), specificity from 0.97 to 1.00 (95% CI 0.9533-1.0000), and F1-score from 0.91 to 0.99 (95% CI 0.8377-0.9967). Myocardial bridge detection achieved an accuracy of 0.98 (95% CI 0.9680-0.9870), and noncalcified coronary plaques detection showed an accuracy of 0.98 (95% CI 0.9680-0.9870). Cohen κ values for all classifications exceeded 0.98. The GPT-4o model efficiently and accurately converts CCTA free-text reports into structured data, excelling in CAD-RADS classification, plaque burden assessment, and detection of myocardial bridges and calcified plaques.

CT LLM Radiology Report Cardiac Retrospective Clinical In Silico Academic Lab GenAI

A comprehensive review of techniques, algorithms, advancements, challenges, and clinical applications of multi-modal medical image fusion for improved diagnosis.

Zubair M, Hussain M, Albashrawi MA, Bendechache M, Owais M

•papers•Sep 9 2025

Multi-modal medical image fusion (MMIF) is increasingly recognized as an essential technique for enhancing diagnostic precision and facilitating effective clinical decision-making within computer-aided diagnosis systems. MMIF combines data from X-ray, MRI, CT, PET, SPECT, and ultrasound to create detailed, clinically useful images of patient anatomy and pathology. These integrated representations significantly advance diagnostic accuracy, lesion detection, and segmentation. This comprehensive review meticulously surveys the evolution, methodologies, algorithms, current advancements, and clinical applications of MMIF. We present a critical comparative analysis of traditional fusion approaches, including pixel-, feature-, and decision-level methods, and delves into recent advancements driven by deep learning, generative models, and transformer-based architectures. A critical comparative analysis is presented between these conventional methods and contemporary techniques, highlighting differences in robustness, computational efficiency, and interpretability. The article addresses extensive clinical applications across oncology, neurology, and cardiology, demonstrating MMIF's vital role in precision medicine through improved patient-specific therapeutic outcomes. Moreover, the review thoroughly investigates the persistent challenges affecting MMIF's broad adoption, including issues related to data privacy, heterogeneity, computational complexity, interpretability of AI-driven algorithms, and integration within clinical workflows. It also identifies significant future research avenues, such as the integration of explainable AI, adoption of privacy-preserving federated learning frameworks, development of real-time fusion systems, and standardization efforts for regulatory compliance. This review organizes key knowledge, outlines challenges, and highlights opportunities, guiding researchers, clinicians, and developers in advancing MMIF for routine clinical use and promoting personalized healthcare. To support further research, we provide a GitHub repository that includes popular multi-modal medical imaging datasets along with recent models in our shared GitHub repository.

Mixed Modality Image Synthesis Whole Body Review Concept Academic Lab Open Dataset Open Code GenAI Ethics

Assessing the ability of large language models to simplify lumbar spine imaging reports into patient-facing text: a pilot study of GPT-4.

Khazanchi R, Chen AR, Desai P, Herrera D, Staub JR, Follett MA, Krushelnytskyy M, Kemeny H, Hsu WK, Patel AA, Divi SN

•papers•Sep 9 2025

To assess the ability of large language models (LLMs) to accurately simplify lumbar spine magnetic resonance imaging (MRI) reports. Patients who underwent lumbar decompression and/or fusion surgery in 2022 at one tertiary academic medical center were queried using appropriate CPT codes. We then identified all patients with a preoperative ICD diagnosis of lumbar spondylolisthesis and extracted the latest preoperative spine MRI radiology report text. The GPT-4 API was deployed on deidentified reports with a prompt to produce translations and evaluated for accuracy and readability. An enhanced GPT prompt was constructed using high-scoring reports and evaluated on low-scoring reports. Of 93 included reports, GPT effectively reduced the average reading level (11.47 versus 8.50, p < 0.001). While most reports had no accuracy issues, 34% of translations omitted at least one clinically relevant piece of information, while 6% produced a clinically significant inaccuracy in the translation. An enhanced prompt model using high scoring reports-maintained reading level while significantly improving omission rate (p < 0.0001). However, even in the enhanced prompt model, GPT made several errors regarding location of stenosis, description of prior spine surgery, and description of other spine pathologies. GPT-4 effectively simplifies the reading level of lumbar spine MRI reports. The model tends to omit key information in its translations, which can be mitigated with enhanced prompting. Further validation in the domain of spine radiology needs to be performed to facilitate clinical integration.

MRI LLM Radiology Report Musculoskeletal Retrospective Clinical In Silico Academic Lab GenAI

Brain CT for Diagnosis of Intracranial Disease in Ambulatory Cancer Patients: Assessment of the Diagnostic Value of Scanning Without Contrast Prior to With Contrast.

Wang E, Darbandi A, Tu L, Ballester LY, Morales CJ, Chen M, Gule-Monroe MK, Johnson JM

•papers•Sep 9 2025

Brain imaging with MRI or CT is standard in screening for intracranial disease among ambulatory cancer patients. Although MRI offers greater sensitivity, CT is frequently employed due to its accessibility, affordability, and faster acquisition time. However, the necessity of routinely performing a non-contrast CT with the contrast-enhanced study is unknown. This study evaluates the clinical and economic utility of the non-contrast portion of the brain CT examination. A board-certified neuroradiologist reviewed 737 brain CT reports from outpatients at MD Anderson Cancer Center who underwent contrast and non-contrast CT for cancer staging (October 2014 to March 2016) to assess if significant findings were identified only on non-contrast CT. A GPT-3 model was then fine-tuned to extract reports with a high likelihood of unique and significant non-contrast findings from 1,980 additional brain CT reports (January 2017 to April 2022). These reports were manually reviewed by two neuroradiologists, with adjudication by a third reviewer if needed. The incremental cost-effectiveness ratio of non-contrast CT inclusion was then calculated based on Medicare reimbursement and the 95% confidence interval of the proportion of all reports in which non-contrast CT was necessary for identifying significant findings RESULTS: Seven of 737 reports in the initial dataset revealed significant findings unique to the non-contrast CT, all of which were hemorrhage. The GPT-3 model identified 145 additional reports with a high unique non-contrast CT finding likelihood for manual review from the second dataset of 1,980 reports. 19 of these reports were found to have unique and significant non-contrast CT findings. In total, 0.96% (95% CI: 0.63% -1.40%) of reports had significant findings identified only on non-contrast CT. The incremental cost-effectiveness ratio for identification of a single significant finding on non-contrast CT missed on the contrast-enhanced study was $1,855 to $4,122. In brain CT for ambulatory screening for intracranial disease in cancer patients, non-contrast CT offers limited additional diagnostic value compared to contrast-enhanced CT alone. Considering the associated financial cost, workload, and patient radiation exposure associated with performing a non-contrast CT, contrast-enhanced brain CT alone is sufficient for cancer staging in asymptomatic cancer patients. GPT-3＝ Generative Pretrained Transformers 3.

CT Classification Neurological Retrospective Clinical In Silico Academic Lab GenAI

New imaging techniques and trends in radiology.

Kantarcı M, Aydın S, Oğul H, Kızılgöz V

•papers•Sep 8 2025

Radiography is a field of medicine inherently intertwined with technology. The dependency on technology is very high for obtaining images in ultrasound (US), computed tomography (CT), and magnetic resonance imaging (MRI). Although the reduction in radiation dose is not applicable in US and MRI, advancements in technology have made it possible in CT, with ongoing studies aimed at further optimization. The resolution and diagnostic quality of images obtained through advancements in each modality are steadily improving. Additionally, technological progress has significantly shortened acquisition times for CT and MRI. The use of artificial intelligence (AI), which is becoming increasingly widespread worldwide, has also been incorporated into radiography. This technology can produce more accurate and reproducible results in US examinations. Machine learning offers great potential for improving image quality, creating more distinct and useful images, and even developing new US imaging modalities. Furthermore, AI technologies are increasingly prevalent in CT and MRI for image evaluation, image generation, and enhanced image quality.

Mixed Modality Image Synthesis Whole Body Review Concept GenAI

PUUMA (Placental patch and whole-Uterus dual-branch U-Mamba-based Architecture): Functional MRI Prediction of Gestational Age at Birth and Preterm Risk

Diego Fajardo-Rojas, Levente Baljer, Jordina Aviles Verdera, Megan Hall, Daniel Cromb, Mary A. Rutherford, Lisa Story, Emma C. Robinson, Jana Hutter

•preprint•Sep 8 2025

Preterm birth is a major cause of mortality and lifelong morbidity in childhood. Its complex and multifactorial origins limit the effectiveness of current clinical predictors and impede optimal care. In this study, a dual-branch deep learning architecture (PUUMA) was developed to predict gestational age (GA) at birth using T2* fetal MRI data from 295 pregnancies, encompassing a heterogeneous and imbalanced population. The model integrates both global whole-uterus and local placental features. Its performance was benchmarked against linear regression using cervical length measurements obtained by experienced clinicians from anatomical MRI and other Deep Learning architectures. The GA at birth predictions were assessed using mean absolute error. Accuracy, sensitivity, and specificity were used to assess preterm classification. Both the fully automated MRI-based pipeline and the cervical length regression achieved comparable mean absolute errors (3 weeks) and good sensitivity (0.67) for detecting preterm birth, despite pronounced class imbalance in the dataset. These results provide a proof of concept for automated prediction of GA at birth from functional MRI, and underscore the value of whole-uterus functional imaging in identifying at-risk pregnancies. Additionally, we demonstrate that manual, high-definition cervical length measurements derived from MRI, not currently routine in clinical practice, offer valuable predictive information. Future work will focus on expanding the cohort size and incorporating additional organ-specific imaging to improve generalisability and predictive performance.

MRI Classification Abdominal Methodology In Silico Academic Lab GenAI

Predicting Rejection Risk in Heart Transplantation: An Integrated Clinical-Histopathologic Framework for Personalized Post-Transplant Care

Kim, D. D., Madabhushi, A., Margulies, K. B., Peyster, E. G.

•preprint•Sep 8 2025

BackgroundCardiac allograft rejection (CAR) remains the leading cause of early graft failure after heart transplantation (HT). Current diagnostics, including histologic grading of endomyocardial biopsy (EMB) and blood-based assays, lack accurate predictive power for future CAR risk. We developed a predictive model integrating routine clinical data with quantitative morphologic features extracted from routine EMBs to demonstrate the precision-medicine potential of mining existing data sources in post-HT care. MethodsIn a retrospective cohort of 484 HT recipients with 1,188 EMB encounters within 6 months post-transplant, we extracted 370 quantitative pathology features describing lymphocyte infiltration and stromal architecture from digitized H&E-stained slides. Longitudinal clinical data comprising 268 variables--including lab values, immunosuppression records, and prior rejection history--were aggregated per patient. Using the XGBoost algorithm with rigorous cross-validation, we compared models based on four different data sources: clinical-only, morphology-only, cross-sectional-only, and fully integrated longitudinal data. The top predictors informed the derivation of a simplified Integrated Rejection Risk Index (IRRI), which relies on just 4 clinical and 4 morphology risk facts. Model performance was evaluated by AUROC, AUPRC, and time-to-event hazard ratios. ResultsThe fully integrated longitudinal model achieved superior predictive accuracy (AUROC 0.86, AUPRC 0.74). IRRI stratified patients into risk categories with distinct future CAR hazards: high-risk patients showed a markedly increased CAR risk (HR=6.15, 95% CI: 4.17-9.09), while low-risk patients had significantly reduced risk (HR=0.52, 95% CI: 0.33-0.84). This performance exceeded models based on just cross-sectional or single-domain data, demonstrating the value of multi-modal, temporal data integration. ConclusionsBy integrating longitudinal clinical and biopsy morphologic features, IRRI provides a scalable, interpretable tool for proactive CAR risk assessment. This precision-based approach could support risk-adaptive surveillance and immunosuppression management strategies, offering a promising pathway toward safer, more personalized post-HT care with the potential to reduce unnecessary procedures and improve outcomes. Clinical PerspectiveWhat is new? O_LICurrent tools for cardiac allograft monitoring detect rejection only after it occurs and are not designed to forecast future risk. This leads to missed opportunities for early intervention, avoidable patient injury, unnecessary testing, and inefficiencies in care. C_LIO_LIWe developed a machine learning-based risk index that integrates clinical features, quantitative biopsy morphology, and longitudinal temporal trends to create a robust predictive framework. C_LIO_LIThe Integrated Rejection Risk Index (IRRI) provides highly accurate prediction of future allograft rejection, identifying both high- and low-risk patients up to 90 days in advance - a capability entirely absent from current transplant management. C_LI What are the clinical implications? O_LIIntegrating quantitative histopathology with clinical data provides a more precise, individualized estimate of rejection risk in heart transplant recipients. C_LIO_LIThis framework has the potential to guide post-transplant surveillance intensity, immunosuppressive management, and patient counseling. C_LIO_LIAutomated biopsy analysis could be incorporated into digital pathology workflows, enabling scalable, multicenter application in real-world transplant care. C_LI

Mixed Modality Classification Cardiac Retrospective Clinical In Silico Academic Lab GenAI

Breast Cancer Detection in Thermographic Images via Diffusion-Based Augmentation and Nonlinear Feature Fusion

Sepehr Salem, M. Moein Esfahani, Jingyu Liu, Vince Calhoun

•preprint•Sep 8 2025

Data scarcity hinders deep learning for medical imaging. We propose a framework for breast cancer classification in thermograms that addresses this using a Diffusion Probabilistic Model (DPM) for data augmentation. Our DPM-based augmentation is shown to be superior to both traditional methods and a ProGAN baseline. The framework fuses deep features from a pre-trained ResNet-50 with handcrafted nonlinear features (e.g., Fractal Dimension) derived from U-Net segmented tumors. An XGBoost classifier trained on these fused features achieves 98.0\% accuracy and 98.1\% sensitivity. Ablation studies and statistical tests confirm that both the DPM augmentation and the nonlinear feature fusion are critical, statistically significant components of this success. This work validates the synergy between advanced generative models and interpretable features for creating highly accurate medical diagnostic tools.

OCT Classification Breast Methodology In Silico Academic Lab GenAI

Filter Papers

Tags

LD-ViCE: Latent Diffusion Model for Video Counterfactual Explanations

RoentMod: A Synthetic Chest X-Ray Modification Model to Identify and Correct Image Interpretation Model Shortcuts

Leveraging GPT-4o for Automated Extraction and Categorization of CAD-RADS Features From Free-Text Coronary CT Angiography Reports: Diagnostic Study.

A comprehensive review of techniques, algorithms, advancements, challenges, and clinical applications of multi-modal medical image fusion for improved diagnosis.

Assessing the ability of large language models to simplify lumbar spine imaging reports into patient-facing text: a pilot study of GPT-4.

Brain CT for Diagnosis of Intracranial Disease in Ambulatory Cancer Patients: Assessment of the Diagnostic Value of Scanning Without Contrast Prior to With Contrast.

New imaging techniques and trends in radiology.

PUUMA (Placental patch and whole-Uterus dual-branch U-Mamba-based Architecture): Functional MRI Prediction of Gestational Age at Birth and Preterm Risk

Predicting Rejection Risk in Heart Transplantation: An Integrated Clinical-Histopathologic Framework for Personalized Post-Transplant Care

Breast Cancer Detection in Thermographic Images via Diffusion-Based Augmentation and Nonlinear Feature Fusion

Ready to Sharpen Your Edge?