Improving discriminative ability in mammographic microcalcification classification using deep learning: a novel double transfer learning approach validated with an explainable artificial intelligence technique

Arlan, K., Bjornstrom, M., Makela, T., Meretoja, T. J., Hukkinen, K.

medRxiv preprint · Aug 11 2025
Background: Breast microcalcification diagnostics are challenging due to their subtle presentation, overlap with benign findings, and high inter-reader variability, often leading to unnecessary biopsies. While deep learning (DL) models - particularly deep convolutional neural networks (DCNNs) - have shown potential to improve diagnostic accuracy, their clinical application remains limited by the need for large annotated datasets and the "black box" nature of their decision-making.
Purpose: To develop and validate a deep learning model (DCNN) using a double transfer learning (d-TL) strategy for classifying suspected mammographic microcalcifications, with explainable AI (XAI) techniques to support model interpretability.
Material and methods: A retrospective dataset of 396 annotated regions of interest (ROIs) from full-field digital mammography (FFDM) images of 194 patients who underwent stereotactic vacuum-assisted biopsy at the Women's Hospital radiological department, Helsinki University Hospital, was collected. The dataset was randomly split into training and test sets (24% test set, balanced for benign and malignant cases). A ResNeXt-based DCNN was developed using a d-TL approach: first pretrained on ImageNet, then adapted using an intermediate mammography dataset before fine-tuning on the target microcalcification data. Saliency maps were generated using Gradient-weighted Class Activation Mapping (Grad-CAM) to evaluate the visual relevance of model predictions. Diagnostic performance was compared to a radiologist's BI-RADS-based assessment, using final histopathology as the reference standard.
Results: The ensemble DCNN achieved an area under the ROC curve (AUC) of 0.76, with 65% sensitivity, 83% specificity, 79% positive predictive value (PPV), and 70% accuracy. The radiologist achieved an AUC of 0.65 with 100% sensitivity but lower specificity (30%) and PPV (59%). Grad-CAM visualizations showed consistent activation of the correct ROIs, even in misclassified cases where confidence scores fell below the threshold.
Conclusion: The DCNN model utilizing d-TL achieved performance comparable to the radiologist's, with higher specificity and PPV than the BI-RADS-based assessment. The approach addresses data limitation issues and may help reduce additional imaging and unnecessary biopsies.
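As a rough illustration of the d-TL recipe described in the abstract - ImageNet pretraining, adaptation on an intermediate mammography dataset, then fine-tuning on the target microcalcification ROIs - here is a minimal PyTorch/torchvision sketch. The loaders, epochs, and learning rates are placeholders, not the paper's settings.

```python
# Hypothetical sketch of double transfer learning (d-TL) with a ResNeXt backbone.
# Dataset loaders and hyperparameters are placeholders, not the paper's values.
import torch
import torch.nn as nn
from torchvision import models

def fine_tune(model, loader, epochs, lr):
    """One fine-tuning stage: train all weights on the given dataset."""
    optim = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            optim.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optim.step()
    return model

# Stage 0: ImageNet-pretrained backbone with a two-class (benign/malignant) head.
model = models.resnext50_32x4d(weights=models.ResNeXt50_32X4D_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)

# Stage 1: adapt on an intermediate mammography dataset (placeholder loader).
# model = fine_tune(model, mammography_loader, epochs=10, lr=1e-4)

# Stage 2: fine-tune on the target microcalcification ROIs (placeholder loader).
# model = fine_tune(model, microcalcification_loader, epochs=10, lr=1e-5)
```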

ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Impression Generation on Multi-institution and Multi-system Data.

Zhong T, Zhao W, Zhang Y, Pan Y, Dong P, Jiang Z, Jiang H, Zhou Y, Kui X, Shang Y, Zhao L, Yang L, Wei Y, Li Z, Zhang J, Yang L, Chen H, Zhao H, Liu Y, Zhu N, Li Y, Wang Y, Yao J, Wang J, Zeng Y, He L, Zheng C, Zhang Z, Li M, Liu Z, Dai H, Wu Z, Zhang L, Zhang S, Cai X, Hu X, Zhao S, Jiang X, Zhang X, Liu W, Li X, Zhu D, Guo L, Shen D, Han J, Liu T, Liu J, Zhang T

PubMed · Aug 11 2025
Achieving clinical-level performance and widespread deployment for generating radiology impressions poses a major challenge for conventional artificial intelligence models tailored to specific diseases and organs. With the increasing accessibility of radiology reports and advances in modern general-purpose AI techniques, deployable radiology AI has become an increasingly realistic prospect. Here, we present ChatRadio-Valuer, the first general radiology diagnosis large language model designed for localized deployment within hospitals and approaching clinical use for multi-institution and multi-system diseases. ChatRadio-Valuer achieved 15 state-of-the-art results across five human systems and six institutions in clinical-level events (n=332,673) through rigorous, full-spectrum assessment covering engineering metrics, clinical validation, and efficiency evaluation. Notably, it exceeded OpenAI's GPT-3.5 and GPT-4 models, achieving superior performance in comprehensive disease diagnosis compared to the average level of radiology experts. In addition, ChatRadio-Valuer supports zero-shot transfer learning, greatly boosting its effectiveness as a radiology assistant while ensuring adherence to privacy standards, and it is readily applicable to large-scale patient populations. Our findings suggest that the development of localized LLMs will become an essential avenue for hospital applications.
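ChatRadio-Valuer's interface is not described in the abstract, but the underlying task - generating an impression from report findings with a locally deployed LLM - can be sketched as follows. The model name and prompt below are illustrative placeholders, not the authors' setup.

```python
# Illustrative sketch only: generating a radiology impression from findings
# with a locally hosted LLM. Model name and prompt format are placeholders,
# not ChatRadio-Valuer's actual interface.
from transformers import pipeline

# Substitute any locally deployed instruction-tuned model.
generator = pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf")

findings = (
    "CT head without contrast. No acute intracranial hemorrhage. "
    "Chronic microvascular ischemic changes. Ventricles normal in size."
)
prompt = f"Findings:\n{findings}\n\nWrite a concise radiology impression:\n"

out = generator(prompt, max_new_tokens=80, do_sample=False)
print(out[0]["generated_text"])
```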

Decoding fetal motion in 4D ultrasound with DeepLabCut.

Inubashiri E, Kaishi Y, Miyake T, Yamaguchi R, Hamaguchi T, Inubashiri M, Ota H, Watanabe Y, Deguchi K, Kuroki K, Maeda N

PubMed · Aug 11 2025
This study aimed to objectively and quantitatively analyze fetal motor behavior using DeepLabCut (DLC), a markerless posture estimation tool based on deep learning, applied to four-dimensional ultrasound (4DUS) data collected during the second trimester. We propose a novel clinical method for precise assessment of fetal neurodevelopment. Fifty 4DUS video recordings of normal singleton fetuses aged 12 to 22 gestational weeks were analyzed. Eight fetal joints were manually labeled in 2% of each video to train a customized DLC model. The model's accuracy was evaluated using likelihood scores. Intra- and inter-rater reliability of manual labeling were assessed using intraclass correlation coefficients (ICC). Angular velocity time series derived from joint coordinates were analyzed to quantify fetal movement patterns and developmental coordination. Manual labeling demonstrated excellent reproducibility (inter-rater ICC = 0.990, intra-rater ICC = 0.961). The trained DLC model achieved a mean likelihood score of 0.960, confirming high tracking accuracy. Kinematic analysis revealed developmental trends: localized rapid limb movements were common at 12-13 weeks; movements became more coordinated and systemic by 18-20 weeks, reflecting advancing neuromuscular maturation. Although a modest increase in tracking accuracy was observed with gestational age, this trend did not reach statistical significance (p < 0.001). DLC enables precise quantitative analysis of fetal motor behavior from 4DUS recordings. This AI-driven approach offers a promising, noninvasive alternative to conventional qualitative assessments, providing detailed insights into early fetal neurodevelopmental trajectories and potential early screening for neurodevelopmental disorders.
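The kinematic step - deriving angular velocity time series from tracked joint coordinates - can be illustrated with a short NumPy sketch. DLC's real output format, the authors' joint definitions, and any filtering are assumptions here.

```python
# Sketch: angular velocity time series from tracked joint coordinates.
# Assumes three (x, y) trajectories defining a joint angle (e.g. shoulder-
# elbow-wrist); DLC's real CSV layout and smoothing steps are omitted.
import numpy as np

def joint_angle(a, b, c):
    """Angle at vertex b (radians) for point arrays of shape (frames, 2)."""
    v1 = a - b
    v2 = c - b
    cos = np.sum(v1 * v2, axis=1) / (
        np.linalg.norm(v1, axis=1) * np.linalg.norm(v2, axis=1)
    )
    return np.arccos(np.clip(cos, -1.0, 1.0))

def angular_velocity(angles, fps=30.0):
    """Frame-to-frame angular velocity in radians per second."""
    return np.diff(angles) * fps

# Toy trajectories: 100 frames of noisy synthetic joint positions.
rng = np.random.default_rng(0)
shoulder = rng.normal(0, 0.01, (100, 2)) + [0.0, 0.0]
elbow = rng.normal(0, 0.01, (100, 2)) + [1.0, 0.0]
wrist = rng.normal(0, 0.01, (100, 2)) + [1.0, 1.0]

omega = angular_velocity(joint_angle(shoulder, elbow, wrist))
print(omega.mean(), omega.std())
```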

Post-deployment Monitoring of AI Performance in Intracranial Hemorrhage Detection by ChatGPT.

Rohren E, Ahmadzade M, Colella S, Kottler N, Krishnan S, Poff J, Rastogi N, Wiggins W, Yee J, Zuluaga C, Ramis P, Ghasemi-Rad M

PubMed · Aug 11 2025
To evaluate the post-deployment performance of an artificial intelligence (AI) system (Aidoc) for intracranial hemorrhage (ICH) detection and assess the utility of ChatGPT-4 Turbo for automated AI monitoring. This retrospective study evaluated 332,809 head CT examinations from 37 radiology practices across the United States (December 2023-May 2024). Of these, 13,569 cases were flagged as positive for ICH by the Aidoc AI system. A HIPAA (Health Insurance Portability and Accountability Act)-compliant version of ChatGPT-4 Turbo was used to extract data from radiology reports. Ground truth was established through radiologists' review of 200 randomly selected cases. Performance metrics were calculated for ChatGPT, Aidoc, and radiologists. ChatGPT-4 Turbo demonstrated high diagnostic accuracy in identifying intracranial hemorrhage (ICH) from radiology reports, with a positive predictive value of 1 and a negative predictive value of 0.988 (AUC: 0.996). Aidoc's false positive classifications were influenced by scanner manufacturer, midline shift, mass effect, artifacts, and neurologic symptoms. Multivariate analysis identified Philips scanners (OR: 6.97, p=0.003) and artifacts (OR: 3.79, p=0.029) as significant contributors to false positives, while midline shift (OR: 0.08, p=0.021) and mass effect (OR: 0.18, p=0.021) were associated with a reduced false positive rate. Aidoc-assisted radiologists achieved a sensitivity of 0.936 and a specificity of 1. This study underscores the importance of continuous performance monitoring for AI systems in clinical practice. The integration of LLMs offers a scalable solution for evaluating AI performance, ensuring reliable deployment and enhancing diagnostic workflows.
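A rough sketch of the monitoring pattern the study describes - an LLM labels reports for ICH, and predictive values are computed against a ground-truth review - is shown below. The prompt, model name, and confusion counts are illustrative, not the study's protocol.

```python
# Rough sketch of LLM-based report labeling plus performance metrics.
# The actual monitoring pipeline, prompt, and HIPAA-compliant deployment
# are not public; everything below is illustrative only.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def label_report(report_text: str) -> bool:
    """Ask the model whether the report describes intracranial hemorrhage."""
    resp = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {"role": "system",
             "content": "Answer YES or NO: does this radiology report "
                        "describe an acute intracranial hemorrhage?"},
            {"role": "user", "content": report_text},
        ],
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")

def ppv_npv(tp, fp, tn, fn):
    """Positive and negative predictive values from confusion counts."""
    return tp / (tp + fp), tn / (tn + fn)

# e.g. a 200-case ground-truth review yields confusion counts (placeholders):
print(ppv_npv(tp=95, fp=0, tn=100, fn=5))
```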

Generative Artificial Intelligence to Automate Cerebral Perfusion Mapping in Acute Ischemic Stroke from Non-contrast Head Computed Tomography Images: Pilot Study.

Primiano NJ, Changa AR, Kohli S, Greenspan H, Cahan N, Kummer BR

PubMed · Aug 11 2025
Acute ischemic stroke (AIS) is a leading cause of death and long-term disability worldwide, where rapid reperfusion remains critical for salvaging brain tissue. Although CT perfusion (CTP) imaging provides essential hemodynamic information, its limitations-including extended processing times, additional radiation exposure, and variable software outputs-can delay treatment. In contrast, non-contrast head CT (NCHCT) is ubiquitously available in acute stroke settings. This study explores a generative artificial intelligence approach to predict key perfusion parameters (relative cerebral blood flow [rCBF] and time-to-maximum [Tmax]) directly from NCHCT, potentially streamlining stroke imaging workflows and expanding access to critical perfusion data. We retrospectively identified patients evaluated for AIS who underwent NCHCT, CT angiography, and CTP. Ground truth perfusion maps (rCBF and Tmax) were extracted from VIZ.ai post-processed CTP studies. A modified pix2pix-turbo generative adversarial network (GAN) was developed to translate co-registered NCHCT images into corresponding perfusion maps. The network was trained using paired NCHCT-CTP data, with training, validation, and testing splits of 80%:10%:10%. Performance was assessed on the test set using quantitative metrics including the structural similarity index measure (SSIM), peak signal-to-noise ratio (PSNR), and Fréchet inception distance (FID). Out of 120 patients, studies from 99 patients fitting our inclusion and exclusion criteria were used as the primary cohort (mean age 73.3 ± 13.5 years; 46.5% female). Cerebral occlusions were predominantly in the middle cerebral artery. GAN-generated Tmax maps achieved an SSIM of 0.827, PSNR of 16.99, and FID of 62.21, while the rCBF maps demonstrated comparable performance (SSIM 0.79, PSNR 16.38, FID 59.58). These results indicate that the model approximates ground truth perfusion maps to a moderate degree and successfully captures key cerebral hemodynamic features. Our findings demonstrate the feasibility of generating functional perfusion maps directly from widely available NCHCT images using a modified GAN. This cross-modality approach may serve as a valuable adjunct in AIS evaluation, particularly in resource-limited settings or when traditional CTP provides limited diagnostic information. Future studies with larger, multicenter datasets and further model refinements are warranted to enhance clinical accuracy and utility.
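The reported image-quality metrics can be reproduced in principle with scikit-image; the following is a minimal sketch on toy arrays. The study's preprocessing and the FID computation (which requires an Inception feature extractor) are omitted.

```python
# Minimal sketch of SSIM/PSNR evaluation for a generated perfusion map,
# on toy arrays; co-registration and normalization steps are omitted.
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

rng = np.random.default_rng(0)
ground_truth = rng.random((256, 256)).astype(np.float32)  # e.g. a true Tmax map
generated = np.clip(
    ground_truth + rng.normal(0, 0.1, (256, 256)), 0, 1
).astype(np.float32)                                      # e.g. a GAN output

ssim = structural_similarity(ground_truth, generated, data_range=1.0)
psnr = peak_signal_noise_ratio(ground_truth, generated, data_range=1.0)
print(f"SSIM={ssim:.3f}  PSNR={psnr:.2f} dB")
```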

Safeguarding Generative AI Applications in Preclinical Imaging through Hybrid Anomaly Detection

Jakub Binda, Valentina Paneta, Vasileios Eleftheriadis, Hongkyou Chung, Panagiotis Papadimitroulas, Neo Christopher Chung

arXiv preprint · Aug 11 2025
Generative AI holds great potential to automate and enhance data synthesis in nuclear medicine. However, the high-stakes nature of biomedical imaging necessitates robust mechanisms to detect and manage unexpected or erroneous model behavior. We present the development and implementation of a hybrid anomaly detection framework to safeguard GenAI models in BIOEMTECH's eyes(TM) systems. Two applications are demonstrated: Pose2Xray, which generates synthetic X-rays from photographic mouse images, and DosimetrEYE, which estimates 3D radiation dose maps from 2D SPECT/CT scans. In both cases, our outlier detection (OD) enhances reliability, reduces manual oversight, and supports real-time quality control. This approach strengthens the industrial viability of GenAI in preclinical settings by increasing robustness, scalability, and regulatory compliance.
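The abstract does not detail the hybrid framework, but a minimal version of an outlier-detection guard around generative outputs might look like the following scikit-learn sketch, with crude intensity statistics standing in for whatever features the authors actually use.

```python
# Illustrative sketch of an outlier-detection guard around generative model
# outputs, using IsolationForest on simple image statistics. The paper's
# actual hybrid framework and feature set are not described in the abstract.
import numpy as np
from sklearn.ensemble import IsolationForest

def image_features(batch):
    """Crude per-image features: mean, std, min, max intensity."""
    flat = batch.reshape(batch.shape[0], -1)
    return np.stack([flat.mean(1), flat.std(1), flat.min(1), flat.max(1)], axis=1)

rng = np.random.default_rng(0)
normal_outputs = rng.normal(0.5, 0.1, (500, 64, 64))   # in-distribution outputs
suspect_outputs = rng.normal(0.9, 0.4, (10, 64, 64))   # degenerate outputs

detector = IsolationForest(random_state=0).fit(image_features(normal_outputs))
flags = detector.predict(image_features(suspect_outputs))  # -1 marks an outlier
print((flags == -1).sum(), "of", len(flags), "outputs flagged for review")
```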

Neonatal neuroimaging: from research to bedside practice.

Cizmeci MN, El-Dib M, de Vries LS

PubMed · Aug 11 2025
Neonatal neuroimaging is essential in research and clinical practice, offering important insights into brain development and neurologic injury mechanisms. Visualizing the brain enables researchers and clinicians to improve neonatal care and parental counselling through better diagnosis and prognostication of disease. Common neuroimaging modalities used in the neonatal intensive care unit (NICU) are cranial ultrasonography (cUS) and magnetic resonance imaging (MRI). Between these modalities, conventional MRI provides the optimal image resolution and detail about the developing brain, while advanced MRI techniques allow for the evaluation of tissue microstructure and functional networks. Over the last two decades, medical imaging techniques using brain MRI have rapidly progressed, and these advances have facilitated high-quality extraction of quantitative features as well as the implementation of novel devices for use in neurological disorders. Major advancements encompass the use of low-field dedicated MRI systems within the NICU and trials of ultralow-field portable MRI systems at the bedside. Additionally, higher-field magnets are utilized to enhance image quality, and ultrafast brain MRI is employed to decrease image acquisition time. Furthermore, the implementation of advanced MRI sequences, the application of machine learning algorithms, multimodal neuroimaging techniques, motion correction techniques, and novel modalities are used to visualize pathologies that are not visible to the human eye. In this narrative review, we will discuss the fundamentals of these neuroimaging modalities, and their clinical applications to explore the present landscape of neonatal neuroimaging from bench to bedside.

Large-scale Multi-sequence Pretraining for Generalizable MRI Analysis in Versatile Clinical Applications

Zelin Qiu, Xi Wang, Zhuoyao Xie, Juan Zhou, Yu Wang, Lingjie Yang, Xinrui Jiang, Juyoung Bae, Moo Hyun Son, Qiang Ye, Dexuan Chen, Rui Zhang, Tao Li, Neeraj Ramesh Mahboobani, Varut Vardhanabhuti, Xiaohui Duan, Yinghua Zhao, Hao Chen

arXiv preprint · Aug 10 2025
Multi-sequence Magnetic Resonance Imaging (MRI) offers remarkable versatility, enabling the distinct visualization of different tissue types. Nevertheless, the inherent heterogeneity among MRI sequences poses significant challenges to the generalization capability of deep learning models. These challenges undermine model performance when faced with varying acquisition parameters, thereby severely restricting their clinical utility. In this study, we present PRISM, a foundation model PRe-trained with large-scale multI-Sequence MRI. We collected a total of 64 datasets from both public and private sources, encompassing a wide range of whole-body anatomical structures, with scans spanning diverse MRI sequences. Among them, 336,476 volumetric MRI scans from 34 datasets (8 public and 26 private) were curated to construct the largest multi-organ multi-sequence MRI pretraining corpus to date. We propose a novel pretraining paradigm that disentangles anatomically invariant features from sequence-specific variations in MRI, while preserving high-level semantic representations. We established a benchmark comprising 44 downstream tasks, including disease diagnosis, image segmentation, registration, progression prediction, and report generation. These tasks were evaluated on 32 public datasets and 5 private cohorts. PRISM consistently outperformed both non-pretrained models and existing foundation models, achieving first-rank results in 39 out of 44 downstream benchmarks with statistically significant improvements. These results underscore its ability to learn robust and generalizable representations across unseen data acquired under diverse MRI protocols. PRISM provides a scalable framework for multi-sequence MRI analysis, thereby enhancing the translational potential of AI in radiology. It delivers consistent performance across diverse imaging protocols, reinforcing its clinical applicability.
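The abstract does not specify PRISM's pretraining objective. As a generic stand-in, one common way to encourage sequence-invariant anatomical features is an InfoNCE-style contrastive loss over paired views of the same anatomy under different sequences, sketched here in PyTorch.

```python
# Hypothetical sketch: InfoNCE loss pulling together embeddings of the same
# anatomy seen under two MRI sequences. This is a generic stand-in, not
# PRISM's actual objective, which the abstract does not describe.
import torch
import torch.nn.functional as F

def infonce(z1, z2, temperature=0.1):
    """z1[i] and z2[i] are embeddings of the same anatomy under two sequences."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature   # (B, B) cross-sequence similarities
    targets = torch.arange(z1.size(0))   # matched pairs lie on the diagonal
    return F.cross_entropy(logits, targets)

# Toy embeddings for a batch of 8 scans under two sequences (e.g. T1 and T2).
z_t1, z_t2 = torch.randn(8, 128), torch.randn(8, 128)
print(infonce(z_t1, z_t2).item())
```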

Perceptual Evaluation of GANs and Diffusion Models for Generating X-rays

Gregory Schuit, Denis Parra, Cecilia Besa

arXiv preprint · Aug 10 2025
Generative image models have achieved remarkable progress in both natural and medical imaging. In the medical context, these techniques offer a potential solution to data scarcity-especially for low-prevalence anomalies that impair the performance of AI-driven diagnostic and segmentation tools. However, questions remain regarding the fidelity and clinical utility of synthetic images, since poor generation quality can undermine model generalizability and trust. In this study, we evaluate the effectiveness of state-of-the-art generative models-Generative Adversarial Networks (GANs) and Diffusion Models (DMs)-for synthesizing chest X-rays conditioned on four abnormalities: Atelectasis (AT), Lung Opacity (LO), Pleural Effusion (PE), and Enlarged Cardiac Silhouette (ECS). Using a benchmark composed of real images from the MIMIC-CXR dataset and synthetic images from both GANs and DMs, we conducted a reader study with three radiologists of varied experience. Participants were asked to distinguish real from synthetic images and assess the consistency between visual features and the target abnormality. Our results show that while DMs generate more visually realistic images overall, GANs can achieve better accuracy for specific conditions, such as the absence of ECS. We further identify visual cues radiologists use to detect synthetic images, offering insights into the perceptual gaps in current models. These findings underscore the complementary strengths of GANs and DMs and point to the need for further refinement to ensure generative models can reliably augment training datasets for AI diagnostic systems.

Spatio-Temporal Conditional Diffusion Models for Forecasting Future Multiple Sclerosis Lesion Masks Conditioned on Treatments

Gian Mario Favero, Ge Ya Luo, Nima Fathi, Justin Szeto, Douglas L. Arnold, Brennan Nichyporuk, Chris Pal, Tal Arbel

arXiv preprint · Aug 9 2025
Image-based personalized medicine has the potential to transform healthcare, particularly for diseases that exhibit heterogeneous progression such as Multiple Sclerosis (MS). In this work, we introduce the first treatment-aware spatio-temporal diffusion model that is able to generate future masks demonstrating lesion evolution in MS. Our voxel-space approach incorporates multi-modal patient data, including MRI and treatment information, to forecast new and enlarging T2 (NET2) lesion masks at a future time point. Extensive experiments on a multi-centre dataset of 2131 patient 3D MRIs from randomized clinical trials for relapsing-remitting MS demonstrate that our generative model is able to accurately predict NET2 lesion masks for patients across six different treatments. Moreover, we demonstrate our model has the potential for real-world clinical applications through downstream tasks such as future lesion count and location estimation, binary lesion activity classification, and generating counterfactual future NET2 masks for several treatments with different efficacies. This work highlights the potential of causal, image-based generative models as powerful tools for advancing data-driven prognostics in MS.
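As a schematic of treatment-conditioned diffusion sampling (the paper's network, noise schedule, and conditioning design are not given in the abstract), here is a toy DDPM-style sampler whose denoiser is conditioned on a treatment label.

```python
# Schematic sketch of treatment-conditioned DDPM sampling for a future lesion
# mask. The denoiser is an untrained placeholder; shapes, schedule, and
# conditioning are illustrative, not the paper's design.
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)

class Denoiser(nn.Module):
    """Placeholder epsilon-predictor conditioned on timestep and treatment."""
    def __init__(self, n_treatments=6, dim=32 * 32):
        super().__init__()
        self.treat_emb = nn.Embedding(n_treatments, 64)
        self.net = nn.Sequential(nn.Linear(dim + 64 + 1, 256),
                                 nn.SiLU(), nn.Linear(256, dim))

    def forward(self, x, t, treatment):
        h = torch.cat([x.flatten(1),
                       self.treat_emb(treatment),
                       t.float().view(-1, 1) / T], dim=1)
        return self.net(h).view_as(x)

@torch.no_grad()
def sample(model, treatment, shape=(1, 1, 32, 32)):
    x = torch.randn(shape)  # start from pure noise
    for t in reversed(range(T)):
        eps = model(x, torch.tensor([t]), treatment)
        coef = betas[t] / torch.sqrt(1 - alpha_bar[t])
        x = (x - coef * eps) / torch.sqrt(alphas[t])  # DDPM posterior mean
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x

# Hypothetical treatment index 2 of 6 arms; output would be thresholded to a mask.
mask_logits = sample(Denoiser(), treatment=torch.tensor([2]))
print(mask_logits.shape)
```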