Page 14 of 14137 results

A novel framework for esophageal cancer grading: combining CT imaging, radiomics, reproducibility, and deep learning insights.

Alsallal M, Ahmed HH, Kareem RA, Yadav A, Ganesan S, Shankhyan A, Gupta S, Joshi KK, Sameer HN, Yaseen A, Athab ZH, Adil M, Farhood B

PubMed · May 10, 2025
This study aims to create a reliable framework for grading esophageal cancer, combining radiomic feature extraction and deep learning with attention mechanisms to ensure accuracy, interpretability, and practical use in tumor analysis. This retrospective study used data from 2,560 esophageal cancer patients across multiple clinical centers, collected from 2018 to 2023. The dataset included CT scan images and clinical information, representing a variety of cancer grades and types. Standardized CT imaging protocols were followed, and experienced radiologists manually segmented the tumor regions; only high-quality data were used. A total of 215 radiomic features were extracted using the SERA platform. Two deep learning models, DenseNet121 and EfficientNet-B0, were enhanced with attention mechanisms to improve accuracy. A combined classification approach used both radiomic and deep learning features, and machine learning models such as Random Forest, XGBoost, and CatBoost were applied and validated with strict training and testing procedures to ensure effective cancer grading. Radiomic features were classified into four reliability levels based on their intraclass correlation coefficient (ICC) values; most had excellent (ICC > 0.90) or good (0.75 < ICC ≤ 0.90) reliability. Deep learning features extracted from DenseNet121 and EfficientNet-B0 were also categorized, and some showed poor reliability. Among the machine learning models tested, XGBoost with recursive feature elimination (RFE) gave the best results for radiomic features, with an area under the curve (AUC) of 91.36%.
For deep learning features, XGBoost with principal component analysis (PCA) gave the best results using DenseNet121, while CatBoost with RFE performed best with EfficientNet-B0, achieving an AUC of 94.20%. Combining radiomic and deep features led to significant improvements, with XGBoost achieving the highest AUC of 96.70%, accuracy of 96.71%, and sensitivity of 95.44%. An ensemble of the DenseNet121 and EfficientNet-B0 models achieved the best overall performance, with an AUC of 95.14% and accuracy of 94.88%. This study improves esophageal cancer grading by combining radiomics and deep learning, enhancing diagnostic accuracy, reproducibility, and interpretability while supporting personalized treatment planning through better tumor characterization.
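The ICC-based reliability binning described above can be sketched in a few lines. The "excellent" and "good" cutoffs are the ones stated in the abstract; the lower bins follow common ICC conventions rather than the paper, and the feature names and values are purely illustrative:

```python
def reliability_category(icc: float) -> str:
    """Map an intraclass correlation coefficient to a reliability level.
    Excellent/good thresholds follow the abstract; moderate/poor cutoffs
    are assumed from common ICC conventions."""
    if icc > 0.90:
        return "excellent"
    if icc > 0.75:
        return "good"
    if icc > 0.50:
        return "moderate"
    return "poor"

# Toy ICC values for illustration only (not from the study).
iccs = {"shape_sphericity": 0.96, "glcm_contrast": 0.82, "firstorder_skewness": 0.41}
categories = {name: reliability_category(v) for name, v in iccs.items()}
```

Only features falling into the upper bins would then be passed on to the RFE/PCA feature-selection stage.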

Systematic review and epistemic meta-analysis to advance binomial AI-radiomics integration for predicting high-grade glioma progression and enhancing patient management.

Chilaca-Rosas MF, Contreras-Aguilar MT, Pallach-Loose F, Altamirano-Bustamante NF, Salazar-Calderon DR, Revilla-Monsalve C, Heredia-Gutiérrez JC, Conde-Castro B, Medrano-Guzmán R, Altamirano-Bustamante MM

PubMed · May 8, 2025
High-grade gliomas, particularly glioblastoma, are among the most aggressive and lethal central nervous system tumors, necessitating advanced diagnostic and prognostic strategies. This systematic review and epistemic meta-analysis explores the integration of Artificial Intelligence (AI) and Radiomics Inter-field (AIRI) to enhance predictive modeling for tumor progression. A comprehensive literature search identified 19 high-quality studies, which were analyzed to evaluate radiomic features and machine learning models in predicting overall survival (OS) and progression-free survival (PFS). Key findings highlight the predictive strength of specific MRI-derived radiomic features, such as log-filter and Gabor textures, and the superior performance of Support Vector Machine (SVM) and Random Forest (RF) models, which achieved high accuracy and AUC scores (e.g., 98% AUC and 98.7% accuracy for OS). This research demonstrates the current state of the AIRI field and shows that published articles report their results with different performance indicators and metrics, making outcomes heterogeneous and knowledge hard to integrate; some current articles also use biased methodologies. This study proposes a structured AIRI development roadmap and guidelines to avoid bias and make results comparable, emphasizing standardized feature extraction and AI model training to improve reproducibility across clinical settings. By advancing precision medicine, AIRI integration has the potential to refine clinical decision-making and enhance patient outcomes.
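Since the review flags heterogeneous performance reporting across studies, one way to make AUC values comparable is to compute them directly from predicted scores using the rank-based (Mann-Whitney) definition, which is metric-agnostic and threshold-free. This is a generic sketch, not code from any of the reviewed articles:

```python
def auc_from_scores(pos_scores, neg_scores):
    """AUC as the probability that a randomly chosen positive case
    outranks a randomly chosen negative one (Mann-Whitney U statistic,
    ties counted as half a win)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

Reporting raw scores alongside headline metrics would let reviewers recompute AUC this way and integrate results across otherwise incompatible studies.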

Machine learning-based approaches for distinguishing viral and bacterial pneumonia in paediatrics: A scoping review.

Rickard D, Kabir MA, Homaira N

PubMed · May 8, 2025
Pneumonia is the leading cause of hospitalisation and mortality among children under five, particularly in low-resource settings. Accurate differentiation between viral and bacterial pneumonia is essential for guiding appropriate treatment, yet it remains challenging due to overlapping clinical and radiographic features. Advances in machine learning (ML), particularly deep learning (DL), have shown promise in classifying pneumonia using chest X-ray (CXR) images. This scoping review summarises the evidence on ML techniques for classifying viral and bacterial pneumonia using CXR images in paediatric patients. This scoping review was conducted following the Joanna Briggs Institute methodology and the PRISMA-ScR guidelines. A comprehensive search was performed in PubMed, Embase, and Scopus to identify studies involving children (0-18 years) with pneumonia diagnosed through CXR, using ML models for binary or multiclass classification. Data extraction included ML models, dataset characteristics, and performance metrics. A total of 35 studies, published between 2018 and 2025, were included in this review. Of these, 31 studies used the publicly available Kermany dataset, raising concerns about overfitting and limited generalisability to broader, real-world clinical populations. Most studies (n=33) used convolutional neural networks (CNNs) for pneumonia classification. While many models demonstrated promising performance, significant variability was observed due to differences in methodologies, dataset sizes, and validation strategies, complicating direct comparisons. For binary classification (viral vs bacterial pneumonia), a median accuracy of 92.3% (range: 80.8% to 97.9%) was reported. For multiclass classification (healthy, viral pneumonia, and bacterial pneumonia), the median accuracy was 91.8% (range: 76.8% to 99.7%). 
Current evidence is constrained by a predominant reliance on a single dataset and variability in methodologies, which limit the generalisability and clinical applicability of findings. To address these limitations, future research should focus on developing diverse and representative datasets while adhering to standardised reporting guidelines. Such efforts are essential to improve the reliability, reproducibility, and translational potential of machine learning models in clinical settings.
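The median-and-range summaries used for the pooled accuracies above can be reproduced with standard-library Python. The values below are illustrative placeholders, not results from the 35 reviewed studies:

```python
import statistics

def summarise_accuracies(accuracies):
    """Summarise per-study accuracies as median and range,
    the two statistics reported in the review."""
    return {
        "median": statistics.median(accuracies),
        "range": (min(accuracies), max(accuracies)),
    }

# Hypothetical per-study accuracies (%), for illustration only.
binary = [80.8, 88.5, 92.3, 95.0, 97.9]
summary = summarise_accuracies(binary)
```

Medians and ranges are preferred over means here because study-level accuracies from heterogeneous datasets and validation schemes are not safely averaged.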

False Promises in Medical Imaging AI? Assessing Validity of Outperformance Claims

Evangelia Christodoulou, Annika Reinke, Pascaline Andrè, Patrick Godau, Piotr Kalinowski, Rola Houhou, Selen Erkan, Carole H. Sudre, Ninon Burgos, Sofiène Boutaj, Sophie Loizillon, Maëlys Solal, Veronika Cheplygina, Charles Heitz, Michal Kozubek, Michela Antonelli, Nicola Rieke, Antoine Gilson, Leon D. Mayer, Minu D. Tizabi, M. Jorge Cardoso, Amber Simpson, Annette Kopp-Schneider, Gaël Varoquaux, Olivier Colliot, Lena Maier-Hein

arXiv preprint · May 7, 2025
Performance comparisons are fundamental in medical imaging Artificial Intelligence (AI) research, often driving claims of superiority based on relative improvements in common performance metrics. However, such claims frequently rely solely on empirical mean performance. In this paper, we investigate whether newly proposed methods genuinely outperform the state of the art by analyzing a representative cohort of medical imaging papers. We quantify the probability of false claims based on a Bayesian approach that leverages reported results alongside empirically estimated model congruence to estimate whether the relative ranking of methods is likely to have occurred by chance. According to our results, the majority (>80%) of papers claim outperformance when introducing a new method. Our analysis further revealed a high probability (>5%) of false outperformance claims in 86% of classification papers and 53% of segmentation papers. These findings highlight a critical flaw in current benchmarking practices: claims of outperformance in medical imaging AI are frequently unsubstantiated, posing a risk of misdirecting future research efforts.
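The paper's analysis is Bayesian and incorporates empirically estimated model congruence. As a much simpler, frequentist stand-in for the same intuition, a bootstrap over per-case metric differences can estimate how often an apparent ranking between a new method and a baseline would reverse by chance; all numbers here are hypothetical:

```python
import random

def prob_rank_flip(diffs, n_boot=2000, seed=0):
    """Crude illustration (not the paper's method): bootstrap the
    per-case metric differences (new method minus baseline) and count
    how often the resampled mean contradicts the observed ranking."""
    rng = random.Random(seed)
    observed = sum(diffs) / len(diffs)
    flips = 0
    for _ in range(n_boot):
        sample = [rng.choice(diffs) for _ in diffs]
        if (sum(sample) / len(sample)) * observed <= 0:
            flips += 1
    return flips / n_boot
```

A small mean improvement with a non-negligible flip probability is exactly the situation in which an outperformance claim based on means alone would be unsubstantiated.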

Nonperiodic dynamic CT reconstruction using backward-warping INR with regularization of diffeomorphism (BIRD)

Muge Du, Zhuozhao Zheng, Wenying Wang, Guotao Quan, Wuliang Shi, Le Shen, Li Zhang, Liang Li, Yinong Liu, Yuxiang Xing

arXiv preprint · May 6, 2025
Dynamic computed tomography (CT) reconstruction faces significant challenges in addressing motion artifacts, particularly for nonperiodic rapid movements such as cardiac imaging with fast heart rates. Traditional methods struggle with the extreme limited-angle problems inherent in nonperiodic cases. Deep learning methods have improved performance but face generalization challenges. Recent implicit neural representation (INR) techniques show promise through self-supervised deep learning, but have critical limitations: computational inefficiency due to forward-warping modeling, difficulty balancing deformation vector field (DVF) complexity with anatomical plausibility, and challenges in preserving fine details without additional patient-specific pre-scans. This paper presents a novel INR-based framework, BIRD, for nonperiodic dynamic CT reconstruction. It addresses these challenges through four key contributions: (1) backward-warping deformation that enables direct computation of each dynamic voxel with significantly reduced computational cost, (2) diffeomorphism-based DVF regularization that ensures anatomically plausible deformations while maintaining representational capacity, (3) motion-compensated analytical reconstruction that enhances fine details without requiring additional pre-scans, and (4) a dimensional-reduction design for efficient 4D coordinate encoding. Through various simulations and practical studies, including digital and physical phantoms and retrospective patient data, we demonstrate the effectiveness of our approach for nonperiodic dynamic CT reconstruction with enhanced details and reduced motion artifacts. The proposed framework enables more accurate dynamic CT reconstruction with potential clinical applications, such as one-beat cardiac reconstruction, cinematic image sequences for functional imaging, and motion artifact reduction in conventional CT scans.
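The backward-warping idea, computing each dynamic voxel directly by sampling the reference at its displaced coordinate rather than scattering source voxels forward, can be illustrated with a 1D toy. The displacement field below is hypothetical, not the paper's learned DVF:

```python
def lerp_sample(volume, x):
    """Linearly interpolate a 1D 'volume' at continuous coordinate x,
    clamping at the boundaries."""
    i = max(0, min(int(x), len(volume) - 2))
    t = x - i
    return (1 - t) * volume[i] + t * volume[i + 1]

def backward_warp(reference, dvf):
    """Backward warping: each output voxel j is computed directly by
    sampling the reference at j + dvf[j]. No scatter pass over source
    voxels is needed, which is the efficiency argument for this design."""
    return [lerp_sample(reference, j + dvf[j]) for j in range(len(reference))]

ref = [0.0, 1.0, 2.0, 3.0]
dvf = [0.5, 0.5, 0.5, 0.0]  # hypothetical per-voxel displacements
warped = backward_warp(ref, dvf)  # [0.5, 1.5, 2.5, 3.0]
```

In BIRD itself the DVF is represented by an INR over 4D coordinates and regularized toward a diffeomorphism; this sketch only shows why backward sampling gives each output voxel in a single lookup.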

Designing a computer-assisted diagnosis system for cardiomegaly detection and radiology report generation.

Zhu T, Xu K, Son W, Linton-Reid K, Boubnovski-Martell M, Grech-Sollars M, Lain AD, Posma JM

PubMed · May 1, 2025
Chest X-rays (CXRs) are a diagnostic tool for cardiothoracic assessment, making up 50% of all diagnostic imaging tests. With hundreds of images examined every day, radiologists can suffer from fatigue, which may reduce diagnostic accuracy and slow down report generation. We describe a prototype computer-assisted diagnosis (CAD) pipeline employing computer vision (CV) and natural language processing (NLP). It was trained and evaluated on the publicly available MIMIC-CXR dataset. We perform image quality assessment, view labelling, and segmentation-based cardiomegaly severity classification, and use the output of the severity classification for large language model-based report generation. Four board-certified radiologists assessed the output accuracy of our CAD pipeline. Across the dataset of 377,100 CXR images and 227,827 free-text radiology reports, our system identified 0.18% of cases with mixed-sex mentions, 0.02% of poor-quality images (F1 = 0.81), and 0.28% of wrongly labelled views (accuracy 99.4%). We assigned views for the 4.18% of images with unlabelled views. Our binary cardiomegaly classification model has 95.2% accuracy. The inter-radiologist agreement on the generated reports' semantics and correctness for radiologist-MIMIC is 0.62 (strict agreement) and 0.85 (relaxed agreement), similar to the radiologist-CAD agreement of 0.55 (strict) and 0.93 (relaxed). Our work found and corrected several incorrect or missing metadata annotations in the MIMIC-CXR dataset. The performance of our CAD system suggests performance on par with human radiologists. Future improvements revolve around improved text generation and the development of CV tools for other diseases.
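Strict versus relaxed inter-rater agreement, as reported for the radiologist evaluations above, reduces to counting exact versus within-tolerance matches between two raters' scores. The ratings below are hypothetical, not MIMIC-CXR data:

```python
def agreement(scores_a, scores_b, tolerance=0):
    """Fraction of items on which two raters agree. tolerance=0 gives
    strict (exact) agreement; tolerance=1 a relaxed, within-one-point
    version (the exact relaxation used in the paper is assumed here)."""
    matches = sum(1 for a, b in zip(scores_a, scores_b) if abs(a - b) <= tolerance)
    return matches / len(scores_a)

# Hypothetical 5-point report-quality ratings from two radiologists.
r1 = [5, 4, 3, 5, 2]
r2 = [5, 3, 3, 4, 2]
strict = agreement(r1, r2)       # exact matches only
relaxed = agreement(r1, r2, 1)   # matches within one point
```

Comparing radiologist-radiologist agreement against radiologist-CAD agreement on the same scale is what licenses the "on par with human radiologists" reading of the results.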

Ground-truth-free deep learning approach for accelerated quantitative parameter mapping with memory efficient learning.

Fujita N, Yokosawa S, Shirai T, Terada Y

PubMed · Jan 1, 2025
Quantitative MRI (qMRI) requires the acquisition of multiple images with parameter changes, resulting in longer measurement times than conventional imaging. Deep learning (DL) for image reconstruction has shown a significant reduction in acquisition time and improved image quality. In qMRI, where the image contrast varies between sequences, preparing large, fully sampled (FS) datasets is challenging. Recently, methods that do not require FS data, such as self-supervised learning (SSL) and zero-shot self-supervised learning (ZSSSL), have been proposed. Another challenge is the large GPU memory requirement of DL-based qMRI image reconstruction, owing to the simultaneous processing of multiple contrast images. In this context, Kellman et al. proposed memory-efficient learning (MEL) to save GPU memory. This study evaluated SSL and ZSSSL frameworks with MEL to accelerate qMRI. Three experiments were conducted using the following sequences: 2D T2 mapping/MSME (Experiment 1), 3D T1 mapping/VFA-SPGR (Experiment 2), and 3D T2 mapping/DESS (Experiment 3). Each experiment used undersampled k-space data at acceleration factors (AFs) of 4, 8, and 12, and the reconstructed maps were evaluated using quantitative metrics. Across the three qMRI reconstruction measurements, we compared the performance of supervised learning (SL) with the ground-truth-free methods, SSL and ZSSSL. Overall, SSL and ZSSSL were only slightly inferior to SL, even under high-AF conditions. The quantitative errors in diagnostically important tissues (white matter, gray matter, and meniscus) were small, demonstrating that SSL and ZSSSL performed comparably. Additionally, by incorporating a GPU memory-saving implementation, we demonstrated that the network can operate on a GPU with a small memory (<8 GB) with minimal speed reduction. This study demonstrates the effectiveness of memory-efficient, ground-truth-free learning methods using MEL to accelerate qMRI.
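As a minimal sketch of the kind of quantitative mapping at stake (not the paper's reconstruction pipeline), T2 can be estimated from two spin-echo signals under a monoexponential decay model S(TE) = S0 · exp(-TE / T2); the signal values below are synthetic:

```python
import math

def t2_from_two_echoes(s1, s2, te1, te2):
    """Estimate T2 from signals at two echo times, assuming
    monoexponential decay S(TE) = S0 * exp(-TE / T2)."""
    return (te2 - te1) / math.log(s1 / s2)

# Synthetic signals: S0 = 100, true T2 = 80 ms, echoes at 20 and 60 ms.
s1 = 100 * math.exp(-20 / 80)
s2 = 100 * math.exp(-60 / 80)
t2 = t2_from_two_echoes(s1, s2, 20, 60)  # recovers ~80 ms
```

Real multi-echo mapping fits many echoes per voxel, which is why acquisition times are long and why accelerated, ground-truth-free reconstruction of the undersampled echo images is attractive.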