External validation of deep learning-derived 18F-FDG PET/CT delta biomarkers for loco-regional control in head and neck cancer.

Kovacs DG, Aznar M, Van Herk M, Mohamed I, Price J, Ladefoged CN, Fischer BM, Andersen FL, McPartlin A, Osorio EMV, Abravan A

PubMed · Aug 30 2025
Delta biomarkers that reflect changes in tumour burden over time can support personalised follow-up in head and neck cancer. However, their clinical use can be limited by the need for manual image segmentation. This study externally evaluates a deep learning model for automatic determination of volume change from serial 18F-fluorodeoxyglucose (18F-FDG) positron emission tomography/computed tomography (PET/CT) scans to stratify patients by loco-regional outcome. An externally developed deep learning algorithm for tumour segmentation was applied to pre- and post-radiotherapy (RT, with or without concomitant chemotherapy) PET/CT scans of 50 consecutive head and neck cancer patients from The Christie NHS Foundation Trust, UK. The model, originally trained on pre-treatment scans from a different institution, was deployed to derive tumour volumes at both time points. The AI-derived change in PET-defined gross tumour volume (ΔPET-GTV) was calculated for each patient. Kaplan-Meier analysis assessed loco-regional control based on ΔPET-GTV, dichotomised at the cohort median. In a separate secondary analysis confined to the pre-treatment scans, a radiation oncologist qualitatively evaluated the AI-generated PET-GTV contours. Patients with higher ΔPET-GTV (i.e. greater tumour shrinkage) had significantly improved loco-regional control (log-rank p = 0.02). At 2 years, loco-regional control was 94.1% (95% CI: 83.6-100%) in the high ΔPET-GTV group vs. 53.6% (95% CI: 32.2-89.1%) in the low group. Only one of nine failures occurred in the high ΔPET-GTV group. Clinician review found the AI volumes acceptable for planning in 78% of cases. In two cases, the algorithm identified oropharyngeal primaries on pre-treatment PET/CT before clinical identification. Deep learning-derived ΔPET-GTV may support clinically meaningful assessment of post-treatment disease status and risk stratification, offering a scalable alternative to manual segmentation in PET/CT follow-up.
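The stratification step described here is straightforward to reproduce. The sketch below, assuming a simple per-patient table with hypothetical file and column names (it is not the authors' code), dichotomises ΔPET-GTV at the cohort median and compares loco-regional control with Kaplan-Meier curves and a log-rank test.

```python
# Illustrative sketch, not the authors' code: stratify patients by the cohort
# median of delta PET-GTV and compare loco-regional control with a log-rank test.
# File name and column names are hypothetical.
import pandas as pd
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

df = pd.read_csv("cohort.csv")  # one row per patient

# AI-derived change in tumour volume: positive values mean shrinkage after RT
df["delta_pet_gtv"] = df["pet_gtv_pre_ml"] - df["pet_gtv_post_ml"]
high = df["delta_pet_gtv"] >= df["delta_pet_gtv"].median()

km_high, km_low = KaplanMeierFitter(), KaplanMeierFitter()
km_high.fit(df.loc[high, "followup_months"], df.loc[high, "lr_failure"], label="high dPET-GTV")
km_low.fit(df.loc[~high, "followup_months"], df.loc[~high, "lr_failure"], label="low dPET-GTV")

res = logrank_test(df.loc[high, "followup_months"], df.loc[~high, "followup_months"],
                   event_observed_A=df.loc[high, "lr_failure"],
                   event_observed_B=df.loc[~high, "lr_failure"])
print(km_high.predict(24), km_low.predict(24), res.p_value)  # 2-year control per group, log-rank p
```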

Artificial Intelligence-Guided PET Image Reconstruction and Multi-Tracer Imaging: Novel Methods, Challenges, and Opportunities

Movindu Dassanayake, Alejandro Lopez, Andrew Reader, Gary J. R. Cook, Clemens Mingels, Arman Rahmim, Robert Seifert, Ian Alberts, Fereshteh Yousefirizi

arXiv preprint · Aug 30 2025
Long axial field-of-view (LAFOV) PET/CT has the potential to unlock new applications such as ultra-low-dose PET/CT imaging, multiplexed imaging, biomarker development, and faster AI-driven reconstruction, but further work is required before these can be deployed in clinical routine. LAFOV PET/CT has unrivalled sensitivity, but its spatial resolution remains that of an equivalent scanner with a shorter axial field of view. AI approaches are increasingly explored as potential avenues to enhance image resolution.

Nasopharyngeal cancer adaptive radiotherapy with CBCT-derived synthetic CT: deep learning-based auto-segmentation precision and dose calculation consistency on a C-Arm linac.

Lei W, Han L, Cao Z, Duan T, Wang B, Li C, Pei X

PubMed · Aug 28 2025
To evaluate the precision of automated segmentation facilitated by deep learning (DL) and dose calculation in adaptive radiotherapy (ART) for nasopharyngeal cancer (NPC), leveraging synthetic CT (sCT) images derived from cone-beam CT (CBCT) scans on a conventional C-arm linac. Sixteen NPC patients undergoing a two-phase offline ART were analyzed retrospectively. The initial (pCT<sub>1</sub>) and adaptive (pCT<sub>2</sub>) CT scans served as the gold standard alongside weekly acquired CBCT scans. Patient data, including manually delineated contours and dose information, were imported into ArcherQA. Using a cycle-consistent generative adversarial network (cycle-GAN) trained on an independent dataset, sCT images (sCT<sub>1</sub>, sCT<sub>4</sub>, sCT<sub>4</sub><sup>*</sup>) were generated from weekly CBCT scans (CBCT<sub>1</sub>, CBCT<sub>4</sub>, CBCT<sub>4</sub>) paired with the corresponding planning CTs (pCT<sub>1</sub>, pCT<sub>1</sub>, pCT<sub>2</sub>). Auto-segmentation was performed on the sCTs, followed by GPU-accelerated Monte Carlo dose recalculation. Auto-segmentation accuracy was assessed via the Dice similarity coefficient (DSC) and the 95th percentile Hausdorff distance (HD<sub>95</sub>). Dose calculation fidelity on sCTs was evaluated using dose-volume parameters. Dosimetric consistency between recalculated sCT and pCT plans was analyzed via Spearman's correlation, while volumetric changes were concurrently evaluated to quantify anatomical variations. Most anatomical structures demonstrated high pCT-sCT agreement, with mean values of DSC > 0.85 and HD<sub>95</sub> < 5.10 mm. Notable exceptions included the primary gross tumor volume (GTVp) in the pCT<sub>2</sub>-sCT<sub>4</sub> comparison (DSC: 0.75, HD<sub>95</sub>: 6.03 mm), the involved lymph node (GTVn) showing lower agreement (DSC: 0.43, HD<sub>95</sub>: 16.42 mm), and the submandibular glands with moderate agreement (DSC: 0.64-0.73, HD<sub>95</sub>: 4.45-5.66 mm). Dosimetric analysis revealed the largest mean differences in GTVn D<sub>99</sub>: -1.44 Gy (95% CI: [-3.01, 0.13] Gy) and right parotid mean dose: -1.94 Gy (95% CI: [-3.33, -0.55] Gy, p < 0.05). Anatomical variations, quantified via sCT measurements, correlated significantly with offline adaptive plan adjustments in ART. This correlation was strong for the parotid glands (ρ > 0.72, p < 0.001), a result that aligned with the sCT-derived dose discrepancy analysis (ρ > 0.57, p < 0.05). The proposed method exhibited minor variations in volumetric and dosimetric parameters compared to prior treatment data, suggesting potential efficiency improvements for ART in NPC through reduced reliance on manual intervention.
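For reference, the two geometric metrics can be computed directly from binary masks. The sketch below is an assumed implementation (the study used ArcherQA), with Dice computed from voxel overlap and HD95 from symmetric surface distances.

```python
# Assumed implementation, not the authors' pipeline: Dice similarity coefficient and
# 95th-percentile Hausdorff distance between two binary masks, e.g. a manual pCT
# contour versus an sCT auto-segmentation.
import numpy as np
from scipy import ndimage
from scipy.spatial import cKDTree

def dice(a: np.ndarray, b: np.ndarray) -> float:
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def _surface_points(mask: np.ndarray, spacing) -> np.ndarray:
    # boundary voxels = mask minus its erosion, scaled to physical units (mm)
    surface = mask & ~ndimage.binary_erosion(mask)
    return np.argwhere(surface) * np.asarray(spacing)

def hd95(a: np.ndarray, b: np.ndarray, spacing=(1.0, 1.0, 1.0)) -> float:
    pa = _surface_points(a.astype(bool), spacing)
    pb = _surface_points(b.astype(bool), spacing)
    d_ab = cKDTree(pb).query(pa)[0]  # each surface voxel of A to the nearest surface voxel of B
    d_ba = cKDTree(pa).query(pb)[0]
    return float(np.percentile(np.concatenate([d_ab, d_ba]), 95))
```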

Shining light on degeneracies and uncertainties in quantifying both exchange and restriction with time-dependent diffusion MRI using Bayesian inference

Maëliss Jallais, Quentin Uhl, Tommaso Pavan, Malwina Molendowska, Derek K. Jones, Ileana Jelescu, Marco Palombo

arXiv preprint · Aug 26 2025
Diffusion MRI (dMRI) biophysical models hold promise for characterizing gray matter tissue microstructure. Yet, the reliability of estimated parameters remains largely under-studied, especially in models that incorporate water exchange. In this study, we investigate the accuracy, precision, and presence of degeneracy of two recently proposed gray matter models, NEXI and SANDIX, using two acquisition protocols from the literature, on both simulated and in vivo data. We employ $\mu$GUIDE, a Bayesian inference framework based on deep learning, to quantify model uncertainty and detect parameter degeneracies, enabling a more interpretable assessment of fitted parameters. Our results show that while some microstructural parameters, such as extra-cellular diffusivity and neurite signal fraction, are robustly estimated, others, such as exchange time and soma radius, are often associated with high uncertainty and estimation bias, especially under realistic noise conditions and reduced acquisition protocols. Comparisons with non-linear least squares fitting underscore the added value of uncertainty-aware methods, which allow for the identification and filtering of unreliable estimates. These findings emphasize the need to report uncertainty and consider model degeneracies when interpreting model-based estimates. Our study advocates for the integration of probabilistic fitting approaches in neuroscience imaging pipelines to improve reproducibility and biological interpretability.
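The practical consequence of the uncertainty-aware fitting is that unreliable voxel-wise estimates can be detected and filtered out. The sketch below illustrates the general principle only (it is not the $\mu$GUIDE API), using an assumed relative-width criterion on posterior draws.

```python
# Illustration of the general principle only (not the muGUIDE API): keep a voxel's
# parameter estimate when its posterior is narrow relative to its median, and flag
# it as unreliable otherwise. The relative-width criterion is an assumption.
import numpy as np

def filter_by_uncertainty(posterior_samples: np.ndarray, max_rel_width: float = 0.5):
    """posterior_samples: (n_voxels, n_draws) posterior draws for one model parameter."""
    median = np.median(posterior_samples, axis=1)
    p75, p25 = np.percentile(posterior_samples, [75, 25], axis=1)
    iqr = p75 - p25
    reliable = iqr < max_rel_width * np.abs(median)
    estimates = np.where(reliable, median, np.nan)  # discard estimates that are too uncertain
    return estimates, reliable
```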

Optimizing meningioma grading with radiomics and deep features integration, attention mechanisms, and reproducibility analysis.

Albadr RJ, Sur D, Yadav A, Rekha MM, Jain B, Jayabalan K, Kubaev A, Taher WM, Alwan M, Jawad MJ, Al-Nuaimi AMA, Mohammadifard M, Farhood B, Akhavan-Sigari R

PubMed · Aug 26 2025
This study aims to develop a robust and clinically applicable framework for preoperative grading of meningiomas using T1-contrast-enhanced and T2-weighted MRI images. The approach integrates radiomic feature extraction, attention-guided deep learning models, and reproducibility assessment to achieve high diagnostic accuracy, model interpretability, and clinical reliability. We analyzed MRI scans from 2546 patients with histopathologically confirmed meningiomas (1560 low-grade, 986 high-grade). High-quality T1-contrast and T2-weighted images were preprocessed through harmonization, normalization, resizing, and augmentation. Tumor segmentation was performed using ITK-SNAP, and inter-rater reliability of radiomic features was evaluated using the intraclass correlation coefficient (ICC). Radiomic features were extracted via the SERA software, while deep features were derived from pre-trained models (ResNet50 and EfficientNet-B0), with attention mechanisms enhancing focus on tumor-relevant regions. Feature fusion and dimensionality reduction were conducted using PCA and LASSO. Ensemble models employing Random Forest, XGBoost, and LightGBM were implemented to optimize classification performance using both radiomic and deep features. Reproducibility analysis showed that 52% of radiomic features demonstrated excellent reliability (ICC > 0.90). Deep features from EfficientNet-B0 outperformed ResNet50, achieving AUCs of 94.12% (T1) and 93.17% (T2). Hybrid models combining radiomic and deep features further improved performance, with XGBoost reaching AUCs of 95.19% (T2) and 96.87% (T1). Ensemble models incorporating both deep architectures achieved the highest classification performance, with AUCs of 96.12% (T2) and 96.80% (T1), demonstrating superior robustness and accuracy. This work introduces a comprehensive and clinically meaningful AI framework that significantly enhances the preoperative grading of meningiomas. The model's high accuracy, interpretability, and reproducibility support its potential to inform surgical planning, reduce reliance on invasive diagnostics, and facilitate more personalized therapeutic decision-making in routine neuro-oncology practice.
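As an illustration of the fusion-and-selection step described above (a sketch under assumed data shapes, not the authors' pipeline), radiomic and deep features can be concatenated, reduced with PCA, filtered with an L1-penalised (LASSO-style) selector, and fed to an ensemble classifier:

```python
# Sketch under assumed data shapes, not the authors' pipeline: fuse radiomic and deep
# features, reduce with PCA, select with an L1-penalised (LASSO-style) model, and
# classify meningioma grade with an ensemble learner.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
radiomic_features = rng.normal(size=(200, 107))    # placeholder for SERA radiomics
deep_features = rng.normal(size=(200, 1280))       # placeholder for EfficientNet-B0 features
grade = rng.integers(0, 2, size=200)               # 0 = low grade, 1 = high grade

X = np.hstack([radiomic_features, deep_features])  # feature fusion by concatenation
model = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=0.95)),               # keep components explaining 95% of variance
    ("lasso", SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear", C=0.1))),
    ("clf", RandomForestClassifier(n_estimators=500, random_state=0)),
])
print(cross_val_score(model, X, grade, cv=5, scoring="roc_auc").mean())
```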

Reducing radiomics errors in nasopharyngeal cancer via deep learning-based synthetic CT generation from CBCT.

Xiao Y, Lin W, Xie F, Liu L, Zheng G, Xiao C

PubMed · Aug 25 2025
This study investigates the impact of cone beam computed tomography (CBCT) image quality on radiomic analysis and evaluates the potential of deep learning-based enhancement to improve radiomic feature accuracy in nasopharyngeal cancer (NPC). The CBAMRegGAN model was trained on 114 paired CT and CBCT datasets from 114 nasopharyngeal cancer patients to enhance CBCT images, with CT images as ground truth. The dataset was split into 82 patients for training, 12 for validation, and 20 for testing. Radiomic features in 6 categories, including first-order, gray-level co-occurrence matrix (GLCM), gray-level run-length matrix (GLRLM), gray-level size-zone matrix (GLSZM), neighbouring gray tone difference matrix (NGTDM), and gray-level dependence matrix (GLDM) features, were extracted from the gross tumor volume (GTV) of the original CBCT, enhanced CBCT, and CT. Comparing feature errors between original and enhanced CBCT showed that deep learning-based enhancement improves radiomic feature accuracy. The CBAMRegGAN model achieved improved image quality with a peak signal-to-noise ratio (PSNR) of 29.52 ± 2.28 dB, normalized mean absolute error (NMAE) of 0.0129 ± 0.004, and structural similarity index (SSIM) of 0.910 ± 0.025 for enhanced CBCT images. This led to reduced errors in most radiomic features, with average reductions across the 20 test patients of 19.0%, 24.0%, 3.0%, 19.0%, 15.0%, and 5.0% for first-order, GLCM, GLRLM, GLSZM, NGTDM, and GLDM features. This study demonstrates that CBCT image quality significantly influences radiomic analysis, and deep learning-based enhancement techniques can effectively improve both image quality and the accuracy of radiomic features in NPC.
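The three reported image-quality metrics can be reproduced with standard tooling. The sketch below uses scikit-image for PSNR and SSIM; the NMAE normalisation (MAE divided by the reference intensity range) is an assumption and may differ from the authors' definition.

```python
# Assumed metric definitions (the NMAE normalisation, in particular, may differ from
# the authors' implementation): image-quality comparison of an enhanced CBCT volume
# against the reference planning CT.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def image_quality(ct: np.ndarray, cbct_enhanced: np.ndarray):
    data_range = float(ct.max() - ct.min())
    psnr = peak_signal_noise_ratio(ct, cbct_enhanced, data_range=data_range)
    nmae = float(np.mean(np.abs(ct - cbct_enhanced))) / data_range  # MAE / reference intensity range
    ssim = structural_similarity(ct, cbct_enhanced, data_range=data_range)
    return psnr, nmae, ssim
```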

Benchmarking Class Activation Map Methods for Explainable Brain Hemorrhage Classification on Hemorica Dataset

Z. Rafati, M. Hoseyni, J. Khoramdel, A. Nikoofard

arXiv preprint · Aug 25 2025
Explainable Artificial Intelligence (XAI) has become an essential component of medical imaging research, aiming to increase transparency and clinical trust in deep learning models. This study investigates brain hemorrhage diagnosis with a focus on explainability through Class Activation Mapping (CAM) techniques. A pipeline was developed to extract pixel-level segmentation and detection annotations from classification models using nine state-of-the-art CAM algorithms, applied across multiple network stages, and quantitatively evaluated on the Hemorica dataset, which uniquely provides both slice-level labels and high-quality segmentation masks. Metrics including Dice, IoU, and pixel-wise overlap were employed to benchmark CAM variants. Results show that the strongest localization performance occurred at stage 5 of EfficientNetV2S, with HiResCAM yielding the highest bounding-box alignment and AblationCAM achieving the best pixel-level Dice (0.57) and IoU (0.40), representing strong accuracy given that models were trained solely for classification without segmentation supervision. To the best of current knowledge, this is among the first works to quantitatively compare CAM methods for brain hemorrhage detection, establishing a reproducible benchmark and underscoring the potential of XAI-driven pipelines for clinically meaningful AI-assisted diagnosis.
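The pixel-level part of such a benchmark reduces to thresholding a CAM heatmap and scoring it against the ground-truth mask. A minimal sketch of that evaluation (a 0.5 threshold is assumed; the paper's exact protocol may differ):

```python
# Assumed evaluation mirroring the benchmarking idea: binarise a CAM heatmap at a
# fixed threshold and score it against the ground-truth hemorrhage mask.
import numpy as np

def cam_overlap(cam: np.ndarray, mask: np.ndarray, threshold: float = 0.5):
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalise heatmap to [0, 1]
    pred = cam >= threshold
    gt = mask.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum() + 1e-8)
    iou = inter / (np.logical_or(pred, gt).sum() + 1e-8)
    return float(dice), float(iou)
```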

Spectral computed tomography thermometry for thermal ablation: applicability and needle artifact reduction.

Koetzier LR, Hendriks P, Heemskerk JWT, van der Werf NR, Selles M, van der Molen AJ, Smits MLJ, Goorden MC, Burgmans MC

PubMed · Aug 23 2025
Effective thermal ablation of liver tumors requires precise monitoring of the ablation zone. Computed tomography (CT) thermometry can non-invasively monitor lethal temperatures but suffers from metal artifacts caused by ablation equipment. This study assesses the applicability of spectral CT thermometry during microwave ablation, comparing the reproducibility, precision, and accuracy of attenuation-based versus physical density-based thermometry. Furthermore, it identifies optimal metal artifact reduction (MAR) methods: O-MAR, deep learning-MAR, spectral CT, and combinations thereof. Four gel phantoms embedded with temperature sensors underwent a 10-minute, 60 W microwave ablation imaged by a dual-layer spectral CT scanner in 23 scans over time. For each scan, attenuation-based and physical density-based temperature maps were reconstructed. Attenuation-based and physical density-based thermometry models were tested for reproducibility over three repetitions; a fourth repetition focused on accuracy. MAR techniques were applied to one repetition to evaluate temperature precision in artifact-corrupted slices. The correlation between CT value and temperature was highly linear, with an R-squared value exceeding 96%. Model parameters for attenuation-based and physical density-based thermometry were -0.38 HU/°C and 0.00039 °C<sup>-1</sup>, with coefficients of variation of 2.3% and 6.7%, respectively. Physical density maps improved temperature precision in the presence of needle artifacts by 73% compared to attenuation images. O-MAR improved temperature precision by 49% compared to no MAR. Attenuation-based thermometry yielded narrower Bland-Altman limits of agreement (-7.7 °C to 5.3 °C) than physical density-based thermometry. Spectral physical density-based CT thermometry at 150 keV, used alongside O-MAR, enhances temperature precision in the presence of metal artifacts and achieves reproducible temperature measurements with high accuracy.
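The attenuation-based model is a simple linear calibration between CT number and temperature. The sketch below uses placeholder sensor readings chosen only to be consistent with the -0.38 HU/°C slope reported above (they are not the authors' data); it fits the model and inverts it to map HU to temperature:

```python
# Sketch of the attenuation-based calibration workflow (placeholder sensor readings):
# fit a linear HU-versus-temperature model, then invert it to convert HU maps to
# temperature maps in subsequent scans.
import numpy as np

temps_c = np.array([20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0])  # temperature sensor readings
mean_hu = np.array([60.0, 56.2, 52.4, 48.6, 44.8, 41.0, 37.2])  # mean HU around each sensor

slope, intercept = np.polyfit(temps_c, mean_hu, deg=1)  # slope ~ -0.38 HU/degC

def hu_to_temperature(hu_map: np.ndarray) -> np.ndarray:
    # invert the T -> HU model: T = (HU - intercept) / slope
    return (hu_map - intercept) / slope

print(round(slope, 2))
```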

Robust Deep Learning for Pulse-echo Speed of Sound Imaging via Time-shift Maps.

Chen H, Han A

PubMed · Aug 22 2025
Accurately imaging the spatial distribution of longitudinal speed of sound (SoS) has a profound impact on image quality and the diagnostic value of ultrasound. Knowledge of the SoS distribution allows effective aberration correction to improve image quality. SoS imaging also provides a new contrast mechanism to facilitate disease diagnosis. However, SoS imaging is challenging in the pulse-echo mode. Deep learning (DL) is a promising approach for pulse-echo SoS imaging, which may yield more accurate results than pure physics-based approaches. Herein, we developed a robust DL approach for SoS imaging that learns the nonlinear mapping between measured time shifts and the underlying SoS without being subject to the constraints of a specific forward model. Various strategies were adopted to enhance model performance. Time-shift maps were computed by adopting a common mid-angle configuration from the non-DL literature, normalizing complex beamformed ultrasound data, and accounting for depth-dependent frequency when converting phase shifts to time shifts. The structural similarity index measure (SSIM) was incorporated into the loss function to learn the global structure of the SoS map. A two-stage training strategy was employed, leveraging computationally efficient ray-tracing synthesis for extensive pretraining, and more realistic but computationally expensive full-wave simulations for fine-tuning. Using these combined strategies, our model was shown to be robust and generalizable across different conditions. The simulation-trained model successfully reconstructed the SoS maps of phantoms using experimental data. Compared with the physics-based inversion approach, our method improved reconstruction accuracy and contrast-to-noise ratio in phantom experiments. These results demonstrated the accuracy and robustness of our approach.
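A minimal sketch of an SSIM-augmented training objective of this kind (assumed weighting and data range; not the authors' network or loss), using the pytorch_msssim package for a differentiable SSIM term:

```python
# Minimal sketch of an SSIM-augmented objective (assumed weighting and data range;
# not the authors' loss): a pixel-wise term for local accuracy plus an SSIM term
# that rewards the global structure of the reconstructed SoS map.
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim

def sos_loss(pred: torch.Tensor, target: torch.Tensor,
             alpha: float = 0.5, data_range: float = 200.0) -> torch.Tensor:
    """pred, target: (B, 1, H, W) speed-of-sound maps; data_range: assumed SoS span in m/s."""
    pixel_term = F.l1_loss(pred, target)                                             # local accuracy
    ssim_term = 1.0 - ssim(pred, target, data_range=data_range, size_average=True)   # global structure
    return alpha * pixel_term + (1.0 - alpha) * ssim_term
```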

Label Uncertainty for Ultrasound Segmentation

Malini Shivaram, Gautam Rajendrakumar Gare, Laura Hutchins, Jacob Duplantis, Thomas Deiss, Thales Nogueira Gomes, Thong Tran, Keyur H. Patel, Thomas H Fox, Amita Krishnan, Deva Ramanan, Bennett DeBoisblanc, Ricardo Rodriguez, John Galeotti

arXiv preprint · Aug 21 2025
In medical imaging, inter-observer variability among radiologists often introduces label uncertainty, particularly in modalities where visual interpretation is subjective. Lung ultrasound (LUS) is a prime example: it frequently presents a mixture of highly ambiguous regions and clearly discernible structures, making consistent annotation challenging even for experienced clinicians. In this work, we introduce a novel approach to both labeling and training AI models using expert-supplied, per-pixel confidence values. Rather than treating annotations as absolute ground truth, we design a data annotation protocol that captures the confidence that radiologists have in each labeled region, modeling the inherent aleatoric uncertainty present in real-world clinical data. We demonstrate that incorporating these confidence values during training leads to improved segmentation performance. More importantly, we show that this enhanced segmentation quality translates into better performance on downstream clinically critical tasks, specifically estimating S/F oxygenation ratio values, classifying S/F ratio change, and predicting 30-day patient readmission. While we empirically evaluate many methods for exposing the uncertainty to the learning model, we find that a simple approach that trains a model on binarized labels obtained with a 60% confidence threshold works well. Importantly, high thresholds work far better than a naive approach with a 50% threshold, indicating that training on very confident pixels is far more effective. Our study systematically investigates the impact of training with varying confidence thresholds, comparing not only segmentation metrics but also downstream clinical outcomes. These results suggest that label confidence is a valuable signal that, when properly leveraged, can significantly enhance the reliability and clinical utility of AI in medical imaging.
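The label-preparation step the authors found most effective is easy to reproduce. A sketch, assuming per-pixel confidence maps stored in [0, 1] (the data layout is an assumption):

```python
# Sketch of the label-preparation step: binarise per-pixel expert confidence maps at
# a 60% threshold before standard segmentation training. The [0, 1] confidence
# encoding is assumed, not taken from the paper.
import numpy as np

def binarize_confidence(confidence_map: np.ndarray, threshold: float = 0.60) -> np.ndarray:
    """confidence_map: per-pixel confidence that the structure is present, in [0, 1]."""
    return (confidence_map >= threshold).astype(np.uint8)

example = np.array([[0.20, 0.55], [0.61, 0.90]])
print(binarize_confidence(example))  # [[0 0] [1 1]] -- only confident pixels become foreground
```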