Latest Papers on Radiology AI. Sources: arXiv. Tags: Reproducibility.

Clinical Uncertainty Impacts Machine Learning Evaluations

Simone Lionetti, Fabian Gröger, Philippe Gottfrois, Alvaro Gonzalez-Jimenez, Ludovic Amruthalingam, Alexander A. Navarini, Marc Pouly

•preprint•Sep 26 2025

Clinical dataset labels are rarely certain: annotators disagree, and confidence is not uniform across cases. Typical aggregation procedures, such as majority voting, obscure this variability. In simple experiments on medical imaging benchmarks, accounting for the confidence in binary labels significantly impacts model rankings. We therefore argue that machine-learning evaluations should explicitly account for annotation uncertainty using probabilistic metrics that directly operate on distributions. These metrics can be applied independently of the annotations' generating process, whether modeled by simple counting, subjective confidence ratings, or probabilistic response models. They are also computationally lightweight, as closed-form expressions have linear-time implementations once examples are sorted by model score. We thus urge the community to release the raw annotations of datasets and to adopt uncertainty-aware evaluation so that performance estimates may better reflect clinical data.

Classification Methodology In Silico Reproducibility Benchmark SOTA
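The abstract above states that probabilistic metrics have closed-form expressions with linear-time implementations once examples are sorted by model score. As an illustration of that idea (our own example; the abstract does not say this is one of the metrics the authors use), the sketch below computes a soft-label ROC-AUC in which each example contributes positive mass p and negative mass 1 - p, where p could be, for instance, the fraction of annotators who labeled it positive. The function name soft_auc and its tie and self-pair conventions are assumptions.

```python
from itertools import groupby


def soft_auc(scores, pos_prob):
    """ROC-AUC against probabilistic labels, O(n) after an O(n log n) sort.

    scores:   model scores, higher means "more likely positive".
    pos_prob: per-example probability of being positive (e.g. the fraction
              of annotators voting positive). Hard 0/1 labels recover the
              usual ROC-AUC with tied scores counted as one half.
    """
    pairs = sorted(zip(scores, pos_prob))  # ascending by score
    neg_below = 0.0  # negative mass of examples with strictly lower score
    num = 0.0
    for _, group in groupby(pairs, key=lambda t: t[0]):
        probs = [p for _, p in group]
        pos_g = sum(probs)
        neg_g = sum(1.0 - p for p in probs)
        # positive mass in this group outranks all lower-scored negative mass
        num += pos_g * neg_below
        # tied scores count one half; drop each example paired with itself
        num += 0.5 * (pos_g * neg_g - sum(p * (1.0 - p) for p in probs))
        neg_below += neg_g
    total_pos = sum(pos_prob)
    total_neg = len(pos_prob) - total_pos
    denom = total_pos * total_neg - sum(p * (1.0 - p) for p in pos_prob)
    return num / denom


scores = [0.1, 0.4, 0.35, 0.8]
print(soft_auc(scores, [0, 0, 1, 1]))          # 0.75, ordinary ROC-AUC
print(soft_auc(scores, [0.2, 0.4, 0.6, 1.0]))  # hypothetical vote fractions
```

With hard 0/1 labels the function reduces to the standard ROC-AUC; replacing them with vote fractions changes the estimate without touching the model, which is how label uncertainty can reorder models that score similarly under majority-vote labels.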

Interpreting Convolutional Neural Network Activation Maps with Hand-crafted Radiomics Features on Progression of Pediatric Craniopharyngioma after Irradiation Therapy

Wenjun Yang, Chuang Wang, Tina Davis, Jinsoo Uh, Chia-Ho Hua, Thomas E. Merchant

•preprint•Sep 25 2025

Purpose: Convolutional neural networks (CNNs) are promising for predicting treatment outcome in pediatric craniopharyngioma, but their decision mechanisms are difficult to interpret. We compared the activation maps of a CNN with the hand-crafted radiomic features used by a densely connected artificial neural network (ANN) to correlate them with clinical decisions. Methods: A cohort of 100 pediatric craniopharyngioma patients was included. Binary tumor progression was classified by an ANN and a CNN with T1w, T2w, and FLAIR MRI as input. Hand-crafted radiomic features were calculated from the MRI using the LifeX software, and key features were selected by group lasso regularization and compared with the activation maps of the CNN. We evaluated the radiomics models by accuracy, area under the receiver operating characteristic curve (AUC), and confusion matrices. Results: With the ANN, the average accuracy for T1w, T2w, and FLAIR MRI was 0.85, 0.92, and 0.86 (ANOVA, F = 1.96, P = 0.18); with the CNN, it was 0.83, 0.81, and 0.70 (ANOVA, F = 10.11, P = 0.003). The average AUC for the three sequences was 0.91, 0.97, and 0.90 with the ANN and 0.86, 0.88, and 0.75 with the CNN, respectively. The activation maps were correlated with tumor shape, minimum and maximum intensity, and texture features. Conclusions: Both the ANN and CNN models achieved promising accuracy in predicting tumor progression in pediatric patients with craniopharyngioma. The activation maps extracted from different levels were interpreted using the key hand-crafted features of the ANN.

MRI Classification Neurological Retrospective Clinical In Silico Academic Lab Reproducibility
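The abstract selects key radiomic features with group lasso regularization. A minimal NumPy sketch of the standard proximal-gradient approach for group-lasso logistic regression follows; the feature grouping, the logistic loss, the sqrt(group size) penalty weighting, the step size, and the absence of an intercept are all assumptions, and this is not the authors' LifeX-based pipeline. The key step is block soft-thresholding, which shrinks each feature group as a unit and sets it exactly to zero when its norm falls below the threshold, so whole groups drop out of the model.

```python
import numpy as np


def block_soft_threshold(w, t):
    """Proximal operator of t * ||w||_2: shrinks a whole group toward zero."""
    norm = np.linalg.norm(w)
    if norm <= t:
        return np.zeros_like(w)
    return (1.0 - t / norm) * w


def group_lasso_logistic(X, y, groups, lam, lr=0.1, n_iter=500):
    """Proximal gradient descent for group-lasso logistic regression.

    X: (n, d) features, y: (n,) labels in {0, 1},
    groups: list of index arrays, one per (hypothetical) feature group.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (p - y) / n          # gradient of mean cross-entropy
        z = w - lr * grad                 # plain gradient step
        for g in groups:                  # then group-wise shrinkage,
            # scaled by sqrt(group size), a common convention
            z[g] = block_soft_threshold(z[g], lr * lam * np.sqrt(len(g)))
        w = z
    return w


# Usage on synthetic data where only the first group carries signal
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
w = group_lasso_logistic(X, y, groups, lam=0.05)
selected = [i for i, g in enumerate(groups) if np.any(w[g] != 0)]
```

Raising lam removes more groups; in practice it would be tuned by cross-validation.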

LiLAW: Lightweight Learnable Adaptive Weighting to Meta-Learn Sample Difficulty and Improve Noisy Training

Abhishek Moturu, Anna Goldenberg, Babak Taati

•preprint•Sep 25 2025

Training deep neural networks in the presence of noisy labels and data heterogeneity is a major challenge. We introduce Lightweight Learnable Adaptive Weighting (LiLAW), a novel method that dynamically adjusts the loss weight of each training sample based on its evolving difficulty level, categorized as easy, moderate, or hard. Using only three learnable parameters, LiLAW adaptively prioritizes informative samples throughout training by updating these weights using a single mini-batch gradient descent step on the validation set after each training mini-batch, without requiring excessive hyperparameter tuning or a clean validation set. Extensive experiments across multiple general and medical imaging datasets, noise levels and types, loss functions, and architectures with and without pretraining demonstrate that LiLAW consistently enhances performance, even in high-noise environments. It is effective without heavy reliance on data augmentation or advanced regularization, highlighting its practicality. It offers a computationally efficient solution to boost model generalization and robustness in any neural network training setup.

Methodology In Silico Reproducibility
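The abstract describes LiLAW as three learnable weights, one per difficulty level, updated by a single gradient step on the validation set after each training mini-batch. Its exact weighting functions are not given there, so the sketch below is a simplified, hypothetical analogue rather than LiLAW itself: a logistic-regression model whose samples are binned as easy, moderate, or hard by the predicted probability of their given label (the 0.7 and 0.3 thresholds are assumptions), with one learnable weight per bin updated by a one-step lookahead meta-gradient of the validation loss. All function names are illustrative.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def difficulty_bin(p_true, lo=0.3, hi=0.7):
    """0 = easy, 1 = moderate, 2 = hard, judged by the predicted probability
    of each sample's given label (thresholds are hypothetical)."""
    return np.where(p_true >= hi, 0, np.where(p_true >= lo, 1, 2))


def per_sample_grads(theta, X, y):
    """Per-sample gradients of binary cross-entropy for logistic regression."""
    p = sigmoid(X @ theta)
    return (p - y)[:, None] * X


def reweighted_step(theta, bin_w, X, y, Xv, yv, lr=0.5, meta_lr=0.5):
    """One weighted training step, then one meta step on the 3 bin weights."""
    p = sigmoid(X @ theta)
    bins = difficulty_bin(np.where(y == 1, p, 1.0 - p))
    g = per_sample_grads(theta, X, y)                      # shape (n, d)
    n = len(y)
    theta_new = theta - lr * (bin_w[bins][:, None] * g).sum(axis=0) / n
    # theta_new is linear in bin_w: d theta_new / d bin_w[b] equals -lr/n
    # times the summed gradients of bin b. The chain rule with the
    # validation-loss gradient at theta_new gives each weight's meta-gradient.
    gv = per_sample_grads(theta_new, Xv, yv).mean(axis=0)
    meta_grad = np.array([-lr / n * gv @ g[bins == b].sum(axis=0)
                          for b in range(3)])
    bin_w = np.clip(bin_w - meta_lr * meta_grad, 0.0, None)  # keep weights >= 0
    return theta_new, bin_w


# Usage: 20% flipped labels in both the training and validation splits
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(float)
y_noisy = np.where(rng.random(200) < 0.2, 1.0 - y, y)
theta, bin_w = np.zeros(3), np.ones(3)
for _ in range(100):  # full-batch steps for brevity instead of mini-batches
    theta, bin_w = reweighted_step(theta, bin_w, X[:150], y_noisy[:150],
                                   X[150:], y_noisy[150:])
```

Because only three scalars are meta-learned, the extra cost per step is one additional gradient evaluation on the validation batch.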

CPT-4DMR: Continuous sPatial-Temporal Representation for 4D-MRI Reconstruction

Xinyang Wu, Muheng Li, Xia Li, Orso Pusterla, Sairos Safai, Philippe C. Cattin, Antony J. Lomax, Ye Zhang

•preprint•Sep 22 2025

Four-dimensional MRI (4D-MRI) is a promising technique for capturing respiratory-induced motion in radiation therapy planning and delivery. Conventional 4D reconstruction methods, which typically rely on phase binning or separate template scans, struggle to capture temporal variability, complicate workflows, and impose heavy computational loads. We introduce a neural representation framework that models respiratory motion as a smooth, continuous deformation steered by a 1D surrogate signal, completely replacing the conventional discrete sorting approach. The new method fuses motion modeling with image reconstruction through two synergistic networks: the Spatial Anatomy Network (SAN) encodes a continuous 3D anatomical representation, while a Temporal Motion Network (TMN), guided by Transformer-derived respiratory signals, produces temporally consistent deformation fields. Evaluation using a free-breathing dataset of 19 volunteers demonstrates that our template- and phase-free method accurately captures both regular and irregular respiratory patterns, while preserving vessel and bronchial continuity with high anatomical fidelity. The proposed method significantly improves efficiency, reducing the total processing time from approximately five hours required by conventional discrete sorting methods to just 15 minutes of training. Furthermore, it enables inference of each 3D volume in under one second. The framework accurately reconstructs 3D images at any respiratory state, achieves superior performance compared to conventional methods, and demonstrates strong potential for application in 4D radiation therapy planning and real-time adaptive treatment.

MRI Reconstruction Chest Methodology In Silico Academic Lab Reproducibility Breakthrough
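The CPT-4DMR abstract describes two cooperating networks: a Spatial Anatomy Network (SAN) that encodes continuous 3D anatomy and a Temporal Motion Network (TMN) that maps a 1D respiratory surrogate to a deformation field. One natural way to compose them, assumed here and not confirmed by the abstract, is a backward warp I(x, s) = SAN(x + TMN(x, s)). The NumPy sketch below shows only that forward composition with untrained, randomly initialized tanh MLPs; layer sizes, activations, the zero-initialized TMN output layer, and all names are hypothetical, and training, positional encoding, and the Transformer-derived surrogate are omitted.

```python
import numpy as np


def mlp_init(rng, sizes, zero_last=False):
    """Random dense layers; optionally zero the output layer."""
    layers = []
    for i, (a, b) in enumerate(zip(sizes[:-1], sizes[1:])):
        if zero_last and i == len(sizes) - 2:
            W = np.zeros((a, b))
        else:
            W = rng.normal(scale=1.0 / np.sqrt(a), size=(a, b))
        layers.append((W, np.zeros(b)))
    return layers


def mlp(layers, x):
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:  # tanh on hidden layers only
            x = np.tanh(x)
    return x


class ContinuousMotionField:
    """Hypothetical SAN/TMN pair: I(x, s) = SAN(x + TMN(x, s))."""

    def __init__(self, seed=0, hidden=64):
        rng = np.random.default_rng(seed)
        self.san = mlp_init(rng, [3, hidden, hidden, 1])  # (x, y, z) -> intensity
        # (x, y, z, s) -> displacement; zero output layer means "no motion" at init
        self.tmn = mlp_init(rng, [4, hidden, hidden, 3], zero_last=True)

    def displacement(self, x, s):
        s_col = np.full((x.shape[0], 1), float(s))
        return mlp(self.tmn, np.concatenate([x, s_col], axis=1))

    def intensity(self, x, s):
        return mlp(self.san, x + self.displacement(x, s))[:, 0]


# Usage: query 1,000 points at an arbitrary, unbinned respiratory state
field = ContinuousMotionField()
pts = np.random.default_rng(1).uniform(-1.0, 1.0, size=(1000, 3))
values = field.intensity(pts, s=0.42)
```

Because the surrogate s enters as a continuous input, a volume can be queried at any respiratory state rather than only at a fixed set of binned phases.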

Enhancement Without Contrast: Stability-Aware Multicenter Machine Learning for Glioma MRI Imaging

Sajad Amiri, Shahram Taeb, Sara Gharibi, Setareh Dehghanfard, Somayeh Sadat Mehrnia, Mehrdad Oveisi, Ilker Hacihaliloglu, Arman Rahmim, Mohammad R. Salmanpour

•preprint•Sep 13 2025

Gadolinium-based contrast agents (GBCAs) are central to glioma imaging but raise safety, cost, and accessibility concerns. Because contrast enhancement reflects tumor aggressiveness and informs treatment planning, predicting it from non-contrast MRI using machine learning (ML) offers a safer alternative. Yet scanner and cohort variability hinder robust model selection. We propose a stability-aware framework to identify reproducible ML pipelines for multicenter prediction of glioma MRI contrast enhancement. We analyzed 1,446 glioma cases from four TCIA datasets (UCSF-PDGM, UPENN-GB, BRATS-Africa, BRATS-TCGA-LGG). Non-contrast T1WI served as input, with enhancement derived from paired post-contrast T1WI. Using PyRadiomics under IBSI standards, 108 features were extracted and combined with 48 dimensionality reduction methods and 25 classifiers, yielding 1,200 pipelines. In rotational validation, models were trained on three datasets and tested on the fourth. Cross-validation accuracies ranged from 0.91 to 0.96, and external testing achieved 0.87 (UCSF-PDGM), 0.98 (UPENN-GB), and 0.95 (BRATS-Africa), for an average of 0.93. F1, precision, and recall were stable (0.87 to 0.96), while ROC-AUC varied more widely (0.50 to 0.82), reflecting cohort heterogeneity. The pipeline combining MI with ETr consistently ranked highest, balancing accuracy and stability. This framework demonstrates that stability-aware model selection enables reliable prediction of contrast enhancement from non-contrast glioma MRI, reducing reliance on GBCAs and improving generalizability across centers. It provides a scalable template for reproducible ML in neuro-oncology and beyond.

MRI Image Synthesis Neurological Retrospective Clinical In Silico Academic Lab Reproducibility Benchmark SOTA
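The pipeline grid and rotational (leave-one-dataset-out) validation in the glioma enhancement abstract can be sketched in plain Python. Here fit_and_score is a hypothetical callable standing in for feature extraction, dimensionality reduction, and classification, and the stability criterion (mean held-out score minus one population standard deviation) is our assumption; the abstract does not state how stability was scored.

```python
import itertools
import statistics


def rotational_validation(datasets, fit_and_score):
    """Leave-one-dataset-out: train on all cohorts but one, test on it.

    datasets: dict mapping cohort name -> data
    fit_and_score: hypothetical callable(train_list, test_data) -> score
    """
    scores = {}
    for held_out, test_data in datasets.items():
        train = [d for name, d in datasets.items() if name != held_out]
        scores[held_out] = fit_and_score(train, test_data)
    return scores


def stability_rank(pipeline_scores, penalty=1.0):
    """Sort pipelines by mean held-out score minus penalty * spread.

    pipeline_scores: dict mapping pipeline name -> list of held-out scores
    """
    def stability_score(item):
        scores = item[1]
        return statistics.mean(scores) - penalty * statistics.pstdev(scores)
    return sorted(pipeline_scores.items(), key=stability_score, reverse=True)


# 48 reduction methods x 25 classifiers -> 1,200 candidate pipelines
grid = list(itertools.product(range(48), range(25)))

# Usage with placeholder data and a dummy scorer (no real results)
cohorts = {"UCSF-PDGM": None, "UPENN-GB": None,
           "BRATS-Africa": None, "BRATS-TCGA-LGG": None}
dummy = rotational_validation(cohorts, lambda train, test: len(train))
```

Penalizing the spread of held-out scores favors pipelines whose performance holds across centers over ones that excel on a single cohort.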

Mapping of discrete range modulated proton radiograph to water-equivalent path length using machine learning

Atiq Ur Rahman, Chun-Chieh Wang, Shu-Wei Wu, Tsi-Chian Chao, I-Chun Cho

•preprint•Sep 11 2025

Objective. Proton beams enable localized dose delivery. Accurate range estimation is essential, but planning still relies on X-ray CT, which introduces uncertainty in stopping power and range. Proton CT measures water-equivalent thickness directly but suffers resolution loss from multiple Coulomb scattering. We develop a data-driven method that reconstructs water-equivalent path length (WEPL) maps from energy-resolved proton radiographs, bypassing intermediate reconstructions. Approach. We present a machine learning pipeline that predicts WEPL from high-dimensional radiographs. Data were generated with the TOPAS Monte Carlo toolkit, modeling a clinical nozzle and a patient CT. Proton energies spanned 70-230 MeV across 72 projection angles. Principal component analysis reduced input dimensionality while preserving signal. A conditional GAN with gradient penalty was trained for WEPL prediction using a composite loss (adversarial, MSE, SSIM, perceptual) to balance sharpness, accuracy, and stability. Main results. On a simulated head phantom, the model reached a mean relative WEPL deviation of 2.5 percent, an SSIM of 0.97, and a proton radiography gamma index passing rate of 97.1 percent (2 percent delta WEPL, 3 mm distance-to-agreement). Results indicate high spatial fidelity and strong structural agreement. Significance. WEPL can be mapped directly from proton radiographs with deep learning while avoiding intermediate steps. The method mitigates limitations of analytic techniques, may improve treatment planning, and is compatible with clinical workflows. Future work will tune the number of PCA components, include detector response, explore low-dose settings, and extend multi-angle data toward full proton CT reconstruction.

CT Reconstruction Neurological Methodology In Silico Academic Lab Reproducibility
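The proton radiography abstract reports a gamma index passing rate with a 2 percent WEPL criterion and a 3 mm distance-to-agreement. A brute-force NumPy sketch of a 2D gamma passing rate follows. Normalizing the WEPL criterion to the maximum of the reference map (global gamma), searching only the evaluated grid points without interpolation, and the function name gamma_pass_rate are assumptions; established gamma tools often interpolate and may use local normalization instead.

```python
import numpy as np


def gamma_pass_rate(ref, ev, spacing_mm, dta_mm=3.0, dd_frac=0.02):
    """Brute-force global 2D gamma passing rate.

    ref, ev: WEPL maps (2D arrays) on the same grid; spacing_mm: pixel size.
    The WEPL criterion is dd_frac times the reference maximum (assumption).
    """
    ny, nx = ref.shape
    yy, xx = np.meshgrid(np.arange(ny) * spacing_mm,
                         np.arange(nx) * spacing_mm, indexing="ij")
    pts = np.stack([yy.ravel(), xx.ravel()], axis=1)           # (N, 2)
    r_val, e_val = ref.ravel(), ev.ravel()
    dd = dd_frac * ref.max()
    # squared distance from every reference point to every evaluated point
    dist2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
    # squared WEPL difference for the same pairs, shape (N, N)
    diff2 = (e_val[None, :] - r_val[:, None]) ** 2
    # gamma at each reference point: minimum combined normalized distance
    gamma = np.sqrt((dist2 / dta_mm**2 + diff2 / dd**2).min(axis=1))
    return float((gamma <= 1.0).mean())
```

The pairwise matrices grow with the square of the number of pixels, so this form only suits small maps or serves as a reference check for a faster implementation.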

Artificial Intelligence in Breast Cancer Care: Transforming Preoperative Planning and Patient Education with 3D Reconstruction

Mustafa Khanbhai, Giulia Di Nardo, Jun Ma, Vivienne Freitas, Caterina Masino, Ali Dolatabadi, Zhaoxun "Lorenz" Liu, Wey Leong, Wagner H. Souza, Amin Madani

•preprint•Sep 10 2025

Effective preoperative planning requires accurate algorithms for segmenting anatomical structures across diverse datasets, but traditional models struggle to generalize. This study presents a novel machine learning methodology to improve algorithm generalization for 3D anatomical reconstruction beyond breast cancer applications. We processed 120 retrospective breast MRIs (January 2018 to June 2023) in three phases: anonymization and manual segmentation of T1-weighted and dynamic contrast-enhanced sequences; co-registration and segmentation of the whole breast, fibroglandular tissue, and tumors; and 3D visualization using ITK-SNAP. A human-in-the-loop approach refined segmentations using U-Mamba, designed to generalize across imaging scenarios. The Dice similarity coefficient (DSC) quantified the overlap between automated segmentation and ground truth. Clinical relevance was evaluated through clinician and patient interviews. On T1-weighted images, U-Mamba showed strong performance, with DSC values of 0.97 (±0.013) for whole organs, 0.96 (±0.024) for fibroglandular tissue, and 0.82 (±0.12) for tumors. The model generated accurate 3D reconstructions enabling visualization of complex anatomical features. Clinician interviews indicated improved planning, intraoperative navigation, and decision support. Integration of 3D visualization enhanced patient education, communication, and understanding. This human-in-the-loop machine learning approach successfully generalizes algorithms for 3D reconstruction and anatomical segmentation across patient datasets, offering enhanced visualization for clinicians, improved preoperative planning, and more effective patient education, facilitating shared decision-making and empowering informed patient choices across medical applications.

MRI Segmentation Breast Retrospective Clinical In Silico Academic Lab Reproducibility
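Segmentation quality in the breast MRI abstract is reported with the Dice similarity coefficient, DSC = 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a ground-truth mask B. A short NumPy version follows; the per-label helper, the integer label values for whole breast, fibroglandular tissue, and tumor, and the convention of returning 1.0 when both masks are empty are hypothetical choices.

```python
import numpy as np


def dice(a, b):
    """Dice similarity coefficient of two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (convention)
    return float(2.0 * np.logical_and(a, b).sum() / total)


def dice_per_label(pred, gt, labels):
    """DSC per structure for integer label maps (label values hypothetical)."""
    pred, gt = np.asarray(pred), np.asarray(gt)
    return {lab: dice(pred == lab, gt == lab) for lab in labels}


# Usage: e.g. 1 = whole breast, 2 = fibroglandular tissue, 3 = tumor
scores = dice_per_label([[1, 2], [3, 0]], [[1, 2], [2, 0]], labels=[1, 2, 3])
```

Computing DSC per case and then taking the mean and standard deviation across cases yields summaries in the mean (±SD) form used in the abstract.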

Multispectral CT Denoising via Simulation-Trained Deep Learning: Experimental Results at the ESRF BM18

Peter Gänz, Steffen Kieß, Guangpu Yang, Jajnabalkya Guhathakurta, Tanja Pienkny, Charls Clark, Paul Tafforeau, Andreas Balles, Astrid Hölzing, Simon Zabler, Sven Simon

•preprint•Sep 10 2025

Multispectral computed tomography (CT) enables advanced material characterization by acquiring energy-resolved projection data. However, since the incoming X-ray flux is distributed across multiple narrow energy bins, the photon count per bin is greatly reduced compared to standard energy-integrated imaging. This inevitably introduces substantial noise, which must either be offset by longer acquisitions, potentially to infeasible scan durations, or accepted as strong noise artifacts that degrade image quality. To address this challenge, we present a dedicated neural network-based denoising approach tailored for multispectral CT projections acquired at the BM18 beamline of the ESRF. The method exploits redundancies across angular, spatial, and spectral domains through specialized sub-networks combined via stacked generalization and an attention mechanism. Non-local similarities in the angular-spatial domain are leveraged alongside correlations between adjacent energy bands in the spectral domain, enabling robust noise suppression while preserving fine structural details. Training was performed exclusively on simulated data replicating the physical and noise characteristics of the BM18 setup, with validation conducted on CT scans of custom-designed phantoms containing both high-Z and low-Z materials. The denoised projections and reconstructions demonstrate substantial improvements in image quality compared to classical denoising methods and baseline CNN models. Quantitative evaluations confirm that the proposed method achieves superior performance across a broad spectral range, generalizing effectively to real-world experimental data while significantly reducing noise without compromising structural fidelity.

CT Reconstruction Methodology In Silico Academic Lab Reproducibility
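The multispectral CT abstract combines angular, spatial, and spectral sub-networks via stacked generalization and an attention mechanism. The exact fusion is not specified there; the sketch below assumes one common form, a per-pixel softmax over branch logits used to take a weighted average of the branch outputs. In the described method the logits would come from a learned attention module; here they are supplied directly, and attention_fuse is a hypothetical name.

```python
import numpy as np


def softmax(z, axis=0):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_fuse(estimates, logits):
    """Fuse K branch outputs with per-pixel softmax attention.

    estimates, logits: arrays of shape (K, H, W), e.g. K = 3 for the
    angular, spatial, and spectral branches.
    Returns the fused (H, W) projection and the (K, H, W) weights.
    """
    weights = softmax(logits, axis=0)
    return (weights * estimates).sum(axis=0), weights


# Usage with random placeholder branch outputs and attention logits
rng = np.random.default_rng(0)
branch_outputs = rng.random((3, 64, 64))
fused, w = attention_fuse(branch_outputs, rng.normal(size=(3, 64, 64)))
```

Because the weights sum to one at every pixel, the fused value always lies between the smallest and largest branch estimates, and the attention map shows which branch dominates where.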

MedicalPatchNet: A Patch-Based Self-Explainable AI Architecture for Chest X-ray Classification

Patrick Wienholt, Christiane Kuhl, Jakob Nikolas Kather, Sven Nebelung, Daniel Truhn

•preprint•Sep 9 2025

Deep neural networks excel in radiological image classification but frequently suffer from poor interpretability, limiting clinical acceptance. We present MedicalPatchNet, an inherently self-explainable architecture for chest X-ray classification that transparently attributes decisions to distinct image regions. MedicalPatchNet splits images into non-overlapping patches, independently classifies each patch, and aggregates predictions, enabling intuitive visualization of each patch's diagnostic contribution without post-hoc techniques. Trained on the CheXpert dataset (223,414 images), MedicalPatchNet matches the classification performance of EfficientNet-B0 (AUROC 0.907 vs. 0.908) while substantially improving interpretability, achieving higher pathology localization accuracy on the CheXlocalize dataset (mean hit-rate 0.485 vs. 0.376 with Grad-CAM). By providing explicit, reliable explanations accessible even to non-AI experts, MedicalPatchNet mitigates risks associated with shortcut learning, thus improving clinical trust. The model contributes to safer, explainable AI-assisted diagnostics across medical imaging domains and is publicly available, together with reproducible training and inference scripts, at https://github.com/TruhnLab/MedicalPatchNet

X-Ray Classification Chest Methodology In Silico Academic Lab Open Code Reproducibility
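MedicalPatchNet splits an image into non-overlapping patches, classifies each patch independently, and aggregates the predictions. The NumPy sketch below mirrors that flow with an arbitrary per-patch classifier; averaging patch logits as the aggregation rule, the 32-pixel patch size, and the function names are assumptions, so the linked repository remains the reference for the actual implementation. The per-patch logits returned alongside the image score form the kind of region-level map the abstract uses for explanation.

```python
import numpy as np


def split_patches(img, p):
    """Non-overlapping p x p patches, returned with shape (rows, cols, p, p)."""
    h, w = img.shape
    if h % p or w % p:
        raise ValueError("image size must be a multiple of the patch size")
    return img.reshape(h // p, p, w // p, p).swapaxes(1, 2)


def patchnet_predict(img, p, patch_classifier):
    """Classify each patch independently, then average the patch logits."""
    patches = split_patches(img, p)
    rows, cols = patches.shape[:2]
    patch_logits = np.array([[patch_classifier(patches[i, j])
                              for j in range(cols)] for i in range(rows)])
    return patch_logits.mean(axis=(0, 1)), patch_logits


# Usage: a toy classifier whose "logit" is the patch's mean intensity
img = np.random.default_rng(0).random((224, 224))
score, heat_map = patchnet_predict(img, 32, lambda patch: patch.mean())
```

Since each patch is scored without seeing the others, a patch's logit is by construction its exact contribution to the averaged image score, which is what makes the explanation intrinsic rather than post hoc.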

Veriserum: A dual-plane fluoroscopic dataset with knee implant phantoms for deep learning in medical imaging

Jinhao Wang, Florian Vogl, Pascal Schütz, Saša Ćuković, William R. Taylor

•preprint•Sep 5 2025

Veriserum is an open-source dataset designed to support the training of deep learning-based registration for dual-plane fluoroscopic analysis. It comprises approximately 110,000 X-ray images of 10 knee implant pair combinations (2 femur and 5 tibia implants) captured during 1,600 trials, incorporating poses associated with daily activities such as level gait and ramp descent. Each image is annotated with an automatically registered ground-truth pose, while 200 images also include manually registered poses for benchmarking. Key features of Veriserum include dual-plane images and calibration tools. The dataset is intended to support applications such as 2D/3D image registration, image segmentation, X-ray distortion correction, and 3D reconstruction. Freely accessible, Veriserum aims to advance computer vision and medical imaging research by providing a reproducible benchmark for algorithm development and evaluation. The Veriserum dataset used in this study is publicly available via https://movement.ethz.ch/data-repository/veriserum.html, with the data stored at ETH Zürich Research Collections: https://doi.org/10.3929/ethz-b-000701146.

X-Ray Registration Musculoskeletal Dataset Release In Silico Academic Lab Open Dataset Reproducibility
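Veriserum provides automatically registered ground-truth poses plus 200 manually registered poses for benchmarking. A common way to compare an estimated implant pose with a reference is the geodesic rotation angle between the two rotations together with the Euclidean translation difference. The sketch assumes poses are stored as 4×4 homogeneous matrices, which the abstract does not specify, and it is not the dataset's official evaluation protocol; the helper names are hypothetical.

```python
import numpy as np


def rotation_z(deg):
    """Rotation matrix about the z axis by deg degrees."""
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def pose(R, t):
    """Build a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def pose_error(T_est, T_gt):
    """Rotation error in degrees and translation error in the units of t."""
    R_rel = T_est[:3, :3].T @ T_gt[:3, :3]
    # geodesic angle of the relative rotation; clip guards against rounding
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    rot_deg = float(np.degrees(np.arccos(cos_angle)))
    trans = float(np.linalg.norm(T_est[:3, 3] - T_gt[:3, 3]))
    return rot_deg, trans


# Usage: an estimate rotated 10 degrees about z and shifted by (3, 4, 0)
est = pose(rotation_z(10.0), [3.0, 4.0, 0.0])
gt = pose(np.eye(3), [0.0, 0.0, 0.0])
rot_err, trans_err = pose_error(est, gt)
```

Reporting the rotation and translation errors separately avoids mixing degrees with length units in a single score.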
