
Artificial Intelligence-Driven Cancer Diagnostics: Enhancing Radiology and Pathology through Reproducibility, Explainability, and Multimodality.

Khosravi P, Fuchs TJ, Ho DJ

PubMed · Jul 2, 2025
The integration of artificial intelligence (AI) in cancer research has significantly advanced radiology, pathology, and multimodal approaches, offering unprecedented capabilities in image analysis, diagnosis, and treatment planning. AI techniques can provide standardized assistance to clinicians for many diagnostic and predictive tasks that are otherwise conducted manually and therefore suffer from low reproducibility. These AI methods can additionally provide explainability to help clinicians make the best decisions for patient care. This review explores state-of-the-art AI methods, focusing on their application in image classification, image segmentation, multiple instance learning, generative models, and self-supervised learning. In radiology, AI enhances tumor detection, diagnosis, and treatment planning through advanced imaging modalities and real-time applications. In pathology, AI-driven image analysis improves cancer detection, biomarker discovery, and diagnostic consistency. Multimodal AI approaches can integrate data from radiology, pathology, and genomics to provide comprehensive diagnostic insights. Emerging trends, challenges, and future directions in AI-driven cancer research are discussed, emphasizing the transformative potential of these technologies in improving patient outcomes and advancing cancer care. This article is part of a special series: Driving Cancer Discoveries with Computational Research, Data Science, and Machine Learning/AI.

Ensemble methods and partially-supervised learning for accurate and robust automatic murine organ segmentation.

Daenen LHBA, de Bruijn J, Staut N, Verhaegen F

PubMed · Jul 2, 2025
Delineation of multiple organs in murine µCT images is crucial for preclinical studies but requires manual volumetric segmentation, a tedious and time-consuming process prone to inter-observer variability. Automatic deep learning-based segmentation can improve speed and reproducibility. While 2D and 3D deep learning models have been developed for anatomical segmentation, their generalization to external datasets has not been extensively investigated. Furthermore, ensemble learning, combining predictions of multiple 2D models, and partially-supervised learning (PSL), enabling training on partially-labeled datasets, have not been explored for preclinical purposes. This study demonstrates the first use of PSL frameworks and the superiority of 3D models in accuracy and generalizability to external datasets. Ensemble methods performed on par with or better than the best individual 2D network, but only 3D models consistently generalized to external datasets (Dice Similarity Coefficient (DSC) > 0.8). PSL frameworks showed promising results across various datasets and organs, but their generalization to external data can be improved for some organs. This work highlights the superiority of 3D models over 2D and ensemble counterparts in accuracy and generalizability for murine µCT image segmentation. Additionally, a promising PSL framework is presented for leveraging multiple datasets without complete annotations. Our model can increase time-efficiency and improve reproducibility in preclinical radiotherapy workflows by circumventing manual contouring bottlenecks. Moreover, the high segmentation accuracy of 3D models allows monitoring multiple organs over time using repeated µCT imaging, potentially reducing the number of mice sacrificed in studies, adhering to the 3R principle, specifically Reduction and Refinement.
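As a rough illustration of the evaluation described above (not the authors' pipeline), the sketch below fuses several hypothetical binary organ masks by per-voxel majority vote and scores them with the Dice Similarity Coefficient; the spherical phantom and noise model are stand-ins for real µCT segmentations.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-7) -> float:
    """Dice Similarity Coefficient between two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return (2.0 * intersection + eps) / (pred.sum() + truth.sum() + eps)

def majority_vote_ensemble(masks: list) -> np.ndarray:
    """Combine binary predictions from several models by per-voxel majority vote."""
    stacked = np.stack([m.astype(np.uint8) for m in masks], axis=0)
    return (stacked.mean(axis=0) >= 0.5).astype(np.uint8)

# Toy example: three noisy predictions of the same spherical "organ".
rng = np.random.default_rng(0)
zz, yy, xx = np.mgrid[:32, :32, :32]
truth = ((zz - 16) ** 2 + (yy - 16) ** 2 + (xx - 16) ** 2) < 100
preds = [np.logical_xor(truth, rng.random(truth.shape) < 0.02) for _ in range(3)]

ensemble = majority_vote_ensemble(preds)
print("best single-model DSC:", max(dice_coefficient(p, truth) for p in preds))
print("ensemble DSC:         ", dice_coefficient(ensemble, truth))
```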

Agreement between Routine-Dose and Lower-Dose CT with and without Deep Learning-based Denoising for Active Surveillance of Solid Small Renal Masses: A Multiobserver Study.

Borgbjerg J, Breen BS, Kristiansen CH, Larsen NE, Medrud L, Mikalone R, Müller S, Naujokaite G, Negård A, Nielsen TK, Salte IM, Frøkjær JB

PubMed · Jul 1, 2025
Purpose To assess the agreement between routine-dose (RD) and lower-dose (LD) contrast-enhanced CT scans, with and without Digital Imaging and Communications in Medicine-based deep learning-based denoising (DLD), in evaluating small renal masses (SRMs) during active surveillance. Materials and Methods In this retrospective study, CT scans from patients undergoing active surveillance for an SRM were included. Using a validated simulation technique, LD CT images were generated from the RD images to simulate 75% (LD75) and 90% (LD90) radiation dose reductions. Two additional LD image sets, in which the DLD was applied (LD75-DLD and LD90-DLD), were generated. Between January 2023 and June 2024, nine radiologists from three institutions independently evaluated 350 CT scans across five datasets for tumor size, tumor nearness to the collecting system (TN), and tumor shape irregularity (TSI), and interobserver reproducibility and agreement were assessed using the 95% limits of agreement with the mean (LOAM) and Gwet AC2 coefficient, respectively. Subjective and quantitative image quality assessments were also performed. Results The study sample included 70 patients (mean age, 73.2 years ± 9.2 [SD]; 48 male, 22 female). LD75 CT was found to be in agreement with RD scans for assessing SRM diameter, with a LOAM of ±2.4 mm (95% CI: 2.3, 2.6) for LD75 compared with ±2.2 mm (95% CI: 2.1, 2.4) for RD. However, a 90% dose reduction compromised reproducibility (LOAM ±3.0 mm; 95% CI: 2.8, 3.2). LD90-DLD preserved measurement reproducibility (LOAM ±2.4 mm; 95% CI: 2.3, 2.6). Observer agreement was comparable between TN and TSI assessments across all image sets, with no statistically significant differences identified (all comparisons P ≥ .35 for TN and P ≥ .02 for TSI; Holm-corrected significance threshold, P = .013). Subjective and quantitative image quality assessments confirmed that DLD effectively restored image quality at reduced dose levels: LD75-DLD had the highest overall image quality, significantly lower noise, and improved contrast-to-noise ratio compared with RD (P < .001). Conclusion A 75% reduction in radiation dose is feasible for SRM assessment in active surveillance using CT with a conventional iterative reconstruction technique, whereas applying DLD allows submillisievert dose reduction. Keywords: CT, Urinary, Kidney, Radiation Safety, Observer Performance, Technology Assessment. Supplemental material is available for this article. © RSNA, 2025. See also commentary by Muglia in this issue.
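A simplified sketch of the agreement metric used above: each observer's measurement is compared with the per-case mean across observers, and the 95% limits of agreement with the mean are taken as ±1.96 times the standard deviation of those deviations. This omits the small-sample corrections of the formal LOAM estimator, and the reader counts and measurement values below are hypothetical.

```python
import numpy as np

def loam_95(measurements: np.ndarray) -> float:
    """Simplified 95% limits of agreement with the mean (LOAM).

    measurements: array of shape (n_cases, n_observers), e.g. tumor diameters in mm.
    Each value is compared with the per-case mean across observers; LOAM is taken
    as 1.96 * SD of those deviations (formal small-sample corrections omitted).
    """
    case_means = measurements.mean(axis=1, keepdims=True)
    deviations = measurements - case_means
    return 1.96 * deviations.std(ddof=1)

# Toy data: 70 cases measured by 9 observers (values in mm, hypothetical).
rng = np.random.default_rng(1)
true_size = rng.uniform(15, 35, size=(70, 1))
readings = true_size + rng.normal(0, 1.2, size=(70, 9))
print(f"LOAM: +/-{loam_95(readings):.1f} mm")
```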

Multi-parametric MRI Habitat Radiomics Based on Interpretable Machine Learning for Preoperative Assessment of Microsatellite Instability in Rectal Cancer.

Wang Y, Xie B, Wang K, Zou W, Liu A, Xue Z, Liu M, Ma Y

PubMed · Jul 1, 2025
This study constructed an interpretable machine learning model based on multi-parameter MRI sub-region habitat radiomics and clinicopathological features, aiming to preoperatively evaluate the microsatellite instability (MSI) status of rectal cancer (RC) patients. This retrospective study recruited 291 rectal cancer patients with pathologically confirmed MSI status and randomly divided them into a training cohort and a testing cohort at a ratio of 8:2. First, the K-means method was used for cluster analysis of tumor voxels, and sub-region radiomics features and classical radiomics features were extracted from multi-parameter MRI sequences. Then, the synthetic minority over-sampling technique (SMOTE) was used to balance the sample size, and finally, the features were screened. Prediction models were established using logistic regression based on clinicopathological variables, classical radiomics features, and MSI-related sub-region radiomics features, and the contribution of each feature to the model decision was quantified by the Shapley Additive Explanations (SHAP) algorithm. The area under the curve (AUC) of the sub-region radiomics model in the training and testing groups was 0.848 and 0.800, respectively, both better than that of the classical radiomics and clinical models. The combined model performed the best, with AUCs of 0.908 and 0.863 in the training and testing groups, respectively. We developed and validated a robust combined model that integrates clinical variables, classical radiomics features, and sub-region radiomics features to accurately determine the MSI status of RC patients. We visualized the prediction process using SHAP, enabling more effective personalized treatment plans and ultimately improving RC patient survival rates.
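The sketch below outlines, on synthetic tabular data, the kind of pipeline the abstract describes: SMOTE oversampling of the minority (MSI-high) class on the training split only, a logistic regression classifier, AUC evaluation, and per-feature SHAP attributions. Feature values, class prevalence, and the habitat-clustering step itself are assumptions, not the authors' data or code.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from imblearn.over_sampling import SMOTE  # pip install imbalanced-learn
import shap                               # pip install shap

# Hypothetical radiomics feature table: rows = patients, columns = features
# extracted from K-means-derived tumor sub-regions (habitats) and the whole tumor.
rng = np.random.default_rng(42)
X = rng.normal(size=(291, 20))
y = (rng.random(291) < 0.15).astype(int)   # MSI-high as the minority class

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Balance the minority class in the training set only.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_train_s, y_train)

model = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
print("test AUC:", roc_auc_score(y_test, model.predict_proba(X_test_s)[:, 1]))

# Per-feature contributions (SHAP values for a linear model).
explainer = shap.LinearExplainer(model, X_bal)
shap_values = explainer.shap_values(X_test_s)
print("mean |SHAP| for first 5 features:", np.abs(shap_values).mean(axis=0)[:5])
```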

Physiological Confounds in BOLD-fMRI and Their Correction.

Addeh A, Williams RJ, Golestani A, Pike GB, MacDonald ME

PubMed · Jul 1, 2025
Functional magnetic resonance imaging (fMRI) has opened new frontiers in neuroscience by instrumentally driving our understanding of brain function and development. Despite its substantial successes, fMRI studies persistently encounter obstacles stemming from inherent, unavoidable physiological confounds. The adverse effects of these confounds are especially noticeable with higher magnetic fields, which have been gaining momentum in fMRI experiments. This review focuses on the four major physiological confounds impacting fMRI studies: low-frequency fluctuations in both breathing depth and rate, low-frequency fluctuations in the heart rate, thoracic movements, and cardiac pulsatility. Over the past three decades, numerous correction techniques have emerged to address these challenges. Correction methods have effectively enhanced the detection of task-activated voxels and minimized the occurrence of false positives and false negatives in functional connectivity studies. While confound correction methods have merit, they also have certain limitations. For instance, model-based approaches require externally recorded physiological data that is often unavailable in fMRI studies. Methods reliant on independent component analysis, on the other hand, need prior knowledge about the number of components. Machine learning techniques, although showing potential, are still in the early stages of development and require additional validation. This article reviews the mechanics of physiological confound correction methods, scrutinizes their performance and limitations, and discusses their impact on fMRI studies.
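As a minimal illustration of the model-based correction family mentioned above, the sketch below regresses externally recorded physiological traces out of simulated BOLD time series by ordinary least squares; the regressors and signal model are toy assumptions rather than any specific published method.

```python
import numpy as np

def regress_out_confounds(bold: np.ndarray, confounds: np.ndarray) -> np.ndarray:
    """Remove nuisance signals from BOLD time series by ordinary least squares.

    bold:      (n_timepoints, n_voxels) fMRI time series
    confounds: (n_timepoints, n_regressors) physiological regressors,
               e.g. low-frequency respiration depth/rate and heart-rate traces.
    Returns the residual ("cleaned") time series.
    """
    design = np.column_stack([np.ones(len(confounds)), confounds])  # add intercept
    beta, *_ = np.linalg.lstsq(design, bold, rcond=None)
    return bold - design @ beta

# Toy example: 200 timepoints, 500 voxels, one respiratory + one cardiac regressor.
rng = np.random.default_rng(0)
t = np.arange(200)
resp = np.sin(2 * np.pi * t / 40.0)          # slow breathing fluctuation
card = np.sin(2 * np.pi * t / 6.0)           # faster cardiac fluctuation
bold = rng.normal(size=(200, 500)) + 0.5 * resp[:, None] + 0.3 * card[:, None]

cleaned = regress_out_confounds(bold, np.column_stack([resp, card]))
print("variance before:", bold.var(), "after:", cleaned.var())
```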

Virtual lung screening trial (VLST): An in silico study inspired by the national lung screening trial for lung cancer detection.

Tushar FI, Vancoillie L, McCabe C, Kavuri A, Dahal L, Harrawood B, Fryling M, Zarei M, Sotoudeh-Paima S, Ho FC, Ghosh D, Harowicz MR, Tailor TD, Luo S, Segars WP, Abadi E, Lafata KJ, Lo JY, Samei E

PubMed · Jul 1, 2025
Clinical imaging trials play a crucial role in advancing medical innovation but are often costly, inefficient, and ethically constrained. Virtual Imaging Trials (VITs) present a solution by simulating clinical trial components in a controlled, risk-free environment. The Virtual Lung Screening Trial (VLST), an in silico study inspired by the National Lung Screening Trial (NLST), illustrates the potential of VITs to expedite clinical trials, minimize risks to participants, and promote optimal use of imaging technologies in healthcare. This study aimed to show that a virtual imaging trial platform could investigate some key elements of a major clinical trial, specifically the NLST, which compared computed tomography (CT) and chest radiography (CXR) for lung cancer screening. With simulated cancerous lung nodules, a virtual patient cohort of 294 subjects was created using XCAT human models. Each virtual patient underwent both CT and CXR imaging, with deep learning models, the AI CT-Reader and AI CXR-Reader, acting as virtual readers to recall patients with suspicion of lung cancer. The primary outcome was the difference in diagnostic performance between CT and CXR, measured by the area under the curve (AUC). The AI CT-Reader showed superior diagnostic accuracy, achieving an AUC of 0.92 (95% CI: 0.90-0.95) compared to the AI CXR-Reader's AUC of 0.72 (95% CI: 0.67-0.77). Furthermore, at the same 94% CT sensitivity reported by the NLST, the VLST specificity of 73% was similar to the NLST specificity of 73.4%. This CT performance highlights the potential of VITs to replicate certain aspects of clinical trials effectively, paving the way toward a safe and efficient method for advancing imaging-based diagnostics.
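A small sketch of how such operating points can be computed from reader scores: build an ROC curve, pick the first threshold reaching the target sensitivity (94% here), and read off the specificity. The scores below are simulated stand-ins, not VLST outputs.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

def specificity_at_sensitivity(y_true, scores, target_sens=0.94):
    """Specificity of a reader at the first threshold reaching a target sensitivity."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    idx = np.argmax(tpr >= target_sens)      # first operating point meeting the target
    return 1.0 - fpr[idx]

# Hypothetical reader scores for a virtual cohort (1 = cancer present).
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=294)
ct_scores = y * 1.5 + rng.normal(0, 1, size=294)    # stronger separation for CT
cxr_scores = y * 0.6 + rng.normal(0, 1, size=294)   # weaker separation for CXR

print("AI CT-Reader  AUC:", round(roc_auc_score(y, ct_scores), 2))
print("AI CXR-Reader AUC:", round(roc_auc_score(y, cxr_scores), 2))
print("CT specificity at 94% sensitivity:",
      round(specificity_at_sensitivity(y, ct_scores), 2))
```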

Deep Guess acceleration for explainable image reconstruction in sparse-view CT.

Loli Piccolomini E, Evangelista D, Morotti E

PubMed · Jul 1, 2025
Sparse-view Computed Tomography (CT) is an emerging protocol designed to reduce X-ray radiation dose in medical imaging. Reconstructions based on the traditional Filtered Back Projection algorithm suffer from severe artifacts due to the sparse data. In contrast, Model-Based Iterative Reconstruction (MBIR) algorithms, though better at mitigating noise through regularization, are too computationally costly for clinical use. This paper introduces a novel technique, denoted as the Deep Guess acceleration scheme, which uses a trained neural network both to accelerate the regularized MBIR and to enhance the reconstruction accuracy. We integrate state-of-the-art deep learning tools to provide a well-informed starting guess for a proximal algorithm solving a non-convex model, thus computing a (mathematically) interpretable solution image in a few iterations. Experimental results on real and synthetic CT images demonstrate the effectiveness of Deep Guess in (very) sparse tomographic protocols, where it outperforms its purely variational counterpart and many state-of-the-art data-driven approaches. We also consider a ground-truth-free implementation and test the robustness of the proposed framework to noise.
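A generic sketch of the warm-start idea, under heavy simplification: an ISTA-style proximal gradient solver for an L1-regularized least-squares problem is started either from zeros or from a good initial guess standing in for the network output. The random forward matrix is a stand-in for the CT projector, and the convex L1 penalty replaces the paper's non-convex model.

```python
import numpy as np

def soft_threshold(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def proximal_gradient(A, y, x0, lam=0.05, n_iter=30):
    """ISTA-style solver for min ||Ax - y||^2 / 2 + lam * ||x||_1,
    started from an (ideally network-provided) initial guess x0."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1/L for the data-fidelity gradient
    x = x0.copy()
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = soft_threshold(x - step * grad, step * lam)
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(80, 200)) / np.sqrt(80)    # stand-in for a sparse-view projector
x_true = np.zeros(200)
x_true[rng.choice(200, 10, replace=False)] = 1.0
y = A @ x_true + 0.01 * rng.normal(size=80)

x_cold = proximal_gradient(A, y, x0=np.zeros(200))                          # cold start
x_warm = proximal_gradient(A, y, x0=x_true + 0.1 * rng.normal(size=200))    # "deep guess"
print("cold-start error:", np.linalg.norm(x_cold - x_true))
print("warm-start error:", np.linalg.norm(x_warm - x_true))
```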

The impact of updated imaging software on the performance of machine learning models for breast cancer diagnosis: a multi-center, retrospective study.

Cai L, Golatta M, Sidey-Gibbons C, Barr RG, Pfob A

PubMed · Jul 1, 2025
Artificial intelligence models based on medical (imaging) data are increasingly being developed. However, the imaging software on which the original data are generated is frequently updated. The impact of updated imaging software on the performance of AI models is unclear. We aimed to develop machine learning models using shear wave elastography (SWE) data to identify malignant breast lesions and to test the models' generalizability by validating them on external data generated by both the original and the updated software versions. We developed and validated different machine learning models (GLM, MARS, XGBoost, SVM) on multicenter, international SWE data (NCT02638935) using tenfold cross-validation. Findings were compared to the histopathologic evaluation of the biopsy specimen or 2-year follow-up. The outcome measure was the area under the receiver operating characteristic curve (AUROC). We included 1288 cases in the development set using the original imaging software and 385 cases in the validation set using both the original and the updated software. In the external validation set, the GLM and XGBoost models showed better performance with the updated software data compared to the original software data (AUROC 0.941 vs. 0.902, p < 0.001 and 0.934 vs. 0.872, p < 0.001). The MARS model showed worse performance with the updated software data (0.847 vs. 0.894, p = 0.045). SVM was not calibrated. In this multicenter study using SWE data, some machine learning models demonstrated great potential to bridge the gap between original and updated software, whereas others exhibited weak generalizability.
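The sketch below mimics, on synthetic features, the evaluation scheme described above: a GLM (logistic regression) developed with tenfold cross-validation and then scored by AUROC on two external sets, one shifted to imitate a software update. Data, shift size, and feature meanings are assumptions; XGBoost or MARS could be swapped in for the linear model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_swe_like_data(n, shift=0.0):
    """Hypothetical SWE-style features (e.g. max/mean elasticity); 'shift'
    mimics a systematic change introduced by an imaging-software update."""
    X = rng.normal(size=(n, 6))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, n) > 0).astype(int)
    return X + shift, y

X_dev, y_dev = make_swe_like_data(1288)
X_val_orig, y_val_orig = make_swe_like_data(385, shift=0.0)
X_val_upd, y_val_upd = make_swe_like_data(385, shift=0.3)

model = LogisticRegression(max_iter=1000)
cv_auc = cross_val_score(model, X_dev, y_dev, cv=10, scoring="roc_auc")
print("10-fold CV AUROC (development):", round(cv_auc.mean(), 3))

model.fit(X_dev, y_dev)
for name, (Xv, yv) in {"original software": (X_val_orig, y_val_orig),
                       "updated software": (X_val_upd, y_val_upd)}.items():
    print(name, "AUROC:", round(roc_auc_score(yv, model.predict_proba(Xv)[:, 1]), 3))
```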

Generalizable, sequence-invariant deep learning image reconstruction for subspace-constrained quantitative MRI.

Hu Z, Chen Z, Cao T, Lee HL, Xie Y, Li D, Christodoulou AG

PubMed · Jul 1, 2025
To develop a deep subspace learning network that can function across different pulse sequences. A contrast-invariant component-by-component (CBC) network structure was developed and compared against a previously reported spatiotemporal multicomponent (MC) structure for reconstructing MR Multitasking images. A total of 130, 167, and 16 subjects were imaged using T1, T1-T2, and T1-T2-T2*-fat fraction (FF) mapping sequences, respectively. We compared CBC and MC networks in matched-sequence experiments (same sequence for training and testing), then examined their cross-sequence performance and generalizability in unmatched-sequence experiments (different sequences for training and testing). A "universal" CBC network was also evaluated using mixed-sequence training (combining data from all three sequences). Evaluation metrics included image normalized root mean squared error (NRMSE) and Bland-Altman analyses of end-diastolic maps, both versus iteratively reconstructed references. The proposed CBC showed significantly better NRMSE than MC in both matched-sequence and unmatched-sequence experiments (p < 0.001), fewer structural details in quantitative error maps, and tighter limits of agreement. CBC was more generalizable than MC (smaller performance loss; p = 0.006 in T1 and p < 0.001 in T1-T2 from matched-sequence testing to unmatched-sequence testing) and additionally allowed training of a single universal network to reconstruct images from any of the three pulse sequences. The mixed-sequence CBC network performed similarly to matched-sequence CBC in T1 (p = 0.178) and T1-T2 (p = 0.121), where training data were plentiful, and performed better in T1-T2-T2*-FF (p < 0.001), where training data were scarce. Contrast-invariant learning of spatial features rather than spatiotemporal features improves performance and generalizability, addresses data scarcity, and offers a pathway to universal supervised deep subspace learning.
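For reference, normalized root mean squared error against an iteratively reconstructed reference can be computed as below; the two "reconstructions" are random stand-ins for CBC and MC outputs, not the study's data.

```python
import numpy as np

def nrmse(recon: np.ndarray, reference: np.ndarray) -> float:
    """Normalized root mean squared error vs. an iteratively reconstructed reference."""
    return np.linalg.norm(recon - reference) / np.linalg.norm(reference)

rng = np.random.default_rng(0)
reference = rng.random((128, 128))
recon_cbc = reference + 0.01 * rng.normal(size=reference.shape)  # hypothetical CBC output
recon_mc = reference + 0.03 * rng.normal(size=reference.shape)   # hypothetical MC output
print("CBC NRMSE:", round(nrmse(recon_cbc, reference), 4))
print("MC  NRMSE:", round(nrmse(recon_mc, reference), 4))
```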

Mind the Detail: Uncovering Clinically Relevant Image Details in Accelerated MRI with Semantically Diverse Reconstructions

Jan Nikolas Morshuis, Christian Schlarmann, Thomas Küstner, Christian F. Baumgartner, Matthias Hein

arXiv preprint · Jul 1, 2025
In recent years, accelerated MRI reconstruction based on deep learning has led to significant improvements in image quality with impressive results for high acceleration factors. However, from a clinical perspective image quality is only secondary; much more important is that all clinically relevant information is preserved in the reconstruction from heavily undersampled data. In this paper, we show that existing techniques, even when considering resampling for diffusion-based reconstruction, can fail to reconstruct small and rare pathologies, thus leading to potentially wrong diagnosis decisions (false negatives). To uncover the potentially missing clinical information we propose "Semantically Diverse Reconstructions" (SDR), a method which, given an original reconstruction, generates novel reconstructions with enhanced semantic variability while all of them are fully consistent with the measured data. To evaluate SDR automatically we train an object detector on the fastMRI+ dataset. We show that SDR significantly reduces the chance of false-negative diagnoses (higher recall) and improves mean average precision compared to the original reconstructions. The code is available on https://github.com/NikolasMorshuis/SDR
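The claim that every generated reconstruction remains fully consistent with the measured data can be illustrated with a hard data-consistency projection: re-insert the acquired k-space samples into the candidate image's spectrum. This single-coil Cartesian sketch is an illustrative assumption, not the SDR sampling procedure itself.

```python
import numpy as np

def enforce_data_consistency(image, measured_kspace, mask):
    """Project a candidate image onto the set of images whose spectrum matches
    the acquired k-space samples (single-coil, Cartesian sketch)."""
    kspace = np.fft.fft2(image)
    kspace[mask] = measured_kspace[mask]     # keep every measured sample verbatim
    return np.fft.ifft2(kspace)              # complex image; magnitude shown in practice

rng = np.random.default_rng(0)
ground_truth = rng.random((64, 64))
full_kspace = np.fft.fft2(ground_truth)
mask = rng.random((64, 64)) < 0.25           # ~4x undersampling pattern

candidate = rng.random((64, 64))             # any semantically varied proposal
consistent = enforce_data_consistency(candidate, full_kspace, mask)
print("max k-space mismatch after projection:",
      np.abs(np.fft.fft2(consistent)[mask] - full_kspace[mask]).max())
```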