AI in Action: A Roadmap from the Radiology AI Council for Effective Model Evaluation and Deployment.

Trivedi H, Khosravi B, Gichoya J, Benson L, Dyckman D, Galt J, Howard B, Kikano E, Kunjummen J, Lall N, Li X, Patel S, Safdar N, Salastekar N, Segovis C, van Assen M, Harri P

PubMed · May 23, 2025
As the integration of artificial intelligence (AI) into radiology workflows continues to evolve, establishing standardized processes for the evaluation and deployment of AI models is crucial to ensure success. This paper outlines the creation of a Radiology AI Council at a large academic center and the subsequent development of a framework, in the form of a rubric, to formalize the evaluation of radiology AI models and onboard them into clinical workflows. The rubric aims to address the challenges faced during the deployment of AI models, such as real-world model performance, workflow implementation, resource allocation, return on investment (ROI), and impact on the broader health system. Using this comprehensive rubric, the council aims to ensure that the process for selecting AI models is both standardized and transparent. This paper outlines the steps taken to establish the rubric, its components, and initial results from the evaluation of 13 models over an 8-month period. We emphasize the importance of holistic model evaluation beyond performance metrics, and of transparency and objectivity in AI model evaluation, with the goal of improving the efficacy and safety of AI models in radiology.
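
The council's actual rubric is not reproduced in the abstract, but the idea of a weighted, multi-criterion score is easy to illustrate. A minimal sketch in Python; the criterion names and weights below are invented assumptions, not the published rubric:

```python
# Illustrative sketch of a weighted AI-model evaluation rubric.
# Criteria and weights are hypothetical; the council's actual rubric differs.

CRITERIA_WEIGHTS = {
    "clinical_performance": 0.30,  # real-world model performance
    "workflow_fit": 0.25,          # ease of workflow implementation
    "resource_cost": 0.15,         # IT and staffing resources required
    "roi": 0.15,                   # expected return on investment
    "system_impact": 0.15,         # impact on the broader health system
}

def rubric_score(ratings: dict[str, float]) -> float:
    """Combine per-criterion ratings (0-5 scale) into a weighted score."""
    assert set(ratings) == set(CRITERIA_WEIGHTS), "rate every criterion"
    return sum(CRITERIA_WEIGHTS[c] * r for c, r in ratings.items())

# Example: one candidate model as rated by council reviewers.
candidate = {
    "clinical_performance": 4.0,
    "workflow_fit": 3.5,
    "resource_cost": 2.0,
    "roi": 3.0,
    "system_impact": 4.0,
}
print(f"weighted score: {rubric_score(candidate):.2f} / 5")
```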

Evaluation of a deep-learning segmentation model for patients with colorectal cancer liver metastases (COALA) in the radiological workflow.

Zeeuw M, Bereska J, Strampel M, Wagenaar L, Janssen B, Marquering H, Kemna R, van Waesberghe JH, van den Bergh J, Nota I, Moos S, Nio Y, Kop M, Kist J, Struik F, Wesdorp N, Nelissen J, Rus K, de Sitter A, Stoker J, Huiskens J, Verpalen I, Kazemier G

PubMed · May 23, 2025
For patients with colorectal liver metastases (CRLM), total tumor volume (TTV) is prognostic. A deep-learning segmentation model for CRLM to assess TTV, called COlorectal cAncer Liver metastases Assessment (COALA), has been developed. This study evaluated COALA's performance and practical utility in the radiological picture archiving and communication system (PACS). A secondary aim was to provide lessons for future researchers on the implementation of artificial intelligence (AI) models. Patients discussed between January and December 2023 in a multidisciplinary meeting for CRLM were included. In those patients, CRLM were automatically segmented in portal-venous phase CT scans by COALA and integrated with PACS. Eight expert abdominal radiologists completed a questionnaire addressing segmentation accuracy and PACS integration; they were also asked to note general remarks. In total, 57 patients were evaluated, comprising 112 contrast-enhanced portal-venous phase CT scans. Six of the eight radiologists (75%) evaluated the model as user-friendly in their radiological workflow. Areas for improvement of the COALA model were the segmentation of small lesions, heterogeneous lesions, and lesions at the border of the liver with involvement of the diaphragm or heart. Key lessons for implementation were a multidisciplinary approach, a robust methodology established prior to model development, and evaluation sessions with end-users organized early in the development phase. This study demonstrates that the deep-learning segmentation model for patients with CRLM (COALA) is user-friendly in the radiologist's PACS. Future researchers striving for implementation should take a multidisciplinary approach, propose a robust methodology, and involve end-users prior to model development. Many segmentation models are being developed, but none of them have been evaluated in the radiological workflow or clinically implemented. Our model is implemented in the radiological workflow, providing valuable lessons for researchers aiming for clinical implementation. If implemented in clinical practice, our model could allow for objective radiological evaluation.
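
The quantity COALA reports, total tumor volume, reduces to counting mask voxels and scaling by voxel size. A minimal sketch (NumPy only; shapes and spacing values are assumptions, and this is not the COALA codebase):

```python
import numpy as np

def total_tumor_volume_ml(mask: np.ndarray,
                          spacing_mm: tuple[float, float, float]) -> float:
    """Total tumor volume in millilitres from a binary lesion mask.

    mask: 3D array (z, y, x) with 1 inside segmented lesions, 0 elsewhere.
    spacing_mm: voxel spacing in mm along (z, y, x), as stored in the CT header.
    """
    voxel_volume_mm3 = float(np.prod(spacing_mm))
    return mask.astype(bool).sum() * voxel_volume_mm3 / 1000.0  # mm^3 -> mL

# Toy example: a 10x10x10-voxel lesion at 1 mm isotropic spacing = 1 mL.
mask = np.zeros((64, 64, 64), dtype=np.uint8)
mask[20:30, 20:30, 20:30] = 1
print(total_tumor_volume_ml(mask, (1.0, 1.0, 1.0)))  # -> 1.0
```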

ActiveNaf: A novel NeRF-based approach for low-dose CT image reconstruction through active learning.

Zidane A, Shimshoni I

PubMed · May 22, 2025
CT imaging provides essential information about internal anatomy; however, conventional CT imaging delivers radiation doses that can become problematic for patients requiring repeated imaging, highlighting the need for dose-reduction techniques. This study aims to reduce radiation doses without compromising image quality. We propose an approach that combines Neural Attenuation Fields (NAF) with an active learning strategy to better optimize CT reconstructions given a limited number of X-ray projections. Our method uses a secondary neural network to predict the Peak Signal-to-Noise Ratio (PSNR) of 2D projections generated by NAF from a range of angles within the operational range of the CT scanner. This prediction guides the active learning process in choosing the most informative projections. In contrast to conventional techniques that acquire all X-ray projections in a single session, our technique acquires projections iteratively. The iterative process improves reconstruction quality, reduces the number of required projections, and decreases patient radiation exposure. We tested our methodology on spinal imaging using a limited subset of the VerSe 2020 dataset, comparing image-quality metrics (PSNR3D, SSIM3D, and PSNR2D) against the baseline method and finding significant improvements. Our method achieves with 36 projections the same quality that the baseline method achieves with 60. Our findings demonstrate that our approach achieves high-quality 3D CT reconstructions from sparse data, producing clearer and more detailed images of anatomical structures. This work lays the groundwork for advanced imaging techniques, paving the way for safer and more efficient medical imaging procedures.
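
The acquisition loop described above, scoring candidate angles with a PSNR-predicting network and acquiring views where reconstruction is expected to be worst, might look like the following sketch. `predict_psnr` is a toy placeholder heuristic standing in for the authors' secondary network:

```python
import numpy as np

def predict_psnr(angle_deg: float, acquired: list[float]) -> float:
    """Stand-in for the secondary network that predicts the PSNR of the
    NAF-rendered projection at a given angle. This toy heuristic assumes
    rendering quality degrades with angular distance from acquired views."""
    gaps = [min(abs(angle_deg - a), 360.0 - abs(angle_deg - a)) for a in acquired]
    return 40.0 - 0.2 * min(gaps)  # dB; purely illustrative

def select_next_angles(candidates, acquired, k=4):
    """Pick the k candidate angles with the *lowest* predicted PSNR,
    i.e. where the current reconstruction is expected to be worst."""
    return sorted(candidates, key=lambda a: predict_psnr(a, acquired))[:k]

# Iterative acquisition: start sparse, add views where NAF struggles,
# retrain NAF between rounds (retraining omitted here).
acquired = [0.0, 90.0]
candidates = list(np.arange(0.0, 180.0, 5.0))
for _ in range(3):  # three acquisition rounds
    new = select_next_angles([a for a in candidates if a not in acquired], acquired)
    acquired.extend(new)
print(sorted(acquired))
```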

Multi-modal Integration Analysis of Alzheimer's Disease Using Large Language Models and Knowledge Graphs

Kanan Kiguchi, Yunhao Tu, Katsuhiro Ajito, Fady Alnajjar, Kazuyuki Murase

arXiv preprint · May 21, 2025
We propose a novel framework for integrating fragmented multi-modal data in Alzheimer's disease (AD) research using large language models (LLMs) and knowledge graphs. While traditional multimodal analysis requires matched patient IDs across datasets, our approach demonstrates population-level integration of MRI, gene expression, biomarkers, EEG, and clinical indicators from independent cohorts. Statistical analysis identified significant features in each modality, which were connected as nodes in a knowledge graph. LLMs then analyzed the graph to extract potential correlations and generate hypotheses in natural language. This approach revealed several novel relationships, including a potential pathway linking metabolic risk factors to tau protein abnormalities via neuroinflammation (r>0.6, p<0.001), and unexpected correlations between frontal EEG channels and specific gene expression profiles (r=0.42-0.58, p<0.01). Cross-validation with independent datasets confirmed the robustness of major findings, with consistent effect sizes across cohorts (variance <15%). The reproducibility of these findings was further supported by expert review (Cohen's κ=0.82) and computational validation. Our framework enables cross-modal integration at a conceptual level without requiring patient ID matching, offering new possibilities for understanding AD pathology through fragmented data reuse and generating testable hypotheses for future research.
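
As orientation, the graph-construction and prompt-serialization steps might look like the sketch below, using `networkx`; the feature names, modalities, and correlation values are invented examples, not the study's findings:

```python
import networkx as nx

# Significant features per modality (names and r values are invented examples).
features = {
    "MRI": ["hippocampal_volume"],
    "EEG": ["frontal_theta_power"],
    "biomarker": ["CSF_ptau181"],
    "gene_expression": ["TREM2"],
}
edges = [  # population-level associations found by statistical analysis
    ("hippocampal_volume", "CSF_ptau181", 0.61),
    ("frontal_theta_power", "TREM2", 0.47),
]

G = nx.Graph()
for modality, feats in features.items():
    for f in feats:
        G.add_node(f, modality=modality)
for u, v, r in edges:
    G.add_edge(u, v, r=r)

# Serialize the graph as plain text for an LLM prompt asking for hypotheses.
lines = [f"{u} ({G.nodes[u]['modality']}) -- r={d['r']} -- "
         f"{v} ({G.nodes[v]['modality']})" for u, v, d in G.edges(data=True)]
prompt = "Propose mechanistic hypotheses linking:\n" + "\n".join(lines)
print(prompt)
```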

CONSIGN: Conformal Segmentation Informed by Spatial Groupings via Decomposition

Bruno Viti, Elias Karabelas, Martin Holler

arXiv preprint · May 20, 2025
Most machine learning-based image segmentation models produce pixel-wise confidence scores - typically derived from softmax outputs - that represent the model's predicted probability for each class label at every pixel. While this information can be particularly valuable in high-stakes domains such as medical imaging, these (uncalibrated) scores are heuristic in nature and do not constitute rigorous quantitative uncertainty estimates. Conformal prediction (CP) provides a principled framework for transforming heuristic confidence scores into statistically valid uncertainty estimates. However, applying CP directly to image segmentation ignores the spatial correlations between pixels, a fundamental characteristic of image data. This can result in overly conservative and less interpretable uncertainty estimates. To address this, we propose CONSIGN (Conformal Segmentation Informed by Spatial Groupings via Decomposition), a CP-based method that incorporates spatial correlations to improve uncertainty quantification in image segmentation. Our method generates meaningful prediction sets that come with user-specified, high-probability error guarantees. It is compatible with any pre-trained segmentation model capable of generating multiple sample outputs - such as those using dropout, Bayesian modeling, or ensembles. We evaluate CONSIGN against a standard pixel-wise CP approach across three medical imaging datasets and two COCO dataset subsets, using three different pre-trained segmentation models. Results demonstrate that accounting for spatial structure significantly improves performance across multiple metrics and enhances the quality of uncertainty estimates.
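
CONSIGN's grouping and decomposition are not detailed in the abstract, but the pixel-wise CP baseline it is compared against is standard split conformal prediction, sketched here with NumPy (array shapes and toy data are assumptions):

```python
import numpy as np

def calibrate(cal_probs: np.ndarray, cal_labels: np.ndarray,
              alpha: float = 0.1) -> float:
    """Split conformal calibration for per-pixel prediction sets.

    cal_probs: (N, C) softmax scores at calibration pixels.
    cal_labels: (N,) true class per calibration pixel.
    Returns the conformal quantile q-hat.
    """
    n = cal_labels.shape[0]
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]  # nonconformity
    level = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(scores, min(level, 1.0), method="higher")

def prediction_sets(test_probs: np.ndarray, q_hat: float) -> np.ndarray:
    """Boolean (..., C) mask: class c is in a pixel's set iff 1 - p_c <= q_hat."""
    return (1.0 - test_probs) <= q_hat

# Toy usage with random "softmax" maps (2 classes, 8x8 image).
rng = np.random.default_rng(0)
cal_probs = rng.dirichlet([2, 2], size=500)
cal_labels = rng.integers(0, 2, size=500)
q_hat = calibrate(cal_probs, cal_labels, alpha=0.1)
test_probs = rng.dirichlet([2, 2], size=(8, 8))
sets = prediction_sets(test_probs, q_hat)
print("mean set size:", sets.sum(-1).mean())
```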

FreqSelect: Frequency-Aware fMRI-to-Image Reconstruction

Junliang Ye, Lei Wang, Md Zakir Hossain

arXiv preprint · May 18, 2025
Reconstructing natural images from functional magnetic resonance imaging (fMRI) data remains a core challenge in neural decoding due to the mismatch between the richness of visual stimuli and the noisy, low-resolution nature of fMRI signals. While recent two-stage models, combining deep variational autoencoders (VAEs) with diffusion models, have advanced this task, they treat all spatial-frequency components of the input equally. This uniform treatment forces the model to extract meaningful features and suppress irrelevant noise simultaneously, limiting its effectiveness. We introduce FreqSelect, a lightweight, adaptive module that selectively filters spatial-frequency bands before encoding. By dynamically emphasizing frequencies that are most predictive of brain activity and suppressing those that are uninformative, FreqSelect acts as a content-aware gate between image features and neural data. It integrates seamlessly into standard very deep VAE-diffusion pipelines and requires no additional supervision. Evaluated on the Natural Scenes Dataset, FreqSelect consistently improves reconstruction quality across both low- and high-level metrics. Beyond performance gains, the learned frequency-selection patterns offer interpretable insights into how different visual frequencies are represented in the brain. Our method generalizes across subjects and scenes, and holds promise for extension to other neuroimaging modalities, offering a principled approach to enhancing both decoding accuracy and neuroscientific interpretability.
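
One plausible reading of such a frequency gate, a learnable per-band weight applied in the Fourier domain before encoding, is sketched below in PyTorch; the radial band partition and sigmoid gating are assumptions, not the paper's exact design:

```python
import torch

class FreqGate(torch.nn.Module):
    """Sketch of a learnable radial frequency-band gate, in the spirit of
    FreqSelect (band count and gating form are assumptions, not the paper's)."""

    def __init__(self, image_size: int = 64, n_bands: int = 8):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.zeros(n_bands))
        # Precompute which radial band each frequency bin falls into.
        fy = torch.fft.fftfreq(image_size)
        fx = torch.fft.fftfreq(image_size)
        radius = torch.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
        self.register_buffer(
            "band_idx",
            torch.clamp((radius / radius.max() * n_bands).long(), max=n_bands - 1),
        )

    def forward(self, img: torch.Tensor) -> torch.Tensor:  # img: (B, H, W)
        spec = torch.fft.fft2(img)
        gates = torch.sigmoid(self.logits)[self.band_idx]  # per-bin weight
        return torch.fft.ifft2(spec * gates).real

gate = FreqGate()
out = gate(torch.randn(2, 64, 64))  # filtered images, same shape as input
print(out.shape)
```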

External Validation of a CT-Based Radiogenomics Model for the Detection of EGFR Mutation in NSCLC and the Impact of Prevalence in Model Building by Using Synthetic Minority Over Sampling (SMOTE): Lessons Learned.

Kohan AA, Mirshahvalad SA, Hinzpeter R, Kulanthaivelu R, Avery L, Ortega C, Metser U, Hope A, Veit-Haibach P

PubMed · May 15, 2025
Radiogenomics holds promise in identifying molecular alterations in non-small cell lung cancer (NSCLC) using imaging features. Previously, we developed a radiogenomics model to predict epidermal growth factor receptor (EGFR) mutations based on contrast-enhanced computed tomography (CECT) in NSCLC patients. The current study aimed to externally validate this model using a publicly available National Institutes of Health (NIH)-based NSCLC dataset and to assess the effect of EGFR mutation prevalence on model performance through the synthetic minority oversampling technique (SMOTE). The original radiogenomics model was validated on an independent NIH cohort (n=140). To assess the influence of disease prevalence, six SMOTE-augmented datasets were created, simulating EGFR mutation prevalence from 25% to 50%. Seven models were developed (one from the original data, six SMOTE-augmented), each undergoing rigorous cross-validation, feature selection, and logistic regression modeling. Models were tested against the NIH cohort. Performance was compared using the area under the receiver operating characteristic curve (AUC), and differences between radiomics-only, clinical-only, and combined models were statistically assessed. External validation revealed poor diagnostic performance for both our model and a previously published EGFR radiomics model (AUC ∼0.5). The clinical model alone achieved higher diagnostic accuracy (AUC 0.74). SMOTE-augmented models showed increased sensitivity but did not improve overall AUC compared to the clinical-only model. Changing EGFR mutation prevalence had minimal impact on AUC, challenging previous assumptions about the influence of sample imbalance on model performance. External validation failed to reproduce prior radiogenomics model performance, while clinical variables alone retained strong predictive value. SMOTE-based oversampling did not improve diagnostic accuracy, suggesting that, for EGFR prediction, radiomics may offer limited value beyond clinical data. Emphasis on robust external validation and data sharing is essential for future clinical implementation of radiogenomic models.
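
The prevalence sweep is straightforward to reproduce in outline with `imbalanced-learn`; the toy features and the mapping from target prevalence to SMOTE's sampling ratio below are illustrative assumptions, not the study's pipeline:

```python
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Toy stand-in for the radiomic feature matrix; ~15% EGFR-mutant prevalence.
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.85],
                           random_state=0)

# Re-balance to a sweep of simulated mutation prevalences, as in the study
# (25%-50%); a target prevalence p maps to a minority:majority ratio p/(1-p).
for prevalence in (0.25, 0.30, 0.35, 0.40, 0.45, 0.50):
    ratio = prevalence / (1.0 - prevalence)
    X_res, y_res = SMOTE(sampling_strategy=ratio,
                         random_state=0).fit_resample(X, y)
    print(f"target {prevalence:.0%} -> achieved {y_res.mean():.2%} "
          f"({len(y_res)} samples)")
```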

Challenges in Implementing Artificial Intelligence in Breast Cancer Screening Programs: Systematic Review and Framework for Safe Adoption.

Goh S, Goh RSJ, Chong B, Ng QX, Koh GCH, Ngiam KY, Hartman M

PubMed · May 15, 2025
Artificial intelligence (AI) studies show promise in enhancing accuracy and efficiency in mammographic screening programs worldwide. However, its integration into clinical workflows faces several challenges, including unintended errors, the need for professional training, and ethical concerns. Notably, specific frameworks for AI imaging in breast cancer screening are still lacking. This study aims to identify the challenges associated with implementing AI in breast screening programs and to apply the Consolidated Framework for Implementation Research (CFIR) to discuss a practical governance framework for AI in this context. Three electronic databases (PubMed, Embase, and MEDLINE) were searched using combinations of the keywords "artificial intelligence," "regulation," "governance," "breast cancer," and "screening." Original studies evaluating AI in breast cancer detection or discussing challenges related to AI implementation in this setting were eligible for review. Findings were narratively synthesized and subsequently mapped directly onto the constructs within the CFIR. A total of 1240 results were retrieved, with 20 original studies ultimately included in this systematic review. The majority (n=19) focused on AI-enhanced mammography, while one addressed AI-enhanced ultrasound for women with dense breasts. Most studies originated from the United States (n=5) and the United Kingdom (n=4), with publication years ranging from 2019 to 2023. The quality of papers was rated as moderate to high. The key challenges identified were reproducibility, evidentiary standards, technological concerns, trust issues, ethical, legal, and societal concerns, and post-adoption uncertainty. By aligning these findings with the CFIR constructs, action plans targeting the main challenges were incorporated into the framework, facilitating a structured approach to addressing these issues. This systematic review identifies key challenges in implementing AI in breast cancer screening, emphasizing the need for consistency, robust evidentiary standards, technological advancements, user trust, ethical frameworks, legal safeguards, and societal benefits. These findings can serve as a blueprint for policy makers, clinicians, and AI developers to collaboratively advance AI adoption in breast cancer screening. PROSPERO CRD42024553889; https://tinyurl.com/mu4nwcxt.

Fed-ComBat: A Generalized Federated Framework for Batch Effect Harmonization in Collaborative Studies

Silva, S., Lorenzi, M., Altmann, A., Oxtoby, N.

bioRxiv preprint · May 14, 2025
In neuroimaging research, multi-centric analyses are crucial for obtaining sufficient sample sizes and representative clinical populations. Data harmonization techniques are typically part of the pipeline in multi-centric studies to address systematic biases and ensure the comparability of the data. However, most multi-centric studies require centralized data, which may expose individual patient information. This poses a significant challenge in data governance and has led to regulations such as the GDPR and the CCPA, which attempt to address these concerns but also hinder data access for researchers. Federated learning offers a privacy-preserving alternative in machine learning, enabling models to be collaboratively trained on decentralized data without the need for data centralization or sharing. In this paper, we present Fed-ComBat, a federated framework for batch-effect harmonization on decentralized data. Fed-ComBat extends existing centralized linear methods, such as ComBat and its distributed counterpart d-ComBat, and nonlinear approaches like ComBat-GAM, to account for potentially nonlinear and multivariate covariate effects. In doing so, Fed-ComBat preserves nonlinear covariate effects without requiring centralization of data and without prior knowledge of which variables should be considered nonlinear or how they interact, differentiating it from ComBat-GAM. We assessed Fed-ComBat and existing approaches on simulated data and on multiple cohorts comprising healthy controls (CN) and subjects with various disorders such as Parkinson's disease (PD), Alzheimer's disease (AD), and autism spectrum disorder (ASD). The results of our study show that Fed-ComBat performs better than centralized ComBat when dealing with nonlinear effects and is on par with centralized methods like ComBat-GAM. In experiments using synthetic data, Fed-ComBat demonstrates a superior ability to reconstruct the target unbiased function, achieving a 35% improvement (RMSE=0.5952) over d-ComBat (RMSE=0.9162) and a 12% improvement over d-ComBat-GAM (RMSE=0.6751), our proposal for federating ComBat-GAM. Additionally, Fed-ComBat achieves results comparable to centralized methods like ComBat-GAM for MRI-derived phenotypes without requiring prior knowledge of potential nonlinearities.
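
For orientation, the location-scale adjustment at the heart of the ComBat family can be sketched as below; this is a simplification with no covariate modeling, no empirical-Bayes shrinkage, and no federation layer, so it is not Fed-ComBat itself:

```python
import numpy as np

def combat_location_scale(X: np.ndarray, site: np.ndarray) -> np.ndarray:
    """Remove per-site additive (location) and multiplicative (scale) effects.

    X: (n_subjects, n_features); site: (n_subjects,) integer site labels.
    Simplified ComBat: no covariates and no empirical-Bayes shrinkage.
    """
    grand_mean = X.mean(axis=0)
    pooled_std = X.std(axis=0, ddof=1)
    Z = (X - grand_mean) / pooled_std          # standardize features
    out = np.empty_like(Z)
    for s in np.unique(site):
        idx = site == s
        gamma = Z[idx].mean(axis=0)            # site location effect
        delta = Z[idx].std(axis=0, ddof=1)     # site scale effect
        out[idx] = (Z[idx] - gamma) / delta    # remove site effects
    return out * pooled_std + grand_mean       # back to original units

# Toy data: two sites with an injected offset on every feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
site = np.repeat([0, 1], 100)
X[site == 1] += 2.0
Xh = combat_location_scale(X, site)
print(Xh[site == 0].mean(), Xh[site == 1].mean())  # now approximately equal
```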