
An autonomous agent for auditing and improving the reliability of clinical AI models

Lukas Kuhn, Florian Buettner

arXiv preprint · Jul 8 2025
The deployment of AI models in clinical practice faces a critical challenge: models achieving expert-level performance on benchmarks can fail catastrophically when confronted with real-world variations in medical imaging. Minor shifts in scanner hardware, lighting or demographics can erode accuracy, but currently, reliability auditing to identify such catastrophic failure cases before deployment is a bespoke and time-consuming process. Practitioners lack accessible and interpretable tools to expose and repair hidden failure modes. Here we introduce ModelAuditor, a self-reflective agent that converses with users, selects task-specific metrics, and simulates context-dependent, clinically relevant distribution shifts. ModelAuditor then generates interpretable reports explaining how much performance is likely to degrade during deployment, discussing specific likely failure modes and identifying root causes and mitigation strategies. Our comprehensive evaluation across three real-world clinical scenarios - inter-institutional variation in histopathology, demographic shifts in dermatology, and equipment heterogeneity in chest radiography - demonstrates that ModelAuditor is able to correctly identify context-specific failure modes of state-of-the-art models such as the established SIIM-ISIC melanoma classifier. Its targeted recommendations recover 15-25% of the performance lost under real-world distribution shift, substantially outperforming both baseline models and state-of-the-art augmentation methods. These improvements are achieved through a multi-agent architecture; audits execute on consumer hardware in under 10 minutes and cost less than US$0.50 each.
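The core measurement behind such an audit is simple to state: apply a plausible acquisition shift to held-out data and report the accuracy drop. The toy sketch below illustrates that idea on synthetic data with a threshold classifier; the data, classifier, and the intensity-offset shift are all illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "clinical" dataset: one feature separates two classes around 0.5.
x = rng.normal(loc=np.repeat([0.3, 0.7], 500), scale=0.1)
y = np.repeat([0, 1], 500)

def classify(features, threshold=0.5):
    return (features > threshold).astype(int)

def accuracy(pred, target):
    return float(np.mean(pred == target))

acc_clean = accuracy(classify(x), y)

# Simulate a context-dependent shift (e.g. a scanner intensity offset)
# and measure how much performance degrades, as an audit report would.
shift = 0.15
acc_shifted = accuracy(classify(x + shift), y)
degradation = acc_clean - acc_shifted
```

Even this crude probe exposes the failure mode the abstract describes: a model that looks near-perfect in-distribution loses a large accuracy fraction under a modest acquisition shift.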

A novel framework for fully-automated co-registration of intravascular ultrasound and optical coherence tomography imaging data

Xingwei He, Kit Mills Bransby, Ahmet Emir Ulutas, Thamil Kumaran, Nathan Angelo Lecaros Yap, Gonul Zeren, Hesong Zeng, Yaojun Zhang, Andreas Baumbach, James Moon, Anthony Mathur, Jouke Dijkstra, Qianni Zhang, Lorenz Raber, Christos V Bourantas

arXiv preprint · Jul 8 2025
Aims: To develop a deep-learning (DL) framework that allows fully automated longitudinal and circumferential co-registration of intravascular ultrasound (IVUS) and optical coherence tomography (OCT) images. Methods and results: Data from 230 patients (714 vessels) with acute coronary syndrome who underwent near-infrared spectroscopy (NIRS)-IVUS and OCT imaging in their non-culprit vessels were included in the present analysis. The lumen borders annotated by expert analysts in 61,655 NIRS-IVUS and 62,334 OCT frames, and the side branches and calcific tissue identified in 10,000 NIRS-IVUS frames and 10,000 OCT frames, were used to train DL solutions for the automated extraction of these features. The trained DL solutions were used to process NIRS-IVUS and OCT images, and their output was used by a dynamic time warping algorithm to longitudinally co-register the NIRS-IVUS and OCT images, while the circumferential registration of the IVUS and OCT was optimized through dynamic programming. On a test set of 77 vessels from 22 patients, the DL method showed high concordance with the expert analysts for the longitudinal and circumferential co-registration of the two imaging sets (concordance correlation coefficient >0.99 for the longitudinal and >0.90 for the circumferential co-registration). The Williams Index was 0.96 for longitudinal and 0.97 for circumferential co-registration, indicating performance comparable to the analysts. The time needed for the DL pipeline to process imaging data from a vessel was <90 s. Conclusion: The fully automated, DL-based framework introduced in this study for the co-registration of IVUS and OCT is fast and provides estimations that compare favorably to the expert analysts. These features render it useful for the analysis of large-scale data collected in studies that incorporate multimodality imaging to characterize plaque composition.
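The longitudinal step rests on classic dynamic time warping over per-frame features extracted by the DL models. A minimal sketch of textbook DTW on 1-D sequences is below; the per-frame lumen-area values are hypothetical, and the real pipeline warps richer feature vectors (lumen, side branches, calcium).

```python
import numpy as np

def dtw_align(a, b):
    """Classic dynamic time warping: return the optimal alignment cost
    and the warping path matching frames of sequence a to sequence b."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack the optimal frame-to-frame correspondence.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return cost[n, m], path[::-1]

# Hypothetical per-frame lumen areas from two pullbacks with different frame counts.
ivus_area = np.array([5.0, 4.8, 4.1, 3.5, 3.6, 4.2, 5.1])
oct_area = np.array([5.0, 4.7, 4.0, 3.4, 4.3, 5.0])
total_cost, path = dtw_align(ivus_area, oct_area)
```

The returned path is monotonic in both indices, which is exactly why DTW suits pullbacks acquired at different speeds: frames can be stretched or compressed but never reordered.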

LangMamba: A Language-driven Mamba Framework for Low-dose CT Denoising with Vision-language Models

Zhihao Chen, Tao Chen, Chenhui Wang, Qi Gao, Huidong Xie, Chuang Niu, Ge Wang, Hongming Shan

arXiv preprint · Jul 8 2025
Low-dose computed tomography (LDCT) reduces radiation exposure but often degrades image quality, potentially compromising diagnostic accuracy. Existing deep learning-based denoising methods focus primarily on pixel-level mappings, overlooking the potential benefits of high-level semantic guidance. Recent advances in vision-language models (VLMs) suggest that language can serve as a powerful tool for capturing structured semantic information, offering new opportunities to improve LDCT reconstruction. In this paper, we introduce LangMamba, a Language-driven Mamba framework for LDCT denoising that leverages VLM-derived representations to enhance supervision from normal-dose CT (NDCT). LangMamba follows a two-stage learning strategy. First, we pre-train a Language-guided AutoEncoder (LangAE) that leverages frozen VLMs to map NDCT images into a semantic space enriched with anatomical information. Second, we synergize LangAE with two key components to guide LDCT denoising: a Semantic-Enhanced Efficient Denoiser (SEED), which enhances NDCT-relevant local semantics while capturing global features with an efficient Mamba mechanism, and a Language-engaged Dual-space Alignment (LangDA) Loss, which ensures that denoised images align with NDCT in both perceptual and semantic spaces. Extensive experiments on two public datasets demonstrate that LangMamba outperforms conventional state-of-the-art methods, significantly improving detail preservation and visual fidelity. Remarkably, LangAE exhibits strong generalizability to unseen datasets, thereby reducing training costs. Furthermore, the LangDA loss improves explainability by integrating language-guided insights into image reconstruction and can be used in a plug-and-play fashion. Our findings shed new light on the potential of language as a supervisory signal to advance LDCT denoising. The code is publicly available at https://github.com/hao1635/LangMamba.
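Conceptually, a dual-space alignment loss is just a weighted sum of feature-space distances between the denoised output and the NDCT target. The sketch below shows that structure with toy stand-in encoders (gradient maps for "perceptual", pooled statistics for "semantic"); the real LangDA loss uses the frozen VLM-derived LangAE features, so everything here beyond the weighted-sum structure is an assumption.

```python
import numpy as np

# Hypothetical stand-ins for frozen encoders: in LangDA these would be a
# perceptual feature extractor and the language-guided semantic encoder.
def perceptual_features(img):
    # Toy proxy: local gradients capture perceptual detail.
    return np.diff(img, axis=-1)

def semantic_features(img):
    # Toy proxy: coarse pooled statistics stand in for semantic content.
    return img.reshape(img.shape[0], -1).mean(axis=1)

def dual_space_alignment_loss(denoised, ndct, w_perc=1.0, w_sem=1.0):
    """Penalize mismatch between denoised output and NDCT target in both
    a perceptual and a semantic feature space (the LangDA structure)."""
    perc = np.mean((perceptual_features(denoised) - perceptual_features(ndct)) ** 2)
    sem = np.mean((semantic_features(denoised) - semantic_features(ndct)) ** 2)
    return w_perc * perc + w_sem * sem

rng = np.random.default_rng(1)
ndct = rng.random((2, 8, 8))          # batch of 2 toy "NDCT" images
noisy = ndct + 0.1 * rng.standard_normal(ndct.shape)
```

Because both terms are computed from frozen encoders, such a loss can be bolted onto any denoiser's training objective, which is what the abstract's "plug-and-play" claim refers to.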

Mitigating Multi-Sequence 3D Prostate MRI Data Scarcity through Domain Adaptation using Locally-Trained Latent Diffusion Models for Prostate Cancer Detection

Emerson P. Grabke, Babak Taati, Masoom A. Haider

arXiv preprint · Jul 8 2025
Objective: Latent diffusion models (LDMs) could mitigate data scarcity challenges affecting machine learning development for medical image interpretation. The recent CCELLA LDM improved prostate cancer detection performance using synthetic MRI for classifier training but was limited to the axial T2-weighted (AxT2) sequence, did not investigate inter-institutional domain shift, and prioritized radiology over histopathology outcomes. We propose CCELLA++ to address these limitations and improve clinical utility. Methods: CCELLA++ expands CCELLA for simultaneous biparametric prostate MRI (bpMRI) generation, including the AxT2, high b-value diffusion series (HighB) and apparent diffusion coefficient map (ADC). Domain adaptation was investigated by pretraining classifiers on real or LDM-generated synthetic data from an internal institution, followed by fine-tuning on progressively smaller fractions of an out-of-distribution, external dataset. Results: CCELLA++ improved 3D FID for the HighB (0.013) and ADC (0.012) sequences, but not AxT2 (0.063), compared to CCELLA (0.060). Classifier pretraining with CCELLA++ bpMRI outperformed real bpMRI in AP and AUC for all domain adaptation scenarios. CCELLA++ pretraining achieved the highest classifier performance when less than 50% (n=665) of the external dataset was used. Conclusion: Synthetic bpMRI generated by our method can improve downstream classifier generalization and performance beyond real bpMRI or CCELLA-generated AxT2-only images. Future work should seek to quantify medical image sample quality, balance multi-sequence LDM training, and condition the LDM with additional information. Significance: The proposed CCELLA++ LDM can generate synthetic bpMRI that outperforms real data for domain adaptation with a limited target institution dataset. Our code is available at https://github.com/grabkeem/CCELLA-plus-plus

Machine learning models using non-invasive tests & B-mode ultrasound to predict liver-related outcomes in metabolic dysfunction-associated steatotic liver disease.

Kosick HM, McIntosh C, Bera C, Fakhriyehasl M, Shengir M, Adeyi O, Amiri L, Sebastiani G, Jhaveri K, Patel K

PubMed paper · Jul 8 2025
Advanced metabolic dysfunction-associated steatotic liver disease (MASLD) fibrosis (F3-4) predicts liver-related outcomes. Serum and elastography-based non-invasive tests (NIT) cannot yet reliably predict MASLD outcomes. The role of B-mode ultrasound (US) for outcome prediction is not yet known. We aimed to evaluate machine learning (ML) algorithms based on simple NIT and US for prediction of adverse liver-related outcomes in MASLD. Retrospective cohort study of adult MASLD patients biopsied between 2010-2021 at one of two Canadian tertiary care centers. Random forest was used to create predictive models for the following outcomes: hepatic decompensation; liver-related outcomes (decompensation, hepatocellular carcinoma (HCC), liver transplant, and liver-related mortality); HCC; liver-related mortality; F3-4; and fibrotic metabolic dysfunction-associated steatohepatitis (MASH). Diagnostic performance was assessed using area under the curve (AUC). 457 MASLD patients were included, with 44.9% F3-4, diabetes prevalence 31.6%, 53.8% male, mean age 49.2 years, and mean BMI 32.8 kg/m². 6.3% had an adverse liver-related outcome over a mean 43-month follow-up. AUCs for the ML predictive models were: hepatic decompensation 0.90 (0.79-0.98), liver-related outcomes 0.87 (0.76-0.96), HCC 0.72 (0.29-0.96), liver-related mortality 0.79 (0.31-0.98), F3-4 0.83 (0.76-0.87), and fibrotic MASH 0.74 (0.65-0.85). Biochemical and clinical variables had the greatest feature importance overall, compared to US parameters. FIB-4 and the AST:ALT ratio were the highest ranked biochemical variables, while age was the highest ranked clinical variable. ML models based on clinical, biochemical, and US-based variables accurately predict adverse MASLD outcomes in this multi-centre cohort. Overall, biochemical variables had the greatest feature importance. US-based features were not substantial predictors of outcomes in this study.

A confidence-guided Unsupervised domain adaptation network with pseudo-labeling and deformable CNN-transformer for medical image segmentation.

Zhou J, Xu Y, Liu Z, Pfaender F, Liu W

PubMed paper · Jul 8 2025
Unsupervised domain adaptation (UDA) methods have achieved significant progress in medical image segmentation. Nevertheless, the significant differences between the source and target domains remain a daunting barrier, creating an urgent need for more robust cross-domain solutions. Current UDA techniques generally employ a fixed, unvarying feature alignment procedure to reduce inter-domain differences throughout the training process. This rigidity disregards the shifting nature of feature distributions throughout the training process, leading to suboptimal performance in boundary delineation and detail retention on the target domain. A novel confidence-guided unsupervised domain adaptation network (CUDA-Net) is introduced to overcome persistent domain gaps, adapt to shifting feature distributions during training, and enhance boundary delineation in the target domain. This proposed network adaptively aligns features by tracking cross-domain distribution shifts throughout training, starting with adversarial alignment at early stages (coarse) and transitioning to pseudo-label-driven alignment at later stages (fine-grained), thereby leading to more accurate segmentation in the target domain. A confidence-weighted mechanism then refines these pseudo labels by prioritizing high-confidence regions while allowing low-confidence areas to be gradually explored, thereby enhancing both label reliability and overall model stability. Experiments on three representative medical image datasets, namely MMWHS17, BraTS2021, and VS-Seg, confirm the superiority of CUDA-Net. Notably, CUDA-Net outperforms eight leading methods in terms of overall segmentation accuracy (Dice) and boundary extraction precision (ASD), highlighting that it offers an efficient and reliable solution for cross-domain medical image segmentation.
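The confidence-weighted pseudo-labeling step can be stated compactly: take the model's target-domain predictions, keep the argmax as a pseudo label, and weight each prediction by its confidence, zeroing out anything below a threshold that can be relaxed as training progresses. A minimal sketch under those assumptions (the exact weighting scheme in CUDA-Net may differ):

```python
import numpy as np

def confidence_weighted_pseudo_labels(probs, threshold=0.9):
    """From target-domain softmax outputs, keep high-confidence predictions
    as pseudo labels, each weighted by its confidence; low-confidence
    regions get weight 0 now and can be admitted later as the threshold decays."""
    confidence = probs.max(axis=-1)
    pseudo = probs.argmax(axis=-1)
    weights = np.where(confidence >= threshold, confidence, 0.0)
    return pseudo, weights

# Hypothetical per-pixel class probabilities on a 2x2 target-domain patch.
probs = np.array([
    [[0.95, 0.05], [0.55, 0.45]],
    [[0.10, 0.90], [0.65, 0.35]],
])
labels, w = confidence_weighted_pseudo_labels(probs, threshold=0.9)
```

Only the two confident pixels contribute to the fine-grained alignment loss; the ambiguous ones are deferred, which is the mechanism the abstract credits for label reliability and training stability.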

Attention-Enhanced Deep Learning Ensemble for Breast Density Classification in Mammography

Peyman Sharifian, Xiaotong Hong, Alireza Karimian, Mehdi Amini, Hossein Arabi

arXiv preprint · Jul 8 2025
Breast density assessment is a crucial component of mammographic interpretation, with high breast density (BI-RADS categories C and D) representing both a significant risk factor for developing breast cancer and a technical challenge for tumor detection. This study proposes an automated deep learning system for robust binary classification of breast density (low: A/B vs. high: C/D) using the VinDr-Mammo dataset. We implemented and compared four advanced convolutional neural networks: ResNet18, ResNet50, EfficientNet-B0, and DenseNet121, each enhanced with channel attention mechanisms. To address the inherent class imbalance, we developed a novel Combined Focal Label Smoothing Loss function that integrates focal loss, label smoothing, and class-balanced weighting. Our preprocessing pipeline incorporated advanced techniques, including contrast-limited adaptive histogram equalization (CLAHE) and comprehensive data augmentation. The individual models were combined through an optimized ensemble voting approach, achieving superior performance (AUC: 0.963, F1-score: 0.952) compared to any single model. This system demonstrates significant potential to standardize density assessments in clinical practice, potentially improving screening efficiency and early cancer detection rates while reducing inter-observer variability among radiologists.
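The three ingredients of the Combined Focal Label Smoothing Loss compose naturally: smooth the one-hot target, modulate by a focal term that down-weights easy examples, and apply class-balanced weights. The sketch below shows one plausible composition; the authors' exact formulation is not given in the abstract, so treat this as an assumption.

```python
import numpy as np

def combined_focal_label_smoothing_loss(probs, targets, gamma=2.0,
                                        smoothing=0.1, class_weights=None):
    """One plausible combination of focal loss, label smoothing, and
    class-balanced weighting for imbalanced binary density classification."""
    n, k = probs.shape
    onehot = np.eye(k)[targets]
    # Label smoothing: soften the hard targets toward the uniform distribution.
    smoothed = onehot * (1.0 - smoothing) + smoothing / k
    if class_weights is None:
        class_weights = np.ones(k)
    # Focal modulation: down-weight classes the model already predicts well.
    focal = (1.0 - probs) ** gamma
    loss = -(class_weights * smoothed * focal * np.log(probs + 1e-12)).sum(axis=1)
    return float(loss.mean())

# Toy batch: predicted probabilities for (low density, high density).
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
targets = np.array([0, 1, 1])
loss = combined_focal_label_smoothing_loss(probs, targets)
```

Setting `class_weights` inversely proportional to class frequency would supply the class-balanced component for the minority high-density class.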

Integrating Machine Learning into Myositis Research: a Systematic Review.

Juarez-Gomez C, Aguilar-Vazquez A, Gonzalez-Gauna E, Garcia-Ordoñez GP, Martin-Marquez BT, Gomez-Rios CA, Becerra-Jimenez J, Gaspar-Ruiz A, Vazquez-Del Mercado M

PubMed paper · Jul 8 2025
Idiopathic inflammatory myopathies (IIM) are a group of autoimmune rheumatic diseases characterized by proximal muscle weakness and extramuscular manifestations. Since 1975, these IIM have been classified into different clinical phenotypes. Each clinical phenotype is associated with a better or worse prognosis and a particular physiopathology. Machine learning (ML) is a fascinating field of knowledge with worldwide applications in different fields. In IIM, ML is an emerging tool assessed in very specific clinical contexts as a complementary tool for research purposes, including transcriptome profiles in muscle biopsies and differential diagnosis using magnetic resonance imaging (MRI) and ultrasound (US). Given the cancer-associated risk and the predisposing factors for interstitial lung disease (ILD) development, this systematic review evaluates 23 original studies using supervised learning models, including logistic regression (LR), random forest (RF), support vector machines (SVM), and convolutional neural networks (CNN), with performance assessed primarily through the area under the receiver operating characteristic curve (AUC-ROC).

Uncertainty and normalized glandular dose evaluations in digital mammography and digital breast tomosynthesis with a machine learning methodology.

Sarno A, Massera RT, Paternò G, Cardarelli P, Marshall N, Bosmans H, Bliznakova K

PubMed paper · Jul 8 2025
To predict the normalized glandular dose (DgN) coefficients and the related uncertainty in mammography and digital breast tomosynthesis (DBT) using a machine learning algorithm and patient-like digital breast models. 126 patient-like digital breast phantoms were used for DgN Monte Carlo ground truth calculations. An Automatic Relevance Determination Regression algorithm was used to predict DgN from anatomical breast features. These features included compressed breast thickness, glandular fraction by volume, glandular volume, and the center of mass and standard deviation of the glandular tissue distribution in the cranio-caudal direction. A data imputation algorithm was explored to allow omitting the latter two features. 5-fold cross validation showed that the predictive model provides an estimation of DgN with a 1% average difference from the ground truth; this difference was less than 3% in 50% of the cases. The average uncertainty of the estimated DgN values was 9%. Excluding the information related to the glandular distribution increased this uncertainty to 17% without inducing a significant discrepancy in estimated DgN values, with half of the predicted cases differing from the ground truth by less than 9%. The data imputation algorithm reduced the estimated uncertainty, without restoring the original performance. Predictive performance improved with increasing tube voltage. The proposed methodology predicts the DgN in mammography and DBT for patient-derived breasts with an uncertainty below 9%. Predictive test evaluations showed a 1% average difference from the ground truth, with 50% of the cohort cases differing by less than 5%.
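What makes an ARD-style regressor attractive here is that it is Bayesian, so every DgN prediction comes with a predictive variance rather than a bare point estimate. The sketch below uses closed-form Bayesian linear regression with a single shared prior precision (full ARD instead learns one precision per feature, which prunes irrelevant ones); features, weights, and noise level are all invented for illustration.

```python
import numpy as np

def bayesian_linear_predict(X, y, X_new, alpha=1.0, beta=2500.0):
    """Closed-form Bayesian linear regression: posterior over weights,
    then a predictive mean and standard deviation for each new case.
    alpha = prior precision on weights, beta = noise precision."""
    d = X.shape[1]
    S_inv = alpha * np.eye(d) + beta * X.T @ X     # posterior precision
    S = np.linalg.inv(S_inv)                       # posterior covariance
    m = beta * S @ X.T @ y                         # posterior mean weights
    mean = X_new @ m
    # Predictive variance = noise variance + weight-uncertainty term.
    var = 1.0 / beta + np.einsum("ij,jk,ik->i", X_new, S, X_new)
    return mean, np.sqrt(var)

rng = np.random.default_rng(2)
# Hypothetical features: breast thickness, glandular fraction, glandular volume.
X = rng.random((126, 3))
true_w = np.array([0.4, -0.8, 0.3])
y = X @ true_w + 0.02 * rng.standard_normal(126)
mean, std = bayesian_linear_predict(X, y, X[:5])
```

Reporting `std` alongside each coefficient is exactly the per-case uncertainty the abstract quantifies (9% on average, rising to 17% when the glandular-distribution features are dropped).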

Assessment of T2-weighted MRI-derived synthetic CT for the detection of suspected lumbar facet arthritis: a comparative analysis with conventional CT.

Cao G, Wang H, Xie S, Cai D, Guo J, Zhu J, Ye K, Wang Y, Xia J

PubMed paper · Jul 8 2025
We evaluated sCT generated from T2-weighted imaging (T2WI) using deep learning techniques to detect structural lesions in lumbar facet arthritis, with conventional CT as the reference standard. This single-center retrospective study included 40 patients who underwent lumbar MRI and CT within 1 week (September 2020 to August 2021). A Pix2Pix-GAN framework generated CT images from MRI data, and image quality was assessed using the structural similarity index (SSIM), mean absolute error (MAE), peak signal-to-noise ratio (PSNR), and Dice similarity coefficient (DSC). Two senior radiologists evaluated 15 anatomical landmarks. Sensitivity, specificity, and accuracy for detecting bone erosion, osteosclerosis, and joint space alterations were analyzed for sCT, T2-weighted MRI, and conventional CT. Forty participants (21 men, 19 women) were enrolled, with a mean age of 39 ± 16.9 years. sCT showed strong agreement with conventional CT, with SSIM values of 0.888 for axial and 0.889 for sagittal views. PSNR and MAE values were 24.56 dB and 0.031 for axial and 23.75 dB and 0.038 for sagittal views, respectively. DSC values were 0.935 for axial and 0.876 for sagittal views. sCT showed excellent intra- and inter-reader reliability, with intraclass correlation coefficients of 0.953-0.995 and 0.839-0.983, respectively. sCT had higher sensitivity (57.9% vs. 5.3%), specificity (98.8% vs. 84.6%), and accuracy (93.0% vs. 73.3%) for bone erosion than T2-weighted MRI and outperformed it for osteosclerosis and joint space changes. sCT outperformed conventional T2-weighted MRI in detecting structural lesions indicative of lumbar facet arthritis, with conventional CT as the reference standard.
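Three of the four quality metrics above have one-line definitions (SSIM is omitted here as it needs windowed statistics). The sketch below computes MAE, PSNR, and DSC for a toy sCT-vs-CT pair on intensities normalized to [0, 1], matching the unitless MAE values reported; the images and the bone-segmentation threshold are illustrative assumptions.

```python
import numpy as np

def mae(a, b):
    """Mean absolute error between two normalized images."""
    return float(np.mean(np.abs(a - b)))

def psnr(a, b, data_range=1.0):
    """Peak signal-to-noise ratio in dB for a given intensity range."""
    mse = np.mean((a - b) ** 2)
    return float(10.0 * np.log10(data_range ** 2 / mse))

def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    return float(2.0 * inter / (mask_a.sum() + mask_b.sum()))

rng = np.random.default_rng(3)
ct = rng.random((64, 64))                                   # toy reference CT
sct = np.clip(ct + 0.01 * rng.standard_normal(ct.shape), 0.0, 1.0)  # toy sCT
# DSC is computed on segmentations, e.g. a hypothetical bone threshold.
bone_ct = ct > 0.8
bone_sct = sct > 0.8
```

Note the different natures of the metrics: MAE and PSNR score intensity fidelity of the whole image, while DSC scores overlap of a derived segmentation, which is why the paper reports them separately per view.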