Page 33 of 1411402 results

Detecting, Characterizing, and Mitigating Implicit and Explicit Racial Biases in Health Care Datasets With Subgroup Learnability: Algorithm Development and Validation Study.

Gulamali F, Sawant AS, Liharska L, Horowitz C, Chan L, Hofer I, Singh K, Richardson L, Mensah E, Charney A, Reich D, Hu J, Nadkarni G

PubMed | Sep 4, 2025
The growing adoption of diagnostic and prognostic algorithms in health care has led to concerns about the perpetuation of algorithmic bias against disadvantaged groups of individuals. Deep learning methods to detect and mitigate bias have revolved around modifying models, optimization strategies, and threshold calibration with varying levels of success and tradeoffs. However, there have been limited substantive efforts to address bias at the level of the data used to generate algorithms in health care datasets. The aim of this study is to create a simple metric (AEquity) that uses a learning curve approximation to distinguish and mitigate bias via guided dataset collection or relabeling. We demonstrate this metric on two well-known examples, chest X-rays and health care cost utilization, and detect novel biases in the National Health and Nutrition Examination Survey. We demonstrated that using AEquity to guide data-centric collection for each diagnostic finding in the chest radiograph dataset decreased bias by between 29% and 96.5% when measured by differences in area under the curve. Next, we examined (1) whether AEquity worked on intersectional populations and (2) whether AEquity is invariant to different types of fairness metrics, not just area under the curve. Subsequently, we examined the effect of AEquity on mitigating bias when measured by false negative rate, precision, and false discovery rate for Black patients on Medicaid.
When we examined Black patients on Medicaid, at the intersection of race and socioeconomic status, we found that AEquity-based interventions reduced bias across a number of different fairness metrics: overall false negative rate by 33.3% (absolute bias reduction=1.88×10⁻¹, 95% CI 1.4×10⁻¹ to 2.5×10⁻¹; relative bias reduction 33.3%, 95% CI 26.6%-40%); precision bias by 7.50×10⁻² (95% CI 7.48×10⁻² to 7.51×10⁻²; bias reduction 94.6%, 95% CI 94.5%-94.7%); and false discovery rate by 94.5% (absolute bias reduction=3.50×10⁻², 95% CI 3.49×10⁻² to 3.50×10⁻²). Similarly, AEquity-guided data collection demonstrated bias reduction of up to 80% on mortality prediction with the National Health and Nutrition Examination Survey (absolute bias reduction=0.08, 95% CI 0.07-0.09). We then benchmarked AEquity against state-of-the-art data-guided debiasing measures, balanced empirical risk minimization and calibration, and showed that AEquity-guided data collection outperforms both standard approaches. Moreover, we demonstrated that AEquity works on fully connected networks; convolutional neural networks such as ResNet-50; transformer architectures such as ViT-B/16, a vision transformer with 86 million parameters; and nonparametric methods such as Light Gradient-Boosting Machine. In short, we demonstrated that AEquity is a robust tool by applying it to different datasets, algorithms, and intersectional analyses and measuring its effectiveness with respect to a range of traditional fairness metrics.
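AEquity's learning-curve approximation itself is detailed in the paper; as a minimal, illustrative sketch of the AUC-gap bias measure the abstract reports (function names and toy data are ours, not the authors'):

```python
def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney U): probability that a random
    positive case is scored above a random negative case."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def subgroup_auc_gap(scores, labels, groups):
    """Bias measured as the spread in per-subgroup AUC, mirroring the
    'differences in area under the curve' metric in the abstract."""
    aucs = {}
    for g in set(groups):
        s = [x for x, gg in zip(scores, groups) if gg == g]
        y = [x for x, gg in zip(labels, groups) if gg == g]
        aucs[g] = auc(s, y)
    return max(aucs.values()) - min(aucs.values()), aucs
```

Guided data collection would then prioritize samples from the subgroup whose AUC lags, shrinking this gap.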

Deep Learning Based Multiomics Model for Risk Stratification of Postoperative Distant Metastasis in Colorectal Cancer.

Yao X, Han X, Huang D, Zheng Y, Deng S, Ning X, Yuan L, Ao W

PubMed | Sep 4, 2025
To develop deep learning-based multiomics models for predicting postoperative distant metastasis (DM) and evaluating survival prognosis in colorectal cancer (CRC) patients. This retrospective study included 521 CRC patients who underwent curative surgery at two centers. Preoperative CT and postoperative hematoxylin-eosin (HE) stained slides were collected. A total of 381 patients from Center 1 were split (7:3) into training and internal validation sets; 140 patients from Center 2 formed the independent external validation set. Patients were grouped based on DM status during follow-up. Radiological and pathological models were constructed using independent imaging and pathological predictors. Deep features were extracted with a ResNet-101 backbone to build deep learning radiomics (DLRS) and deep learning pathomics (DLPS) models. Two integrated models were developed: Nomogram 1 (radiological + DLRS) and Nomogram 2 (pathological + DLPS). CT-reported T (cT) stage (OR=2.00, P=0.006) and CT-reported N (cN) stage (OR=1.63, P=0.023) were identified as independent radiologic predictors for building the radiological model; pN stage (OR=1.91, P=0.003) and perineural invasion (OR=2.07, P=0.030) were identified as pathological predictors for building the pathological model. DLRS and DLPS incorporated 28 and 30 deep features, respectively. In the training set, the areas under the curve (AUC) of the radiological, pathological, DLRS, DLPS, Nomogram 1, and Nomogram 2 models were 0.657, 0.687, 0.931, 0.914, 0.938, and 0.930, respectively. DeLong's test showed that DLRS, DLPS, and both nomograms significantly outperformed the conventional models (P<.05). Kaplan-Meier analysis confirmed effective 3-year disease-free survival (DFS) stratification by the nomograms. Deep learning-based multiomics models provided high accuracy for postoperative DM prediction. Nomogram models enabled reliable DFS risk stratification in CRC patients.
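Nomograms of this kind combine independent predictors through their log odds ratios in a linear predictor, mapped to a probability by a logistic link. A hedged sketch, using the cT and cN odds ratios reported above and a hypothetical intercept (the paper's actual coefficients and intercept are not given in the abstract):

```python
import math

def predicted_risk(odds_ratios, values, intercept=0.0):
    """Nomogram-style linear predictor: each covariate contributes
    log(odds ratio) * value; a logistic link maps the sum to a
    probability. The intercept here is hypothetical."""
    lp = intercept + sum(math.log(o) * v
                         for o, v in zip(odds_ratios, values))
    return 1.0 / (1.0 + math.exp(-lp))
```

For example, a patient positive for both cT and cN stage predictors (values [1, 1] with ORs [2.00, 1.63]) scores a higher predicted risk than one positive for neither.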

A Foundation Model for Chest X-ray Interpretation with Grounded Reasoning via Online Reinforcement Learning

Qika Lin, Yifan Zhu, Bin Pu, Ling Huang, Haoran Luo, Jingying Ma, Zhen Peng, Tianzhe Zhao, Fangzhi Xu, Jian Zhang, Kai He, Zhonghong Ou, Swapnil Mishra, Mengling Feng

arXiv preprint | Sep 4, 2025
Medical foundation models (FMs) have shown tremendous promise amid the rapid advancements in artificial intelligence (AI) technologies. However, current medical FMs typically generate answers in a black-box manner, lacking transparent reasoning processes and locally grounded interpretability, which hinders their practical clinical deployment. To this end, we introduce DeepMedix-R1, a holistic medical FM for chest X-ray (CXR) interpretation. It leverages a sequential training pipeline: initially fine-tuned on curated CXR instruction data to equip it with fundamental CXR interpretation capabilities, then exposed to high-quality synthetic reasoning samples to enable cold-start reasoning, and finally refined via online reinforcement learning to enhance both grounded reasoning quality and generation performance. Thus, the model produces both an answer and reasoning steps tied to the image's local regions for each query. Quantitative evaluation demonstrates substantial improvements in report generation (e.g., 14.54% and 31.32% over LLaVA-Rad and MedGemma) and visual question answering (e.g., 57.75% and 23.06% over MedGemma and CheXagent) tasks. To facilitate robust assessment, we propose Report Arena, a benchmarking framework that uses advanced language models to evaluate answer quality, further highlighting the superiority of DeepMedix-R1. Expert review of generated reasoning steps reveals greater interpretability and clinical plausibility compared to the established Qwen2.5-VL-7B model (0.7416 vs. 0.2584 overall preference). Collectively, our work advances medical FM development toward holistic, transparent, and clinically actionable modeling for CXR interpretation.

A Generative Foundation Model for Chest Radiography

Yuanfeng Ji, Dan Lin, Xiyue Wang, Lu Zhang, Wenhui Zhou, Chongjian Ge, Ruihang Chu, Xiaoli Yang, Junhan Zhao, Junsong Chen, Xiangde Luo, Sen Yang, Jin Fang, Ping Luo, Ruijiang Li

arXiv preprint | Sep 4, 2025
The scarcity of well-annotated, diverse medical images is a major hurdle for developing reliable AI models in healthcare. Substantial technical advances have been made in generative foundation models for natural images. Here we develop ChexGen, a generative vision-language foundation model that introduces a unified framework for text-, mask-, and bounding box-guided synthesis of chest radiographs. Built upon the latent diffusion transformer architecture, ChexGen was pretrained on the largest curated chest X-ray dataset to date, consisting of 960,000 radiograph-report pairs. Expert evaluations and quantitative metrics confirm that ChexGen synthesizes radiographs accurately. We demonstrate the utility of ChexGen for training data augmentation and supervised pretraining, which led to performance improvements across disease classification, detection, and segmentation tasks using a small fraction of the training data. Further, our model enables the creation of diverse patient cohorts that enhance model fairness by detecting and mitigating demographic biases. Our study supports the transformative role of generative foundation models in building more accurate, data-efficient, and equitable medical AI systems.

MetaPredictomics: A Comprehensive Approach to Predict Postsurgical Non-Small Cell Lung Cancer Recurrence Using Clinicopathologic, Radiomics, and Organomics Data.

Amini M, Hajianfar G, Salimi Y, Mansouri Z, Zaidi H

PubMed | Sep 3, 2025
Non-small cell lung cancer (NSCLC) is a complex disease characterized by diverse clinical, genetic, and histopathologic traits, necessitating personalized treatment approaches. While numerous biomarkers have been introduced for NSCLC prognostication, no single source of information can provide a comprehensive understanding of the disease. However, integrating biomarkers from multiple sources may offer a holistic view of the disease, enabling more accurate predictions. In this study, we present MetaPredictomics, a framework that integrates clinicopathologic data with PET/CT radiomics from the primary tumor and presumed healthy organs (referred to as "organomics") to predict postsurgical recurrence. A fully automated deep learning-based segmentation model was employed to delineate 19 structures, comprising affected regions (the whole lung and the affected lobe) and presumed healthy organs, from CT images of the presurgical PET/CT scans of 145 NSCLC patients sourced from a publicly available dataset. Using PyRadiomics, 214 features (107 from CT, 107 from PET) were extracted from the gross tumor volume (GTV) and each segmented organ. In addition, a clinicopathologic feature set was constructed, incorporating clinical characteristics, histopathologic data, gene mutation status, conventional PET imaging biomarkers, and patients' treatment history. The GTV radiomics, organomics, and clinicopathologic feature sets were each fed to a glmboost-based time-to-event prediction model to establish first-level models. The risk scores obtained from the first-level models were then used as inputs for meta models developed using a stacked ensemble approach. To optimize performance, we assessed meta models built from all combinations of first-level models with a concordance index (C-index) ≥0.6. The performance of all the models was evaluated using the average C-index across a unique 3-fold cross-validation scheme for fair comparison.
The clinicopathologic model outperformed the other first-level models with a C-index of 0.67, followed closely by the GTV radiomics model (C-index 0.65). Among the organomics models, the whole-lung and aorta models achieved the top performance with a C-index of 0.65, while 12 organomics models achieved C-indices of ≥0.6. Meta models significantly outperformed the first-level models, with the top 100 achieving C-indices between 0.703 and 0.731. The clinicopathologic, whole lung, esophagus, pancreas, and GTV models appeared most frequently in the top 100 meta models, with frequencies of 98, 71, 69, 62, and 61, respectively. In this study, we highlighted the value of maximizing the use of medical imaging for NSCLC recurrence prognostication by incorporating data from various organs, rather than focusing solely on the tumor and its immediate surroundings. This multisource integration proved particularly beneficial in the meta models, where combining clinicopathologic data with tumor radiomics and organomics models significantly enhanced recurrence prediction.
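The C-index used throughout this evaluation is Harrell's concordance index; a minimal sketch of its computation for right-censored time-to-event data (our own illustrative implementation, not the authors' code):

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: among comparable patient pairs, the fraction
    where the higher predicted risk corresponds to the earlier event.
    events[i] == 1 means patient i's event was observed (not censored)."""
    num = den = 0.0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # a pair is comparable if i has an observed event before j's time
            if events[i] == 1 and times[i] < times[j]:
                den += 1
                if risks[i] > risks[j]:
                    num += 1
                elif risks[i] == risks[j]:
                    num += 0.5
    return num / den
```

A C-index of 0.5 corresponds to random risk ordering, 1.0 to perfect ordering, which is why the ≥0.6 cutoff above selects first-level models with at least modest discriminative power.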

AlzFormer: Video-based space-time attention model for early diagnosis of Alzheimer's disease.

Akan T, Akan S, Alp S, Ledbetter CR, Nobel Bhuiyan MA

PubMed | Sep 3, 2025
Early and accurate Alzheimer's disease (AD) diagnosis is critical for effective intervention, but it remains challenging due to neurodegeneration's slow and complex progression. Recent studies in brain imaging analysis have highlighted the crucial role of deep learning techniques in computer-assisted interventions for diagnosing brain diseases. In this study, we propose AlzFormer, a novel deep learning framework based on a space-time attention mechanism, for multiclass classification of AD, mild cognitive impairment (MCI), and cognitively normal (CN) individuals using structural MRI scans. Unlike conventional deep learning models, we used spatiotemporal self-attention to model inter-slice continuity by treating T1-weighted MRI volumes as sequential inputs, where slices correspond to video frames. Our model was fine-tuned and evaluated using 1.5T MRI scans from the ADNI dataset. To ensure anatomical consistency, all MRI volumes were pre-processed with skull stripping and spatial normalization to MNI space. AlzFormer achieved an overall accuracy of 94% on the test set, with balanced class-wise F1-scores (AD: 0.94, MCI: 0.99, CN: 0.98) and a macro-average AUC of 0.98. We also utilized attention map analysis to identify clinically significant patterns, particularly emphasizing subcortical structures and medial temporal regions implicated in AD. These findings demonstrate the potential of transformer-based architectures for robust and interpretable classification of brain disorders using structural MRI.
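Treating an MRI volume as a video means each slice plays the role of a frame, and attention is factorized across slices (time) and within slices (space), as in TimeSformer-style divided attention. A single-head numpy sketch of that factorization (shapes and naming are ours; the paper's exact architecture may differ):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def divided_space_time_attention(tokens, Wq, Wk, Wv):
    """tokens: (T slices, N patches, D dims). Temporal attention first
    mixes the same patch position across slices, then spatial attention
    mixes patches within each slice."""
    T, N, D = tokens.shape
    # temporal: each patch position attends across the T slices
    q = (tokens @ Wq).transpose(1, 0, 2)           # (N, T, D)
    k = (tokens @ Wk).transpose(1, 0, 2)
    v = (tokens @ Wv).transpose(1, 0, 2)
    a = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(D))
    x = (a @ v).transpose(1, 0, 2)                 # back to (T, N, D)
    # spatial: patches attend within each slice
    q2, k2, v2 = x @ Wq, x @ Wk, x @ Wv
    a2 = softmax(q2 @ k2.transpose(0, 2, 1) / np.sqrt(D))
    return a2 @ v2                                 # (T, N, D)
```

Factorizing attention this way keeps the cost at O(T² + N²) per token instead of O((TN)²) for joint space-time attention, which is what makes whole-volume modeling tractable.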

Edge-centric Brain Connectome Representations Reveal Increased Brain Functional Diversity of Reward Circuit in Patients with Major Depressive Disorder.

Qin K, Ai C, Zhu P, Xiang J, Chen X, Zhang L, Wang C, Zou L, Chen F, Pan X, Wang Y, Gu J, Pan N, Chen W

PubMed | Sep 3, 2025
Major depressive disorder (MDD) has been increasingly understood as a disorder of network-level functional dysconnectivity. However, previous brain connectome studies have primarily relied on node-centric approaches, neglecting critical edge-edge interactions that may capture essential features of network dysfunction. This study included resting-state functional MRI data from 838 MDD patients and 881 healthy controls (HC) across 23 sites. We applied a novel edge-centric connectome model to estimate edge functional connectivity and identify overlapping network communities. Regional functional diversity was quantified via normalized entropy based on community overlap patterns. Neurobiological decoding was performed to map brain-wide relationships between functional diversity alterations and patterns of gene expression and neurotransmitter distribution. Comparative machine learning analyses further evaluated the diagnostic utility of edge-centric versus node-centric connectome representations. Compared with HC, MDD patients exhibited significantly increased functional diversity within the prefrontal-striatal-thalamic reward circuit. Neurobiological decoding analysis revealed that functional diversity alterations in MDD were spatially associated with transcriptional patterns enriched for inflammatory processes, as well as distribution of 5-HT1B receptors. Machine learning analyses demonstrated superior classification performance of edge-centric models over traditional node-centric approaches in distinguishing MDD patients from HC at the individual level. Our findings highlighted that abnormal functional diversity within the reward processing system might underlie multi-level neurobiological mechanisms of MDD. The edge-centric connectome approach offers a valuable tool for identifying disease biomarkers, characterizing individual variation and advancing current understanding of complex network configuration in psychiatric disorders.
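The "functional diversity via normalized entropy" measure described above can be sketched directly: given a region's affiliation weights across overlapping communities, entropy normalized by log(K) yields a 0-to-1 diversity score (our illustrative implementation; the paper's exact estimator may differ):

```python
import math

def functional_diversity(community_weights):
    """Normalized entropy of a region's overlapping-community
    affiliation weights: 0 = all edges in a single community,
    1 = edges spread evenly across all K communities."""
    k = len(community_weights)
    total = sum(community_weights)
    p = [w / total for w in community_weights if w > 0]
    h = -sum(x * math.log(x) for x in p)
    return h / math.log(k) if k > 1 else 0.0
```

Under this measure, the increased diversity reported in the MDD reward circuit corresponds to its regions' edges being spread across more communities than in controls.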

CINeMA: Conditional Implicit Neural Multi-Modal Atlas for a Spatio-Temporal Representation of the Perinatal Brain.

Dannecker M, Sideri-Lampretsa V, Starck S, Mihailov A, Milh M, Girard N, Auzias G, Rueckert D

PubMed | Sep 3, 2025
Magnetic resonance imaging of fetal and neonatal brains reveals rapid neurodevelopment marked by substantial anatomical changes unfolding within days. Studying this critical stage of the developing human brain therefore requires accurate brain models (referred to as atlases) of high spatial and temporal resolution. To meet these demands, established traditional atlases and recently proposed deep learning-based methods rely on large and comprehensive datasets. This poses a major challenge for studying brains in the presence of pathologies for which data remains scarce. We address this limitation with CINeMA (Conditional Implicit Neural Multi-Modal Atlas), a novel framework for creating high-resolution, spatio-temporal, multimodal brain atlases suitable for low-data settings. Unlike established methods, CINeMA operates in latent space, avoiding compute-intensive image registration and reducing atlas construction times from days to minutes. Furthermore, it enables flexible conditioning on anatomical features including gestational age, birth age, and pathologies such as agenesis of the corpus callosum and varying degrees of ventriculomegaly. CINeMA supports downstream tasks such as tissue segmentation and age prediction, whereas its generative properties enable synthetic data creation and anatomically informed data augmentation. Surpassing state-of-the-art methods in accuracy, efficiency, and versatility, CINeMA represents a powerful tool for advancing brain research. We release the code and atlases at https://github.com/m-dannecker/CINeMA.

Automated Deep Learning-Based Detection of Early Atherosclerotic Plaques in Carotid Ultrasound Imaging

Omarov, M., Zhang, L., Doroodgar Jorshery, S., Malik, R., Das, B., Bellomo, T. R., Mansmann, U., Menten, M. J., Natarajan, P., Dichgans, M., Kalic, M., Raghu, V. K., Berger, K., Anderson, C. D., Georgakis, M. K.

medRxiv preprint | Sep 3, 2025
Background: Carotid plaque presence is associated with cardiovascular risk, even among asymptomatic individuals. While deep learning has shown promise for carotid plaque phenotyping in patients with advanced atherosclerosis, its application in population-based settings of asymptomatic individuals remains unexplored. Methods: We developed a YOLOv8-based model for plaque detection using carotid ultrasound images from 19,499 participants of the population-based UK Biobank (UKB) and fine-tuned it for external validation in the BiDirect study (N = 2,105). Cox regression was used to estimate the impact of plaque presence and count on major cardiovascular events. To explore the genetic architecture of carotid atherosclerosis, we conducted a genome-wide association study (GWAS) meta-analysis of the UKB and CHARGE cohorts. Mendelian randomization (MR) assessed the effect of genetic predisposition to vascular risk factors on carotid atherosclerosis. Results: Our model demonstrated high performance with accuracy, sensitivity, and specificity exceeding 85%, enabling identification of carotid plaques in 45% of the UKB population (aged 47-83 years). In the external BiDirect cohort, a fine-tuned model achieved 86% accuracy, 78% sensitivity, and 90% specificity. Plaque presence and count were associated with risk of major adverse cardiovascular events (MACE) over a follow-up of up to seven years, improving risk reclassification beyond the Pooled Cohort Equations. A GWAS meta-analysis of carotid plaques uncovered two novel genomic loci, with downstream analyses implicating targets of investigational drugs in advanced clinical development. Observational and MR analyses showed associations between smoking, LDL cholesterol, hypertension, and odds of carotid atherosclerosis. Conclusions: Our model offers a scalable solution for early carotid plaque detection, potentially enabling automated screening in asymptomatic individuals and improving plaque phenotyping in population-based cohorts.
This approach could advance large-scale atherosclerosis research. [Graphical abstract. Abbreviations: ASCVD, atherosclerotic cardiovascular disease; CVD, cardiovascular disease; PCE, Pooled Cohort Equations; TP, true positive; FN, false negative; FP, false positive; TN, true negative; GWAS, genome-wide association study.] Clinical Perspective: Carotid ultrasound is a well-established method for assessing subclinical atherosclerosis, with potential to improve cardiovascular risk assessment in asymptomatic individuals. Deep learning could automate plaque screening and enable processing of large imaging datasets, reducing the need for manual annotation. Integrating such large-scale carotid ultrasound datasets with clinical, genetic, and other relevant data can advance cardiovascular research. Prior studies applying deep learning to carotid ultrasound have focused on technical tasks (plaque classification, segmentation, and characterization) in small samples of patients with advanced atherosclerosis. However, they did not assess the potential of deep learning in detecting plaques in asymptomatic individuals at the population level. We developed an efficient deep learning model for the automated detection and quantification of early carotid plaques in ultrasound imaging, primarily in asymptomatic individuals. The model demonstrated high accuracy and external validity across population-based cohort studies. Predicted plaque prevalence aligned with known cardiovascular risk factors. Importantly, predicted plaque presence and count were associated with future cardiovascular events and improved reclassification of asymptomatic individuals into clinically meaningful risk categories.
Integrating our model predictions with genetic data identified two novel loci associated with carotid plaque presence, both previously linked to cardiovascular disease, highlighting the model's potential for population-scale atherosclerosis research. Our model provides a scalable solution for automated carotid plaque phenotyping in ultrasound images at the population level. These findings support its use for automated screening in asymptomatic individuals and for streamlining plaque phenotyping in large cohorts, thereby advancing research on subclinical atherosclerosis in the general population.
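The accuracy, sensitivity, and specificity figures reported for the detector derive from the standard confusion-matrix counts (TP/FN/FP/TN in the graphical abstract). A minimal sketch, with toy counts chosen only to roughly echo the BiDirect numbers, not taken from the paper:

```python
def detection_metrics(tp, fn, fp, tn):
    """Confusion-matrix metrics used to report plaque detection
    performance."""
    sensitivity = tp / (tp + fn)            # plaque cases caught
    specificity = tn / (tn + fp)            # plaque-free correctly cleared
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return sensitivity, specificity, accuracy
```

In a screening setting, sensitivity matters most (missed plaques go unmonitored), which is why the abstract reports it separately rather than relying on accuracy alone.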

Single Domain Generalization in Diabetic Retinopathy: A Neuro-Symbolic Learning Approach

Midhat Urooj, Ayan Banerjee, Farhat Shaikh, Kuntal Thakur, Sandeep Gupta

arXiv preprint | Sep 3, 2025
Domain generalization remains a critical challenge in medical imaging, where models trained on single sources often fail under real-world distribution shifts. We propose KG-DG, a neuro-symbolic framework for diabetic retinopathy (DR) classification that integrates vision transformers with expert-guided symbolic reasoning to enable robust generalization across unseen domains. Our approach leverages clinical lesion ontologies through structured, rule-based features and retinal vessel segmentation, fusing them with deep visual representations via a confidence-weighted integration strategy. The framework addresses both single-domain generalization (SDG) and multi-domain generalization (MDG) by minimizing the KL divergence between domain embeddings, thereby enforcing alignment of high-level clinical semantics. Extensive experiments across four public datasets (APTOS, EyePACS, Messidor-1, Messidor-2) demonstrate significant improvements: up to a 5.2% accuracy gain in cross-domain settings and a 6% improvement over baseline ViT models. Notably, our symbolic-only model achieves a 63.67% average accuracy in MDG, while the complete neuro-symbolic integration achieves the highest accuracy among existing published baselines and benchmarks in challenging SDG scenarios. Ablation studies reveal that lesion-based features (84.65% accuracy) substantially outperform purely neural approaches, confirming that symbolic components act as effective regularizers beyond merely enhancing interpretability. Our findings establish neuro-symbolic integration as a promising paradigm for building clinically robust and domain-invariant medical AI systems.
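The KL-divergence alignment objective named above can be sketched as a symmetric KL between softmax-normalized domain embeddings; minimizing it pulls the two domains' embedding distributions together (our illustrative formulation, since the abstract does not give the exact loss):

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def domain_alignment_loss(emb_a, emb_b):
    """Symmetric KL between softmax-normalized domain embeddings;
    zero iff the normalized embeddings coincide."""
    p, q = softmax(emb_a), softmax(emb_b)
    return 0.5 * (kl(p, q) + kl(q, p))
```

Symmetrizing the divergence avoids privileging either domain as the "reference" distribution during alignment.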
