
Deep Learning-Based Breast Cancer Detection in Mammography: A Multi-Center Validation Study in Thai Population

Isarun Chamveha, Supphanut Chaiyungyuen, Sasinun Worakriangkrai, Nattawadee Prasawang, Warasinee Chaisangmongkon, Pornpim Korpraphong, Voraparee Suvannarerg, Shanigarn Thiravit, Chalermdej Kannawat, Kewalin Rungsinaporn, Suwara Issaragrisil, Payia Chadbunchachai, Pattiya Gatechumpol, Chawiporn Muktabhant, Patarachai Sereerat

arXiv preprint · May 29, 2025
This study presents a deep learning system for breast cancer detection in mammography, developed using a modified EfficientNetV2 architecture with enhanced attention mechanisms. The model was trained on mammograms from a major Thai medical center and validated on three distinct datasets: an in-domain test set (9,421 cases), a biopsy-confirmed set (883 cases), and an out-of-domain generalizability set (761 cases) collected from two different hospitals. For cancer detection, the model achieved AUROCs of 0.89, 0.96, and 0.94 on the respective datasets. The system's lesion localization capability, evaluated using metrics including Lesion Localization Fraction (LLF) and Non-Lesion Localization Fraction (NLF), demonstrated robust performance in identifying suspicious regions. Clinical validation through concordance tests showed strong agreement with radiologists: 83.5% classification and 84.0% localization concordance for biopsy-confirmed cases, and 78.1% classification and 79.6% localization concordance for out-of-domain cases. Expert radiologists' acceptance rate averaged 96.7% for biopsy-confirmed cases and 89.3% for out-of-domain cases. The system achieved a System Usability Scale score of 74.17 for the source hospital and 69.20 for the validation hospitals, indicating good clinical acceptance. These results demonstrate the model's effectiveness in assisting mammogram interpretation, with the potential to enhance breast cancer screening workflows in clinical practice.
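
The abstract does not specify how the attention mechanisms are attached to the EfficientNetV2 backbone, so the following is only a minimal sketch of the general idea, assuming a squeeze-and-excitation-style channel attention stage and a single-logit cancer/no-cancer head (PyTorch, hypothetical layer sizes):

```python
# Minimal sketch (not the authors' code): an EfficientNetV2 backbone with a
# simple channel-attention stage for binary mammogram classification. The
# paper's "enhanced attention mechanisms" are not specified in the abstract,
# so the attention block here is only illustrative.
import torch
import torch.nn as nn
from torchvision.models import efficientnet_v2_s

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # re-weight feature maps channel-wise

class MammoClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = efficientnet_v2_s(weights=None)
        self.features = backbone.features        # convolutional feature extractor
        self.attention = ChannelAttention(1280)   # hypothetical attention stage
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(1280, 1),                   # single logit: cancer vs. no cancer
        )

    def forward(self, x):
        return self.head(self.attention(self.features(x)))

if __name__ == "__main__":
    model = MammoClassifier()
    logits = model(torch.randn(2, 3, 512, 512))
    print(logits.shape)  # torch.Size([2, 1])
```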

CT-denoimer: efficient contextual transformer network for low-dose CT denoising.

Zhang Y, Xu F, Zhang R, Guo Y, Wang H, Wei B, Ma F, Meng J, Liu J, Lu H, Chen Y

PubMed · May 29, 2025
Low-dose computed tomography (LDCT) effectively reduces radiation exposure to patients, but introduces severe noise artifacts that affect diagnostic accuracy. Recently, Transformer-based network architectures have been widely applied to LDCT image denoising, generally achieving superior results compared to traditional convolutional methods. However, these methods are often hindered by high computational costs and struggle to capture complex local contextual features, which negatively impacts denoising performance. In this work, we propose CT-Denoimer, an efficient CT Denoising Transformer network that captures both global correlations and intricate, spatially varying local contextual details in CT images, enabling the generation of high-quality images. The core of our framework is a Transformer module that consists of two key components: the Multi-Dconv head Transposed Attention (MDTA) and the Mixed Contextual Feed-forward Network (MCFN). The MDTA block captures global correlations in the image with linear computational complexity, while the MCFN block manages multi-scale local contextual information, both static and dynamic, through a series of Enhanced Contextual Transformer (eCoT) modules. In addition, we incorporate Operation-Wise Attention Layers (OWALs) to enable collaborative refinement in the proposed CT-Denoimer, enhancing its ability to more effectively handle complex and varying noise patterns in LDCT images. Extensive experimental validation on both the AAPM-Mayo public dataset and a real-world clinical dataset demonstrated the state-of-the-art performance of the proposed CT-Denoimer. It achieved a peak signal-to-noise ratio (PSNR) of 33.681 dB, a structural similarity index measure (SSIM) of 0.921, an information fidelity criterion (IFC) of 2.857, and a visual information fidelity (VIF) of 0.349. Subjective assessment by radiologists gave an average score of 4.39, confirming its clinical applicability and clear advantages over existing methods. This study presents an innovative CT denoising Transformer network that sets a new benchmark in LDCT image denoising, excelling in both noise reduction and fine structure preservation.
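
As a rough illustration of the transposed-attention idea the abstract describes (attention computed across channels, so cost scales linearly with the number of pixels), here is a minimal MDTA-style block; the dimensions and details are assumptions rather than the authors' implementation:

```python
# Illustrative sketch of a Multi-Dconv-head Transposed Attention (MDTA) block:
# the attention map is C x C (channels vs. channels) instead of pixels vs.
# pixels, which is how the linear complexity in image size arises.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MDTA(nn.Module):
    def __init__(self, dim: int = 48, heads: int = 4):
        super().__init__()
        self.heads = heads
        self.temperature = nn.Parameter(torch.ones(heads, 1, 1))
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1)
        self.qkv_dw = nn.Conv2d(dim * 3, dim * 3, kernel_size=3, padding=1, groups=dim * 3)
        self.project = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.qkv_dw(self.qkv(x)).chunk(3, dim=1)
        # reshape to (batch, heads, channels_per_head, pixels)
        shape = (b, self.heads, c // self.heads, h * w)
        q, k, v = (t.reshape(shape) for t in (q, k, v))
        q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.temperature  # C x C attention map
        out = attn.softmax(dim=-1) @ v
        return self.project(out.reshape(b, c, h, w)) + x      # residual connection

if __name__ == "__main__":
    block = MDTA(dim=48, heads=4)
    print(block(torch.randn(1, 48, 64, 64)).shape)  # torch.Size([1, 48, 64, 64])
```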

Image Aesthetic Reasoning: A New Benchmark for Medical Image Screening with MLLMs

Zheng Sun, Yi Wei, Long Yu

arXiv preprint · May 29, 2025
Multimodal Large Language Models (MLLMs) have broad applications across many domains, such as multimodal understanding and generation. With the development of diffusion models (DM) and unified MLLMs, the performance of image generation has improved significantly; however, image screening has received little study, and MLLM performance on it is unsatisfactory due to the lack of data and the weak image aesthetic reasoning ability of MLLMs. In this work, we propose a complete solution to address these problems in terms of data and methodology. For data, we collect a comprehensive medical image screening dataset with 1500+ samples, each sample consisting of a medical image, four generated images, and a multiple-choice answer. The dataset evaluates the aesthetic reasoning ability under four aspects: (1) Appearance Deformation, (2) Principles of Physical Lighting and Shadow, (3) Placement Layout, (4) Extension Rationality. For methodology, we utilize long chains of thought (CoT) and Group Relative Policy Optimization with a Dynamic Proportional Accuracy reward, called DPA-GRPO, to enhance the image aesthetic reasoning ability of MLLMs. Our experimental results reveal that even state-of-the-art closed-source MLLMs, such as GPT-4o and Qwen-VL-Max, exhibit performance akin to random guessing in image aesthetic reasoning. In contrast, by leveraging the reinforcement learning approach, we are able to surpass the score of both large-scale models and leading closed-source models using a much smaller model. We hope our work on medical image screening will serve as a standard configuration for image aesthetic reasoning in the future.
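
The exact form of the Dynamic Proportional Accuracy reward is not given in the abstract; the sketch below only illustrates the group-relative advantage step of GRPO with a simple proportional-accuracy reward standing in for DPA:

```python
# Minimal numerical sketch of GRPO's group-relative advantage computation with
# a proportional accuracy reward for multiple-choice screening answers. The
# reward definition here (fraction of gold choices recovered, penalizing extra
# picks) is an assumption for illustration, not the paper's DPA reward.
from statistics import mean, pstdev

def proportional_accuracy(predicted: set, gold: set) -> float:
    """Reward in [0, 1]: share of gold choices recovered, penalizing extra picks."""
    if not predicted:
        return 0.0
    hits = len(predicted & gold)
    return hits / max(len(gold), len(predicted))

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage: normalize each sampled response's reward within its group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + 1e-8) for r in rewards]

if __name__ == "__main__":
    gold = {"B", "C"}
    sampled_answers = [{"B", "C"}, {"B"}, {"A", "D"}, {"B", "C", "D"}]
    rewards = [proportional_accuracy(ans, gold) for ans in sampled_answers]
    print(rewards)                          # [1.0, 0.5, 0.0, 0.666...]
    print(group_relative_advantages(rewards))
```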

Interpreting Chest X-rays Like a Radiologist: A Benchmark with Clinical Reasoning

Jinquan Guan, Qi Chen, Lizhou Liang, Yuhang Liu, Vu Minh Hieu Phan, Minh-Son To, Jian Chen, Yutong Xie

arXiv preprint · May 29, 2025
Artificial intelligence (AI)-based chest X-ray (CXR) interpretation assistants have demonstrated significant progress and are increasingly being applied in clinical settings. However, contemporary medical AI models often adhere to a simplistic input-to-output paradigm, directly processing an image and an instruction to generate a result, where the instructions may be integral to the model's architecture. This approach overlooks the modeling of the inherent diagnostic reasoning in chest X-ray interpretation. Such reasoning is typically sequential, where each interpretive stage considers the images, the current task, and the contextual information from previous stages. This oversight leads to several shortcomings, including misalignment with clinical scenarios, contextless reasoning, and untraceable errors. To fill this gap, we construct CXRTrek, a new multi-stage visual question answering (VQA) dataset for CXR interpretation. The dataset is designed to explicitly simulate the diagnostic reasoning process employed by radiologists in real-world clinical settings for the first time. CXRTrek covers 8 sequential diagnostic stages, comprising 428,966 samples and over 11 million question-answer (Q&A) pairs, with an average of 26.29 Q&A pairs per sample. Building on the CXRTrek dataset, we propose a new vision-language large model (VLLM), CXRTrekNet, specifically designed to incorporate the clinical reasoning flow into the VLLM framework. CXRTrekNet effectively models the dependencies between diagnostic stages and captures reasoning patterns within the radiological context. Trained on our dataset, the model consistently outperforms existing medical VLLMs on the CXRTrek benchmarks and demonstrates superior generalization across multiple tasks on five diverse external datasets. The dataset and model can be found in our repository (https://github.com/guanjinquan/CXRTrek).
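
A schematic of the stage-by-stage reasoning flow described above might look like the following; the stage questions and the `vqa_model` callable are placeholders rather than the actual CXRTrek stages or the CXRTrekNet API:

```python
# Schematic sketch of multi-stage CXR interpretation: each diagnostic stage
# receives the image, the current stage's question, and the accumulated context
# from earlier stages. The stage list below is hypothetical (CXRTrek defines 8
# stages, not enumerated in the abstract).
from typing import Callable

STAGES = [
    "Assess image quality and technique",
    "Identify visible devices or hardware",
    "Describe lung findings",
    "Describe cardiac and mediastinal findings",
    "Summarize the overall impression",
]

def interpret_cxr(image, vqa_model: Callable[[object, str, str], str]) -> list[tuple[str, str]]:
    """Run the stages in order, feeding each stage the answers from previous stages."""
    context = ""
    transcript = []
    for question in STAGES:
        answer = vqa_model(image, question, context)     # image + task + prior context
        transcript.append((question, answer))
        context += f"Q: {question}\nA: {answer}\n"        # accumulate for later stages
    return transcript

if __name__ == "__main__":
    def dummy_model(img, question, context):
        return f"(answer to '{question}' given {len(context)} chars of context)"
    for q, a in interpret_cxr(object(), dummy_model):
        print(q, "->", a)
```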

Deep learning enables fast and accurate quantification of MRI-guided near-infrared spectral tomography for breast cancer diagnosis.

Feng J, Tang Y, Lin S, Jiang S, Xu J, Zhang W, Geng M, Dang Y, Wei C, Li Z, Sun Z, Jia K, Pogue BW, Paulsen KD

PubMed · May 29, 2025
The utilization of magnetic resonance (MR) imaging to guide near-infrared spectral tomography (NIRST) shows significant potential for improving the specificity and sensitivity of breast cancer diagnosis. However, the efficiency and accuracy of NIRST image reconstruction have been limited by the complexities of light propagation modeling and MRI image segmentation. To address these challenges, we developed and evaluated a deep learning-based approach for MR-guided 3D NIRST image reconstruction (DL-MRg-NIRST). Using a network trained on synthetic data, the DL-MRg-NIRST system reconstructed images from data acquired during 38 clinical imaging exams of patients with breast abnormalities. Statistical analysis of the results demonstrated a sensitivity of 87.5%, a specificity of 92.9%, and a diagnostic accuracy of 89.5% in distinguishing pathologically defined benign from malignant lesions. Additionally, the combined use of MRI and DL-MRg-NIRST diagnoses achieved an area under the receiver operating characteristic (ROC) curve of 0.98. Remarkably, the DL-MRg-NIRST image reconstruction process required only 1.4 seconds, significantly faster than state-of-the-art MR-guided NIRST methods.
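
As a quick sanity check of the reported operating point, the confusion matrix below is one split of the 38 exams that is consistent with the stated sensitivity, specificity, and accuracy; the actual benign/malignant case counts are not reported in the abstract and are only assumed here:

```python
# Back-of-envelope check of the reported diagnostic metrics. The split of the
# 38 exams below is an assumption chosen to be consistent with the reported
# 87.5% sensitivity, 92.9% specificity, and 89.5% accuracy.
tp, fn = 21, 3    # assumed malignant cases: 24 total, 21 detected
tn, fp = 13, 1    # assumed benign cases: 14 total, 13 correctly ruled out

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (tp + fn + tn + fp)

print(f"sensitivity = {sensitivity:.1%}")   # 87.5%
print(f"specificity = {specificity:.1%}")   # 92.9%
print(f"accuracy    = {accuracy:.1%}")      # 89.5%
```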

Integrating SEResNet101 and SE-VGG19 for advanced cervical lesion detection: a step forward in precision oncology.

Ye Y, Chen Y, Pan J, Li P, Ni F, He H

PubMed · May 28, 2025
Cervical cancer remains a significant global health issue, with accurate differentiation between low-grade (LSIL) and high-grade (HSIL) squamous intraepithelial lesions crucial for effective screening and management. Current methods, such as Pap smears and HPV testing, often fall short in sensitivity and specificity. Deep learning models hold the potential to enhance the accuracy of cervical cancer screening but require thorough evaluation to ascertain their practical utility. This study compares the performance of two advanced deep learning models, SEResNet101 and SE-VGG19, in classifying cervical lesions using a dataset of 3,305 high-quality colposcopy images. We assessed the models based on their accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). The SEResNet101 model demonstrated superior performance over SE-VGG19 across all evaluated metrics. Specifically, SEResNet101 achieved a sensitivity of 95%, a specificity of 97%, and an AUC of 0.98, compared to 89% sensitivity, 93% specificity, and an AUC of 0.94 for SE-VGG19. These findings suggest that SEResNet101 could significantly reduce both over- and under-treatment rates by enhancing diagnostic precision. Our results indicate that SEResNet101 offers a promising enhancement over existing screening methods, integrating advanced deep learning algorithms to significantly improve the precision of cervical lesion classification. This study advocates for the inclusion of SEResNet101 in clinical workflows to enhance cervical cancer screening protocols, thereby improving patient outcomes. Future work should focus on multicentric trials to validate these findings and facilitate widespread clinical adoption.
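
For readers unfamiliar with the "SE" prefix, the sketch below shows a generic squeeze-and-excitation block attached to a torchvision ResNet101 feature extractor for binary lesion classification; where and how the authors insert their SE blocks is not described in the abstract, so the placement here is an assumption:

```python
# Illustrative squeeze-and-excitation (SE) block of the kind that gives
# SEResNet101 / SE-VGG19 their names, attached to a ResNet101 feature
# extractor with a two-class (LSIL vs. HSIL) head. Not the authors' code.
import torch
import torch.nn as nn
from torchvision.models import resnet101

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)                 # global spatial average
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.excite(self.squeeze(x).view(b, c)).view(b, c, 1, 1)
        return x * w

class SEResNetClassifier(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        backbone = resnet101(weights=None)
        self.stem = nn.Sequential(*list(backbone.children())[:-2])  # drop avgpool + fc
        self.se = SEBlock(2048)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(2048, num_classes)

    def forward(self, x):
        x = self.se(self.stem(x))
        return self.fc(self.pool(x).flatten(1))

if __name__ == "__main__":
    model = SEResNetClassifier()
    print(model(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 2])
```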

Deep Separable Spatiotemporal Learning for Fast Dynamic Cardiac MRI.

Wang Z, Xiao M, Zhou Y, Wang C, Wu N, Li Y, Gong Y, Chang S, Chen Y, Zhu L, Zhou J, Cai C, Wang H, Jiang X, Guo D, Yang G, Qu X

PubMed · May 28, 2025
Dynamic magnetic resonance imaging (MRI) plays an indispensable role in cardiac diagnosis. To enable fast imaging, the k-space data can be undersampled but the image reconstruction poses a great challenge of high-dimensional processing. This challenge necessitates extensive training data in deep learning reconstruction methods. In this work, we propose a novel and efficient approach, leveraging a dimension-reduced separable learning scheme that can perform exceptionally well even with highly limited training data. We design this new approach by incorporating spatiotemporal priors into the development of a Deep Separable Spatiotemporal Learning network (DeepSSL), which unrolls an iteration process of a 2D spatiotemporal reconstruction model with both temporal low-rankness and spatial sparsity. Intermediate outputs can also be visualized to provide insights into the network behavior and enhance interpretability. Extensive results on cardiac cine datasets demonstrate that the proposed DeepSSL surpasses state-of-the-art methods both visually and quantitatively, while reducing the demand for training cases by up to 75%. Additionally, its preliminary adaptability to unseen cardiac patients has been verified through a blind reader study conducted by experienced radiologists and cardiologists. Furthermore, DeepSSL enhances the accuracy of the downstream task of cardiac segmentation and exhibits robustness in prospectively undersampled real-time cardiac MRI. DeepSSL is efficient under highly limited training data and adaptive to patients and prospective undersampling. This approach holds promise in addressing the escalating demand for high-dimensional data reconstruction in MRI applications.
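
DeepSSL unrolls a reconstruction model with temporal low-rankness and spatial sparsity; the NumPy sketch below shows one hand-tuned iteration of that classical model (singular-value thresholding, soft-thresholding, and k-space data consistency), which the learned unrolled network effectively replaces. Thresholds and the sampling mask are illustrative assumptions:

```python
# Conceptual sketch of the low-rank + sparse model that an unrolled network
# like DeepSSL is built around; not the authors' implementation.
import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def svt(matrix, tau):
    """Singular-value thresholding: promotes temporal low-rankness."""
    u, s, vh = np.linalg.svd(matrix, full_matrices=False)
    return u @ np.diag(np.maximum(s - tau, 0.0)) @ vh

def one_iteration(x, y, mask, lam_lr=0.05, lam_sp=0.01):
    """x: image series (frames, H, W); y: sampled k-space; mask: sampling pattern."""
    t, h, w = x.shape
    x = svt(x.reshape(t, h * w), lam_lr).reshape(t, h, w)   # temporal low-rank prior
    x = soft_threshold(x, lam_sp)                            # spatial sparsity prior
    k = np.fft.fft2(x, norm="ortho")
    k = np.where(mask, y, k)                                 # enforce measured k-space
    return np.fft.ifft2(k, norm="ortho")

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = rng.standard_normal((8, 32, 32))
    mask = rng.random((8, 32, 32)) < 0.3                     # 30% sampling
    y = np.where(mask, np.fft.fft2(truth, norm="ortho"), 0)
    x = np.fft.ifft2(y, norm="ortho").real
    for _ in range(10):
        x = one_iteration(x, y, mask).real
    print("reconstruction shape:", x.shape)
```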

Large Scale MRI Collection and Segmentation of Cirrhotic Liver.

Jha D, Susladkar OK, Gorade V, Keles E, Antalek M, Seyithanoglu D, Cebeci T, Aktas HE, Kartal GD, Kaymakoglu S, Erturk SM, Velichko Y, Ladner DP, Borhani AA, Medetalibeyoglu A, Durak G, Bagci U

PubMed · May 28, 2025
Liver cirrhosis represents the end stage of chronic liver disease, characterized by extensive fibrosis and nodular regeneration that significantly increases mortality risk. While magnetic resonance imaging (MRI) offers a non-invasive assessment, accurately segmenting cirrhotic livers presents substantial challenges due to morphological alterations and heterogeneous signal characteristics. Deep learning approaches show promise for automating these tasks, but progress has been limited by the absence of large-scale, annotated datasets. Here, we present CirrMRI600+, the first comprehensive dataset comprising 628 high-resolution abdominal MRI scans (310 T1-weighted and 318 T2-weighted sequences, totaling nearly 40,000 annotated slices) with expert-validated segmentation labels for cirrhotic livers. The dataset includes demographic information, clinical parameters, and histopathological validation where available. Additionally, we provide benchmark results from 11 state-of-the-art deep learning experiments to establish performance standards. CirrMRI600+ enables the development and validation of advanced computational methods for cirrhotic liver analysis, potentially accelerating progress toward automated cirrhosis visual staging and personalized treatment planning.
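
A loader for a dataset of this kind might look like the sketch below; the directory layout, file naming, and normalization are assumptions for illustration, not CirrMRI600+'s documented structure:

```python
# Hypothetical loader for a CirrMRI600+-style dataset of MRI volumes with
# liver segmentation masks (NIfTI files assumed under "images/" and "masks/").
from pathlib import Path
import nibabel as nib
import numpy as np
from torch.utils.data import Dataset

class CirrhoticLiverDataset(Dataset):
    def __init__(self, root: str, sequence: str = "T1"):
        self.images = sorted(Path(root, sequence, "images").glob("*.nii.gz"))
        self.masks = sorted(Path(root, sequence, "masks").glob("*.nii.gz"))
        assert len(self.images) == len(self.masks), "every volume needs a mask"

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        volume = nib.load(str(self.images[idx])).get_fdata().astype(np.float32)
        mask = nib.load(str(self.masks[idx])).get_fdata().astype(np.uint8)
        # per-volume intensity normalization, a common choice for MRI segmentation
        volume = (volume - volume.mean()) / (volume.std() + 1e-8)
        return volume, mask
```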

Automatic identification of Parkinsonism using clinical multi-contrast brain MRI: a large self-supervised vision foundation model strategy.

Suo X, Chen M, Chen L, Luo C, Kemp GJ, Lui S, Sun H

PubMed · May 27, 2025
Valid non-invasive biomarkers for Parkinson's disease (PD) and Parkinson-plus syndrome (PPS) are urgently needed. Based on our recent self-supervised vision foundation model, the Shift Window UNET TRansformer (Swin UNETR), which uses clinical multi-contrast whole brain MRI, we aimed to develop an efficient and practical model ('SwinClassifier') for the discrimination of PD vs PPS using routine clinical MRI scans. We used 75,861 clinical head MRI scans including T1-weighted, T2-weighted and fluid attenuated inversion recovery imaging as a pre-training dataset to develop a foundation model, using self-supervised learning with a cross-contrast context recovery task. Clinical head MRI scans from n = 1992 participants with PD and n = 1989 participants with PPS were then used as a downstream PD vs PPS classification dataset. We compared SwinClassifier's performance, assessed with confusion matrices, to that of a self-supervised vanilla Vision Transformer (ViT) autoencoder ('ViTClassifier') and to two convolutional neural networks (DenseNet121 and ResNet50) trained from scratch. SwinClassifier showed very good performance (F1 score 0.83, 95% confidence interval [CI] [0.79-0.87], AUC 0.89) in PD vs PPS discrimination in independent test datasets (n = 173 participants with PD and n = 165 participants with PPS). This self-supervised classifier with pretrained weights outperformed the ViTClassifier and convolutional classifiers trained from scratch (F1 score 0.77-0.82, AUC 0.83-0.85). Occlusion sensitivity mapping in the correctly classified cases (n = 160 PD and n = 114 PPS) highlighted the brain regions guiding discrimination, mainly sensorimotor and midline structures including the cerebellum, brain stem, ventricles and basal ganglia. Our self-supervised digital model based on routine clinical head MRI discriminated PD vs PPS with good accuracy and sensitivity. With incremental improvements the approach may be diagnostically useful in early disease. Funding: National Key Research and Development Program of China.
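
The downstream classification step can be pictured as pooled features from the pretrained encoder feeding a linear head; the sketch below abstracts the Swin UNETR encoder as an arbitrary 3D feature extractor and is not the authors' code:

```python
# Schematic sketch of a "SwinClassifier"-style downstream head: features from a
# pretrained 3D encoder (any module mapping a multi-contrast volume to a feature
# map) are globally pooled and passed to a linear layer for the PD-vs-PPS
# decision. The real pretrained Swin UNETR encoder is not reproduced here.
import torch
import torch.nn as nn

class SwinClassifierHead(nn.Module):
    def __init__(self, encoder: nn.Module, feature_dim: int, num_classes: int = 2):
        super().__init__()
        self.encoder = encoder                      # pretrained, possibly frozen
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.fc = nn.Linear(feature_dim, num_classes)

    def forward(self, x):                           # x: (batch, contrasts, D, H, W)
        feats = self.encoder(x)                     # (batch, feature_dim, d, h, w)
        return self.fc(self.pool(feats).flatten(1))

if __name__ == "__main__":
    # stand-in encoder: 3 input contrasts (T1, T2, FLAIR) -> 96-channel feature map
    toy_encoder = nn.Sequential(nn.Conv3d(3, 96, kernel_size=3, stride=4), nn.ReLU())
    model = SwinClassifierHead(toy_encoder, feature_dim=96)
    print(model(torch.randn(1, 3, 64, 64, 64)).shape)  # torch.Size([1, 2])
```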

Improving Breast Cancer Diagnosis in Ultrasound Images Using Deep Learning with Feature Fusion and Attention Mechanism.

Asif S, Yan Y, Feng B, Wang M, Zheng Y, Jiang T, Fu R, Yao J, Lv L, Song M, Sui L, Yin Z, Wang VY, Xu D

PubMed · May 27, 2025
Early detection of malignant lesions in ultrasound images is crucial for effective cancer diagnosis and treatment. While traditional methods rely on radiologists, deep learning models can improve accuracy, reduce errors, and enhance efficiency. This study explores the application of a deep learning model for classifying benign and malignant lesions, focusing on its performance and interpretability. In this study, we proposed a feature fusion-based deep learning model for classifying benign and malignant lesions in ultrasound images. The model leverages advanced architectures such as MobileNetV2 and DenseNet121, enhanced with feature fusion and attention mechanisms to boost classification accuracy. The clinical dataset comprises 2171 images collected from 1758 patients between December 2020 and May 2024. Additionally, we utilized the publicly available BUSI dataset, consisting of 780 images from female patients aged 25 to 75, collected in 2018. To enhance interpretability, we applied Grad-CAM, Saliency Maps, and Shapley additive explanations (SHAP) techniques to explain the model's decision-making. A comparative analysis with radiologists of varying expertise levels was also conducted. The proposed model exhibited the highest performance, achieving an area under the curve (AUC) of 0.9320 on our private dataset and an AUC of 0.9834 on the public dataset, significantly outperforming traditional deep convolutional neural network models. It also exceeded the diagnostic performance of radiologists, showcasing its potential as a reliable tool for medical image classification. The model's success can be attributed to its incorporation of advanced architectures, feature fusion, and attention mechanisms. The model's decision-making process was further clarified using interpretability techniques like Grad-CAM, Saliency Maps, and SHAP, offering insights into its ability to focus on relevant image features for accurate classification. The proposed deep learning model offers superior accuracy in classifying benign and malignant lesions in ultrasound images, outperforming traditional models and radiologists. Its strong performance, coupled with interpretability techniques, demonstrates its potential as a reliable and efficient tool for medical diagnostics. The datasets generated and analyzed during the current study are not publicly available due to the nature of this research and its participants, but may be available from the corresponding author on reasonable request.
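
The abstract's dual-backbone feature fusion with attention could be sketched as follows; the concatenation-plus-gating design and layer sizes are assumptions rather than the authors' architecture:

```python
# Minimal sketch of fusing MobileNetV2 and DenseNet121 features with a simple
# attention gate for a binary benign/malignant ultrasound classifier. Not the
# authors' design; only the general idea described in the abstract.
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2, densenet121

class FusionClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.mobile = mobilenet_v2(weights=None).features    # 1280-channel output
        self.dense = densenet121(weights=None).features      # 1024-channel output
        self.pool = nn.AdaptiveAvgPool2d(1)
        fused_dim = 1280 + 1024
        self.attention = nn.Sequential(                       # per-feature gating
            nn.Linear(fused_dim, fused_dim), nn.Sigmoid()
        )
        self.head = nn.Linear(fused_dim, 1)                   # benign vs. malignant logit

    def forward(self, x):
        a = self.pool(self.mobile(x)).flatten(1)
        b = self.pool(self.dense(x)).flatten(1)
        fused = torch.cat([a, b], dim=1)
        return self.head(fused * self.attention(fused))

if __name__ == "__main__":
    model = FusionClassifier()
    print(model(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 1])
```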