
Rapid and robust quantitative cartilage assessment for the clinical setting: deep learning-enhanced accelerated T2 mapping.

Carretero-Gómez L, Wiesinger F, Fung M, Nunes B, Pedoia V, Majumdar S, Desai AD, Gatti A, Chaudhari A, Sánchez-Lacalle E, Malpica N, Padrón M

PubMed · Sep 18 2025
Clinical adoption of T2 mapping is limited by poor reproducibility, lengthy examination times, and cumbersome image analysis. This study aimed to develop an accelerated deep learning (DL)-enhanced cartilage T2 mapping sequence (DL CartiGram), demonstrate its repeatability and reproducibility, and evaluate its accuracy compared to conventional T2 mapping using a semi-automatic pipeline. DL CartiGram was implemented using a modified 2D Multi-Echo Spin-Echo sequence at 3 T, incorporating parallel imaging and DL-based image reconstruction. Phantom tests were performed at two sites to obtain test-retest T2 maps, using single-echo spin-echo (SE) measurements as reference values. At one site, DL CartiGram and conventional T2 mapping were performed on 43 patients. T2 values were extracted from 52 patellar and femoral compartments using DL knee segmentation and the DOSMA framework. Repeatability and reproducibility were assessed using coefficients of variation (CV), Bland-Altman analysis, and concordance correlation coefficients (CCC). T2 differences were evaluated with Wilcoxon signed-rank tests, paired t tests, and accuracy CV. Phantom tests showed intra-site repeatability with CVs ≤ 2.52% and T2 precision ≤ 1 ms. Inter-site reproducibility showed a CV of 2.74% and a CCC of 99% (CI 92-100%). Bland-Altman analysis showed a bias of 1.56 ms between sites (p = 0.03), likely due to temperature effects. In vivo, DL CartiGram reduced scan time by 40%, yielding accurate cartilage T2 measurements (CV = 0.97%) with no significant differences compared to conventional T2 mapping (p = 0.1). DL CartiGram significantly accelerates T2 mapping, while still assuring excellent repeatability and reproducibility. Combined with the semi-automatic post-processing pipeline, it emerges as a promising tool for quantitative T2 cartilage biomarker assessment in clinical settings.
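
The repeatability and reproducibility statistics reported here (coefficient of variation, Bland-Altman bias, and Lin's concordance correlation coefficient) are standard and easy to reproduce. The sketch below computes them with NumPy on hypothetical paired test-retest T2 values; the numbers are illustrative only and are not data from the study.

```python
import numpy as np

# Hypothetical paired test-retest T2 values (ms) for a handful of cartilage ROIs.
t2_test   = np.array([32.1, 41.5, 38.2, 45.0, 36.7])
t2_retest = np.array([31.4, 42.3, 37.6, 44.1, 37.2])

# Within-subject coefficient of variation (root-mean-square formulation), in percent.
pair_mean = (t2_test + t2_retest) / 2
pair_sd   = np.abs(t2_test - t2_retest) / np.sqrt(2)
cv_rms = np.sqrt(np.mean((pair_sd / pair_mean) ** 2)) * 100

# Bland-Altman bias and 95% limits of agreement.
diff = t2_test - t2_retest
bias = diff.mean()
loa = (bias - 1.96 * diff.std(ddof=1), bias + 1.96 * diff.std(ddof=1))

# Lin's concordance correlation coefficient (CCC).
mx, my = t2_test.mean(), t2_retest.mean()
vx, vy = t2_test.var(), t2_retest.var()
cov = np.mean((t2_test - mx) * (t2_retest - my))
ccc = 2 * cov / (vx + vy + (mx - my) ** 2)

print(f"CV = {cv_rms:.2f}%, bias = {bias:.2f} ms, LoA = {loa}, CCC = {ccc:.3f}")
```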

Multimodal radiomics fusion for predicting postoperative recurrence in NSCLC patients.

Mehri-Kakavand G, Mdletshe S, Amini M, Wang A

PubMed · Sep 18 2025
Postoperative recurrence in non-small cell lung cancer (NSCLC) affects up to 55% of patients, underscoring the limits of TNM staging. We assessed multimodal radiomics—positron emission tomography (PET), computed tomography (CT), and clinicopathological (CP) data—for personalized recurrence prediction. Data from 131 NSCLC patients with PET/CT imaging and CP variables were analysed. Radiomics features were extracted using PyRadiomics (1,316 PET and 1,409 CT features per tumor), with robustness testing and selection yielding 20 CT, 20 PET, and 23 CP variables. Prediction models were trained using Logistic Regression (L1, L2, Elastic Net), Random Forest, Gradient Boosting, XGBoost, and CatBoost. Nested cross-validation with SMOTE addressed class imbalance. Fusion strategies included early (feature concatenation), intermediate (stacked ensembles), and late (weighted averaging) fusion. Among single modalities, CT with Elastic Net achieved the highest cross-validated AUC (0.679, 95% CI: 0.57–0.79). Fusion improved performance: PET + CT + Clinical late fusion with Elastic Net achieved the best cross-validated AUC (0.811, 95% CI: 0.69–0.91). Out-of-fold ROC curves confirmed stronger discrimination for the fusion model (AUC = 0.836 vs. 0.741 for CT). Fusion also showed better calibration, higher net clinical benefit (decision-curve analysis), and clearer survival stratification (Kaplan–Meier). Integrating PET, CT, and CP data—particularly via late fusion with Elastic Net—enhances discrimination beyond single-modality models and supports more consistent risk stratification. These findings suggest practical potential for informing postoperative surveillance and adjuvant therapy decisions, encouraging a shift beyond TNM alone toward interpretable multimodal frameworks. External validation in larger, multicenter cohorts is warranted. The online version contains supplementary material available at 10.1007/s00432-025-06311-w.
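
As a rough illustration of the late-fusion strategy described above, the sketch below trains an Elastic Net logistic model per modality, collects out-of-fold probabilities, and averages them with fixed weights. The feature matrices, labels, and weights are random placeholders, not the study's data or tuned values.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 131
y = rng.integers(0, 2, n)  # hypothetical recurrence labels
X_pet, X_ct, X_cp = rng.normal(size=(n, 20)), rng.normal(size=(n, 20)), rng.normal(size=(n, 23))

def oof_probs(X, y):
    """Out-of-fold probabilities from an Elastic Net logistic model."""
    model = make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="elasticnet", solver="saga",
                           l1_ratio=0.5, C=1.0, max_iter=5000),
    )
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    return cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]

p_pet, p_ct, p_cp = (oof_probs(X, y) for X in (X_pet, X_ct, X_cp))

# Late fusion: weighted average of per-modality probabilities
# (weights here are arbitrary placeholders, not the paper's values).
w = np.array([0.4, 0.4, 0.2])
p_fused = w[0] * p_pet + w[1] * p_ct + w[2] * p_cp
print("fused AUC:", roc_auc_score(y, p_fused))
```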

NeuroRAD-FM: A Foundation Model for Neuro-Oncology with Distributionally Robust Training

Moinak Bhattacharya, Angelica P. Kurtz, Fabio M. Iwamoto, Prateek Prasanna, Gagandeep Singh

arXiv preprint · Sep 18 2025
Neuro-oncology poses unique challenges for machine learning due to heterogeneous data and tumor complexity, limiting the ability of foundation models (FMs) to generalize across cohorts. Existing FMs also perform poorly in predicting uncommon molecular markers, which are essential for treatment response and risk stratification. To address these gaps, we developed a neuro-oncology specific FM with a distributionally robust loss function, enabling accurate estimation of tumor phenotypes while maintaining cross-institution generalization. We pretrained self-supervised backbones (BYOL, DINO, MAE, MoCo) on multi-institutional brain tumor MRI and applied distributionally robust optimization (DRO) to mitigate site and class imbalance. Downstream tasks included molecular classification of common markers (MGMT, IDH1, 1p/19q, EGFR), uncommon alterations (ATRX, TP53, CDKN2A/2B, TERT), continuous markers (Ki-67, TP53), and overall survival prediction in IDH1 wild-type glioblastoma at UCSF, UPenn, and CUIMC. Our method improved molecular prediction and reduced site-specific embedding differences. At CUIMC, mean balanced accuracy rose from 0.744 to 0.785 and AUC from 0.656 to 0.676, with the largest gains for underrepresented endpoints (CDKN2A/2B accuracy 0.86 to 0.92, AUC 0.73 to 0.92; ATRX AUC 0.69 to 0.82; Ki-67 accuracy 0.60 to 0.69). For survival, c-index improved at all sites: CUIMC 0.592 to 0.597, UPenn 0.647 to 0.672, UCSF 0.600 to 0.627. Grad-CAM highlighted tumor and peri-tumoral regions, confirming interpretability. Overall, coupling FMs with DRO yields more site-invariant representations, improves prediction of common and uncommon markers, and enhances survival discrimination, underscoring the need for prospective validation and integration of longitudinal and interventional signals to advance precision neuro-oncology.
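
A minimal sketch of a group-DRO-style objective over acquisition sites is shown below in PyTorch: per-site losses are computed and the site weights are updated by exponentiated gradients so the worst-performing site dominates the loss. This is a generic illustration of distributionally robust optimization, not the authors' exact loss function.

```python
import torch
import torch.nn.functional as F

def group_dro_loss(logits, labels, group_ids, q, eta=0.01):
    """Group-DRO-style objective: up-weight the groups (e.g., sites) with the
    highest current loss via an exponentiated-gradient update of the group
    weights `q` (modified in place), then return the weighted loss."""
    ce = F.cross_entropy(logits, labels, reduction="none")
    per_group = []
    for g in range(q.shape[0]):
        mask = group_ids == g
        per_group.append(ce[mask].mean() if mask.any()
                         else torch.zeros((), device=logits.device))
    group_losses = torch.stack(per_group)
    q.mul_(torch.exp(eta * group_losses.detach()))
    q.div_(q.sum())
    return (q * group_losses).sum()

# Toy usage: 3 hypothetical sites, 4-class molecular marker problem.
q = torch.ones(3) / 3
logits = torch.randn(16, 4, requires_grad=True)
labels = torch.randint(0, 4, (16,))
sites = torch.randint(0, 3, (16,))
loss = group_dro_loss(logits, labels, sites, q)
loss.backward()
print(loss.item(), q)
```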

A Compound-Eye-Inspired Multi-Scale Neural Architecture with Integrated Attention Mechanisms.

Neri F, Yang M, Xue Y

PubMed · Sep 18 2025
In the context of neural system structure modeling and complex visual tasks, the effective integration of multi-scale features and contextual information is critical for enhancing model performance. This paper proposes a biologically inspired hybrid neural network architecture, CompEyeNet, which combines the global modeling capacity of transformers with the efficiency of lightweight convolutional structures. The backbone network, multi-attention transformer backbone network (MATBN), integrates multiple attention mechanisms to collaboratively model local details and long-range dependencies. The neck network, compound eye neck network (CENN), introduces high-resolution feature layers and efficient attention fusion modules to significantly enhance multi-scale information representation and reconstruction capability. CompEyeNet is evaluated on three authoritative medical image segmentation datasets: MICCAI-CVC-ClinicDB, ISIC2018, and MICCAI-tooth-segmentation, demonstrating its superior performance. Experimental results show that compared to models such as Deeplab, Unet, and the YOLO series, CompEyeNet achieves better performance with fewer parameters. Specifically, compared to the baseline model YOLOv11, CompEyeNet reduces the number of parameters by an average of 38.31%. On key performance metrics, the average Dice coefficient improves by 0.87%, the Jaccard index by 1.53%, Precision by 0.58%, and Recall by 1.11%. These findings verify the advantages of the proposed architecture in terms of parameter efficiency and accuracy, highlighting the broad application potential of bio-inspired attention-fusion hybrid neural networks in neural system modeling and image analysis.
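
The sketch below shows a generic multi-scale attention-fusion block of the kind the abstract describes: feature maps from several scales are resampled to a common resolution, concatenated, and re-weighted with a squeeze-and-excitation-style channel gate. It is an illustrative stand-in, not the MATBN or CENN modules themselves.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionFusion(nn.Module):
    """Generic multi-scale attention-fusion block (illustrative stand-in only)."""
    def __init__(self, in_channels, reduction=4):
        super().__init__()
        total = sum(in_channels)
        self.gate = nn.Sequential(          # squeeze-and-excitation-style channel gate
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(total, total // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(total // reduction, total, 1),
            nn.Sigmoid(),
        )
        self.project = nn.Conv2d(total, in_channels[0], 1)

    def forward(self, features):
        target = features[0].shape[-2:]      # fuse at the finest scale
        resized = [F.interpolate(f, size=target, mode="bilinear", align_corners=False)
                   for f in features]
        x = torch.cat(resized, dim=1)
        x = x * self.gate(x)                 # channel-wise attention re-weighting
        return self.project(x)

# Toy usage with three feature maps at different scales.
feats = [torch.randn(1, 32, 64, 64), torch.randn(1, 64, 32, 32), torch.randn(1, 128, 16, 16)]
fused = AttentionFusion([32, 64, 128])(feats)
print(fused.shape)  # torch.Size([1, 32, 64, 64])
```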

Machine Learning based Radiomics from Multi-parametric Magnetic Resonance Imaging for Predicting Lymph Node Metastasis in Cervical Cancer.

Liu J, Zhu M, Li L, Zang L, Luo L, Zhu F, Zhang H, Xu Q

PubMed · Sep 18 2025
To construct and compare multiple machine learning models to predict lymph node (LN) metastasis in cervical cancer, utilizing radiomic features extracted from preoperative multi-parametric magnetic resonance imaging (MRI). This study retrospectively enrolled 407 patients with cervical cancer who were randomly divided into a training cohort (n=284) and a validation cohort (n=123). A total of 4065 radiomic features were extracted from the tumor regions of interest on contrast-enhanced T1-weighted imaging, T2-weighted imaging, and diffusion-weighted imaging for each patient. The Mann-Whitney U test, Spearman correlation analysis, and least absolute shrinkage and selection operator (LASSO) Cox regression analysis were employed for radiomic feature selection. The relationship between MRI radiomic features and LN status was analyzed using five machine-learning algorithms. Model performance was evaluated by measuring the area under the receiver-operating characteristic curve (AUC) and accuracy (ACC). Moreover, Kaplan-Meier analysis was used to validate the prognostic value of selected clinical and radiomic characteristics. LN metastasis was pathologically detected in 24.3% (99/407) of patients. Following a three-step feature selection, 18 radiomic features were employed for model construction. The XGBoost model exhibited superior performance compared to other models, achieving an AUC, accuracy, sensitivity, specificity, and F1 score of 0.9268, 0.8969, 0.7419, 0.9891, and 0.8364, respectively, on the validation set. Additionally, Kaplan-Meier curves indicated a significant correlation between radiomic scores and progression-free survival in cervical cancer patients (p < 0.05). Among the machine learning models, XGBoost demonstrated the best predictive ability for LN metastasis and showed prognostic value through its radiomic score, highlighting its clinical potential. Machine learning-based multi-parametric MRI radiomic analysis demonstrated promising performance in the preoperative prediction of LN metastasis and clinical prognosis in cervical cancer.
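
The three-step feature selection pipeline (univariate filtering, correlation-based redundancy removal, LASSO) can be sketched with SciPy and scikit-learn as below. The feature matrix and labels are random placeholders, and plain LASSO is used in place of the Cox variant, so the counts and results are illustrative only.

```python
import numpy as np
from scipy.stats import mannwhitneyu, spearmanr
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(284, 4065))   # hypothetical training radiomic features
y = rng.integers(0, 2, 284)        # hypothetical LN status labels

# Step 1: univariate Mann-Whitney U filter.
keep = [j for j in range(X.shape[1])
        if mannwhitneyu(X[y == 1, j], X[y == 0, j]).pvalue < 0.05]

# Step 2: drop one of any pair of highly correlated features (Spearman |rho| > 0.9).
selected = []
for j in keep:
    if all(abs(spearmanr(X[:, j], X[:, k])[0]) <= 0.9 for k in selected):
        selected.append(j)

# Step 3: LASSO keeps features with non-zero coefficients.
Xs = StandardScaler().fit_transform(X[:, selected])
lasso = LassoCV(cv=5, random_state=0).fit(Xs, y)
final = [selected[i] for i, c in enumerate(lasso.coef_) if c != 0]
print(f"{len(keep)} -> {len(selected)} -> {len(final)} features")
```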

Artificial Intelligence in Cardiac Amyloidosis: A Systematic Review and Meta-Analysis of Diagnostic Accuracy Across Imaging and Non-Imaging Modalities

Kumbalath, R. M., Challa, D., Patel, M. K., Prajapati, S. D., Kumari, K., Mehan, A., Chopra, R., Somegowda, Y. M., Khan, R., Ramteke, H. D., Juneja, M.

medRxiv preprint · Sep 18 2025
Introduction: Cardiac amyloidosis (CA) is an underdiagnosed infiltrative cardiomyopathy associated with poor outcomes if not detected early. Artificial intelligence (AI) has emerged as a promising adjunct to conventional diagnostics, leveraging imaging and non-imaging data to improve recognition of CA. However, evidence on the comparative diagnostic performance of AI across modalities remains fragmented. This meta-analysis aimed to synthesize and quantify the diagnostic performance of AI models in CA across multiple modalities. Methods: A systematic literature search was conducted in PubMed, Embase, Web of Science, and Cochrane Library from inception to August 2025. Only published observational studies applying AI to the diagnosis of CA were included. Data were extracted on patient demographics, AI algorithms, modalities, and diagnostic performance metrics. Risk of bias was assessed using QUADAS-2, and certainty of evidence was graded using GRADE. Random-effects meta-analysis (REML) was performed to pool accuracy, precision, recall, F1-score, and area under the curve (AUC). Results: From 115 screened studies, 25 observational studies met the inclusion criteria, encompassing a total of 589,877 patients with a male predominance (372,458 males, 63.2%; 221,818 females, 36.6%). A wide range of AI algorithms were applied, most notably convolutional neural networks (CNNs), which accounted for 526,879 patients, followed by 3D-ResNet architectures (56,872 patients), hybrid segmentation-classification networks (3,747), and smaller studies employing random forests (636), Res-CRNN (89), and traditional machine learning approaches (769). Data modalities included ECG (341,989 patients), echocardiography (>70,000 patients across multiple cohorts), scintigraphy (~24,000 patients), cardiac MRI (~900 patients), CT (299 patients), and blood tests (261 patients). Pooled diagnostic performance across all modalities demonstrated an overall accuracy of 84.0% (95% CI: 74.6-93.5), precision of 85.8% (95% CI: 79.6-92.0), recall (sensitivity) of 89.6% (95% CI: 85.7-93.4), and an F1-score of 87.2% (95% CI: 81.8-92.6). Area under the curve (AUC) analysis revealed modality-specific variation, with scintigraphy achieving the highest pooled AUC (99.7%), followed by MRI (96.8%), echocardiography (94.3%), blood tests (95.0%), CT (98.0%), and ECG (88.5%). Subgroup analysis confirmed significant differences between modalities (p < 0.001), with MRI and scintigraphy showing consistent high performance and low-to-moderate heterogeneity, while echocardiography displayed moderate accuracy but marked variability, and ECG demonstrated the lowest and most heterogeneous results. Conclusion: AI demonstrates strong potential for improving CA diagnosis, with MRI and scintigraphy providing the most reliable performance, echocardiography offering an accessible but heterogeneous option, and ECG models remaining least consistent. While promising, future prospective multicenter studies are needed to validate AI models, improve subtype discrimination, and optimize multimodal integration for real-world clinical use.
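
For readers unfamiliar with random-effects pooling, the sketch below implements the simpler DerSimonian-Laird estimator (the review itself used REML) on hypothetical per-study sensitivities and variances; it returns the pooled estimate, a 95% CI, and the between-study variance.

```python
import numpy as np

def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling (a simpler alternative to REML).
    Returns the pooled estimate, its 95% CI, and the between-study variance tau^2."""
    effects, variances = np.asarray(effects, float), np.asarray(variances, float)
    w = 1.0 / variances                          # fixed-effect weights
    fixed = np.sum(w * effects) / np.sum(w)
    q = np.sum(w * (effects - fixed) ** 2)       # Cochran's Q
    df = len(effects) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)                # between-study variance
    w_re = 1.0 / (variances + tau2)              # random-effects weights
    pooled = np.sum(w_re * effects) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2

# Hypothetical per-study sensitivities and variances (illustrative only).
sens = [0.91, 0.88, 0.93, 0.85, 0.90]
var = [0.0004, 0.0009, 0.0003, 0.0012, 0.0006]
print(random_effects_pool(sens, var))
```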

Multimodal deep learning integration for predicting renal function outcomes in living donor kidney transplantation: a retrospective cohort study.

Kim JM, Jung H, Kwon HE, Ko Y, Jung JH, Shin S, Kim YH, Kim YH, Jun TJ, Kwon H

PubMed · Sep 17 2025
Accurately predicting post-transplant renal function is essential for optimizing donor-recipient matching and improving long-term outcomes in kidney transplantation (KT). Traditional models using only structured clinical data often fail to account for complex biological and anatomical factors. This study aimed to develop and validate a multimodal deep learning model that integrates computed tomography (CT) imaging, radiology report text, and structured clinical variables to predict 1-year estimated glomerular filtration rate (eGFR) in living donor kidney transplantation (LDKT) recipients. A retrospective cohort of 1,937 LDKT recipients was selected from 3,772 KT cases. Exclusions included deceased donor KT, immunologic high-risk recipients (n = 304), missing CT imaging, early graft complications, and anatomical abnormalities. eGFR at 1 year post-transplant was classified into four categories: > 90, 75-90, 60-75, and 45-60 mL/min/1.73 m². Radiology reports were embedded using BioBERT, while CT videos were encoded using a CLIP-based visual extractor. These were fused with structured clinical features and input into ensemble classifiers including XGBoost. Model performance was evaluated using cross-validation and SHapley Additive exPlanations (SHAP) analysis. The full multimodal model achieved a macro F1 score of 0.675, micro F1 score of 0.704, and weighted F1 score of 0.698, substantially outperforming the clinical-only model (macro F1 = 0.292). CT imaging contributed more than text data (clinical + CT macro F1 = 0.651; clinical + text = 0.486). The model showed highest accuracy in the > 90 (F1 = 0.7773) and 60-75 (F1 = 0.7303) categories. SHAP analysis identified donor age, BMI, and donor sex as key predictors. Dimensionality reduction confirmed internal feature validity. Multimodal deep learning integrating clinical, imaging, and textual data enhances prediction of post-transplant renal function. This framework offers a robust and interpretable approach for individualized risk stratification in LDKT, supporting precision medicine in transplantation.
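
A minimal sketch of the fusion idea is shown below: placeholder arrays stand in for BioBERT report embeddings, CLIP-based CT features, and structured clinical variables, which are concatenated and fed to an XGBoost classifier over the four eGFR categories. In the actual pipeline the embeddings would come from the respective encoders and several classifiers would be ensembled.

```python
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500                                 # hypothetical cohort size
text_emb = rng.normal(size=(n, 768))    # stand-in for BioBERT report embeddings
img_emb = rng.normal(size=(n, 512))     # stand-in for CLIP-based CT features
clinical = rng.normal(size=(n, 12))     # stand-in for structured clinical variables
y = rng.integers(0, 4, n)               # four 1-year eGFR categories (labels 0..3)

# Early fusion by simple concatenation, then a gradient-boosted classifier.
X = np.hstack([text_emb, img_emb, clinical])
clf = XGBClassifier(n_estimators=100, max_depth=4, learning_rate=0.1,
                    eval_metric="mlogloss")
print("macro F1:", cross_val_score(clf, X, y, cv=5, scoring="f1_macro").mean())
```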

Influence of Mammography Acquisition Parameters on AI and Radiologist Interpretive Performance.

Lotter W, Hippe DS, Oshiro T, Lowry KP, Milch HS, Miglioretti DL, Elmore JG, Lee CI, Hsu W

PubMed · Sep 17 2025
<i>"Just Accepted" papers have undergone full peer review and have been accepted for publication in <i>Radiology: Artificial Intelligence</i>. This article will undergo copyediting, layout, and proof review before it is published in its final version. Please note that during production of the final copyedited article, errors may be discovered which could affect the content</i>. Purpose To evaluate the impact of screening mammography acquisition parameters on the interpretive performance of AI and radiologists. Materials and Methods The associations between seven mammogram acquisition parameters-mammography machine version, kVp, x-ray exposure delivered, relative x-ray exposure, paddle size, compression force, and breast thickness-and AI and radiologist performance in interpreting two-dimensional screening mammograms acquired by a diverse health system between December 2010 and 2019 were retrospectively evaluated. The top 11 AI models and the ensemble model from the Digital Mammography DREAM Challenge were assessed. The associations between each acquisition parameter and the sensitivity and specificity of the AI models and the radiologists' interpretations were separately evaluated using generalized estimating equations-based models at the examination level, adjusted for several clinical factors. Results The dataset included 28,278 screening two-dimensional mammograms from 22,626 women (mean age 58.5 years ± 11.5 [SD]; 4913 women had multiple mammograms). Of these, 324 examinations resulted in breast cancer diagnosis within 1 year. The acquisition parameters were significantly associated with the performance of both AI and radiologists, with absolute effect sizes reaching 10% for sensitivity and 5% for specificity; however, the associations differed between AI and radiologists for several parameters. Increased exposure delivered reduced the specificity for the ensemble AI (-4.5% per 1 SD increase; <i>P</i> < .001) but not radiologists (<i>P</i> = .44). Increased compression force reduced the specificity for radiologists (-1.3% per 1 SD increase; <i>P</i> < .001) but not for AI (<i>P</i> = .60). Conclusion Screening mammography acquisition parameters impacted the performance of both AI and radiologists, with some parameters impacting performance differently. ©RSNA, 2025.

Habitat-aware radiomics and adaptive 2.5D deep learning predict treatment response and long-term survival in ESCC patients undergoing neoadjuvant chemoimmunotherapy.

Gao X, Yang L, She T, Wang F, Ding H, Lu Y, Xu Y, Wang Y, Li P, Duan X, Leng X

PubMed · Sep 17 2025
Current radiomic approaches inadequately resolve spatial intratumoral heterogeneity (ITH) in esophageal squamous cell carcinoma (ESCC), limiting neoadjuvant chemoimmunotherapy (NACI) response prediction. We propose an interpretable multimodal framework to: (1) quantitatively map intra-/peritumoral heterogeneity via voxel-wise habitat radiomics; (2) model cross-sectional tumor biology using 2.5D deep learning; and (3) establish mechanism-driven biomarkers via SHAP interpretability to identify resistance-linked subregions. This dual-center retrospective study analyzed 269 treatment-naïve ESCC patients with baseline PET/CT (training: n = 144; validation: n = 62; test: n = 63). Habitat radiomics delineated tumor subregions via K-means clustering (Calinski-Harabasz-optimized) on PET/CT, extracting 1,834 radiomic features per modality. A multi-stage pipeline (univariate filtering, mRMR, LASSO regression) selected 32 discriminative features. The 2.5D model aggregated ± 4 peri-tumoral slices, fusing PET/CT via MixUp channels using a fine-tuned ResNet50 (ImageNet-pretrained), with multi-instance learning (MIL) translating slice-level features to patient-level predictions. Habitat features, MIL signatures, and clinical variables were integrated via five-classifier ensemble (ExtraTrees/SVM/RandomForest) and Crossformer architecture (SMOTE-balanced). Validation included AUC, sensitivity, specificity, calibration curves, decision curve analysis (DCA), survival metrics (C-index, Kaplan-Meier), and interpretability (SHAP, Grad-CAM). Habitat radiomics achieved superior validation AUC (0.865, 95% CI: 0.778-0.953), outperforming conventional radiomics (ΔAUC + 3.6%, P < 0.01) and clinical models (ΔAUC + 6.4%, P < 0.001). SHAP identified the invasive front (H2) as dominant predictor (40% of top features), with wavelet_LHH_firstorder_Entropy showing highest impact (SHAP = + 0.42). The 2.5D MIL model demonstrated strong generalizability (validation AUC: 0.861). The combined model achieved state-of-the-art test performance (AUC = 0.824, sensitivity = 0.875) with superior calibration (Hosmer-Lemeshow P > 0.800), effective survival stratification (test C-index: 0.809), and 23-41% net benefit improvement in DCA. Integrating habitat radiomics and 2.5D deep learning enables interpretable dual diagnostic-prognostic stratification in ESCC, advancing precision oncology by decoding spatial heterogeneity.
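
Habitat delineation of the kind described above is typically a voxel-wise K-means clustering with the number of clusters chosen by the Calinski-Harabasz index. The sketch below illustrates this on simulated two-feature voxels (stand-ins for PET and CT intensities); it is not the paper's exact feature set or pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score
from sklearn.preprocessing import StandardScaler

# Hypothetical voxel-wise features inside a tumor mask: one row per voxel,
# columns standing in for PET SUV and CT HU (illustrative only).
rng = np.random.default_rng(0)
voxels = np.vstack([rng.normal([2.0, 40.0], [0.5, 10.0], size=(400, 2)),
                    rng.normal([6.0, 60.0], [1.0, 15.0], size=(300, 2)),
                    rng.normal([4.0, 20.0], [0.8, 8.0], size=(300, 2))])
X = StandardScaler().fit_transform(voxels)

# Choose the number of habitats by maximizing the Calinski-Harabasz index.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = calinski_harabasz_score(X, labels)
best_k = max(scores, key=scores.get)
habitats = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)
print("habitats:", best_k, "voxel counts per habitat:", np.bincount(habitats))
```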

A Deep Learning Framework for Synthesizing Longitudinal Infant Brain MRI during Early Development.

Fang Y, Xiong H, Huang J, Liu F, Shen Z, Cai X, Zhang H, Wang Q

PubMed · Sep 17 2025
<i>"Just Accepted" papers have undergone full peer review and have been accepted for publication in <i>Radiology: Artificial Intelligence</i>. This article will undergo copyediting, layout, and proof review before it is published in its final version. Please note that during production of the final copyedited article, errors may be discovered which could affect the content</i>. Purpose To develop a three-stage, age-and modality-conditioned framework to synthesize longitudinal infant brain MRI scans, and account for rapid structural and contrast changes during early brain development. Materials and Methods This retrospective study used T1- and T2-weighted MRI scans (848 scans) from 139 infants in the Baby Connectome Project, collected since September 2016. The framework models three critical image cues related: volumetric expansion, cortical folding, and myelination, predicting missing time points with age and modality as predictive factors. The method was compared with LGAN, CounterSyn, and Diffusion-based approach using peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM) and the Dice similarity coefficient (DSC). Results The framework was trained on 119 participants (average age: 11.25 ± 6.16 months, 60 female, 59 male) and tested on 20 (average age: 12.98 ± 6.59 months, 11 female, 9 male). For T1-weighted images, PSNRs were 25.44 ± 1.95 and 26.93 ± 2.50 for forward and backward MRI synthesis, and SSIMs of 0.87 ± 0.03 and 0.90 ± 0.02. For T2-weighted images, PSNRs were 26.35 ± 2.30 and 26.40 ± 2.56, with SSIMs of 0.87 ± 0.03 and 0.89 ± 0.02, significantly outperforming competing methods (<i>P</i> < .001). The framework also excelled in tissue segmentation (<i>P</i> < .001) and cortical reconstruction, achieving DSC of 0.85 for gray matter and 0.86 for white matter, with intraclass correlation coefficients exceeding 0.8 in most cortical regions. Conclusion The proposed three-stage framework effectively synthesized age-specific infant brain MRI scans, outperforming competing methods in image quality and tissue segmentation with strong performance in cortical reconstruction, demonstrating potential for developmental modeling and longitudinal analyses. ©RSNA, 2025.
