Assessing accuracy and legitimacy of multimodal large language models on Japan Diagnostic Radiology Board Examination

Hirano, Y., Miki, S., Yamagishi, Y., Hanaoka, S., Nakao, T., Kikuchi, T., Nakamura, Y., Nomura, Y., Yoshikawa, T., Abe, O.

medRxiv preprint · Jun 23, 2025
Purpose: To assess and compare the accuracy and legitimacy of multimodal large language models (LLMs) on the Japan Diagnostic Radiology Board Examination (JDRBE). Materials and Methods: The dataset comprised questions from JDRBE 2021, 2023, and 2024, with ground-truth answers established through consensus among multiple board-certified diagnostic radiologists. Questions without associated images and those lacking unanimous agreement on answers were excluded. Eight LLMs were evaluated: GPT-4 Turbo, GPT-4o, GPT-4.5, GPT-4.1, o3, o4-mini, Claude 3.7 Sonnet, and Gemini 2.5 Pro. Each model was evaluated under two conditions: with image input (vision) and without (text-only). Performance differences between the conditions were assessed using McNemar's exact test. Two diagnostic radiologists (with 2 and 18 years of experience) independently rated the legitimacy of responses from four models (GPT-4 Turbo, Claude 3.7 Sonnet, o3, and Gemini 2.5 Pro) on a five-point Likert scale, blinded to model identity. Legitimacy scores were analyzed using Friedman's test, followed by pairwise Wilcoxon signed-rank tests with Holm correction. Results: The dataset included 233 questions. Under the vision condition, o3 achieved the highest accuracy at 72%, followed by o4-mini (70%) and Gemini 2.5 Pro (70%). Under the text-only condition, o3 topped the list with an accuracy of 67%. The addition of image input significantly improved the accuracy of two models (Gemini 2.5 Pro and GPT-4.5), but not the others. Both o3 and Gemini 2.5 Pro received significantly higher legitimacy scores than GPT-4 Turbo and Claude 3.7 Sonnet from both raters. Conclusion: Recent multimodal LLMs, particularly o3 and Gemini 2.5 Pro, have demonstrated remarkable progress on JDRBE questions, reflecting their rapid evolution in diagnostic radiology. Secondary abstract: Eight multimodal large language models were evaluated on the Japan Diagnostic Radiology Board Examination. OpenAI's o3 and Google DeepMind's Gemini 2.5 Pro achieved high accuracy rates (72% and 70%) and received good legitimacy scores from human raters, demonstrating steady progress.
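No code accompanies the abstract, but the paired comparison it describes is a standard McNemar's exact test on per-question correctness under the two conditions. A minimal sketch in Python (the correctness arrays below are random placeholders, not the study's data):

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical per-question correctness for one model under both conditions
# (1 = correct, 0 = incorrect); the real study had 233 questions.
rng = np.random.default_rng(0)
vision = rng.integers(0, 2, size=233)
text_only = rng.integers(0, 2, size=233)

# 2x2 contingency table of paired outcomes; McNemar's test is driven by the
# discordant cells (correct under one condition but not the other).
table = np.array([
    [np.sum((vision == 1) & (text_only == 1)), np.sum((vision == 1) & (text_only == 0))],
    [np.sum((vision == 0) & (text_only == 1)), np.sum((vision == 0) & (text_only == 0))],
])
result = mcnemar(table, exact=True)
print(f"McNemar's exact test p-value: {result.pvalue:.4f}")
```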

STACT-Time: Spatio-Temporal Cross Attention for Cine Thyroid Ultrasound Time Series Classification

Irsyad Adam, Tengyue Zhang, Shrayes Raman, Zhuyu Qiu, Brandon Taraku, Hexiang Feng, Sile Wang, Ashwath Radhachandran, Shreeram Athreya, Vedrana Ivezic, Peipei Ping, Corey Arnold, William Speier

arXiv preprint · Jun 22, 2025
Thyroid cancer is among the most common cancers in the United States. Thyroid nodules are frequently detected through ultrasound (US) imaging, and some require further evaluation via fine-needle aspiration (FNA) biopsy. Despite its effectiveness, FNA often leads to unnecessary biopsies of benign nodules, causing patient discomfort and anxiety. To address this, the American College of Radiology Thyroid Imaging Reporting and Data System (TI-RADS) has been developed to reduce benign biopsies. However, such systems are limited by interobserver variability. Recent deep learning approaches have sought to improve risk stratification, but they often fail to utilize the rich temporal and spatial context provided by US cine clips, which contain dynamic global information and surrounding structural changes across various views. In this work, we propose the Spatio-Temporal Cross Attention for Cine Thyroid Ultrasound Time Series Classification (STACT-Time) model, a novel representation learning framework that integrates imaging features from US cine clips with features from segmentation masks automatically generated by a pretrained model. By leveraging self-attention and cross-attention mechanisms, our model captures the rich temporal and spatial context of US cine clips while enhancing feature representation through segmentation-guided learning. Our model improves malignancy prediction compared to state-of-the-art models, achieving a cross-validation precision of 0.91 (±0.02) and an F1 score of 0.89 (±0.02). By reducing unnecessary biopsies of benign nodules while maintaining high sensitivity for malignancy detection, our model has the potential to enhance clinical decision-making and improve patient outcomes.
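As a rough illustration of the segmentation-guided fusion the abstract describes, here is a toy PyTorch block combining temporal self-attention over cine-clip features with cross-attention to mask features; the dimensions, module layout, and pooling are assumptions, not the published STACT-Time architecture:

```python
import torch
import torch.nn as nn

class CineSegCrossAttention(nn.Module):
    """Toy fusion block: cine-clip frame embeddings attend to
    segmentation-mask embeddings (all sizes are illustrative)."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, 2)  # benign vs. malignant

    def forward(self, cine_feats, seg_feats):
        # cine_feats: (B, T, dim) per-frame embeddings from the cine clip
        # seg_feats:  (B, T, dim) embeddings of auto-generated masks
        x, _ = self.self_attn(cine_feats, cine_feats, cine_feats)
        x = self.norm1(x + cine_feats)              # temporal self-attention
        y, _ = self.cross_attn(x, seg_feats, seg_feats)
        y = self.norm2(y + x)                       # segmentation-guided cross-attention
        return self.head(y.mean(dim=1))             # pool over time, classify

logits = CineSegCrossAttention()(torch.randn(4, 16, 256), torch.randn(4, 16, 256))
print(logits.shape)  # torch.Size([4, 2])
```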

Decoding Federated Learning: The FedNAM+ Conformal Revolution

Sree Bhargavi Balija, Amitash Nanda, Debashis Sahoo

arXiv preprint · Jun 22, 2025
Federated learning has significantly advanced distributed training of machine learning models across decentralized data sources. However, existing frameworks often lack comprehensive solutions that combine uncertainty quantification, interpretability, and robustness. To address this, we propose FedNAM+, a federated learning framework that integrates Neural Additive Models (NAMs) with a novel conformal prediction method to enable interpretable and reliable uncertainty estimation. Our method introduces a dynamic level adjustment technique that utilizes gradient-based sensitivity maps to identify key input features influencing predictions. This facilitates both interpretability and pixel-wise uncertainty estimates. Unlike traditional interpretability methods such as LIME and SHAP, which do not provide confidence intervals, FedNAM+ offers visual insights into prediction reliability. We validate our approach through experiments on CT scan, MNIST, and CIFAR datasets, demonstrating high prediction accuracy with minimal loss (e.g., only 0.1% on MNIST), along with transparent uncertainty measures. Visual analysis highlights variable uncertainty intervals, revealing low-confidence regions where model performance can be improved with additional data. Compared to Monte Carlo Dropout, FedNAM+ delivers efficient and global uncertainty estimates with reduced computational overhead, making it particularly suitable for federated learning scenarios. Overall, FedNAM+ provides a robust, interpretable, and computationally efficient framework that enhances trust and transparency in decentralized predictive modeling.
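For background on the conformal component, here is plain split conformal prediction for regression; FedNAM+'s dynamic level adjustment and NAM integration are not reproduced, so this is a generic sketch rather than the paper's method:

```python
import numpy as np

def split_conformal_interval(cal_pred, cal_true, test_pred, alpha=0.1):
    """Split conformal prediction for regression: calibrate on held-out
    residuals, then return (1 - alpha) prediction intervals."""
    residuals = np.abs(cal_true - cal_pred)            # nonconformity scores
    n = len(residuals)
    # Finite-sample-corrected quantile of the calibration residuals.
    q = np.quantile(residuals, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    return test_pred - q, test_pred + q

# Hypothetical calibration/test predictions from any trained model.
rng = np.random.default_rng(1)
cal_true = rng.normal(size=500)
cal_pred = cal_true + rng.normal(scale=0.3, size=500)
lo, hi = split_conformal_interval(cal_pred, cal_true, np.array([0.0, 1.5]))
print(lo, hi)
```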

The future of biomarkers for vascular contributions to cognitive impairment and dementia (VCID): proceedings of the 2025 annual workshop of the Albert Research Institute for White Matter and Cognition.

Lennon MJ, Karvelas N, Ganesh A, Whitehead S, Sorond FA, Durán Laforet V, Head E, Arfanakis K, Kolachalama VB, Liu X, Lu H, Ramirez J, Walker K, Weekman E, Wellington CL, Winston C, Barone FC, Corriveau RA

PubMed · Jun 21, 2025
Advances in biomarkers and pathophysiology of vascular contributions to cognitive impairment and dementia (VCID) are expected to bring greater mechanistic insights, more targeted treatments, and potentially disease-modifying therapies. The 2025 Annual Workshop of the Albert Research Institute for White Matter and Cognition, sponsored by the Leo and Anne Albert Charitable Trust since 2015, focused on novel biomarkers for VCID. The meeting highlighted the complexity of dementia, emphasizing that the majority of cases involve multiple brain pathologies, with vascular pathology typically present. Potential novel approaches to diagnosis of disease processes and progression that may result in VCID included measures of microglial senescence and retinal changes, as well as artificial intelligence (AI) integration of multimodal datasets. Proteomic studies identified plasma proteins associated with cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL; a rare genetic disorder affecting brain vessels) and age-related vascular pathology that suggested potential therapeutic targets. Blood-based microglial and brain-derived extracellular vesicles are promising tools for early detection of brain inflammation and other changes that have been associated with cognitive decline. Imaging measures of blood perfusion, oxygen extraction, and cerebrospinal fluid (CSF) flow were discussed as potential VCID biomarkers, in part because of correlations with classic pathological Alzheimer's disease (AD) biomarkers. MRI-visible perivascular spaces, which may be a novel imaging biomarker of sleep-driven glymphatic waste clearance dysfunction, are associated with vascular risk factors, lower cognitive function, and various brain pathologies including Alzheimer's, Parkinson's and cerebral amyloid angiopathy (CAA). People with Down syndrome are at high risk for dementia. Individuals with Down syndrome who develop dementia almost universally experience mixed brain pathologies, with AD pathology and cerebrovascular pathology being the most common. This follows the pattern in the general population where mixed pathologies are also predominant in the brains of people clinically diagnosed with dementia, including AD dementia. Intimate partner violence-related brain injury, hypertension's impact on dementia risk, and the promise of remote ischemic conditioning for treating VCID were additional themes.

Automated detection and classification of osteolytic lesions in panoramic radiographs using CNNs and vision transformers.

van Nistelrooij N, Ghanad I, Bigdeli AK, Thiem DGE, von See C, Rendenbach C, Maistreli I, Xi T, Bergé S, Heiland M, Vinayahalingam S, Gaudin R

PubMed · Jun 21, 2025
Diseases underlying osteolytic lesions in the jaws are characterized by the absorption of bone tissue and are often asymptomatic, delaying their diagnosis. Well-defined lesions (benign cyst-like lesions) and ill-defined lesions (osteomyelitis or malignancy) can be detected early in a panoramic radiograph (PR) by an experienced examiner, but most dentists lack the appropriate training. To support dentists, this study aimed to develop and evaluate deep learning models for the detection of osteolytic lesions in PRs. A dataset of 676 PRs (165 well-defined, 181 ill-defined, 330 control) was collected from the Department of Oral and Maxillofacial Surgery at Charité Berlin, Germany. The osteolytic lesions were pixel-wise segmented and labeled as well-defined or ill-defined. Four model architectures for instance segmentation (Mask R-CNN with a Swin-Tiny or ResNet-50 backbone, Mask DINO, and YOLOv5) were employed with five-fold cross-validation. Their effectiveness was evaluated with sensitivity, specificity, F1-score, and AUC, and failure cases were reviewed. Mask R-CNN with a Swin-Tiny backbone was the most effective (well-defined F1 = 0.784, AUC = 0.881; ill-defined F1 = 0.904, AUC = 0.971), and the model architectures including vision transformer components were more effective than those without. Model mistakes were observed around the maxillary sinus, at tooth extraction sites, and for radiolucent bands. Promising deep learning models were developed for the detection of osteolytic lesions in PRs, particularly those with vision transformer components (Mask R-CNN with Swin-Tiny and Mask DINO). These results underline the potential of vision transformers for enhancing the automated detection of osteolytic lesions, offering a significant improvement over traditional deep learning models.
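The reported metrics can be reproduced from per-image predictions with standard tooling. A hypothetical sketch (the inputs are simulated, not the Charité dataset):

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score, confusion_matrix

def lesion_metrics(y_true, y_score, threshold=0.5):
    """Detection metrics of the kind reported in the study: sensitivity,
    specificity, F1, and AUC (inputs here are hypothetical)."""
    y_pred = (y_score >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": f1_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_score),
    }

rng = np.random.default_rng(2)
y_true = rng.integers(0, 2, size=200)                            # lesion present vs. absent
y_score = np.clip(y_true * 0.6 + rng.random(200) * 0.5, 0, 1)    # mock model scores
print(lesion_metrics(y_true, y_score))
```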

Development of Radiomics-Based Risk Prediction Models for Stages of Hashimoto's Thyroiditis Using Ultrasound, Clinical, and Laboratory Factors.

Chen JH, Kang K, Wang XY, Chi JN, Gao XM, Li YX, Huang Y

PubMed · Jun 21, 2025
To develop a radiomics risk-prediction model for differentiating stages of Hashimoto's thyroiditis (HT). Data from patients with HT who underwent definitive surgical pathology between January 2018 and December 2023 were retrospectively collected and categorized into early HT (patients with positive antibodies alone or accompanied by elevated thyroid hormones) and late HT (patients with positive antibodies who were beginning to present subclinical hypothyroidism or developing hypothyroidism). Ultrasound images and five clinical and 12 laboratory indicators were obtained. Six classifiers were used to construct radiomics models. The gradient boosting decision tree (GBDT) classifier was used to screen for the best features and to explore the main risk factors for differentiating early HT. The performance of each model was evaluated by receiver operating characteristic (ROC) curve. The model was validated using one internal and two external test cohorts. A total of 785 patients were enrolled. Extreme gradient boosting (XGBoost) showed the best performance in the training cohort, with an AUC of 0.999 (0.998, 1), and AUC values of 0.993 (0.98, 1), 0.947 (0.866, 1), and 0.98 (0.939, 1) in the internal test, first external, and second external cohorts, respectively. Ultrasound radiomic features contributed 78.6% (11/14) of the model's features. The first-order feature of the transverse thyroid ultrasound image, the gray-level run length matrix (GLRLM) texture feature of the longitudinal thyroid ultrasound image, and free thyroxine contributed the most to the model. Our study developed and tested a risk-prediction model that effectively differentiated HT stages, enabling more precise and proactive management of patients with HT at an earlier stage.
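The two-stage pipeline the abstract outlines (GBDT feature screening followed by an XGBoost classifier evaluated by AUC) looks roughly like the following sketch; the feature matrix, the number of retained features, and all hyperparameters are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

# Hypothetical feature matrix standing in for the study's radiomic,
# clinical, and laboratory features; labels are early vs. late HT.
rng = np.random.default_rng(3)
X = rng.normal(size=(785, 50))
y = rng.integers(0, 2, size=785)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Step 1: GBDT screens features by importance (as in the abstract).
gbdt = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
top = np.argsort(gbdt.feature_importances_)[::-1][:14]  # keep top 14 features

# Step 2: XGBoost classifier on the screened features, evaluated by AUC.
clf = XGBClassifier(n_estimators=200, random_state=0).fit(X_tr[:, top], y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te[:, top])[:, 1])
print(f"AUC: {auc:.3f}")
```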

OpenMAP-BrainAge: Generalizable and Interpretable Brain Age Predictor

Pengyu Kan, Craig Jones, Kenichi Oishi

arXiv preprint · Jun 21, 2025
Purpose: To develop an age prediction model that is interpretable and robust to demographic and technological variances in brain MRI scans. Materials and Methods: We propose a transformer-based architecture that leverages self-supervised pre-training on large-scale datasets. Our model processes pseudo-3D T1-weighted MRI scans from three anatomical views and incorporates brain volumetric information. By introducing a stem architecture, we reduce the conventional quadratic complexity of transformer models to linear complexity, enabling scalability for high-dimensional MRI data. We trained our model on the ADNI2 & 3 (N=1348) and OASIS3 (N=716) datasets (age range: 42-95) from North America, with an 8:1:1 split for training, validation, and testing, and then validated it on the AIBL dataset (N=768, age range: 60-92) from Australia. Results: We achieved an MAE of 3.65 years on the ADNI2 & 3 and OASIS3 test set, with strong generalizability to AIBL (MAE of 3.54 years). There was a notable increase in brain age gap (BAG) across cognitive groups, with means of 0.15 years (95% CI: [-0.22, 0.51]) in CN, 2.55 years ([2.40, 2.70]) in MCI, and 6.12 years ([5.82, 6.43]) in AD. Additionally, a significant negative correlation between BAG and cognitive scores was observed, with correlation coefficients of -0.185 (p < 0.001) for MoCA and -0.231 (p < 0.001) for MMSE. Gradient-based feature attribution highlighted the ventricles and white matter structures as key regions influenced by brain aging. Conclusion: Our model effectively fused information from different views with volumetric information to achieve state-of-the-art brain age prediction accuracy, improved generalizability, and interpretability through its association with neurodegenerative disorders.
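The brain age gap and its association with cognition reduce to simple arithmetic on predicted versus chronological age. A hypothetical sketch (simulated ages and scores; the abstract does not name the correlation type, so Spearman is assumed here):

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical predicted and chronological ages plus cognitive scores.
rng = np.random.default_rng(4)
chron_age = rng.uniform(60, 92, size=300)
pred_age = chron_age + rng.normal(loc=2.0, scale=3.5, size=300)
moca = 30 - 0.2 * (pred_age - chron_age) + rng.normal(scale=2.0, size=300)

bag = pred_age - chron_age                    # brain age gap (BAG)
mae = np.mean(np.abs(pred_age - chron_age))   # mean absolute error
rho, p = spearmanr(bag, moca)                 # BAG vs. cognition association
print(f"MAE: {mae:.2f} y, mean BAG: {bag.mean():.2f} y, rho: {rho:.3f} (p={p:.3g})")
```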

Robust Radiomic Signatures of Intervertebral Disc Degeneration from MRI.

McSweeney T, Tiulpin A, Kowlagi N, Määttä J, Karppinen J, Saarakkala S

PubMed · Jun 20, 2025
A retrospective analysis. The aim of this study was to identify a robust radiomic signature from deep learning segmentations for intervertebral disc (IVD) degeneration classification. Low back pain (LBP) is the most common musculoskeletal symptom worldwide, and IVD degeneration is an important contributing factor. To improve the quantitative phenotyping of IVD degeneration from T2-weighted magnetic resonance imaging (MRI) and better understand its relationship with LBP, multiple shape and intensity features have been investigated. IVD radiomics have been less studied but could reveal sub-visual imaging characteristics of IVD degeneration. We used data from Northern Finland Birth Cohort 1966 members who underwent lumbar spine T2-weighted MRI scans at age 45-47 (n=1397). We used a deep learning model to segment the lumbar spine IVDs, extracted 737 radiomic features, and calculated the IVD height index and peak signal intensity difference. Intraclass correlation coefficients across image and mask perturbations were calculated to identify robust features. Sparse partial least squares discriminant analysis was used to train a Pfirrmann grade classification model. The radiomics model had a balanced accuracy of 76.7% (73.1-80.3%) and a Cohen's kappa of 0.70 (0.67-0.74), compared to 66.0% (62.0-69.9%) and 0.55 (0.51-0.59) for a model based on IVD height index and peak signal intensity. 2D sphericity and interquartile range emerged as radiomics-based features that were robust and highly correlated with Pfirrmann grade (Spearman's correlation coefficients of -0.72 and -0.77, respectively). Based on our findings, these radiomic signatures could serve as alternatives to the conventional indices, representing a significant advance in the automated quantitative phenotyping of IVD degeneration from standard-of-care MRI.
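Robustness filtering of the kind described (keeping features with high intraclass correlation across image and mask perturbations) can be sketched with a two-way random-effects ICC; the ICC variant, threshold, and data below are assumptions, not the study's exact protocol:

```python
import numpy as np

def icc2_1(Y):
    """Two-way random-effects ICC(2,1) for an (n_subjects, k_measurements)
    matrix, e.g. the same radiomic feature extracted under k perturbations."""
    n, k = Y.shape
    grand = Y.mean()
    ms_rows = k * np.sum((Y.mean(axis=1) - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((Y.mean(axis=0) - grand) ** 2) / (k - 1)
    ss_err = np.sum((Y - Y.mean(axis=1, keepdims=True)
                     - Y.mean(axis=0, keepdims=True) + grand) ** 2)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

# Hypothetical: one feature measured on 100 discs under 5 perturbations.
rng = np.random.default_rng(5)
subject_effect = rng.normal(size=(100, 1))
Y = subject_effect + rng.normal(scale=0.2, size=(100, 5))
print(f"ICC(2,1) = {icc2_1(Y):.3f}")  # keep features above a cutoff, e.g. 0.75
```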

Detection of breast cancer using fractional discrete sinc transform based on empirical Fourier decomposition.

Azmy MM

PubMed · Jun 20, 2025
Breast cancer is the most common cause of death among women worldwide, and its early detection is important for saving patients' lives. Ultrasound and mammography are the most common noninvasive methods for detecting breast cancer, and computer techniques are used to help physicians diagnose it. In most previous studies, the classification parameter rates were not high enough to achieve a correct diagnosis. In this study, new approaches were applied to detect breast cancer images from three databases. The programming software used to extract features from the images was MATLAB R2022a. Novel approaches were obtained using new fractional transforms, deduced from the fractional Fourier transform and novel discrete transforms; the novel discrete transforms were derived from the discrete sine and cosine transforms. The approaches proceed as follows. First, the fractional transforms were applied to the breast images. Then, the empirical Fourier decomposition (EFD) was obtained, and the mean, variance, kurtosis, and skewness were calculated. Finally, an RNN-BiLSTM (recurrent neural network with bidirectional long short-term memory) was used for the classification phase. The proposed approaches were compared to obtain the highest accuracy rate during the classification phase based on the different fractional transforms. The highest accuracy rate was obtained when the fractional discrete sinc transform of approach 4 was applied: the area under the receiver operating characteristic curve (AUC) was 1, and the accuracy, sensitivity, specificity, precision, G-mean, and F-measure rates were all 100%. With traditional machine learning methods, such as support vector machines (SVMs) and artificial neural networks (ANNs), the classification parameter rates were lower; the fourth approach, using RNN-BiLSTM, extracted the features of the breast images most effectively. This approach can be implemented on a computer to help physicians correctly classify breast images.
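The per-component statistics in the feature-extraction step are straightforward to compute; the sketch below assumes the fractional transform and EFD have already been applied (neither is reproduced here):

```python
import numpy as np
from scipy.stats import kurtosis, skew

def component_stats(components):
    """The four summary statistics the study computes per decomposition
    component: mean, variance, kurtosis, and skewness."""
    return np.array([
        [c.mean(), c.var(), kurtosis(c), skew(c)] for c in components
    ])

# Hypothetical stand-in for EFD components of a transformed breast image;
# the fractional discrete sinc transform itself is not reproduced.
rng = np.random.default_rng(6)
components = [rng.normal(size=4096) for _ in range(5)]
features = component_stats(components).ravel()  # feature vector for the classifier
print(features.shape)  # (20,)
```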

Generalizable model to predict new or progressing compression fractures in tumor-infiltrated thoracolumbar vertebrae in an all-comer population.

Flores A, Nitturi V, Kavoussi A, Feygin M, Andrade de Almeida RA, Ramirez Ferrer E, Anand A, Nouri S, Allam AK, Ricciardelli A, Reyes G, Reddy S, Rampalli I, Rhines L, Tatsui CE, North RY, Ghia A, Siewerdsen JH, Ropper AE, Alvarez-Breckenridge C

PubMed · Jun 20, 2025
Neurosurgical evaluation is required in the setting of spinal metastases at high risk of leading to a vertebral body fracture. Both irradiated and nonirradiated vertebrae are affected. Understanding fracture risk is critical in determining management, including follow-up timing and prophylactic interventions. Herein, the authors report the results of a machine learning model that predicts the development or progression of a pathological vertebral compression fracture (VCF) in metastatic tumor-infiltrated thoracolumbar vertebrae in an all-comer population. A multi-institutional all-comer cohort of patients with tumor-containing vertebral levels spanning T1 through L5 and at least 1 year of follow-up was included in the study. Clinical features of the patients, diseases, and treatments were collected. CT radiomic features of the vertebral bodies were extracted from tumor-infiltrated vertebrae that did or did not subsequently fracture or progress. Recursive feature elimination (RFE) of both radiomic and clinical features was performed. The resulting features were used to create a purely clinical model, a purely radiomic model, and a combined clinical-radiomic model. A Spine Instability Neoplastic Score (SINS) model was created for a baseline performance comparison. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity (with 95% confidence intervals) with tenfold cross-validation. Within 1 year from initial CT, 123 of 977 vertebrae developed VCF. Selected clinical features included SINS, the SINS component for < 50% vertebral body collapse, the SINS component for "none of the prior 3" (i.e., "none of the above" on the SINS component for vertebral body involvement), histology, age, and BMI. Of the 2015 radiomic features, RFE selected 19 to be used in the pure radiomic model and the combined clinical-radiomic model. The best performing model was a random forest classifier using both clinical and radiomic features, demonstrating an AUROC of 0.86 (95% CI 0.82-0.9), sensitivity of 0.78 (95% CI 0.70-0.84), and specificity of 0.80 (95% CI 0.77-0.82). This performance was significantly higher than that of the best SINS-alone model (AUROC 0.75, 95% CI 0.70-0.80) and outperformed the clinical-only model, although not in a statistically significant manner (AUROC 0.82, 95% CI 0.77-0.87). The authors developed a clinically generalizable machine learning model to predict the risk of a new or progressing VCF in an all-comer population. This model addresses limitations of prior work and was trained on the largest cohort of patients and vertebrae published to date. If validated, the model could lead to more consistent and systematic identification of high-risk vertebrae, resulting in faster, more accurate triage of patients for optimal management.
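The RFE-plus-random-forest pipeline with tenfold cross-validated AUROC might be sketched as follows; the data, estimator settings, and RFE step size are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score

# Hypothetical combined clinical + radiomic feature matrix; in the study,
# RFE reduced 2015 radiomic features to 19 before training.
rng = np.random.default_rng(7)
X = rng.normal(size=(977, 100))
y = (rng.random(977) < 0.126).astype(int)  # ~123/977 fractured vertebrae

rf = RandomForestClassifier(n_estimators=300, random_state=0)
selector = RFE(rf, n_features_to_select=19, step=0.1).fit(X, y)
X_sel = X[:, selector.support_]

# Tenfold cross-validated AUROC, mirroring the reported evaluation.
aucs = cross_val_score(rf, X_sel, y, cv=10, scoring="roc_auc")
print(f"AUROC: {aucs.mean():.2f} ± {aucs.std():.2f}")
```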