Page 10 of 90900 results

MedCAL-Bench: A Comprehensive Benchmark on Cold-Start Active Learning with Foundation Models for Medical Image Analysis

Ning Zhu, Xiaochuan Ma, Shaoting Zhang, Guotai Wang

arXiv preprint · Aug 5, 2025
Cold-Start Active Learning (CSAL) aims to select informative samples for annotation without prior knowledge, which is important for improving annotation efficiency and model performance under a limited annotation budget in medical image analysis. Most existing CSAL methods rely on Self-Supervised Learning (SSL) on the target dataset for feature extraction, which is inefficient and limited by insufficient feature representation. Recently, pre-trained Foundation Models (FMs) have shown powerful feature extraction ability with a potential for better CSAL. However, this paradigm has been rarely investigated, with a lack of benchmarks for comparison of FMs in CSAL tasks. To this end, we propose MedCAL-Bench, the first systematic FM-based CSAL benchmark for medical image analysis. We evaluate 14 FMs and 7 CSAL strategies across 7 datasets under different annotation budgets, covering classification and segmentation tasks from diverse medical modalities. It is also the first CSAL benchmark that evaluates both the feature extraction and sample selection stages. Our experimental results reveal that: 1) Most FMs are effective feature extractors for CSAL, with the DINO family performing best in segmentation; 2) The performance differences of these FMs are large in segmentation tasks, while small for classification; 3) Different sample selection strategies should be considered in CSAL on different datasets, with Active Learning by Processing Surprisal (ALPS) performing best for segmentation while RepDiv leads for classification. The code is available at https://github.com/HiLab-git/MedCAL-Bench.
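As a rough illustration of what a diversity-oriented cold-start selector does over frozen foundation-model features (this is a generic greedy k-center baseline, not the benchmark's code; the function name and shapes are mine):

```python
import numpy as np

def farthest_point_selection(features: np.ndarray, budget: int, seed: int = 0) -> list:
    """Greedy k-center (farthest-point) selection over FM feature vectors.

    Starts from one random sample, then repeatedly picks the sample farthest
    from the already-selected set: a common diversity baseline for cold-start
    active learning when no labels exist yet.
    """
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    selected = [int(rng.integers(n))]
    # distance from every sample to its nearest selected sample
    dists = np.linalg.norm(features - features[selected[0]], axis=1)
    while len(selected) < budget:
        nxt = int(np.argmax(dists))          # farthest from current set
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(features - features[nxt], axis=1))
    return selected

# toy example: 100 sixteen-dimensional "FM features", pick 8 diverse samples
feats = np.random.default_rng(1).normal(size=(100, 16))
picked = farthest_point_selection(feats, budget=8)
```

In a real pipeline the features would come from one of the benchmarked FMs; strategies like ALPS or RepDiv replace the selection rule, not the overall two-stage structure.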

STARFormer: A novel spatio-temporal aggregation reorganization transformer of FMRI for brain disorder diagnosis.

Dong W, Li Y, Zeng W, Chen L, Yan H, Siok WT, Wang N

PubMed · Aug 5, 2025
Many existing methods that use functional magnetic resonance imaging (fMRI) to classify brain disorders, such as autism spectrum disorder (ASD) and attention deficit hyperactivity disorder (ADHD), often overlook the integration of spatial and temporal dependencies of the blood oxygen level-dependent (BOLD) signals, which may lead to inaccurate or imprecise classification results. To solve this problem, we propose a spatio-temporal aggregation reorganization transformer (STARFormer) that effectively captures both spatial and temporal features of BOLD signals by incorporating three key modules. The region of interest (ROI) spatial structure analysis module uses eigenvector centrality (EC) to reorganize brain regions based on effective connectivity, highlighting critical spatial relationships relevant to the brain disorder. The temporal feature reorganization module systematically segments the time series into equal-dimensional window tokens and captures multiscale features through variable window and cross-window attention. The spatio-temporal feature fusion module employs a parallel transformer architecture with dedicated temporal and spatial branches to extract integrated features. The proposed STARFormer has been rigorously evaluated on two publicly available datasets for the classification of ASD and ADHD. The experimental results confirm that STARFormer achieves state-of-the-art performance across multiple evaluation metrics, providing a more accurate and reliable tool for the diagnosis of brain disorders and biomedical research. The official implementation codes are available at: https://github.com/NZWANG/STARFormer.
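The ROI reorganization step, ranking brain regions by eigenvector centrality of their connectivity, can be sketched with a simple power iteration (a minimal numpy-only sketch under my own function names, not the STARFormer implementation):

```python
import numpy as np

def eigenvector_centrality(adj: np.ndarray, iters: int = 200) -> np.ndarray:
    """Eigenvector centrality of a symmetric, non-negative connectivity matrix
    via power iteration. The identity shift (adj @ v + v) avoids oscillation
    on bipartite-like graphs without changing the principal eigenvector."""
    v = np.ones(adj.shape[0])
    for _ in range(iters):
        v = adj @ v + v
        v /= np.linalg.norm(v)
    return v

def reorder_rois(bold: np.ndarray, connectivity: np.ndarray):
    """Reorder the rows of a (rois, time) BOLD matrix by descending centrality."""
    order = np.argsort(-eigenvector_centrality(connectivity))
    return bold[order], order

# toy star graph: ROI 0 is connected to everything, so it is the most central
adj = np.array([[0., 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]])
bold = np.arange(8.).reshape(4, 2)
reordered, order = reorder_rois(bold, adj)
```

In practice the connectivity matrix would be built from (absolute) BOLD correlations or effective-connectivity estimates, per the paper.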

GRASPing Anatomy to Improve Pathology Segmentation

Keyi Li, Alexander Jaus, Jens Kleesiek, Rainer Stiefelhagen

arXiv preprint · Aug 5, 2025
Radiologists rely on anatomical understanding to accurately delineate pathologies, yet most current deep learning approaches use pure pattern recognition and ignore the anatomical context in which pathologies develop. To narrow this gap, we introduce GRASP (Guided Representation Alignment for the Segmentation of Pathologies), a modular plug-and-play framework that enhances pathology segmentation models by leveraging existing anatomy segmentation models through pseudo-label integration and feature alignment. Unlike previous approaches that obtain anatomical knowledge via auxiliary training, GRASP integrates into standard pathology optimization regimes without retraining anatomical components. We evaluate GRASP on two PET/CT datasets, conduct systematic ablation studies, and investigate the framework's inner workings. We find that GRASP consistently achieves top rankings across multiple evaluation metrics and diverse architectures. The framework's dual anatomy injection strategy, combining anatomical pseudo-labels as input channels with transformer-guided anatomical feature fusion, effectively incorporates anatomical context.
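One of the two injection routes named in the abstract, feeding a frozen anatomy model's pseudo-labels to the pathology segmenter as an extra input channel, is simple to sketch (a hedged illustration with invented names and shapes; the transformer-guided feature-fusion route needs model internals and is not shown):

```python
import numpy as np

def inject_anatomy_channel(image: np.ndarray, anatomy_pseudolabel: np.ndarray) -> np.ndarray:
    """image: (C, H, W) intensities; anatomy_pseudolabel: (H, W) integer organ IDs
    produced by a frozen anatomy segmentation model.
    Returns (C+1, H, W) with the label map scaled to [0, 1] and appended as a
    channel, so the pathology network sees anatomical context at its input."""
    lbl = anatomy_pseudolabel.astype(float)
    if lbl.max() > 0:
        lbl = lbl / lbl.max()
    return np.concatenate([image, lbl[None]], axis=0)

# toy example: single-channel 4x4 "CT slice" plus a fake organ label map
ct = np.zeros((1, 4, 4))
organs = np.arange(16).reshape(4, 4)
x = inject_anatomy_channel(ct, organs)
```

The appeal of this route, as the abstract notes, is that the anatomy model stays frozen: only the pathology segmenter's first layer needs to accept one more channel.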

Prediction of breast cancer HER2 status changes based on ultrasound radiomics attention network.

Liu J, Xue X, Yan Y, Song Q, Cheng Y, Wang L, Wang X, Xu D

PubMed · Aug 5, 2025
Following Neoadjuvant Chemotherapy (NAC), there is a probability that the Human Epidermal Growth Factor Receptor 2 (HER2) status will change. If these changes are not promptly addressed, timely adjustment of treatment plans can be hindered, affecting the optimal management of breast cancer. Consequently, accurate prediction of HER2 status changes holds significant clinical value, underscoring the need for a model capable of precisely forecasting these alterations. In this paper, we elucidate the intricacies surrounding HER2 status changes and propose a deep learning architecture combined with radiomics techniques, named Ultrasound Radiomics Attention Network (URAN), to predict HER2 status changes. Firstly, radiomics technology is used to extract ultrasound image features to provide rich and comprehensive medical information. Secondly, a HER2 Key Feature Selection (HKFS) network is constructed to retain crucial features relevant to HER2 status changes. Thirdly, we design a Max and Average Attention and Excitation (MAAE) network to adjust the model's focus on different key features. Finally, a fully connected neural network is utilized to predict HER2 status changes. The code to reproduce our experiments can be found at https://github.com/joanaapa/Foundation-Medical. Our research was carried out using genuine ultrasound images sourced from hospitals. On this dataset, URAN outperformed both state-of-the-art and traditional methods in predicting HER2 status changes, achieving an accuracy of 0.8679 and an AUC of 0.8328 (95% CI: 0.77-0.90). Comparative experiments on the public BUS_UCLM dataset further demonstrated URAN's superiority, attaining an accuracy of 0.9283 and an AUC of 0.9161 (95% CI: 0.91-0.92). Additionally, we undertook rigorously crafted ablation studies, which validated the logicality and effectiveness of the radiomics techniques, as well as the HKFS and MAAE modules integrated within the URAN model.
The results pertaining to specific HER2 statuses indicate that URAN exhibits superior accuracy in predicting changes in HER2 status characterized by low expression and IHC scores of 2+ or below. Furthermore, we examined the radiomics attributes of ultrasound images and discovered that various wavelet transform features significantly impacted changes in HER2 status. In summary, we have developed a URAN method for predicting HER2 status changes that combines radiomics techniques and deep learning. The URAN model has better predictive performance than competing algorithms and can mine key radiomics features related to HER2 status changes.
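The abstract does not detail the MAAE module, but its name suggests channel gating from both max- and average-pooled summaries, in the spirit of squeeze-and-excitation. A minimal sketch under that assumption (shapes, weights, and the reduction ratio are all mine, not the paper's):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def max_avg_excitation(features: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """features: (C, N) channel-wise activations; w1: (C//r, C), w2: (C, C//r).
    Computes a per-channel gate in (0, 1) from both the max- and average-pooled
    channel summaries, then rescales each channel by its gate."""
    avg = features.mean(axis=1)                      # (C,) average summary
    mx = features.max(axis=1)                        # (C,) max summary
    gate = sigmoid(w2 @ np.maximum(w1 @ avg, 0.0)    # shared bottleneck MLP
                   + w2 @ np.maximum(w1 @ mx, 0.0))
    return gate[:, None] * features

# toy example: 8 channels, 32 spatial positions, reduction ratio 4
rng = np.random.default_rng(0)
f = rng.normal(size=(8, 32))
w1 = rng.normal(size=(2, 8))
w2 = rng.normal(size=(8, 2))
out = max_avg_excitation(f, w1, w2)
```

Because the gate is a sigmoid, every channel can only be attenuated or preserved, which is what lets such a module "adjust the model's focus" on key features.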

Multi-Center 3D CNN for Parkinson's disease diagnosis and prognosis using clinical and T1-weighted MRI data.

Basaia S, Sarasso E, Sciancalepore F, Balestrino R, Musicco S, Pisano S, Stankovic I, Tomic A, Micco R, Tessitore A, Salvi M, Meiburger KM, Kostic VS, Molinari F, Agosta F, Filippi M

PubMed · Aug 5, 2025
Parkinson's disease (PD) presents challenges in early diagnosis and progression prediction. Recent advancements in machine learning, particularly convolutional-neural-networks (CNNs), show promise in enhancing diagnostic accuracy and prognostic capabilities using neuroimaging data. The aims of this study were: (i) develop a 3D-CNN based on MRI to distinguish controls and PD patients and (ii) employ CNN to predict the progression of PD. Three cohorts were selected: 86 mild, 62 moderate-to-severe PD patients, and 60 controls; 14 mild-PD patients and 14 controls from Parkinson's Progression Markers Initiative database, and 38 de novo mild-PD patients and 38 controls. All participants underwent MRI scans and clinical evaluation at baseline and over 2-years. PD subjects were classified in two clusters of different progression using k-means clustering based on baseline and follow-up UDPRS-III scores. A 3D-CNN was built and tested on PD patients and controls, with binary classifications: controls vs moderate-to-severe PD, controls vs mild-PD, and two clusters of PD progression. The effect of transfer learning was also tested. CNN effectively differentiated moderate-to-severe PD from controls (74% accuracy) using MRI data alone. Transfer learning significantly improved performance in distinguishing mild-PD from controls (64% accuracy). For predicting disease progression, the model achieved over 70% accuracy by combining MRI and clinical data. Brain regions most influential in the CNN's decisions were visualized. CNN, integrating multimodal data and transfer learning, provides encouraging results toward early-stage classification and progression monitoring in PD. Its explainability through activation maps offers potential for clinical application in early diagnosis and personalized monitoring.
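The progression-labelling step, k-means with k=2 on baseline and follow-up UPDRS-III scores, can be sketched with plain Lloyd iterations (a numpy-only illustration with my own deterministic initialisation, not the study's code):

```python
import numpy as np

def two_means(scores: np.ndarray, iters: int = 50) -> np.ndarray:
    """Lloyd's k-means with k=2 on an (n, 2) array of
    [baseline, follow-up] UPDRS-III scores; returns a 0/1 cluster label per patient.
    Centroids start at the rows with the lowest/highest total score,
    which keeps the toy example deterministic."""
    c = np.stack([scores[np.argmin(scores.sum(1))],
                  scores[np.argmax(scores.sum(1))]])
    for _ in range(iters):
        d = np.linalg.norm(scores[:, None, :] - c[None], axis=2)  # (n, 2) distances
        labels = d.argmin(1)
        for k in (0, 1):
            if np.any(labels == k):
                c[k] = scores[labels == k].mean(0)                # update centroids
    return labels

# toy cohort: three slow progressors, three fast progressors
scores = np.array([[20., 21.], [22., 23.], [21., 22.],
                   [24., 44.], [26., 46.], [25., 45.]])
labels = two_means(scores)
```

The resulting binary labels are exactly the kind of target the 3D-CNN is then trained to predict from baseline MRI and clinical data.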

Development and Validation of an Explainable MRI-Based Habitat Radiomics Model for Predicting p53-Abnormal Endometrial Cancer: A Multicentre Feasibility Study.

Jin W, Zhang H, Ning Y, Chen X, Zhang G, Li H, Zhang H

PubMed · Aug 4, 2025
We developed an MRI-based habitat radiomics model (HRM) to predict p53-abnormal (p53abn) molecular subtypes of endometrial cancer (EC). Patients with pathologically confirmed EC were retrospectively enrolled from three hospitals and categorized into a training cohort (n = 270), test cohort 1 (n = 70), and test cohort 2 (n = 154). The tumour was divided into habitat sub-regions using diffusion-weighted imaging (DWI) and contrast-enhanced (CE) images with the K-means algorithm. Radiomics features were extracted from T1-weighted imaging (T1WI), T2-weighted imaging (T2WI), DWI, and CE images. Three machine learning classifiers, logistic regression, support vector machines, and random forests, were applied to develop predictive models for p53abn EC. Model performance was validated using receiver operating characteristic (ROC) curves, and the model with the best predictive performance was selected as the HRM. A whole-region radiomics model (WRM) was also constructed, and a clinical model (CM) with five clinical features was developed. The SHapley Additive exPlanations (SHAP) method was used to explain the outputs of the models. DeLong's test was used to compare performance across the cohorts. A total of 1920 habitat radiomics features were considered. Eight features were selected for the HRM, ten for the WRM, and three clinical features for the CM. The HRM achieved the highest AUCs: 0.855 (training), 0.769 (test1), and 0.766 (test2). The AUCs of the WRM were 0.707 (training), 0.703 (test1), and 0.738 (test2). The AUCs of the CM were 0.709 (training), 0.641 (test1), and 0.665 (test2). The MRI-based HRM successfully predicted p53abn EC. The results indicate that habitat analysis combined with machine learning, radiomics, and SHAP can effectively predict p53abn EC, providing clinicians with intuitive insights and interpretability regarding the impact of risk factors in the model.
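The habitat step, clustering tumour voxels into sub-regions from paired DWI/CE intensities with K-means, can be sketched as follows (a minimal numpy-only stand-in; the study presumably clusters registered 3D volumes with a tuned k, whereas the shapes and k here are illustrative):

```python
import numpy as np

def habitat_map(dwi: np.ndarray, ce: np.ndarray, k: int = 3,
                iters: int = 30, seed: int = 0) -> np.ndarray:
    """Cluster voxels into k 'habitat' sub-regions from their paired
    [DWI, CE] intensities, returning an integer label map of the same shape."""
    x = np.stack([dwi.ravel(), ce.ravel()], axis=1).astype(float)  # (n_voxels, 2)
    rng = np.random.default_rng(seed)
    c = x[rng.choice(len(x), size=k, replace=False)]               # initial centroids
    for _ in range(iters):
        labels = np.linalg.norm(x[:, None] - c[None], axis=2).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                c[j] = x[labels == j].mean(0)
    return labels.reshape(dwi.shape)

# toy 8x8 "tumour slice" in both sequences
rng = np.random.default_rng(0)
dwi = rng.random((8, 8))
ce = rng.random((8, 8))
habitats = habitat_map(dwi, ce, k=3)
```

Radiomics features are then extracted per habitat label rather than over the whole tumour, which is what distinguishes the HRM from the WRM described above.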

Natural language processing evaluation of trends in cervical cancer incidence in radiology reports: A ten-year survey.

López-Úbeda P, Martín-Noguerol T, Luna A

PubMed · Aug 4, 2025
Cervical cancer, commonly associated with human papillomavirus (HPV) infection, remains the fourth most common cancer in women globally. This study aims to develop and evaluate a Natural Language Processing (NLP) system to identify and analyze cervical cancer incidence trends from 2013 to 2023 at our institution, focusing on age-specific variations and evaluating the possible impact of HPV vaccination. In this retrospective cohort study, we analyzed unstructured radiology reports collected between 2013 and 2023, comprising 433,207 studies involving 250,181 women who underwent CT, MRI, or ultrasound scans of the abdominopelvic region. A rule-based NLP system was developed to extract references to cervical cancer from these reports and validated against a set of 200 manually annotated cases reviewed by an experienced radiologist. The NLP system demonstrated excellent performance, achieving an accuracy of over 99.5%. This high reliability enabled its application in a large-scale population study. Results show that women under 30 maintain a consistently low cervical cancer incidence, likely reflecting the early impact of HPV vaccination. The 30-40 cohort declined until 2020, followed by a slight increase, while the 40-60 group exhibited an overall downward trend with fluctuations, suggesting long-term vaccine effects. Incidence in patients over 60 also declined, though with greater variability, possibly due to other risk factors. The developed NLP system effectively identified cervical cancer cases from unstructured radiology reports, facilitating an accurate analysis of the impact of HPV vaccination on cervical cancer prevalence and imaging study requirements. This approach demonstrates the potential of AI and NLP tools in enhancing data accuracy and efficiency in medical epidemiology research.
NLP-based approaches can significantly improve the collection and analysis of epidemiological data on cervical cancer, supporting the development of more targeted and personalized prevention strategies, particularly in populations with heterogeneous HPV vaccination coverage.
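A rule-based extractor of this kind typically boils down to keyword patterns plus a negation check per sentence. A deliberately tiny sketch of the idea (the patterns and negation cues are my own illustrative choices, far simpler than a system validated to 99.5% accuracy):

```python
import re

# phrases that count as a cervical cancer mention (illustrative, not exhaustive)
CANCER_PAT = re.compile(
    r"\b(cervical (carcinoma|cancer|neoplasm)|carcinoma of the cervix)\b", re.I)
# negation cues that, appearing before the mention in the same sentence, veto it
NEG_PAT = re.compile(
    r"\b(no evidence of|without|negative for|rule out)\b[^.]*$", re.I)

def mentions_cervical_cancer(report: str) -> bool:
    """Flag a report if any sentence mentions cervical cancer
    without a preceding negation cue in that sentence."""
    for sentence in report.split("."):
        m = CANCER_PAT.search(sentence)
        if m and not NEG_PAT.search(sentence[:m.start()]):
            return True
    return False
```

Real systems add many more patterns (synonyms, abbreviations, history vs. active disease) and more careful sentence splitting, but the validate-against-manual-annotation workflow described in the abstract applies to either.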

Do Edges Matter? Investigating Edge-Enhanced Pre-Training for Medical Image Segmentation

Paul Zaha, Lars Böcking, Simeon Allmendinger, Leopold Müller, Niklas Kühl

arXiv preprint · Aug 4, 2025
Medical image segmentation is crucial for disease diagnosis and treatment planning, yet developing robust segmentation models often requires substantial computational resources and large datasets. Existing research shows that pre-trained and finetuned foundation models can boost segmentation performance. However, questions remain about how particular image preprocessing steps may influence segmentation performance across different medical imaging modalities. In particular, edges, abrupt transitions in pixel intensity, are widely acknowledged as vital cues for object boundaries but have not been systematically examined in the pre-training of foundation models. We address this gap by investigating to what extent pre-training with data processed using computationally efficient edge kernels, such as the Kirsch operator, can improve cross-modality segmentation capabilities of a foundation model. Two versions of a foundation model are first trained on either raw or edge-enhanced data across multiple medical imaging modalities, then finetuned on selected raw subsets tailored to specific medical modalities. After systematic investigation using the medical domains Dermoscopy, Fundus, Mammography, Microscopy, OCT, US, and X-ray, we discover both increased and reduced segmentation performance across modalities using edge-focused pre-training, indicating the need for a selective application of this approach. To guide such selective applications, we propose a meta-learning strategy. It uses the standard deviation and image entropy of the raw image to choose between a model pre-trained on edge-enhanced or on raw data for optimal performance. Our experiments show that integrating this meta-learning layer yields an overall segmentation performance improvement across diverse medical imaging tasks of 16.42% compared to models pre-trained on edge-enhanced data only and 19.30% compared to models pre-trained on raw data only.
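Both ingredients named in the abstract are easy to sketch: the Kirsch operator takes the maximum response over eight directional 3x3 masks, and the routing rule thresholds the raw image's standard deviation and histogram entropy. A numpy-only illustration (the thresholds and bin count in `choose_model` are invented placeholders, not the paper's learned meta-learner):

```python
import numpy as np

# ring of border positions, clockwise; rotating the value ring one step
# produces the next of the 8 directional Kirsch masks
_RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
_VALS = np.array([5, 5, 5, -3, -3, -3, -3, -3])

def kirsch_kernels():
    kernels = []
    for s in range(8):
        k = np.zeros((3, 3))                        # centre weight stays 0
        for (i, j), v in zip(_RING, np.roll(_VALS, s)):
            k[i, j] = v
        kernels.append(k)
    return kernels

def kirsch_edges(img: np.ndarray) -> np.ndarray:
    """Edge magnitude as the max correlation response over the 8 Kirsch masks
    (valid region only, so the output is 2 pixels smaller per dimension)."""
    h, w = img.shape
    out = np.full((h - 2, w - 2), -np.inf)
    for k in kirsch_kernels():
        resp = sum(k[i, j] * img[i:i + h - 2, j:j + w - 2]
                   for i in range(3) for j in range(3))
        out = np.maximum(out, resp)
    return out

def choose_model(img: np.ndarray, std_thr: float = 0.2, ent_thr: float = 3.0) -> str:
    """Toy stand-in for the meta-learning layer: route an image to the
    edge-pretrained model only when its std and histogram entropy are both high."""
    hist, _ = np.histogram(img, bins=32)
    p = hist[hist > 0] / hist.sum()
    entropy = -(p * np.log2(p)).sum()
    return "edge" if img.std() > std_thr and entropy > ent_thr else "raw"

# toy image with a vertical intensity step
step = np.zeros((6, 6))
step[:, 3:] = 1.0
edges = kirsch_edges(step)
```

The actual meta-learner is trained to make this choice; the point of the sketch is only that its inputs (std, entropy) are cheap per-image statistics.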

Enhanced detection of ovarian cancer using AI-optimized 3D CNNs for PET/CT scan analysis.

Sadeghi MH, Sina S, Faghihi R, Alavi M, Giammarile F, Omidi H

PubMed · Aug 4, 2025
This study investigates how deep learning (DL) can enhance ovarian cancer diagnosis and staging using large imaging datasets. Specifically, we compare six conventional convolutional neural network (CNN) architectures (ResNet, DenseNet, GoogLeNet, U-Net, VGG, and AlexNet) with OCDA-Net, an enhanced model designed for [<sup>18</sup>F]FDG PET image analysis. OCDA-Net, an advancement on the ResNet architecture, was thoroughly evaluated using a random split into training (80%), validation (10%), and test (10%) sets. Trained over 100 epochs, OCDA-Net achieved superior diagnostic classification with an accuracy of 92%, and staging results of 94%, supported by robust precision, recall, and F-measure metrics. Grad-CAM++ heatmaps confirmed that the network attends to hyper-metabolic lesions, supporting clinical interpretability. Our findings show that OCDA-Net outperforms existing CNN models and has strong potential to transform ovarian cancer diagnosis and staging. The study suggests that implementing these DL models in clinical practice could ultimately improve patient prognoses. Future research should expand datasets, enhance model interpretability, and validate these models in clinical settings.

AI-Driven Integration of Deep Learning with Lung Imaging, Functional Analysis, and Blood Gas Metrics for Perioperative Hypoxemia Prediction: Progress and Perspectives.

Huang K, Wu C, Fang J, Pi R

PubMed · Aug 4, 2025
This Perspective article explores the transformative role of artificial intelligence (AI) in predicting perioperative hypoxemia through the integration of deep learning (DL) with multimodal clinical data, including lung imaging, pulmonary function tests (PFTs), and arterial blood gas (ABG) analysis. Perioperative hypoxemia, defined as arterial oxygen partial pressure (PaO₂) <60 mmHg or oxygen saturation (SpO₂) <90%, poses significant risks of delayed recovery and organ dysfunction. Traditional diagnostic methods, such as radiological imaging and ABG analysis, often lack integrated predictive accuracy. AI frameworks, particularly convolutional neural networks (CNNs) and hybrid models like TD-CNNLSTM-LungNet, demonstrate exceptional performance in detecting pulmonary inflammation and stratifying hypoxemia risk, achieving up to 96.57% accuracy in pneumonia subtype differentiation and an AUC of 0.96 for postoperative hypoxemia prediction. Multimodal AI systems, such as DeepLung-Predict, unify CT scans, PFTs, and ABG parameters to enhance predictive precision, surpassing conventional methods by 22%. However, challenges persist, including dataset heterogeneity, model interpretability, and clinical workflow integration. Future directions emphasize multicenter validation, explainable AI (XAI) frameworks, and pragmatic trials to ensure equitable and reliable deployment. This AI-driven approach not only optimizes resource allocation but also mitigates financial burdens on healthcare systems by enabling early interventions and reducing ICU admission risks.
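The hypoxemia definition quoted in the article is a simple threshold rule, which can be written directly (the function name and signature are mine; the thresholds come from the abstract):

```python
def is_hypoxemic(pao2_mmhg: float, spo2_pct: float) -> bool:
    """Perioperative hypoxemia per the thresholds cited above:
    arterial oxygen partial pressure PaO2 < 60 mmHg or SpO2 < 90%."""
    return pao2_mmhg < 60.0 or spo2_pct < 90.0
```

The AI systems the article surveys aim to predict when a patient will cross these thresholds before it happens, rather than merely detect it from ABG values afterwards.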