Latest Papers on Radiology AI. Tags: Benchmark SOTA

ClinicalFMamba: Advancing Clinical Assessment using Mamba-based Multimodal Neuroimaging Fusion

Meng Zhou, Farzad Khalvati

•preprint•Aug 5 2025

Multimodal medical image fusion integrates complementary information from different imaging modalities to enhance diagnostic accuracy and treatment planning. While deep learning methods have advanced performance, existing approaches face critical limitations: Convolutional Neural Networks (CNNs) excel at local feature extraction but struggle to model global context effectively, while Transformers achieve superior long-range modeling at the cost of quadratic computational complexity, limiting clinical deployment. Recent State Space Models (SSMs) offer a promising alternative, enabling efficient long-range dependency modeling in linear time through selective scan mechanisms. Despite these advances, the extension to 3D volumetric data and the clinical validation of fused images remain underexplored. In this work, we propose ClinicalFMamba, a novel end-to-end CNN-Mamba hybrid architecture that synergistically combines local and global feature modeling for 2D and 3D images. We further design a tri-plane scanning strategy for effectively learning volumetric dependencies in 3D images. Comprehensive evaluations on three datasets demonstrate superior fusion performance across multiple quantitative metrics while achieving real-time fusion. We further validate the clinical utility of our approach on downstream 2D/3D brain tumor classification tasks, achieving superior performance over baseline methods. Our method establishes a new paradigm for efficient multimodal medical image fusion suitable for real-time clinical deployment.

Mixed Modality Image Synthesis Neurological Methodology In Silico Benchmark SOTA
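
As a rough illustration of the tri-plane scanning idea mentioned in the ClinicalFMamba abstract, the sketch below flattens a 3D volume into three 1D sequences, one per plane ordering, which a sequence model could then scan. The function name `tri_plane_sequences` and the exact voxel traversal order are assumptions made for illustration; the abstract does not say how the authors order voxels or merge the three scans.

```python
from typing import Dict, List

Volume = List[List[List[float]]]  # indexed as volume[d][h][w]


def tri_plane_sequences(volume: Volume) -> Dict[str, List[float]]:
    """Flatten a D x H x W volume into three sequences, one per plane ordering."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    # Axial: walk slices along depth, scanning each slice row by row.
    axial = [volume[d][h][w]
             for d in range(depth) for h in range(height) for w in range(width)]
    # Coronal: walk slices along height, scanning each slice over (depth, width).
    coronal = [volume[d][h][w]
               for h in range(height) for d in range(depth) for w in range(width)]
    # Sagittal: walk slices along width, scanning each slice over (depth, height).
    sagittal = [volume[d][h][w]
                for w in range(width) for d in range(depth) for h in range(height)]
    return {"axial": axial, "coronal": coronal, "sagittal": sagittal}


# Toy 2 x 2 x 2 volume whose voxel value encodes its position (4*d + 2*h + w).
volume = [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]
seqs = tri_plane_sequences(volume)
print(seqs["coronal"])   # [0, 1, 4, 5, 2, 3, 6, 7]
print(seqs["sagittal"])  # [0, 2, 4, 6, 1, 3, 5, 7]
```

Each sequence contains the same voxels in a different order, so neighbours along any one axis end up adjacent in at least one of the three scans.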

GRASPing Anatomy to Improve Pathology Segmentation

Keyi Li, Alexander Jaus, Jens Kleesiek, Rainer Stiefelhagen

•preprint•Aug 5 2025

Radiologists rely on anatomical understanding to accurately delineate pathologies, yet most current deep learning approaches use pure pattern recognition and ignore the anatomical context in which pathologies develop. To narrow this gap, we introduce GRASP (Guided Representation Alignment for the Segmentation of Pathologies), a modular plug-and-play framework that enhances pathology segmentation models by leveraging existing anatomy segmentation models through pseudolabel integration and feature alignment. Unlike previous approaches that obtain anatomical knowledge via auxiliary training, GRASP integrates into standard pathology optimization regimes without retraining anatomical components. We evaluate GRASP on two PET/CT datasets, conduct systematic ablation studies, and investigate the framework's inner workings. We find that GRASP consistently achieves top rankings across multiple evaluation metrics and diverse architectures. The framework's dual anatomy injection strategy, combining anatomical pseudo-labels as input channels with transformer-guided anatomical feature fusion, effectively incorporates anatomical context.

Mixed Modality Segmentation Methodology In Silico Academic Lab Benchmark SOTA
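
The GRASP abstract describes feeding anatomical pseudo-labels to the pathology model as extra input channels. A minimal sketch of that input-side step follows, assuming a 2D slice and an integer anatomy label map produced by some existing anatomy segmentation model. The helper names `one_hot_channels` and `stack_inputs` are hypothetical, and the transformer-guided feature fusion half of the framework is not shown.

```python
from typing import List

Image2D = List[List[float]]


def one_hot_channels(label_map: List[List[int]], num_classes: int) -> List[Image2D]:
    """Turn an integer anatomy label map into one binary channel per class."""
    return [
        [[1.0 if label == c else 0.0 for label in row] for row in label_map]
        for c in range(num_classes)
    ]


def stack_inputs(image_channels: List[Image2D],
                 label_map: List[List[int]],
                 num_classes: int) -> List[Image2D]:
    """Concatenate image channels and pseudo-label channels along the channel axis."""
    return image_channels + one_hot_channels(label_map, num_classes)


# Toy 2 x 2 PET and CT slices plus a 3-class anatomy pseudo-label map (all values made up).
pet = [[0.1, 0.9], [0.2, 0.3]]
ct = [[-50.0, 40.0], [30.0, 35.0]]
anatomy = [[0, 1], [2, 1]]

stacked = stack_inputs([pet, ct], anatomy, num_classes=3)
print(len(stacked))  # 5 channels: PET, CT, and one per anatomy class
```

Because the anatomy model is only used to produce the label map, this step needs no retraining of the anatomical component, which matches the plug-and-play framing in the abstract.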

Integration of Spatiotemporal Dynamics and Structural Connectivity for Automated Epileptogenic Zone Localization in Temporal Lobe Epilepsy.

Xiao L, Zheng Q, Li S, Wei Y, Si W, Pan Y

•papers•Aug 5 2025

Accurate localization of the epileptogenic zone (EZ) is essential for surgical success in temporal lobe epilepsy. While stereoelectroencephalography (SEEG) and structural magnetic resonance imaging (MRI) provide complementary insights, existing unimodal methods fail to fully capture epileptogenic brain activity, and multimodal fusion remains challenging due to data complexity and surgeon-dependent interpretations. To address these issues, we proposed a novel multimodal framework that improves EZ localization by combining SEEG-derived electrophysiology with structural connectivity in temporal lobe epilepsy. By retrospectively analyzing SEEG, post-implant Computed Tomography (CT) and MRI (T1 & Diffusion Tensor Imaging (DTI)) data from 15 patients, we reconstructed SEEG electrode positions and obtained the SEEG and structural connectivity fusion features. We then proposed a spatiotemporal co-attention deep neural network (ST-CANet) to identify the fusion features, categorizing electrodes into seizure onset zone (SOZ), propagation zone (PZ), and non-involved zone (NIZ). Anatomical EZ boundaries were delineated by fusing the electrode position and classification information on a brain atlas. The proposed method was evaluated based on the identification and localization performance of three epilepsy-related zones. The experimental results demonstrate that our method achieves 98.08% average accuracy and outperforms other identification methods, and improves the localization with Dice similarity coefficients (DSC) of 95.65% (SOZ), 92.13% (PZ), and 99.61% (NIZ), aligning with clinically validated surgical resection areas. This multimodal fusion strategy based on electrophysiological and structural connectivity information promises to assist neurosurgeons in accurately localizing EZ and may find broader applications in preoperative planning for epilepsy surgeries.

Mixed Modality Classification Neurological Retrospective Clinical In Silico Academic Lab Benchmark SOTA
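
The localization results above are reported as Dice similarity coefficients. For readers unfamiliar with the metric, here is a small sketch of the standard definition, DSC = 2|A ∩ B| / (|A| + |B|), applied to two sets of region or voxel indices. The example indices are invented and are not data from the study.

```python
from typing import Set


def dice_coefficient(predicted: Set[int], reference: Set[int]) -> float:
    """Dice similarity coefficient between two sets of region or voxel indices."""
    if not predicted and not reference:
        return 1.0  # Two empty masks are treated as a perfect match.
    overlap = len(predicted & reference)
    return 2 * overlap / (len(predicted) + len(reference))


# Hypothetical zone: 4 predicted indices, 4 reference indices, 3 shared.
predicted_zone = {1, 2, 3, 4}
reference_zone = {2, 3, 4, 5}
print(dice_coefficient(predicted_zone, reference_zone))  # 0.75
```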

Prediction of breast cancer HER2 status changes based on ultrasound radiomics attention network.

Liu J, Xue X, Yan Y, Song Q, Cheng Y, Wang L, Wang X, Xu D

•papers•Aug 5 2025

Following Neoadjuvant Chemotherapy (NAC), the Human Epidermal Growth Factor Receptor 2 (HER2) status may change. If these changes are not promptly addressed, the timely adjustment of treatment plans could be hindered, thereby affecting the optimal management of breast cancer. Consequently, the accurate prediction of HER2 status changes holds significant clinical value, underscoring the need for a model capable of precisely forecasting these alterations. In this paper, we elucidate the intricacies surrounding HER2 status changes, and propose a deep learning architecture combined with radiomics techniques, named Ultrasound Radiomics Attention Network (URAN), to predict HER2 status changes. Firstly, radiomics technology is used to extract ultrasound image features to provide rich and comprehensive medical information. Secondly, a HER2 Key Feature Selection (HKFS) network is constructed to retain crucial features relevant to HER2 status change. Thirdly, we design a Max and Average Attention and Excitation (MAAE) network to adjust the model's focus on different key features. Finally, a fully connected neural network is utilized to predict HER2 status changes. The code to reproduce our experiments can be found at https://github.com/joanaapa/Foundation-Medical. Our research was carried out using genuine ultrasound images sourced from hospitals. On this dataset, URAN outperformed both state-of-the-art and traditional methods in predicting HER2 status changes, achieving an accuracy of 0.8679 and an AUC of 0.8328 (95% CI: 0.77-0.90). Comparative experiments on the public BUS_UCLM dataset further demonstrated URAN's superiority, attaining an accuracy of 0.9283 and an AUC of 0.9161 (95% CI: 0.91-0.92). Additionally, we undertook rigorously crafted ablation studies, which validated the logicality and effectiveness of the radiomics techniques, as well as the HKFS and MAAE modules integrated within the URAN model. The results pertaining to specific HER2 statuses indicate that URAN exhibits superior accuracy in predicting changes in HER2 status characterized by low expression and IHC scores of 2+ or below. Furthermore, we examined the radiomics attributes of ultrasound images and discovered that various wavelet transform features significantly impacted the changes in HER2 status. We have developed URAN, a method for predicting HER2 status changes that combines radiomics techniques and deep learning. The URAN model has better predictive performance than competing algorithms and can mine key radiomics features related to HER2 status changes.

Ultrasound Classification Breast Retrospective Clinical In Silico Academic Lab Open Code Benchmark SOTA

MedCAL-Bench: A Comprehensive Benchmark on Cold-Start Active Learning with Foundation Models for Medical Image Analysis

Ning Zhu, Xiaochuan Ma, Shaoting Zhang, Guotai Wang

•preprint•Aug 5 2025

Cold-Start Active Learning (CSAL) aims to select informative samples for annotation without prior knowledge, which is important for improving annotation efficiency and model performance under a limited annotation budget in medical image analysis. Most existing CSAL methods rely on Self-Supervised Learning (SSL) on the target dataset for feature extraction, which is inefficient and limited by insufficient feature representation. Recently, pre-trained Foundation Models (FMs) have shown powerful feature extraction ability with a potential for better CSAL. However, this paradigm has rarely been investigated, and benchmarks for comparing FMs in CSAL tasks are lacking. To this end, we propose MedCAL-Bench, the first systematic FM-based CSAL benchmark for medical image analysis. We evaluate 14 FMs and 7 CSAL strategies across 7 datasets under different annotation budgets, covering classification and segmentation tasks from diverse medical modalities. It is also the first CSAL benchmark that evaluates both the feature extraction and sample selection stages. Our experimental results reveal that: 1) Most FMs are effective feature extractors for CSAL, with the DINO family performing the best in segmentation; 2) The performance differences of these FMs are large in segmentation tasks, while small for classification; 3) Different sample selection strategies should be considered in CSAL on different datasets, with Active Learning by Processing Surprisal (ALPS) performing the best in segmentation while RepDiv leading for classification. The code is available at https://github.com/HiLab-git/MedCAL-Bench.

Mixed Modality Segmentation Dataset Release In Silico Academic Lab Open Code Benchmark SOTA
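
The two-stage setup described in the MedCAL-Bench abstract (feature extraction, then sample selection) can be illustrated with a generic diversity-based selector. The sketch below uses k-center greedy (farthest-point) selection over feature vectors that are assumed to come from a frozen foundation model. This strategy is chosen only as a well-known example and is not claimed to be one of the seven strategies the benchmark evaluates; the feature values are invented.

```python
import math
from typing import List, Sequence


def k_center_greedy(features: Sequence[Sequence[float]],
                    budget: int,
                    first: int = 0) -> List[int]:
    """Pick up to `budget` indices, each new pick being farthest from those already chosen."""
    if budget <= 0 or not features:
        return []
    selected = [first]
    # Distance from every sample to its nearest selected sample so far.
    min_dist = [math.dist(f, features[first]) for f in features]
    while len(selected) < min(budget, len(features)):
        nxt = max(range(len(features)), key=lambda i: min_dist[i])
        selected.append(nxt)
        min_dist = [min(d, math.dist(f, features[nxt]))
                    for d, f in zip(min_dist, features)]
    return selected


# Toy 2D "embeddings": two tight pairs and one outlier.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
to_annotate = k_center_greedy(features, budget=3)
print(to_annotate)  # [0, 4, 2]
```

With a budget of three, the selector covers each of the three clusters once instead of spending two labels on near-duplicates, which is the intuition behind diversity-driven cold-start selection.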

S-RRG-Bench: Structured Radiology Report Generation with Fine-Grained Evaluation Framework

Yingshu Li, Yunyi Liu, Zhanyu Wang, Xinyu Liang, Lingqiao Liu, Lei Wang, Luping Zhou

•preprint•Aug 4 2025

Radiology report generation (RRG) for diagnostic images, such as chest X-rays, plays a pivotal role in both clinical practice and AI. Traditional free-text reports suffer from redundancy and inconsistent language, complicating the extraction of critical clinical details. Structured radiology report generation (S-RRG) offers a promising solution by organizing information into standardized, concise formats. However, existing approaches often rely on classification or visual question answering (VQA) pipelines that require predefined label sets and produce only fragmented outputs. Template-based approaches, which generate reports by replacing keywords within fixed sentence patterns, further compromise expressiveness and often omit clinically important details. In this work, we present a novel approach to S-RRG that includes dataset construction, model training, and the introduction of a new evaluation framework. We first create a robust chest X-ray dataset (MIMIC-STRUC) that includes disease names, severity levels, probabilities, and anatomical locations, ensuring that the dataset is both clinically relevant and well-structured. We train an LLM-based model to generate standardized, high-quality reports. To assess the generated reports, we propose a specialized evaluation metric (S-Score) that not only measures disease prediction accuracy but also evaluates the precision of disease-specific details, thus offering a clinically meaningful metric for report quality that focuses on elements critical to clinical decision-making and demonstrates a stronger alignment with human assessments. Our approach highlights the effectiveness of structured reports and the importance of a tailored evaluation metric for S-RRG, providing a more clinically relevant measure of report quality.

X-Ray LLM Radiology Report Chest Methodology In Silico Open Dataset Benchmark SOTA GenAI
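
To make the idea of a structured report with disease-specific details more concrete, the sketch below models a finding as a small record and scores how many detail fields agree for diseases present in both a generated and a reference report. This is not the S-Score defined in the paper, whose formula the abstract does not give. The `Finding` fields, the `detail_precision` function, and the example findings are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Finding:
    disease: str
    severity: str
    location: str


def detail_precision(predicted: List[Finding], reference: List[Finding]) -> float:
    """For diseases in both reports, the fraction of severity/location fields that match."""
    ref_by_disease = {f.disease: f for f in reference}
    matched = [(p, ref_by_disease[p.disease])
               for p in predicted if p.disease in ref_by_disease]
    if not matched:
        return 0.0
    correct = sum((p.severity == r.severity) + (p.location == r.location)
                  for p, r in matched)
    return correct / (2 * len(matched))


reference = [
    Finding("pleural effusion", "moderate", "left lower lobe"),
    Finding("cardiomegaly", "mild", "heart"),
]
predicted = [
    Finding("pleural effusion", "mild", "left lower lobe"),   # right disease, wrong severity
    Finding("pneumothorax", "small", "right apex"),           # not in the reference
]
print(detail_precision(predicted, reference))  # 0.5
```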

AI-Driven Integration of Deep Learning with Lung Imaging, Functional Analysis, and Blood Gas Metrics for Perioperative Hypoxemia Prediction: Progress and Perspectives.

Huang K, Wu C, Fang J, Pi R

•papers•Aug 4 2025

This Perspective article explores the transformative role of artificial intelligence (AI) in predicting perioperative hypoxemia through the integration of deep learning (DL) with multimodal clinical data, including lung imaging, pulmonary function tests (PFTs), and arterial blood gas (ABG) analysis. Perioperative hypoxemia, defined as arterial oxygen partial pressure (PaO₂) <60 mmHg or oxygen saturation (SpO₂) <90%, poses significant risks of delayed recovery and organ dysfunction. Traditional diagnostic methods, such as radiological imaging and ABG analysis, often lack integrated predictive accuracy. AI frameworks, particularly convolutional neural networks (CNNs) and hybrid models like TD-CNNLSTM-LungNet, demonstrate exceptional performance in detecting pulmonary inflammation and stratifying hypoxemia risk, achieving up to 96.57% accuracy in pneumonia subtype differentiation and an AUC of 0.96 for postoperative hypoxemia prediction. Multimodal AI systems, such as DeepLung-Predict, unify CT scans, PFTs, and ABG parameters to enhance predictive precision, surpassing conventional methods by 22%. However, challenges persist, including dataset heterogeneity, model interpretability, and clinical workflow integration. Future directions emphasize multicenter validation, explainable AI (XAI) frameworks, and pragmatic trials to ensure equitable and reliable deployment. This AI-driven approach not only optimizes resource allocation but also mitigates financial burdens on healthcare systems by enabling early interventions and reducing ICU admission risks.

CT Classification Chest Review Concept Academic Lab Benchmark SOTA Ethics
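
The hypoxemia definition quoted in the abstract (PaO₂ <60 mmHg or SpO₂ <90%) can be written as a simple labeling rule, for example to derive ground-truth labels for a prediction model. The sketch below encodes only that threshold definition. The function name and the handling of missing measurements are assumptions, and this is not a predictive model.

```python
from typing import Optional


def is_hypoxemic(pao2_mmhg: Optional[float] = None,
                 spo2_percent: Optional[float] = None) -> bool:
    """Apply the definition PaO2 < 60 mmHg or SpO2 < 90% to the available measurements."""
    if pao2_mmhg is None and spo2_percent is None:
        raise ValueError("at least one of PaO2 or SpO2 is required")
    low_pao2 = pao2_mmhg is not None and pao2_mmhg < 60
    low_spo2 = spo2_percent is not None and spo2_percent < 90
    return low_pao2 or low_spo2


print(is_hypoxemic(pao2_mmhg=58))                   # True
print(is_hypoxemic(pao2_mmhg=75, spo2_percent=95))  # False
print(is_hypoxemic(spo2_percent=88))                # True
```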

Do Edges Matter? Investigating Edge-Enhanced Pre-Training for Medical Image Segmentation

Paul Zaha, Lars Böcking, Simeon Allmendinger, Leopold Müller, Niklas Kühl

•preprint•Aug 4 2025

Medical image segmentation is crucial for disease diagnosis and treatment planning, yet developing robust segmentation models often requires substantial computational resources and large datasets. Existing research shows that pre-trained and finetuned foundation models can boost segmentation performance. However, questions remain about how particular image preprocessing steps may influence segmentation performance across different medical imaging modalities. In particular, edges (abrupt transitions in pixel intensity) are widely acknowledged as vital cues for object boundaries but have not been systematically examined in the pre-training of foundation models. We address this gap by investigating to what extent pre-training with data processed using computationally efficient edge kernels, such as Kirsch, can improve cross-modality segmentation capabilities of a foundation model. Two versions of a foundation model are first trained on either raw or edge-enhanced data across multiple medical imaging modalities, then finetuned on selected raw subsets tailored to specific medical modalities. After systematic investigation using the medical domains Dermoscopy, Fundus, Mammography, Microscopy, OCT, US, and XRay, we discover both increased and reduced segmentation performance across modalities using edge-focused pre-training, indicating the need for a selective application of this approach. To guide such selective applications, we propose a meta-learning strategy. It uses standard deviation and image entropy of the raw image to choose between a model pre-trained on edge-enhanced or on raw data for optimal performance. Our experiments show that integrating this meta-learning layer yields an overall segmentation performance improvement across diverse medical imaging tasks by 16.42% compared to models pre-trained on edge-enhanced data only and 19.30% compared to models pre-trained on raw data only.

Mixed Modality Segmentation Methodology In Silico Academic Lab Benchmark SOTA
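
The edge-enhancement and image statistics named in this abstract are easy to sketch. The code below applies the eight Kirsch compass kernels to a grayscale image and computes the two features the abstract says the meta-learning step uses: standard deviation and image entropy. It is a plain-Python illustration under stated assumptions: border pixels are left at zero, entropy is Shannon entropy over the observed intensity values, and the rule that maps these two features to a model choice is not shown because the abstract does not specify it.

```python
import math
from collections import Counter
from statistics import pstdev
from typing import List

Image = List[List[int]]

# Border positions of a 3x3 window, clockwise from the top-left corner.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
# Ring weights of the "north" Kirsch kernel; the centre weight is 0.
BASE_WEIGHTS = [5, 5, 5, -3, -3, -3, -3, -3]


def kirsch_edges(image: Image) -> Image:
    """Maximum response over the 8 rotated Kirsch kernels (interior pixels only)."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            ring = [image[y - 1 + dy][x - 1 + dx] for dy, dx in RING]
            # Rotating the weights around the ring by k steps yields the k-th kernel.
            out[y][x] = max(
                sum(BASE_WEIGHTS[(i - k) % 8] * v for i, v in enumerate(ring))
                for k in range(8)
            )
    return out


def image_std(image: Image) -> float:
    """Population standard deviation of all pixel intensities."""
    return pstdev([p for row in image for p in row])


def image_entropy(image: Image) -> float:
    """Shannon entropy (bits) of the intensity histogram."""
    counts = Counter(p for row in image for p in row)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


step = [[0, 0, 10], [0, 0, 10], [0, 0, 10]]  # vertical step edge
print(kirsch_edges(step)[1][1])              # 150
checker = [[0, 255], [0, 255]]
print(image_std(checker), image_entropy(checker))  # 127.5 1.0
```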

Development and Validation of an Explainable MRI-Based Habitat Radiomics Model for Predicting p53-Abnormal Endometrial Cancer: A Multicentre Feasibility Study.

Jin W, Zhang H, Ning Y, Chen X, Zhang G, Li H, Zhang H

•papers•Aug 4 2025

We developed an MRI-based habitat radiomics model (HRM) to predict p53-abnormal (p53abn) molecular subtypes of endometrial cancer (EC). Patients with pathologically confirmed EC were retrospectively enrolled from three hospitals and categorized into a training cohort (n = 270), test cohort 1 (n = 70), and test cohort 2 (n = 154). The tumour was divided into habitat sub-regions using diffusion-weighted imaging (DWI) and contrast-enhanced (CE) images with the K-means algorithm. Radiomics features were extracted from T1-weighted imaging (T1WI), T2-weighted imaging (T2WI), DWI, and CE images. Three machine learning classifiers (logistic regression, support vector machines, and random forests) were applied to develop predictive models for p53abn EC. Model performance was validated using receiver operating characteristic (ROC) curves, and the model with the best predictive performance was selected as the HRM. A whole-region radiomics model (WRM) was also constructed, and a clinical model (CM) with five clinical features was developed. The SHapley Additive exPlanations (SHAP) method was used to explain the outputs of the models. DeLong's test evaluated and compared the performance across the cohorts. A total of 1920 habitat radiomics features were considered. Eight features were selected for the HRM, ten for the WRM, and three clinical features for the CM. The HRM achieved the highest AUC: 0.855 (training), 0.769 (test1), and 0.766 (test2). The AUCs of the WRM were 0.707 (training), 0.703 (test1), and 0.738 (test2). The AUCs of the CM were 0.709 (training), 0.641 (test1), and 0.665 (test2). The MRI-based HRM successfully predicted p53abn EC. The results indicate that habitat combined with machine learning, radiomics, and SHAP can effectively predict p53abn EC, providing clinicians with intuitive insights and interpretability regarding the impact of risk factors in the model.

MRI Classification Abdominal Retrospective Clinical In Silico Benchmark SOTA GenAI
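
The habitat step in this study clusters tumour voxels with K-means using DWI and CE intensities. A minimal sketch of that idea follows, using Lloyd's algorithm on (DWI, CE) intensity pairs. The number of habitats, the intensity normalisation, and the centroid initialisation are assumptions for illustration, since the abstract does not report them, and the voxel values are invented.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans_labels(points: Sequence[Point],
                  initial_centroids: Sequence[Point],
                  iterations: int = 10) -> List[int]:
    """Lloyd's algorithm: return a cluster (habitat) index for each point."""
    centroids = list(initial_centroids)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each voxel joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


# Hypothetical normalised (DWI, CE) intensities for four tumour voxels.
voxels = [(0.9, 0.2), (1.0, 0.3), (0.2, 0.8), (0.1, 0.9)]
habitats = kmeans_labels(voxels, initial_centroids=[voxels[0], voxels[2]])
print(habitats)  # [0, 0, 1, 1]
```

Radiomics features would then be extracted separately from each habitat sub-region rather than from the whole tumour, which is the distinction between the HRM and the WRM in the abstract.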

Scaling Artificial Intelligence for Prostate Cancer Detection on MRI towards Population-Based Screening and Primary Diagnosis in a Global, Multiethnic Population (Study Protocol)

Anindo Saha, Joeran S. Bosma, Jasper J. Twilt, Alexander B. C. D. Ng, Aqua Asif, Kirti Magudia, Peder Larson, Qinglin Xie, Xiaodong Zhang, Chi Pham Minh, Samuel N. Gitau, Ivo G. Schoots, Martijn F. Boomsma, Renato Cuocolo, Nikolaos Papanikolaou, Daniele Regge, Derya Yakar, Mattijs Elschot, Jeroen Veltman, Baris Turkbey, Nancy A. Obuchowski, Jurgen J. Fütterer, Anwar R. Padhani, Hashim U. Ahmed, Tobias Nordström, Martin Eklund, Veeru Kasivisvanathan, Maarten de Rooij, Henkjan Huisman

•preprint•Aug 4 2025

In this intercontinental, confirmatory study, we include a retrospective cohort of 22,481 MRI examinations (21,288 patients; 46 cities in 22 countries) to train and externally validate the PI-CAI-2B model, i.e., an efficient, next-generation iteration of the state-of-the-art AI system that was developed for detecting Gleason grade group ≥2 prostate cancer on MRI during the PI-CAI study. Of these examinations, 20,471 cases (19,278 patients; 26 cities in 14 countries) from two EU Horizon projects (ProCAncer-I, COMFORT) and 12 independent centers based in Europe, North America, Asia and Africa, are used for training and internal testing. Additionally, 2010 cases (2010 patients; 20 external cities in 12 countries) from population-based screening (STHLM3-MRI, IP1-PROSTAGRAM trials) and primary diagnostic settings (PRIME trial) based in Europe, North and South America, Asia and Australia, are used for external testing. The primary endpoint is the proportion of AI-based assessments in agreement with the standard of care diagnoses (i.e., clinical assessments made by expert uropathologists on histopathology, if available, or at least two expert urogenital radiologists in consensus; with access to patient history and peer consultation) in the detection of Gleason grade group ≥2 prostate cancer within the external testing cohorts. Our statistical analysis plan is prespecified with a hypothesis of diagnostic interchangeability to the standard of care at the PI-RADS ≥3 (primary diagnosis) or ≥4 (screening) cut-off, considering an absolute margin of 0.05 and reader estimates derived from the PI-CAI observer study (62 radiologists reading 400 cases). Secondary measures comprise the area under the receiver operating characteristic curve (AUROC) of the AI system stratified by imaging quality, patient age and patient ethnicity to identify underlying biases (if any).

MRI Detection Abdominal Retrospective Clinical In Silico Consortium Benchmark SOTA Open Dataset Ethics
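
The primary endpoint of this protocol is a proportion of agreement, judged against an absolute margin of 0.05. The sketch below shows only the arithmetic of such an endpoint: the agreement proportion between binary AI calls and reference diagnoses, and a point-estimate comparison against a reader benchmark. It is a simplification and not the prespecified statistical analysis, which would involve formal hypothesis testing with confidence intervals. The function names, the example calls, and the reader agreement value of 0.78 are all made up for illustration.

```python
from typing import Sequence


def agreement_proportion(ai_calls: Sequence[int], reference_calls: Sequence[int]) -> float:
    """Fraction of cases where the AI call equals the standard-of-care diagnosis."""
    if not ai_calls or len(ai_calls) != len(reference_calls):
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(a == r for a, r in zip(ai_calls, reference_calls))
    return matches / len(ai_calls)


def within_margin(ai_agreement: float, reader_agreement: float, margin: float = 0.05) -> bool:
    """Point-estimate check: AI agreement is at most `margin` below reader agreement."""
    return ai_agreement >= reader_agreement - margin


# 1 = Gleason grade group >= 2 cancer called present, 0 = absent (hypothetical cases).
ai_calls = [1, 0, 1, 1, 0, 0, 1, 0]
reference_calls = [1, 0, 1, 0, 0, 0, 1, 1]

ai_agreement = agreement_proportion(ai_calls, reference_calls)
print(ai_agreement)                                         # 0.75
print(within_margin(ai_agreement, reader_agreement=0.78))   # True
```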
