
From Concept to Code: AI-Powered CODE-ICH Transforming Acute Neurocritical Response for Hemorrhagic Strokes

Salman, S., Corro, R., Menser, T., Sanghavi, D., Kramer, C., Moreno Franco, P., Freeman, W. D.

medRxiv preprint · Oct 1, 2025
Background: Intracerebral hemorrhage (ICH) is among the most devastating forms of stroke, characterized by high early mortality and limited time-sensitive treatment protocols compared to ischemic stroke. The absence of standardized emergency response frameworks and the shortcomings of conventional scoring systems highlight the urgent need for innovation in neurocritical care. Objective: This paper introduces and evaluates the CODE-ICH framework, along with two AI-powered tools, HEADS-UP and SAHVAI, designed to transform acute ICH management through real-time detection, volumetric analysis, and predictive modeling. Methods: We describe the development and implementation of HEADS-UP, a cloud-based AI system for early ICH detection in underserved populations, and SAHVAI, a convolutional neural network-based tool for subarachnoid hemorrhage volume quantification. These tools were integrated into a novel paging and workflow system at a comprehensive stroke center to facilitate ultra-early intervention. Results: SAHVAI achieved 99.8% accuracy in volumetric analysis and provided 2D, 3D, and 4D visualization of hemorrhage progression. HEADS-UP enabled rapid triage and transfer, reducing reliance on subjective interpretation. Together, these tools operationalized the "time is brain" principle for hemorrhagic stroke and supported proactive, data-driven care in the neuro-intensive care unit (NICU). Conclusion: CODE-ICH, HEADS-UP, and SAHVAI represent a paradigm shift in hemorrhagic stroke care, delivering scalable, explainable, and multimodal AI solutions that enhance clinical decision-making, minimize delays, and promote equitable access to neurocritical care.
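The volumetric step SAHVAI performs can be illustrated with a minimal sketch, assuming a binary hemorrhage mask has already been predicted (the abstract does not describe SAHVAI's internals, so the mask source, function name, and spacing values below are hypothetical): volume follows directly from the voxel count and the scan's voxel spacing.

```python
import numpy as np

def hemorrhage_volume_ml(mask: np.ndarray, voxel_spacing_mm: tuple) -> float:
    """Estimate hemorrhage volume in millilitres from a binary segmentation mask.

    mask             : 3D array (z, y, x) with 1 for hemorrhage voxels, 0 elsewhere.
    voxel_spacing_mm : (dz, dy, dx) spacing of the CT volume in millimetres.
    """
    voxel_volume_mm3 = float(np.prod(voxel_spacing_mm))   # volume of one voxel
    n_voxels = int(np.count_nonzero(mask))                 # hemorrhage voxel count
    return n_voxels * voxel_volume_mm3 / 1000.0            # mm^3 -> mL

# Hypothetical example: a 10x20x20-voxel bleed on a 5.0 x 0.5 x 0.5 mm grid
mask = np.zeros((40, 512, 512), dtype=np.uint8)
mask[10:20, 200:220, 200:220] = 1
print(hemorrhage_volume_ml(mask, (5.0, 0.5, 0.5)))  # -> 5.0 mL
```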

Advances in Medical Image Segmentation: A Comprehensive Survey with a Focus on Lumbar Spine Applications

Ahmed Kabil, Ghada Khoriba, Mina Yousef, Essam A. Rashed

arXiv preprint · Oct 1, 2025
Medical Image Segmentation (MIS) stands as a cornerstone in medical image analysis, playing a pivotal role in precise diagnostics, treatment planning, and monitoring of various medical conditions. This paper presents a comprehensive and systematic survey of MIS methodologies, bridging the gap between traditional image processing techniques and modern deep learning approaches. The survey encompasses thresholding, edge detection, region-based segmentation, clustering algorithms, and model-based techniques, while also delving into state-of-the-art deep learning architectures such as Convolutional Neural Networks (CNNs), Fully Convolutional Networks (FCNs), and the widely adopted U-Net and its variants. Moreover, the integration of attention mechanisms, semi-supervised learning, generative adversarial networks (GANs), and Transformer-based models is thoroughly explored. In addition to covering established methods, this survey highlights emerging trends, including hybrid architectures, cross-modality learning, federated and distributed learning frameworks, and active learning strategies, which aim to address challenges such as limited labeled datasets, computational complexity, and model generalizability across diverse imaging modalities. Furthermore, a specialized case study on lumbar spine segmentation is presented, offering insights into the challenges and advancements in this relatively underexplored anatomical region. Despite significant progress in the field, critical challenges persist, including dataset bias, domain adaptation, interpretability of deep learning models, and integration into real-world clinical workflows.
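As a concrete instance of the classical end of the spectrum the survey covers, the sketch below pairs a global Otsu threshold with connected-component labelling using scikit-image; it stands in for traditional threshold- and region-based segmentation generally, not for any specific method discussed in the paper, and the synthetic image and size cut-off are assumptions.

```python
import numpy as np
from skimage import filters, measure

def otsu_segment(image: np.ndarray, min_region_px: int = 50):
    """Classical two-step segmentation: global Otsu threshold, then
    connected-component labelling with small-region removal."""
    threshold = filters.threshold_otsu(image)        # data-driven global threshold
    binary = image > threshold                       # foreground mask
    labels = measure.label(binary)                   # connected components
    # Drop tiny components that are likely noise
    for region in measure.regionprops(labels):
        if region.area < min_region_px:
            labels[labels == region.label] = 0
    return labels

# Hypothetical usage on a synthetic 2D "scan" with one bright structure
rng = np.random.default_rng(0)
image = rng.normal(0.2, 0.05, (128, 128))
image[40:80, 40:80] += 0.5
print(np.unique(otsu_segment(image)))
```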

Artificial Intelligence Model for Imaging-Based Extranodal Extension Detection and Outcome Prediction in Human Papillomavirus-Positive Oropharyngeal Cancer.

Dayan GS, Hénique G, Bahig H, Nelson K, Brodeur C, Christopoulos A, Filion E, Nguyen-Tan PF, O'Sullivan B, Ayad T, Bissada E, Tabet P, Guertin L, Desilets A, Kadoury S, Letourneau-Guillon L

PubMed · Sep 30, 2025
Although not included in the eighth edition of the American Joint Committee on Cancer Staging System, there is growing evidence suggesting that imaging-based extranodal extension (iENE) is associated with worse outcomes in HPV-associated oropharyngeal carcinoma (OPC). Key challenges with iENE include the lack of standardized criteria, reliance on radiological expertise, and interreader variability. To develop an artificial intelligence (AI)-driven pipeline for lymph node segmentation and iENE classification using pretreatment computed tomography (CT) scans, and to evaluate its association with oncologic outcomes in HPV-positive OPC. This was a single-center cohort study conducted at a tertiary oncology center in Montreal, Canada, of adult patients with HPV-positive cN+ OPC treated with up-front (chemo)radiotherapy from January 2009 to January 2020. Participants were followed up until January 2024. Data analysis was performed from March 2024 to April 2025. Pretreatment planning CT scans along with lymph node gross tumor volume segmentations performed by expert radiation oncologists were extracted. For lymph node segmentation, an nnU-Net model was developed. For iENE classification, radiomic and deep learning feature extraction methods were compared. iENE classification accuracy was assessed against 2 expert neuroradiologist evaluations using the area under the receiver operating characteristic curve (AUC). Subsequently, the association of AI-predicted iENE with oncologic outcomes, i.e., overall survival (OS), recurrence-free survival (RFS), distant control (DC), and locoregional control (LRC), was assessed. Among 397 patients (mean [SD] age, 62.3 [9.1] years; 80 females [20.2%] and 317 males [79.8%]), AI-iENE classification using radiomics achieved an AUC of 0.81. Patients with AI-predicted iENE had worse 3-year OS (83.8% vs 96.8%), RFS (80.7% vs 93.7%), and DC (84.3% vs 97.1%), but similar LRC. AI-iENE had significantly higher concordance indices than radiologist-assessed iENE for OS (0.64 vs 0.55), RFS (0.67 vs 0.60), and DC (0.79 vs 0.68). In multivariable analysis, AI-iENE remained independently associated with OS (adjusted hazard ratio [aHR], 2.82; 95% CI, 1.21-6.57), RFS (aHR, 4.20; 95% CI, 1.93-9.11), and DC (aHR, 12.33; 95% CI, 4.15-36.67), adjusting for age, tumor category, node category, and number of lymph nodes. This single-center cohort study found that an AI-driven pipeline can successfully automate lymph node segmentation and iENE classification from pretreatment CT scans in HPV-associated OPC. Predicted iENE was independently associated with worse oncologic outcomes. External validation is required to assess generalizability and the potential for implementation in institutions without specialized imaging expertise.
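The multivariable survival analysis reported here follows the standard Cox proportional-hazards pattern: fit the model with the AI-iENE flag plus the adjustment covariates, then exponentiate the fitted coefficient to obtain the adjusted hazard ratio. The sketch below uses the lifelines library on synthetic data; the library choice, column names, and data are assumptions, not the authors' actual analysis code.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical cohort table: follow-up time, event indicator, the AI-predicted
# iENE flag, and the adjustment covariates named in the abstract.
rng = np.random.default_rng(1)
n = 397
df = pd.DataFrame({
    "time_months": rng.exponential(60, n),
    "event": rng.integers(0, 2, n),
    "ai_iene": rng.integers(0, 2, n),
    "age": rng.normal(62, 9, n),
    "t_category": rng.integers(1, 5, n),
    "n_category": rng.integers(1, 4, n),
    "num_nodes": rng.integers(1, 10, n),
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time_months", event_col="event")
print(np.exp(cph.params_["ai_iene"]))   # adjusted hazard ratio for AI-iENE
print(cph.concordance_index_)            # model concordance index
```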

A phase-aware Cross-Scale U-MAMba with uncertainty-aware segmentation and Switch Atrous Bifovea EfficientNetB7 classification of kidney lesion subtype.

Rmr SS, Mb S, R D, M T, P V

PubMed · Sep 30, 2025
Kidney lesion subtype identification is essential for precise diagnosis and personalized treatment planning. However, achieving reliable classification remains challenging due to factors such as inter-patient anatomical variability, incomplete multi-phase CT acquisitions, and ill-defined or overlapping lesion boundaries. In addition, genetic and ethnic morphological variations introduce inconsistent imaging patterns, reducing the generalizability of conventional deep learning models. To address these challenges, we introduce a unified framework called Phase-aware Cross-Scale U-MAMba and Switch Atrous Bifovea EfficientNet B7 (PCU-SABENet), which integrates multi-phase reconstruction, fine-grained lesion segmentation, and robust subtype classification. The PhaseGAN-3D synthesizes missing CT phases using binary mask-guided inter-phase priors, enabling complete four-phase reconstruction even under partial acquisition conditions. The PCU segmentation module combines Contextual Attention Blocks, Cross-Scale Skip Connections, and uncertainty-aware pseudo-labeling to delineate lesion boundaries with high anatomical fidelity. These enhancements help mitigate low contrast and intra-class ambiguity. For classification, SABENet employs Switch Atrous Convolution for multi-scale receptive field adaptation, Hierarchical Tree Pooling for structure-aware abstraction, and Bi-Fovea Self-Attention to emphasize fine lesion cues and global morphology. This configuration is particularly effective in addressing morphological diversity across patient populations. Experimental results show that the proposed model achieves state-of-the-art performance, with 99.3% classification accuracy, 94.8% Dice similarity, 89.3% IoU, 98.8% precision, 99.2% recall, a phase-consistency score of 0.94, and a subtype confidence deviation of 0.08. Moreover, the model generalizes well on external datasets (TCIA) with 98.6% accuracy and maintains efficient computational performance, requiring only 0.138 GFLOPs and 8.2 ms inference time. These outcomes confirm the model's robustness in phase-incomplete settings and its adaptability to diverse patient cohorts. The PCU-SABENet framework sets a new standard in kidney lesion subtype analysis, combining segmentation precision with clinically actionable classification, thus offering a powerful tool for enhancing diagnostic accuracy and decision-making in real-world renal cancer management.
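The Dice similarity and IoU figures quoted above are standard overlap metrics; a short, generic utility for computing both from binary masks is sketched below. This is not code from the PCU-SABENet pipeline, only an illustration of how such scores are obtained.

```python
import numpy as np

def dice_and_iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7):
    """Dice coefficient and intersection-over-union for binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
    iou = (intersection + eps) / (union + eps)
    return float(dice), float(iou)

# Hypothetical 1D example: 3 overlapping voxels out of 4 predicted / 5 true
pred = np.array([1, 1, 1, 1, 0, 0])
true = np.array([0, 1, 1, 1, 1, 1])
print(dice_and_iou(pred, true))  # Dice ~ 0.667, IoU = 0.5
```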

Enhancing Microscopic Image Quality With DiffusionFormer and Crow Search Optimization.

Patel SC, Kamath RN, Murthy TSN, Subash K, Avanija J, Sangeetha M

PubMed · Sep 30, 2025
Medical imaging plays a vital role in diagnosis, but noise in patient scans severely affects the accuracy and quality of images. Denoising methods are important for increasing the clarity of these images, particularly in low-resource settings where advanced diagnostic tools are inaccessible. Pneumonia is a widespread disease that presents significant diagnostic challenges due to the high similarity between its various types and the lack of medical images for emerging variants. This study introduces a novel diffusion and Swin Transformer framework with an optimized Crow Search algorithm to increase image quality and reliability. The technique utilizes four datasets: a brain tumor MRI dataset, chest X-ray images, chest CT-scan images, and BUSI. The preprocessing steps involve conversion to grayscale, resizing, and normalization to improve image quality in the medical image (MI) datasets; Gaussian noise is then added to produce degraded inputs for the denoising task. The method incorporates a diffusion process, Swin Transformer networks, and an optimized Crow Search algorithm to improve the denoising of medical images. The diffusion process reduces noise by iteratively refining images, while the Swin Transformer captures complex image features that help differentiate between noise and essential diagnostic information. The Crow Search optimization algorithm fine-tunes the hyperparameters, minimizing the fitness function for optimal denoising performance. The method is tested across the four datasets, demonstrating its effectiveness against other techniques. The proposed method achieves a peak signal-to-noise ratio of 38.47 dB, a structural similarity index measure of 98.14%, a mean squared error of 0.55, and a feature similarity index measure of 0.980, outperforming existing techniques. These outcomes reflect that the proposed approach effectively enhances image quality, resulting in precise and dependable diagnoses.
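The quality metrics reported (PSNR, SSIM, MSE) are standard and straightforward to reproduce on a reference/denoised image pair; the sketch below uses scikit-image for PSNR and SSIM on synthetic data, purely as an illustration of how such numbers are computed, not as the paper's evaluation code.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def denoising_metrics(reference: np.ndarray, denoised: np.ndarray):
    """MSE, PSNR (dB), and SSIM between a clean reference and a denoised image,
    both assumed to be float arrays in [0, 1]."""
    mse = float(np.mean((reference - denoised) ** 2))
    psnr = peak_signal_noise_ratio(reference, denoised, data_range=1.0)
    ssim = structural_similarity(reference, denoised, data_range=1.0)
    return mse, psnr, ssim

# Hypothetical usage on a synthetic image with residual noise
rng = np.random.default_rng(0)
clean = rng.random((128, 128))
noisy = np.clip(clean + rng.normal(0, 0.02, clean.shape), 0, 1)
print(denoising_metrics(clean, noisy))
```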

A Pretraining Approach for Small-sample Training Employing Radiographs (PASTER): a Multimodal Transformer Trained by Chest Radiography and Free-text Reports.

Chen KC, Kuo M, Lee CH, Liao HC, Tsai DJ, Lin SA, Hsiang CW, Chang CK, Ko KH, Hsu YC, Chang WC, Huang GS, Fang WH, Lin CS, Lin SH, Chen YH, Hung YJ, Tsai CS, Lin C

PubMed · Sep 30, 2025
While deep convolutional neural networks (DCNNs) have achieved remarkable performance in chest X-ray interpretation, their success typically depends on access to large-scale, expertly annotated datasets. However, collecting such data in real-world clinical settings can be difficult because of limited labeling resources, privacy concerns, and patient variability. In this study, we applied a multimodal Transformer pretrained on free-text reports and their paired chest X-rays (CXRs) to evaluate the effectiveness of this method in settings with limited labeled data. Our dataset consisted of more than 1 million CXRs, each accompanied by reports from board-certified radiologists and 31 structured labels. The results indicated that a linear model trained on embeddings from the pretrained model achieved AUCs of 0.907 and 0.903 on internal and external test sets, respectively, using only 128 cases and 384 controls; the results were comparable to those of DenseNet trained on the entire dataset, whose AUCs were 0.908 and 0.903, respectively. Additionally, we demonstrated similar results by extending the application of this approach to a subset annotated with structured echocardiographic reports. Furthermore, this multimodal model exhibited excellent small-sample learning capabilities when tested on external validation sets such as CheXpert and ChestX-ray14. This research significantly reduces the sample size necessary for future artificial intelligence advancements in CXR interpretation.
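The small-sample result rests on linear probing: embed each chest X-ray with the frozen pretrained encoder, then fit a linear classifier on a handful of labelled examples. The sketch below illustrates that pattern only; the random-projection "encoder", image size, and the 128-case/384-control split mirroring the abstract are stand-ins, since the paper's pretrained model is not described at the API level.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Stand-in for the frozen pretrained encoder: a fixed random projection from
# flattened 64x64 "CXRs" to 256-d embeddings (purely illustrative).
projection = rng.normal(size=(64 * 64, 256))
def encode(images: np.ndarray) -> np.ndarray:
    return images.reshape(len(images), -1) @ projection

# Hypothetical labelled set: 128 cases and 384 controls, as in the abstract.
images = rng.random((512, 64, 64))
labels = np.array([1] * 128 + [0] * 384)
images[labels == 1, :8, :] += 0.3            # inject a weak class-specific signal

X_tr, X_te, y_tr, y_te = train_test_split(
    encode(images), labels, test_size=0.25, stratify=labels, random_state=0)
clf = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)   # the linear probe
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```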

Causally Guided Gaussian Perturbations for Out-Of-Distribution Generalization in Medical Imaging

Haoran Pei, Yuguang Yang, Kexin Liu, Baochang Zhang

arXiv preprint · Sep 30, 2025
Out-of-distribution (OOD) generalization remains a central challenge in deploying deep learning models to real-world scenarios, particularly in domains such as biomedical images, where distribution shifts are both subtle and pervasive. While existing methods often pursue domain invariance through complex generative models or adversarial training, these approaches may overlook the underlying causal mechanisms of generalization. In this work, we propose Causally-Guided Gaussian Perturbations (CGP), a lightweight framework that enhances OOD generalization by injecting spatially varying noise into input images, guided by soft causal masks derived from Vision Transformers. By applying stronger perturbations to background regions and weaker ones to foreground areas, CGP encourages the model to rely on causally relevant features rather than spurious correlations. Experimental results on the challenging WILDS benchmark Camelyon17 demonstrate consistent performance gains over state-of-the-art OOD baselines, highlighting the potential of causal perturbation as a tool for reliable and interpretable generalization.
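The core CGP operation, Gaussian noise whose strength is modulated by a soft causal mask (stronger over background, weaker over foreground), reduces to a few lines. In the sketch below the mask is a placeholder rather than the ViT-derived mask the paper uses, and the noise scales are assumed.

```python
import numpy as np

def causally_guided_perturbation(image: np.ndarray, causal_mask: np.ndarray,
                                 sigma_fg: float = 0.05, sigma_bg: float = 0.3,
                                 rng=None) -> np.ndarray:
    """Add spatially varying Gaussian noise: weak where the soft causal mask is
    high (causally relevant foreground), strong where it is low (background).

    causal_mask is assumed to lie in [0, 1]; the sigma values are illustrative.
    """
    rng = rng or np.random.default_rng()
    sigma_map = sigma_fg * causal_mask + sigma_bg * (1.0 - causal_mask)
    return image + rng.normal(0.0, 1.0, image.shape) * sigma_map

# Hypothetical usage with a placeholder mask (the paper derives it from a ViT)
image = np.random.default_rng(0).random((224, 224))
mask = np.zeros_like(image)
mask[60:160, 60:160] = 1.0
perturbed = causally_guided_perturbation(image, mask)
```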

Dolphin v1.0 Technical Report

Taohan Weng, Chi Zhang, Chaoran Yan, Siya Liu, Xiaoyang Liu, Yalun Wu, Boyang Wang, Boyan Wang, Jiren Ren, Kaiwen Yan, Jinze Yu, Kaibing Hu, Henan Liu, Haoyun Zheng, Anjie Le, Hongcheng Guo

arXiv preprint · Sep 30, 2025
Ultrasound is crucial in modern medicine but faces challenges like operator dependence, image noise, and real-time scanning, hindering AI integration. While large multimodal models excel in other medical imaging areas, they struggle with ultrasound's complexities. To address this, we introduce Dolphin v1.0 (V1) and its reasoning-augmented version, Dolphin R1, the first large-scale multimodal ultrasound foundation models unifying diverse clinical tasks in a single vision-language framework. To tackle ultrasound variability and noise, we curated a 2-million-scale multimodal dataset, combining textbook knowledge, public data, synthetic samples, and general corpora. This ensures robust perception, generalization, and clinical adaptability. The Dolphin series employs a three-stage training strategy: domain-specialized pretraining, instruction-driven alignment, and reinforcement-based refinement. Dolphin v1.0 delivers reliable performance in classification, detection, regression, and report generation. Dolphin R1 enhances diagnostic inference, reasoning transparency, and interpretability through reinforcement learning with ultrasound-specific rewards. Evaluated on U2-Bench across eight ultrasound tasks, Dolphin R1 achieves a U2-score of 0.5835, over twice that of the second-best model (0.2968), setting a new state of the art. Dolphin v1.0 also performs competitively, validating the unified framework. Comparisons show reasoning-enhanced training significantly improves diagnostic accuracy, consistency, and interpretability, highlighting its importance for high-stakes medical AI.

Transformer Classification of Breast Lesions: The BreastDCEDL_AMBL Benchmark Dataset and 0.92 AUC Baseline

Naomi Fridman, Anat Goldstein

arXiv preprint · Sep 30, 2025
Breast magnetic resonance imaging is a critical tool for cancer detection and treatment planning, but its clinical utility is hindered by poor specificity, leading to high false-positive rates and unnecessary biopsies. This study introduces a transformer-based framework for automated classification of breast lesions in dynamic contrast-enhanced MRI, addressing the challenge of distinguishing benign from malignant findings. We implemented a SegFormer architecture that achieved an AUC of 0.92 for lesion-level classification, with 100% sensitivity and 67% specificity at the patient level, potentially eliminating one-third of unnecessary biopsies without missing malignancies. The model quantifies malignant pixel distribution via semantic segmentation, producing interpretable spatial predictions that support clinical decision-making. To establish reproducible benchmarks, we curated BreastDCEDL_AMBL by transforming The Cancer Imaging Archive's AMBL collection into a standardized deep learning dataset with 88 patients and 133 annotated lesions (89 benign, 44 malignant). This resource addresses a key infrastructure gap, as existing public datasets lack benign lesion annotations, limiting benign-malignant classification research. Training incorporated an expanded cohort of over 1,200 patients through integration with BreastDCEDL datasets, validating transfer learning approaches despite primary tumor-only annotations. Public release of the dataset, models, and evaluation protocols provides the first standardized benchmark for DCE-MRI lesion classification, enabling methodological advancement toward clinical deployment.
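The lesion-level decision described, quantifying the malignant pixel distribution produced by the segmentation head, can be sketched as scoring each annotated lesion by the fraction of its pixels predicted malignant and thresholding that score. The probability map, lesion mask, and cut-off below are all hypothetical stand-ins, not the paper's operating point.

```python
import numpy as np

def lesion_malignancy_score(malignant_prob_map: np.ndarray,
                            lesion_mask: np.ndarray) -> float:
    """Fraction of lesion pixels the segmentation model calls malignant (p > 0.5)."""
    lesion_pixels = malignant_prob_map[lesion_mask.astype(bool)]
    return float(np.mean(lesion_pixels > 0.5)) if lesion_pixels.size else 0.0

# Hypothetical usage: score one lesion, then threshold the score for a
# benign-vs-malignant call (the cut-off would be chosen on a validation set).
rng = np.random.default_rng(0)
prob_map = rng.random((256, 256))
lesion = np.zeros((256, 256), dtype=np.uint8)
lesion[100:140, 100:140] = 1
score = lesion_malignancy_score(prob_map, lesion)
label = "malignant" if score > 0.3 else "benign"   # 0.3 is an assumed threshold
print(score, label)
```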

Multi-modal Liver Segmentation and Fibrosis Staging Using Real-world MRI Images

Yang Zhou, Kunhao Yuan, Ye Wei, Jishizhan Chen

arXiv preprint · Sep 30, 2025
Liver fibrosis represents the accumulation of excessive extracellular matrix caused by sustained hepatic injury. It disrupts normal lobular architecture and function, increasing the chances of cirrhosis and liver failure. Precise staging of fibrosis for early diagnosis and intervention is often invasive, which carries risks and complications. To address this challenge, recent advances in artificial intelligence-based liver segmentation and fibrosis staging offer a non-invasive alternative. As a result, the CARE 2025 Challenge called for automated methods to quantify and analyse liver fibrosis in real-world scenarios, using multi-centre, multi-modal, and multi-phase MRI data. The challenge included two tasks: precise liver segmentation (LiSeg) and fibrosis staging (LiFS). In this study, we developed an automated pipeline for both tasks across all the provided MRI modalities. This pipeline integrates pseudo-labelling based on multi-modal co-registration, liver segmentation using deep neural networks, and liver fibrosis staging based on shape, textural, appearance, and directional (STAD) features derived from segmentation masks and MRI images. Using only the released data with its limited annotations, our proposed pipeline demonstrated excellent generalisability for all MRI modalities, achieving top-tier performance across all competition subtasks. This approach provides a rapid and reproducible framework for quantitative MRI-based liver fibrosis assessment, supporting early diagnosis and clinical decision-making. Code is available at https://github.com/YangForever/care2025_liver_biodreamer.
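The shape component of the STAD features can be illustrated with a few region descriptors computed from a segmentation mask via scikit-image; the abstract does not enumerate the authors' exact features, so the selection below is only indicative and the synthetic mask is hypothetical.

```python
import numpy as np
from skimage import measure

def shape_features(mask: np.ndarray) -> dict:
    """A few illustrative shape descriptors of the largest connected region
    in a binary liver mask (2D slice for simplicity)."""
    labels = measure.label(mask.astype(bool))
    regions = measure.regionprops(labels)
    if not regions:
        return {}
    largest = max(regions, key=lambda r: r.area)
    return {
        "area_px": float(largest.area),
        "eccentricity": float(largest.eccentricity),
        "solidity": float(largest.solidity),
        "extent": float(largest.extent),
    }

# Hypothetical usage on a synthetic elliptical "liver" mask
yy, xx = np.mgrid[0:256, 0:256]
mask = ((yy - 128) / 80) ** 2 + ((xx - 128) / 50) ** 2 < 1
print(shape_features(mask))
```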
