Page 54 of 142 · 1420 results

Pathology-Guided AI System for Accurate Segmentation and Diagnosis of Cervical Spondylosis.

Zhang Q, Chen X, He Z, Wu L, Wang K, Sun J, Shen H

pubmed logopapersAug 13 2025
Cervical spondylosis, a complex and prevalent condition, demands precise and efficient diagnostic techniques for accurate assessment. While MRI offers detailed visualization of cervical spine anatomy, manual interpretation remains labor-intensive and prone to error. To address this, we developed an innovative AI-assisted Expert-based Diagnosis System that automates both segmentation and diagnosis of cervical spondylosis using MRI. Leveraging multi-center datasets of cervical MRI images from patients with cervical spondylosis, our system features a pathology-guided segmentation model capable of accurately segmenting key cervical anatomical structures. The segmentation is followed by an expert-based diagnostic framework that automates the calculation of critical clinical indicators. Our segmentation model achieved an average Dice coefficient exceeding 0.90 across four cervical spinal anatomies and demonstrated enhanced accuracy in herniation areas. Diagnostic evaluation further showcased the system's precision, with the lowest mean absolute errors (MAE) for the C2-C7 Cobb angle and the Maximum Spinal Cord Compression (MSCC) coefficient. In addition, our method delivered high accuracy, precision, recall, and F1 scores in herniation localization, K-line status assessment, T2 hyperintensity detection, and Kang grading. Comparative analysis and external validation demonstrate that our system outperforms existing methods, establishing a new benchmark for segmentation and diagnostic tasks for cervical spondylosis.
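For reference, the Dice coefficient reported in this and several of the following abstracts measures volumetric overlap between a predicted and a reference mask. A minimal sketch (binary numpy masks; names are illustrative, not from the paper):

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, ref: np.ndarray, eps: float = 1e-8) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred = pred.astype(bool)
    ref = ref.astype(bool)
    intersection = np.logical_and(pred, ref).sum()
    return (2.0 * intersection + eps) / (pred.sum() + ref.sum() + eps)

# Toy example: two partially overlapping 1-D "masks"
a = np.array([1, 1, 1, 0, 0])
b = np.array([0, 1, 1, 1, 0])
print(round(dice_coefficient(a, b), 3))  # 2*2/(3+3) ≈ 0.667
```

A Dice above 0.90, as reported here, therefore means the predicted and reference structures overlap almost completely.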

ES-UNet: efficient 3D medical image segmentation with enhanced skip connections in 3D UNet.

Park M, Oh S, Park J, Jeong T, Yu S

pubmed logopapersAug 13 2025
Deep learning has significantly advanced medical image analysis, particularly in semantic segmentation, which is essential for clinical decisions. However, existing 3D segmentation models, like the traditional 3D UNet, face challenges in balancing computational efficiency and accuracy when processing volumetric medical data. This study aims to develop an improved architecture for 3D medical image segmentation with enhanced learning strategies to improve accuracy and address challenges related to limited training data. We propose ES-UNet, a 3D segmentation architecture that achieves superior segmentation performance while offering competitive efficiency across multiple computational metrics, including memory usage, inference time, and parameter count. The model builds upon the full-scale skip connection design of UNet3+ by integrating channel attention modules into each encoder-to-decoder path and incorporating full-scale deep supervision to enhance multi-resolution feature learning. We further introduce Region Specific Scaling (RSS), a data augmentation method that adaptively applies geometric transformations to annotated regions, and a Dynamically Weighted Dice (DWD) loss to improve the balance between precision and recall. The model was evaluated on the MICCAI HECKTOR dataset, and additional validation was conducted on selected tasks from the Medical Segmentation Decathlon (MSD). On the HECKTOR dataset, ES-UNet achieved a Dice Similarity Coefficient (DSC) of 76.87%, outperforming baseline models including 3D UNet, 3D UNet 3+, nnUNet, and Swin UNETR. Ablation studies showed that RSS and DWD contributed up to 1.22% and 1.06% improvement in DSC, respectively. A sensitivity analysis demonstrated that the chosen scaling range in RSS offered a favorable trade-off between deformation and anatomical plausibility. Cross-dataset evaluation on MSD Heart and Spleen tasks also indicated strong generalization. 
Computational analysis revealed that ES-UNet achieves superior segmentation performance with moderate computational demands. Specifically, the enhanced skip connection design with lightweight channel attention modules integrated throughout the network architecture enables this favorable balance between high segmentation accuracy and computational efficiency. ES-UNet integrates architectural and algorithmic improvements to achieve robust 3D medical image segmentation. While the framework incorporates established components, its core contributions lie in the optimized skip connection strategy and supporting techniques like RSS and DWD. Future work will explore adaptive scaling strategies and broader validation across diverse imaging modalities.
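The abstract does not specify the form of the Dynamically Weighted Dice (DWD) loss. One plausible sketch is a Tversky-style index whose false-positive and false-negative weights can be adjusted during training; the `alpha`/`beta` parameterization below is an assumption for illustration, not the authors' formulation:

```python
import numpy as np

def weighted_dice_loss(pred, ref, alpha=0.5, beta=0.5, eps=1e-8):
    """Tversky-style loss: alpha weights false positives, beta false negatives.
    alpha = beta = 0.5 recovers the standard soft Dice loss."""
    pred = pred.astype(float)
    ref = ref.astype(float)
    tp = (pred * ref).sum()
    fp = (pred * (1 - ref)).sum()
    fn = ((1 - pred) * ref).sum()
    tversky = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - tversky

# Penalizing false negatives more (beta > alpha) pushes the model toward recall.
p = np.array([1.0, 1.0, 0.0, 0.0])
r = np.array([1.0, 0.0, 1.0, 0.0])
print(weighted_dice_loss(p, r, alpha=0.3, beta=0.7))
```

Dynamically shifting such weights over training is one way a loss could trade precision against recall as the abstract describes.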

An optimized multi-task contrastive learning framework for HIFU lesion detection and segmentation.

Zavar M, Ghaffari HR, Tabatabaee H

pubmed logopapersAug 13 2025
Accurate detection and segmentation of lesions induced by High-Intensity Focused Ultrasound (HIFU) in medical imaging remain significant challenges in automated disease diagnosis. Traditional methods heavily rely on labeled data, which is often scarce, expensive, and time-consuming to obtain. Moreover, existing approaches frequently struggle with variations in medical data and the limited availability of annotated datasets, leading to suboptimal performance. To address these challenges, this paper introduces an innovative framework called the Optimized Multi-Task Contrastive Learning Framework (OMCLF), which leverages self-supervised learning (SSL) and genetic algorithms (GA) to enhance HIFU lesion detection and segmentation. OMCLF integrates classification and segmentation into a unified model, utilizing a shared backbone to extract common features. The framework systematically optimizes feature representations, hyperparameters, and data augmentation strategies tailored for medical imaging, ensuring that critical information, such as lesion details, is preserved. By employing a genetic algorithm, OMCLF explores and optimizes augmentation techniques suitable for medical data, avoiding distortions that could compromise diagnostic accuracy. Experimental results demonstrate that OMCLF outperforms single-task methods in both classification and segmentation tasks while significantly reducing dependency on labeled data. Specifically, OMCLF achieves an accuracy of 93.3% in lesion detection and a Dice score of 92.5% in segmentation, surpassing state-of-the-art methods such as SimCLR and MoCo. The proposed approach achieves superior accuracy in identifying and delineating HIFU-induced lesions, marking a substantial advancement in medical image interpretation and automated diagnosis. OMCLF represents a significant step forward in the evolutionary optimization of self-supervised learning, with potential applications across various medical imaging domains.
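The contrastive objective underlying SimCLR-style pretraining, which OMCLF is compared against, can be sketched as an NT-Xent (InfoNCE) loss over two augmented views of the same batch. This is a generic illustration in numpy, not the paper's exact objective:

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """NT-Xent-style contrastive loss between two augmented views.
    z1, z2: (n, d) embeddings; row i of z1 and z2 form a positive pair."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature             # (n, n) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # positives on the diagonal

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
noise = 0.01 * rng.normal(size=(8, 16))
print(info_nce(z, z + noise))  # near-identical views -> low loss
```

A genetic algorithm, as used in OMCLF, would then search over the augmentation operators that generate the two views, rejecting those that distort diagnostically relevant lesion detail.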

A stacking ensemble framework integrating radiomics and deep learning for prognostic prediction in head and neck cancer.

Wang B, Liu J, Zhang X, Lin J, Li S, Wang Z, Cao Z, Wen D, Liu T, Ramli HRH, Harith HH, Hasan WZW, Dong X

pubmed logopapersAug 13 2025
Radiomics models frequently face challenges related to reproducibility and robustness. To address these issues, we propose a multimodal, multi-model fusion framework utilizing stacking ensemble learning for prognostic prediction in head and neck cancer (HNC). This approach seeks to improve the accuracy and reliability of survival predictions. A total of 806 cases from nine centers were collected; 143 cases from two centers were assigned as the external validation cohort, while the remaining 663 were stratified and randomly split into training (n = 530) and internal validation (n = 133) sets. Radiomics features were extracted according to IBSI standards, and deep learning features were obtained using a 3D DenseNet-121 model. Following feature selection, the selected features were input into Cox, SVM, RSF, DeepCox, and DeepSurv models. A stacking fusion strategy was employed to develop the prognostic model. Model performance was evaluated using Kaplan-Meier survival curves and time-dependent ROC curves. On the external validation set, the model using combined PET and CT radiomics features achieved superior performance compared to single-modality models, with the RSF model obtaining the highest concordance index (C-index) of 0.7302. When using deep features extracted by 3D DenseNet-121, the PET + CT-based models demonstrated significantly improved prognostic accuracy, with DeepSurv and DeepCox achieving C-indices of 0.9217 and 0.9208, respectively. In stacking models, the PET + CT model using only radiomics features reached a C-index of 0.7324, while the deep feature-based stacking model achieved 0.9319. The best performance was obtained by the multi-feature fusion model, which integrated both radiomics and deep learning features from PET and CT, yielding a C-index of 0.9345. Kaplan-Meier survival analysis further confirmed the fusion model's ability to distinguish between high-risk and low-risk groups.
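The concordance index (C-index) reported throughout this abstract is Harrell's C: the fraction of comparable patient pairs in which the patient with the shorter observed survival also received the higher predicted risk. A minimal sketch:

```python
import numpy as np

def concordance_index(times, events, risk_scores):
    """Harrell's C-index: fraction of comparable pairs where the patient
    with the shorter observed event time has the higher predicted risk."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Pair (i, j) is comparable if i had an event before j's observed time
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties get half credit
    return concordant / comparable

# Toy cohort: a perfect model ranks earlier events at higher risk
times  = np.array([2.0, 5.0, 8.0, 11.0])
events = np.array([1, 1, 0, 1])        # 0 = censored
risk   = np.array([0.9, 0.6, 0.4, 0.2])
print(concordance_index(times, events, risk))  # perfectly concordant -> 1.0
```

A C-index of 0.5 corresponds to random ranking, so values above 0.93, as reported for the fusion models, indicate very strong risk discrimination.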
The stacking-based ensemble model demonstrates superior performance compared to individual machine learning models, markedly improving the robustness of prognostic predictions.

PPEA: Personalized positioning and exposure assistant based on multi-task shared pose estimation transformer.

Zhao J, Liu J, Yang C, Tang H, Chen Y, Zhang Y

pubmed logopapersAug 13 2025
Hand and foot digital radiography (DR) is an indispensable tool in medical imaging, with varying diagnostic requirements necessitating different hand and foot positionings. Accurate positioning is crucial for obtaining diagnostically valuable images. Furthermore, adjusting exposure parameters such as exposure area based on patient conditions helps minimize the likelihood of image retakes. We propose a personalized positioning and exposure assistant capable of automatically recognizing hand and foot positionings and recommending appropriate exposure parameters to achieve these objectives. The assistant comprises three modules: (1) Progressive Iterative Hand-Foot Tracker (PIHFT) to iteratively locate hands or feet in RGB images, providing the foundation for accurate pose estimation; (2) Multi-Task Shared Pose Estimation Transformer (MTSPET), a Transformer-based model that encompasses hand and foot estimation branches with similar network architectures, sharing a common backbone. MTSPET outperformed MediaPipe in the hand pose estimation task and successfully transferred this capability to the foot pose estimation task; (3) Domain Expertise-embedded Positioning and Exposure Assistant (DEPEA), which combines the key-point coordinates of hands and feet with specific positioning and exposure parameter requirements, capable of checking patient positioning and inferring exposure areas and Regions of Interest (ROIs) of Digital Automatic Exposure Control (DAEC). Additionally, two datasets were collected and used to train MTSPET. A preliminary clinical trial showed strong agreement between PPEA's outputs and manual annotations, indicating the system's effectiveness in typical clinical scenarios. The contributions of this study lay the foundation for personalized, patient-specific imaging strategies, ultimately enhancing diagnostic outcomes and minimizing the risk of errors in clinical settings.
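A positioning check like DEPEA's can be built from estimated keypoint coordinates by measuring joint angles and comparing them against a target posture. The keypoints, target angle, and tolerance below are hypothetical; the sketch only illustrates the geometric idea:

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle (degrees) at keypoint b formed by segments b->a and b->c."""
    v1 = np.asarray(a, float) - np.asarray(b, float)
    v2 = np.asarray(c, float) - np.asarray(b, float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def positioning_ok(angle_deg, target_deg, tolerance_deg=10.0):
    """Flag the positioning as acceptable if within tolerance of the target."""
    return abs(angle_deg - target_deg) <= tolerance_deg

# Hypothetical keypoints: forearm point, wrist, middle-finger tip
angle = joint_angle((0, 2), (0, 0), (2, 0))
print(positioning_ok(angle, target_deg=90.0))  # ~90 deg, within tolerance
```

In the described system, such checks would be combined with domain rules to also infer the exposure area and the ROI for automatic exposure control.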

BSA-Net: Boundary-prioritized spatial adaptive network for efficient left atrial segmentation.

Xu F, Tu W, Feng F, Yang J, Gunawardhana M, Gu Y, Huang J, Zhao J

pubmed logopapersAug 13 2025
Atrial fibrillation, a common cardiac arrhythmia with rapid and irregular atrial electrical activity, requires accurate left atrial segmentation for effective treatment planning. Recently, deep learning methods have achieved encouraging success in left atrial segmentation. However, current methodologies critically depend on the assumption of a consistently complete, centered left atrium as input, which neglects the structural incompleteness and boundary discontinuities arising from random-crop operations during inference. In this paper, we propose BSA-Net, which exploits an adaptive adjustment strategy in both feature position and loss optimization to establish long-range feature relationships and strengthen robust intermediate feature representations in boundary regions. Specifically, we propose a Spatial-adaptive Convolution (SConv) that employs a shuffle operation combined with lightweight convolution to directly establish cross-positional relationships within regions of potential relevance. Moreover, we develop the dual Boundary Prioritized loss, which enhances boundary precision by differentially weighting foreground and background boundaries, thus optimizing complex boundary regions. With the above technologies, the proposed method enjoys a better speed-accuracy trade-off than current methods. BSA-Net attains Dice scores of 92.55%, 91.42%, and 84.67% on the LA, Utah, and Waikato datasets, respectively, with only 2.16 M parameters, approximately 80% fewer than other contemporary state-of-the-art models. Extensive experimental results on three benchmark datasets demonstrate that BSA-Net consistently and significantly outperforms existing state-of-the-art methods.

A Chain of Diagnosis Framework for Accurate and Explainable Radiology Report Generation

Haibo Jin, Haoxuan Che, Sunan He, Hao Chen

arxiv logopreprintAug 13 2025
Despite the progress of radiology report generation (RRG), existing works face two challenges: 1) the performance in clinical efficacy is unsatisfactory, especially for lesion attribute description; 2) the generated text lacks explainability, making it difficult for radiologists to trust the results. To address these challenges, we focus on a trustworthy RRG model, which not only generates accurate descriptions of abnormalities, but also provides the basis for its predictions. To this end, we propose a framework named chain of diagnosis (CoD), which maintains a chain of diagnostic process for clinically accurate and explainable RRG. It first generates question-answer (QA) pairs via diagnostic conversation to extract key findings, then prompts a large language model with QA diagnoses for accurate generation. To enhance explainability, a diagnosis grounding module is designed to match QA diagnoses and generated sentences, where the diagnoses act as a reference. Moreover, a lesion grounding module is designed to locate abnormalities in the image, further improving the working efficiency of radiologists. To facilitate label-efficient training, we propose an omni-supervised learning strategy with clinical consistency to leverage various types of annotations from different datasets. Our efforts lead to 1) an omni-labeled RRG dataset with QA pairs and lesion boxes; 2) an evaluation tool for assessing the accuracy of reports in describing lesion location and severity; 3) extensive experiments demonstrating the effectiveness of CoD, where it consistently outperforms both specialist and generalist models on two RRG benchmarks and shows promising explainability by accurately grounding generated sentences to QA diagnoses and images.

Quest for a clinically relevant medical image segmentation metric: the definition and implementation of Medical Similarity Index

Szuzina Fazekas, Bettina Katalin Budai, Viktor Bérczi, Pál Maurovich-Horvat, Zsolt Vizi

arxiv logopreprintAug 13 2025
Background: In the field of radiology and radiotherapy, accurate delineation of tissues and organs plays a crucial role in both diagnostics and therapeutics. While the gold standard remains expert-driven manual segmentation, many automatic segmentation methods are emerging. The evaluation of these methods primarily relies on traditional metrics that only incorporate geometrical properties and fail to adapt to various applications. Aims: This study aims to develop and implement a clinically relevant segmentation metric that can be adapted for use in various medical imaging applications. Methods: Bidirectional local distance was defined, and the points of the test contour were paired with points of the reference contour. After correcting for the distance between the test and reference centers of mass, the Euclidean distance was calculated between the paired points, and a score was given to each test point. The overall medical similarity index was calculated as the average score across all the test points. For demonstration, we used myoma and prostate datasets; nnUNet neural networks were trained for segmentation. Results: An easy-to-use, sustainable image processing pipeline was created using Python. The code is available in a public GitHub repository along with Google Colaboratory notebooks. The algorithm can handle multislice images with multiple masks per slice. A mask-splitting algorithm is also provided that can separate concave masks. We demonstrate the adaptability with prostate segmentation evaluation. Conclusions: A novel segmentation evaluation metric was implemented, and an open-access image processing pipeline was also provided, which can be easily used for automatic measurement of the clinical relevance of medical image segmentation.
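The procedure in the Methods section, pairing test-contour points with reference-contour points after a center-of-mass correction and averaging per-point scores, can be sketched as follows. The per-point scoring function here (linear falloff with a cutoff `tolerance`) is an assumption; the paper defines its own score via the bidirectional local distance:

```python
import numpy as np

def medical_similarity_index(test_pts, ref_pts, tolerance=5.0):
    """Sketch of the MSI procedure: COM correction, point pairing, scoring."""
    test_pts = np.asarray(test_pts, dtype=float)
    ref_pts = np.asarray(ref_pts, dtype=float)
    # Correct for the offset between the two centers of mass
    test_centered = test_pts - test_pts.mean(axis=0)
    ref_centered = ref_pts - ref_pts.mean(axis=0)
    # Pair each test point with its nearest reference point
    dists = np.linalg.norm(
        test_centered[:, None, :] - ref_centered[None, :, :], axis=-1)
    nearest = dists.min(axis=1)
    # Score each test point (assumed linear falloff), then average
    scores = np.clip(1.0 - nearest / tolerance, 0.0, 1.0)
    return scores.mean()

square = [(0, 0), (0, 4), (4, 4), (4, 0)]
shifted = [(10, 10), (10, 14), (14, 14), (14, 10)]  # same shape, translated
print(medical_similarity_index(shifted, square))  # COM correction -> 1.0
```

Because of the center-of-mass correction, a purely translated contour scores perfectly, so the metric focuses on shape agreement rather than absolute position.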

AST-n: A Fast Sampling Approach for Low-Dose CT Reconstruction using Diffusion Models

Tomás de la Sotta, José M. Saavedra, Héctor Henríquez, Violeta Chang, Aline Xavier

arxiv logopreprintAug 13 2025
Low-dose CT (LDCT) protocols reduce radiation exposure but increase image noise, compromising diagnostic confidence. Diffusion-based generative models have shown promise for LDCT denoising by learning image priors and performing iterative refinement. In this work, we introduce AST-n, an accelerated inference framework that initiates reverse diffusion from intermediate noise levels, and integrate high-order ODE solvers within conditioned models to further reduce sampling steps. We evaluate two acceleration paradigms, AST-n sampling and standard scheduling with high-order solvers, on the Low Dose CT Grand Challenge dataset, covering head, abdominal, and chest scans at 10-25% of the standard dose. Conditioned models using only 25 steps (AST-25) achieve peak signal-to-noise ratio (PSNR) above 38 dB and structural similarity index (SSIM) above 0.95, closely matching standard baselines while cutting inference time from ~16 s to under 1 s per slice. Unconditional sampling suffers substantial quality loss, underscoring the necessity of conditioning. We also assess DDIM inversion, which yields marginal PSNR gains at the cost of doubling inference time, limiting its clinical practicality. Our results demonstrate that AST-n with high-order samplers enables rapid LDCT reconstruction without significant loss of image fidelity, advancing the feasibility of diffusion-based methods in clinical workflows.
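PSNR, the fidelity metric quoted above, is defined as 10·log10(MAX² / MSE), where MAX is the image's peak value. A minimal sketch on a synthetic image (the noise level is illustrative):

```python
import numpy as np

def psnr(reference, denoised, data_range=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    ref = np.asarray(reference, dtype=float)
    den = np.asarray(denoised, dtype=float)
    mse = np.mean((ref - den) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

rng = np.random.default_rng(1)
clean = rng.random((64, 64))
noisy = np.clip(clean + rng.normal(0, 0.01, clean.shape), 0, 1)
print(f"{psnr(clean, noisy):.1f} dB")  # sigma = 0.01 noise -> roughly 40 dB
```

On this scale, the >38 dB reported for AST-25 corresponds to a mean squared error below about 1.6e-4 on unit-range images, i.e. residual noise that is visually negligible.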

Exploring the robustness of TractOracle methods in RL-based tractography.

Levesque J, Théberge A, Descoteaux M, Jodoin PM

pubmed logopapersAug 13 2025
Tractography algorithms leverage diffusion MRI to reconstruct the fibrous architecture of the brain's white matter. Among machine learning approaches, reinforcement learning (RL) has emerged as a promising framework for tractography, outperforming traditional methods in several key aspects. TractOracle-RL, a recent RL-based approach, reduces false positives by incorporating anatomical priors into the training process via a reward-based mechanism. In this paper, we investigate four extensions of the original TractOracle-RL framework by integrating recent advances in RL, and we evaluate their performance across five diverse diffusion MRI datasets. Results demonstrate that combining an oracle with the RL framework consistently leads to robust and reliable tractography, regardless of the specific method or dataset used. We also introduce a novel RL training scheme called Iterative Reward Training (IRT), inspired by the Reinforcement Learning from Human Feedback (RLHF) paradigm. Instead of relying on human input, IRT leverages bundle filtering methods to iteratively refine the oracle's guidance throughout training. Experimental results show that RL methods trained with oracle feedback significantly outperform widely used tractography techniques in terms of accuracy and anatomical validity.