
Deep learning-driven modality imputation and subregion segmentation to enhance high-grade glioma grading.

Yu J, Liu Q, Xu C, Zhou Q, Xu J, Zhu L, Chen C, Zhou Y, Xiao B, Zheng L, Zhou X, Zhang F, Ye Y, Mi H, Zhang D, Yang L, Wu Z, Wang J, Chen M, Zhou Z, Wang H, Wang VY, Wang E, Xu D

pubmed · May 30, 2025
This study aims to develop a deep learning framework that leverages modality imputation and subregion segmentation to improve grading accuracy in high-grade gliomas. A retrospective analysis was conducted using data from 1,251 patients in the BraTS2021 dataset as the main cohort and 181 clinical cases collected from a medical center between April 2013 and June 2018 (mean age, 51 ± 17 years; 104 men) as the external test set. We propose a PatchGAN-based modality imputation network with an Aggregated Residual Transformer (ART) module that combines Transformer self-attention and CNN feature extraction via residual links, paired with a U-Net variant for segmentation. Imputation quality was assessed with PSNR and SSIM for each modality conversion, while segmentation performance was measured with DSC and HD95 across the necrotic core (NCR), edema (ED), and enhancing tumor (ET) regions. Senior radiologists conducted a comprehensive Likert-based assessment, with diagnostic accuracy evaluated by AUC. Statistical analysis was performed using the Wilcoxon signed-rank test and the DeLong test. The best source-target modality pairs for imputation were T1 to T1ce and T1ce to T2 (p < 0.001). In subregion segmentation, the overall DSC was 0.878 and HD95 was 19.491, with the ET region showing the highest segmentation accuracy (DSC: 0.877, HD95: 12.149). Clinical validation showed improved grading accuracy for the senior radiologist, with the AUC increasing from 0.718 to 0.913 (p < 0.001) when the combined imputation and segmentation models were used. The proposed deep learning framework improves high-grade glioma grading through modality imputation and subregion segmentation, supporting senior radiologists and offering potential to advance clinical decision-making.
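
As a rough illustration of the two segmentation metrics reported above, the sketch below computes DSC and a voxel-based HD95 between hypothetical binary subregion masks with NumPy/SciPy; it is not the authors' evaluation code, and the masks are placeholders.

```python
# Minimal sketch (not the authors' code): Dice coefficient (DSC) and 95th-percentile
# Hausdorff distance (HD95) between two binary subregion masks, assuming NumPy arrays.
import numpy as np
from scipy.spatial.distance import cdist

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * inter / denom if denom else 1.0

def hd95(pred: np.ndarray, gt: np.ndarray) -> float:
    # Voxel-based approximation: uses all foreground voxel coordinates (not surfaces).
    p, g = np.argwhere(pred), np.argwhere(gt)
    if len(p) == 0 or len(g) == 0:
        return np.inf
    d = cdist(p, g)  # pairwise Euclidean distances
    return max(np.percentile(d.min(axis=1), 95), np.percentile(d.min(axis=0), 95))

# Example: evaluate a hypothetical enhancing-tumor (ET) prediction against ground truth.
pred_et = np.zeros((64, 64, 64), bool); pred_et[20:30, 20:30, 20:30] = True
gt_et   = np.zeros((64, 64, 64), bool); gt_et[22:32, 20:30, 20:30] = True
print(f"DSC={dice(pred_et, gt_et):.3f}, HD95={hd95(pred_et, gt_et):.2f}")
```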

Deep learning reconstruction improves computer-aided pulmonary nodule detection and measurement accuracy for ultra-low-dose chest CT.

Wang J, Zhu Z, Pan Z, Tan W, Han W, Zhou Z, Hu G, Ma Z, Xu Y, Ying Z, Sui X, Jin Z, Song L, Song W

pubmed · May 30, 2025
To compare image quality, pulmonary nodule detectability, and measurement accuracy between deep learning reconstruction (DLR) and hybrid iterative reconstruction (HIR) of chest ultra-low-dose CT (ULDCT). Participants who underwent chest standard-dose CT (SDCT) followed by ULDCT from October 2020 to January 2022 were prospectively included. ULDCT images reconstructed with HIR and DLR were compared with SDCT images to evaluate image quality, nodule detection rate, and measurement accuracy using a commercially available deep learning-based nodule evaluation system. The Wilcoxon signed-rank test was used to evaluate the percentage errors of nodule size and nodule volume between HIR and DLR images. Eighty-four participants (54 ± 13 years; 26 men) were finally enrolled. The effective radiation doses of ULDCT and SDCT were 0.16 ± 0.02 mSv and 1.77 ± 0.67 mSv, respectively (P < 0.001). The mean ± standard deviation of lung tissue noise was 61.4 ± 3.0 HU for SDCT, and 61.5 ± 2.8 HU and 55.1 ± 3.4 HU for ULDCT reconstructed with the HIR-Strong setting (HIR-Str) and the DLR-Strong setting (DLR-Str), respectively (P < 0.001). A total of 535 nodules were detected. The nodule detection rates of ULDCT HIR-Str and ULDCT DLR-Str were 74.0% and 83.4%, respectively (P < 0.001). The absolute percentage error in nodule volume relative to SDCT was 19.5% for ULDCT HIR-Str versus 17.9% for ULDCT DLR-Str (P < 0.001). Compared with HIR, DLR reduced image noise, increased the nodule detection rate, and improved the measurement accuracy of nodule volume at chest ULDCT.
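
The paired HIR-versus-DLR comparison of nodule-volume percentage errors can be illustrated with a Wilcoxon signed-rank test, as in the minimal sketch below; the volume arrays are hypothetical stand-ins, not study data.

```python
# Minimal sketch (illustrative only): paired comparison of per-nodule volume percentage
# errors (vs. an SDCT reference) between HIR and DLR reconstructions, using the
# Wilcoxon signed-rank test. All values below are synthetic placeholders.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
vol_sdct = rng.uniform(50, 500, size=100)                 # reference volumes (mm^3)
vol_hir  = vol_sdct * (1 + rng.normal(0.10, 0.15, 100))   # hypothetical HIR measurements
vol_dlr  = vol_sdct * (1 + rng.normal(0.05, 0.12, 100))   # hypothetical DLR measurements

err_hir = np.abs(vol_hir - vol_sdct) / vol_sdct * 100     # absolute % error per nodule
err_dlr = np.abs(vol_dlr - vol_sdct) / vol_sdct * 100

stat, p = wilcoxon(err_hir, err_dlr)
print(f"median error HIR={np.median(err_hir):.1f}%, DLR={np.median(err_dlr):.1f}%, p={p:.4f}")
```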

Machine learning-based hemodynamics quantitative assessment of pulmonary circulation using computed tomographic pulmonary angiography.

Xie H, Zhao X, Zhang N, Liu J, Yang G, Cao Y, Xu J, Xu L, Sun Z, Wen Z, Chai S, Liu D

pubmed · May 30, 2025
Pulmonary hypertension (PH) is a severe disease of the pulmonary circulation. Right heart catheterization (RHC) is the gold standard for quantitative evaluation of pulmonary hemodynamics, but accurate and noninvasive quantitative evaluation remains challenging given the limitations of currently available assessment methods. Patients who underwent computed tomographic pulmonary angiography (CTPA) and RHC examinations within 2 weeks were included. The dataset was randomly divided into a training set and a test set at an 8:2 ratio. A radiomic feature model and a two-dimensional (2D) feature model were constructed to quantitatively evaluate pulmonary hemodynamics. Model performance was determined by calculating the mean squared error, the intraclass correlation coefficient (ICC), and the area under the precision-recall curve (AUC-PR), and by performing Bland-Altman analyses. A total of 345 patients were identified: 271 with PH (mean age, 50 ± 17 years; 93 men) and 74 without PH (mean age, 55 ± 16 years; 26 men). The hemodynamic predictions of the radiomic feature model, which integrated 5 2D features and 30 radiomic features, were consistent with the RHC results and outperformed those of the 2D feature model. The radiomic feature model showed moderate to good reproducibility in predicting pulmonary hemodynamic parameters (ICC up to 0.87). In addition, PH could be accurately identified with a classification model (AUC-PR = 0.99). This study provides a noninvasive method for comprehensively and quantitatively evaluating pulmonary hemodynamics from CTPA images, which has the potential to serve as an alternative to RHC, pending further validation.
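
To make the 8:2 split and the AUC-PR metric concrete, the sketch below fits a simple classifier on a hypothetical feature table and computes the area under the precision-recall curve; the features, labels, and model are placeholders, not the study's radiomics pipeline.

```python
# Minimal sketch (not the study's pipeline): 8:2 train/test split and area under the
# precision-recall curve (AUC-PR) for PH vs. non-PH classification from tabular
# CTPA-derived features. Feature matrix and labels are hypothetical placeholders.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve, auc

rng = np.random.default_rng(42)
X = rng.normal(size=(345, 35))                      # e.g. 5 2D features + 30 radiomic features
y = (rng.uniform(size=345) < 271 / 345).astype(int)  # roughly 271 PH vs. 74 non-PH labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

prec, rec, _ = precision_recall_curve(y_te, clf.predict_proba(X_te)[:, 1])
print(f"AUC-PR = {auc(rec, prec):.3f}")
```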

Deep learning enables fast and accurate quantification of MRI-guided near-infrared spectral tomography for breast cancer diagnosis.

Feng J, Tang Y, Lin S, Jiang S, Xu J, Zhang W, Geng M, Dang Y, Wei C, Li Z, Sun Z, Jia K, Pogue BW, Paulsen KD

pubmed · May 29, 2025
The utilization of magnetic resonance (MR) imaging to guide near-infrared spectral tomography (NIRST) shows significant potential for improving the specificity and sensitivity of breast cancer diagnosis. However, the efficiency and accuracy of NIRST image reconstruction have been limited by the complexities of light propagation modeling and MRI image segmentation. To address these challenges, we developed and evaluated a deep learning-based approach for MR-guided 3D NIRST image reconstruction (DL-MRg-NIRST). Using a network trained on synthetic data, the DL-MRg-NIRST system reconstructed images from data acquired during 38 clinical imaging exams of patients with breast abnormalities. Statistical analysis of the results demonstrated a sensitivity of 87.5%, a specificity of 92.9%, and a diagnostic accuracy of 89.5% in distinguishing pathologically defined benign from malignant lesions. Additionally, the combined use of MRI and DL-MRg-NIRST diagnoses achieved an area under the receiver operating characteristic (ROC) curve of 0.98. Remarkably, the DL-MRg-NIRST image reconstruction process required only 1.4 seconds, significantly faster than state-of-the-art MR-guided NIRST methods.
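
The diagnostic metrics quoted above (sensitivity, specificity, accuracy, ROC AUC) can be computed from per-exam scores as in this minimal sketch; the labels and scores are hypothetical, not the clinical data.

```python
# Minimal sketch (illustrative, not the authors' evaluation code): sensitivity,
# specificity, accuracy, and ROC AUC from per-lesion malignancy scores.
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

labels = np.array([1] * 16 + [0] * 22)   # 1 = malignant, 0 = benign (hypothetical 38 exams)
scores = np.clip(labels * 0.6 + np.random.default_rng(1).normal(0.3, 0.2, 38), 0, 1)
preds = (scores >= 0.5).astype(int)      # threshold the continuous scores

tn, fp, fn, tp = confusion_matrix(labels, preds).ravel()
print(f"sensitivity={tp / (tp + fn):.3f}, specificity={tn / (tn + fp):.3f}, "
      f"accuracy={(tp + tn) / len(labels):.3f}, AUC={roc_auc_score(labels, scores):.3f}")
```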

CT-denoimer: efficient contextual transformer network for low-dose CT denoising.

Zhang Y, Xu F, Zhang R, Guo Y, Wang H, Wei B, Ma F, Meng J, Liu J, Lu H, Chen Y

pubmed · May 29, 2025
Low-dose computed tomography (LDCT) effectively reduces radiation exposure to patients but introduces severe noise artifacts that affect diagnostic accuracy. Recently, Transformer-based network architectures have been widely applied to LDCT image denoising, generally achieving superior results compared to traditional convolutional methods. However, these methods are often hindered by high computational costs and struggle to capture complex local contextual features, which negatively impacts denoising performance. In this work, we propose CT-Denoimer, an efficient CT Denoising Transformer network that captures both global correlations and intricate, spatially varying local contextual details in CT images, enabling the generation of high-quality images. The core of our framework is a Transformer module that consists of two key components: the Multi-Dconv head Transposed Attention (MDTA) and the Mixed Contextual Feed-forward Network (MCFN). The MDTA block captures global correlations in the image with linear computational complexity, while the MCFN block manages multi-scale local contextual information, both static and dynamic, through a series of Enhanced Contextual Transformer (eCoT) modules. In addition, we incorporate Operation-Wise Attention Layers (OWALs) to enable collaborative refinement in the proposed CT-Denoimer, enhancing its ability to handle complex and varying noise patterns in LDCT images. Extensive experimental validation on both the AAPM-Mayo public dataset and a real-world clinical dataset demonstrated the state-of-the-art performance of the proposed CT-Denoimer: it achieved a peak signal-to-noise ratio (PSNR) of 33.681 dB, a structural similarity index measure (SSIM) of 0.921, an information fidelity criterion (IFC) of 2.857, and a visual information fidelity (VIF) of 0.349. Subjective assessment by radiologists gave an average score of 4.39, confirming its clinical applicability and clear advantages over existing methods. This study presents an innovative CT denoising Transformer network that sets a new benchmark in LDCT image denoising, excelling in both noise reduction and fine structure preservation.
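
For reference, the image-quality metrics PSNR and SSIM reported above can be computed with scikit-image as in the sketch below; the reference and denoised slices are synthetic placeholders, not AAPM-Mayo data.

```python
# Minimal sketch (hypothetical data, not the paper's code): computing PSNR and SSIM
# between a denoised LDCT slice and its normal-dose reference.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

rng = np.random.default_rng(0)
reference = rng.uniform(0, 1, size=(512, 512)).astype(np.float32)               # stand-in NDCT slice
denoised = np.clip(reference + rng.normal(0, 0.02, (512, 512)), 0, 1).astype(np.float32)

psnr = peak_signal_noise_ratio(reference, denoised, data_range=1.0)
ssim = structural_similarity(reference, denoised, data_range=1.0)
print(f"PSNR={psnr:.2f} dB, SSIM={ssim:.3f}")
```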

Super-temporal-resolution Photoacoustic Imaging with Dynamic Reconstruction through Implicit Neural Representation in Sparse-view

Youshen Xiao, Yiling Shi, Ruixi Sun, Hongjiang Wei, Fei Gao, Yuyao Zhang

arXiv preprint · May 29, 2025
Dynamic Photoacoustic Computed Tomography (PACT) is an important imaging technique for monitoring physiological processes, capable of providing high-contrast images of optical absorption at much greater depths than traditional optical imaging methods. However, practical instrumentation and geometric constraints limit the number of acoustic sensors available around the imaging target, leading to sparsity in sensor data. Traditional photoacoustic (PA) image reconstruction methods, when directly applied to sparse PA data, produce severe artifacts, and they do not account for inter-frame relationships in dynamic imaging. Temporal resolution is crucial for dynamic photoacoustic imaging, yet it is fundamentally limited by the low repetition rate (e.g., 20 Hz) and high cost of high-power laser technology. Recently, Implicit Neural Representation (INR) has emerged as a powerful deep learning tool for solving inverse problems with sparse data by characterizing signal properties as continuous functions of their coordinates in an unsupervised manner. In this work, we propose an INR-based method to improve dynamic photoacoustic image reconstruction from sparse views and enhance temporal resolution, using only spatiotemporal coordinates as input. Specifically, the proposed INR represents dynamic photoacoustic images as implicit functions and encodes them into a neural network. The weights of the network are learned solely from the acquired sparse sensor data, without the need for external training datasets or prior images. Benefiting from the strong implicit continuity regularization provided by INR, as well as explicit low-rank and sparsity regularization, our method outperforms traditional reconstruction methods under two different sparsity conditions, effectively suppressing artifacts and preserving image quality.
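
A minimal sketch of the general INR idea, assuming a plain coordinate MLP that maps (x, y, t) to image intensity, is shown below; the authors' actual network, coordinate encoding, and physics-based data-consistency terms are not reproduced here.

```python
# Minimal sketch of a coordinate-based implicit neural representation (INR): an MLP that
# maps spatiotemporal coordinates (x, y, t) to image intensity. In a full pipeline its
# weights would be fit to sparse-view measurements through a forward operator (omitted).
import torch
import torch.nn as nn

class DynamicINR(nn.Module):
    def __init__(self, hidden: int = 256, layers: int = 5):
        super().__init__()
        blocks, in_dim = [], 3                     # input: (x, y, t)
        for _ in range(layers):
            blocks += [nn.Linear(in_dim, hidden), nn.ReLU()]
            in_dim = hidden
        blocks += [nn.Linear(hidden, 1)]           # output: predicted intensity
        self.net = nn.Sequential(*blocks)

    def forward(self, coords: torch.Tensor) -> torch.Tensor:
        return self.net(coords)

# Query a 128x128 frame at time t = 0.5 (all values illustrative).
xs, ys = torch.meshgrid(torch.linspace(-1, 1, 128), torch.linspace(-1, 1, 128), indexing="ij")
coords = torch.stack([xs, ys, torch.full_like(xs, 0.5)], dim=-1).reshape(-1, 3)
frame = DynamicINR()(coords).reshape(128, 128)
print(frame.shape)  # torch.Size([128, 128])
```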

Interpreting Chest X-rays Like a Radiologist: A Benchmark with Clinical Reasoning

Jinquan Guan, Qi Chen, Lizhou Liang, Yuhang Liu, Vu Minh Hieu Phan, Minh-Son To, Jian Chen, Yutong Xie

arXiv preprint · May 29, 2025
Artificial intelligence (AI)-based chest X-ray (CXR) interpretation assistants have demonstrated significant progress and are increasingly being applied in clinical settings. However, contemporary medical AI models often adhere to a simplistic input-to-output paradigm, directly processing an image and an instruction to generate a result, where the instructions may be integral to the model's architecture. This approach overlooks the modeling of the inherent diagnostic reasoning in chest X-ray interpretation. Such reasoning is typically sequential, where each interpretive stage considers the images, the current task, and the contextual information from previous stages. This oversight leads to several shortcomings, including misalignment with clinical scenarios, contextless reasoning, and untraceable errors. To fill this gap, we construct CXRTrek, a new multi-stage visual question answering (VQA) dataset for CXR interpretation. The dataset is designed to explicitly simulate the diagnostic reasoning process employed by radiologists in real-world clinical settings for the first time. CXRTrek covers 8 sequential diagnostic stages, comprising 428,966 samples and over 11 million question-answer (Q&A) pairs, with an average of 26.29 Q&A pairs per sample. Building on the CXRTrek dataset, we propose a new vision-language large model (VLLM), CXRTrekNet, specifically designed to incorporate the clinical reasoning flow into the VLLM framework. CXRTrekNet effectively models the dependencies between diagnostic stages and captures reasoning patterns within the radiological context. Trained on our dataset, the model consistently outperforms existing medical VLLMs on the CXRTrek benchmarks and demonstrates superior generalization across multiple tasks on five diverse external datasets. The dataset and model can be found in our repository (https://github.com/guanjinquan/CXRTrek).
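
To illustrate how a multi-stage VQA sample might chain questions so that later stages can condition on earlier answers, here is a hypothetical record layout; the actual CXRTrek schema, stage names, and questions may differ.

```python
# Hypothetical record layout (not the published schema) for a multi-stage CXR VQA sample,
# where each stage can condition on the answers from previous stages.
sample = {
    "image": "cxr_000123.png",
    "stages": [
        {"stage": 1, "name": "image quality check",
         "qa": [{"q": "Is the projection PA or AP?", "a": "PA"}]},
        {"stage": 2, "name": "abnormality detection",
         "qa": [{"q": "Is there an abnormality in the lung fields?", "a": "Yes"}]},
        # ... later stages would receive the image plus the answers above as context.
    ],
}

# Flatten into (context, question, answer) triples for sequential training.
context = []
for stage in sample["stages"]:
    for qa in stage["qa"]:
        print({"context": list(context), "question": qa["q"], "answer": qa["a"]})
        context.append(f'{qa["q"]} {qa["a"]}')
```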

Image Aesthetic Reasoning: A New Benchmark for Medical Image Screening with MLLMs

Zheng Sun, Yi Wei, Long Yu

arXiv preprint · May 29, 2025
Multimodal Large Language Models (MLLMs) have broad applications across many domains, such as multimodal understanding and generation. With the development of diffusion models (DMs) and unified MLLMs, image generation performance has improved significantly; however, image screening remains understudied, and MLLM performance on it is unsatisfactory due to the lack of data and the weak image aesthetic reasoning ability of MLLMs. In this work, we propose a complete solution to address these problems in terms of both data and methodology. For data, we collect a comprehensive medical image screening dataset with 1500+ samples, where each sample consists of a medical image, four generated images, and a multiple-choice answer. The dataset evaluates aesthetic reasoning ability under four aspects: (1) Appearance Deformation, (2) Principles of Physical Lighting and Shadow, (3) Placement Layout, and (4) Extension Rationality. For methodology, we utilize long chains of thought (CoT) and Group Relative Policy Optimization with a Dynamic Proportional Accuracy reward, called DPA-GRPO, to enhance the image aesthetic reasoning ability of MLLMs. Our experimental results reveal that even state-of-the-art closed-source MLLMs, such as GPT-4o and Qwen-VL-Max, perform close to random guessing in image aesthetic reasoning. In contrast, by leveraging the reinforcement learning approach, we are able to surpass both large-scale models and leading closed-source models using a much smaller model. We hope our work on medical image screening will serve as a standard configuration for image aesthetic reasoning in the future.
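
The group-relative advantage computation at the heart of GRPO-style training can be sketched as below; the Dynamic Proportional Accuracy reward itself is the authors' contribution, so the rewards here are placeholder values.

```python
# Minimal sketch of the group-relative advantage used in GRPO-style optimization:
# rewards for a group of candidate answers to the same prompt are normalized within
# the group. The reward values are placeholders, not the DPA reward.
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_groups, group_size) rewards for sampled answers to the same prompt."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)   # normalize within each group

# Two prompts, four sampled answers each; higher reward = closer to the correct choice.
rewards = torch.tensor([[1.0, 0.0, 0.5, 0.0],
                        [0.0, 0.0, 1.0, 1.0]])
print(group_relative_advantages(rewards))
```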

Deep Learning-Based Breast Cancer Detection in Mammography: A Multi-Center Validation Study in Thai Population

Isarun Chamveha, Supphanut Chaiyungyuen, Sasinun Worakriangkrai, Nattawadee Prasawang, Warasinee Chaisangmongkon, Pornpim Korpraphong, Voraparee Suvannarerg, Shanigarn Thiravit, Chalermdej Kannawat, Kewalin Rungsinaporn, Suwara Issaragrisil, Payia Chadbunchachai, Pattiya Gatechumpol, Chawiporn Muktabhant, Patarachai Sereerat

arXiv preprint · May 29, 2025
This study presents a deep learning system for breast cancer detection in mammography, developed using a modified EfficientNetV2 architecture with enhanced attention mechanisms. The model was trained on mammograms from a major Thai medical center and validated on three distinct datasets: an in-domain test set (9,421 cases), a biopsy-confirmed set (883 cases), and an out-of-domain generalizability set (761 cases) collected from two different hospitals. For cancer detection, the model achieved AUROCs of 0.89, 0.96, and 0.94 on the respective datasets. The system's lesion localization capability, evaluated using metrics including Lesion Localization Fraction (LLF) and Non-Lesion Localization Fraction (NLF), demonstrated robust performance in identifying suspicious regions. Clinical validation through concordance tests showed strong agreement with radiologists: 83.5% classification and 84.0% localization concordance for biopsy-confirmed cases, and 78.1% classification and 79.6% localization concordance for out-of-domain cases. Expert radiologists' acceptance rate averaged 96.7% for biopsy-confirmed cases and 89.3% for out-of-domain cases. The system achieved a System Usability Scale score of 74.17 for the source hospital and 69.20 for the validation hospitals, indicating good clinical acceptance. These results demonstrate the model's effectiveness in assisting mammogram interpretation, with the potential to enhance breast cancer screening workflows in clinical practice.
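
As a quick reference for the localization metrics named above, LLF is the fraction of true lesions that receive a correct model mark and NLF is the number of false-positive marks per image; the sketch below uses hypothetical counts.

```python
# Minimal sketch (hypothetical counts, not the study's evaluation code) of the two
# localization metrics: LLF = correctly localized lesions / all true lesions;
# NLF = false-positive marks / number of images.
true_lesions = 120   # total annotated lesions in a test set (hypothetical)
localized = 102      # lesions hit by a model mark (e.g., center-hit or IoU rule)
false_marks = 37     # model marks not matching any lesion
num_images = 883     # images evaluated

llf = localized / true_lesions
nlf = false_marks / num_images
print(f"LLF = {llf:.3f}, NLF = {nlf:.3f} false positives per image")
```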

Integrating SEResNet101 and SE-VGG19 for advanced cervical lesion detection: a step forward in precision oncology.

Ye Y, Chen Y, Pan J, Li P, Ni F, He H

pubmed · May 28, 2025
Cervical cancer remains a significant global health issue, with accurate differentiation between low-grade (LSIL) and high-grade squamous intraepithelial lesions (HSIL) crucial for effective screening and management. Current methods, such as Pap smears and HPV testing, often fall short in sensitivity and specificity. Deep learning models hold the potential to enhance the accuracy of cervical cancer screening but require thorough evaluation to ascertain their practical utility. This study compares the performance of two advanced deep learning models, SEResNet101 and SE-VGG19, in classifying cervical lesions using a dataset of 3,305 high-quality colposcopy images. We assessed the models based on their accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). The SEResNet101 model demonstrated superior performance over SE-VGG19 across all evaluated metrics. Specifically, SEResNet101 achieved a sensitivity of 95%, a specificity of 97%, and an AUC of 0.98, compared to 89% sensitivity, 93% specificity, and an AUC of 0.94 for SE-VGG19. These findings suggest that SEResNet101 could significantly reduce both over- and under-treatment rates by enhancing diagnostic precision. Our results indicate that SEResNet101 offers a promising enhancement over existing screening methods, integrating advanced deep learning algorithms to significantly improve the precision of cervical lesion classification. This study advocates for the inclusion of SEResNet101 in clinical workflows to enhance cervical cancer screening protocols, thereby improving patient outcomes. Future work should focus on multicentric trials to validate these findings and facilitate widespread clinical adoption.
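
For context, the "SE" in SEResNet101 and SE-VGG19 refers to squeeze-and-excitation channel attention; the sketch below shows a generic SE block in PyTorch with the standard reduction ratio, which may differ from the authors' exact configuration.

```python
# Minimal sketch of a generic squeeze-and-excitation (SE) block of the kind that gives
# SEResNet101 / SE-VGG19 their names. Reduction ratio and placement follow common
# defaults, not necessarily the authors' configuration.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # squeeze: global spatial average
        self.fc = nn.Sequential(                       # excitation: per-channel gating
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                   # reweight each feature channel

x = torch.randn(2, 256, 32, 32)
print(SEBlock(256)(x).shape)  # torch.Size([2, 256, 32, 32])
```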