Latest Papers on Radiology AI. Tags: Benchmark SOTA, Order: Best Match, Limit: 10.

Deep learning aging marker from retinal images unveils sex-specific clinical and genetic signatures

Trofimova, O., Böttger, L., Bors, S., Pan, Y., Liefers, B., Beyeler, M. J., Presby, D. M., Bontempi, D., Hastings, J., Klaver, C. C. W., Bergmann, S.

•preprint•Jul 29 2025

Retinal fundus images offer a non-invasive window into systemic aging. Here, we fine-tuned a foundation model (RETFound) to predict chronological age from color fundus images in 71,343 participants from the UK Biobank, achieving a mean absolute error of 2.85 years. The resulting retinal age gap (RAG), i.e., the difference between predicted and chronological age, was associated with cardiometabolic traits, inflammation, cognitive performance, mortality, dementia, cancer, and incident cardiovascular disease. Genome-wide analyses identified genes related to longevity, metabolism, neurodegeneration, and age-related eye diseases. Sex-stratified models revealed consistent performance but divergent biological signatures: males had younger-appearing retinas and stronger links to metabolic syndrome, while in females, both model attention and genetic associations pointed to a greater involvement of retinal vasculature. Our study positions retinal aging as a biologically meaningful and sex-sensitive biomarker that can support more personalized approaches to risk assessment and aging-related healthcare.

OCT Registration Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Multi-Faceted Consistency learning with active cross-labeling for barely-supervised 3D medical image segmentation.

Wu X, Xu Z, Tong RK

•papers•Jul 29 2025

Deep learning-driven 3D medical image segmentation generally necessitates dense voxel-wise annotations, which are expensive and labor-intensive to acquire. Cross-annotation, which labels only a few orthogonal slices per scan, has recently emerged as a cost-effective alternative that better preserves the shape and precise boundaries of the 3D object than traditional weak labeling methods such as bounding boxes and scribbles. However, learning from such sparse labels, referred to as barely-supervised learning (BSL), remains challenging due to less fine-grained object perception, less compact class features and inferior generalizability. To tackle these challenges and foster collaboration between model training and human expertise, we propose a Multi-Faceted ConSistency learning (MF-ConS) framework with a Diversity and Uncertainty Sampling-based Active Learning (DUS-AL) strategy, specifically designed for the active BSL scenario. This framework combines a cross-annotation BSL strategy, where only three orthogonal slices are labeled per scan, with an AL paradigm guided by DUS to direct human-in-the-loop annotation toward the most informative volumes under a fixed budget. Built upon a teacher-student architecture, MF-ConS integrates three complementary consistency regularization modules: (i) neighbor-informed object prediction consistency for advancing fine-grained object perception by encouraging the student model to infer complete segmentation from masked inputs; (ii) prototype-driven consistency, which enhances intra-class compactness and discriminativeness by aligning latent feature and decision spaces using fused prototypes; and (iii) stability constraint that promotes model robustness against input perturbations. Extensive experiments on three benchmark datasets demonstrate that MF-ConS (DUS-AL) consistently outperforms state-of-the-art methods under extremely limited annotation.

Mixed Modality Segmentation Methodology In Silico Academic Lab Benchmark SOTA

Radiomics, machine learning, and deep learning for hippocampal sclerosis identification: a systematic review and diagnostic meta-analysis.

Baptista JM, Brenner LO, Koga JV, Ohannesian VA, Ito LA, Nabarro PH, Santos LP, Henrique A, de Oliveira Almeida G, Berbet LU, Paranhos T, Nespoli V, Bertani R

•papers•Jul 29 2025

Hippocampal sclerosis (HS) is the primary pathological finding in temporal lobe epilepsy (TLE) and a common cause of refractory seizures. Conventional diagnostic methods, such as EEG and MRI, have limitations. Artificial intelligence (AI) and radiomics, utilizing machine learning and deep learning, offer a non-invasive approach to enhance diagnostic accuracy. This study synthesized recent AI and radiomics research to improve HS detection in TLE. PubMed/Medline, Embase, and Web of Science were systematically searched following PRISMA-DTA guidelines until May 2024. Statistical analysis was conducted using STATA 14. A bivariate model was used to pool sensitivity (SEN) and specificity (SPE) for HS detection, with I2 assessing heterogeneity. Six studies were included. The pooled sensitivity and specificity of AI-based models for HS detection in medial temporal lobe epilepsy (MTLE) were 0.91 (95 % CI: 0.83-0.96; I2 = 71.48 %) and 0.9 (95 % CI: 0.83-0.94; I2 = 69.62 %), with an AUC of 0.96. AI alone showed higher sensitivity (0.92) and specificity (0.93) than AI combined with radiomics (sensitivity: 0.88; specificity: 0.9). Among algorithms, support vector machine (SVM) had the highest performance (SEN: 0.92; SPE: 0.95), followed by convolutional neural networks (CNN) and logistic regression (LR). AI models, particularly SVM, demonstrate high accuracy in detecting HS, with AI alone outperforming its combination with radiomics. These findings support the integration of AI into non-invasive diagnostic workflows, potentially enabling earlier detection and more personalized clinical decision-making in epilepsy care-ultimately contributing to improved patient outcomes and behavioral management.

MRI Classification Neurological Meta Analysis In Silico Academic Lab Benchmark SOTA

MFFBi-Unet: Merging Dynamic Sparse Attention and Multi-scale Feature Fusion for Medical Image Segmentation.

Sun B, Liu C, Wang Q, Bi K, Zhang W

•papers•Jul 29 2025

The advancement of deep learning has driven extensive research validating the effectiveness of U-Net-style symmetric encoder-decoder architectures based on Transformers for medical image segmentation. However, the inherent design requiring attention mechanisms to compute token affinities across all spatial locations leads to prohibitive computational complexity and substantial memory demands. Recent efforts have attempted to address these limitations through sparse attention mechanisms. However, existing approaches employing artificial, content-agnostic sparse attention patterns demonstrate limited capability in modeling long-range dependencies effectively. We propose MFFBi-Unet, a novel architecture incorporating dynamic sparse attention through bi-level routing, enabling context-aware computation allocation with enhanced adaptability. The encoder-decoder module integrates BiFormer to optimize semantic feature extraction and facilitate high-fidelity feature map reconstruction. A novel Multi-scale Feature Fusion (MFF) module in skip connections synergistically combines multi-level contextual information with processed multi-scale features. Extensive evaluations on multiple public medical benchmarks demonstrate that our method consistently exhibits significant advantages. Notably, our method achieves statistically significant improvements, outperforming state-of-the-art approaches like MISSFormer by 2.02% and 1.28% Dice scores on respective benchmarks.

Mixed Modality Segmentation Methodology In Silico Benchmark SOTA

A data assimilation framework for predicting the spatiotemporal response of high-grade gliomas to chemoradiation.

Miniere HJM, Hormuth DA, Lima EABF, Farhat M, Panthi B, Langshaw H, Shanker MD, Talpur W, Thrower S, Goldman J, Ty S, Chung C, Yankeelov TE

•papers•Jul 29 2025

High-grade gliomas are highly invasive and respond variably to chemoradiation. Accurate, patient-specific predictions of tumor response could enhance treatment planning. We present a novel computational platform that assimilates MRI data to continually predict spatiotemporal tumor changes during chemoradiotherapy. Tumor growth and response to chemoradiation was described using a two-species reaction-diffusion model of enhancing and non-enhancing regions of the tumor. Two evaluation scenarios were used to test the predictive accuracy of this model. In scenario 1, the model was calibrated on a patient-specific basis (n = 21) to weekly MRI data during the course of chemoradiotherapy. A data assimilation framework was used to update model parameters with each new imaging visit which were then used to update model predictions. In scenario 2, we evaluated the predictive accuracy of the model when fewer data points are available by calibrating the same model using only the first two imaging visits and then predicted tumor response at the remaining five weeks of treatment. We investigated three approaches to assign model parameters for scenario 2: (1) predictions using only parameters estimated by fitting the data obtained from an individual patient's first two imaging visits, (2) predictions made by averaging the patient-specific parameters with the cohort-derived parameters, and (3) predictions using only cohort-derived parameters. Scenario 1 achieved a median [range] concordance correlation coefficient (CCC) between the predicted and measured total tumor cell counts of 0.91 [0.84, 0.95], and a median [range] percent error in tumor volume of -2.6% [-19.7, 8.0%], demonstrating strong agreement throughout the course of treatment. For scenario 2, the three approaches yielded CCCs of: (1) 0.65 [0.51, 0.88], (2) 0.74 [0.70, 0.91], (3) 0.76 [0.73, 0.92] with significant differences between the approach (1) that does not use the cohort parameters and the two approaches (2 and 3) that do. The proposed data assimilation framework enhances the accuracy of tumor growth forecasts by integrating patient-specific and cohort-based data. These findings show a practical method for identifying more personalized treatment strategies in high-grade glioma patients.

MRI Segmentation Neurological Retrospective Clinical In Silico Academic Lab Benchmark SOTA

SwinECAT: A Transformer-based fundus disease classification model with Shifted Window Attention and Efficient Channel Attention

Peiran Gu, Teng Yao, Mengshen He, Fuhao Duan, Feiyan Liu, RenYuan Peng, Bao Ge

•preprint•Jul 29 2025

In recent years, artificial intelligence has been increasingly applied in the field of medical imaging. Among these applications, fundus image analysis presents special challenges, including small lesion areas in certain fundus diseases and subtle inter-disease differences, which can lead to reduced prediction accuracy and overfitting in the models. To address these challenges, this paper proposes the Transformer-based model SwinECAT, which combines the Shifted Window (Swin) Attention with the Efficient Channel Attention (ECA) Attention. SwinECAT leverages the Swin Attention mechanism in the Swin Transformer backbone to effectively capture local spatial structures and long-range dependencies within fundus images. The lightweight ECA mechanism is incorporated to guide the SwinECAT's attention toward critical feature channels, enabling more discriminative feature representation. In contrast to previous studies that typically classify fundus images into 4 to 6 categories, this work expands fundus disease classification to 9 distinct types, thereby enhancing the granularity of diagnosis. We evaluate our method on the Eye Disease Image Dataset (EDID) containing 16,140 fundus images for 9-category classification. Experimental results demonstrate that SwinECAT achieves 88.29\% accuracy, with weighted F1-score of 0.88 and macro F1-score of 0.90. The classification results of our proposed model SwinECAT significantly outperform the baseline Swin Transformer and multiple compared baseline models. To our knowledge, this represents the highest reported performance for 9-category classification on this public dataset.

OCT Classification Methodology In Silico Academic Lab Benchmark SOTA

Cardiac-CLIP: A Vision-Language Foundation Model for 3D Cardiac CT Images

Yutao Hu, Ying Zheng, Shumei Miao, Xiaolei Zhang, Jiahao Xia, Yaolei Qi, Yiyang Zhang, Yuting He, Qian Chen, Jing Ye, Hongyan Qiao, Xiuhua Hu, Lei Xu, Jiayin Zhang, Hui Liu, Minwen Zheng, Yining Wang, Daimin Zhang, Ji Zhang, Wenqi Shao, Yun Liu, Longjiang Zhang, Guanyu Yang

•preprint•Jul 29 2025

Foundation models have demonstrated remarkable potential in medical domain. However, their application to complex cardiovascular diagnostics remains underexplored. In this paper, we present Cardiac-CLIP, a multi-modal foundation model designed for 3D cardiac CT images. Cardiac-CLIP is developed through a two-stage pre-training strategy. The first stage employs a 3D masked autoencoder (MAE) to perform self-supervised representation learning from large-scale unlabeled volumetric data, enabling the visual encoder to capture rich anatomical and contextual features. In the second stage, contrastive learning is introduced to align visual and textual representations, facilitating cross-modal understanding. To support the pre-training, we collect 16641 real clinical CT scans, supplemented by 114k publicly available data. Meanwhile, we standardize free-text radiology reports into unified templates and construct the pathology vectors according to diagnostic attributes, based on which the soft-label matrix is generated to supervise the contrastive learning process. On the other hand, to comprehensively evaluate the effectiveness of Cardiac-CLIP, we collect 6,722 real-clinical data from 12 independent institutions, along with the open-source data to construct the evaluation dataset. Specifically, Cardiac-CLIP is comprehensively evaluated across multiple tasks, including cardiovascular abnormality classification, information retrieval and clinical analysis. Experimental results demonstrate that Cardiac-CLIP achieves state-of-the-art performance across various downstream tasks in both internal and external data. Particularly, Cardiac-CLIP exhibits great effectiveness in supporting complex clinical tasks such as the prospective prediction of acute coronary syndrome, which is notoriously difficult in real-world scenarios.

CT Classification Cardiac Methodology In Silico Academic Lab Benchmark SOTA Open Dataset

BioAug-Net: a bioimage sensor-driven attention-augmented segmentation framework with physiological coupling for early prostate cancer detection in T2-weighted MRI.

Arshad M, Wang C, Us Sima MW, Ali Shaikh J, Karamti H, Alharthi R, Selecky J

•papers•Jul 29 2025

Accurate segmentation of the prostate peripheral zone (PZ) in T2-weighted MRI is critical for the early detection of prostate cancer. Existing segmentation methods are hindered by significant inter-observer variability (37.4 ± 5.6%), poor boundary localization, and the presence of motion artifacts, along with challenges in clinical integration. In this study, we propose BioAug-Net, a novel framework that integrates real-time physiological signal feedback with MRI data, leveraging transformer-based attention mechanisms and a probabilistic clinical decision support system (PCDSS). BioAug-Net features a dual-branch asymmetric attention mechanism: one branch processes spatial MRI features, while the other incorporates temporal sensor signals through a BiGRU-driven adaptive masking module. Additionally, a Markov Decision Process-based PCDSS maps segmentation outputs to clinical PI-RADS scores, with uncertainty quantification. We validated BioAug-Net on a multi-institutional dataset (n=1,542) and demonstrated state-of-the-art performance, achieving a Dice Similarity Coefficient of 89.7% (p < 0.001), sensitivity of 91.2% (p < 0.001), specificity of 88.4% (p < 0.001), and HD95 of 2.14 mm (p < 0.001), outperforming U-Net, Attention U-Net, and TransUNet. Sensor integration improved segmentation accuracy by 12.6% (p < 0.001) and reduced inter-observer variation by 48.3% (p < 0.001). Radiologist evaluations (n=3) confirmed a 15.0% reduction in diagnosis time (p = 0.003) and an increase in inter-reader agreement from K = 0.68 to K = 0.82 (p = 0.001). Our results show that BioAug-Net offers a clinically viable solution for early prostate cancer detection through enhanced physiological coupling and explainable AI diagnostics.

MRI Segmentation Abdominal Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Distribution-Based Masked Medical Vision-Language Model Using Structured Reports

Shreyank N Gowda, Ruichi Zhang, Xiao Gu, Ying Weng, Lu Yang

•preprint•Jul 29 2025

Medical image-language pre-training aims to align medical images with clinically relevant text to improve model performance on various downstream tasks. However, existing models often struggle with the variability and ambiguity inherent in medical data, limiting their ability to capture nuanced clinical information and uncertainty. This work introduces an uncertainty-aware medical image-text pre-training model that enhances generalization capabilities in medical image analysis. Building on previous methods and focusing on Chest X-Rays, our approach utilizes structured text reports generated by a large language model (LLM) to augment image data with clinically relevant context. These reports begin with a definition of the disease, followed by the `appearance' section to highlight critical regions of interest, and finally `observations' and `verdicts' that ground model predictions in clinical semantics. By modeling both inter- and intra-modal uncertainty, our framework captures the inherent ambiguity in medical images and text, yielding improved representations and performance on downstream tasks. Our model demonstrates significant advances in medical image-text pre-training, obtaining state-of-the-art performance on multiple downstream tasks.

X-Ray Classification Chest Methodology In Silico Benchmark SOTA GenAI

Fully automated 3D multi-modal deep learning model for preoperative T-stage prediction of colorectal cancer using <sup>18</sup>F-FDG PET/CT.

Zhang M, Li Y, Zheng C, Xie F, Zhao Z, Dai F, Wang J, Wu H, Zhu Z, Liu Q, Li Y

•papers•Jul 28 2025

This study aimed to develop a fully automated 3D multi-modal deep learning model using preoperative <sup>18</sup>F-FDG PET/CT to predict the T-stage of colorectal cancer (CRC) and evaluate its clinical utility. A retrospective cohort of 474 CRC patients was included, with 400 patients for internal cohort and 74 patients for external cohort. Patients were classified into early T-stage (T1-T2) and advanced T-stage (T3-T4) groups. Automatic segmentation of the volume of interest (VOI) was achieved based on TotalSegmentator. A 3D ResNet18-based deep learning model integrated with a cross-multi-head attention mechanism was developed. Five models (CT + PET + Clinic (CPC), CT + PET (CP), PET (P), CT (C), Clinic) and two radiologists' assessment were compared. Performance was evaluated using Area Under the Curve (AUC). Grad-CAM was employed to provide visual interpretability of decision-critical regions. The automated segmentation achieved Dice scores of 0.884 (CT) and 0.888 (PET). The CPC and CP models achieved superior performance, with AUCs of 0.869 and 0.869 in the internal validation cohort, respectively, outperforming single-modality models (P: 0.832; C: 0.809; Clinic: 0.728) and the radiologists (AUC: 0.627, P < 0.05 for all models vs. radiologists, except for the Clinical model). External validation exhibited a similar trend, with AUCs of 0.814, 0.812, 0.763, 0.714, 0.663 and 0.704, respectively. Grad-CAM visualization highlighted tumor-centric regions for early T-stage and peri-tumoral tissue infiltration for advanced T-stage. The fully automated multimodal, fusing PET/CT with cross-multi-head-attention, improved T-stage prediction in CRC, surpassing the single-modality models and radiologists, offering a time-efficient tool to aid clinical decision-making.

Mixed Modality Classification Abdominal Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Deep learning aging marker from retinal images unveils sex-specific clinical and genetic signatures

Multi-Faceted Consistency learning with active cross-labeling for barely-supervised 3D medical image segmentation.

Radiomics, machine learning, and deep learning for hippocampal sclerosis identification: a systematic review and diagnostic meta-analysis.

MFFBi-Unet: Merging Dynamic Sparse Attention and Multi-scale Feature Fusion for Medical Image Segmentation.

A data assimilation framework for predicting the spatiotemporal response of high-grade gliomas to chemoradiation.

SwinECAT: A Transformer-based fundus disease classification model with Shifted Window Attention and Efficient Channel Attention

Cardiac-CLIP: A Vision-Language Foundation Model for 3D Cardiac CT Images

BioAug-Net: a bioimage sensor-driven attention-augmented segmentation framework with physiological coupling for early prostate cancer detection in T2-weighted MRI.

Distribution-Based Masked Medical Vision-Language Model Using Structured Reports

Fully automated 3D multi-modal deep learning model for preoperative T-stage prediction of colorectal cancer using <sup>18</sup>F-FDG PET/CT.

Ready to Sharpen Your Edge?