Latest Papers on Radiology AI. Tags: Benchmark SOTA

Mitigating Overfitting in Medical Imaging: Self-Supervised Pretraining vs. ImageNet Transfer Learning for Dermatological Diagnosis

Iván Matas, Carmen Serrano, Miguel Nogales, David Moreno, Lara Ferrándiz, Teresa Ojeda, Begoña Acha

•preprint•May 22 2025

Deep learning has transformed computer vision but relies heavily on large labeled datasets and computational resources. Transfer learning, particularly fine-tuning pretrained models, offers a practical alternative; however, models pretrained on natural image datasets such as ImageNet may fail to capture domain-specific characteristics in medical imaging. This study introduces an unsupervised learning framework that extracts high-value dermatological features instead of relying solely on ImageNet-based pretraining. We employ a Variational Autoencoder (VAE) trained from scratch on a proprietary dermatological dataset, allowing the model to learn a structured and clinically relevant latent space. This self-supervised feature extractor is then compared to an ImageNet-pretrained backbone under identical classification conditions, highlighting the trade-offs between general-purpose and domain-specific pretraining. Our results reveal distinct learning patterns. The self-supervised model achieves a final validation loss of 0.110 (-33.33%), while the ImageNet-pretrained model stagnates at 0.100 (-16.67%), indicating overfitting. Accuracy trends confirm this: the self-supervised model improves from 45% to 65% (+44.44%) with a near-zero overfitting gap, whereas the ImageNet-pretrained model reaches 87% (+50.00%) but plateaus at 75% (+19.05%), with its overfitting gap increasing to +0.060. These findings suggest that while ImageNet pretraining accelerates convergence, it also amplifies overfitting on non-clinically relevant features. In contrast, self-supervised learning achieves steady improvements, stronger generalization, and superior adaptability, underscoring the importance of domain-specific feature extraction in medical imaging.
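
The parenthesized percentages in the abstract above read as relative changes between a starting and a final value; for example, accuracy rising from 45% to 65% is a relative gain of 44.44%. The sketch below reproduces that arithmetic. The `overfitting_gap` helper assumes the gap is validation loss minus training loss, which the abstract does not define, and the loss values passed to it are hypothetical.

```python
def relative_change(start: float, end: float) -> float:
    """Percentage change from start to end, relative to start."""
    if start == 0:
        raise ValueError("start must be non-zero")
    return (end - start) / start * 100.0


def overfitting_gap(train_loss: float, val_loss: float) -> float:
    """Assumed definition (not from the abstract): validation minus training loss."""
    return val_loss - train_loss


# Accuracy figures quoted in the abstract: 45% -> 65%.
print(f"{relative_change(45, 65):+.2f}%")  # prints "+44.44%"

# Hypothetical losses, chosen only to illustrate the sign convention:
# a positive gap means the model fits the training data better than held-out data.
print(f"{overfitting_gap(train_loss=0.05, val_loss=0.11):+.3f}")
```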

X-Ray Classification Methodology In Silico Academic Lab Benchmark SOTA

Radiomics-Based Early Triage of Prostate Cancer: A Multicenter Study from the CHAIMELEON Project

Vraka, A., Marfil-Trujillo, M., Ribas-Despuig, G., Flor-Arnal, S., Cerda-Alberich, L., Jimenez-Gomez, P., Jimenez-Pastor, A., Marti-Bonmati, L.

•preprint•May 22 2025

Prostate cancer (PCa) is the most commonly diagnosed malignancy in men worldwide. Accurate triage of patients based on tumor aggressiveness and staging is critical for selecting appropriate management pathways. While magnetic resonance imaging (MRI) has become a mainstay in PCa diagnosis, most predictive models rely on multiparametric imaging or invasive inputs, limiting generalizability in real-world clinical settings. This study aimed to develop and validate machine learning (ML) models using radiomic features extracted from T2-weighted MRI, alone and in combination with clinical variables, to predict ISUP grade (tumor aggressiveness), lymph node involvement (cN) and distant metastasis (cM). A retrospective multicenter cohort from three European sites in the CHAIMELEON project was analyzed. Radiomic features were extracted from prostate zone segmentations and lesion masks, following standardized preprocessing and ComBat harmonization. Feature selection and model optimization were performed using nested cross-validation and Bayesian tuning. Hybrid models were trained using XGBoost and interpreted with SHAP values. The ISUP model achieved an AUC of 0.66, while the cN and cM models reached AUCs of 0.77 and 0.80, respectively. The best-performing models consistently combined prostate zone radiomics with clinical features such as PSA, PIRADSv2 and ISUP grade. SHAP analysis confirmed the importance of both clinical and texture-based radiomic features, with entropy and non-uniformity measures playing central roles in all tasks. Our results demonstrate the feasibility of using T2-weighted MRI and zonal radiomics for robust prediction of aggressiveness, nodal involvement and distant metastasis in PCa. This fully automated pipeline offers an interpretable, accessible and clinically translatable tool for first-line PCa triage, with potential integration into real-world diagnostic workflows.
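
The AUC values reported above can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative one. The sketch below implements that pairwise definition in plain Python, counting ties as one half. The labels and scores are hypothetical and unrelated to the CHAIMELEON cohort.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = event present, 0 = absent.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]
print(round(auc(labels, scores), 4))  # 5 of 6 pairs ranked correctly -> 0.8333
```

The pairwise loop is quadratic in the number of cases; it is written for readability rather than speed.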

MRI Classification Abdominal Retrospective Clinical In Silico Consortium Benchmark SOTA

A Novel Dynamic Neural Network for Heterogeneity-Aware Structural Brain Network Exploration and Alzheimer's Disease Diagnosis.

Cui W, Leng Y, Peng Y, Bai C, Li L, Jiang X, Yuan G, Zheng J

•papers•May 22 2025

Heterogeneity is a fundamental characteristic of brain diseases, distinguished by variability not only in brain atrophy but also in the complexity of neural connectivity and brain networks. However, existing data-driven methods fail to provide a comprehensive analysis of brain heterogeneity. Recently, dynamic neural networks (DNNs) have shown significant advantages in capturing sample-wise heterogeneity. Therefore, in this article, we propose a novel dynamic heterogeneity-aware network (DHANet) to identify critical heterogeneous brain regions, explore heterogeneous connectivity between them, and construct a heterogeneous-aware structural brain network (HGA-SBN) using structural magnetic resonance imaging (sMRI). Specifically, we first develop a 3-D dynamic convmixer to extract abundant heterogeneous features from sMRI. Subsequently, the critical brain atrophy regions are identified by dynamic prototype learning that embeds the hierarchical brain semantic structure. Finally, we employ a joint dynamic edge-correlation (JDE) modeling approach to construct the heterogeneous connectivity between these regions and analyze the HGA-SBN. To evaluate the effectiveness of the DHANet, we conduct extensive experiments on three public datasets, and the method achieves state-of-the-art (SOTA) performance on two classification tasks.

MRI Classification Neurological Methodology In Silico Academic Lab Benchmark SOTA

On factors that influence deep learning-based dose prediction of head and neck tumors.

Gao R, Mody P, Rao C, Dankers F, Staring M

•papers•May 22 2025

Objective. This study investigates key factors influencing deep learning-based dose prediction models for head and neck cancer radiation therapy. The goal is to evaluate model accuracy, robustness, and computational efficiency, and to identify key components necessary for optimal performance. Approach. We systematically analyze the impact of input and dose grid resolution, input type, loss function, model architecture, and noise on model performance. Two datasets are used: a public dataset (OpenKBP) and an in-house clinical dataset. Model performance is primarily evaluated using two metrics: dose score and dose-volume histogram (DVH) score. Main results. High-resolution inputs improve prediction accuracy (dose score and DVH score) by 8.6%-13.5% compared to low resolution. Using a combination of CT, planning target volumes, and organs-at-risk as input significantly enhances accuracy, with improvements of 57.4%-86.8% over using CT alone. Integrating mean absolute error (MAE) loss with value-based and criteria-based DVH loss functions further boosts DVH score by 7.2%-7.5% compared to MAE loss alone. In the robustness analysis, most models show minimal degradation under Poisson noise (0-0.3 Gy) but are more susceptible to adversarial noise (0.2-7.8 Gy). Notably, certain models, such as SwinUNETR, demonstrate superior robustness against adversarial perturbations. Significance. These findings highlight the importance of optimizing deep learning models and provide valuable guidance for achieving more accurate and reliable radiotherapy dose prediction.
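
As a rough illustration of the first metric named above, the dose score is commonly computed as the mean absolute difference between predicted and reference dose over the voxels of interest. The sketch below assumes that definition. The function name, the optional mask, and the toy dose values in Gy are illustrative and not taken from the paper.

```python
def dose_score(predicted, reference, mask=None):
    """Mean absolute dose difference (Gy) over voxels where mask is truthy.

    Assumed definition: MAE between predicted and reference dose.
    Inputs are flat sequences of per-voxel dose values.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if mask is None:
        mask = [True] * len(predicted)
    if len(mask) != len(predicted):
        raise ValueError("mask must match the dose arrays in length")
    diffs = [abs(p - r) for p, r, m in zip(predicted, reference, mask) if m]
    if not diffs:
        raise ValueError("mask selects no voxels")
    return sum(diffs) / len(diffs)


# Hypothetical per-voxel doses in Gy; the last voxel is excluded by the mask.
predicted = [70.0, 65.5, 30.0, 2.0]
reference = [69.0, 66.0, 33.0, 0.0]
mask = [True, True, True, False]
print(dose_score(predicted, reference, mask))  # (1.0 + 0.5 + 3.0) / 3 = 1.5
```

A lower score means the predicted dose distribution is closer to the reference plan.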

CT Registration Neurological Methodology In Silico Academic Lab Benchmark SOTA

Predictive value of machine learning for PD-L1 expression in NSCLC: a systematic review and meta-analysis.

Zheng T, Li X, Zhou L, Jin J

•papers•May 22 2025

As machine learning (ML) continues to develop in cancer diagnosis and treatment, some researchers have attempted to predict the expression of programmed death ligand-1 (PD-L1) in non-small cell lung cancer (NSCLC) using ML. However, there is a lack of systematic evidence on the effectiveness of ML. We conducted a thorough search across Embase, PubMed, the Cochrane Library, and Web of Science from inception to December 14th, 2023. A systematic review and meta-analysis was conducted to assess the value of ML for predicting PD-L1 expression in NSCLC. In total, 30 studies with 12,898 NSCLC patients were included. The thresholds of PD-L1 expression level were < 1%, 1-49%, and ≥ 50%. In the validation set, in the binary classification for PD-L1 ≥ 1%, the pooled C-index was 0.646 (95%CI: 0.587-0.705), 0.799 (95%CI: 0.782-0.817), 0.806 (95%CI: 0.753-0.858), and 0.800 (95%CI: 0.717-0.883), respectively, for the clinical feature-, radiomics-, radiomics + clinical feature-, and pathomics-based ML models; in the binary classification for PD-L1 ≥ 50%, the pooled C-index was 0.649 (95%CI: 0.553-0.744), 0.771 (95%CI: 0.728-0.814), and 0.826 (95%CI: 0.783-0.869), respectively, for the clinical feature-, radiomics-, and radiomics + clinical feature-based ML models. At present, radiomics- or pathomics-based ML methods are applied for the prediction of PD-L1 expression in NSCLC, and both achieve satisfactory accuracy. In particular, the radiomics-based ML method seems to have wider clinical applicability as a non-invasive diagnostic tool. Both radiomics and pathomics serve as processing methods for medical images. In the future, we expect to develop medical image-based DL methods for intelligently predicting PD-L1 expression.

Mixed Modality Classification Chest Meta Analysis In Silico Academic Lab Benchmark SOTA

Leveraging deep learning-based kernel conversion for more precise airway quantification on CT.

Choe J, Yun J, Kim MJ, Oh YJ, Bae S, Yu D, Seo JB, Lee SM, Lee HY

•papers•May 22 2025

To evaluate the variability of fully automated airway quantitative CT (QCT) measures caused by different kernels and the effect of kernel conversion. This retrospective study included 96 patients who underwent non-enhanced chest CT at two centers. CT scans were reconstructed using four kernels (medium soft, medium sharp, sharp, very sharp) from three vendors. Kernel conversion targeting the medium soft kernel as reference was applied to sharp kernel images. Fully automated airway quantification was performed before and after conversion. The effects of kernel type and conversion on airway quantification were evaluated using analysis of variance, paired t-tests, and concordance correlation coefficient (CCC). Airway QCT measures (e.g., Pi10, wall thickness, wall area percentage, lumen diameter) decreased with sharper kernels (all, p < 0.001), with varying degrees of variability across variables and vendors. Kernel conversion substantially reduced variability between medium soft and sharp kernel images for vendors A (pooled CCC: 0.59 vs. 0.92) and B (0.40 vs. 0.91) and lung-dedicated sharp kernels of vendor C (0.26 vs. 0.71). However, it was ineffective for non-lung-dedicated sharp kernels of vendor C (0.81 vs. 0.43) and showed limited improvement in variability of QCT measures at the subsegmental level. Consistent airway segmentation and identical anatomic labeling improved subsegmental airway variability in theoretical tests. Deep learning-based kernel conversion reduced the measurement variability of airway QCT across various kernels and vendors but was less effective for non-lung-dedicated kernels and subsegmental airways. Consistent airway segmentation and precise anatomic labeling can further enhance reproducibility for reliable automated quantification.

Question: How do different CT reconstruction kernels affect the measurement variability of automated airway measurements, and can deep learning-based kernel conversion reduce this variability?

Findings: Kernel conversion improved measurement consistency across vendors for lung-dedicated kernels, but showed limited effectiveness for non-lung-dedicated kernels and subsegmental airways.

Clinical relevance: Understanding kernel-related variability in airway quantification and mitigating it through deep learning enables standardized analysis, but further refinements are needed for robust airway segmentation, particularly for improving measurement variability in subsegmental airways and specific kernels.
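
The agreement figures above are concordance correlation coefficients. Lin's CCC penalizes both poor correlation and systematic offset between two sets of measurements: CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), using population (1/n) moments. The sketch below is a plain-Python version of that formula; the paired measurements are hypothetical and do not come from the study.

```python
def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient with population (1/n) moments."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both series are the same constant")
    return 2 * cov_xy / denom


# Hypothetical paired measurements: perfectly correlated but offset by 1 unit.
reference_kernel = [1.0, 2.0, 3.0]
sharp_kernel = [2.0, 3.0, 4.0]
print(round(concordance_correlation(reference_kernel, sharp_kernel), 4))  # 4/7 -> 0.5714
```

The toy pair has a Pearson correlation of 1 but a CCC of only about 0.57, which is why CCC suits questions about measurement bias between kernels.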

CT Reconstruction Chest Retrospective Clinical In Silico Academic Lab Benchmark SOTA

ÆMMamba: An Efficient Medical Segmentation Model With Edge Enhancement.

Dong X, Zhou B, Yin C, Liao IY, Jin Z, Xu Z, Pu B

•papers•May 21 2025

Medical image segmentation is critical for disease diagnosis, treatment planning, and prognosis assessment, yet the complexity and diversity of medical images pose significant challenges to accurate segmentation. While Convolutional Neural Networks capture local features and Vision Transformers excel in the global context, both struggle with efficient long-range dependency modeling. Inspired by Mamba's State Space Modeling efficiency, we propose ÆMMamba, a novel multi-scale feature extraction framework built on the Mamba backbone network. ÆMMamba integrates several innovative modules: the Efficient Fusion Bridge (EFB) module, which employs a bidirectional state-space model and attention mechanisms to fuse multi-scale features; the Edge-Aware Module (EAM), which enhances low-level edge representation using Sobel-based edge extraction; and the Boundary Sensitive Decoder (BSD), which leverages inverse attention and residual convolutional layers to handle cross-level complex boundaries. ÆMMamba achieves state-of-the-art performance across 8 medical segmentation datasets. On polyp segmentation datasets (Kvasir, ClinicDB, ColonDB, EndoScene, ETIS), it records the highest mDice and mIoU scores, outperforming methods like MADGNet and Swin-UMamba, with a standout mDice of 72.22 on ETIS, the most challenging dataset in this domain. For lung and breast segmentation, ÆMMamba surpasses competitors such as H2Former and SwinUnet, achieving Dice scores of 84.24 on BUSI and 79.83 on COVID-19 Lung. On the LGG brain MRI dataset, ÆMMamba attains an mDice of 87.25 and an mIoU of 79.31, outperforming all compared methods. The source code will be released at https://github.com/xingbod/eMMamba.
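
The Edge-Aware Module described above relies on Sobel-based edge extraction. The abstract gives no implementation details, so the sketch below shows only the classical Sobel operator on a 2D grayscale image stored as nested lists; it is not the authors' module. Border pixels are left at zero for simplicity, and the toy image is hypothetical.

```python
import math

# Classical 3x3 Sobel kernels for horizontal and vertical intensity gradients.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(image):
    """Gradient magnitude per interior pixel; border pixels stay 0.0.

    The kernels are applied as a cross-correlation (no flip), which gives
    the same magnitude as a true convolution.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    pixel = image[r + dr][c + dc]
                    gx += SOBEL_X[dr + 1][dc + 1] * pixel
                    gy += SOBEL_Y[dr + 1][dc + 1] * pixel
            out[r][c] = math.hypot(gx, gy)
    return out


# Hypothetical 5x5 image with a vertical step edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
edges = sobel_magnitude(image)
print(edges[2])  # [0.0, 4.0, 4.0, 0.0, 0.0]
```

The response is non-zero only in the two columns that straddle the step, which is the kind of low-level boundary cue an edge-aware branch can feed to a decoder.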

Segmentation Methodology In Silico Benchmark SOTA Open Code

TAGS: 3D Tumor-Adaptive Guidance for SAM

Sirui Li, Linkai Peng, Zheyuan Zhang, Gorkem Durak, Ulas Bagci

•preprint•May 21 2025

Foundation models (FMs) such as CLIP and SAM have recently shown great promise in image segmentation tasks, yet their adaptation to 3D medical imaging, particularly for pathology detection and segmentation, remains underexplored. A critical challenge arises from the domain gap between natural images and medical volumes: existing FMs, pre-trained on 2D data, struggle to capture 3D anatomical context, limiting their utility in clinical applications like tumor segmentation. To address this, we propose an adaptation framework called TAGS: Tumor Adaptive Guidance for SAM, which unlocks 2D FMs for 3D medical tasks through multi-prompt fusion. By preserving most of the pre-trained weights, our approach enhances SAM's spatial feature extraction using CLIP's semantic insights and anatomy-specific prompts. Extensive experiments on three open-source tumor segmentation datasets prove that our model surpasses the state-of-the-art medical image segmentation models (+46.88% over nnUNet), interactive segmentation frameworks, and other established medical FMs, including SAM-Med2D, SAM-Med3D, SegVol, Universal, 3D-Adapter, and SAM-B (at least +13% over them). This highlights the robustness and adaptability of our proposed framework across diverse medical segmentation tasks.

Mixed Modality Segmentation Methodology In Silico Academic Lab Benchmark SOTA

Large medical image database impact on generalizability of synthetic CT scan generation.

Boily C, Mazellier JP, Meyer P

•papers•May 21 2025

This study systematically examines the impact of training database size and the generalizability of deep learning models for synthetic medical image generation. Specifically, we employ a Cycle-Consistency Generative Adversarial Network (CycleGAN) with softly paired data to synthesize kilovoltage computed tomography (kVCT) images from megavoltage computed tomography (MVCT) scans. Unlike previous works, which were constrained by limited data availability, our study uses an extensive database comprising 4,000 patient CT scans, an order of magnitude larger than prior research, allowing for a more rigorous assessment of database size in medical image translation. We quantitatively evaluate the fidelity of the generated synthetic images using established image similarity metrics, including Mean Absolute Error (MAE) and Structural Similarity Index Measure (SSIM). Beyond assessing image quality, we investigate the model's capacity for generalization by analyzing its performance across diverse patient subgroups, considering factors such as sex, age, and anatomical region. This approach enables a more granular understanding of how dataset composition influences model robustness.
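
The two similarity metrics named above can be sketched in a few lines. MAE is the mean absolute intensity difference. SSIM is normally averaged over local windows, often Gaussian-weighted; the version below is a simplified single-window (global) SSIM over flattened images, using the customary constants C1 = (0.01·L)² and C2 = (0.03·L)² for dynamic range L. The function names and toy intensities are illustrative, not the paper's evaluation code.

```python
def mae(x, y):
    """Mean absolute error between two equally sized flattened images."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM over whole images (a simplification of windowed SSIM)."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical normalized intensities for a real and a synthetic image.
real = [0.1, 0.5, 0.9, 0.3]
synthetic = [0.2, 0.5, 0.7, 0.3]
print(round(mae(real, synthetic), 3))   # 0.075
print(global_ssim(real, real))          # identical images give 1.0
print(global_ssim(real, synthetic) < 1.0)
```

For subgroup analyses like those described in the abstract, the same functions would be applied per subgroup and the results compared.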

CT Image Synthesis Whole Body Methodology In Silico Academic Lab Benchmark SOTA

Domain Adaptive Skin Lesion Classification via Conformal Ensemble of Vision Transformers

Mehran Zoravar, Shadi Alijani, Homayoun Najjaran

•preprint•May 21 2025

Exploring the trustworthiness of deep learning models is crucial, especially in critical domains such as medical imaging decision support systems. Conformal prediction has emerged as a rigorous means of providing deep learning models with reliable uncertainty estimates and safety guarantees. However, conformal prediction results face challenges due to the backbone model's struggles in domain-shifted scenarios, such as variations in different sources. To address this challenge, this paper proposes a novel framework termed Conformal Ensemble of Vision Transformers (CE-ViTs) designed to enhance image classification performance by prioritizing domain adaptation and model robustness, while accounting for uncertainty. The proposed method leverages an ensemble of vision transformer models in the backbone, trained on diverse datasets including HAM10000, Dermofit, and Skin Cancer ISIC datasets. This ensemble learning approach, calibrated through the combined mentioned datasets, aims to enhance domain adaptation through conformal learning. Experimental results underscore that the framework achieves a high coverage rate of 90.38%, representing an improvement of 9.95% compared to the HAM10000 model. This indicates a strong likelihood that the prediction set includes the true label compared to singular models. Ensemble learning in CE-ViTs significantly improves conformal prediction performance, increasing the average prediction set size for challenging misclassified samples from 1.86 to 3.075.
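
Coverage rate and average prediction set size, the two quantities reported above, come from the standard conformal prediction recipe. The sketch below shows split conformal classification with the common nonconformity score 1 − p(true class). The abstract does not state which score CE-ViTs uses, so this is an assumption, and all probabilities and labels are hypothetical.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Calibrate the score threshold for target coverage 1 - alpha."""
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if k > n:
        return float("inf")  # too few calibration points: include every class
    return scores[k - 1]


def prediction_set(probs, qhat):
    """All classes whose nonconformity score is within the threshold."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= qhat]


def coverage_and_size(test_probs, test_labels, qhat):
    """Empirical coverage and average prediction set size on a test set."""
    sets = [prediction_set(p, qhat) for p in test_probs]
    coverage = sum(y in s for y, s in zip(test_labels, sets)) / len(sets)
    avg_size = sum(len(s) for s in sets) / len(sets)
    return coverage, avg_size


# Hypothetical calibration data: softmax outputs and true labels for 3 classes.
cal_probs = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.2, 0.1, 0.7], [0.4, 0.35, 0.25]]
cal_labels = [0, 1, 2, 0]
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.25)

test_probs = [[0.5, 0.45, 0.05], [0.9, 0.05, 0.05]]
test_labels = [1, 2]
print(prediction_set(test_probs[0], qhat))              # [0, 1]
print(coverage_and_size(test_probs, test_labels, qhat))  # (0.5, 1.5)
```

Larger sets on hard samples are expected behaviour: the method widens the set when the model is uncertain, so the true label is more likely to be included.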

OCT Classification Methodology In Silico Benchmark SOTA
