Latest Papers on Radiology AI. Tags: Benchmark SOTA, Order: Best Match, Limit: 10.

Depthwise-Dilated Convolutional Adapters for Medical Object Tracking and Segmentation Using the Segment Anything Model 2

Guoping Xu, Christopher Kabat, You Zhang

•preprint•Jul 19 2025

Recent advances in medical image segmentation have been driven by deep learning; however, most existing methods remain limited by modality-specific designs and exhibit poor adaptability to dynamic medical imaging scenarios. The Segment Anything Model 2 (SAM2) and its related variants, which introduce a streaming memory mechanism for real-time video segmentation, present new opportunities for prompt-based, generalizable solutions. Nevertheless, adapting these models to medical video scenarios typically requires large-scale datasets for retraining or transfer learning, leading to high computational costs and the risk of catastrophic forgetting. To address these challenges, we propose DD-SAM2, an efficient adaptation framework for SAM2 that incorporates a Depthwise-Dilated Adapter (DD-Adapter) to enhance multi-scale feature extraction with minimal parameter overhead. This design enables effective fine-tuning of SAM2 on medical videos with limited training data. Unlike existing adapter-based methods focused solely on static images, DD-SAM2 fully exploits SAM2's streaming memory for medical video object tracking and segmentation. Comprehensive evaluations on TrackRad2025 (tumor segmentation) and EchoNet-Dynamic (left ventricle tracking) datasets demonstrate superior performance, achieving Dice scores of 0.93 and 0.97, respectively. To the best of our knowledge, this work provides an initial attempt at systematically exploring adapter-based SAM2 fine-tuning for medical video segmentation and tracking. Code, datasets, and models will be publicly available at https://github.com/apple1986/DD-SAM2.

Mixed Modality Segmentation Methodology In Silico Academic Lab Benchmark SOTA Open Code Open Dataset
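
DD-SAM2's results are reported as Dice scores. The sketch below computes the Dice similarity coefficient for binary masks, 2|A ∩ B| / (|A| + |B|), in pure Python. It is not the authors' evaluation code. Two choices are assumptions made here for illustration: per-sequence scores are the average of per-frame Dice, and two empty masks score 1.0.

```python
def dice_score(pred, target):
    """Dice similarity coefficient between two binary masks.

    `pred` and `target` are equally sized sequences of 0/1 values, for example
    a flattened segmentation mask. Dice = 2|A ∩ B| / (|A| + |B|).
    """
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


def mean_dice_over_frames(pred_frames, target_frames):
    """Average per-frame Dice over a video (an assumed aggregation scheme)."""
    if len(pred_frames) != len(target_frames) or not pred_frames:
        raise ValueError("need the same, non-zero number of frames")
    scores = [dice_score(p, t) for p, t in zip(pred_frames, target_frames)]
    return sum(scores) / len(scores)


# Illustrative two-frame example with 4-pixel masks (made-up values).
pred = [[1, 1, 0, 0], [0, 1, 1, 0]]
truth = [[1, 0, 0, 0], [0, 1, 1, 0]]
print(round(mean_dice_over_frames(pred, truth), 3))  # 0.833
```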

Artificial intelligence-based models for quantification of intra-pancreatic fat deposition and their clinical relevance: a systematic review of imaging studies.

Joshi T, Virostko J, Petrov MS

•papers•Jul 19 2025

High intra-pancreatic fat deposition (IPFD) plays an important role in diseases of the pancreas. The intricate anatomy of the pancreas and its surrounding structures has historically made accurate IPFD quantification on radiological images challenging. To address this challenge, automated IPFD quantification methods based on artificial intelligence (AI) have recently been introduced. The aim was to benchmark current knowledge on the use of AI-based models for automated IPFD measurement. The search was conducted in the MEDLINE, Embase, Scopus, and IEEE Xplore databases. Studies were eligible if they used AI for both segmentation of the pancreas and quantification of IPFD. The ground truth was manual segmentation by radiologists. When possible, data were pooled statistically using a random-effects model. A total of 12 studies (10 cross-sectional and 2 longitudinal) encompassing more than 50,000 people were included. Eight of the 12 studies used MRI, whereas four employed CT. U-Net and nnU-Net were the most frequently used AI-based models. The pooled Dice similarity coefficient of AI-based models in quantifying IPFD was 82.3% (95% confidence interval, 73.5 to 91.1%). Clinical applications of AI-based models showed the relevance of high IPFD to acute pancreatitis, pancreatic cancer, and type 2 diabetes mellitus. Current AI-based models for IPFD quantification are suboptimal, as the discrepancy between AI-based and manual quantification of IPFD is not negligible. Future advances in fully automated IPFD measurement will accelerate the accumulation of robust, large-scale evidence on the role of high IPFD in pancreatic diseases.
KEY POINTS:
Question: What is the current evidence on the performance and clinical applicability of artificial intelligence-based models for automated quantification of intra-pancreatic fat deposition?
Findings: The nnU-Net model achieved the highest Dice similarity coefficient among MRI-based studies, whereas the nnTransfer model achieved the highest Dice similarity coefficient among CT-based studies.
Clinical relevance: Standardisation of reporting on artificial intelligence-based models for the quantification of intra-pancreatic fat deposition will be essential to enhancing the clinical applicability and reliability of artificial intelligence in imaging patients with diseases of the pancreas.

Mixed Modality Segmentation Abdominal Review In Silico Academic Lab Benchmark SOTA
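
The review above pools Dice values with a random-effects model but does not name the estimator. The sketch below uses the common DerSimonian-Laird method, and that choice is an assumption. It recovers standard errors from reported 95% confidence intervals, estimates the between-study variance tau^2 from Cochran's Q, and reweights the studies. The study values are made up and are not the review's data.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Approximate standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z)


def dersimonian_laird(estimates, variances, z=1.96):
    """Random-effects pooled estimate using the DerSimonian-Laird tau^2.

    estimates: per-study effect sizes (e.g. Dice as a proportion)
    variances: per-study within-study variances (standard error squared)
    Returns (pooled, ci_lower, ci_upper, tau2).
    """
    k = len(estimates)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / v for v in variances]                       # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)                     # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]         # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled - z * se, pooled + z * se, tau2


# Made-up per-study Dice values with 95% CIs (estimate, lower, upper).
studies = [(0.78, 0.72, 0.84), (0.86, 0.82, 0.90), (0.90, 0.87, 0.93)]
ests = [s[0] for s in studies]
variances = [se_from_ci(lo, hi) ** 2 for _, lo, hi in studies]
pooled, lo, hi, tau2 = dersimonian_laird(ests, variances)
print(f"pooled {pooled:.3f} (95% CI {lo:.3f} to {hi:.3f}), tau^2 = {tau2:.4f}")
```

Other tau^2 estimators, such as REML or Paule-Mandel, can give different intervals when there are few studies, as in a 12-study review.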

QUTCC: Quantile Uncertainty Training and Conformal Calibration for Imaging Inverse Problems

Cassandra Tong Ye, Shamus Li, Tyler King, Kristina Monakhova

•preprint•Jul 19 2025

Deep learning models often hallucinate, producing realistic artifacts that are not truly present in the sample. This can have dire consequences for scientific and medical inverse problems, such as MRI and microscopy denoising, where accuracy is more important than perceptual quality. Uncertainty quantification techniques, such as conformal prediction, can pinpoint outliers and provide guarantees for image regression tasks, improving reliability. However, existing methods utilize a linear constant scaling factor to calibrate uncertainty bounds, resulting in larger, less informative bounds. We propose QUTCC, a quantile uncertainty training and calibration technique that enables nonlinear, non-uniform scaling of quantile predictions to enable tighter uncertainty estimates. Using a U-Net architecture with a quantile embedding, QUTCC enables the prediction of the full conditional distribution of quantiles for the imaging task. During calibration, QUTCC generates uncertainty bounds by iteratively querying the network for upper and lower quantiles, progressively refining the bounds to obtain a tighter interval that captures the desired coverage. We evaluate our method on several denoising tasks as well as compressive MRI reconstruction. Our method successfully pinpoints hallucinations in image estimates and consistently achieves tighter uncertainty intervals than prior methods while maintaining the same statistical coverage.

MRI Reconstruction Methodology In Silico Academic Lab Benchmark SOTA
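
The QUTCC abstract describes its calibration step only at a high level. One simple way to "iteratively query the network for upper and lower quantiles" is to bisect over a symmetric quantile level until the calibration coverage reaches a split-conformal target. The sketch below does this in pure Python, and every detail is an assumption for illustration. A toy Gaussian quantile function (`toy_quantile_model`, hypothetical) stands in for the U-Net with a quantile embedding. Coverage is counted per pixel, and the quantile pair stays symmetric about the median. It is not the authors' algorithm.

```python
from math import ceil
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def toy_quantile_model(x, tau, sigma=0.5):
    """Hypothetical stand-in for a quantile-conditioned network: returns the
    tau-quantile of the clean value given noisy input x under an assumed
    Gaussian noise model."""
    return x + sigma * _STD_NORMAL.inv_cdf(tau)


def coverage(model, xs, ys, d):
    """Fraction of calibration pixels y inside [q(0.5 - d), q(0.5 + d)]."""
    inside = sum(model(x, 0.5 - d) <= y <= model(x, 0.5 + d) for x, y in zip(xs, ys))
    return inside / len(xs)


def calibrate_quantile_level(model, xs, ys, alpha=0.1, iters=40):
    """Bisect the half-width d of the quantile pair until coverage >= target."""
    n = len(xs)
    target = min(1.0, ceil((n + 1) * (1 - alpha)) / n)  # finite-sample conformal level
    lo, hi = 0.0, 0.5 - 1e-9                            # inv_cdf needs 0 < tau < 1
    if coverage(model, xs, ys, hi) < target:
        raise ValueError("even the widest quantile pair misses the target coverage")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if coverage(model, xs, ys, mid) >= target:
            hi = mid  # wide enough: try a tighter pair of quantiles
        else:
            lo = mid  # too narrow: widen
    return hi


ys = [0.01 * i for i in range(200)]                                  # "clean" pixels
xs = [y + ((i * 37) % 101) / 50.0 - 1.0 for i, y in enumerate(ys)]   # deterministic noise in [-1, 1]
d = calibrate_quantile_level(toy_quantile_model, xs, ys)
print(f"quantiles {0.5 - d:.3f} to {0.5 + d:.3f}, "
      f"coverage {coverage(toy_quantile_model, xs, ys, d):.3f}")
```

The abstract contrasts this kind of search with baselines that fix the quantile pair and scale the interval by one constant. With a real network, the selected quantiles can give intervals whose width varies per pixel. In this toy model sigma is constant, so the width is uniform.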

Performance comparison of medical image classification systems using TensorFlow Keras, PyTorch, and JAX

Merjem Bećirović, Amina Kurtović, Nordin Smajlović, Medina Kapo, Amila Akagić

•preprint•Jul 19 2025

Medical imaging plays a vital role in early disease diagnosis and monitoring. Specifically, blood microscopy offers valuable insights into blood cell morphology and the detection of hematological disorders. In recent years, deep learning-based automated classification systems have shown high potential for enhancing the accuracy and efficiency of blood image analysis. However, a detailed performance analysis of specific deep learning frameworks appears to be lacking. This paper compares the performance of three popular deep learning frameworks, TensorFlow with Keras, PyTorch, and JAX, in classifying blood cell images from the publicly available BloodMNIST dataset. The study focuses primarily on differences in inference time, but also on classification performance across image sizes. The results reveal variations in performance across frameworks, influenced by factors such as image resolution and framework-specific optimizations. Classification accuracy for JAX and PyTorch was comparable to current benchmarks, showcasing the efficiency of these frameworks for medical image classification.

Mixed Modality Classification Methodology In Silico Academic Lab Benchmark SOTA
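
The framework comparison above centres on inference time. A minimal, framework-agnostic timing harness is sketched below. `predict` and the fake batch are hypothetical placeholders for a TensorFlow/Keras, PyTorch, or JAX model call. The warm-up and repeat counts are arbitrary choices, not the paper's protocol.

```python
import statistics
import time


def benchmark_inference(predict, batch, warmup=3, repeats=20):
    """Time repeated calls to `predict(batch)` and summarise wall-clock latency."""
    if repeats < 2:
        raise ValueError("need at least two timed repeats for a spread estimate")
    for _ in range(warmup):  # untimed calls: caches, lazy initialisation, JIT compilation
        predict(batch)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(batch)
        timings.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(timings),
        "mean_s": statistics.fmean(timings),
        "stdev_s": statistics.stdev(timings),
    }


def dummy_predict(batch):
    """Hypothetical stand-in model: sums pixel values per image."""
    return [sum(image) for image in batch]


# 64 fake flattened 28 x 28 images (arbitrary size and values).
fake_batch = [[0.5] * (28 * 28) for _ in range(64)]
result = benchmark_inference(dummy_predict, fake_batch)
print(f"median latency: {result['median_s'] * 1e3:.3f} ms")
```

With asynchronous backends, `predict` should block until results are ready. Examples are calling `block_until_ready()` on a JAX array, or `torch.cuda.synchronize()` for PyTorch on a GPU. Otherwise the harness mostly measures dispatch time. The untimed warm-up calls also keep one-off costs, such as JAX `jit` compilation, out of the reported numbers.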

Benchmarking GANs, Diffusion Models, and Flow Matching for T1w-to-T2w MRI Translation

Andrea Moschetto, Lemuel Puglisi, Alec Sargood, Pierluigi Dell'Acqua, Francesco Guarnera, Sebastiano Battiato, Daniele Ravì

•preprint•Jul 19 2025

Magnetic Resonance Imaging (MRI) enables the acquisition of multiple image contrasts, such as T1-weighted (T1w) and T2-weighted (T2w) scans, each offering distinct diagnostic insights. However, acquiring all desired modalities increases scan time and cost, motivating research into computational methods for cross-modal synthesis. To address this, recent approaches aim to synthesize missing MRI contrasts from those already acquired, reducing acquisition time while preserving diagnostic quality. Image-to-image (I2I) translation provides a promising framework for this task. In this paper, we present a comprehensive benchmark of generative models, specifically Generative Adversarial Networks (GANs), diffusion models, and flow matching (FM) techniques, for T1w-to-T2w 2D MRI I2I translation. All frameworks are implemented with comparable settings and evaluated on three publicly available MRI datasets of healthy adults. Our quantitative and qualitative analyses show that the GAN-based Pix2Pix model outperforms diffusion and FM-based methods in terms of structural fidelity, image quality, and computational efficiency. Consistent with existing literature, these results suggest that flow-based models are prone to overfitting on small datasets and simpler tasks, and may require more data to match or surpass GAN performance. These findings offer practical guidance for deploying I2I translation techniques in real-world MRI workflows and highlight promising directions for future research in cross-modal medical image synthesis. Code and models are publicly available at https://github.com/AndreaMoschetto/medical-I2I-benchmark.

MRI Image Synthesis Neurological Methodology In Silico Academic Lab Open Code Benchmark SOTA
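
The benchmark abstract does not list its metrics. Peak signal-to-noise ratio (PSNR) is a widely used image-quality measure for MRI synthesis, so it is sketched below as an assumption. A structural metric such as SSIM would normally accompany it but is omitted for brevity. The image values are hypothetical.

```python
import math


def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are flat sequences of intensities; `data_range` is the span of
    possible intensities (1.0 for images normalised to [0, 1]).
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)


# Hypothetical 2 x 2 "T2w" reference and a synthesized estimate, both in [0, 1].
target_t2w = [0.2, 0.4, 0.6, 0.8]
synthetic_t2w = [0.25, 0.38, 0.61, 0.75]
print(f"PSNR = {psnr(target_t2w, synthetic_t2w):.2f} dB")
```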

Explainable CT-based deep learning model for predicting hematoma expansion including intraventricular hemorrhage growth.

Zhao X, Zhang Z, Shui J, Xu H, Yang Y, Zhu L, Chen L, Chang S, Du C, Yao Z, Fang X, Shi L

•papers•Jul 18 2025

Hematoma expansion (HE), including intraventricular hemorrhage (IVH) growth, significantly affects outcomes in patients with intracerebral hemorrhage (ICH). This study aimed to develop, validate, and interpret a deep learning model, HENet, for predicting three definitions of HE. This multicenter retrospective study used CT scans and clinical data from 718 ICH patients across three hospitals and focused on revised hematoma expansion (RHE) definitions 1 and 2 and conventional HE (CHE). HENet's performance was compared with that of 2D models and physician predictions using two external validation sets. HENet achieved high AUC values for RHE1, RHE2, and CHE prediction, and outperformed both physicians' predictions and 2D models in net reclassification index and integrated discrimination index for the RHE1 and RHE2 outcomes. The Grad-CAM technique provided visual insight into the model's decision-making process. These findings suggest that integrating HENet into clinical practice could improve prediction accuracy and patient outcomes in ICH cases.

CT Classification Neurological Retrospective Clinical In Silico Academic Lab Benchmark SOTA
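
HENet was compared with physicians and 2D models using the net reclassification index (NRI) and integrated discrimination improvement (IDI). IDI is the gain in mean predicted risk among events minus the gain among non-events. The abstract does not say which NRI variant was used, so the sketch assumes the category-free (continuous) version. All probabilities and outcomes in the example are made up.

```python
def _mean(values):
    return sum(values) / len(values)


def _split(outcomes):
    events = [i for i, y in enumerate(outcomes) if y == 1]
    nonevents = [i for i, y in enumerate(outcomes) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    return events, nonevents


def idi(p_old, p_new, outcomes):
    """Integrated discrimination improvement of p_new over p_old."""
    events, nonevents = _split(outcomes)
    gain_events = _mean([p_new[i] - p_old[i] for i in events])
    gain_nonevents = _mean([p_new[i] - p_old[i] for i in nonevents])
    return gain_events - gain_nonevents


def continuous_nri(p_old, p_new, outcomes):
    """Category-free NRI: net upward moves among events plus net downward
    moves among non-events (an assumed variant)."""
    events, nonevents = _split(outcomes)

    def net_up(idx):
        up = sum(1 for i in idx if p_new[i] > p_old[i])
        down = sum(1 for i in idx if p_new[i] < p_old[i])
        return (up - down) / len(idx)

    return net_up(events) - net_up(nonevents)


outcomes = [1, 0, 1, 0]          # 1 = expansion occurred (made up)
p_old = [0.6, 0.4, 0.5, 0.5]     # e.g. a reference model's risks
p_new = [0.8, 0.2, 0.6, 0.3]     # e.g. a new model's risks
print(f"IDI = {idi(p_old, p_new, outcomes):.2f}, "
      f"NRI = {continuous_nri(p_old, p_new, outcomes):.2f}")  # IDI = 0.35, NRI = 2.00
```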

SegMamba-V2: Long-range Sequential Modeling Mamba For General 3D Medical Image Segmentation.

Xing Z, Ye T, Yang Y, Cai D, Gai B, Wu XJ, Gao F, Zhu L

•papers•Jul 18 2025

The Transformer architecture has demonstrated remarkable results in 3D medical image segmentation due to its capability of modeling global relationships. However, it poses a significant computational burden when processing high-dimensional medical images. Mamba, as a State Space Model (SSM), has recently emerged as a notable approach for modeling long-range dependencies in sequential data. Although a substantial amount of Mamba-based research has focused on natural language and 2D image processing, few studies explore the capability of Mamba on 3D medical images. In this paper, we propose SegMamba-V2, a novel 3D medical image segmentation model, to effectively capture long-range dependencies within whole-volume features at each scale. To achieve this goal, we first devise a hierarchical scale downsampling strategy to enhance the receptive field and mitigate information loss during downsampling. Furthermore, we design a novel tri-orientated spatial Mamba block that extends the global dependency modeling process from one plane to three orthogonal planes to improve feature representation capability. Moreover, we collect and annotate a large-scale dataset (named CRC-2000) with fine-grained categories to facilitate benchmarking evaluation in 3D colorectal cancer (CRC) segmentation. We evaluate the effectiveness of our SegMamba-V2 on CRC-2000 and three other large-scale 3D medical image segmentation datasets, covering various modalities, organs, and segmentation targets. Experimental results demonstrate that our SegMamba-V2 outperforms state-of-the-art methods by a significant margin, which indicates the universality and effectiveness of the proposed model on 3D medical image segmentation tasks. The code for SegMamba-V2 is publicly available at: https://github.com/ge-xing/SegMamba-V2.

Mixed Modality Segmentation Abdominal Methodology In Silico Academic Lab Open Code Open Dataset Benchmark SOTA
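
SegMamba-V2 relies on a sequence model whose cost grows linearly with sequence length, run over a flattened volume in more than one orientation. The sketch below is a heavily simplified illustration of that idea, not the authors' block. It replaces Mamba's selective, input-dependent state-space parameters with a fixed scalar linear recurrence. The three axis orderings and the voxel-wise averaging of their outputs are also assumptions.

```python
def linear_scan(sequence, a=0.9, b=0.1, c=1.0):
    """Scalar linear state-space recurrence, O(L) in sequence length L:
    h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t."""
    h, outputs = 0.0, []
    for x in sequence:
        h = a * h + b * x
        outputs.append(c * h)
    return outputs


def flatten(volume, order):
    """Flatten a D x H x W nested list, looping over axes in `order`
    (slowest axis first). Also returns each element's (d, h, w) index."""
    sizes = (len(volume), len(volume[0]), len(volume[0][0]))
    values, coords = [], []
    for i in range(sizes[order[0]]):
        for j in range(sizes[order[1]]):
            for k in range(sizes[order[2]]):
                idx = [0, 0, 0]
                idx[order[0]], idx[order[1]], idx[order[2]] = i, j, k
                d, h, w = idx
                values.append(volume[d][h][w])
                coords.append((d, h, w))
    return values, coords


def tri_orientation_scan(volume, orders=((0, 1, 2), (1, 2, 0), (2, 0, 1))):
    """Run the scan once per axis ordering and average the outputs voxel-wise.
    Each ordering makes a different axis (W, D, H) the fastest-varying one."""
    D, H, W = len(volume), len(volume[0]), len(volume[0][0])
    fused = [[[0.0] * W for _ in range(H)] for _ in range(D)]
    for order in orders:
        values, coords = flatten(volume, order)
        for y, (d, h, w) in zip(linear_scan(values), coords):
            fused[d][h][w] += y / len(orders)
    return fused


volume = [[[1.0] * 4 for _ in range(3)] for _ in range(2)]  # D=2, H=3, W=4
volume[1][1][2] = 5.0                                         # one bright voxel
fused = tri_orientation_scan(volume)
print(round(fused[1][1][2], 4))
```

Each pass is a single left-to-right recurrence, so its cost grows linearly with the D·H·W voxel count. Full self-attention over the same tokens grows quadratically.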

AI Prognostication in Nonsmall Cell Lung Cancer: A Systematic Review.

Augustin M, Lyons K, Kim H, Kim DG, Kim Y

•papers•Jul 18 2025

A systematic literature review was performed on the use of artificial intelligence (AI) algorithms in nonsmall cell lung cancer (NSCLC) prognostication. Studies were evaluated for the type of input data (histology and whether CT, PET, or MRI was used), cancer therapy intervention, prognostic performance, and comparisons with clinical prognosis systems such as TNM staging. Further comparisons were drawn between different types of AI, such as machine learning (ML) and deep learning (DL). Therapeutic interventions and algorithm input modalities were synthesized for comparison purposes. The review adheres to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. The initial database search identified 3880 records, which were reduced to 513 after automatic screening and to 309 after applying the exclusion criteria. The prognostic performance of AI for NSCLC has been investigated using histology and genetic data, as well as CT, PET, and MR imaging, in surgery, immunotherapy, and radiation therapy patients with and without chemotherapy. The number of studies per therapeutic intervention was 13 for immunotherapy, 10 for radiotherapy, 14 for surgery, and 34 for other, multiple, or no specific therapy. The results of this systematic review demonstrate that AI-based prognostication methods consistently achieve higher prognostic performance for NSCLC, especially when directly compared with traditional prognostication techniques such as TNM staging. DL-based prognostication techniques outperformed ML-based ones. DL-based prognostication shows potential as a supplementary decision-making tool for personalized precision cancer therapy. Before it is fully adopted in clinical practice, it should be thoroughly validated through well-designed clinical trials.

Mixed Modality Classification Chest Review Concept Academic Lab Benchmark SOTA
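
The NSCLC review compares prognostic performance across studies without naming a metric in the abstract. Harrell's concordance index (C-index) is a common choice for survival prognostication, so it is sketched below as an assumption. This simplified version skips pairs with tied follow-up times. The patient data are made up.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C: among usable pairs, the fraction in which the patient with
    the higher predicted risk experiences the event first (ties in risk = 0.5).

    A pair (i, j) is usable when times[i] < times[j] and patient i had an
    observed event; pairs with tied times are skipped in this simplified version.
    """
    concordant, usable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored earlier time cannot order the pair
        for j in range(len(times)):
            if times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Made-up follow-up (months), event flags (1 = event observed, 0 = censored), risk scores.
times = [5, 8, 12, 20, 30]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.7, 0.8, 0.3, 0.2]
print(f"C-index = {harrell_c_index(times, events, risk):.3f}")  # C-index = 0.875
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering. This is the same scale on which AI models and TNM staging are usually compared.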

Using Convolutional Neural Networks for the Classification of Suboptimal Chest Radiographs.

Liu EH, Carrion D, Badawy MK

•papers•Jul 18 2025

Chest X-rays (CXRs) are among the most frequently performed X-ray examinations. They often require repeat imaging because of inadequate quality, leading to increased radiation exposure and delays in patient care and diagnosis. This research assesses the efficacy of the DenseNet121 and YOLOv8 neural networks in detecting suboptimal CXRs, which may minimise delays and enhance patient outcomes. The study included 3587 patients with a median age of 67 years (range, 0-102). It used an initial dataset of 10,000 CXRs, randomly divided into a training subset (4000 optimal and 4000 suboptimal) and a validation subset (400 optimal and 400 suboptimal). The test subset (25 optimal and 25 suboptimal) was curated from the remaining images to provide adequate variation. DenseNet121 and YOLOv8 were chosen for their image classification capabilities. DenseNet121 is a robust, well-tested model in the medical domain with high accuracy in object recognition. YOLOv8 is a cutting-edge commercial model targeted at all industries. Their performance was assessed via the area under the receiver operating characteristic curve (AUROC) and compared with radiologist classification using the chi-squared test. DenseNet121 attained an AUROC of 0.97 and YOLOv8 an AUROC of 0.95, indicating a strong capability to differentiate between optimal and suboptimal CXRs. Agreement between radiologists and models varied, partly because of the lack of clinical indications; however, the difference was not statistically significant. Both AI models effectively classified chest X-ray quality, demonstrating the potential for providing radiographers with feedback to improve image quality. Notably, this was the first study to include both PA and lateral CXRs as well as paediatric cases, and the first to evaluate YOLOv8 for this application.

X-Ray Classification Chest Retrospective Clinical In Silico Academic Lab Benchmark SOTA
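
The CXR study reports AUROC and compares the models with radiologists using a chi-squared test. AUROC equals the probability that a randomly chosen positive case (here, a suboptimal CXR) scores higher than a randomly chosen negative one. The sketch below computes it that way, alongside a Pearson chi-squared test for a 2 x 2 table without continuity correction. The scores and the table layout (rows = model call, columns = radiologist call) are hypothetical.

```python
import math


def auroc(labels, scores):
    """AUROC via the Mann-Whitney formulation; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative cases")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def chi_squared_2x2(table):
    """Pearson chi-squared statistic and p-value (1 degree of freedom) for a
    2 x 2 contingency table [[a, b], [c, d]], without continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    if 0 in rows or 0 in cols:
        raise ValueError("every row and column total must be positive")
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            chi2 += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi2 with 1 df
    return chi2, p_value


labels = [1, 1, 1, 0, 0, 0]                   # 1 = suboptimal (made up)
scores = [0.92, 0.71, 0.40, 0.55, 0.20, 0.08]
print(round(auroc(labels, scores), 3))       # 0.889
chi2, p = chi_squared_2x2([[20, 5], [4, 21]])
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```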

Establishment of an interpretable MRI radiomics-based machine learning model capable of predicting axillary lymph node metastasis in invasive breast cancer.

Zhang D, Shen M, Zhang L, He X, Huang X

•papers•Jul 18 2025

This study sought to develop a radiomics model capable of predicting axillary lymph node metastasis (ALNM) in patients with invasive breast cancer (IBC) based on dual-sequence magnetic resonance imaging (MRI) data from diffusion-weighted imaging (DWI) and dynamic contrast enhancement (DCE). The interpretability of the resulting model was probed with the SHAP (Shapley Additive Explanations) method. Established inclusion/exclusion criteria were used to retrospectively compile MRI and matching clinical data from 183 patients with pathologically confirmed IBC evaluated at our hospital between June 2021 and December 2023. All of these patients had undergone plain and enhanced MRI scans before treatment. Patients were separated according to their pathological biopsy results into those with ALNM (n = 107) and those without ALNM (n = 76), and were then randomized into training (n = 128) and testing (n = 55) cohorts at a 7:3 ratio. Optimal radiomics features were selected from the extracted data. The random forest method was used to establish three predictive models (DWI, DCE, and combined DWI + DCE sequence models). Area under the curve (AUC) values for receiver operating characteristic (ROC) curves were used to assess model performance. The DeLong test was used to compare the predictive efficacy of the models. Model discrimination was assessed using the integrated discrimination improvement (IDI) method. Decision curves revealed net clinical benefits for each of these models. The SHAP method was used to interpret the best-performing model. Clinicopathological characteristics (age, menopausal status, molecular subtype, and estrogen receptor, progesterone receptor, human epidermal growth factor receptor 2, and Ki-67 status) were comparable between the ALNM and non-ALNM groups, as well as between the training and testing cohorts (P > 0.05).
AUC values for the DWI, DCE, and combined models in the training cohort were 0.793, 0.774, and 0.864, respectively, with corresponding values of 0.728, 0.760, and 0.859 in the testing cohort. According to the DeLong test, the predictive efficacy of the combined model differed significantly from that of both the DWI model and the DCE model in the training cohort (P < 0.05), while no other significant differences in model performance were noted (P > 0.05). IDI results indicated that the combined model improved predictive power by 13.5% (P < 0.05) and 10.2% (P < 0.05) relative to the DWI and DCE models, respectively. In a decision curve analysis, the combined model offered a net clinical benefit over the DCE model. The combined dual-sequence MRI-based radiomics model constructed herein, together with the supporting interpretability analyses, can aid in predicting the ALNM status of IBC patients and help guide clinical decision-making in these cases.

MRI Classification Breast Retrospective Clinical In Silico Academic Lab Benchmark SOTA Ethics
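
The ALNM study uses decision curve analysis. In the standard formulation, the net benefit of a model at threshold probability p_t is TP/n - (FP/n)·p_t/(1 - p_t). It is compared against "treat all" and "treat none" strategies, and "treat none" has a net benefit of 0. A minimal sketch with made-up labels and predicted probabilities:

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)  # weight of a false positive relative to a true positive
    return tp / n - (fp / n) * odds


def treat_all_net_benefit(labels, threshold):
    """Net benefit of treating everyone at the given threshold probability."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


labels = [1, 1, 0, 1, 0, 0]                    # 1 = ALNM present (made up)
probs = [0.85, 0.55, 0.45, 0.30, 0.20, 0.65]   # hypothetical model outputs
for t in (0.2, 0.4, 0.6):
    print(f"p_t={t:.1f}  model={net_benefit(labels, probs, t):+.3f}  "
          f"treat-all={treat_all_net_benefit(labels, t):+.3f}  treat-none=+0.000")
```

A model adds clinical value at a given threshold when its net benefit exceeds both reference strategies.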
