Latest Papers on Radiology AI. Tags: Benchmark SOTA

Sex-specific body fat distribution predicts cardiovascular ageing.

Losev V, Lu C, Tahasildar S, Senevirathne DS, Inglese P, Bai W, King AP, Shah M, de Marvao A, O'Regan DP

•papers•Aug 22 2025

Cardiovascular ageing is a progressive loss of physiological reserve, modified by environmental and genetic risk factors, that contributes to multi-morbidity due to accumulated damage across diverse cell types, tissues, and organs. Obesity is implicated in premature ageing, but the effect of body fat distribution in humans is unknown. This study determined the influence of sex-dependent fat phenotypes on human cardiovascular ageing. Data from 21 241 participants in the UK Biobank were analysed. Machine learning was used to predict cardiovascular age from 126 image-derived traits of vascular function, cardiac motion, and myocardial fibrosis. An age-delta was calculated as the difference between predicted age and chronological age. The volume and distribution of body fat were assessed from whole-body imaging. The association between fat phenotypes and cardiovascular age-delta was assessed using multivariable linear regression with age and sex as covariates, reporting β coefficients with 95% confidence intervals (CI). Two-sample Mendelian randomization was used to assess causal associations. Visceral adipose tissue volume [β = 0.656, (95% CI, .537-.775), P < .0001], muscle adipose tissue infiltration [β = 0.183, (95% CI, .122-.244), P = .0003], and liver fat fraction [β = 1.066, (95% CI .835-1.298), P < .0001] were the strongest predictors of increased cardiovascular age-delta for both sexes. Abdominal subcutaneous adipose tissue volume [β = 0.432, (95% CI, .269-.596), P < .0001] and android fat mass [β = 0.983, (95% CI, .64-1.326), P < .0001] were each associated with increased age-delta only in males. Genetically predicted gynoid fat showed an association with decreased age-delta. Shared and sex-specific patterns of body fat are associated with both protective and harmful changes in cardiovascular ageing, highlighting adipose tissue distribution and function as a key target for interventions to extend healthy lifespan.
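The age-delta described in this abstract is a simple difference, which the following minimal sketch illustrates. The function name and the example ages are hypothetical and do not come from the study; the prediction model itself is not shown.

```python
def age_delta(predicted_age, chronological_age):
    """Age-delta as defined in the abstract: predicted minus chronological age."""
    return predicted_age - chronological_age


# Hypothetical (predicted, chronological) ages in years, for illustration only.
participants = [(58.2, 55.0), (61.5, 64.0), (47.0, 47.0)]
deltas = [age_delta(pred, chrono) for pred, chrono in participants]

for (pred, chrono), delta in zip(participants, deltas):
    print(f"predicted={pred:.1f} chronological={chrono:.1f} delta={delta:+.1f}")
```

Under this definition, a positive value means the cardiovascular system appears older than the participant's chronological age, and a negative value means it appears younger.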

Mixed Modality Registration Whole Body Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Deep Learning-based Automated Coronary Plaque Quantification: First Demonstration With Ultra-high Resolution Photon-counting Detector CT at Different Temporal Resolutions.

Klambauer K, Burger SD, Demmert TT, Mergen V, Moser LJ, Gulsun MA, Schöbinger M, Schwemmer C, Wels M, Allmendinger T, Eberhard M, Alkadhi H, Schmidt B

•papers•Aug 22 2025

The aim of this study was to evaluate the feasibility and reproducibility of a novel deep learning (DL)-based coronary plaque quantification tool with automatic case preparation in patients undergoing ultra-high resolution (UHR) photon-counting detector CT coronary angiography (CCTA), and to assess the influence of temporal resolution on plaque quantification. In this retrospective single-center study, 45 patients undergoing clinically indicated UHR CCTA were included. In each scan, 2 image data sets were reconstructed: one in the dual-source mode with 66 ms temporal resolution and one simulating a single-source mode with 125 ms temporal resolution. A novel, DL-based algorithm for fully automated coronary segmentation and intensity-based plaque quantification was applied to both data sets in each patient. Plaque volume quantification was performed at the vessel-level for the entire left anterior descending artery (LAD), left circumflex artery (CX), and right coronary artery (RCA), as well as at the lesion-level for the largest coronary plaque in each vessel. Diameter stenosis grade was quantified for the coronary lesion with the greatest longitudinal extent in each vessel. To assess reproducibility, the algorithm was rerun 3 times in 10 randomly selected patients, and all outputs were visually reviewed and confirmed by an expert reader. Paired Wilcoxon signed-rank tests with Benjamini-Hochberg correction were used for statistical comparisons. One hundred nineteen out of 135 (88.1%) coronary arteries showed atherosclerotic plaques and were included in the analysis. In the reproducibility analysis, repeated runs of the algorithm yielded identical results across all plaque and lumen measurements (P > 0.999). All outputs were confirmed to be anatomically correct, visually consistent, and did not require manual correction. 
At the vessel level, total plaque volumes were higher in the 125 ms reconstructions compared with the 66 ms reconstructions in 28 of 45 patients (62%), with both calcified and noncalcified plaque volumes being higher in 32 (71%) and 28 (62%) patients, respectively. Total plaque volumes in the LAD, CX, and RCA were significantly higher in the 125 ms reconstructions (681.3 vs. 647.8 mm³, P < 0.05). At the lesion level, total plaque volumes were higher in the 125 ms reconstructions in 44 of 45 patients (98%; 447.3 vs. 414.9 mm³, P < 0.001), with both calcified and noncalcified plaque volumes being higher in 42 of 45 patients (93%). The median diameter stenosis grades for all vessels were significantly higher in the 125 ms reconstructions (35.4% vs. 28.1%, P < 0.01). This study evaluated a novel DL-based tool with automatic case preparation for quantitative coronary plaque in UHR CCTA data sets. The algorithm was technically robust and reproducible, delivering anatomically consistent outputs not requiring manual correction. Reconstructions with lower temporal resolution (125 ms) systematically overestimated plaque burden compared with higher temporal resolution (66 ms), underscoring that protocol standardization is essential for reliable DL-based plaque quantification.
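The abstract reports Benjamini-Hochberg correction of the paired test results. As a rough illustration of that correction step only, the sketch below computes BH-adjusted p-values in plain Python. The input p-values are hypothetical, and the Wilcoxon tests that would produce them are not shown.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four comparisons, for illustration only.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print(adj)
```

Each adjusted value is compared against the chosen false discovery rate (for example 0.05) in place of the raw p-value.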

CT Segmentation Cardiac Retrospective Clinical In Silico Academic Lab Benchmark SOTA

AlzhiNet: Traversing from 2D-CNN to 3D-CNN, Towards Early Detection and Diagnosis of Alzheimer's Disease.

Akindele RG, Adebayo S, Yu M, Kanda PS

•papers•Aug 22 2025

Alzheimer's disease (AD) is a progressive neurodegenerative disorder with increasing prevalence among the ageing population, necessitating early and accurate diagnosis for effective disease management. In this study, we present a novel hybrid deep learning framework, AlzhiNet, that integrates both 2D convolutional neural networks (2D-CNNs) and 3D convolutional neural networks (3D-CNNs), along with a custom loss function and volumetric data augmentation, to enhance feature extraction and improve classification performance in AD diagnosis. According to extensive experiments, AlzhiNet outperforms standalone 2D and 3D models, highlighting the importance of combining these complementary representations of data. The depth and quality of 3D volumes derived from the augmented 2D slices also significantly influence the model's performance. The results indicate that carefully selecting weighting factors in hybrid predictions is imperative for achieving optimal results. Our framework has been validated on magnetic resonance imaging (MRI) data from the Kaggle and MIRIAD datasets, obtaining accuracies of 98.9% and 99.99%, respectively, with an AUC of 100%. Furthermore, AlzhiNet was studied under a variety of perturbation scenarios on the Alzheimer's Kaggle dataset, including Gaussian noise, brightness, contrast, salt and pepper noise, color jitter, and occlusion. The results obtained show that AlzhiNet is more robust to perturbations than ResNet-18, making it an excellent choice for real-world applications. This approach represents a promising advancement in the early diagnosis and treatment planning for AD.

MRI Classification Neurological Methodology In Silico Benchmark SOTA

Deep learning ensemble for abdominal aortic calcification scoring from lumbar spine X-ray and DXA images.

Voss A, Suoranta S, Nissinen T, Hurskainen O, Masarwah A, Sund R, Tohka J, Väänänen SP

•papers•Aug 22 2025

Abdominal aortic calcification (AAC) is an independent predictor of cardiovascular diseases (CVDs). AAC is typically detected as an incidental finding in spine scans. Early detection of AAC through opportunistic screening using any available imaging modalities could help identify individuals with a higher risk of developing clinical CVDs. However, AAC is not routinely assessed in clinics, and manual scoring from projection images is time-consuming and prone to inter-rater variability. Automated AAC scoring methods exist, but earlier methods have not accounted for the inherent variability in AAC scoring and were each developed for a single imaging modality. We propose an automated method for quantifying AAC from lumbar spine X-ray and Dual-energy X-ray Absorptiometry (DXA) images using an ensemble of convolutional neural network models that predicts a distribution of probable AAC scores. We treat AAC score as a normally distributed random variable to account for the variability of manual scoring. The mean and variance of the assumed normal AAC distributions are estimated based on manual annotations, and the models in the ensemble are trained by simulating AAC scores from these distributions. Our proposed ensemble approach successfully extracted AAC scores from both X-ray and DXA images with predicted score distributions demonstrating strong agreement with manual annotations, as evidenced by concordance correlation coefficients of 0.930 for X-ray and 0.912 for DXA. The prediction error between the average estimates of our approach and the average manual annotations was lower than the errors reported previously, highlighting the benefit of incorporating uncertainty in AAC scoring.
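The idea of training on simulated scores can be sketched as follows. The abstract does not say how the mean and variance are estimated from the manual annotations, so this example assumes the sample mean and sample variance of repeated ratings of one image. The function names and the ratings are hypothetical.

```python
import random
from statistics import mean, variance


def fit_score_distribution(manual_scores):
    """Estimate mean and variance of one image's score from repeated ratings.

    Assumption: sample mean and sample variance; the paper may estimate
    these quantities differently.
    """
    mu = mean(manual_scores)
    var = variance(manual_scores) if len(manual_scores) > 1 else 0.0
    return mu, var


def simulate_score(mu, var, rng):
    """Draw one training target from the assumed normal distribution."""
    return rng.gauss(mu, var ** 0.5)


# Hypothetical manual AAC ratings of a single image by three readers.
mu, var = fit_score_distribution([4, 6, 5])

rng = random.Random(0)  # fixed seed so the sketch is repeatable
draws = [simulate_score(mu, var, rng) for _ in range(1000)]
print(f"mu={mu}, var={var}, mean of simulated targets={mean(draws):.2f}")
```

In an ensemble, each member could be trained on its own set of draws, so that the spread of the members' predictions reflects the spread of the manual scores. This is one reading of the abstract and not a confirmed detail of the method.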

Mixed Modality Classification Abdominal Methodology In Silico Academic Lab Benchmark SOTA

A Disease-Centric Vision-Language Foundation Model for Precision Oncology in Kidney Cancer

Yuhui Tao, Zhongwei Zhao, Zilong Wang, Xufang Luo, Feng Chen, Kang Wang, Chuanfu Wu, Xue Zhang, Shaoting Zhang, Jiaxi Yao, Xingwei Jin, Xinyang Jiang, Yifan Yang, Dongsheng Li, Lili Qiu, Zhiqiang Shao, Jianming Guo, Nengwang Yu, Shuo Wang, Ying Xiong

•preprint•Aug 22 2025

The non-invasive assessment of increasingly incidentally discovered renal masses is a critical challenge in urologic oncology, where diagnostic uncertainty frequently leads to the overtreatment of benign or indolent tumors. In this study, we developed and validated RenalCLIP, a visual-language foundation model for the characterization, diagnosis and prognosis of renal masses, using a dataset of 27,866 CT scans from 8,809 patients across nine Chinese medical centers and the public TCIA cohort. The model was developed via a two-stage pre-training strategy that first enhances the image and text encoders with domain-specific knowledge before aligning them through a contrastive learning objective, to create robust representations for superior generalization and diagnostic precision. RenalCLIP achieved better performance and superior generalizability across 10 core tasks spanning the full clinical workflow of kidney cancer, including anatomical assessment, diagnostic classification, and survival prediction, compared with other state-of-the-art general-purpose CT foundation models. In particular, for complex tasks such as recurrence-free survival prediction in the TCIA cohort, RenalCLIP achieved a C-index of 0.726, representing a substantial improvement of approximately 20% over the leading baselines. Furthermore, RenalCLIP's pre-training imparted remarkable data efficiency; in the diagnostic classification task, it needed only 20% of the training data to match the peak performance of all baseline models, even after they were fully fine-tuned on 100% of the data. Additionally, it achieved superior performance in report generation, image-text retrieval and zero-shot diagnosis tasks. Our findings establish that RenalCLIP provides a robust tool with the potential to enhance diagnostic accuracy, refine prognostic stratification, and personalize the management of patients with kidney cancer.

CT Classification Abdominal Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Spatial imaging features derived from SUVmax location in resectable NSCLC are associated with tumor aggressiveness.

Jiang Z, Spielvogel C, Haberl D, Yu J, Krisch M, Szakall S, Molnar P, Fillinger J, Horvath L, Renyi-Vamos F, Aigner C, Dome B, Lang C, Megyesfalvi Z, Kenner L, Hacker M

•papers•Aug 21 2025

Accurate non-invasive prediction of histopathologic invasiveness and recurrence risk remains a clinical challenge in resectable non-small cell lung cancer (NSCLC). We developed and validated the Edge Proximity Score (EPS), a novel [¹⁸F]FDG PET/CT-based spatial imaging feature that quantifies the displacement of SUVmax relative to the tumor centroid and perimeter, to assess tumor aggressiveness and predict progression-free survival (PFS). This retrospective study included 244 NSCLC patients with preoperative [¹⁸F]FDG PET/CT. EPS was computed from normalized SUVmax-to-centroid and SUVmax-to-perimeter distances. A total of 115 PET radiomics features were extracted and standardized. Eight machine learning models (80:20 split) were trained to predict lymphovascular invasion (LVI), visceral pleural invasion (VPI), and spread through air spaces (STAS), with feature importance assessed using SHAP. Prognostic analysis was conducted using multivariable Cox regression. A survival prediction model incorporating EPS was externally validated in the TCIA cohort. RNA sequencing data from 76 TCIA patients were used for transcriptomic and immune profiling. EPS was significantly elevated in tumors with LVI, VPI, and STAS (P < 0.001), consistently ranked among the top SHAP features, and was an independent predictor of PFS (HR = 2.667, P = 0.015). The EPS-based nomogram achieved AUCs of 0.67, 0.70, and 0.68 for predicting 1-, 3-, and 5-year PFS in the TCIA validation cohort. High EPS was associated with proliferative and metabolic gene signatures, whereas low EPS was linked to immune activation and neutrophil infiltration. EPS is a biologically relevant, non-invasive imaging biomarker that may improve risk stratification in NSCLC.

PET Classification Chest Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Combined use of two artificial intelligence-based algorithms for mammography triaging: a retrospective simulation study.

Kim HJ, Kim HH, Eom HJ, Choi WJ, Chae EY, Shin HJ, Cha JH

•papers•Aug 21 2025

To evaluate triaging scenarios involving two commercial AI algorithms to enhance mammography interpretation and reduce workload. A total of 3012 screening or diagnostic mammograms, including 213 cancer cases, were analyzed using two AI algorithms (AI-1, AI-2) and categorized as "high-risk" (top 10%), "minimal-risk" (bottom 20%), or "indeterminate" based on malignancy likelihood. Five triaging scenarios of combined AI use (Sensitive, Specific, Conservative, Sequential Modes A and B) determined whether cases would be autonomously recalled, classified as negative, or referred for radiologist interpretation. Sensitivity, specificity, number of mammograms requiring review, and abnormal interpretation rate (AIR) were compared against single AIs and manual reading using McNemar's test. Sensitive Mode achieved 84% sensitivity, outperforming single AI (p = 0.03 [AI-1], 0.01 [AI-2]) and manual reading (p = 0.03), with an 18.3% reduction in mammograms requiring review (AIR, 23.3%). Specific Mode achieved 87.7% specificity, exceeding single AI (p < 0.001 [AI-1, AI-2]) and comparable to manual reading (p = 0.37), with a 41.7% reduction in mammograms requiring review (AIR, 17%). Conservative and Sequential Modes A and B achieved sensitivities of 82.2%, 80.8%, and 80.3%, respectively, comparable to single AI or manual reading (p > 0.05, all), with reductions of 9.8%, 49.8%, and 49.8% in mammograms requiring review (AIRs, 18.6%, 21.6%, 21.7%). Combining two AI algorithms improved sensitivity or specificity in mammography interpretation while reducing mammograms requiring radiologist review in this cancer-enriched dataset from a tertiary center. Scenario selection should consider clinical needs and requires validation in a screening population.
Question: AI algorithms have the potential to improve workflow efficiency by triaging mammograms. Combining algorithms trained under different conditions may offer synergistic benefits.
Findings: The combined use of two commercial AI algorithms for triaging mammograms improved sensitivity or specificity, depending on the scenario, while also reducing mammograms requiring radiologist review.
Clinical relevance: Integrating two commercial AI algorithms could enhance mammography interpretation over using a single AI for triaging or manual reading.
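The percentile-based categorization in this abstract (top 10% "high-risk", bottom 20% "minimal-risk", the rest "indeterminate") can be sketched for a single algorithm as below. The function name, the tie handling, and the example scores are assumptions. The abstract does not define how the five combined modes merge the two algorithms' labels, so that step is left out.

```python
def triage_labels(scores, high_frac=0.10, low_frac=0.20):
    """Label each case by its rank in one algorithm's malignancy scores.

    Assumption: ties are broken by input order, and the number of cases
    in each group is rounded down.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # ascending score
    n_low = int(n * low_frac)
    n_high = int(n * high_frac)
    labels = ["indeterminate"] * n
    for i in order[:n_low]:
        labels[i] = "minimal-risk"
    for i in order[n - n_high:]:
        labels[i] = "high-risk"
    return labels


# Hypothetical malignancy scores for ten mammograms, for illustration only.
scores = [0.05, 0.9, 0.3, 0.1, 0.5, 0.6, 0.2, 0.7, 0.4, 0.8]
labels = triage_labels(scores)
print(labels)
```

A combined scenario would then apply a rule to the pair of labels from AI-1 and AI-2 to decide between autonomous recall, a negative classification, and referral to a radiologist.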

Mammography Triage Breast Retrospective Clinical In Silico Academic Lab Benchmark SOTA

LGMSNet: Thinning a medical image segmentation model via dual-level multiscale fusion

Chengqi Dong, Fenghe Tang, Rongge Mao, Xinpei Gao, S. Kevin Zhou

•preprint•Aug 21 2025

Medical image segmentation plays a pivotal role in disease diagnosis and treatment planning, particularly in resource-constrained clinical settings where lightweight and generalizable models are urgently needed. However, existing lightweight models often compromise performance for efficiency and rarely adopt computationally expensive attention mechanisms, severely restricting their global contextual perception capabilities. Additionally, current architectures neglect the channel redundancy issue under the same convolutional kernels in medical imaging, which hinders effective feature extraction. To address these challenges, we propose LGMSNet, a novel lightweight framework based on dual-level (local and global) multiscale fusion that achieves state-of-the-art performance with minimal computational overhead. LGMSNet employs heterogeneous intra-layer kernels to extract local high-frequency information while mitigating channel redundancy. In addition, the model integrates sparse transformer-convolutional hybrid branches to capture low-frequency global information. Extensive experiments across six public datasets demonstrate LGMSNet's superiority over existing state-of-the-art methods. In particular, LGMSNet maintains exceptional performance in zero-shot generalization tests on four unseen datasets, underscoring its potential for real-world deployment in resource-limited medical scenarios. The project code is available at https://github.com/cq-dong/LGMSNet.

Segmentation Methodology In Silico Open Code Benchmark SOTA

DoSReMC: Domain Shift Resilient Mammography Classification using Batch Normalization Adaptation

Uğurcan Akyüz, Deniz Katircioglu-Öztürk, Emre K. Süslü, Burhan Keleş, Mete C. Kaya, Gamze Durhan, Meltem G. Akpınar, Figen B. Demirkazık, Gözde B. Akar

•preprint•Aug 21 2025

Numerous deep learning-based solutions have been developed for the automatic recognition of breast cancer using mammography images. However, their performance often declines when applied to data from different domains, primarily due to domain shift - the variation in data distributions between source and target domains. This performance drop limits the safe and equitable deployment of AI in real-world clinical settings. In this study, we present DoSReMC (Domain Shift Resilient Mammography Classification), a batch normalization (BN) adaptation framework designed to enhance cross-domain generalization without retraining the entire model. Using three large-scale full-field digital mammography (FFDM) datasets - including HCTP, a newly introduced, pathologically confirmed in-house dataset - we conduct a systematic cross-domain evaluation with convolutional neural networks (CNNs). Our results demonstrate that BN layers are a primary source of domain dependence: they perform effectively when training and testing occur within the same domain, and they significantly impair model generalization under domain shift. DoSReMC addresses this limitation by fine-tuning only the BN and fully connected (FC) layers, while preserving pretrained convolutional filters. We further integrate this targeted adaptation with an adversarial training scheme, yielding additional improvements in cross-domain generalizability. DoSReMC can be readily incorporated into existing AI pipelines and applied across diverse clinical environments, providing a practical pathway toward more robust and generalizable mammography classification systems.
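The targeted adaptation described here, fine-tuning only the BN and FC layers while keeping the pretrained convolutional filters, might look roughly like the PyTorch sketch below. The tiny network and the helper name are hypothetical stand-ins and not the authors' architecture or code. The adversarial training scheme is not shown.

```python
import torch
import torch.nn as nn


def freeze_all_but_bn_and_fc(model: nn.Module) -> nn.Module:
    """Leave only BatchNorm and Linear parameters trainable."""
    for param in model.parameters():
        param.requires_grad = False
    for module in model.modules():
        if isinstance(module, (nn.BatchNorm2d, nn.Linear)):
            for param in module.parameters():
                param.requires_grad = True
    return model


# Hypothetical stand-in for a pretrained mammography CNN.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 2),
)
freeze_all_but_bn_and_fc(model)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)

# Only the unfrozen parameters are handed to the optimizer.
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)
```

Note that BN running statistics are buffers and not parameters, so they keep updating whenever the model is in training mode, regardless of `requires_grad`.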

Mammography Classification Breast Methodology In Silico Academic Lab Benchmark SOTA

TPA: Temporal Prompt Alignment for Fetal Congenital Heart Defect Classification

Darya Taratynova, Alya Almsouti, Beknur Kalmakhanbet, Numan Saeed, Mohammad Yaqub

•preprint•Aug 21 2025

Congenital heart defect (CHD) detection in ultrasound videos is hindered by image noise and probe positioning variability. While automated methods can reduce operator dependence, current machine learning approaches often neglect temporal information, limit themselves to binary classification, and do not account for prediction calibration. We propose Temporal Prompt Alignment (TPA), a method leveraging a foundation image-text model and prompt-aware contrastive learning to classify fetal CHD on cardiac ultrasound videos. TPA extracts features from each frame of video subclips using an image encoder, aggregates them with a trainable temporal extractor to capture heart motion, and aligns the video representation with class-specific text prompts via a margin-hinge contrastive loss. To enhance calibration for clinical reliability, we introduce a Conditional Variational Autoencoder Style Modulation (CVAESM) module, which learns a latent style vector to modulate embeddings and quantifies classification uncertainty. Evaluated on a private dataset for CHD detection and on a large public dataset, EchoNet-Dynamic, for systolic dysfunction, TPA achieves a state-of-the-art macro F1 score of 85.40% for CHD diagnosis, while also reducing expected calibration error by 5.38% and adaptive ECE by 6.8%. On EchoNet-Dynamic's three-class task, it boosts macro F1 by 4.73% (from 53.89% to 58.62%). Temporal Prompt Alignment (TPA) is a framework for fetal congenital heart defect (CHD) classification in ultrasound videos that integrates temporal modeling, prompt-aware contrastive learning, and uncertainty quantification.
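Expected calibration error (ECE), one of the metrics reported in this abstract, is commonly computed by binning predictions by confidence and averaging the gap between accuracy and confidence in each bin. The sketch below uses equal-width bins, which is an assumption because the abstract does not state the binning scheme. The example predictions are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins over [0, 1]."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's gap by the share of samples it holds.
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece


# Hypothetical top-class confidences and whether each prediction was right.
confidences = [0.95, 0.95, 0.55, 0.55]
correct = [True, True, True, False]
ece = expected_calibration_error(confidences, correct)
print(f"ECE = {ece:.3f}")
```

Adaptive ECE, also mentioned in the abstract, differs in that bins are chosen to hold equal numbers of samples instead of spanning equal confidence ranges.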

Ultrasound Classification Cardiac Methodology In Silico Benchmark SOTA
