Latest Papers on Radiology AI. Tags: Benchmark SOTA, Order: Best Match, Limit: 10.

X-ray transferable polyrepresentation learning

Weronika Hryniewska-Guzik, Przemyslaw Biecek

•preprint•Jul 7 2025

The success of machine learning algorithms is inherently related to the extraction of meaningful features, as they play a pivotal role in the performance of these algorithms. Central to this challenge is the quality of data representation. However, the ability to generalize and extract these features effectively from unseen datasets is also crucial. In light of this, we introduce a novel concept: the polyrepresentation. Polyrepresentation integrates multiple representations of the same modality extracted from distinct sources, for example, vector embeddings from the Siamese Network, self-supervised models, and interpretable radiomic features. This approach yields better performance metrics compared to relying on a single representation. Additionally, in the context of X-ray images, we demonstrate the transferability of the created polyrepresentation to a smaller dataset, underscoring its potential as a pragmatic and resource-efficient approach in various image-related solutions. It is worth noting that the concept of polyrepresentation, illustrated here on medical data, can also be applied to other domains, showcasing its versatility and broad potential impact.

X-Ray Classification Methodology In Silico Academic Lab Benchmark SOTA
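
The abstract above describes a polyrepresentation as a combination of several representations of the same image drawn from different sources. A minimal sketch of one way this could look, assuming plain concatenation of L2-normalized vectors (the abstract does not state the fusion method, so this is an assumption); the function names and toy values are hypothetical:

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [v / norm for v in vec]


def build_polyrepresentation(sources: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate per-source vectors into one feature vector.

    Assumption: fusion is simple concatenation after per-source normalization.
    Sources are visited in sorted key order so the layout is deterministic.
    """
    fused: List[float] = []
    for name in sorted(sources):
        fused.extend(l2_normalize(sources[name]))
    return fused


# Toy vectors standing in for the three kinds of sources named in the abstract.
example = {
    "siamese": [3.0, 4.0],
    "self_supervised": [1.0, 0.0, 0.0],
    "radiomics": [0.0, 2.0],
}
poly = build_polyrepresentation(example)
```

Normalizing each source before concatenation keeps one source with large raw magnitudes from dominating a downstream classifier; whether the authors do this is not stated in the abstract.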

ViTaL: A Multimodality Dataset and Benchmark for Multi-pathological Ovarian Tumor Recognition

You Zhou, Lijiang Chen, Guangxia Cui, Wenpei Bai, Yu Guo, Shuchang Lyu, Guangliang Cheng, Qi Zhao

•preprint•Jul 6 2025

Ovarian tumor, as a common gynecological disease, can rapidly deteriorate into serious health crises when undetected early, thus posing significant threats to the health of women. Deep neural networks have the potential to identify ovarian tumors, thereby reducing mortality rates, but limited public datasets hinder progress. To address this gap, we introduce a vital ovarian tumor pathological recognition dataset called ViTaL that contains Visual, Tabular and Linguistic modality data of 496 patients across six pathological categories. The ViTaL dataset comprises three subsets corresponding to different patient data modalities: visual data from 2216 two-dimensional ultrasound images, tabular data from medical examinations of 496 patients, and linguistic data from ultrasound reports of 496 patients. It is insufficient to merely distinguish between benign and malignant ovarian tumors in clinical practice. To enable multi-pathology classification of ovarian tumors, we propose ViTaL-Net, based on the Triplet Hierarchical Offset Attention Mechanism (THOAM), to minimize the loss incurred during feature fusion of multi-modal data. This mechanism could effectively enhance the relevance and complementarity between information from different modalities. ViTaL-Net serves as a benchmark for the task of multi-pathology, multi-modality classification of ovarian tumors. In our comprehensive experiments, the proposed method exhibited satisfactory performance, achieving accuracies exceeding 90% on the two most common pathological types of ovarian tumor and an overall performance of 85%. Our dataset and code are available at https://github.com/GGbond-study/vitalnet.

Ultrasound Classification Abdominal Dataset Release In Silico Academic Lab Open Dataset Open Code Benchmark SOTA

FB-Diff: Fourier Basis-guided Diffusion for Temporal Interpolation of 4D Medical Imaging

Xin You, Runze Yang, Chuyan Zhang, Zhongliang Jiang, Jie Yang, Nassir Navab

•preprint•Jul 6 2025

The temporal interpolation task for 4D medical imaging plays a crucial role in the clinical practice of respiratory motion modeling. Following the simplified linear-motion hypothesis, existing approaches adopt optical flow-based models to interpolate intermediate frames. However, realistic respiratory motions should be nonlinear and quasi-periodic with specific frequencies. Motivated by this property, we resolve the temporal interpolation task from the frequency perspective, and propose a Fourier basis-guided Diffusion model, termed FB-Diff. Specifically, due to the regular motion pattern of respiration, physiological motion priors are introduced to describe general characteristics of temporal data distributions. Then a Fourier motion operator is devised to extract Fourier bases by incorporating physiological motion priors and case-specific spectral information in the feature space of a Variational Autoencoder. Well-learned Fourier bases can better simulate respiratory motions with motion patterns of specific frequencies. Conditioned on starting and ending frames, the diffusion model further leverages well-learned Fourier bases via the basis interaction operator, which promotes the temporal interpolation task in a generative manner. Extensive results demonstrate that FB-Diff achieves state-of-the-art (SOTA) perceptual performance with better temporal consistency while maintaining promising reconstruction metrics. Codes are available.

CT Reconstruction Chest Methodology In Silico Academic Lab Benchmark SOTA Open Code

A CT-Based Deep Learning Radiomics Nomogram for Early Recurrence Prediction in Pancreatic Cancer: A Multicenter Study.

Guan X, Liu J, Xu L, Jiang W, Wang C

•papers•Jul 6 2025

Early recurrence (ER) following curative-intent surgery remains a major obstacle to improving long-term outcomes in patients with pancreatic cancer (PC). The accurate preoperative prediction of ER could significantly aid clinical decision-making and guide postoperative management. A retrospective cohort of 493 patients with histologically confirmed PC who underwent resection was analyzed. Contrast-enhanced computed tomography (CT) images were used for tumor segmentation, followed by radiomics and deep learning feature extraction. In total, four distinct feature selection algorithms were employed. Predictive models were constructed using random forest (RF) and support vector machine (SVM) classifiers. The model performance was evaluated by the area under the receiver operating characteristic curve (AUC). A comprehensive nomogram integrating feature scores and clinical factors was developed and validated. Among all of the constructed models, the Inte-SVM demonstrated superior classification performance. The nomogram, incorporating the Inte-feature score, CT-assessed lymph node status, and carbohydrate antigen 19-9 (CA19-9), yielded excellent predictive accuracy in the validation cohort (AUC = 0.920). Calibration curves showed strong agreement between predicted and observed outcomes, and decision curve analysis confirmed the clinical utility of the nomogram. A CT-based deep learning radiomics nomogram enabled the accurate preoperative prediction of early recurrence in patients with pancreatic cancer. This model may serve as a valuable tool to assist clinicians in tailoring postoperative strategies and promoting personalized therapeutic approaches.

CT Classification Abdominal Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Artificial Intelligence in Prenatal Ultrasound: A Systematic Review of Diagnostic Tools for Detecting Congenital Anomalies

Dunne, J., Kumarasamy, C., Belay, D. G., Betran, A. P., Gebremedhin, A. T., Mengistu, S., Nyadanu, S. D., Roy, A., Tessema, G., Tigest, T., Pereira, G.

•preprint•Jul 5 2025

Background: Artificial intelligence (AI) has shown promise in interpreting ultrasound imaging through flexible pattern recognition and algorithmic learning, but implementation in clinical practice remains limited. This study aimed to investigate the current application of AI in prenatal ultrasounds to identify congenital anomalies, and to synthesise challenges and opportunities for the advancement of AI-assisted ultrasound diagnosis. This comprehensive analysis addresses the clinical translation gap between AI performance metrics and practical implementation in prenatal care. Methods: Systematic searches were conducted in eight electronic databases (CINAHL Plus, Ovid/EMBASE, Ovid/MEDLINE, ProQuest, PubMed, Scopus, Web of Science and Cochrane Library) and Google Scholar from inception to May 2025. Studies were included if they applied an AI-assisted ultrasound diagnostic tool to identify a congenital anomaly during pregnancy. This review adhered to PRISMA guidelines for systematic reviews. We evaluated study quality using the Checklist for Artificial Intelligence in Medical Imaging (CLAIM) guidelines. Findings: Of 9,918 records, 224 were identified for full-text review and 20 met the inclusion criteria. The majority of studies (11/20, 55%) were conducted in China, with most published after 2020 (16/20, 80%). All AI models were developed as an assistive tool for anomaly detection or classification. Most models (85%) focused on single-organ systems: heart (35%), brain/cranial (30%), or facial features (20%), while three studies (15%) attempted multi-organ anomaly detection. Fifty percent of the included studies reported exceptionally high model performance, with both sensitivity and specificity exceeding 0.95 and AUC-ROC values ranging from 0.91 to 0.97. Most studies (75%) lacked external validation, with internal validation often limited to small training and testing datasets.
Interpretation: While AI applications in prenatal ultrasound showed potential, current evidence indicates significant limitations in their practical implementation. Much work is required to optimise their application, including the external validation of diagnostic models with clinical utility to have real-world implications. Future research should prioritise larger-scale multi-centre studies, developing multi-organ anomaly detection capabilities rather than the current single-organ focus, and robust evaluation of AI tools in real-world clinical settings.

Ultrasound Detection Review In Silico Academic Lab Benchmark SOTA

Bridging Vision and Language: Optimal Transport-Driven Radiology Report Generation via LLMs

Haifeng Zhao, Yufei Zhang, Leilei Ma, Shuo Xu, Dengdi Sun

•preprint•Jul 5 2025

Radiology report generation represents a significant application within medical AI, and has achieved impressive results. Concurrently, large language models (LLMs) have demonstrated remarkable performance across various domains. However, empirical validation indicates that general LLMs tend to focus more on linguistic fluency than on clinical effectiveness, and lack the ability to effectively capture the relationship between X-ray images and their corresponding texts, thus resulting in poor clinical practicability. To address these challenges, we propose Optimal Transport-Driven Radiology Report Generation (OTDRG), a novel framework that leverages Optimal Transport (OT) to align image features with disease labels extracted from reports, effectively bridging the cross-modal gap. The core component of OTDRG is Alignment & Fine-Tuning, where OT uses the encoded label features and image visual features to minimize cross-modal distances, and image and text features are then integrated for LLM fine-tuning. Additionally, we design a novel disease prediction module to predict disease labels contained in X-ray images during validation and testing. Evaluated on the MIMIC-CXR and IU X-Ray datasets, OTDRG achieves state-of-the-art performance in both natural language generation (NLG) and clinical efficacy (CE) metrics, delivering reports that are not only linguistically coherent but also clinically accurate.

X-Ray Report Generation Chest Methodology In Silico GenAI Benchmark SOTA
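
The abstract above relies on optimal transport to align image features with label features. As a rough illustration of what an OT alignment computes, here is a sketch of entropy-regularized OT solved with Sinkhorn iterations. This is a standard generic solver swapped in for illustration, not the OTDRG method; the cost matrix, marginals, and parameter values are assumptions:

```python
import math
from typing import List, Sequence


def sinkhorn(
    cost: Sequence[Sequence[float]],
    a: Sequence[float],
    b: Sequence[float],
    eps: float = 0.1,
    n_iters: int = 200,
) -> List[List[float]]:
    """Return an approximate transport plan between marginals a and b.

    cost[i][j] is the (assumed) distance between image feature i and
    label feature j; eps is the entropic regularization strength.
    """
    n, m = len(a), len(b)
    # Gibbs kernel derived from the cost matrix.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy case: two image features, two label features, matching pairs are cheap.
toy_cost = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn(toy_cost, a=[0.5, 0.5], b=[0.5, 0.5])
```

In the toy case most of the mass lands on the diagonal, meaning each image feature is matched to its low-cost label. In a training setup, the transport cost under such a plan could serve as an alignment loss, though the abstract does not specify how OTDRG uses it.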

Quantitative CT Imaging in Chronic Obstructive Pulmonary Disease.

Park S, Lee SM, Hwang HJ, Oh SY, Choe J, Seo JB

•papers•Jul 4 2025

Chronic obstructive pulmonary disease (COPD) is a highly heterogeneous condition characterized by diverse pulmonary and extrapulmonary manifestations. Efforts to quantify its various components using CT imaging have advanced, aiming for more precise, objective, and reproducible assessment and management. Beyond emphysema and small airway disease, the two major components of COPD, CT quantification enables the evaluation of pulmonary vascular alteration, ventilation-perfusion mismatches, fissure completeness, and extrapulmonary features such as altered body composition, osteoporosis, and atherosclerosis. Recent advancements, including the application of deep learning techniques, have facilitated fully automated segmentation and quantification of CT parameters, while innovations such as image standardization hold promise for enhancing clinical applicability. Numerous studies have reported associations between quantitative CT parameters and clinical or physiologic outcomes in patients with COPD. However, barriers remain to the routine implementation of these technologies in clinical practice. This review highlights recent research on COPD quantification, explores advances in technology, and also discusses current challenges and potential solutions for improving quantification methods.

CT Segmentation Chest Review In Silico Academic Lab Benchmark SOTA GenAI

Novel CAC Dispersion and Density Score to Predict Myocardial Infarction and Cardiovascular Mortality.

Huangfu G, Ihdayhid AR, Kwok S, Konstantopoulos J, Niu K, Lu J, Smallbone H, Figtree GA, Chow CK, Dembo L, Adler B, Hamilton-Craig C, Grieve SM, Chan MTV, Butler C, Tandon V, Nagele P, Woodard PK, Mrkobrada M, Szczeklik W, Aziz YFA, Biccard B, Devereaux PJ, Sheth T, Dwivedi G, Chow BJW

•papers•Jul 4 2025

Coronary artery calcification (CAC) provides robust prediction for major adverse cardiovascular events (MACE), but current techniques disregard plaque distribution and protective effects of high CAC density. We investigated whether a novel CAC-dispersion and density (CAC-DAD) score will exhibit superior prognostic value compared with the Agatston score (AS) for MACE prediction. We conducted a multicenter, retrospective, cross-sectional study of 961 patients (median age, 67 years; 61% male) who underwent cardiac computed tomography for cardiovascular or perioperative risk assessment. Blinded analyzers applied deep learning algorithms to noncontrast scans to calculate the CAC-DAD score, which adjusts for the spatial distribution of CAC and assigns a protective weight factor for lesions with ≥1000 Hounsfield units. Associations were assessed using frailty regression. Over a median follow-up of 30 (30-460) days, 61 patients experienced MACE (nonfatal myocardial infarction or cardiovascular mortality). An elevated CAC-DAD score (≥2050 based on optimal cutoff) captured more MACE than AS ≥400 (74% versus 57%; P=0.002). Univariable analysis revealed that an elevated CAC-DAD score, AS ≥400 and AS ≥100, age, diabetes, hypertension, and statin use predicted MACE. On multivariable analysis, only the CAC-DAD score (hazard ratio, 2.57 [95% CI, 1.43-4.61]; P=0.002), age, statins, and diabetes remained significant. The inclusion of the CAC-DAD score in a predictive model containing demographic factors and AS improved the C statistic from 0.61 to 0.66 (P=0.008). The fully automated CAC-DAD score improves MACE prediction compared with the AS. Patients with a high CAC-DAD score, including those with a low AS, may be at higher risk and warrant intensification of their preventative therapies.

CT Segmentation Cardiac Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Causal-SAM-LLM: Large Language Models as Causal Reasoners for Robust Medical Segmentation

Tao Tang, Shijie Xu, Yiting Wu, Zhixiang Lu

•preprint•Jul 4 2025

The clinical utility of deep learning models for medical image segmentation is severely constrained by their inability to generalize to unseen domains. This failure is often rooted in the models learning spurious correlations between anatomical content and domain-specific imaging styles. To overcome this fundamental challenge, we introduce Causal-SAM-LLM, a novel framework that elevates Large Language Models (LLMs) to the role of causal reasoners. Our framework, built upon a frozen Segment Anything Model (SAM) encoder, incorporates two synergistic innovations. First, Linguistic Adversarial Disentanglement (LAD) employs a Vision-Language Model to generate rich, textual descriptions of confounding image styles. By training the segmentation model's features to be contrastively dissimilar to these style descriptions, it learns a representation robustly purged of non-causal information. Second, Test-Time Causal Intervention (TCI) provides an interactive mechanism where an LLM interprets a clinician's natural language command to modulate the segmentation decoder's features in real-time, enabling targeted error correction. We conduct an extensive empirical evaluation on a composite benchmark from four public datasets (BTCV, CHAOS, AMOS, BraTS), assessing generalization under cross-scanner, cross-modality, and cross-anatomy settings. Causal-SAM-LLM establishes a new state of the art in out-of-distribution (OOD) robustness, improving the average Dice score by up to 6.2 points and reducing the Hausdorff Distance by 15.8 mm over the strongest baseline, all while using less than 9% of the full model's trainable parameters. Our work charts a new course for building robust, efficient, and interactively controllable medical AI systems.

Mixed Modality Segmentation Methodology In Silico Academic Lab Benchmark SOTA

SAMed-2: Selective Memory Enhanced Medical Segment Anything Model

Zhiling Yan, Sifan Song, Dingjie Song, Yiwei Li, Rong Zhou, Weixiang Sun, Zhennong Chen, Sekeun Kim, Hui Ren, Tianming Liu, Quanzheng Li, Xiang Li, Lifang He, Lichao Sun

•preprint•Jul 4 2025

Recent "segment anything" efforts show promise by learning from large-scale data, but adapting such models directly to medical images remains challenging due to the complexity of medical data, noisy annotations, and continual learning requirements across diverse modalities and anatomical structures. In this work, we propose SAMed-2, a new foundation model for medical image segmentation built upon the SAM-2 architecture. Specifically, we introduce a temporal adapter into the image encoder to capture image correlations and a confidence-driven memory mechanism to store high-certainty features for later retrieval. This memory-based strategy counters the pervasive noise in large-scale medical datasets and mitigates catastrophic forgetting when encountering new tasks or modalities. To train and evaluate SAMed-2, we curate MedBank-100k, a comprehensive dataset spanning seven imaging modalities and 21 medical segmentation tasks. Our experiments on both internal benchmarks and 10 external datasets demonstrate superior performance over state-of-the-art baselines in multi-task scenarios. The code is available at: https://github.com/ZhilingYan/Medical-SAM-Bench.

Mixed Modality Segmentation Methodology In Silico Academic Lab Open Code Open Dataset Benchmark SOTA
