
Towards more reliable prostate cancer detection: Incorporating clinical data and uncertainty in MRI deep learning.

Taguelmimt K, Andrade-Miranda G, Harb H, Thanh TT, Dang HP, Malavaud B, Bert J

PubMed · Jun 11, 2025
Prostate cancer (PCa) is one of the most common cancers among men, and artificial intelligence (AI) is emerging as a promising tool to enhance its diagnosis. This work proposes a classification approach for PCa cases using deep learning techniques. We compared unimodal models based either on biparametric magnetic resonance imaging (bpMRI) or on clinical data (such as prostate-specific antigen levels, prostate volume, and age). We also introduced a bimodal model that integrates imaging and clinical data simultaneously to address the limitations of unimodal approaches. Furthermore, we propose a framework that not only detects the presence of PCa but also evaluates the uncertainty associated with each prediction. This approach makes it possible to identify highly confident predictions and distinguish them from uncertain ones, thereby enhancing the reliability and applicability of automated medical decisions in clinical practice. The results show that the bimodal model significantly improves performance, reaching an area under the curve (AUC) of 0.82±0.03 and a sensitivity of 0.73±0.04 while maintaining high specificity. Uncertainty analysis revealed that the bimodal model produces more confident predictions, with an uncertainty accuracy of 0.85, surpassing the imaging-only model (0.71). This increase in reliability is crucial in a clinical context, where precise and dependable diagnostic decisions are essential for patient care. Integrating clinical data with imaging data in a bimodal model not only improves diagnostic performance but also strengthens the reliability of predictions, making this approach particularly suitable for clinical use.
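The confident-vs-uncertain triage described in this abstract can be sketched with a simple predictive-entropy cutoff. This is a generic illustration, not the authors' method: the class probabilities, case identifiers, and the 0.3 threshold are all hypothetical, and a real system would tune the threshold on validation data.

```python
import math

def predictive_entropy(probs):
    """Shannon entropy of a class-probability vector (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def triage(predictions, threshold=0.3):
    """Split predictions into confident and uncertain groups.

    `predictions` is a list of (case_id, probs) pairs; the entropy
    threshold is a hypothetical operating point.
    """
    confident, uncertain = [], []
    for case_id, probs in predictions:
        bucket = confident if predictive_entropy(probs) <= threshold else uncertain
        bucket.append(case_id)
    return confident, uncertain

# Invented examples: a near-certain call and a borderline one.
cases = [("case-1", [0.97, 0.03]), ("case-2", [0.55, 0.45])]
sure, unsure = triage(cases)
```

Only the confident bucket would flow to automated reporting; the uncertain bucket would be routed to a radiologist.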

Non-invasive prediction of nuclear grade in renal cell carcinoma using CT-Based radiomics: a systematic review and meta-analysis.

Salimi M, Hajikarimloo B, Vadipour P, Abdolizadeh A, Fayedeh F, Seifi S

PubMed · Jun 11, 2025
Renal cell carcinoma (RCC) represents the most prevalent malignant neoplasm of the kidney, with a rising global incidence. Tumor nuclear grade is a crucial prognostic factor guiding treatment decisions, but current histopathological grading via biopsy is invasive and prone to sampling errors. This study aims to assess the diagnostic performance and quality of CT-based radiomics for preoperatively predicting RCC nuclear grade. A comprehensive search was conducted across PubMed, Scopus, Embase, and Web of Science to identify relevant studies up to 19 April 2025. Quality was assessed using the QUADAS-2 and METRICS tools. A bivariate random-effects meta-analysis was performed to evaluate model performance, including sensitivity, specificity, and area under the curve (AUC). Results from separate validation cohorts were pooled, and clinical and combined models were analyzed in distinct analyses. A total of 26 studies comprising 1993 individuals across 10 external and 16 internal validation cohorts were included. Meta-analysis of the radiomics models showed a pooled AUC of 0.88, sensitivity of 0.78, and specificity of 0.82. Clinical and combined (clinical-radiomics) models showed AUCs of 0.73 and 0.86, respectively. QUADAS-2 revealed significant risk of bias in the Index Test and Flow and Timing domains. METRICS scores ranged from 49.7% to 88.4%, with an average of 66.65%, indicating overall good quality, though gaps in some aspects of study methodology were identified. This study suggests that radiomics models show great potential and diagnostic accuracy for non-invasive preoperative nuclear grading of RCC. However, challenges related to generalizability and clinical applicability remain; further research with standardized methodologies, external validation, and larger cohorts is needed to enhance their reliability and integration into routine clinical practice.
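A toy illustration of how per-study sensitivities (or specificities) get pooled: each study's proportion is mapped to the logit scale and combined with inverse-variance weights. This is a simplified fixed-effect stand-in for the bivariate random-effects model the review actually used, and the study counts below are invented.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

def pool_logit(counts):
    """Inverse-variance pooling of proportions on the logit scale.

    `counts` is a list of (successes, failures) per study, e.g.
    (true positives, false negatives) when pooling sensitivity.
    Var(logit p) is approximated by 1/successes + 1/failures.
    """
    num = den = 0.0
    for s, f in counts:
        est = logit(s / (s + f))
        w = 1.0 / (1.0 / s + 1.0 / f)  # inverse of the approximate variance
        num += w * est
        den += w
    return inv_logit(num / den)

# Three hypothetical studies: (TP, FN) pairs.
studies = [(80, 20), (70, 30), (90, 10)]
pooled = pool_logit(studies)
```

A bivariate model additionally accounts for between-study heterogeneity and the correlation between sensitivity and specificity, which this sketch does not.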

Enhancing Pulmonary Disease Prediction Using Large Language Models With Feature Summarization and Hybrid Retrieval-Augmented Generation: Multicenter Methodological Study Based on Radiology Report.

Li R, Mao S, Zhu C, Yang Y, Tan C, Li L, Mu X, Liu H, Yang Y

PubMed · Jun 11, 2025
The rapid advancements in natural language processing, particularly the development of large language models (LLMs), have opened new avenues for managing complex clinical text data. However, the inherent complexity and specificity of medical texts present significant challenges for the practical application of prompt engineering in diagnostic tasks. This paper explores LLMs with new prompt engineering technology to enhance model interpretability and improve the prediction performance of pulmonary disease relative to a traditional deep learning model. A retrospective dataset of 2965 chest CT radiology reports was constructed. The reports came from 4 cohorts: healthy individuals and patients with pulmonary tuberculosis, lung cancer, and pneumonia. A novel prompt engineering strategy was then proposed that integrates feature summarization (F-Sum), chain-of-thought (CoT) reasoning, and a hybrid retrieval-augmented generation (RAG) framework. The feature summarization approach, leveraging term frequency-inverse document frequency (TF-IDF) and K-means clustering, was used to extract and distill key radiological findings related to the 3 diseases. Simultaneously, the hybrid RAG framework combined dense and sparse vector representations to enhance the LLMs' comprehension of disease-related text. In total, 3 state-of-the-art LLMs, GLM-4-Plus, GLM-4-air (Zhipu AI), and GPT-4o (OpenAI), were integrated with the prompt strategy to evaluate their efficiency in recognizing pneumonia, tuberculosis, and lung cancer. The traditional deep learning model BERT (Bidirectional Encoder Representations from Transformers) was also compared to assess the relative performance of the LLMs. Finally, the proposed method was tested on an external validation dataset consisting of 343 chest CT reports from another hospital. Compared with the BERT-based prediction model and various other prompt engineering techniques, our method with GLM-4-Plus achieved the best performance on the test dataset, attaining an F1-score of 0.89 and an accuracy of 0.89. On the external validation dataset, the F1-score (0.86) and accuracy (0.92) of the proposed method with GPT-4o were the highest. Compared to the popular strategy with manually selected typical samples (few-shot) and CoT designed by doctors (F1-score=0.83 and accuracy=0.83), the proposed method, which summarized disease characteristics (F-Sum) based on the LLM and automatically generated the CoT, performed better (F1-score=0.89 and accuracy=0.90). Although the BERT-based model achieved similar results on the test dataset (F1-score=0.85 and accuracy=0.88), its predictive performance significantly decreased on the external validation set (F1-score=0.48 and accuracy=0.78). These findings highlight the potential of LLMs to revolutionize pulmonary disease prediction, particularly in resource-constrained settings, by surpassing traditional models in both accuracy and flexibility. The proposed prompt engineering strategy not only improves predictive performance but also enhances the adaptability of LLMs in complex medical contexts, offering a promising tool for advancing disease diagnosis and clinical decision-making.
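The TF-IDF side of the feature-summarization (F-Sum) step can be sketched in a few lines. This is a from-scratch toy, not the authors' pipeline: each cohort's reports are pooled into one pseudo-document, the K-means clustering stage is omitted, and the report snippets are invented.

```python
import math
from collections import Counter

def tfidf_top_terms(docs_by_cohort, k=2):
    """Rank terms per cohort by TF-IDF, treating each cohort's pooled
    reports as a single document. Terms appearing in every cohort get
    idf = log(n/n) = 0 and so drop out, leaving distinctive findings.
    """
    cohorts = {c: Counter(" ".join(docs).lower().split())
               for c, docs in docs_by_cohort.items()}
    n = len(cohorts)
    df = Counter()
    for counts in cohorts.values():
        df.update(set(counts))
    top = {}
    for c, counts in cohorts.items():
        total = sum(counts.values())
        scored = {t: (f / total) * math.log(n / df[t]) for t, f in counts.items()}
        top[c] = sorted(scored, key=scored.get, reverse=True)[:k]
    return top

# Invented report fragments for two of the disease cohorts.
reports = {
    "tuberculosis": ["cavitary lesion upper lobe", "cavitary tree-in-bud nodules"],
    "lung_cancer": ["spiculated mass lobulated margin",
                    "spiculated nodule pleural retraction"],
}
summary = tfidf_top_terms(reports)
```

The distilled terms would then seed the prompt, standing in for the hand-written disease descriptions a clinician would otherwise supply.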

A fully open AI foundation model applied to chest radiography.

Ma D, Pang J, Gotway MB, Liang J

PubMed · Jun 11, 2025
Chest radiography frequently serves as baseline imaging for most lung diseases [1]. Deep learning has great potential for automating the interpretation of chest radiography [2]. However, existing chest radiographic deep learning models are limited in diagnostic scope, generalizability, adaptability, robustness and extensibility. To overcome these limitations, we have developed Ark+, a foundation model applied to chest radiography and pretrained by cyclically accruing and reusing the knowledge from heterogeneous expert labels in numerous datasets. Ark+ excels in diagnosing thoracic diseases. It expands the diagnostic scope and addresses potential misdiagnosis. It can adapt to evolving diagnostic needs and respond to novel diseases. It can learn rare conditions from a few samples and transfer to new diagnostic settings without training. It tolerates data biases and long-tailed distributions, and it supports federated learning to preserve privacy. All codes and pretrained models have been released, so that Ark+ is open for fine-tuning, local adaptation and improvement. It is extensible to several modalities. Thus, it is a foundation model for medical imaging. The exceptional capabilities of Ark+ stem from our insight: aggregating various datasets diversifies the patient populations and accrues knowledge from many experts to yield unprecedented performance while reducing annotation costs [3]. The development of Ark+ reveals that open models trained by accruing and reusing knowledge from heterogeneous expert annotations with a multitude of public (big or small) datasets can surpass the performance of proprietary models trained on large data. We hope that our findings will inspire more researchers to share code and datasets or federate privacy-preserving data to create open foundation models with diverse, global expertise and patient populations, thus accelerating open science and democratizing AI for medicine.

Non-enhanced CT deep learning model for differentiating lung adenocarcinoma from tuberculoma: a multicenter diagnostic study.

Zhang G, Shang L, Li S, Zhang J, Zhang Z, Zhang X, Qian R, Yang K, Li X, Liu Y, Wu Y, Pu H, Cao Y, Man Q, Kong W

PubMed · Jun 11, 2025
To develop and validate a deep learning model based on three-dimensional features (DL_3D) for distinguishing lung adenocarcinoma (LUAD) from tuberculoma (TBM). A total of 1160 patients were collected from three hospitals. A vision transformer network-based DL_3D model was trained, and its performance in differentiating LUAD from TBM was evaluated using validation and external test sets. The performance of the DL_3D model was compared with that of a two-dimensional features model (DL_2D), a radiomics model, and six radiologists. Diagnostic performance was assessed using area under the receiver operating characteristic curve (AUC) analysis. The study included 840 patients in the training set (mean age, 54.8 years [range, 19-86 years]; 514 men), 210 patients in the validation set (mean age, 54.3 years [range, 18-86 years]; 128 men), and 110 patients in the external test set (mean age, 54.7 years [range, 22-88 years]; 51 men). In both the validation and external test sets, DL_3D exhibited excellent diagnostic performance (AUCs, 0.895 and 0.913, respectively). In the test set, the DL_3D model showed better performance (AUC, 0.913; 95% CI: 0.854, 0.973) than DL_2D (AUC, 0.804; 95% CI: 0.722, 0.886; p < 0.001), radiomics (AUC, 0.676; 95% CI: 0.574, 0.777; p < 0.001), and the six radiologists (AUCs, 0.692 to 0.810; p value range < 0.001-0.035). The DL_3D model outperforms expert radiologists in distinguishing LUAD from TBM. Question: Can a deep learning model differentiate LUAD from TBM on non-enhanced CT images? Findings: The DL_3D model demonstrated higher diagnostic performance than the DL_2D model, radiomics model, and six radiologists in differentiating LUAD from TBM. Clinical relevance: The DL_3D model could accurately differentiate between LUAD and TBM, which can help clinicians make personalized treatment plans.
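The AUCs compared throughout this abstract are empirical ROC areas, which reduce to the Mann-Whitney statistic: the probability that a randomly chosen positive case outscores a randomly chosen negative one, with ties counting half. A minimal sketch with invented scores:

```python
def auc(scores_pos, scores_neg):
    """Empirical ROC AUC via the Mann-Whitney U formulation.

    Counts 1 for each positive/negative pair where the positive
    scores higher, 0.5 for ties, then normalizes by the pair count.
    """
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical model scores, not the study's data.
luad = [0.91, 0.85, 0.78, 0.60]  # adenocarcinoma cases
tbm = [0.40, 0.55, 0.30, 0.78]   # tuberculoma cases
a = auc(luad, tbm)
```

Confidence intervals like the 0.854-0.973 reported above are typically obtained by bootstrapping this statistic or with the DeLong method.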

Diagnostic accuracy of machine learning-based magnetic resonance imaging models in breast cancer classification: a systematic review and meta-analysis.

Zhang J, Wu Q, Lei P, Zhu X, Li B

PubMed · Jun 11, 2025
This meta-analysis evaluates the diagnostic accuracy of machine learning (ML)-based magnetic resonance imaging (MRI) models in distinguishing benign from malignant breast lesions and explores factors influencing their performance. A systematic search of PubMed, Embase, Cochrane Library, Scopus, and Web of Science identified 12 eligible studies (from 3,739 records) up to August 2024. Data were extracted to calculate sensitivity, specificity, and area under the curve (AUC) using bivariate models in R 4.4.1. Study quality was assessed via QUADAS-2. Pooled sensitivity and specificity were 0.86 (95% CI: 0.82-0.90) and 0.82 (95% CI: 0.78-0.86), respectively, with an overall AUC of 0.90 (95% CI: 0.85-0.90). Diagnostic odds ratio (DOR) was 39.11 (95% CI: 25.04-53.17). Support vector machine (SVM) classifiers outperformed Naive Bayes, with higher sensitivity (0.88 vs. 0.86) and specificity (0.82 vs. 0.78). Heterogeneity was primarily attributed to MRI equipment (P = 0.037). ML-based MRI models demonstrate high diagnostic accuracy for breast cancer classification, with pooled sensitivity of 0.86 (95% CI: 0.82-0.90), specificity of 0.82 (95% CI: 0.78-0.86), and AUC of 0.90 (95% CI: 0.85-0.90). These results support their clinical utility as screening and diagnostic adjuncts, while highlighting the need for standardized protocols to improve generalizability.
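The diagnostic odds ratio reported above follows from sensitivity and specificity as DOR = (sens / (1 - sens)) / ((1 - spec) / spec). A quick sketch; note that plugging in the pooled marginal estimates does not reproduce the paper's DOR of 39.11, because bivariate models pool the DOR jointly rather than from the marginal summaries.

```python
def diagnostic_odds_ratio(sens, spec):
    """Odds of a positive test in the diseased divided by the odds of a
    positive test in the non-diseased."""
    return (sens / (1 - sens)) / ((1 - spec) / spec)

# Pooled sensitivity and specificity quoted in the abstract.
dor = diagnostic_odds_ratio(0.86, 0.82)  # about 28 from the marginals
```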

Advancements and Applications of Hyperpolarized Xenon MRI for COPD Assessment in China.

Li H, Li H, Zhang M, Fang Y, Shen L, Liu X, Xiao S, Zeng Q, Zhou Q, Zhao X, Shi L, Han Y, Zhou X

PubMed · Jun 10, 2025
Chronic obstructive pulmonary disease (COPD) is one of the leading causes of morbidity and mortality in China, highlighting the importance of early diagnosis and ongoing monitoring for effective management. In recent years, hyperpolarized 129Xe MRI technology has gained significant clinical attention due to its ability to non-invasively and visually assess lung ventilation, microstructure, and gas exchange function. Its recent clinical approval in China, the United States and several European countries, represents a significant advancement in pulmonary imaging. This review provides an overview of the latest developments in hyperpolarized 129Xe MRI technology for COPD assessment in China. It covers the progress in instrument development, advanced imaging techniques, artificial intelligence-driven reconstruction methods, molecular imaging, and the application of this technology in both COPD patients and animal models. Furthermore, the review explores potential technical innovations in 129Xe MRI and discusses future directions for its clinical applications, aiming to address existing challenges and expand the technology's impact in clinical practice.

DWI-based Biologically Interpretable Radiomic Nomogram for Predicting 1-year Biochemical Recurrence after Radical Prostatectomy: A Deep Learning, Multicenter Study.

Niu X, Li Y, Wang L, Xu G

PubMed · Jun 10, 2025
It is not rare to experience a biochemical recurrence (BCR) following radical prostatectomy (RP) for prostate cancer (PCa). It has been reported that early detection and management of BCR following surgery could improve survival in PCa. This study aimed to develop a nomogram integrating deep learning-based radiomic features and clinical parameters to predict 1-year BCR after RP and to examine the associations between radiomic scores and the tumor microenvironment (TME). In this retrospective multicenter study, two independent cohorts of patients (n = 349) who underwent RP after multiparametric magnetic resonance imaging (mpMRI) between January 2015 and January 2022 were included in the analysis. Single-cell RNA sequencing data from four prospectively enrolled participants were used to investigate the radiomic score-related TME. A 3D U-Net was trained and optimized for prostate cancer segmentation using diffusion-weighted imaging, and radiomic features of the target lesion were extracted. Predictive nomograms were developed via multivariate Cox proportional hazards regression analysis. The nomograms were assessed for discrimination, calibration, and clinical usefulness. In the development cohort, the clinical-radiomic nomogram had an AUC of 0.892 (95% confidence interval: 0.783-0.939), which was considerably greater than those of the radiomic signature and clinical model. The Hosmer-Lemeshow test demonstrated that the clinical-radiomic model performed well in both the development (P = 0.461) and validation (P = 0.722) cohorts. Decision curve analysis revealed that the clinical-radiomic nomogram displayed better clinical predictive usefulness than the clinical or radiomic signature alone in both cohorts. Radiomic scores were associated with a significant difference in TME pattern. Our study demonstrated the feasibility of a DWI-based clinical-radiomic nomogram combined with deep learning for the prediction of 1-year BCR. The findings revealed that the radiomic score was associated with a distinctive tumor microenvironment.

Uncertainty estimation for trust attribution to speed-of-sound reconstruction with variational networks.

Laguna S, Zhang L, Bezek CD, Farkas M, Schweizer D, Kubik-Huch RA, Goksel O

PubMed · Jun 10, 2025
Speed-of-sound (SoS) is a biomechanical characteristic of tissue, and its imaging can provide a promising biomarker for diagnosis. Reconstructing SoS images from ultrasound acquisitions can be cast as a limited-angle computed-tomography problem, with variational networks being a promising model-based deep learning solution. Some acquired data frames may, however, be corrupted by noise due to, e.g., motion, lack of contact, and acoustic shadows, which in turn negatively affects the resulting SoS reconstructions. We propose to use the uncertainty in SoS reconstructions to attribute trust to each individual acquired frame. Given multiple acquisitions, we then use an uncertainty-based automatic selection among them, retrospectively, to improve diagnostic decisions. We investigate uncertainty estimation based on Monte Carlo Dropout and Bayesian Variational Inference. We assess our automatic frame selection method for differential diagnosis of breast cancer, distinguishing between benign fibroadenoma and malignant carcinoma. We evaluate 21 lesions classified as BI-RADS 4, which represents suspicious cases of probable malignancy. The most trustworthy frame among four acquisitions of each lesion was identified using uncertainty-based criteria. Selecting a frame informed by uncertainty achieved an area under the curve of 76% and 80% for Monte Carlo Dropout and Bayesian Variational Inference, respectively, superior to all uncertainty-uninformed baselines, the best of which achieved 64%. A novel use of uncertainty estimation is proposed for selecting one of multiple data acquisitions for further processing and decision making.
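The uncertainty-informed frame selection can be sketched in toy form: run several stochastic (dropout-perturbed) forward passes per frame, measure the spread of the outputs, and keep the frame with the lowest spread. Everything here is simulated; a real variational network would supply the per-pass SoS estimates, and the frame names and noise levels are invented.

```python
import random

def mc_passes(base, noise, t=50, seed=0):
    """Simulate T stochastic forward passes for one frame.

    `base` stands in for the network's SoS estimate (m/s) and `noise`
    for the per-pass variability a dropout-enabled model would show.
    """
    rng = random.Random(seed)
    return [base + rng.gauss(0, noise) for _ in range(t)]

def spread(samples):
    """Population standard deviation: the per-frame uncertainty proxy."""
    mean = sum(samples) / len(samples)
    return (sum((s - mean) ** 2 for s in samples) / len(samples)) ** 0.5

def most_trustworthy(frames):
    """Pick the acquisition whose repeated predictions agree most."""
    return min(frames, key=lambda name: spread(frames[name]))

frames = {
    "frame_1": mc_passes(1540.0, 5.0, seed=1),   # clean acquisition
    "frame_2": mc_passes(1540.0, 25.0, seed=2),  # e.g. motion-corrupted
}
best = most_trustworthy(frames)
```

In the paper's setting the selection runs over four acquisitions per lesion; only the chosen frame's reconstruction feeds the downstream benign-vs-malignant decision.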

Foundation Models in Medical Imaging -- A Review and Outlook

Vivien van Veldhuizen, Vanessa Botha, Chunyao Lu, Melis Erdal Cesur, Kevin Groot Lipman, Edwin D. de Jong, Hugo Horlings, Clárisa I. Sanchez, Cees G. M. Snoek, Lodewyk Wessels, Ritse Mann, Eric Marcus, Jonas Teuwen

arXiv preprint · Jun 10, 2025
Foundation models (FMs) are changing the way medical images are analyzed by learning from large collections of unlabeled data. Instead of relying on manually annotated examples, FMs are pre-trained to learn general-purpose visual features that can later be adapted to specific clinical tasks with little additional supervision. In this review, we examine how FMs are being developed and applied in pathology, radiology, and ophthalmology, drawing on evidence from over 150 studies. We explain the core components of FM pipelines, including model architectures, self-supervised learning methods, and strategies for downstream adaptation. We also review how FMs are being used in each imaging domain and compare design choices across applications. Finally, we discuss key challenges and open questions to guide future research.