Latest Papers on Radiology AI. Sources: pubmed, Tags: Benchmark SOTA, Order: Best Match, Limit: 10.

Enhancing Pulmonary Disease Prediction Using Large Language Models With Feature Summarization and Hybrid Retrieval-Augmented Generation: Multicenter Methodological Study Based on Radiology Report.

Li R, Mao S, Zhu C, Yang Y, Tan C, Li L, Mu X, Liu H, Yang Y

•papers•Jun 11 2025

The rapid advancements in natural language processing, particularly the development of large language models (LLMs), have opened new avenues for managing complex clinical text data. However, the inherent complexity and specificity of medical texts present significant challenges for the practical application of prompt engineering in diagnostic tasks. This paper explores LLMs with new prompt engineering technology to enhance model interpretability and improve the prediction performance of pulmonary disease based on a traditional deep learning model. A retrospective dataset including 2965 chest CT radiology reports was constructed. The reports were from 4 cohorts, namely, healthy individuals and patients with pulmonary tuberculosis, lung cancer, and pneumonia. Then, a novel prompt engineering strategy that integrates feature summarization (F-Sum), chain of thought (CoT) reasoning, and a hybrid retrieval-augmented generation (RAG) framework was proposed. A feature summarization approach, leveraging term frequency-inverse document frequency (TF-IDF) and K-means clustering, was used to extract and distill key radiological findings related to 3 diseases. Simultaneously, the hybrid RAG framework combined dense and sparse vector representations to enhance LLMs' comprehension of disease-related text. In total, 3 state-of-the-art LLMs, GLM-4-Plus, GLM-4-air (Zhipu AI), and GPT-4o (OpenAI), were integrated with the prompt strategy to evaluate the efficiency in recognizing pneumonia, tuberculosis, and lung cancer. The traditional deep learning model, BERT (Bidirectional Encoder Representations from Transformers), was also compared to assess the superiority of LLMs. Finally, the proposed method was tested on an external validation dataset consisting of 343 chest computed tomography (CT) reports from another hospital.
Compared with the BERT-based prediction model and various other prompt engineering techniques, our method with GLM-4-Plus achieved the best performance on the test dataset, attaining an F1-score of 0.89 and accuracy of 0.89. On the external validation dataset, F1-score (0.86) and accuracy (0.92) of the proposed method with GPT-4o were the highest. Compared to the popular strategy with manually selected typical samples (few-shot) and CoT designed by doctors (F1-score=0.83 and accuracy=0.83), the proposed method that summarized disease characteristics (F-Sum) based on LLM and automatically generated CoT performed better (F1-score=0.89 and accuracy=0.90). Although the BERT-based model achieved similar results on the test dataset (F1-score=0.85 and accuracy=0.88), its predictive performance significantly decreased on the external validation set (F1-score=0.48 and accuracy=0.78). These findings highlight the potential of LLMs to revolutionize pulmonary disease prediction, particularly in resource-constrained settings, by surpassing traditional models in both accuracy and flexibility. The proposed prompt engineering strategy not only improves predictive performance but also enhances the adaptability of LLMs in complex medical contexts, offering a promising tool for advancing disease diagnosis and clinical decision-making.
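
The abstract says the hybrid RAG framework combines dense and sparse vector representations but does not say how the two are fused. The following is a minimal sketch of one common option, a weighted sum of cosine similarities. The weight `alpha`, the toy vectors, and the function names are illustrative assumptions, not the authors' method.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_rank(query_sparse, query_dense, docs, alpha=0.5):
    """Rank documents by alpha * sparse similarity + (1 - alpha) * dense similarity.

    docs is a list of (doc_id, sparse_vec, dense_vec) tuples.
    alpha is a hypothetical fusion weight, not a value from the paper.
    """
    scored = [
        (doc_id, alpha * cosine(query_sparse, s) + (1 - alpha) * cosine(query_dense, d))
        for doc_id, s, d in docs
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy vectors: sparse ones stand in for TF-IDF weights, dense ones for embeddings.
docs = [
    ("report_a", [1.0, 0.0, 1.0], [0.9, 0.1]),
    ("report_b", [0.0, 1.0, 0.0], [0.1, 0.9]),
]
ranking = hybrid_rank([1.0, 0.0, 0.5], [0.8, 0.2], docs)
```

In this sketch, setting `alpha` to 1.0 falls back to purely sparse retrieval and 0.0 to purely dense retrieval.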

CT Classification Chest Methodology In Silico Academic Lab GenAI Benchmark SOTA

Non-invasive prediction of nuclear grade in renal cell carcinoma using CT-Based radiomics: a systematic review and meta-analysis.

Salimi M, Hajikarimloo B, Vadipour P, Abdolizadeh A, Fayedeh F, Seifi S

•papers•Jun 11 2025

Renal cell carcinoma (RCC) represents the most prevalent malignant neoplasm of the kidney, with a rising global incidence. Tumor nuclear grade is a crucial prognostic factor, guiding treatment decisions, but current histopathological grading via biopsy is invasive and prone to sampling errors. This study aims to assess the diagnostic performance and quality of CT-based radiomics for preoperatively predicting RCC nuclear grade. A comprehensive search was conducted across PubMed, Scopus, Embase, and Web of Science to identify relevant studies up until 19 April 2025. Quality was assessed using the QUADAS-2 and METRICS tools. A bivariate random-effects meta-analysis was performed to evaluate model performance, including sensitivity, specificity, and Area Under the Curve (AUC). Results from separate validation cohorts were pooled, and clinical and combined models were analyzed separately in distinct analyses. A total of 26 studies comprising 1993 individuals in 10 external and 16 internal validation cohorts were included. Meta-analysis of radiomics models showed pooled AUC of 0.88, sensitivity of 0.78, and specificity of 0.82. Clinical and combined (clinical-radiomics) models showed AUCs of 0.73 and 0.86, respectively. QUADAS-2 revealed significant risk of bias in the Index Test and Flow and Timing domains. METRICS scores ranged from 49.7 to 88.4%, with an average of 66.65%, indicating overall good quality, though gaps in some aspects of study methodologies were identified. This study suggests that radiomics models show great potential and diagnostic accuracy for non-invasive preoperative nuclear grading of RCC. However, challenges related to generalizability and clinical applicability remain, as further research with standardized methodologies, external validation, and larger cohorts is needed to enhance their reliability and integration into routine clinical practice.

CT Classification Abdominal Meta Analysis In Silico Academic Lab Benchmark SOTA

Towards more reliable prostate cancer detection: Incorporating clinical data and uncertainty in MRI deep learning.

Taguelmimt K, Andrade-Miranda G, Harb H, Thanh TT, Dang HP, Malavaud B, Bert J

•papers•Jun 11 2025

Prostate cancer (PCa) is one of the most common cancers among men, and artificial intelligence (AI) is emerging as a promising tool to enhance its diagnosis. This work proposes a classification approach for PCa cases using deep learning techniques. We conducted a comparison between unimodal models based either on biparametric magnetic resonance imaging (bpMRI) or clinical data (such as prostate-specific antigen levels, prostate volume, and age). We also introduced a bimodal model that simultaneously integrates imaging and clinical data to address the limitations of unimodal approaches. Furthermore, we propose a framework that not only detects the presence of PCa but also evaluates the uncertainty associated with the predictions. This approach makes it possible to identify highly confident predictions and distinguish them from those characterized by uncertainty, thereby enhancing the reliability and applicability of automated medical decisions in clinical practice. The results show that the bimodal model significantly improves performance, with an area under the curve (AUC) of 0.82±0.03 and a sensitivity of 0.73±0.04, while maintaining high specificity. Uncertainty analysis revealed that the bimodal model produces more confident predictions, with an uncertainty accuracy of 0.85, surpassing the imaging-only model (0.71). This increase in reliability is crucial in a clinical context, where precise and dependable diagnostic decisions are essential for patient care. The integration of clinical data with imaging data in a bimodal model not only improves diagnostic performance but also strengthens the reliability of predictions, making this approach particularly suitable for clinical use.

MRI Classification Abdominal Methodology In Silico None Academic Lab Benchmark SOTA

AI-based radiomic features predict outcomes and the added benefit of chemoimmunotherapy over chemotherapy in extensive stage small cell lung cancer: A Multi-institutional study.

Khorrami M, Mutha P, Barrera C, Viswanathan VS, Ardeshir-Larijani F, Jain P, Higgins K, Madabhushi A

•papers•Jun 11 2025

Small cell lung cancer (SCLC) is aggressive with poor survival outcomes, and most patients develop resistance to chemotherapy. No predictive biomarkers currently guide therapy. This study evaluates radiomic features to predict progression-free survival (PFS) and overall survival (OS) in limited-stage SCLC (LS-SCLC) and assesses PFS, OS, and the added benefit of chemoimmunotherapy (CHIO) in extensive-stage SCLC (ES-SCLC). A total of 660 SCLC patients (470 ES-SCLC, 190 LS-SCLC) from three sites were analyzed. LS-SCLC patients received chemotherapy and radiation, while ES-SCLC patients received either chemotherapy alone or chemoimmunotherapy. Radiomic and quantitative vasculature tortuosity features were extracted from CT scans. A LASSO-Cox regression model was used to construct the ES-Risk-Score (ESRS) and LS-Risk-Score (LSRS). ESRS was associated with PFS in training (HR = 1.54, adj. P = .0013) and validation sets (HR = 1.32, adj. P = .0001; HR = 2.4, adj. P = .0073) and with OS in training (HR = 1.37, adj. P = .0054) and validation sets (HR = 1.35, adj. P < .0006; HR = 1.6, adj. P < .0085) in ES-SCLC patients treated with chemotherapy. High-risk patients had improved PFS (HR = 0.68, adj. P < .001) and OS (HR = 0.78, adj. P = .026) with chemoimmunotherapy. LSRS was associated with PFS in training and validation sets (HR = 1.9, adj. P = .007; HR = 1.4, adj. P = .0098; HR = 2.1, adj. P = .028) in LS-SCLC patients receiving chemoradiation. Radiomics is prognostic for PFS and OS and predicts chemoimmunotherapy benefit in high-risk ES-SCLC patients.
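
A risk score from a LASSO-Cox model is typically the linear predictor: a weighted sum of the features that kept nonzero coefficients. The sketch below shows that computation and a median split into high- and low-risk groups. The feature names, coefficient values, and the median cutoff are hypothetical; the abstract reports none of them.

```python
from statistics import median

def risk_score(features, coefficients):
    """Linear predictor: sum of coefficient * feature value over the selected features."""
    return sum(coefficients[name] * features[name] for name in coefficients)

def stratify(scores, cutoff=None):
    """Label each score 'high' or 'low'; defaults to a median split (an assumption)."""
    cutoff = median(scores) if cutoff is None else cutoff
    return ["high" if score > cutoff else "low" for score in scores]

# Hypothetical coefficients, standing in for features retained by LASSO.
coefficients = {"texture_entropy": 0.4, "vessel_tortuosity": 0.7}
patients = [
    {"texture_entropy": 1.0, "vessel_tortuosity": 0.5},
    {"texture_entropy": 0.2, "vessel_tortuosity": 0.1},
    {"texture_entropy": 2.0, "vessel_tortuosity": 1.0},
]
scores = [risk_score(p, coefficients) for p in patients]
groups = stratify(scores)
```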

CT Classification Chest Retrospective Clinical In Silico None Academic Lab Benchmark SOTA

Automated Segmentation of Thoracic Aortic Lumen and Vessel Wall on 3D Bright- and Black-Blood MRI using nnU-Net.

Cesario M, Littlewood SJ, Nadel J, Fletcher TJ, Fotaki A, Castillo-Passi C, Hajhosseiny R, Pouliopoulos J, Jabbour A, Olivero R, Rodríguez-Palomares J, Kooi ME, Prieto C, Botnar RM

•papers•Jun 11 2025

Magnetic resonance angiography (MRA) is an important tool for aortic assessment in several cardiovascular diseases. Assessment of MRA images relies on manual segmentation, a time-intensive process that is subject to operator variability. We aimed to optimize and validate two deep-learning models for automatic segmentation of the aortic lumen and vessel wall in high-resolution ECG-triggered free-breathing respiratory motion-corrected 3D bright- and black-blood MRA images. Manual segmentation, serving as the ground truth, was performed on 25 bright-blood and 15 black-blood 3D MRA image sets acquired with the iT2PrepIR-BOOST sequence (1.5T) in thoracic aortopathy patients. The training was performed with nnU-Net for bright-blood (lumen) and black-blood image sets (lumen and vessel wall). Data were split 70:20:10% into training, validation, and testing sets. Inference was run on datasets (single vendor) from different centres (UK, Spain, and Australia), sequences (iT2PrepIR-BOOST, T2 prepared CMRA, and TWIST MRA), acquired resolutions (from 0.9 mm³ to 3 mm³), and field strengths (0.55T, 1.5T, and 3T). Performance metrics comprised the Dice Similarity Coefficient (DSC) and Intersection over Union (IoU). Postprocessing (3D Slicer) included centreline extraction, diameter measurement, and curved planar reformatting (CPR). The optimal configuration was the 3D U-Net. Bright-blood segmentation at 1.5T on iT2PrepIR-BOOST datasets (1.3 and 1.8 mm³) and 3D CMRA datasets (0.9 mm³) resulted in DSC ≥ 0.96 and IoU ≥ 0.92. For bright-blood segmentation on 3D CMRA at 0.55T, the nnU-Net achieved DSC and IoU scores of 0.93 and 0.88 at 1.5 mm³, and 0.68 and 0.52 at 3.0 mm³, respectively. DSC and IoU scores of 0.89 and 0.82 were obtained for CMRA image sets (1 mm³) at 1.5T (Barcelona dataset). DSC and IoU scores of the BRnnUNet model were 0.90 and 0.82, respectively, for the contrast-enhanced dataset (TWIST MRA).
Lumen segmentation on black blood 1.5T iT2PrepIR-BOOST image sets achieved DSC ≥ 0.95 and IoU ≥ 0.90, and vessel wall segmentation resulted in DSC ≥ 0.80 and IoU ≥ 0.67. Automated centreline tracking, diameter measurement and CPR were successfully implemented in all subjects. Automated aortic lumen and wall segmentation on 3D bright- and black-blood image sets demonstrated excellent agreement with ground truth. This technique demonstrates a fast and comprehensive assessment of aortic morphology with great potential for future clinical application in various cardiovascular diseases.
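
For reference, both overlap metrics reported above can be computed from the voxel sets of a predicted and a ground-truth mask. This is a minimal sketch with masks represented as sets of voxel coordinates; the representation and the toy masks are assumptions chosen to keep the example free of dependencies.

```python
def dice_and_iou(pred, truth):
    """Return (DSC, IoU) for two masks given as collections of voxel coordinates."""
    pred, truth = set(pred), set(truth)
    intersection = len(pred & truth)
    total = len(pred) + len(truth)
    union = total - intersection
    # Two empty masks are treated as a perfect match.
    dice = 2 * intersection / total if total else 1.0
    iou = intersection / union if union else 1.0
    return dice, iou

# Toy 3D masks: three of the four voxels in each mask overlap.
pred = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
truth = {(0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 1, 1)}
dice, iou = dice_and_iou(pred, truth)
```

For a single pair of masks the two metrics are tied by IoU = DSC / (2 - DSC), so a DSC of 0.96 corresponds to an IoU of about 0.92.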

MRI Segmentation Cardiac Retrospective Clinical In Silico None Academic Lab Benchmark SOTA

A Deep Learning Model for Identifying the Risk of Mesenteric Malperfusion in Acute Aortic Dissection Using Initial Diagnostic Data: Algorithm Development and Validation.

Jin Z, Dong J, Li C, Jiang Y, Yang J, Xu L, Li P, Xie Z, Li Y, Wang D, Ji Z

•papers•Jun 10 2025

Mesenteric malperfusion (MMP) is an uncommon but devastating complication of acute aortic dissection (AAD) that combines 2 life-threatening conditions: aortic dissection and acute mesenteric ischemia. The complex pathophysiology of MMP poses substantial diagnostic and management challenges. Currently, delayed diagnosis remains a critical contributor to poor outcomes because of the absence of reliable individualized risk assessment tools. This study aims to develop and validate a deep learning-based model that integrates multimodal data to identify patients with AAD at high risk of MMP. This multicenter retrospective study included 525 patients with AAD from 2 hospitals. The training and internal validation cohort consisted of 450 patients from Beijing Anzhen Hospital, whereas the external validation cohort comprised 75 patients from Nanjing Drum Tower Hospital. Three machine learning models were developed: the benchmark model using laboratory parameters, the multiorgan feature-based AAD complicating MMP (MAM) model based on computed tomography angiography images, and the integrated model combining both data modalities. Model performance was assessed using the area under the curve, accuracy, sensitivity, specificity, and Brier score. To improve interpretability, gradient-weighted class activation mapping was used to identify and visualize discriminative imaging features. Univariate and multivariate regression analyses were used to evaluate the prognostic significance of the risk score generated by the optimal model. In the external validation cohort, the integrated model demonstrated superior performance, with an area under the curve of 0.780 (95% CI 0.777-0.785), which was significantly greater than those of the benchmark model (0.586, 95% CI 0.574-0.586) and the MAM model (0.732, 95% CI 0.724-0.734). This highlights the benefits of multimodal integration over single-modality approaches.
Additional classification metrics revealed that the integrated model had an accuracy of 0.760 (95% CI 0.758-0.764), a sensitivity of 0.667 (95% CI 0.659-0.675), a specificity of 0.783 (95% CI 0.781-0.788), and a Brier score of 0.143 (95% CI 0.143-0.145). Moreover, gradient-weighted class activation mapping visualizations of the MAM model revealed that during positive predictions, the model focused more on key anatomical areas, particularly the superior mesenteric artery origin and intestinal regions with characteristic gas or fluid accumulation. Univariate and multivariate analyses also revealed that the risk score derived from the integrated model was independently associated with in-hospital mortality risk among patients with AAD undergoing endovascular or surgical treatment (odds ratio 1.030, 95% CI 1.004-1.056; P=.02). Our findings demonstrate that compared with unimodal approaches, an integrated deep learning model incorporating both imaging and clinical data has greater diagnostic accuracy for MMP in patients with AAD. This model may serve as a valuable tool for early risk identification, facilitating timely therapeutic decision-making. Further prospective validation is warranted to confirm its clinical utility. Chinese Clinical Registry Center ChiCTR2400086050; http://www.chictr.org.cn/showproj.html?proj=226129.
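
The Brier score cited above is the mean squared difference between predicted probabilities and observed binary outcomes; lower is better, and 0 is perfect. A minimal sketch with made-up predictions (the values below are illustrative and not from the study):

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if not probabilities or len(probabilities) != len(outcomes):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - y) ** 2 for p, y in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)

# Illustrative predicted risks and observed outcomes (1 = event occurred).
predicted = [0.9, 0.2, 0.6, 0.1]
observed = [1, 0, 0, 0]
score = brier_score(predicted, observed)
```

A model that always predicts 0.5 scores 0.25 on any binary outcome, which gives a rough sense of scale for the reported 0.143.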

CT Classification Vascular Retrospective Clinical In Silico None Academic Lab Benchmark SOTA

Advancements and Applications of Hyperpolarized Xenon MRI for COPD Assessment in China.

Li H, Li H, Zhang M, Fang Y, Shen L, Liu X, Xiao S, Zeng Q, Zhou Q, Zhao X, Shi L, Han Y, Zhou X

•papers•Jun 10 2025

Chronic obstructive pulmonary disease (COPD) is one of the leading causes of morbidity and mortality in China, highlighting the importance of early diagnosis and ongoing monitoring for effective management. In recent years, hyperpolarized 129Xe MRI technology has gained significant clinical attention due to its ability to non-invasively and visually assess lung ventilation, microstructure, and gas exchange function. Its recent clinical approval in China, the United States, and several European countries represents a significant advancement in pulmonary imaging. This review provides an overview of the latest developments in hyperpolarized 129Xe MRI technology for COPD assessment in China. It covers the progress in instrument development, advanced imaging techniques, artificial intelligence-driven reconstruction methods, molecular imaging, and the application of this technology in both COPD patients and animal models. Furthermore, the review explores potential technical innovations in 129Xe MRI and discusses future directions for its clinical applications, aiming to address existing challenges and expand the technology's impact in clinical practice.

MRI Reconstruction Chest Review Clinical Pilot CE Mark Academic Lab Benchmark SOTA GenAI

Uncertainty estimation for trust attribution to speed-of-sound reconstruction with variational networks.

Laguna S, Zhang L, Bezek CD, Farkas M, Schweizer D, Kubik-Huch RA, Goksel O

•papers•Jun 10 2025

Speed-of-sound (SoS) is a biomechanical characteristic of tissue, and its imaging can provide a promising biomarker for diagnosis. Reconstructing SoS images from ultrasound acquisitions can be cast as a limited-angle computed-tomography problem, with variational networks being a promising model-based deep learning solution. Some acquired data frames may, however, get corrupted by noise due to, e.g., motion, lack of contact, and acoustic shadows, which in turn negatively affects the resulting SoS reconstructions. We propose to use the uncertainty in SoS reconstructions to attribute trust to each individual acquired frame. Given multiple acquisitions, we then use an uncertainty-based automatic selection among these retrospectively, to improve diagnostic decisions. We investigate uncertainty estimation based on Monte Carlo Dropout and Bayesian Variational Inference. We assess our automatic frame selection method for differential diagnosis of breast cancer, distinguishing between benign fibroadenoma and malignant carcinoma. We evaluate 21 lesions classified as BI-RADS 4, which represents suspicious cases for probable malignancy. The most trustworthy frame among four acquisitions of each lesion was identified using uncertainty-based criteria. Selecting a frame informed by uncertainty achieved an area under curve of 76% and 80% for Monte Carlo Dropout and Bayesian Variational Inference, respectively, superior to any uncertainty-uninformed baselines with the best one achieving 64%. A novel use of uncertainty estimation is proposed for selecting one of multiple data acquisitions for further processing and decision making.
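
The frame-selection idea can be sketched as follows: run several stochastic reconstructions per acquired frame (as with Monte Carlo Dropout), summarize their spread as an uncertainty value, and keep the frame with the lowest value. The abstract does not state the exact criterion, so the mean per-pixel variance used here, along with the function names and toy values, is an assumption.

```python
from statistics import mean, pvariance

def frame_uncertainty(samples):
    """Mean per-pixel variance across repeated stochastic reconstructions.

    samples is a list of reconstructions, each a flat list of pixel values.
    """
    return mean(pvariance(pixel_values) for pixel_values in zip(*samples))

def select_frame(frames):
    """Return the id of the frame whose reconstructions vary the least."""
    return min(frames, key=lambda frame_id: frame_uncertainty(frames[frame_id]))

# Toy data: three sampled reconstructions (two pixels each) per frame, in m/s.
frames = {
    "frame_1": [[1500, 1520], [1510, 1530], [1490, 1510]],
    "frame_2": [[1500, 1520], [1501, 1521], [1499, 1519]],
}
best = select_frame(frames)
```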

Ultrasound Reconstruction Breast Retrospective Clinical In Silico None Academic Lab Benchmark SOTA

DWI-based Biologically Interpretable Radiomic Nomogram for Predicting 1-year Biochemical Recurrence after Radical Prostatectomy: A Deep Learning, Multicenter Study.

Niu X, Li Y, Wang L, Xu G

•papers•Jun 10 2025

It is not rare to experience a biochemical recurrence (BCR) following radical prostatectomy (RP) for prostate cancer (PCa). It has been reported that early detection and management of BCR following surgery could improve survival in PCa. This study aimed to develop a nomogram integrating deep learning-based radiomic features and clinical parameters to predict 1-year BCR after RP and to examine the associations between radiomic scores and the tumor microenvironment (TME). In this retrospective multicenter study, two independent cohorts of patients (n = 349) who underwent RP after multiparametric magnetic resonance imaging (mpMRI) between January 2015 and January 2022 were included in the analysis. Single-cell RNA sequencing data from four prospectively enrolled participants were used to investigate the radiomic score-related TME. The 3D U-Net was trained and optimized for prostate cancer segmentation using diffusion-weighted imaging, and radiomic features of the target lesion were extracted. Predictive nomograms were developed via multivariate Cox proportional hazard regression analysis. The nomograms were assessed for discrimination, calibration, and clinical usefulness. In the development cohort, the clinical-radiomic nomogram had an AUC of 0.892 (95% confidence interval: 0.783-0.939), which was considerably greater than those of the radiomic signature and clinical model. The Hosmer-Lemeshow test demonstrated that the clinical-radiomic model performed well in both the development (P = 0.461) and validation (P = 0.722) cohorts. Decision curve analysis revealed that the clinical-radiomic nomogram displayed better clinical predictive usefulness than the clinical or radiomic signature alone in both cohorts. Radiomic scores were associated with a significant difference in the TME pattern. Our study demonstrated the feasibility of a DWI-based clinical-radiomic nomogram combined with deep learning for the prediction of 1-year BCR.
The findings revealed that the radiomic score was associated with a distinctive tumor microenvironment.

MRI Segmentation Abdominal Retrospective Clinical In Silico None Academic Lab Benchmark SOTA

Improving Patient Communication by Simplifying AI-Generated Dental Radiology Reports With ChatGPT: Comparative Study.

Stephan D, Bertsch AS, Schumacher S, Puladi B, Burwinkel M, Al-Nawas B, Kämmerer PW, Thiem DG

•papers•Jun 9 2025

Medical reports, particularly radiology findings, are often written for professional communication, making them difficult for patients to understand. This communication barrier can reduce patient engagement and lead to misinterpretation. Artificial intelligence (AI), especially large language models such as ChatGPT, offers new opportunities for simplifying medical documentation to improve patient comprehension. We aimed to evaluate whether AI-generated radiology reports simplified by ChatGPT improve patient understanding, readability, and communication quality compared to original AI-generated reports. In total, 3 versions of radiology reports were created using ChatGPT: an original AI-generated version (text 1), a patient-friendly, simplified version (text 2), and a further simplified and accessibility-optimized version (text 3). A total of 300 patients (n=100, 33.3% per group), excluding patients with medical education, were randomly assigned to review one text version and complete a standardized questionnaire. Readability was assessed using the Flesch Reading Ease (FRE) score and LIX indices. Both simplified texts showed significantly higher readability scores (text 1: FRE score=51.1; text 2: FRE score=55.0; and text 3: FRE score=56.4; P<.001) and lower LIX scores, indicating enhanced clarity. Text 3 had the shortest sentences, had the fewest long words, and scored best on all patient-rated dimensions. Questionnaire results revealed significantly higher ratings for texts 2 and 3 across clarity (P<.001), tone (P<.001), structure, and patient engagement. For example, patients rated the ability to understand findings without help highest for text 3 (mean 1.5, SD 0.7) and lowest for text 1 (mean 3.1, SD 1.4). Both simplified texts significantly improved patients' ability to prepare for clinical conversations and promoted shared decision-making. AI-generated simplification of radiology reports significantly enhances patient comprehension and engagement. 
These findings highlight the potential of ChatGPT as a tool to improve patient-centered communication. While promising, future research should focus on ensuring clinical accuracy and exploring applications across diverse patient populations to support equitable and effective integration of AI in health care communication.
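
Of the two readability indices used, LIX is the simpler to compute: average sentence length plus the percentage of words longer than six letters, with lower values indicating easier text. The sketch below uses a simplified sentence splitter (terminal punctuation only), and the two sample sentences are invented for illustration rather than taken from the study texts.

```python
import re

def lix(text):
    """LIX = words per sentence + 100 * (words longer than 6 letters) / words."""
    words = re.findall(r"[^\W\d_]+", text)  # runs of letters, Unicode-aware
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    long_words = [word for word in words if len(word) > 6]
    return len(words) / len(sentences) + 100 * len(long_words) / len(words)

# Invented examples of a technical and a simplified finding.
technical = "Radiographic examination demonstrates periapical radiolucency."
simplified = "The X-ray shows a dark spot at the tip of the root."
technical_lix = lix(technical)
simplified_lix = lix(simplified)
```

The Flesch Reading Ease score works in the opposite direction (higher is easier) and needs a syllable counter, which is language-dependent and therefore omitted here.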

X-Ray LLM Radiology Report Other Prospective Clinical Pilot Academic Lab GenAI Benchmark SOTA
