
Antonio Scardace, Lemuel Puglisi, Francesco Guarnera, Sebastiano Battiato, Daniele Ravì

arXiv preprint · Sep 20 2025
Deep generative models have emerged as a transformative tool in medical imaging, offering substantial potential for synthetic data generation. However, recent empirical studies highlight a critical vulnerability: these models can memorize sensitive training data, posing significant risks of unauthorized patient information disclosure. Detecting memorization in generative models remains particularly challenging, necessitating scalable methods capable of identifying training data leakage across large sets of generated samples. In this work, we propose DeepSSIM, a novel self-supervised metric for quantifying memorization in generative models. DeepSSIM is trained to: i) project images into a learned embedding space and ii) force the cosine similarity between embeddings to match the ground-truth SSIM (Structural Similarity Index) scores computed in the image space. To capture domain-specific anatomical features, training incorporates structure-preserving augmentations, allowing DeepSSIM to estimate similarity reliably without requiring precise spatial alignment. We evaluate DeepSSIM in a case study involving synthetic brain MRI data generated by a Latent Diffusion Model (LDM) trained under memorization-prone conditions, using 2,195 MRI scans from two publicly available datasets (IXI and CoRR). Compared to state-of-the-art memorization metrics, DeepSSIM achieves superior performance, improving F1 scores by an average of +52.03% over the best existing method. Code and data of our approach are publicly available at the following link: https://github.com/brAIn-science/DeepSSIM.
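The core training objective, regressing the cosine similarity of two embeddings onto the image-space SSIM, can be sketched in PyTorch as follows. This is a minimal illustration with a placeholder encoder and tensor shapes; the authors' actual implementation is in the linked repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepSSIMStyleLoss(nn.Module):
    """Regress the cosine similarity of two image embeddings onto the
    ground-truth SSIM computed in image space (illustrative sketch)."""

    def forward(self, emb_a, emb_b, ssim_target):
        # emb_a, emb_b: (batch, dim) embeddings from a shared encoder
        # ssim_target: (batch,) SSIM scores precomputed in the image space
        cos_sim = F.cosine_similarity(emb_a, emb_b, dim=1)
        return F.mse_loss(cos_sim, ssim_target)

# Hypothetical shared encoder producing fixed-length embeddings
encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 128))
loss_fn = DeepSSIMStyleLoss()
img_a, img_b = torch.rand(8, 1, 64, 64), torch.rand(8, 1, 64, 64)
ssim_scores = torch.rand(8)          # stand-in for SSIM computed in image space
loss = loss_fn(encoder(img_a), encoder(img_b), ssim_scores)
loss.backward()
```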

Hong EK, Suh CH, Nukala M, Esfahani A, Licaros A, Madan R, Hunsaker A, Hammer M

PubMed · Sep 20 2025
To investigate the integration of multimodal AI-generated reports into radiology workflow over time, focusing on their impact on efficiency, acceptability, and report quality. A multicase, multireader study involved 756 publicly available chest radiographs interpreted by five radiologists using preliminary reports generated by a radiology-specific multimodal AI model, divided into seven sequential batches of 108 radiographs each. Two thoracic radiologists assessed the final reports using RADPEER criteria for agreement and a 5-point Likert scale for quality. Reading times, the rate of acceptance without modification, agreement, and quality scores were measured, with statistical analyses evaluating trends across the seven sequential batches. Radiologists' reading times for chest radiographs decreased from 25.8 seconds in Batch 1 to 19.3 seconds in Batch 7 (p < .001). Acceptability increased from 54.6% to 60.2% (p < .001), with normal chest radiographs demonstrating higher rates (68.9%) than abnormal chest radiographs (52.6%; p < .001). Median agreement and quality scores remained stable for normal chest radiographs but varied significantly for abnormal chest radiographs (ps < .05). The introduction of AI-generated reports improved the efficiency of chest radiograph interpretation, and acceptability increased over time. However, agreement and quality scores showed variability, particularly in abnormal cases, emphasizing the need for oversight in the interpretation of complex chest radiographs.

You L, Zhao X, Xie Z, Patel KA, Chen C, Kitkungvan D, Mohammed KK, Narula N, Arbustini E, Cassidy CK, Narula J, Zhi D

PubMed · Sep 20 2025
Recent genome-wide association studies (GWAS) have effectively linked genetic variants to quantitative traits derived from time-series cardiac magnetic resonance imaging, revealing insights into cardiac morphology and function. Deep learning approaches, however, generally require extensive supervised training on manually annotated data. In this study, we developed a novel framework using a 3D U-architecture autoencoder (cineMAE) to learn deep image phenotypes from cardiac magnetic resonance (CMR) imaging for genetic discovery, focusing on long-axis two-chamber and four-chamber views. We trained a masked autoencoder to develop Unsupervised Derived Image Phenotypes for the heart (Heart-UDIPs). These representations were found to be informative indicators of various heart-specific phenotypes (e.g., left ventricular hypertrophy) and diseases (e.g., hypertrophic cardiomyopathy). GWAS on Heart-UDIPs identified 323 lead SNPs and 628 SNP-prioritized genes, exceeding previous methods. The genes identified by the method described herein exhibited significant associations with cardiac function and showed substantial enrichment in pathways related to cardiac disorders. These results underscore the utility of our Heart-UDIP approach in enhancing the discovery potential of genetic associations, without the need for clinically defined phenotypes or manual annotations.
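The masked-autoencoder idea behind Heart-UDIPs can be sketched minimally in PyTorch, here with a toy fully connected encoder on 2D frames as a stand-in for the paper's 3D U-architecture (all dimensions and names are hypothetical):

```python
import torch
import torch.nn as nn

class TinyMaskedAE(nn.Module):
    """Toy masked autoencoder: reconstruct randomly masked pixels and use the
    encoder output as an unsupervised image phenotype (illustrative only;
    cineMAE itself is a 3D U-architecture operating on CMR time series)."""

    def __init__(self, dim=64 * 64, latent=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, latent), nn.ReLU())
        self.decoder = nn.Linear(latent, dim)

    def forward(self, x, mask_ratio=0.75):
        flat = x.flatten(1)                                   # (batch, dim)
        keep = (torch.rand_like(flat) > mask_ratio).float()   # 1 = visible pixel
        latent = self.encoder(flat * keep)
        recon = self.decoder(latent)
        # reconstruction loss evaluated only on the masked-out positions
        loss = ((recon - flat) ** 2 * (1 - keep)).mean()
        return latent, loss

model = TinyMaskedAE()
frames = torch.rand(8, 1, 64, 64)       # stand-in for long-axis CMR frames
phenotype, loss = model(frames)         # phenotype: (8, 256) UDIP-like vector
loss.backward()
```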

Ahmed, S., Parker, N., Park, M., Davis, E. W., Jeong, D., Permuth, J. B., Schabath, M. B., Yilmaz, Y., Rasool, G.

medRxiv preprint · Sep 19 2025
Cancer cachexia, a multifactorial metabolic syndrome characterized by severe muscle wasting and weight loss, contributes to poor outcomes across various cancer types but lacks a standardized, generalizable biomarker for early detection. We present a multimodal AI-based biomarker trained on real-world clinical, radiologic, laboratory, and unstructured clinical note data, leveraging foundation models and large language models (LLMs) to identify cachexia at the time of cancer diagnosis. Prediction accuracy improved with each added modality: 77% using clinical variables alone, 81% with added laboratory data, and 85% with structured symptom features extracted from clinical notes. Incorporating embeddings from clinical text and CT images further improved accuracy to 92%. The framework also demonstrated prognostic utility, improving survival prediction as data modalities were integrated. Designed for real-world clinical deployment, the framework accommodates missing modalities without requiring imputation or case exclusion, supporting scalability across diverse oncology settings. Unlike prior models trained on curated datasets, our approach utilizes standard-of-care clinical data, facilitating integration into oncology workflows. In contrast to fixed-threshold composite indices such as the cachexia index (CXI), the model generates patient-specific predictions, enabling adaptable, cancer-agnostic performance. To enhance clinical reliability and safety, the framework incorporates uncertainty estimation to flag low-confidence cases for expert review. This work advances a clinically applicable, scalable, and trustworthy AI-driven decision support tool for early cachexia detection and personalized oncology care.
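The framework's tolerance of missing modalities (no imputation or case exclusion) can be pictured with a simple masked-fusion pattern: each available modality contributes an embedding, absent modalities are skipped, and the pooled representation feeds the classifier. The sketch below uses assumed modality names and dimensions and is not the authors' architecture:

```python
import torch
import torch.nn as nn

class MaskedModalityFusion(nn.Module):
    """Average the embeddings of whichever modalities are present;
    missing modalities are simply excluded (no imputation)."""

    def __init__(self, dims, hidden=64, n_classes=2):
        super().__init__()
        self.encoders = nn.ModuleDict(
            {name: nn.Linear(d, hidden) for name, d in dims.items()}
        )
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, inputs):
        # inputs: dict of modality name -> tensor, only for available modalities
        embs = [torch.relu(self.encoders[name](x)) for name, x in inputs.items()]
        fused = torch.stack(embs).mean(dim=0)
        return self.head(fused)

# Hypothetical modality dimensions: clinical tabular, labs, CT and note embeddings
model = MaskedModalityFusion({"clinical": 20, "labs": 30, "ct": 512, "notes": 768})
# A patient missing the CT embedding is still scored:
logits = model({"clinical": torch.rand(1, 20), "labs": torch.rand(1, 30),
                "notes": torch.rand(1, 768)})
```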

Bui NT, Luoma CE, Zhang X

PubMed · Sep 19 2025
Lung ultrasound (LUS) has been widely used by point-of-care systems in both pediatric and adult populations to support a range of clinical diagnoses. This research aims to develop an interpretable system that uses a deep fusion network to classify LUS videos/patients based on features extracted with texture analysis and transfer learning techniques, to assist physicians. The pulmonary edema dataset includes 56 LUS videos and 4234 LUS frames. The COVID-BLUES dataset includes 294 LUS videos and 15,826 frames. The proposed multi-feature fusion classification network (MFFC-Net) includes the following: (1) deep features extracted from Inception-ResNet-v2 and Inception-v3 together with 9 texture features from the gray-level co-occurrence matrix (GLCM) and histogram of the region of interest (ROI); (2) a neural network that classifies LUS images from the fused feature input; and (3) four models (i.e., ANN, SVM, XGBoost, and kNN) used to classify COVID/non-COVID patients. The training process was evaluated using accuracy (0.9969), F1-score (0.9968), sensitivity (0.9967), specificity (0.9990), and precision (0.9970) after fivefold cross-validation. ANOVA analysis of the 9 texture features of the LUS images showed a significant difference between pulmonary edema and normal lungs (p < 0.01). At the frame level, the MFFC-Net model achieved an accuracy of 100% and a ROC-AUC of 1.000 against the video-level ground truth across 4 groups of LUS videos. At the patient level on the COVID-BLUES dataset, the highest accuracy was 81.25%, obtained with the kNN model. The proposed MFFC-Net model has an information density (ID) 125 times higher than Inception-ResNet-v2 and 53.2 times higher than Inception-v3.
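A rough sketch of the feature-fusion idea, concatenating GLCM/histogram texture statistics with a deep-feature embedding before classification; the feature choices and dimensions below are illustrative stand-ins, not the exact 9 features used in the paper:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def texture_features(roi_uint8):
    """GLCM and histogram statistics for one grayscale ROI (illustrative;
    the paper uses 9 texture features, not necessarily this exact set)."""
    glcm = graycomatrix(roi_uint8, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    glcm_feats = [graycoprops(glcm, p)[0, 0]
                  for p in ("contrast", "homogeneity", "energy", "correlation")]
    hist = np.histogram(roi_uint8, bins=32, range=(0, 255), density=True)[0]
    hist_feats = [hist.mean(), hist.std(), roi_uint8.mean(), roi_uint8.std()]
    return np.array(glcm_feats + hist_feats, dtype=np.float32)

# Fusion: concatenate texture features with deep CNN embeddings
# (here a random stand-in for Inception-ResNet-v2 / Inception-v3 features)
roi = (np.random.rand(64, 64) * 255).astype(np.uint8)
deep_feats = np.random.rand(1536).astype(np.float32)   # placeholder embedding
fused = np.concatenate([texture_features(roi), deep_feats])
print(fused.shape)                                      # input to the fusion classifier
```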

Hinsen M, Nagel A, Heiss R, May M, Wiesmueller M, Mathy C, Zeilinger M, Hornung J, Mueller S, Uder M, Kopp M

PubMed · Sep 19 2025
Deep-learning (DL) based MRI denoising techniques promise improved image quality and shorter examination times. This advancement is particularly beneficial for 0.55T MRI, where the inherently lower signal-to-noise ratio (SNR) can compromise image quality. Sufficient SNR is crucial for the reliable detection of vestibular schwannoma (VS). The objective of this study is to evaluate the VS conspicuity and acquisition time (TA) of contrast-enhanced 0.55T MRI examinations using a DL-denoising algorithm. From January 2024 to October 2024, we retrospectively included 30 patients with VS (9 women). We acquired a clinical reference protocol of the cerebellopontine angle containing a T1w fat-saturated (fs) axial sequence (number of signal averages [NSA] 4) and a T1w Spectral Attenuated Inversion Recovery (SPAIR) coronal sequence (NSA 2) after contrast agent (CA) application, without advanced DL-based denoising (w/o DL). We reconstructed the axial T1w fs CA sequence and the coronal T1w SPAIR CA sequence with the full DL-denoising mode, first without changing the NSA (DL&4NSA) and secondly with 1 NSA for both sequences (DL&1NSA). Each sequence was rated on a 5-point Likert scale (1: insufficient; 3: moderate, clinically sufficient; 5: perfect) for overall image quality (IQ), VS conspicuity, and artifacts. Secondly, we analyzed the reliability of the size measurements. Two radiologists specializing in head and neck imaging performed the readings and measurements. The Wilcoxon signed-rank test was used for non-parametric statistical comparison. The DL&4NSA axial/coronal study sequence achieved the highest overall IQ (median 4.9). The IQ for DL&1NSA was higher than for the reference sequence without DL (median 4.0 versus 3.5; each p < 0.01). Similarly, VS conspicuity was best for DL&4NSA (median 4.9), decreased for DL&1NSA (median 4.1), and was lower but still sufficient without DL (median 3.7; each p < 0.01). The TA of the axial and coronal post-contrast sequences was 8:59 minutes for both DL&4NSA and w/o DL, and decreased to 3:24 minutes with DL&1NSA. This study underlines that advanced DL-based denoising can reduce the examination time by more than half while simultaneously improving image quality.
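The paired comparisons of Likert ratings described above rest on the Wilcoxon signed-rank test; a minimal SciPy example with invented ratings (not the study's data) looks like this:

```python
from scipy.stats import wilcoxon

# Hypothetical paired 5-point Likert image-quality ratings for the same 30 patients,
# read with and without DL denoising (values invented for illustration only).
iq_dl_1nsa = [4, 4, 5, 4, 3, 4, 4, 5, 4, 4, 3, 4, 5, 4, 4, 4, 3, 4, 4, 5,
              4, 4, 4, 3, 4, 5, 4, 4, 4, 4]
iq_no_dl   = [3, 4, 4, 3, 3, 4, 3, 4, 4, 3, 3, 4, 4, 3, 4, 3, 3, 4, 3, 4,
              4, 3, 4, 3, 3, 4, 4, 3, 3, 4]

stat, p = wilcoxon(iq_dl_1nsa, iq_no_dl)   # paired, non-parametric comparison
print(f"W = {stat:.1f}, p = {p:.4f}")
```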

Walker AN, Smith JB, Simister SK, Patel O, Choudhary S, Seidu M, Dallas-Orr D, Tse S, Shahzad H, Wise P, Scott M, Saiz AM, Lum ZC

PubMed · Sep 19 2025
To compare the inter-rater reliability of ChatGPT-4 with that of orthopaedic surgery attendings and residents in classifying fractures on upper extremity (UE) and lower extremity (LE) radiographs. A total of 84 radiographs of various fracture patterns were collected from publicly available online repositories. These images were presented to ChatGPT-4 with a prompt asking it to identify the view, body location, fracture type, and AO/OTA fracture classification. Two orthopaedic surgery residents and two attending orthopaedic surgeons also independently reviewed the images and identified the same categories. Fleiss' kappa values were calculated to determine inter-rater reliability (IRR) for the following comparisons: all raters combined; AI vs. residents (AIR); AI vs. attendings (AIA); and attendings vs. residents (AR). ChatGPT-4 achieved substantial to almost perfect agreement with clinicians on location (UE: κ = 0.655-0.708, LE: κ = 0.834-0.909) and fracture type (UE: κ = 0.546-0.563, LE: κ = 0.58-0.697). For view, ChatGPT-4 showed consistently fair agreement for both UE (κ = 0.370-0.404) and LE (κ = 0.309-0.390). ChatGPT-4 struggled most with AO/OTA classification, achieving slight agreement for UE (κ = -0.062-0.159) and moderate agreement for LE (κ = 0.418-0.455). IRR for AIR was consistently lower than IRR for AR. For AR comparisons, almost perfect agreement was observed for location (UE: κ = 0.896, LE: κ = 0.912) and fracture type (UE: κ = 0.948, LE: κ = 0.859), while AO/OTA classification showed fair agreement for UE (κ = 0.257) and moderate agreement for LE (κ = 0.517). The p-values for all comparison groups were significant except for LE AO/OTA classification between AI and residents (p = 0.051). Although ChatGPT-4 showed promise in classifying basic fracture features, it was not yet at a level comparable to experts, especially for more nuanced interpretations. These findings suggest that AI is more effective as an adjunct to the judgment of trained clinicians than as a replacement for it.
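Inter-rater reliability here is quantified with Fleiss' kappa; a small example of computing it from a subjects-by-raters table with statsmodels (toy category codes, not the study's ratings) is shown below:

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical ratings: rows = radiographs, columns = raters (AI, 2 residents,
# 2 attendings); values = assigned category codes (toy data for illustration only).
ratings = np.array([
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0],
    [2, 2, 1, 2, 2],
    [0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [2, 0, 2, 2, 2],
])

counts, _ = aggregate_raters(ratings)   # subjects x categories count table
kappa = fleiss_kappa(counts)            # same statistic used for the IRR comparisons
print(f"Fleiss' kappa = {kappa:.3f}")
```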

Tran HH, Thu A, Twayana AR, Fuertes A, Gonzalez M, Basta M, James M, Mehta KA, Elias D, Figaro YM, Islek D, Frishman WH, Aronow WS

PubMed · Sep 19 2025
Artificial intelligence (AI)-enabled multimodal cardiovascular imaging holds significant promise for improving diagnostic accuracy, enhancing risk stratification, and supporting clinical decision-making. However, its translation into routine practice remains limited by multiple technical, infrastructural, and clinical barriers. This review synthesizes current challenges, including variability in image quality, alignment, and acquisition protocols; scarcity of large, annotated multimodality datasets; interoperability limitations across vendors and institutions; clinical skepticism due to limited prospective validation; and substantial development and implementation costs. Drawing from recent advances, we outline future research priorities to bridge the gap between technical feasibility and clinical utility. Key strategies include developing unified, vendor-agnostic AI models resilient to inter-institutional variability; integrating diverse data types such as genomics, wearable biosensors, and longitudinal clinical records; leveraging reinforcement learning for adaptive decision-support systems; and employing longitudinal imaging fusion for disease tracking and predictive analytics. We emphasize the need for rigorous prospective clinical trials, harmonized imaging standards, and collaborative data-sharing frameworks to ensure robust, equitable, and scalable deployment. Addressing these challenges through coordinated multidisciplinary efforts will be essential to realize the full potential of AI-driven multimodal cardiovascular imaging in advancing precision cardiovascular care.

Qiu J, Cao J, Huang Y, Zhu Z, Wang F, Lu C, Li Y, Zheng Y

PubMed · Sep 19 2025
In the field of medical image analysis, medical image classification is one of the most fundamental and critical tasks. Current research often relies on off-the-shelf backbone networks from the field of computer vision, hoping to achieve satisfactory classification performance for medical images. However, given the characteristics of medical images, such as the scattered distribution and varying sizes of lesions, features extracted at a single scale from existing backbones often fail to support accurate medical image classification. To this end, we propose a novel multi-scale learning paradigm, namely MUlti-SCale Learning with trusted Evidences (MUSCLE), which extracts and integrates features from different scales based on the theory of evidence to generate a more comprehensive feature representation for the medical image classification task. Particularly, the proposed MUSCLE first estimates the uncertainties of features extracted from different scales/stages of the classification backbone as evidences, and accordingly forms opinions regarding feature trustworthiness via a set of evidential deep neural networks. Then, these opinions on different scales of features are ensembled to yield an aggregated opinion, which can be used to adaptively tune the weights of multi-scale features for scattered, size-varying lesions, and consequently improve the network's capacity for accurate medical image classification. Our MUSCLE paradigm has been evaluated on five publicly available medical image datasets. The experimental results show that the proposed MUSCLE not only improves the accuracy of the original backbone network, but also enhances the reliability and interpretability of model decisions with the trusted evidences (https://github.com/Q4CS/MUSCLE).
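One way to picture evidential, uncertainty-weighted fusion of multi-scale opinions is the PyTorch sketch below, where each scale's head emits Dirichlet evidence and scales are weighted by one minus their uncertainty. This is an illustrative simplification, not the authors' exact ensembling rule:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EvidentialHead(nn.Module):
    """Maps one scale's feature vector to Dirichlet evidence (non-negative)."""
    def __init__(self, dim, n_classes):
        super().__init__()
        self.fc = nn.Linear(dim, n_classes)

    def forward(self, feat):
        evidence = F.softplus(self.fc(feat))      # evidence >= 0
        alpha = evidence + 1.0                     # Dirichlet parameters
        s = alpha.sum(dim=1, keepdim=True)
        belief = evidence / s                      # per-class belief mass
        uncertainty = alpha.shape[1] / s           # u = K / S
        return belief, uncertainty

# Uncertainty-weighted fusion of opinions from three scales (hypothetical dims)
heads = nn.ModuleList([EvidentialHead(d, 4) for d in (64, 128, 256)])
feats = [torch.rand(8, 64), torch.rand(8, 128), torch.rand(8, 256)]
beliefs, uncerts = zip(*(h(f) for h, f in zip(heads, feats)))
weights = torch.stack([1 - u for u in uncerts])            # trust = 1 - uncertainty
weights = weights / weights.sum(dim=0, keepdim=True)
fused_belief = (torch.stack(beliefs) * weights).sum(dim=0)  # aggregated opinion
```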

Ling X, Yang X, Wang P, Li Y, Wen Z, Wang J, Chen K, Yu Y, Liu A, Ma J, Meng W

PubMed · Sep 19 2025
Neoadjuvant chemoimmunotherapy (NACI) regimen (camrelizumab plus paclitaxel and nedaplatin) has shown promising potential in patients with esophageal squamous cell carcinoma (ESCC), but accurately predicting the therapeutic response remains a challenge. To develop and validate a CT-based machine learning model that incorporates both intratumoral and peritumoral heterogeneity for predicting the pathological response of ESCC patients after NACI. Patients with ESCC who underwent surgery following NACI between June 2020 and July 2024 were included retrospectively and prospectively. Univariate and multivariate logistic regression analyses were performed to identify clinical variables associated with pathological response. Traditional radiomics features and habitat radiomics features from the intratumoral and peritumoral regions were extracted from post-treatment CT images, and six predictive models were established using 14 machine learning algorithms. The combined model was developed by integrating intratumoral and peritumoral habitat radiomics features with clinical variables. The performance of the models was evaluated using the area under the receiver operating characteristic curve (AUC). A total of 157 patients (mean [SD] age, 59.6 [6.5] years) were enrolled in our study, of whom 60 (38.2%) achieved major pathological response (MPR) and 40 (25.5%) achieved pathological complete response (pCR). The combined model demonstrated excellent predictive ability for MPR after NACI, with an AUC of 0.915 (95% CI, 0.844-0.981), accuracy of 0.872, sensitivity of 0.733, and specificity of 0.938 in the test set. In sensitivity analysis focusing on pCR, the combined model exhibited robust performance, with an AUC of 0.895 (95% CI, 0.782-0.980) in the test set. The combined model integrating intratumoral and peritumoral habitat radiomics features with clinical variables can accurately predict MPR in ESCC patients after NACI and shows promising potential in predicting pCR.
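As a schematic of the modeling setup, habitat radiomics and clinical features feeding a classifier evaluated by test-set AUC, a toy scikit-learn pipeline might look like this (random stand-in data; none of the paper's features or results):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score

# Toy stand-in: rows = patients, columns = intratumoral/peritumoral habitat
# radiomics features plus clinical variables; labels = pathological response (0/1).
rng = np.random.default_rng(0)
X = rng.normal(size=(157, 40))
y = rng.integers(0, 2, size=157)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])  # test-set AUC
print(f"AUC = {auc:.3f}")
```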