
Zekun Zhou, Yanru Gong, Liu Shi, Qiegen Liu

arXiv preprint · Sep 7, 2025
Diffusion models have demonstrated remarkable generative capabilities in image processing tasks. We propose a Sparse condition Temporal Reweighted Integrated Distribution Estimation guided diffusion model (STRIDE) for sparse-view CT reconstruction. Specifically, we design a joint training mechanism guided by sparse conditional probabilities to facilitate effective learning of missing-projection-view completion and global information modeling. Based on systematic theoretical analysis, we propose a temporally varying sparse-condition reweighting guidance strategy that dynamically adjusts weights during the progressive denoising process from pure noise to the real image, enabling the model to progressively perceive sparse-view information. Linear regression is employed to correct distributional shifts between known and generated data, mitigating inconsistencies arising during the guidance process. Furthermore, we construct a dual-network parallel architecture to perform global correction and optimization across multiple sub-frequency components, thereby effectively improving the model's capability in both detail restoration and structural preservation, ultimately achieving high-quality image reconstruction. Experimental results on both public and real datasets demonstrate that the proposed method achieves a best-case improvement of 2.58 dB in PSNR, an increase of 2.37% in SSIM, and a reduction of 0.236 in MSE compared to the best-performing baseline methods. The reconstructed images exhibit excellent generalization and robustness in terms of structural consistency, detail restoration, and artifact suppression.
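To make the temporally varying reweighting idea concrete, here is a minimal sketch, assuming a DDPM-style sampler and a conditional noise-prediction network: unconditional and sparse-view-conditioned noise estimates are mixed with a weight that grows as denoising proceeds from pure noise toward the image. The linear schedule and the `eps_model` interface are illustrative assumptions, not the STRIDE implementation.

```python
import torch

def reweight(t: int, T: int, w_max: float = 1.0) -> float:
    # Hypothetical linear ramp: weight approaches w_max as t -> 0,
    # so sparse-view information is trusted progressively more.
    return w_max * (1.0 - t / T)

@torch.no_grad()
def guided_sample(eps_model, x, sparse_cond, betas):
    """DDPM ancestral sampling with a time-varying conditional mixture."""
    T = len(betas)
    alphas = 1.0 - betas
    abar = torch.cumprod(alphas, dim=0)
    for t in reversed(range(T)):
        eps_u = eps_model(x, t, None)          # unconditional estimate
        eps_c = eps_model(x, t, sparse_cond)   # sparse-view-conditioned estimate
        w = reweight(t, T)
        eps = (1.0 - w) * eps_u + w * eps_c    # temporally reweighted guidance
        mean = (x - betas[t] / torch.sqrt(1.0 - abar[t]) * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x
```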

Mohamed Mohamed, Brennan Nichyporuk, Douglas L. Arnold, Tal Arbel

arXiv preprint · Sep 7, 2025
Vision-language models have demonstrated impressive capabilities in generating 2D images under various conditions; however, this performance is largely enabled by extensive, readily available pretrained foundation models. Critically, comparable pretrained foundation models do not exist for 3D, significantly limiting progress in this domain. As a result, the potential of vision-language models to produce high-resolution 3D counterfactual medical images conditioned solely on natural language descriptions remains completely unexplored. Addressing this gap would enable powerful clinical and research applications, such as personalized counterfactual explanations, simulation of disease progression scenarios, and enhanced medical training by visualizing hypothetical medical conditions in realistic detail. Our work takes a meaningful step toward addressing this challenge by introducing a framework capable of generating high-resolution 3D counterfactual medical images of synthesized patients guided by free-form language prompts. We adapt state-of-the-art 3D diffusion models with enhancements from Simple Diffusion and incorporate augmented conditioning to improve text alignment and image quality. To our knowledge, this represents the first demonstration of a language-guided native-3D diffusion model applied specifically to neurological imaging data, where faithful three-dimensional modeling is essential to represent the brain's structure. Through results on two distinct neurological MRI datasets, our framework successfully simulates varying counterfactual lesion loads in multiple sclerosis (MS) and cognitive states in Alzheimer's disease, generating high-quality images while preserving subject fidelity. Our results lay the groundwork for prompt-driven disease progression analysis within 3D medical imaging.
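As one plausible reading of "augmented conditioning" in a text-guided diffusion model, the sketch below applies classifier-free guidance to a 3D volume: the denoiser is queried with and without the prompt embedding, and the difference is amplified to strengthen text alignment. The `denoiser` signature and tensor shapes are assumptions for illustration, not the paper's method.

```python
import torch

def cfg_eps(denoiser, x, t, text_emb, guidance_scale=3.0):
    """Classifier-free guidance step for a text-conditioned 3D volume.

    x: (B, C, D, H, W) noisy volume; text_emb: (B, L, E) prompt embedding.
    """
    eps_uncond = denoiser(x, t, None)      # prompt dropped
    eps_text = denoiser(x, t, text_emb)    # prompt attended
    # Push the noise estimate toward the text condition.
    return eps_uncond + guidance_scale * (eps_text - eps_uncond)
```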

Mu Nan, Taohui Xiao, Ruoyou Wu, Shoujun Yu, Ye Li, Hairong Zheng, Shanshan Wang

arXiv preprint · Sep 7, 2025
Diffusion MRI (dMRI) angular super-resolution (ASR) aims to reconstruct high-angular-resolution (HAR) signals from limited low-angular-resolution (LAR) data without prolonging scan time. However, existing methods are limited in recovering fine-grained angular details or preserving high fidelity due to inadequate modeling of q-space geometry and insufficient incorporation of physical constraints. In this paper, we introduce a Physics-Guided Diffusion Transformer (PGDiT) designed to incorporate physical priors throughout both the training and inference stages. During training, a Q-space Geometry-Aware Module (QGAM) with b-vector modulation and random angular masking facilitates direction-aware representation learning, enabling the network to generate directionally consistent reconstructions with fine angular details from sparse, noisy data. At inference, a two-stage Spherical Harmonics-Guided Posterior Sampling (SHPS) scheme first enforces alignment with the acquired data and then applies heat-diffusion-based SH regularization to ensure physically plausible reconstructions. This coarse-to-fine refinement strategy mitigates the oversmoothing and artifacts commonly observed in purely data-driven or generative models. Extensive experiments on general ASR tasks and two downstream applications, Diffusion Tensor Imaging (DTI) and Neurite Orientation Dispersion and Density Imaging (NODDI), demonstrate that PGDiT outperforms existing deep learning models in detail recovery and data fidelity. Our approach presents a novel generative ASR framework that offers high-fidelity HAR dMRI reconstructions, with potential applications in neuroscience and clinical research.
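For intuition about the spherical-harmonics step, here is a small, assumption-laden sketch of projecting per-voxel diffusion signals onto a real, even-order SH basis by least squares and resynthesizing them, the kind of SH regularization a posterior-sampling stage could apply. It is not the paper's SHPS code, and it relies on `scipy.special.sph_harm` (azimuth-first argument convention).

```python
import numpy as np
from scipy.special import sph_harm  # sph_harm(m, l, azimuth, polar)

def real_sh_basis(lmax, polar, azim):
    """Real, even-order SH basis (standard in dMRI) at the given angles."""
    cols = []
    for l in range(0, lmax + 1, 2):
        for m in range(-l, l + 1):
            Y = sph_harm(abs(m), l, azim, polar)
            if m < 0:
                cols.append(np.sqrt(2) * Y.imag)
            elif m == 0:
                cols.append(Y.real)
            else:
                cols.append(np.sqrt(2) * Y.real)
    return np.stack(cols, axis=-1)  # (n_dirs, n_coeffs)

def sh_regularize(signal, bvecs, lmax=8):
    """Least-squares SH fit and resynthesis of a dMRI signal."""
    polar = np.arccos(np.clip(bvecs[:, 2], -1.0, 1.0))
    azim = np.arctan2(bvecs[:, 1], bvecs[:, 0])
    B = real_sh_basis(lmax, polar, azim)
    coeffs, *_ = np.linalg.lstsq(B, signal, rcond=None)
    return B @ coeffs  # SH-consistent (smoothed) signal on the same directions
```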

Zhengquan Luo, Chi Liu, Dongfu Xiao, Zhen Yu, Yueye Wang, Tianqing Zhu

arXiv preprint · Sep 7, 2025
The integration of AI with medical images enables the extraction of implicit image-derived biomarkers for precise health assessment. Retinal age, a biomarker predicted from fundus images, has recently been shown to predict systemic disease risks, behavioral patterns, aging trajectory, and even mortality. However, the capability to infer such sensitive biometric data raises significant privacy risks: unauthorized use of fundus images could lead to bioinformation leakage, breaching individual privacy. In response, we formulate a new research problem of biometric privacy associated with medical images and propose RetinaGuard, a novel privacy-enhancing framework that employs a feature-level generative adversarial masking mechanism to obscure retinal age while preserving image visual quality and disease diagnostic utility. The framework further utilizes a novel multiple-to-one knowledge distillation strategy, incorporating a retinal foundation model and diverse surrogate age encoders, to enable a universal defense against black-box age prediction models. Comprehensive evaluations confirm that RetinaGuard successfully obfuscates retinal age prediction with minimal impact on image quality and pathological feature representation. RetinaGuard is also flexible for extension to other medical-image-derived biomarkers.
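The loss sketched below illustrates one way a feature-level generative masking framework of this kind could balance its objectives: keep the masked fundus image close to the original while driving an ensemble of surrogate age encoders away from the true retinal age. All module names and the loss weighting are placeholders, not RetinaGuard's actual API.

```python
import torch
import torch.nn.functional as F

def masking_losses(masker, age_encoders, fundus, true_age, lam=1.0):
    """Composite objective: image fidelity + age obfuscation (illustrative)."""
    masked = masker(fundus)                 # privacy-enhanced image
    fidelity = F.l1_loss(masked, fundus)    # preserve visual quality
    # Several surrogate encoders stand in for unseen black-box predictors;
    # the negated MSE rewards pushing their age estimates away from truth.
    obfuscation = torch.stack(
        [-F.mse_loss(enc(masked), true_age) for enc in age_encoders]
    ).mean()
    return fidelity + lam * obfuscation
```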

Tian Y, Chen S, Ji C, Wang XP, Ye M, Chen XY, Luo JF, Li X, Li L

PubMed paper · Sep 7, 2025
Choledochal cysts (CC) and cystic biliary atresia (CBA) present similarly in early infancy but require different treatment approaches. While CC surgery can be delayed until 3-6 months of age in asymptomatic patients, CBA requires intervention within 60 days to prevent cirrhosis. We aimed to develop a diagnostic model for early differentiation between these conditions. A total of 319 patients with hepatic hilar cysts (< 60 days old at surgery), treated at three hospitals between 2011 and 2022, were retrospectively analyzed. Clinical features, including biochemical markers and ultrasonographic measurements, were compared between the CC (n = 274) and CBA (n = 45) groups. Least absolute shrinkage and selection operator (LASSO) regression identified key diagnostic features, and 11 machine learning models were developed and compared. The CBA group showed higher levels of total bile acid, total bilirubin, direct bilirubin, γ-glutamyl transferase, aspartate aminotransferase, and alanine aminotransferase, while the longitudinal and transverse diameters of the cysts were larger in the CC group. The multilayer perceptron model demonstrated optimal performance with 95.8% accuracy, 92.9% sensitivity, 96.3% specificity, and an area under the curve of 0.990. Decision curve analysis confirmed its clinical utility. Based on the model, we developed user-friendly diagnostic software for clinical implementation. Our machine learning approach differentiates CC from CBA in early infancy using routinely available clinical parameters. Early, accurate diagnosis facilitates timely surgical intervention for CBA cases, potentially improving patient outcomes.
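A minimal scikit-learn pipeline in the spirit of the described workflow might look like the sketch below: L1-penalized selection over standardized clinical features, followed by a multilayer perceptron. The hyperparameters and the use of L1 logistic regression as the LASSO-style selector are assumptions, not the study's exact configuration.

```python
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X rows: biochemical markers and cyst diameters; y: 0 = CC, 1 = CBA.
clf = make_pipeline(
    StandardScaler(),
    # The L1 penalty zeroes out weak features, keeping the key markers.
    SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear", C=0.1)),
    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
)
# clf.fit(X_train, y_train); clf.predict_proba(X_test)[:, 1]
```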

Yee, N. J., Taseh, A., Ghandour, S., Sirls, E., Halai, M., Whyne, C., DiGiovanni, C. W., Kwon, J. Y., Ashkani-Esfahani, S. J.

medRxiv preprint · Sep 7, 2025
Purpose: To evaluate convolutional neural network (CNN) model training strategies that optimize the performance of calcaneus fracture detection on radiographs at different image resolutions. Materials and Methods: This retrospective study included foot radiographs from a single hospital between 2015 and 2022, for a total of 1,775 x-ray series (551 with fracture; 1,224 without), split into training (70%), validation (15%), and testing (15%) sets. ImageNet-pretrained ResNet models were fine-tuned on the dataset. Three training strategies were evaluated: 1) single size: trained exclusively on 128x128, 256x256, 512x512, 640x640, or 900x900 radiographs (5 model sets); 2) curriculum learning: trained exclusively on 128x128 radiographs, then exclusively on 256x256, then 512x512, then 640x640, and finally on 900x900 (5 model sets); and 3) multi-scale augmentation: trained on x-ray images resized along continuous dimensions between 128x128 and 900x900 (1 model set). Inference time and training time were compared. Results: Multi-scale-augmentation-trained models achieved the highest average area under the receiver operating characteristic curve for a single model across image resolutions, 0.938 [95% CI: 0.936-0.939], compared with the other strategies, without prolonging training or inference time. Using the optimal model sets, curriculum learning had the highest sensitivity on in-distribution low-resolution images (85.4% to 90.1%) and on out-of-distribution high-resolution images (78.2% to 89.2%). However, curriculum learning models took significantly longer to train (11.8 [IQR: 11.1-16.4] hours; P < .001). Conclusion: While 512x512 images worked well for fracture identification, curriculum learning and multi-scale augmentation training strategies algorithmically improved model robustness to different image resolutions without requiring additional annotated data. Summary statement: Different deep learning training strategies affect performance in detecting calcaneus fractures on radiographs across in- and out-of-distribution image resolutions, with a multi-scale augmentation strategy conferring the greatest overall performance improvement in a single model. Key points: (1) Training strategies addressing differences in radiograph image resolution (or pixel dimensions) could improve deep learning performance. (2) The highest average performance across different image resolutions in a single model was achieved by multi-scale augmentation, where the sampled training dataset is uniformly resized between square resolutions of 128x128 and 900x900. (3) Compared to model training on a single image resolution, sequentially training on increasingly higher-resolution images up to 900x900 (i.e., curriculum learning) resulted in higher fracture detection performance on image resolutions between 128x128 and 2048x2048.
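The multi-scale augmentation strategy reduces to a simple transform, sketched below under assumed torchvision conventions: each training radiograph is resized to a random square resolution drawn uniformly from 128 to 900 pixels. Note that variable-size batches require per-sample processing or padding at collate time.

```python
import random
import torchvision.transforms as T
import torchvision.transforms.functional as TF

class RandomSquareResize:
    """Resize to a random square resolution in [lo, hi] (multi-scale aug)."""
    def __init__(self, lo=128, hi=900):
        self.lo, self.hi = lo, hi

    def __call__(self, img):
        s = random.randint(self.lo, self.hi)
        return TF.resize(img, [s, s])

train_tf = T.Compose([
    RandomSquareResize(128, 900),  # continuous scale sampling
    T.ToTensor(),
])
```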

Feng CK, Chen YJ, Dinh QT, Tran KT, Liu CY

PubMed paper · Sep 6, 2025
This study aims to address the limitations of radiographic imaging and single-task learning models in adolescent idiopathic scoliosis assessment by developing a noninvasive, radiation-free diagnostic framework. A multi-task deep learning model was trained using structured back surface data acquired via fringe projection three-dimensional imaging. The model was designed to simultaneously predict the Cobb angle, curve type (thoracic, lumbar, mixed, none), and curve direction (left, right, none) by learning shared morphological features. The multi-task model achieved a mean absolute error (MAE) of 2.9° and a root mean square error (RMSE) of 6.9° for Cobb angle prediction, outperforming the single-task baseline (5.4° MAE, 12.5° RMSE). It showed strong correlation with radiographic measurements (R = 0.96, R² = 0.91). For curve classification, it reached 89% sensitivity in lumbar and mixed types, and 80% and 75% sensitivity for right and left directions, respectively, with an 87% positive predictive value for right-sided curves. The proposed multi-task learning model demonstrates that jointly learning related clinical tasks allows for the extraction of more robust and clinically meaningful geometric features from surface data. It outperforms traditional single-task approaches in both accuracy and stability. This framework provides a safe, efficient, and non-invasive alternative to X-ray-based scoliosis assessment and has the potential to support real-time screening and long-term monitoring of adolescent idiopathic scoliosis in clinical practice.
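A bare-bones PyTorch arrangement of the multi-task idea, with a shared encoder over back-surface maps feeding one regression head and two classification heads, could look like the sketch below; the backbone and layer sizes are placeholders, not the paper's architecture.

```python
import torch.nn as nn

class ScoliosisMultiTask(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(          # stand-in backbone
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim),
        )
        self.cobb = nn.Linear(feat_dim, 1)        # Cobb angle (regression)
        self.curve_type = nn.Linear(feat_dim, 4)  # thoracic/lumbar/mixed/none
        self.direction = nn.Linear(feat_dim, 3)   # left/right/none

    def forward(self, x):
        z = self.encoder(x)  # shared morphological features
        return self.cobb(z), self.curve_type(z), self.direction(z)
```

Training would sum a regression loss on the angle with cross-entropy losses on the two classification heads, which is what lets the shared features benefit all three tasks.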

Sachpekidis C, Machiraju D, Strauss DS, Pan L, Kopp-Schneider A, Edenbrandt L, Dimitrakopoulou-Strauss A, Hassel JC

PubMed paper · Sep 6, 2025
Tebentafusp has emerged as the first systemic therapy to significantly prolong survival in treatment-naïve HLA-A*02:01+ patients with unresectable or metastatic uveal melanoma (mUM). Notably, a survival benefit has been observed even in the absence of radiographic response. This study aims to investigate the feasibility and prognostic value of artificial intelligence (AI)-assisted quantification and metabolic response assessment of [¹⁸F]FDG long axial field-of-view (LAFOV) PET/CT in mUM patients undergoing tebentafusp therapy. Fifteen patients with mUM treated with tebentafusp underwent [¹⁸F]FDG LAFOV PET/CT at baseline and 3 months post-treatment. Total metabolic tumor volume (TMTV) and total lesion glycolysis (TLG) were quantified using a deep learning-based segmentation tool on the RECOMIA platform. Metabolic response was assessed according to AI-assisted PERCIST 1.0 criteria. Associations between PET-derived parameters and overall survival (OS) were evaluated using Kaplan-Meier survival analysis. The median follow-up (95% CI) was 14.1 months (12.9 months - not available). Automated TMTV and TLG measurements were successfully obtained in all patients. Elevated baseline TMTV and TLG were significantly associated with shorter OS (TMTV: 16.9 vs. 27.2 months; TLG: 16.9 vs. 27.2 months; p < 0.05). Similarly, higher TMTV and TLG at 3 months post-treatment predicted poorer survival outcomes (TMTV: 14.3 vs. 24.5 months; TLG: 14.3 vs. 24.5 months; p < 0.05). AI-assisted PERCIST response evaluation identified six patients with disease control (complete metabolic response, partial metabolic response, or stable metabolic disease) and nine with progressive metabolic disease. A trend toward improved OS was observed in patients with disease control (24.5 vs. 14.6 months, p = 0.08). Circulating tumor DNA (ctDNA) levels based on GNAQ and GNA11 mutations were available in 8 patients; after 3 months of tebentafusp treatment, 5 showed reduced or stable ctDNA levels and 3 showed an increase (median OS: 24.5 vs. 3.3 months; p = 0.13). Patients with increasing ctDNA levels exhibited significantly higher TMTV and TLG on follow-up imaging. AI-assisted whole-body quantification of [¹⁸F]FDG PET/CT and PERCIST-based response assessment are feasible and hold prognostic significance in tebentafusp-treated mUM. TMTV and TLG may serve as non-invasive imaging biomarkers for risk stratification and treatment monitoring in this malignancy.
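TMTV and TLG themselves are simple to compute once a lesion segmentation exists; the sketch below applies the standard definitions (TLG = SUVmean × MTV) to an SUV volume and a binary mask, independent of whichever segmentation tool produced the mask.

```python
import numpy as np

def tmtv_tlg(suv: np.ndarray, mask: np.ndarray, voxel_vol_ml: float):
    """suv: SUV volume; mask: boolean lesion mask; voxel_vol_ml: mL per voxel."""
    tmtv = float(mask.sum()) * voxel_vol_ml  # total metabolic tumor volume (mL)
    tlg = float(suv[mask].mean()) * tmtv if mask.any() else 0.0  # SUVmean x TMTV
    return tmtv, tlg
```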

Tekcan Sanli DE, Sanli AN, Ozmen G, Ozmen A, Cihan I, Kurt A, Esmerer E

PubMed paper · Sep 6, 2025
This study aimed to evaluate the performance of ChatGPT (GPT-4o) in interpreting free-text breast magnetic resonance imaging (MRI) reports by assigning BI-RADS categories and recommending appropriate clinical management steps in the absence of explicitly stated BI-RADS classifications. In this retrospective, single-center study, 352 full-text breast MRI reports documented between January 2024 and June 2025, each describing at least one identifiable breast lesion with descriptive imaging findings, were included. Incomplete reports due to technical limitations, reports describing only normal findings, and MRI examinations performed at external institutions were excluded. The first aim was to assess ChatGPT's ability to infer the correct BI-RADS category (2, 3, 4A, 4B, 4C, or 5 separately) based solely on the narrative imaging findings. The second was to evaluate the model's ability to distinguish benign from suspicious/malignant imaging features in terms of clinical decision-making. Accordingly, BI-RADS 2-3 categories were grouped as "benign" and BI-RADS 4-5 as "suspicious/malignant," in alignment with how BI-RADS categories are used to guide patient management rather than to represent definitive diagnostic outcomes. Reports originally containing the term "BI-RADS" were manually de-identified by removing BI-RADS categories and clinical recommendations. Each narrative report was then processed through ChatGPT using two standardized prompts: (1) What is the most appropriate BI-RADS category based on the findings in the report? (2) What should be the next clinical step (e.g., follow-up, biopsy)? Responses were evaluated in real time by two experienced breast radiologists, and consensus was used as the reference standard. ChatGPT demonstrated moderate agreement with the radiologists' consensus for BI-RADS classification (Cohen's kappa (κ): 0.510, p < 0.001). Classification accuracy was highest for BI-RADS 5 reports (77.9%), whereas lower agreement was observed in intermediate categories such as BI-RADS 3 (52.4% correct) and 4B (29.4% correct). In the binary classification of reports as benign or malignant, ChatGPT achieved almost perfect agreement (κ: 0.843), correctly identifying 91.7% of benign and 93.2% of malignant reports. Notably, the model's management recommendations were 100% consistent with its assigned BI-RADS categories, advising biopsy for all BI-RADS 4-5 cases and short-interval follow-up or conditional biopsy for BI-RADS 3 reports. ChatGPT accurately interprets unstructured breast MRI reports, particularly in benign/malignant discrimination and the corresponding clinical recommendations. This technology holds potential as a decision support tool to standardize reporting and enhance clinical workflows, especially in settings with variable reporting practices. Prospective, multi-institutional studies are needed for further validation.
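The querying pattern is straightforward to reproduce; a minimal sketch, assuming the OpenAI Python client and the GPT-4o model name, sends each de-identified report through the study's two standardized prompts. The client wiring and response handling are assumptions, not the study's code.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPTS = [
    "What is the most appropriate BI-RADS category based on the findings in the report?",
    "What should be the next clinical step (e.g., follow-up, biopsy)?",
]

def query_report(report_text: str) -> list[str]:
    """Ask both standardized questions about one narrative MRI report."""
    answers = []
    for prompt in PROMPTS:
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": f"{report_text}\n\n{prompt}"}],
        )
        answers.append(resp.choices[0].message.content)
    return answers
```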

Mohsen Asghari Ilani, Yaser M. Banad

arXiv preprint · Sep 6, 2025
Artificial intelligence (AI)-powered deep learning has advanced brain tumor diagnosis in Internet of Things (IoT)-healthcare systems, achieving high accuracy with large datasets. Brain health is critical to human life, and accurate diagnosis is essential for effective treatment. Magnetic Resonance Imaging (MRI) provides key data for brain tumor detection, serving as a major source of big data for AI-driven image classification. In this study, we classified glioma, meningioma, and pituitary tumors from MRI images using Region-based Convolutional Neural Network (R-CNN) and UNet architectures. We also applied Convolutional Neural Networks (CNN) and CNN-based transfer learning models such as Inception-V3, EfficientNetB4, and VGG19. Model performance was assessed using F-score, recall, precision, and accuracy. The Fast R-CNN achieved the best results with 99% accuracy, 98.5% F-score, 99.5% Area Under the Curve (AUC), 99.4% recall, and 98.5% precision. Combining R-CNN, UNet, and transfer learning enables earlier diagnosis and more effective treatment in IoT-healthcare systems, improving patient outcomes. IoT devices such as wearable monitors and smart imaging systems continuously collect real-time data, which AI algorithms analyze to provide immediate insights for timely interventions and personalized care. For external cohort cross-dataset validation, EfficientNetB2 achieved the strongest performance among fine-tuned EfficientNet models, with 92.11% precision, 92.11% recall/sensitivity, 95.96% specificity, 92.02% F1-score, and 92.23% accuracy. These findings underscore the robustness and reliability of AI models in handling diverse datasets, reinforcing their potential to enhance brain tumor classification and patient care in IoT healthcare environments.
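As a generic illustration of the CNN transfer-learning recipe described here, the sketch below fine-tunes an ImageNet-pretrained EfficientNet from torchvision for the three tumor classes; the freezing policy and hyperparameters are assumptions, not the study's settings.

```python
import torch.nn as nn
from torchvision import models

# Load ImageNet weights and swap the classification head for 3 classes
# (glioma, meningioma, pituitary).
model = models.efficientnet_b4(weights=models.EfficientNet_B4_Weights.DEFAULT)
in_feats = model.classifier[1].in_features
model.classifier[1] = nn.Linear(in_feats, 3)

# A common warm-up: freeze the backbone and train only the new head first.
for p in model.features.parameters():
    p.requires_grad = False
```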