Latest Papers on Radiology AI. Tags: In Silico

Seeing is Believing-On the Utility of CT in Phenotyping COPD.

Awan HA, Chaudhary MFA, Reinhardt JM

•papers•Jul 12 2025

Chronic obstructive pulmonary disease (COPD) is a heterogeneous condition with complicated structural and functional impairments. For decades now, chest computed tomography (CT) has been used to quantify various abnormalities related to COPD. More recently, with the newer data-driven approaches, biomarker development and validation have evolved rapidly. Studies now target multiple anatomical structures including lung parenchyma, the airways, the vasculature, and the fissures to better characterize COPD. This review explores the evolution of chest CT biomarkers in COPD, beginning with traditional thresholding approaches that quantify emphysema and airway dimensions. We then highlight some of the texture analysis efforts that have been made over the years for subtyping lung tissue. We also discuss image registration-based biomarkers that have enabled spatially-aware mechanisms for understanding local abnormalities within the lungs. More recently, deep learning has enabled automated biomarker extraction, offering improved precision in phenotype characterization and outcome prediction. We highlight the most recent of these approaches as well. Despite these advancements, several challenges remain in terms of dataset heterogeneity, model generalizability, and clinical interpretability. Finally, this review provides a structured overview of these limitations and highlights the future potential of CT biomarkers in personalized COPD management.

CT Classification Chest Review In Silico GenAI
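As a rough illustration of the traditional thresholding approach mentioned in the COPD review above, the sketch below computes the percentage of lung voxels whose attenuation falls below a fixed cutoff. The function name, the flat list input, and the default cutoff of -950 HU are assumptions; the abstract does not state a threshold, and -950 HU is used here only because it is a commonly cited value for inspiratory CT.

```python
def percent_low_attenuation(lung_hu, threshold_hu=-950.0):
    """Return the percentage of lung voxels with attenuation below threshold_hu.

    lung_hu: iterable of Hounsfield-unit values for voxels that are already
    restricted to a lung mask (lung segmentation is assumed to happen elsewhere).
    threshold_hu: assumed cutoff, not taken from the abstract.
    """
    values = list(lung_hu)
    if not values:
        raise ValueError("lung_hu must contain at least one voxel")
    below = sum(1 for v in values if v < threshold_hu)
    return 100.0 * below / len(values)


# Hypothetical voxel values, for illustration only.
example_voxels = [-980, -960, -940, -900, -870, -955, -820, -700]
emphysema_index = percent_low_attenuation(example_voxels)
```

A single global percentage like this ignores where in the lung the low-attenuation voxels sit, which is one reason the review goes on to texture, registration, and deep learning biomarkers.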

Novel deep learning framework for simultaneous assessment of left ventricular mass and longitudinal strain: clinical feasibility and validation in patients with hypertrophic cardiomyopathy.

Park J, Yoon YE, Jang Y, Jung T, Jeon J, Lee SA, Choi HM, Hwang IC, Chun EJ, Cho GY, Chang HJ

•papers•Jul 12 2025

This study aims to present the Segmentation-based Myocardial Advanced Refinement Tracking (SMART) system, a novel artificial intelligence (AI)-based framework for transthoracic echocardiography (TTE) that incorporates motion tracking and left ventricular (LV) myocardial segmentation for automated LV mass (LVM) and global longitudinal strain (LVGLS) assessment. The SMART system demonstrates LV speckle tracking based on motion vector estimation, refined by structural information using endocardial and epicardial segmentation throughout the cardiac cycle. This approach enables automated measurement of LVMSMART and LVGLSSMART. The feasibility of SMART is validated in 111 hypertrophic cardiomyopathy (HCM) patients (median age: 58 years, 69% male) who underwent TTE and cardiac magnetic resonance imaging (CMR). LVGLSSMART showed a strong correlation with conventional manual LVGLS measurements (Pearson's correlation coefficient [PCC] 0.851; mean difference 0 [-2-0]). When compared to CMR as the reference standard for LVM, the conventional dimension-based TTE method overestimated LVM (PCC 0.652; mean difference: 106 [90-123]), whereas LVMSMART demonstrated excellent agreement with CMR (PCC 0.843; mean difference: 1 [-11-13]). For predicting extensive myocardial fibrosis, LVGLSSMART and LVMSMART exhibited performance comparable to conventional LVGLS and CMR (AUC: 0.72 and 0.66, respectively). Patients identified as high risk for extensive fibrosis by LVGLSSMART and LVMSMART had significantly higher rates of adverse outcomes, including heart failure hospitalization, new-onset atrial fibrillation, and defibrillator implantation. The SMART technique provides a comparable LVGLS evaluation and a more accurate LVM assessment than conventional TTE, with predictive values for myocardial fibrosis and adverse outcomes. These findings support its utility in HCM management.

Ultrasound Segmentation Cardiac Retrospective Clinical In Silico Academic Lab
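The agreement statistics quoted in the SMART abstract above (Pearson's correlation coefficient and mean difference against a reference) can be sketched as follows. This is a minimal illustration with hypothetical values; the abstract does not say how its bracketed intervals were derived, so the 1.96 standard deviation limits of agreement used here are an assumption rather than the authors' method.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_difference_with_limits(estimate, reference):
    """Return (bias, lower, upper): mean of estimate - reference and
    bias +/- 1.96 sample standard deviations of the differences."""
    diffs = [e - r for e, r in zip(estimate, reference)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired measurements")
    bias = sum(diffs) / n
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical LV mass values in grams, for illustration only.
reference_mass = [150.0, 180.0, 210.0, 240.0]
estimated_mass = [152.0, 178.0, 214.0, 236.0]
r = pearson_r(estimated_mass, reference_mass)
bias, lower, upper = mean_difference_with_limits(estimated_mass, reference_mass)
```

Reporting both numbers matters because a method can correlate well with the reference while still being systematically offset, as the abstract describes for the dimension-based LV mass estimate.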

Accuracy of large language models in generating differential diagnosis from clinical presentation and imaging findings in pediatric cases.

Jung J, Phillipi M, Tran B, Chen K, Chan N, Ho E, Sun S, Houshyar R

•papers•Jul 12 2025

Large language models (LLMs) have shown promise in assisting medical decision-making. However, there is limited literature exploring the diagnostic accuracy of LLMs in generating differential diagnoses from text-based image descriptions and clinical presentations in pediatric radiology. This study examined the performance of multiple proprietary LLMs in producing accurate differential diagnoses for text-based pediatric radiological cases without imaging. One hundred sixty-four cases were retrospectively selected from a pediatric radiology textbook and converted into two formats: (1) image description only, and (2) image description with clinical presentation. The ChatGPT-4 V, Claude 3.5 Sonnet, and Gemini 1.5 Pro algorithms were given these inputs and tasked with providing a top 1 diagnosis and a top 3 differential diagnoses. Accuracy of responses was assessed by comparison with the original literature. Top 1 accuracy was defined as whether the top 1 diagnosis matched the textbook, and top 3 differential accuracy was defined as the number of diagnoses in the model-generated top 3 differential that matched any of the top 3 diagnoses in the textbook. McNemar's test, Cochran's Q test, Friedman test, and Wilcoxon signed-rank test were used to compare algorithms and assess the impact of added clinical information, respectively. There was no significant difference in top 1 accuracy between ChatGPT-4 V, Claude 3.5 Sonnet, and Gemini 1.5 Pro when only image descriptions were provided (56.1% [95% CI 48.4-63.5], 64.6% [95% CI 57.1-71.5], 61.6% [95% CI 54.0-68.7]; P = 0.11). Adding clinical presentation to image description significantly improved top 1 accuracy for ChatGPT-4 V (64.0% [95% CI 56.4-71.0], P = 0.02) and Claude 3.5 Sonnet (80.5% [95% CI 73.8-85.8], P < 0.001). For image description and clinical presentation cases, Claude 3.5 Sonnet significantly outperformed both ChatGPT-4 V and Gemini 1.5 Pro (P < 0.001).
For top 3 differential accuracy, no significant differences were observed between ChatGPT-4 V, Claude 3.5 Sonnet, and Gemini 1.5 Pro, regardless of whether the cases included only image descriptions (1.29 [95% CI 1.16-1.41], 1.35 [95% CI 1.23-1.48], 1.37 [95% CI 1.25-1.49]; P = 0.60) or both image descriptions and clinical presentations (1.33 [95% CI 1.20-1.45], 1.52 [95% CI 1.41-1.64], 1.48 [95% CI 1.36-1.59]; P = 0.72). Only Claude 3.5 Sonnet performed significantly better when clinical presentation was added (P < 0.001). Commercial LLMs performed similarly on pediatric radiology cases in providing top 1 accuracy and top 3 differential accuracy when only a text-based image description was used. Adding clinical presentation significantly improved top 1 accuracy for ChatGPT-4 V and Claude 3.5 Sonnet, with Claude showing the largest improvement. Claude 3.5 Sonnet outperformed both ChatGPT-4 V and Gemini 1.5 Pro in top 1 accuracy when both image and clinical data were provided. No significant differences were found in top 3 differential accuracy across models in any condition.

Mixed Modality LLM Radiology Report Retrospective Clinical In Silico Academic Lab Benchmark SOTA
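To make the top 1 accuracy statistics in the pediatric LLM abstract above more concrete, here is a small sketch of a confidence interval for a proportion and a paired comparison of two models on the same cases. Several things are assumptions: the count of 92 correct cases is inferred from 56.1% of 164, the Wilson score method is one plausible interval choice (the abstract does not name one, although the result lands close to the reported 48.4-63.5), and the discordant pair counts passed to the McNemar function are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from discordant pair counts.

    b: cases only model A answered correctly
    c: cases only model B answered correctly
    Cases where both models agree do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# 92 of 164 is inferred from the reported 56.1%, not stated in the abstract.
low, high = wilson_interval(92, 164)

# Hypothetical discordant counts, for illustration only.
p_value = mcnemar_exact(4, 15)
```

The paired test is the natural fit here because every model answered the same 164 cases, so only the cases on which two models disagree carry information about which one is better.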

AI-powered disease progression prediction in multiple sclerosis using magnetic resonance imaging: a systematic review and meta-analysis.

Houshi S, Khodakarami Z, Shaygannejad A, Khosravi F, Shaygannejad V

•papers•Jul 12 2025

Disability progression despite disease-modifying therapy remains a major challenge in multiple sclerosis (MS). Artificial intelligence (AI) models exploiting magnetic resonance imaging (MRI) promise personalized prognostication, yet their real-world accuracy is uncertain. This study systematically reviews and meta-analyzes MRI-based AI studies predicting future disability progression in MS. Five databases were searched from inception to 17 May 2025 following PRISMA. Eligible studies used MRI in an AI model to forecast changes in the Expanded Disability Status Scale (EDSS) or equivalent metrics. Two reviewers conducted study selection, data extraction, and QUADAS-2 assessment. Random-effects meta-analysis was applied when ≥3 studies reported compatible regression statistics. Twenty-one studies with 12,252 MS patients met inclusion criteria. Five used regression on continuous EDSS, fourteen classification, one time-to-event, and one both. Conventional machine learning predominated (57%), followed by deep learning (38%). Median classification area under the curve (AUC) was 0.78 (range 0.57-0.86); median regression root-mean-square-error (RMSE) was 1.08 EDSS points. Pooled RMSE across regression studies was 1.31 (95% CI 1.02-1.60; I² = 95%). Deep learning conferred only marginal, non-significant gains over classical algorithms. External validation appeared in six studies; calibration, decision-curve analysis and code releases were seldom reported. QUADAS-2 indicated generally low patient-selection bias but frequent index-test concerns. MRI-driven AI models predict MS disability progression with moderate accuracy, but error margins that exceed one EDSS point limit individual-level utility. Harmonized endpoints, larger multicenter cohorts, rigorous external validation, and prospective clinician-in-the-loop trials are essential before routine clinical adoption.

MRI Classification Neurological Meta Analysis In Silico Academic Lab Benchmark SOTA
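The pooled estimate, confidence interval, and I² statistic reported in the MS meta-analysis above can be illustrated with a small random-effects sketch. The abstract does not name the estimator, so the DerSimonian-Laird method used below is an assumption, as are the function name and the toy inputs; per-study effects and variances are not given in the abstract.

```python
import math


def random_effects_pool(effects, variances, z=1.96):
    """DerSimonian-Laird random-effects pooling (assumed estimator).

    effects: per-study effect estimates (for example, RMSE values)
    variances: per-study sampling variances of those estimates
    Returns a dict with the pooled estimate, its CI, tau^2, and I^2 in percent.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q measures the spread of study effects around the fixed-effect mean.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)  # between-study variance
    re_weights = [1.0 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum_re
    se = math.sqrt(1.0 / sum_re)
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return {
        "pooled": pooled,
        "ci": (pooled - z * se, pooled + z * se),
        "tau2": tau2,
        "i2": i2,
    }


# Toy inputs, for illustration only.
result = random_effects_pool([0.0, 2.0], [1.0, 1.0])
```

A high I², such as the 95% reported in the abstract, means most of the observed spread between studies reflects real heterogeneity rather than sampling error, which widens the pooled interval relative to a fixed-effect analysis.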

The role of neuro-imaging in multiple system atrophy.

Krismer F, Seppi K, Poewe W

•papers•Jul 12 2025

Neuroimaging plays a crucial role in diagnosing multiple system atrophy (MSA) and monitoring progressive neurodegeneration in this fatal disease. Advanced MRI techniques and post-processing methods have demonstrated significant volume loss and microstructural changes in brain regions well known to be affected by MSA pathology. These observations can be exploited to support the differential diagnosis of MSA, distinguishing it from Parkinson's disease and progressive supranuclear palsy with high sensitivity and specificity. Longitudinal studies reveal aggressive neurodegeneration in MSA, with notable atrophy rates in the cerebellum, pons, and putamen. Radiotracer imaging using PET and SPECT has shown characteristic disease-related patterns, aiding in differential diagnosis and tracking disease progression. Future research should focus on early diagnosis, particularly in prodromal stages, and the development of reliable biomarkers for clinical trials. Combining different neuroimaging modalities and machine learning algorithms can enhance diagnostic precision and provide a comprehensive understanding of MSA pathology.

Mixed Modality Classification Neurological Review In Silico

PanoDiff-SR: Synthesizing Dental Panoramic Radiographs using Diffusion and Super-resolution

Sanyam Jain, Bruna Neves de Freitas, Andreas Basse-O'Connor, Alexandros Iosifidis, Ruben Pauwels

•preprint•Jul 12 2025

There has been increasing interest in the generation of high-quality, realistic synthetic medical images in recent years. Such synthetic datasets can mitigate the scarcity of public datasets for artificial intelligence research, and can also be used for educational purposes. In this paper, we propose a combination of diffusion-based generation (PanoDiff) and Super-Resolution (SR) for generating synthetic dental panoramic radiographs (PRs). The former generates a low-resolution (LR) seed of a PR (256 × 128) which is then processed by the SR model to yield a high-resolution (HR) PR of size 1024 × 512. For SR, we propose a state-of-the-art transformer that learns local-global relationships, resulting in sharper edges and textures. Experimental results demonstrate a Fréchet inception distance score of 40.69 between 7243 real and synthetic images (in HR). Inception scores were 2.55, 2.30, 2.90 and 2.98 for real HR, synthetic HR, real LR and synthetic LR images, respectively. Among a diverse group of six clinical experts, all evaluating a mixture of 100 synthetic and 100 real PRs in a time-limited observation, the average accuracy in distinguishing real from synthetic images was 68.5% (with 50% corresponding to random guessing).

X-Ray Image Synthesis Methodology In Silico Open Dataset

Calibrated and Robust Foundation Models for Vision-Language and Medical Image Tasks Under Distribution Shift

Behraj Khan, Tahir Syed

•preprint•Jul 12 2025

Foundation models like CLIP and SAM have transformed computer vision and medical imaging via low-shot transfer learning. However, deployment of these models is hindered by two key challenges: distribution shift between training and test data, and confidence misalignment that leads to overconfident incorrect predictions. These issues manifest differently in vision-language classification and medical segmentation tasks, yet existing solutions remain domain-specific. We propose StaRFM, a unified framework addressing both challenges. It introduces a Fisher information penalty (FIP), extended to 3D medical data via patch-wise regularization, to reduce covariate shift in CLIP and SAM embeddings. Additionally, a confidence misalignment penalty (CMP), reformulated for voxel-level predictions, calibrates uncertainty in segmentation tasks. We theoretically derive PAC-Bayes bounds showing FIP controls generalization via the Fisher-Rao norm, while CMP minimizes calibration error through Brier score optimization. StaRFM shows consistent gains: +3.5% accuracy and 28% lower ECE on 19 vision datasets (e.g., ImageNet, Office-Home), 84.7% DSC and 4.8 mm HD95 in medical segmentation (e.g., BraTS, ATLAS), and a 40% lower cross-domain performance gap compared to prior benchmarking methods. The framework is plug-and-play, requiring minimal architectural changes for seamless integration with foundation models. Code and models will be released at https://anonymous.4open.science/r/StaRFM-C0CD/README.md

MRI Segmentation Neurological Methodology In Silico Academic Lab Benchmark SOTA Open Code
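The StaRFM abstract above refers to two calibration quantities, the Brier score and the expected calibration error (ECE). The sketch below shows the standard definitions of both metrics only; it is not the authors' FIP or CMP penalty, and the function names, the equal-width binning, and the default of 10 bins are assumptions.

```python
def brier_score(probs, labels):
    """Mean squared distance between predicted class probabilities and
    one-hot labels, summed over classes and averaged over samples."""
    if not labels or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and equal length")
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2 for k, pk in enumerate(p))
    return total / len(labels)


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between accuracy and mean confidence per bin.

    confidences: top-class probability for each prediction, in [0, 1]
    correct: 1 if the prediction was right, else 0
    n_bins: number of equal-width confidence bins (assumed default)
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("confidences and correct must be non-empty and equal length")
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    hits = [0] * n_bins
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        counts[idx] += 1
        conf_sums[idx] += conf
        hits[idx] += ok
    ece = 0.0
    for count, conf_sum, hit in zip(counts, conf_sums, hits):
        if count:
            ece += (count / n) * abs(hit / count - conf_sum / count)
    return ece


# Hypothetical predictions, for illustration only.
ece = expected_calibration_error([0.9, 0.9, 0.6, 0.6], [1, 1, 1, 0])
brier = brier_score([[0.5, 0.5]], [0])
```

An overconfident model has mean confidence above accuracy in its high-confidence bins, which is the kind of misalignment the abstract describes.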

Oriented tooth detection: a CBCT image processing method integrated with RoI transformer.

Zhao Z, Wu B, Su S, Liu D, Wu Z, Gao R, Zhang N

•papers•Jul 11 2025

Cone beam computed tomography (CBCT) has revolutionized dental imaging due to its high spatial resolution and ability to provide detailed three-dimensional reconstructions of dental structures. This study addresses the challenge of accurate tooth detection and classification in panoramic images (PAN) derived from CBCT by introducing an oriented object detection approach integrated with a Region of Interest (RoI) Transformer, an approach that has not previously been applied in dental imaging. This method better aligns with the natural growth patterns of teeth, allowing for more accurate detection and classification of molars, premolars, canines, and incisors. By integrating the RoI Transformer, the model demonstrates relatively acceptable performance metrics compared to conventional horizontal detection methods, while also offering enhanced visualization capabilities. Furthermore, post-processing techniques, including distance and grayscale value constraints, are employed to correct classification errors and reduce false positives, especially in areas with missing teeth. The experimental results indicate that the proposed method achieves an accuracy of 98.48%, a recall of 97.21%, an F1 score of 97.21%, and an mAP of 98.12% in tooth detection. The proposed method enhances the accuracy of tooth detection in CBCT-derived PAN by reducing background interference and improving the visualization of tooth orientation.

CT Detection Methodology In Silico Academic Lab
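The contrast between oriented and horizontal detection in the tooth detection abstract above can be pictured with a little geometry. The sketch below assumes a (center, width, height, angle) box parameterization, which the abstract does not specify, and uses made-up dimensions; it only shows why an axis-aligned box around a tilted object covers more background than a rotated one.

```python
import math


def oriented_box_corners(cx, cy, width, height, angle_rad):
    """Corners of a rectangle centered at (cx, cy) and rotated by angle_rad."""
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    half_w, half_h = width / 2.0, height / 2.0
    corners = []
    for dx, dy in ((-half_w, -half_h), (half_w, -half_h), (half_w, half_h), (-half_w, half_h)):
        # Standard 2-D rotation of the offset, then translation to the center.
        corners.append((cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a))
    return corners


def enclosing_horizontal_box(corners):
    """Smallest axis-aligned box (x_min, y_min, x_max, y_max) around the corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical box for a tilted tooth, in pixels, for illustration only.
tilted = oriented_box_corners(0.0, 0.0, 4.0, 2.0, math.radians(45))
x_min, y_min, x_max, y_max = enclosing_horizontal_box(tilted)
oriented_area = 4.0 * 2.0
horizontal_area = (x_max - x_min) * (y_max - y_min)
```

For this example the horizontal box has more than twice the area of the oriented one, and the extra area is background or neighboring teeth.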

HNOSeg-XS: Extremely Small Hartley Neural Operator for Efficient and Resolution-Robust 3D Image Segmentation.

Wong KCL, Wang H, Syeda-Mahmood T

•papers•Jul 11 2025

In medical image segmentation, convolutional neural networks (CNNs) and transformers are dominant. For CNNs, given the local receptive fields of convolutional layers, long-range spatial correlations are captured through consecutive convolutions and pooling. However, as the computational cost and memory footprint can be prohibitively large, 3D models can only afford fewer layers than 2D models with reduced receptive fields and abstract levels. For transformers, although long-range correlations can be captured by multi-head attention, its quadratic complexity with respect to input size is computationally demanding. Therefore, either model may require input size reduction to allow more filters and layers for better segmentation. Nevertheless, given their discrete nature, models trained with patch-wise training or image downsampling may produce suboptimal results when applied to higher resolutions. To address this issue, here we propose the resolution-robust HNOSeg-XS architecture. We model image segmentation by learnable partial differential equations through the Fourier neural operator which has the zero-shot super-resolution property. By replacing the Fourier transform with the Hartley transform and reformulating the problem in the frequency domain, we created the HNOSeg-XS model, which is resolution robust, fast, memory efficient, and extremely parameter efficient. When tested on the BraTS'23, KiTS'23, and MVSeg'23 datasets with a Tesla V100 GPU, HNOSeg-XS showed its superior resolution robustness with fewer than 34.7k model parameters. It also achieved the overall best inference time (< 0.24 s) and memory efficiency (< 1.8 GiB) compared to the tested CNN and transformer models.

Mixed Modality Segmentation Methodology In Silico Academic Lab Benchmark SOTA
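For readers unfamiliar with the Hartley transform that the HNOSeg-XS abstract above substitutes for the Fourier transform, here is a naive one-dimensional sketch. It is not the authors' 3-D model and runs in O(N²) time; it only illustrates two known properties of the discrete Hartley transform: real input gives real output, and applying the transform twice returns the input scaled by N.

```python
import math


def dht(x):
    """Naive discrete Hartley transform: H[k] = sum_n x[n] * cas(2*pi*n*k/N),
    where cas(t) = cos(t) + sin(t)."""
    n = len(x)
    out = []
    for k in range(n):
        acc = 0.0
        for j, xj in enumerate(x):
            theta = 2.0 * math.pi * j * k / n
            acc += xj * (math.cos(theta) + math.sin(theta))
        out.append(acc)
    return out


def idht(h):
    """Inverse transform: the same sum as dht, divided by N."""
    n = len(h)
    return [v / n for v in dht(h)]


# Hypothetical signal, for illustration only.
signal = [1.0, 2.0, 3.0, 4.0]
spectrum = dht(signal)          # real-valued, unlike a complex DFT spectrum
reconstructed = idht(spectrum)  # matches signal up to rounding error
```

Because every intermediate value stays real, a frequency-domain layer built on this transform avoids complex-valued weights, which fits the parameter and memory efficiency the abstract emphasizes.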

Automated MRI protocoling in neuroradiology in the era of large language models.

Reiner LN, Chelbi M, Fetscher L, Stöckel JC, Csapó-Schmidt C, Guseynova S, Al Mohamad F, Bressem KK, Nawabi J, Siebert E, Wattjes MP, Scheel M, Meddeb A

•papers•Jul 11 2025

This study investigates the automation of MRI protocoling, a routine task in radiology, using large language models (LLMs), comparing an open-source (LLama 3.1 405B) and a proprietary model (GPT-4o) with and without retrieval-augmented generation (RAG), a method for incorporating domain-specific knowledge. This retrospective study included MRI studies conducted between January and December 2023, along with institution-specific protocol assignment guidelines. Clinical questions were extracted, and a neuroradiologist established the gold standard protocol. LLMs were tasked with assigning MRI protocols and contrast medium administration with and without RAG. The results were compared to protocols selected by four radiologists. Token-based symmetric accuracy, the Wilcoxon signed-rank test, and the McNemar test were used for evaluation. Data from 100 neuroradiology reports (mean age = 54.2 years ± 18.41, women 50%) were included. RAG integration significantly improved accuracy in sequence and contrast media prediction for LLama 3.1 (Sequences: 38% vs. 70%, P < .001, Contrast Media: 77% vs. 94%, P < .001), and GPT-4o (Sequences: 43% vs. 81%, P < .001, Contrast Media: 79% vs. 92%, P = .006). GPT-4o outperformed LLama 3.1 in MRI sequence prediction (81% vs. 70%, P < .001), with comparable accuracies to the radiologists (81% ± 0.21, P = .43). Both models equaled radiologists in predicting contrast media administration (LLama 3.1 RAG: 94% vs. 91% ± 0.2, P = .37, GPT-4o RAG: 92% vs. 91% ± 0.24, P = .48). Large language models show great potential as decision-support tools for MRI protocoling, with performance similar to radiologists. RAG enhances the ability of LLMs to provide accurate, institution-specific protocol recommendations.

MRI LLM Radiology Report Neurological Retrospective Clinical In Silico Academic Lab GenAI Benchmark SOTA
