Latest Papers on Radiology AI. Sources: pubmed, Tags: GenAI, Order: Best Match, Limit: 10.

Aphasia severity prediction using a multi-modal machine learning approach.

Hu X, Varkanitsa M, Kropp E, Betke M, Ishwar P, Kiran S

•papers•Aug 15 2025

The present study examined an integrated multiple neuroimaging modality (T1 structural, Diffusion Tensor Imaging (DTI), and resting-state FMRI (rsFMRI)) to predict aphasia severity using Western Aphasia Battery-Revised Aphasia Quotient (WAB-R AQ) in 76 individuals with post-stroke aphasia. We employed Support Vector Regression (SVR) and Random Forest (RF) models with supervised feature selection and a stacked feature prediction approach. The SVR model outperformed RF, achieving an average root mean square error (RMSE) of 16.38±5.57, Pearson's correlation coefficient (r) of 0.70±0.13, and mean absolute error (MAE) of 12.67±3.27, compared to RF's RMSE of 18.41±4.34, r of 0.66±0.15, and MAE of 14.64±3.04. Resting-state neural activity and structural integrity emerged as crucial predictors of aphasia severity, appearing in the top 20% of predictor combinations for both SVR and RF. Finally, the feature selection method revealed that functional connectivity in both hemispheres and between homologous language areas is critical for predicting language outcomes in patients with aphasia. The statistically significant difference in performance between the model using only single modality and the optimal multi-modal SVR/RF model (which included both resting-state connectivity and structural information) underscores that aphasia severity is influenced by factors beyond lesion location and volume. These findings suggest that integrating multiple neuroimaging modalities enhances the prediction of language outcomes in aphasia beyond lesion characteristics alone, offering insights that could inform personalized rehabilitation strategies.

Mixed Modality Registration Neurological Retrospective Clinical In Silico Academic Lab GenAI

Artificial Intelligence based fractional flow reserve.

Bednarek A, Gąsior P, Jaguszewski M, Buszman PP, Milewski K, Hawranek M, Gil R, Wojakowski W, Kochman J, Tomaniak M

•papers•Aug 14 2025

Fractional flow reserve (FFR) - a physiological indicator of coronary stenosis significance - has now become a widely used parameter also in the guidance of percutaneous coronary intervention (PCI). Several studies have shown the superiority of FFR compared to visual assessment, contributing to the reduction in clinical endpoints. However, the current approach to FFR assessment requires coronary instrumentation with a dedicated pressure wire and thus increasing invasiveness, cost, and duration of the procedure. Alternative, noninvasive methods of FFR assessment based on computational fluid dynamics are being widely tested; these approaches are generally not fully automated and may sometimes require substantial computational power. Nowadays, one of the most rapidly expanding fields in medicine is the use of artificial intelligence (AI) in therapy optimization, diagnosis, treatment, and risk stratification. AI usage contributes to the development of more sophisticated methods of imaging analysis and allows for the derivation of clinically important parameters in a faster and more accurate way. Over the recent years, AI utility in deriving FFR in a noninvasive manner has been increasingly reported. In this review, we critically summarize current knowledge in the field of AI-derived FFR based on data from computed tomography angiography, invasive angiography, optical coherence tomography, and intravascular ultrasound. Available solutions, possible future directions in optimizing cathlab performance, including the use of mixed reality, as well as current limitations standing behind the wide adoption of these techniques, are overviewed.

Mixed Modality Classification Cardiac Review In Silico Academic Lab GenAI

Exploring Radiologists' Use of AI Chatbots for Assistance in Image Interpretation: Patterns of Use and Trust Evaluation.

Alarifi M

•papers•Aug 13 2025

This study investigated radiologists' perceptions of AI-generated, patient-friendly radiology reports across three modalities: MRI, CT, and mammogram/ultrasound. The evaluation focused on report correctness, completeness, terminology complexity, and emotional impact. Seventy-nine radiologists from four major Saudi Arabian hospitals assessed AI-simplified versions of clinical radiology reports. Each participant reviewed one report from each modality and completed a structured questionnaire covering factual correctness, completeness, terminology complexity, and emotional impact. A structured and detailed prompt was used to guide ChatGPT-4 in generating the reports, which included clear findings, a lay summary, glossary, and clarification of ambiguous elements. Statistical analyses included descriptive summaries, Friedman tests, and Pearson correlations. Radiologists rated mammogram reports highest for correctness (M = 4.22), followed by CT (4.05) and MRI (3.95). Completeness scores followed a similar trend. Statistically significant differences were found in correctness (χ2(2) = 17.37, p < 0.001) and completeness (χ2(2) = 13.13, p = 0.001). Anxiety and complexity ratings were moderate, with MRI reports linked to slightly higher concern. A weak positive correlation emerged between radiologists' experience and mammogram correctness ratings (r = .235, p = .037). Radiologists expressed overall support for AI-generated simplified radiology reports when created using a structured prompt that includes summaries, glossaries, and clarification of ambiguous findings. While mammography and CT reports were rated favorably, MRI reports showed higher emotional impact, highlighting a need for clearer and more emotionally supportive language.

Mixed Modality LLM Radiology Report Retrospective Clinical In Silico Academic Lab GenAI

Quantitative Prostate MRI, From the AJR Special Series on Quantitative Imaging.

Margolis DJA, Chatterjee A, deSouza NM, Fedorov A, Fennessy F, Maier SE, Obuchowski N, Punwani S, Purysko AS, Rakow-Penner R, Shukla-Dave A, Tempany CM, Boss M, Malyarenko D

•papers•Aug 13 2025

Prostate MRI has traditionally relied on qualitative interpretation. However, quantitative components hold the potential to markedly improve performance. The ADC from DWI is probably the most widely recognized quantitative MRI biomarker and has shown strong discriminatory value for clinically significant prostate cancer as well as for recurrent cancer after treatment. Advanced diffusion techniques, including intravoxel incoherent motion imaging, diffusion kurtosis imaging, diffusion-tensor imaging, and specific implementations such as restriction spectrum imaging, purport even better discrimination but are more technically challenging. The inherent T1 and T2 of tissue also provide diagnostic value, with more advanced techniques deriving luminal water fraction and hybrid multidimensional MRI metrics. Dynamic contrast-enhanced imaging, primarily using a modified Tofts model, also shows independent discriminatory value. Finally, quantitative lesion size and shape features can be combined with the aforementioned techniques and can be further refined using radiomics, texture analysis, and artificial intelligence. Which technique will ultimately find widespread clinical use will depend on validation across a myriad of platforms and use cases.

MRI Classification Abdominal Review Concept GenAI

Exploring GPT-4o's multimodal reasoning capabilities with panoramic radiograph: the role of prompt engineering.

Xiong YT, Lian WJ, Sun YN, Liu W, Guo JX, Tang W, Liu C

•papers•Aug 12 2025

The aim of this study was to evaluate GPT-4o's multimodal reasoning ability to review panoramic radiograph (PR) and verify its radiologic findings, while exploring the role of prompt engineering in enhancing its performance. The study included 230 PRs from West China Hospital of Stomatology in 2024, which were interpreted to generate the PR findings. A total of 300 instances of interpretation errors, were manually inserted into the PR findings. The ablation study was conducted to assess whether GPT-4o can perform reasoning on PR under a zero-shot prompt. Prompt engineering was employed to enhance the reasoning capabilities of GPT-4o in identifying interpretation errors with PRs. The prompt strategies included chain-of-thought, self-consistency, in-context learning, multimodal in-context learning, and their systematic integration into a meta-prompt. Recall, accuracy, and F1 score were employed to evaluate the outputs. Subsequently, the localization capability of GPT-4o and its influence on reasoning capability were evaluated. In the ablation study, GPT-4o's recall increased significantly from 2.67 to 43.33% upon acquiring PRs (P < 0.001). GPT-4o with the meta prompt demonstrated improvements in recall (43.33% vs. 52.67%, P = 0.022), accuracy (39.95% vs. 68.75%, P < 0.001), and F1 score (0.42 vs. 0.60, P < 0.001) compared to the zero-shot prompt and other prompt strategies. The localization accuracy of GPT-4o was 45.67% (137 out of 300, 95% CI: 40.00 to 51.34). A significant correlation was observed between its localization accuracy and reasoning capability under the meta prompt (φ coefficient = 0.33, p < 0.001). The model's recall increased by 5.49% (P = 0.031) by providing accurate localization cues within the meta prompt. GPT-4o demonstrated a certain degree of multimodal capability for PR, with performance enhancement through prompt engineering. Nevertheless, its performance remains inadequate for clinical requirements. Future efforts will be necessary to identify additional factors influencing the model's reasoning capability or to develop more advanced models. Evaluating GPT-4o's capability to interpret and reason through PRs and exploring potential methods to enhance its performance before clinical application in assisting radiological assessments.

X-Ray LLM Radiology Report Methodology In Silico Academic Lab GenAI

Current imaging applications, radiomics, and machine learning modalities of CNS demyelinating disorders and its mimickers.

Alam Z, Maddali A, Patel S, Weber N, Al Rikabi S, Thiemann D, Desai K, Monoky D

•papers•Aug 12 2025

Distinguishing among neuroinflammatory demyelinating diseases of the central nervous system can present a significant diagnostic challenge due to substantial overlap in clinical presentations and imaging features. Collaboration between specialists, novel antibody testing, and dedicated magnetic resonance imaging protocols have helped to narrow the diagnostic gap, but challenging cases remain. Machine learning algorithms have proven to be able to identify subtle patterns that escape even the most experienced human eye. Indeed, machine learning and the subfield of radiomics have demonstrated exponential growth and improvement in diagnosis capacity within the past decade. The sometimes daunting diagnostic overlap of various demyelinating processes thus provides a unique opportunity: can the elite pattern recognition powers of machine learning close the gap in making the correct diagnosis? This review specifically focuses on neuroinflammatory demyelinating diseases, exploring the role of artificial intelligence in the detection, diagnosis, and differentiation of the most common pathologies: multiple sclerosis (MS), neuromyelitis optica spectrum disorder (NMOSD), acute disseminated encephalomyelitis (ADEM), Sjogren's syndrome, MOG antibody-associated disorder (MOGAD), and neuropsychiatric systemic lupus erythematosus (NPSLE). Understanding how these tools enhance diagnostic precision may lead to earlier intervention, improved outcomes, and optimized management strategies.

MRI Classification Neurological Review In Silico GenAI

The performance of large language models in dentomaxillofacial radiology: a systematic review.

Liu Z, Nalley A, Hao J, H Ai QY, Kan Yeung AW, Tanaka R, Hung KF

•papers•Aug 12 2025

This study aimed to systematically review the current performance of large language models (LLMs) in dento-maxillofacial radiology (DMFR). Five electronic databases were used to identify studies that developed, fine-tuned, or evaluated LLMs for DMFR-related tasks. Data extracted included study purpose, LLM type, images/text source, applied language, dataset characteristics, input and output, performance outcomes, evaluation methods, and reference standards. Customized assessment criteria adapted from the TRIPOD-LLM reporting guideline were used to evaluate the risk-of-bias in the included studies specifically regarding the clarity of dataset origin, the robustness of performance evaluation methods, and the validity of the reference standards. The initial search yielded 1621 titles, and nineteen studies were included. These studies investigated the use of LLMs for tasks including the production and answering of DMFR-related qualification exams and educational questions (n = 8), diagnosis and treatment recommendations (n = 7), and radiology report generation and patient communication (n = 4). LLMs demonstrated varied performance in diagnosing dental conditions, with accuracy ranging from 37-92.5% and expert ratings for differential diagnosis and treatment planning between 3.6-4.7 on a 5-point scale. For DMFR-related qualification exams and board-style questions, LLMs achieved correctness rates between 33.3-86.1%. Automated radiology report generation showed moderate performance with accuracy ranging from 70.4-81.3%. LLMs demonstrate promising potential in DMFR, particularly for diagnostic, educational, and report generation tasks. However, their current accuracy, completeness, and consistency remain variable. Further development, validation, and standardization are needed before LLMs can be reliably integrated as supportive tools in clinical workflows and educational settings.

X-Ray LLM Radiology Report Review Prototype GenAI

Generative Artificial Intelligence to Automate Cerebral Perfusion Mapping in Acute Ischemic Stroke from Non-contrast Head Computed Tomography Images: Pilot Study.

Primiano NJ, Changa AR, Kohli S, Greenspan H, Cahan N, Kummer BR

•papers•Aug 11 2025

Acute ischemic stroke (AIS) is a leading cause of death and long-term disability worldwide, where rapid reperfusion remains critical for salvaging brain tissue. Although CT perfusion (CTP) imaging provides essential hemodynamic information, its limitations-including extended processing times, additional radiation exposure, and variable software outputs-can delay treatment. In contrast, non-contrast head CT (NCHCT) is ubiquitously available in acute stroke settings. This study explores a generative artificial intelligence approach to predict key perfusion parameters (relative cerebral blood flow [rCBF] and time-to-maximum [Tmax]) directly from NCHCT, potentially streamlining stroke imaging workflows and expanding access to critical perfusion data. We retrospectively identified patients evaluated for AIS who underwent NCHCT, CT angiography, and CTP. Ground truth perfusion maps (rCBF and Tmax) were extracted from VIZ.ai post-processed CTP studies. A modified pix2pix-turbo generative adversarial network (GAN) was developed to translate co-registered NCHCT images into corresponding perfusion maps. The network was trained using paired NCHCT-CTP data, with training, validation, and testing splits of 80%:10%:10%. Performance was assessed on the test set using quantitative metrics including the structural similarity index measure (SSIM), peak signal-to-noise ratio (PSNR), and Fréchet inception distance (FID). Out of 120 patients, studies from 99 patients fitting our inclusion and exclusion criteria were used as the primary cohort (mean age 73.3 ± 13.5 years; 46.5% female). Cerebral occlusions were predominantly in the middle cerebral artery. GAN-generated Tmax maps achieved an SSIM of 0.827, PSNR of 16.99, and FID of 62.21, while the rCBF maps demonstrated comparable performance (SSIM 0.79, PSNR 16.38, FID 59.58). These results indicate that the model approximates ground truth perfusion maps to a moderate degree and successfully captures key cerebral hemodynamic features. Our findings demonstrate the feasibility of generating functional perfusion maps directly from widely available NCHCT images using a modified GAN. This cross-modality approach may serve as a valuable adjunct in AIS evaluation, particularly in resource-limited settings or when traditional CTP provides limited diagnostic information. Future studies with larger, multicenter datasets and further model refinements are warranted to enhance clinical accuracy and utility.

CT Image Synthesis Neurological Retrospective Clinical In Silico Academic Lab GenAI

Decoding fetal motion in 4D ultrasound with DeepLabCut.

Inubashiri E, Kaishi Y, Miyake T, Yamaguchi R, Hamaguchi T, Inubashiri M, Ota H, Watanabe Y, Deguchi K, Kuroki K, Maeda N

•papers•Aug 11 2025

This study aimed to objectively and quantitatively analyze fetal motor behavior using DeepLabCut (DLC), a markerless posture estimation tool based on deep learning, applied to four-dimensional ultrasound (4DUS) data collected during the second trimester. We propose a novel clinical method for precise assessment of fetal neurodevelopment. Fifty 4DUS video recordings of normal singleton fetuses aged 12 to 22 gestational weeks were analyzed. Eight fetal joints were manually labeled in 2% of each video to train a customized DLC model. The model's accuracy was evaluated using likelihood scores. Intra- and inter-rater reliability of manual labeling were assessed using intraclass correlation coefficients (ICC). Angular velocity time series derived from joint coordinates were analyzed to quantify fetal movement patterns and developmental coordination. Manual labeling demonstrated excellent reproducibility (inter-rater ICC = 0.990, intra-rater ICC = 0.961). The trained DLC model achieved a mean likelihood score of 0.960, confirming high tracking accuracy. Kinematic analysis revealed developmental trends: localized rapid limb movements were common at 12-13 weeks; movements became more coordinated and systemic by 18-20 weeks, reflecting advancing neuromuscular maturation. Although a modest increase in tracking accuracy was observed with gestational age, this trend did not reach statistical significance (p < 0.001). DLC enables precise quantitative analysis of fetal motor behavior from 4DUS recordings. This AI-driven approach offers a promising, noninvasive alternative to conventional qualitative assessments, providing detailed insights into early fetal neurodevelopmental trajectories and potential early screening for neurodevelopmental disorders.

Ultrasound Segmentation Abdominal Retrospective Clinical In Silico Academic Lab GenAI

Post-deployment Monitoring of AI Performance in Intracranial Hemorrhage Detection by ChatGPT.

Rohren E, Ahmadzade M, Colella S, Kottler N, Krishnan S, Poff J, Rastogi N, Wiggins W, Yee J, Zuluaga C, Ramis P, Ghasemi-Rad M

•papers•Aug 11 2025

To evaluate the post-deployment performance of an artificial intelligence (AI) system (Aidoc) for intracranial hemorrhage (ICH) detection and assess the utility of ChatGPT-4 Turbo for automated AI monitoring. This retrospective study evaluated 332,809 head CT examinations from 37 radiology practices across the United States (December 2023-May 2024). Of these, 13,569 cases were flagged as positive for ICH by the Aidoc AI system. A HIPAA (Health Insurance Portability and Accountability Act) -compliant version of ChatGPT-4 Turbo was used to extract data from radiology reports. Ground truth was established through radiologists' review of 200 randomly selected cases. Performance metrics were calculated for ChatGPT, Aidoc and radiologists. ChatGPT-4 Turbo demonstrated high diagnostic accuracy in identifying intracranial hemorrhage (ICH) from radiology reports, with a positive predictive value of 1 and a negative predictive value of 0.988 (AUC:0.996). Aidoc's false positive classifications were influenced by scanner manufacturer, midline shift, mass effect, artifacts, and neurologic symptoms. Multivariate analysis identified Philips scanners (OR: 6.97, p=0.003) and artifacts (OR: 3.79, p=0.029) as significant contributors to false positives, while midline shift (OR: 0.08, p=0.021) and mass effect (OR: 0.18, p=0.021) were associated with a reduced false positive rate. Aidoc-assisted radiologists achieved a sensitivity of 0.936 and a specificity of 1. This study underscores the importance of continuous performance monitoring for AI systems in clinical practice. The integration of LLMs offers a scalable solution for evaluating AI performance, ensuring reliable deployment and enhancing diagnostic workflows.

CT Detection Neurological Retrospective Clinical Post Market Startup Reproducibility GenAI

Aphasia severity prediction using a multi-modal machine learning approach.

Artificial Intelligence based fractional flow reserve.

Exploring Radiologists' Use of AI Chatbots for Assistance in Image Interpretation: Patterns of Use and Trust Evaluation.

Quantitative Prostate MRI, From the <i>AJR</i> Special Series on Quantitative Imaging.

Exploring GPT-4o's multimodal reasoning capabilities with panoramic radiograph: the role of prompt engineering.

Current imaging applications, radiomics, and machine learning modalities of CNS demyelinating disorders and its mimickers.

The performance of large language models in dentomaxillofacial radiology: a systematic review.

Generative Artificial Intelligence to Automate Cerebral Perfusion Mapping in Acute Ischemic Stroke from Non-contrast Head Computed Tomography Images: Pilot Study.

Decoding fetal motion in 4D ultrasound with DeepLabCut.

Post-deployment Monitoring of AI Performance in Intracranial Hemorrhage Detection by ChatGPT.

Ready to Sharpen Your Edge?