
AI-Driven Integrated System for Burn Depth Prediction With Electronic Medical Records: Algorithm Development and Validation.

Rahman MM, Masry ME, Gnyawali SC, Xue Y, Gordillo G, Wachs JP

PubMed · Aug 15, 2025
Burn injuries represent a significant clinical challenge due to the complexity of accurately assessing burn depth, which directly influences the course of treatment and patient outcomes. Traditional diagnostic methods primarily rely on visual inspection by experienced burn surgeons. Studies report diagnostic accuracies of around 76% for experts, dropping to nearly 50% for less experienced clinicians. Such inaccuracies can result in suboptimal clinical decisions, delaying vital surgical interventions in severe cases or initiating unnecessary treatments for superficial burns. This diagnostic variability not only compromises patient care but also strains health care resources and increases the likelihood of adverse outcomes. Hence, a more consistent and precise approach to burn classification is urgently needed. The objective was to determine whether a multimodal integrated artificial intelligence (AI) system for accurate classification of burn depth can preserve diagnostic accuracy and provide an important resource when used as part of the electronic medical record (EMR). This study used a novel multimodal AI system, integrating digital photographs and ultrasound tissue Doppler imaging (TDI) data to accurately assess burn depth. These imaging modalities were accessed and processed through an EMR system, enabling real-time data retrieval and AI-assisted evaluation. TDI was instrumental in evaluating the biomechanical properties of subcutaneous tissues, using color-coded images to identify burn-induced changes in tissue stiffness and elasticity. The collected imaging data were uploaded to the EMR system (DrChrono), where they were processed by a vision-language model built on the GPT-4 architecture. This model received expert-formulated prompts describing how to interpret both digital and TDI images, guiding the AI toward explainable classifications. This study evaluated whether a multimodal AI classifier, designed to identify first-, second-, and third-degree burns, could be effectively applied to imaging data stored within an EMR system. The classifier achieved an overall accuracy of 84.38%, significantly surpassing the human performance benchmarks typically cited in the literature. This highlights the potential of the AI model to serve as a robust clinical decision support tool, especially in settings lacking highly specialized expertise. In addition to accuracy, the classifier demonstrated strong performance across multiple evaluation metrics. Its ability to distinguish between burn severities was further validated by the area under the receiver operating characteristic curve: 0.97 for first-degree, 0.96 for second-degree, and a perfect 1.00 for third-degree burns, each with narrow 95% CIs. The storage of multimodal imaging data within the EMR, along with the ability for post hoc analysis by AI algorithms, offers significant advancements in burn care, enabling real-time burn depth prediction on currently available data. Using digital photos for superficial burns, which are easily diagnosed through physical examination, reduces reliance on TDI, while TDI helps distinguish deep second- and third-degree burns, enhancing diagnostic efficiency.
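For illustration, a minimal sketch of the kind of vision-language call the abstract describes: a wound photograph and a TDI frame are sent together with an expert-formulated prompt to a vision-capable chat model, which returns a burn-depth class with a short rationale. This is not the authors' DrChrono integration; the model name, prompt wording, and file names are assumptions.

```python
# Illustrative sketch (assumed OpenAI-style chat API, not the authors' pipeline):
# classify burn depth from a photo plus a tissue Doppler image via an expert prompt.
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

EXPERT_PROMPT = (
    "You are assisting with burn-depth assessment. The first image is a digital "
    "photograph of the wound; the second is a color-coded tissue Doppler image "
    "reflecting tissue stiffness. Classify the burn as first-, second-, or "
    "third-degree and explain the visual cues supporting your answer."
)

def encode(path: str) -> str:
    """Read an image file and return its base64 string for a data URL."""
    return base64.b64encode(open(path, "rb").read()).decode()

def classify_burn(photo_path: str, tdi_path: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # stand-in for the GPT-4-based vision-language model
        messages=[{"role": "user", "content": [
            {"type": "text", "text": EXPERT_PROMPT},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{encode(photo_path)}"}},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{encode(tdi_path)}"}},
        ]}],
    )
    return response.choices[0].message.content

print(classify_burn("burn_photo.jpg", "burn_tdi.png"))  # hypothetical example files
```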

Multimodal quantitative analysis guides precise preoperative localization of epilepsy.

Shen Y, Shen Z, Huang Y, Wu Z, Ma Y, Hu F, Shu K

PubMed · Aug 15, 2025
Epilepsy surgery efficacy is critically contingent upon the precise localization of the epileptogenic zone (EZ). However, conventional qualitative methods face challenges in achieving accurate localization, integrating multimodal data, and accounting for variations in clinical expertise among practitioners. With the rapid advancement of artificial intelligence and computing power, multimodal quantitative analysis has emerged as a pivotal approach for EZ localization. Nonetheless, no research team has thus far provided a systematic elaboration of this concept. This narrative review synthesizes recent advancements across four key dimensions: (1) seizure semiology quantification using deep learning and computer vision to analyze behavioral patterns; (2) structural neuroimaging leveraging high-field MRI, radiomics, and AI; (3) functional imaging integrating EEG-fMRI dynamics and PET biomarkers; and (4) electrophysiological quantification encompassing source localization, intracranial EEG, and network modeling. The convergence of these complementary approaches enables comprehensive characterization of epileptogenic networks across behavioral, structural, functional, and electrophysiological domains. Despite these advancements, clinical heterogeneity, limitations in algorithmic generalizability, and barriers to data sharing hinder translation into clinical practice. Future directions emphasize personalized modeling, federated learning, and cross-modal standardization to advance data-driven localization. This integrated paradigm holds promise for overcoming qualitative limitations, reducing medical costs, and improving seizure-free outcomes.

From dictation to diagnosis: enhancing radiology reporting with integrated speech recognition in multimodal large language models.

Gertz RJ, Beste NC, Dratsch T, Lennartz S, Bremm J, Iuga AI, Bunck AC, Laukamp KR, Schönfeld M, Kottlors J

PubMed · Aug 15, 2025
This study evaluates the efficiency, accuracy, and cost-effectiveness of radiology reporting using audio multimodal large language models (LLMs) compared to conventional reporting with speech recognition software. We hypothesized that providing minimal audio input would enable a multimodal LLM to generate complete radiological reports. 480 reports from 80 retrospective multimodal imaging studies were reported by two board-certified radiologists using three workflows: a conventional workflow (C-WF) with speech recognition software to generate findings and impressions separately, and an LLM-based workflow (LLM-WF) using each of the state-of-the-art LLMs GPT-4o and Claude Sonnet 3.5. Outcome measures included reporting time, corrections, and personnel cost per report. Two radiologists assessed formal structure and report quality. Statistical analysis used ANOVA and Tukey's post hoc tests (p < 0.05). LLM-WF significantly reduced reporting time (GPT-4o/Sonnet 3.5: 38.9 s ± 22.7 s vs. C-WF: 88.0 s ± 60.9 s, p < 0.01), required fewer corrections (GPT-4o: 1.0 ± 1.1, Sonnet 3.5: 0.9 ± 1.0 vs. C-WF: 2.4 ± 2.5, p < 0.01), and lowered costs (GPT-4o: $2.3 ± $1.4, Sonnet 3.5: $2.4 ± $1.4 vs. C-WF: $3.0 ± $2.1, p < 0.01). Reports generated with Sonnet 3.5 were rated highest in quality, while GPT-4o and conventional reports showed no difference. Multimodal LLMs can generate high-quality radiology reports based solely on minimal audio input, with greater speed, fewer corrections, and reduced costs compared to conventional speech-based workflows. However, future implementation may involve licensing costs, and generalizability to broader clinical contexts warrants further evaluation. Question: How do time, accuracy, cost, and report quality of reporting via the audio input functionality of GPT-4o and Claude Sonnet 3.5 compare to conventional reporting with speech recognition? Findings: Large language models enable radiological reporting via minimal audio input, reducing turnaround time and costs without quality loss compared to conventional reporting with speech recognition. Clinical relevance: Large language model-based reporting from minimal audio input has the potential to improve efficiency and report quality, supporting more streamlined workflows in clinical radiology.
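As a rough sketch of the LLM-based workflow described above, a short audio dictation of key findings can be passed directly to an audio-capable chat model that returns a fully structured report. This assumes OpenAI's Chat Completions audio input (here via "gpt-4o-audio-preview"); the system prompt, report template, and file name are assumptions, not the study's implementation.

```python
# Illustrative sketch: generate a structured radiology report from a brief
# audio dictation. Model name, prompt, and workflow details are assumptions.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a radiology reporting assistant. From the dictated key findings, "
    "write a complete report with 'Findings' and 'Impression' sections."
)

def report_from_dictation(wav_path: str) -> str:
    """Send a WAV dictation to an audio-capable chat model and return the report text."""
    audio_b64 = base64.b64encode(open(wav_path, "rb").read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o-audio-preview",   # assumed audio-capable chat model
        modalities=["text"],            # audio in, text report out
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": [
                {"type": "input_audio",
                 "input_audio": {"data": audio_b64, "format": "wav"}},
            ]},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(report_from_dictation("dictation_case_001.wav"))  # hypothetical file
```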

Fine-Tuned Large Language Model for Extracting Pretreatment Pancreatic Cancer According to Computed Tomography Radiology Reports.

Hirakawa H, Yasaka K, Nomura T, Tsujimoto R, Sonoda Y, Kiryu S, Abe O

PubMed · Aug 15, 2025
This study aimed to examine the performance of a fine-tuned large language model (LLM) in extracting patients with pretreatment pancreatic cancer from computed tomography (CT) radiology reports and to compare it with that of human readers. This retrospective study included 2690, 886, and 378 CT reports in the training, validation, and test datasets, respectively. The clinical indication, image finding, and imaging diagnosis sections of each radiology report (used as input data) were reviewed and categorized into groups 0 (no pancreatic cancer), 1 (after treatment for pancreatic cancer), and 2 (pretreatment pancreatic cancer present) (used as reference data). A pre-trained Japanese Bidirectional Encoder Representations from Transformers (BERT) model was fine-tuned on the training and validation datasets. Group 1 data were undersampled and group 2 data were oversampled in the training dataset due to group imbalance. The best-performing model from the validation set was subsequently assessed on the test dataset. Additionally, three readers (readers 1, 2, and 3) classified the reports in the test dataset. The fine-tuned LLM and readers 1, 2, and 3 demonstrated an overall accuracy of 0.942, 0.984, 0.979, and 0.947; sensitivity for differentiating groups 0/1/2 of 0.944/0.960/0.921, 0.976/1.000/0.976, 0.984/0.984/0.968, and 1.000/1.000/0.841; and total time required for classification of 49 s, 2689 s, 3496 s, and 4887 s, respectively. The fine-tuned LLM effectively identified patients with pretreatment pancreatic cancer from CT radiology reports, and its performance was comparable to that of the readers in a substantially shorter time.
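A minimal sketch of the described setup follows: fine-tuning a pre-trained Japanese BERT checkpoint for three-way report classification after resampling the imbalanced training set. The checkpoint name, column names, resampling ratios, and hyperparameters are assumptions, not the study's configuration.

```python
# Sketch: fine-tune a Japanese BERT to classify CT reports into groups 0/1/2
# after undersampling group 1 and oversampling group 2, as the abstract describes.
import pandas as pd
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

CHECKPOINT = "cl-tohoku/bert-base-japanese"  # assumed choice of Japanese BERT
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT, num_labels=3)

def resample(df: pd.DataFrame) -> pd.DataFrame:
    """Undersample group 1 and oversample group 2 (ratios are illustrative)."""
    g0, g1, g2 = (df[df.label == k] for k in (0, 1, 2))
    g1 = g1.sample(n=min(len(g1), len(g2)), random_state=0)        # undersample
    g2 = g2.sample(n=len(g0) // 2, replace=True, random_state=0)   # oversample
    return pd.concat([g0, g1, g2]).sample(frac=1.0, random_state=0)

def tokenize(batch):
    return tokenizer(batch["report"], truncation=True, max_length=512)

train_df = resample(pd.read_csv("train_reports.csv"))  # columns: report, label
train_ds = Dataset.from_pandas(train_df, preserve_index=False).map(tokenize, batched=True)
val_ds = Dataset.from_pandas(pd.read_csv("val_reports.csv"),
                             preserve_index=False).map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments("bert-ct-reports", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=train_ds,
    eval_dataset=val_ds,
    tokenizer=tokenizer,   # enables dynamic padding of report batches
)
trainer.train()
```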

Automating the Referral of Bone Metastases Patients With and Without the Use of Large Language Models.

Sangwon KL, Han X, Becker A, Zhang Y, Ni R, Zhang J, Alber DA, Alyakin A, Nakatsuka M, Fabbri N, Aphinyanaphongs Y, Yang JT, Chachoua A, Kondziolka D, Laufer I, Oermann EK

PubMed · Aug 15, 2025
Bone metastases affect more than 4.8% of patients with cancer annually, and spinal metastases in particular require urgent intervention to prevent neurological complications. However, the current process of manually reviewing radiological reports leads to potential delays in specialist referrals. We hypothesized that natural language processing (NLP) review of routine radiology reports could automate the referral process for timely multidisciplinary care of spinal metastases. We assessed 3 NLP models for automated detection and referral of bone metastases: a rule-based regular expression (RegEx) model, GPT-4, and a specialized Bidirectional Encoder Representations from Transformers (BERT) model (NYUTron). Study inclusion criteria targeted patients with active cancer diagnoses who underwent advanced imaging (computed tomography, MRI, or positron emission tomography) without previous specialist referral. We defined 2 separate tasks: identifying clinically significant bone metastasis terms (lexical detection) and identifying cases needing specialist follow-up (clinical referral). Models were developed using 3754 hand-labeled advanced imaging studies in 2 phases: phase 1 focused on spine metastases, and phase 2 generalized to bone metastases. Standard performance metrics were evaluated and compared across all stages and tasks. In lexical detection, a simple RegEx model achieved the highest performance (sensitivity 98.4%, specificity 97.6%, F1 = 0.965), followed by NYUTron (sensitivity 96.8%, specificity 89.9%, F1 = 0.787). For the clinical referral task, RegEx also demonstrated superior performance (sensitivity 92.3%, specificity 87.5%, F1 = 0.936), followed by a fine-tuned NYUTron model (sensitivity 90.0%, specificity 66.7%, F1 = 0.750). An NLP-based automated referral system can accurately identify patients with bone metastases requiring specialist evaluation. Compared with the advanced NLP models, a simple RegEx model built on syntax-based identification and expert-informed rules excelled at efficiently recommending patients for referral. This system could significantly reduce missed follow-ups and enhance timely intervention for patients with bone metastases.
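To make the rule-based idea concrete, here is a small sketch of expert-informed regular expressions flagging bone-metastasis terms in report text, with a simple negation filter. The patterns and the negation handling are illustrative assumptions, not the study's actual rule set.

```python
# Illustrative RegEx screen for bone-metastasis mentions in radiology report text.
# Patterns and the single-pass negation check are assumptions for demonstration.
import re

METASTASIS_PATTERNS = [
    r"\bosseous metasta\w+",
    r"\bbony metasta\w+",
    r"\bbone metasta\w+",
    r"\b(vertebral|spinal|osseous)\s+lesions?\b.*\bmetasta\w+",
    r"\bpathologic(al)? (compression )?fracture\b",
]
NEGATION = re.compile(
    r"\bno (evidence of |definite )?(osseous|bony|bone) metasta\w+", re.IGNORECASE
)

def flag_report(text: str) -> bool:
    """Return True if the report contains a non-negated bone-metastasis term."""
    text = " ".join(text.split())  # normalize whitespace across line breaks
    if NEGATION.search(text):
        return False
    return any(re.search(p, text, re.IGNORECASE) for p in METASTASIS_PATTERNS)

print(flag_report("Multiple osseous metastases involving T10-L2 with cord compression."))  # True
print(flag_report("No evidence of osseous metastasis."))                                   # False
```

In practice such rules would feed a referral queue; the appeal of the RegEx approach reported above is that each flag is directly traceable to an expert-authored pattern.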

Benchmarking GPT-5 for Zero-Shot Multimodal Medical Reasoning in Radiology and Radiation Oncology

Mingzhe Hu, Zach Eidex, Shansong Wang, Mojtaba Safari, Qiang Li, Xiaofeng Yang

arXiv preprint · Aug 15, 2025
Radiology, radiation oncology, and medical physics require decision-making that integrates medical images, textual reports, and quantitative data under high-stakes conditions. With the introduction of GPT-5, it is critical to assess whether recent advances in large multimodal models translate into measurable gains in these safety-critical domains. We present a targeted zero-shot evaluation of GPT-5 and its smaller variants (GPT-5-mini, GPT-5-nano) against GPT-4o across three representative tasks: (1) VQA-RAD, a benchmark for visual question answering in radiology; (2) SLAKE, a semantically annotated, multilingual VQA dataset testing cross-modal grounding; and (3) a curated Medical Physics Board Examination-style dataset of 150 multiple-choice questions spanning treatment planning, dosimetry, imaging, and quality assurance. Across all datasets, GPT-5 achieved the highest accuracy, with substantial gains over GPT-4o of up to +20.00% in challenging anatomical regions such as the chest-mediastinal region, +13.60% in lung-focused questions, and +11.44% in brain-tissue interpretation. On the board-style physics questions, GPT-5 attained 90.7% accuracy (136/150), exceeding the estimated human passing threshold, while GPT-4o trailed at 78.0%. These results demonstrate that GPT-5 delivers consistent and often pronounced performance improvements over GPT-4o in both image-grounded reasoning and domain-specific numerical problem-solving, highlighting its potential to augment expert workflows in medical imaging and therapeutic physics.
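A minimal sketch of the kind of zero-shot VQA evaluation loop described: each (image, question, answer) item is posed to a vision-capable chat model and scored by exact match. The model identifiers, JSONL layout, and the simple scoring rule are assumptions, not the paper's evaluation harness.

```python
# Sketch of a zero-shot VQA accuracy loop over a benchmark stored as JSONL with
# {"image", "question", "answer"} fields. Model names and scoring are assumptions.
import base64
import json
from openai import OpenAI

client = OpenAI()

def ask(model: str, image_path: str, question: str) -> str:
    """Pose one benchmark question with its image and return the model's answer."""
    b64 = base64.b64encode(open(image_path, "rb").read()).decode()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": [
            {"type": "text",
             "text": f"{question}\nAnswer with a single word or short phrase."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ]}],
    )
    return resp.choices[0].message.content.strip().lower()

def accuracy(model: str, items_path: str) -> float:
    items = [json.loads(line) for line in open(items_path)]
    hits = sum(ask(model, it["image"], it["question"]) == it["answer"].lower()
               for it in items)
    return hits / len(items)

for m in ("gpt-4o", "gpt-5-nano", "gpt-5-mini", "gpt-5"):
    print(m, accuracy(m, "vqa_rad_test.jsonl"))  # hypothetical benchmark file
```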

Artificial Intelligence based fractional flow reserve.

Bednarek A, Gąsior P, Jaguszewski M, Buszman PP, Milewski K, Hawranek M, Gil R, Wojakowski W, Kochman J, Tomaniak M

PubMed · Aug 14, 2025
Fractional flow reserve (FFR), a physiological indicator of coronary stenosis significance, has become a widely used parameter in the guidance of percutaneous coronary intervention (PCI). Several studies have shown the superiority of FFR over visual assessment, contributing to a reduction in clinical endpoints. However, the current approach to FFR assessment requires coronary instrumentation with a dedicated pressure wire, which increases the invasiveness, cost, and duration of the procedure. Alternative, noninvasive methods of FFR assessment based on computational fluid dynamics are being widely tested; these approaches are generally not fully automated and may sometimes require substantial computational power. Nowadays, one of the most rapidly expanding fields in medicine is the use of artificial intelligence (AI) in therapy optimization, diagnosis, treatment, and risk stratification. AI contributes to the development of more sophisticated methods of imaging analysis and allows clinically important parameters to be derived faster and more accurately. Over recent years, the utility of AI for deriving FFR in a noninvasive manner has been increasingly reported. In this review, we critically summarize current knowledge in the field of AI-derived FFR based on data from computed tomography angiography, invasive angiography, optical coherence tomography, and intravascular ultrasound. Available solutions, possible future directions in optimizing cathlab performance, including the use of mixed reality, as well as the current limitations holding back wide adoption of these techniques, are overviewed.

Performance of GPT-5 in Brain Tumor MRI Reasoning

Mojtaba Safari, Shansong Wang, Mingzhe Hu, Zach Eidex, Qiang Li, Xiaofeng Yang

arXiv preprint · Aug 14, 2025
Accurate differentiation of brain tumor types on magnetic resonance imaging (MRI) is critical for guiding treatment planning in neuro-oncology. Recent advances in large language models (LLMs) have enabled visual question answering (VQA) approaches that integrate image interpretation with natural language reasoning. In this study, we evaluated GPT-4o, GPT-5-nano, GPT-5-mini, and GPT-5 on a curated brain tumor VQA benchmark derived from 3 Brain Tumor Segmentation (BraTS) datasets: glioblastoma (GLI), meningioma (MEN), and brain metastases (MET). Each case included multi-sequence MRI triplanar mosaics and structured clinical features transformed into standardized VQA items. Models were assessed in a zero-shot chain-of-thought setting for accuracy on both visual and reasoning tasks. Results showed that GPT-5-mini achieved the highest macro-average accuracy (44.19%), followed by GPT-5 (43.71%), GPT-4o (41.49%), and GPT-5-nano (35.85%). Performance varied by tumor subtype, with no single model dominating across all cohorts. These findings suggest that GPT-5 family models can achieve moderate accuracy in structured neuro-oncological VQA tasks, but not at a level acceptable for clinical use.
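Since the headline numbers are macro-averaged over the three tumor cohorts, a small sketch may help: accuracy is computed per cohort (GLI, MEN, MET) and then averaged with equal weight so that no single subtype dominates the score. The example records below are illustrative only and do not reflect the study's per-cohort results.

```python
# Sketch of macro-averaged accuracy over tumor cohorts; example values are illustrative.
from collections import defaultdict

def macro_accuracy(records):
    """records: iterable of (cohort, predicted_answer, reference_answer) tuples."""
    hits, totals = defaultdict(int), defaultdict(int)
    for cohort, pred, ref in records:
        totals[cohort] += 1
        hits[cohort] += int(pred.strip().lower() == ref.strip().lower())
    per_cohort = {c: hits[c] / totals[c] for c in totals}
    macro = sum(per_cohort.values()) / len(per_cohort)  # equal weight per cohort
    return per_cohort, macro

example = [
    ("GLI", "glioblastoma", "glioblastoma"), ("GLI", "meningioma", "glioblastoma"),
    ("MEN", "meningioma", "meningioma"),     ("MET", "metastasis", "metastasis"),
]
print(macro_accuracy(example))
```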

Exploring the potential of generative artificial intelligence in medical image synthesis: opportunities, challenges, and future directions.

Khosravi B, Purkayastha S, Erickson BJ, Trivedi HM, Gichoya JW

PubMed · Aug 14, 2025
Generative artificial intelligence has emerged as a transformative force in medical imaging since 2022, enabling the creation of derivative synthetic datasets that closely resemble real-world data. This Viewpoint examines key aspects of synthetic data, focusing on its advancements, applications, and challenges in medical imaging. Various generative artificial intelligence image generation paradigms, such as physics-informed and statistical models, and their potential to augment and diversify medical research resources are explored. The promises of synthetic datasets, including increased diversity, privacy preservation, and multifunctionality, are also discussed, along with their ability to model complex biological phenomena. Next, specific applications using synthetic data such as enhancing medical education, augmenting rare disease datasets, improving radiology workflows, and enabling privacy-preserving multicentre collaborations are highlighted. The challenges and ethical considerations surrounding generative artificial intelligence, including patient privacy, data copying, and potential biases that could impede clinical translation, are also addressed. Finally, future directions for research and development in this rapidly evolving field are outlined, emphasising the need for robust evaluation frameworks and responsible utilisation of generative artificial intelligence in medical imaging.

Integrating Machine Learning Pipelines for Multimodal Biomarker Prediction in Alzheimer and Parkinson Disease: A Component of the Neurodiagnoses Framework

Osaghae NO, Gonzalez MM

medRxiv preprint · Aug 14, 2025
Alzheimer's and Parkinson's diseases are age-related neurodegenerative diseases that often require invasive procedures for diagnosis. Traditional diagnostic methods may fail to capture the interplay between genetic, molecular, and neuroanatomical markers. This manuscript aims to develop interpretable machine learning models that can predict key biomarkers, such as pTau, tTau, Aβ positivity, and motor symptom severity, using non-invasive data. Machine learning models (Random Forest, XGBoost) were trained on ADNI and PPMI baseline data, with APOE4 genotype, MRI volumes, cognitive scores, and demographics as inputs; SHAP was employed to enhance model interpretability. Models achieved AUCs of 0.859 (tTau) and 0.852 (pTau) with recall > 80%. PD motor severity prediction yielded an MAE of 5.72 and an R² of 0.586. SHAP confirmed the contributions of APOE4 status, hippocampal atrophy, and dopaminergic asymmetries. The pipelines provide clinically meaningful predictions of biomarker status and motor symptoms, supporting interpretable, multi-axis neurodiagnostic tools within the Neurodiagnoses framework.
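A compact sketch of the kind of pipeline described: a gradient-boosted classifier trained on non-invasive features (APOE4 copies, MRI volumes, cognitive scores, demographics) to predict amyloid positivity, explained with SHAP. Feature names, the input file, and hyperparameters are assumptions, not the manuscript's actual pipeline.

```python
# Sketch: predict Abeta positivity from non-invasive features with XGBoost and
# explain predictions with SHAP. Columns, file, and settings are assumptions;
# categorical fields (e.g., sex) are assumed already numerically encoded.
import pandas as pd
import shap
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

FEATURES = ["apoe4_alleles", "hippocampus_vol", "ventricle_vol",
            "mmse", "moca", "age", "sex", "education_years"]

df = pd.read_csv("adni_baseline.csv")             # hypothetical baseline extract
X, y = df[FEATURES], df["abeta_positive"]         # binary label: amyloid status
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)

model = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.05,
                      eval_metric="logloss")
model.fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))

# SHAP attributes each prediction to individual features (e.g., APOE4 copies,
# hippocampal volume), mirroring the interpretability analysis in the abstract.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_te)
shap.summary_plot(shap_values, X_te)
```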