Page 1 of 78773 results

Generating Findings for Jaw Cysts in Dental Panoramic Radiographs Using GPT-4o: Building a Two-Stage Self-Correction Loop with Structured Output (SLSO) Framework

Nanaka Hosokawa, Ryo Takahashi, Tomoya Kitano, Yukihiro Iida, Chisako Muramatsu, Tatsuro Hayashi, Yuta Seino, Xiangrong Zhou, Takeshi Hara, Akitoshi Katsumata, Hiroshi Fujita

arXiv preprint · Oct 2, 2025
In this study, we utilized the multimodal capabilities of OpenAI GPT-4o to automatically generate jaw cyst findings on dental panoramic radiographs. To improve accuracy, we constructed a Self-correction Loop with Structured Output (SLSO) framework and verified its effectiveness. A 10-step process was implemented for 22 cases of jaw cysts, including image input and analysis, structured data generation, tooth number extraction and consistency checking, iterative regeneration when inconsistencies were detected, and finding generation with subsequent restructuring and consistency verification. A comparative experiment was conducted against the conventional Chain-of-Thought (CoT) method across seven evaluation items: transparency, internal structure, borders, root resorption, tooth movement, relationships with other structures, and tooth number. The results showed that the proposed SLSO framework improved output accuracy for many items, with improvement rates of 66.9%, 33.3%, and 28.6% for tooth number, tooth movement, and root resorption, respectively. In the successful cases, a consistently structured output was achieved after up to five regenerations. Although statistical significance was not reached because of the small dataset, the SLSO framework enforced negative finding descriptions, suppressed hallucinations, and improved tooth number identification accuracy. However, accurate identification of extensive lesions spanning multiple teeth remained limited, and further refinement is required to enhance overall performance and move toward a practical finding-generation system.
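The regenerate-until-consistent pattern described above can be sketched generically; the model call and consistency check below are hypothetical stand-ins, not the paper's actual prompts or validation rules:

```python
import json

MAX_REGENERATIONS = 5  # the paper reports success within up to five regenerations

def generate_structured_findings(model_call, image, check_consistency):
    """Sketch of a self-correction loop with structured output (SLSO-style):
    regenerate until the structured findings pass a consistency check."""
    findings = model_call(image)
    for attempt in range(MAX_REGENERATIONS):
        problems = check_consistency(findings)
        if not problems:
            return findings, attempt
        # feed the detected inconsistencies back into the next generation
        findings = model_call(image, feedback=problems)
    return findings, MAX_REGENERATIONS

# Toy stand-ins to exercise the loop (entirely hypothetical):
def fake_model(image, feedback=None):
    # emits an out-of-range tooth number first, corrected once feedback arrives
    return {"tooth_number": 36 if feedback else 99}

def fake_check(findings):
    return [] if 11 <= findings["tooth_number"] <= 48 else ["tooth number outside FDI range"]

result, retries = generate_structured_findings(fake_model, image=None,
                                               check_consistency=fake_check)
print(json.dumps(result), retries)  # {"tooth_number": 36} 1
```

The loop terminates either when the structured output is self-consistent or when the regeneration budget is exhausted, mirroring the bounded retry behavior the abstract describes.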

Multimodal Foundation Models for Early Disease Detection

Md Talha Mohsin, Ismail Abdulrashid

arXiv preprint · Oct 2, 2025
Healthcare generates diverse streams of data, including electronic health records (EHR), medical imaging, genetics, and ongoing monitoring from wearable devices. Traditional diagnostic models frequently analyze these sources in isolation, which constrains their capacity to identify the cross-modal correlations essential for early disease diagnosis. Our research presents a multimodal foundation model that consolidates diverse patient data through an attention-based transformer framework. Dedicated encoders first project each modality into a shared latent space; the modality representations are then combined using multi-head attention and residual normalization. The architecture is designed for multi-task pretraining, enabling adaptation to new diseases and datasets with little additional work. We provide an experimental strategy that uses benchmark datasets in oncology, cardiology, and neurology to test early detection tasks. The framework includes data governance and model management tools alongside the technical components to improve transparency, reliability, and clinical interpretability. The proposed method works toward a single foundation model for precision diagnostics, which could improve prediction accuracy and support clinical decision-making.
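The fusion step described above (per-modality encoders into a shared latent space, then attention-based combination) can be illustrated with a minimal single-head sketch in pure Python; the hand-rolled encoder and dimensions are toy assumptions, not the paper's learned architecture:

```python
import math

def encode(vec, dim):
    """Toy per-modality 'encoder': pad/truncate into a shared latent space of
    size dim, then L2-normalize (a stand-in for a learned encoder)."""
    v = (vec + [0.0] * dim)[:dim]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def attention_fuse(tokens):
    """Single-head self-attention pooling over modality tokens: each token
    attends to all others; attended vectors are averaged into one embedding."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    dim = len(tokens[0])
    fused = [0.0] * dim
    for q in tokens:
        scores = [dot(q, k) / math.sqrt(dim) for k in tokens]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]          # stable softmax
        weights = [e / sum(exps) for e in exps]
        attended = [sum(w * t[i] for w, t in zip(weights, tokens))
                    for i in range(dim)]
        fused = [f + a / len(tokens) for f, a in zip(fused, attended)]
    return fused

# Three modalities (EHR, imaging, genetics) as raw feature vectors:
ehr, imaging, genetics = [0.2, 1.3], [0.9, 0.1, 0.4], [1.1]
tokens = [encode(m, dim=4) for m in (ehr, imaging, genetics)]
patient_embedding = attention_fuse(tokens)
print(len(patient_embedding))  # 4
```

A real implementation would use learned projection matrices, multiple heads, and residual normalization; the sketch only shows how variable-length modality inputs end up as one fixed-size patient embedding.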

Evaluating GPT-4o for emergency disposition of complex respiratory cases with pulmonology consultation: a diagnostic accuracy study.

Yıldırım C, Aykut A, Günsoy E, Öncül MV

PubMed · Oct 2, 2025
Large Language Models (LLMs), such as GPT-4o, are increasingly investigated for clinical decision support in emergency medicine. However, their real-world performance in disposition prediction remains insufficiently studied. This study evaluated the diagnostic accuracy of GPT-4o in predicting ED disposition (discharge, ward admission, or ICU admission) in complex emergency respiratory cases requiring pulmonology consultation and chest CT, a selective high-acuity subgroup of ED patients. We conducted a retrospective observational study in a tertiary ED between November 2024 and February 2025, including ED patients with complex respiratory presentations who underwent pulmonology consultation and chest CT rather than the general ED respiratory population. GPT-4o was prompted to predict the most appropriate ED disposition using three progressively enriched input models: Model 1 (age, sex, oxygen saturation, home oxygen therapy, and venous blood gas parameters); Model 2 (Model 1 plus laboratory data); and Model 3 (Model 2 plus chest CT findings). Model performance was assessed using accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score. Among the 221 patients included, 69.2% were admitted to the ward, 9.0% to the intensive care unit (ICU), and 21.7% were discharged. For hospital admission prediction, Model 3 demonstrated the highest sensitivity (91.9%) and overall accuracy (76.5%), but the lowest specificity (20.8%). Conversely, for discharge prediction, Model 3 achieved the highest specificity (91.9%) but the lowest sensitivity (20.8%). Numerical improvements were observed across models, but none reached statistical significance (all p > 0.22); Model 1 therefore performed comparably to Models 2-3 while being less complex. Among patients who were discharged despite GPT-4o predicting admission, the 14-day ED re-presentation rates were 23.8% (5/21) for Model 1, 30.0% (9/30) for Model 2, and 28.9% (11/38) for Model 3. GPT-4o demonstrated high sensitivity in identifying ED patients requiring hospital admission, particularly those needing intensive care, when provided with progressively enriched clinical input. However, its low sensitivity for discharge prediction resulted in frequent overtriage, limiting its utility for autonomous decision-making. This proof-of-concept study demonstrates GPT-4o's capacity to stratify disposition decisions in complex respiratory cases under varying levels of limited input data. However, these findings should be interpreted in light of key limitations, including the selective high-acuity cohort and the absence of vital signs, and require prospective validation before clinical implementation.
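The reported metrics follow standard confusion-matrix definitions. A minimal sketch follows; the counts below are illustrative, back-calculated so the formulas reproduce the reported Model 3 admission figures (91.9% sensitivity, 20.8% specificity, 76.5% accuracy over 221 patients) — the actual confusion matrix is not given in the abstract:

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic-accuracy metrics as used in the study:
    sensitivity, specificity, PPV, NPV, accuracy, F1."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    acc = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sens / (ppv + sens)
    return {"sensitivity": sens, "specificity": spec, "ppv": ppv,
            "npv": npv, "accuracy": acc, "f1": f1}

# Illustrative counts for admission prediction (221 patients total):
m = diagnostic_metrics(tp=159, fp=38, tn=10, fn=14)
print(round(m["sensitivity"], 3),
      round(m["specificity"], 3),
      round(m["accuracy"], 3))  # 0.919 0.208 0.765
```

The sensitivity/specificity trade-off the abstract highlights is visible directly in the formulas: with admission as the positive class, a model that admits nearly everyone gets high sensitivity at the cost of specificity, and the discharge-prediction figures are simply the same matrix with the classes swapped.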

Artificial intelligence in regional anesthesia.

Harris J, Kamming D, Bowness JS

PubMed · Oct 1, 2025
Artificial intelligence (AI) is having an increasing impact on healthcare. In ultrasound-guided regional anesthesia (UGRA), commercially available devices now augment traditional grayscale ultrasound imaging by highlighting key sono-anatomical structures in real time. We review the latest evidence supporting this emerging technology and consider the opportunities and challenges to its widespread deployment. The existing literature is limited and heterogeneous, which impedes full appraisal of systems, comparison between devices, and informed adoption. AI-based devices promise to improve clinical practice and training in UGRA, though their impact on patient outcomes and on the provision of UGRA techniques is unclear at this early stage. Calls for standardization across both UGRA and AI are increasing, with greater clinical leadership required. Emerging AI applications in UGRA warrant further study because of an opaque and fragmented evidence base. Robust and consistent evaluation and reporting of algorithm performance, in a representative clinical context, will expedite discovery and appropriate deployment of AI in UGRA. A clinician-focused approach to the development, evaluation, and implementation of this exciting branch of AI has huge potential to advance the human art of regional anesthesia.

Current and novel approaches for critical care management of aneurysmal subarachnoid hemorrhage in critical care.

Zoumprouli A, Carden R, Bilotta F

PubMed · Oct 1, 2025
This review highlights recent advancements and evidence-based approaches in the critical care management of aneurysmal subarachnoid hemorrhage (aSAH), focusing on developments from the past 18 months. It addresses key challenges [rebleeding prevention, delayed cerebral ischemia (DCI), hydrocephalus, transfusion strategies, and temperature management], emphasizing multidisciplinary care and personalized treatment. Recent studies underscore the importance of systolic blood pressure control (<160 mmHg) to reduce rebleeding risk before aneurysm securing. Novel prognostic tools, including the modified 5-item frailty index and quantitative imaging software, show promise in improving outcome prediction. Prophylactic lumbar drainage may reduce DCI and improve neurological outcomes, while milrinone and computed tomography perfusion-guided therapies are being explored for vasospasm management. Transfusion strategies suggest a hemoglobin threshold of 9 g/dl may optimize outcomes. Temperature management remains contentious, but consensus recommends maintaining normothermia (36.0-37.5 °C) with continuous monitoring. Advances in aSAH care emphasize precision medicine, leveraging technology [e.g. Artificial intelligence (AI), quantitative imaging], and multidisciplinary collaboration. Key unresolved questions warrant multicenter trials to validate optimal blood pressure, transfusion, and temperature targets alongside emerging therapies for DCI.

Design of AI-driven microwave imaging for lung tumor monitoring.

Singh A, Paul S, Gayen S, Mandal B, Mitra D, Augustine R

PubMed · Oct 1, 2025
The global incidence of lung diseases, particularly lung cancer, is increasing at an alarming rate, underscoring the urgent need for early detection, robust monitoring, and timely intervention. This study presents the design of an artificial intelligence (AI)-integrated microwave-based diagnostic tool for the early detection of lung tumors. The proposed method combines machine learning (ML) tools with microwave imaging (MWI). A microwave unit containing eight antennas in the form of a wearable belt is employed to collect data from CST body models. The data, collected as scattering parameters, are reconstructed into 2D images. Two ML approaches are investigated for tumor detection and for predicting the size of a detected tumor: the first employs XGBoost models on the raw S-parameters, and the second uses convolutional neural networks (CNNs) on the reconstructed 2D microwave images. The XGBoost-based classifier on S-parameters outperforms the CNN-based classifier on reconstructed images for tumor detection, whereas the CNN-based model on reconstructed images performs much better than the XGBoost-based regression model on raw S-parameters for tumor size prediction. The performance of both models is evaluated on other body models to examine their generalization to unseen data. This work explores the feasibility of a low-cost, portable, AI-integrated microwave diagnostic device for lung tumor detection that eliminates the risk of exposure to the harmful ionizing radiation of X-ray and CT scans.

LMOD+: A Comprehensive Multimodal Dataset and Benchmark for Developing and Evaluating Multimodal Large Language Models in Ophthalmology

Zhenyue Qin, Yang Liu, Yu Yin, Jinyu Ding, Haoran Zhang, Anran Li, Dylan Campbell, Xuansheng Wu, Ke Zou, Tiarnan D. L. Keenan, Emily Y. Chew, Zhiyong Lu, Yih-Chung Tham, Ninghao Liu, Xiuzhen Zhang, Qingyu Chen

arXiv preprint · Sep 30, 2025
Vision-threatening eye diseases pose a major global health burden, with timely diagnosis limited by workforce shortages and restricted access to specialized care. While multimodal large language models (MLLMs) show promise for medical image interpretation, advancing MLLMs for ophthalmology is hindered by the lack of comprehensive benchmark datasets suitable for evaluating generative models. We present a large-scale multimodal ophthalmology benchmark comprising 32,633 instances with multi-granular annotations across 12 common ophthalmic conditions and 5 imaging modalities. The dataset integrates imaging, anatomical structures, demographics, and free-text annotations, supporting anatomical structure recognition, disease screening, disease staging, and demographic prediction for bias evaluation. This work extends our preliminary LMOD benchmark with three major enhancements: (1) nearly 50% dataset expansion with substantial enlargement of color fundus photography; (2) broadened task coverage including binary disease diagnosis, multi-class diagnosis, severity classification with international grading standards, and demographic prediction; and (3) systematic evaluation of 24 state-of-the-art MLLMs. Our evaluations reveal both promise and limitations. Top-performing models achieved ~58% accuracy in disease screening under zero-shot settings, and performance remained suboptimal for challenging tasks like disease staging. We will publicly release the dataset, curation pipeline, and leaderboard to potentially advance ophthalmic AI applications and reduce the global burden of vision-threatening diseases.

Leveraging ChatGPT for Report Error Audit: An Accuracy-Driven and Cost-Efficient Solution for Ophthalmic Imaging Reports.

Xu Y, Kang D, Shi D, Tham YC, Grzybowski A, Jin K

PubMed · Sep 30, 2025
Accurate ophthalmic imaging reports, including fundus fluorescein angiography (FFA) and ocular B-scan ultrasound, are essential for effective clinical decision-making. The current process, involving drafting by residents followed by review by ophthalmic technicians and ophthalmologists, is time-consuming and prone to errors. This study evaluates the effectiveness of ChatGPT-4o in auditing errors in FFA and ocular B-scan reports and assesses its potential to reduce time and costs within the reporting workflow. A preliminary set of 100 FFA and 80 ocular B-scan reports drafted by residents was analyzed using GPT-4o to identify laterality (left/right eye) errors and incorrect anatomical descriptions. The accuracy of GPT-4o was compared to that of retinal specialists, general ophthalmologists, and ophthalmic technicians. Additionally, a cost-effectiveness analysis was conducted to estimate time and cost savings from integrating GPT-4o into the reporting process, and a pilot real-world validation with 20 erroneous reports was performed comparing GPT-4o against human reviewers. GPT-4o demonstrated a detection rate of 79.0% (158 of 200; 95% CI 73.0-85.0) across all examinations, comparable to the average detection performance of general ophthalmologists (78.0% [155 of 200; 95% CI 72.0-83.0]; P ≥ 0.09). Integration of GPT-4o reduced the average report review time by 86%, completing 180 ophthalmic reports in approximately 0.27 h compared with 2.17-3.19 h for human ophthalmologists. Compared with human reviewers, GPT-4o also lowered the cost from $0.21 to $0.03 per report (a saving of $0.18). In the real-world evaluation, GPT-4o detected 18 of 20 errors with no false positives, compared with 95-100% detection by human reviewers. GPT-4o effectively enhances the accuracy of ophthalmic imaging reports by identifying and correcting common errors. Its implementation can potentially alleviate the workload of ophthalmologists, streamline the reporting process, and reduce associated costs, thereby improving overall clinical workflow and patient outcomes.
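The headline detection and cost figures reduce to simple arithmetic; this sketch just re-derives them from the numbers quoted in the abstract:

```python
# Figures quoted in the abstract:
errors_detected, errors_total = 158, 200   # errors caught by GPT-4o
human_cost, ai_cost = 0.21, 0.03           # $ per report reviewed
n_reports = 180                            # reports in the timing comparison

detection_rate = errors_detected / errors_total
total_saving = (human_cost - ai_cost) * n_reports
print(f"{detection_rate:.1%} detection, "
      f"${total_saving:.2f} saved per {n_reports} reports")
# 79.0% detection, $32.40 saved per 180 reports
```

Note the $0.18 per-report saving compounds quickly at clinical volumes, which is the basis of the abstract's cost-effectiveness claim.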

An interpretable generative multimodal neuroimaging-genomics framework for decoding Alzheimer's disease.

Dolci G, Cruciani F, Abdur Rahaman M, Abrol A, Chen J, Fu Z, Boscolo Galazzo I, Menegaz G, Calhoun VD

PubMed · Sep 30, 2025
Objective. Alzheimer's disease (AD) is the most prevalent form of dementia worldwide, encompassing a prodromal stage known as mild cognitive impairment (MCI), during which patients may either progress to AD or remain stable. The objective of this work was to capture modulations of brain structure and function from multimodal MRI data and single-nucleotide polymorphisms, including cases with missing views, with the twofold goal of classifying AD patients versus healthy controls and detecting MCI converters. Approach. We propose a multimodal deep learning (DL)-based classification framework in which a generative module employing cycle generative adversarial networks is introduced in the latent space to impute missing data (a common issue in multimodal approaches). An explainable AI method was then used to extract input feature relevance, allowing post-hoc validation and enhancing the interpretability of the learned representations. Main results. Experimental results on two tasks, AD detection and MCI conversion, showed that our framework reached performance competitive with the state of the art, with accuracies of 0.926 ± 0.02 (CI [0.90, 0.95]) and 0.711 ± 0.01 (CI [0.70, 0.72]) on the two tasks, respectively. The interpretability analysis revealed gray matter modulations in cortical and subcortical brain areas typically associated with AD. Moreover, impairments in sensory-motor and visual resting-state networks along the disease continuum, as well as genetic mutations defining biological processes linked to endocytosis, amyloid-beta, and cholesterol, were identified. Significance. Our integrative and interpretable DL approach shows promising performance for AD detection and MCI prediction while shedding light on important biological insights.

A Multimodal LLM Approach for Visual Question Answering on Multiparametric 3D Brain MRI

Arvind Murari Vepa, Yannan Yu, Jingru Gan, Anthony Cuturrufo, Weikai Li, Wei Wang, Fabien Scalzo, Yizhou Sun

arXiv preprint · Sep 30, 2025
We introduce mpLLM, a prompt-conditioned hierarchical mixture-of-experts (MoE) architecture for visual question answering over multi-parametric 3D brain MRI (mpMRI). mpLLM routes across modality-level and token-level projection experts to fuse multiple interrelated 3D modalities, enabling efficient training without image-report pretraining. To address limited image-text paired supervision, mpLLM integrates a synthetic visual question answering (VQA) protocol that generates medically relevant VQA from segmentation annotations, and we collaborate with medical experts for clinical validation. mpLLM outperforms strong medical VLM baselines by 5.3% on average across multiple mpMRI datasets. Our study features three main contributions: (1) the first clinically validated VQA dataset for 3D brain mpMRI, (2) a novel multimodal LLM that handles multiple interrelated 3D modalities, and (3) strong empirical results that demonstrate the medical utility of our methodology. Ablations highlight the importance of modality-level and token-level experts and prompt-conditioned routing. We have included our source code in the supplementary materials and will release our dataset upon publication.
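The prompt-conditioned routing idea can be illustrated with a toy softmax gate over per-modality expert outputs. This is a loose sketch under assumed shapes, not mpLLM's actual (learned, hierarchical) routing:

```python
import math

def gated_fusion(prompt_embed, expert_outputs):
    """Toy prompt-conditioned MoE routing: score each expert's output against
    the prompt embedding, softmax the scores into gate weights, and return the
    gate-weighted combination of expert outputs."""
    scores = [sum(p * e for p, e in zip(prompt_embed, out))
              for out in expert_outputs]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]   # numerically stable softmax
    weights = [e / sum(exps) for e in exps]
    dim = len(expert_outputs[0])
    return [sum(w * out[i] for w, out in zip(weights, expert_outputs))
            for i in range(dim)]

# One hypothetical expert output per MRI modality (assumed 3-d embeddings):
prompt = [1.0, 0.0, 0.0]
experts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
fused = gated_fusion(prompt, experts)
print(len(fused), fused[0] > fused[1])  # 3 True
```

Because the gate is conditioned on the prompt, the expert whose output aligns with the question receives the largest weight; in the real architecture this routing happens at both the modality level and the token level.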
