
Improving AI models for rare thyroid cancer subtype by text guided diffusion models.

Dai F, Yao S, Wang M, Zhu Y, Qiu X, Sun P, Qiu C, Yin J, Shen G, Sun J, Wang M, Wang Y, Yang Z, Sang J, Wang X, Sun F, Cai W, Zhang X, Lu H

PubMed · May 13, 2025
Artificial intelligence applications in oncology imaging often struggle with diagnosing rare tumors. We identify significant gaps in detecting uncommon thyroid cancer types with ultrasound, where scarce data leads to frequent misdiagnosis. Traditional augmentation strategies do not capture the unique disease variations, hindering model training and performance. To overcome this, we propose a text-driven generative method that fuses clinical insights with image generation, producing synthetic samples that realistically reflect rare subtypes. In rigorous evaluations, our approach achieves substantial gains in diagnostic metrics, surpasses existing methods in authenticity and diversity measures, and generalizes effectively to other private and public datasets with various rare cancers. In this work, we demonstrate that text-guided image augmentation substantially enhances model accuracy and robustness for rare tumor detection, offering a promising avenue for more reliable and widespread clinical adoption.
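For readers who want to see the mechanics, here is a minimal sketch of text-guided synthetic augmentation using an off-the-shelf text-to-image diffusion pipeline from Hugging Face diffusers. The paper's own generator is not public, so the model checkpoint, prompts, and sampling settings below are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch of text-guided synthetic augmentation with a generic
# text-to-image diffusion pipeline. Checkpoint and prompts are assumptions;
# a domain-tuned ultrasound checkpoint would be needed in practice.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# Clinical-style prompts encoding subtype-specific findings (illustrative only).
prompts = [
    "thyroid ultrasound, medullary carcinoma, hypoechoic nodule, irregular margins",
    "thyroid ultrasound, anaplastic carcinoma, large heterogeneous mass, calcifications",
]

synthetic = []
for prompt in prompts:
    out = pipe(prompt, num_images_per_prompt=4, guidance_scale=7.5)
    synthetic.extend(out.images)  # PIL images to mix into the rare-subtype training set
```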

A Deep Learning-Driven Framework for Inhalation Injury Grading Using Bronchoscopy Images

Yifan Li, Alan W Pang, Jo Woon Chong

arXiv preprint · May 13, 2025
Inhalation injuries are challenging to diagnose and grade in the clinic because traditional methods, such as the Abbreviated Injury Score (AIS), rely on subjective assessments and show weak correlations with clinical outcomes. This study introduces a novel deep learning-based framework for grading inhalation injuries from bronchoscopy images, using the duration of mechanical ventilation as an objective metric. To address the scarcity of medical imaging data, we propose Enhanced StarGAN, a generative model that integrates a Patch Loss and an SSIM Loss to improve the quality and clinical relevance of synthetic images. The augmented dataset generated by Enhanced StarGAN significantly improved classification performance when evaluated with a Swin Transformer, achieving an accuracy of 77.78%, an 11.11 percentage-point improvement over the original dataset. Image quality was assessed using the Fréchet Inception Distance (FID), where Enhanced StarGAN achieved the lowest FID of 30.06, outperforming baseline models. Burn surgeons confirmed the realism and clinical relevance of the generated images, particularly the preservation of bronchial structures and color distribution. These results highlight the potential of Enhanced StarGAN for addressing data limitations and improving classification accuracy in inhalation injury grading.
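A plausible reading of the combined objective is an adversarial term plus a patch-wise L1 loss and an SSIM loss; the sketch below shows that shape in PyTorch. The exact patch-loss definition, SSIM implementation, and weights (w_patch, w_ssim) are assumptions, since the paper's code is not reproduced here.

```python
# Sketch of a generator objective combining an adversarial term with a
# patch-wise L1 loss and an SSIM loss; definitions and weights are assumptions.
import torch
import torch.nn.functional as F
from torchmetrics.functional import structural_similarity_index_measure as ssim

def patch_l1(fake, real, patch=32, n_patches=8):
    """L1 distance averaged over randomly sampled aligned patches."""
    _, _, h, w = fake.shape
    loss = 0.0
    for _ in range(n_patches):
        y = torch.randint(0, h - patch + 1, (1,)).item()
        x = torch.randint(0, w - patch + 1, (1,)).item()
        loss = loss + F.l1_loss(fake[:, :, y:y+patch, x:x+patch],
                                real[:, :, y:y+patch, x:x+patch])
    return loss / n_patches

def generator_loss(fake, real, adv_term, w_patch=10.0, w_ssim=5.0):
    ssim_loss = 1.0 - ssim(fake, real)  # SSIM near 1 means structurally similar
    return adv_term + w_patch * patch_l1(fake, real) + w_ssim * ssim_loss
```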

A survey of deep-learning-based radiology report generation using multimodal inputs.

Wang X, Figueredo G, Li R, Zhang WE, Chen W, Chen X

PubMed · May 13, 2025
Automatic radiology report generation can alleviate the workload of physicians and reduce regional disparities in medical resources, and has therefore become an important topic in medical image analysis. It is a challenging task, as the computational model must mimic physicians in extracting information from multi-modal input data (e.g., medical images, clinical information, and medical knowledge) and in producing comprehensive, accurate reports. Recently, numerous works have addressed this problem with deep-learning-based methods such as transformers, contrastive learning, and knowledge-base construction. This survey summarizes the key techniques developed in the most recent works and proposes a general workflow for deep-learning-based report generation with five main components: multi-modality data acquisition, data preparation, feature learning, feature fusion and interaction, and report generation. The state-of-the-art methods for each of these components are highlighted. Additionally, we summarize the latest developments in large model-based methods and model explainability, along with public datasets, evaluation methods, current challenges, and future directions in the field. We have also conducted a quantitative comparison of different methods under the same experimental setting. To our knowledge, this is the most up-to-date survey focused on multi-modality inputs and data fusion for radiology report generation. The aim is to provide comprehensive and rich information for researchers interested in automatic clinical report generation and medical image analysis, especially those using multimodal inputs, and to assist them in developing new algorithms to advance the field.
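To make the five-component workflow concrete, here is a minimal Python skeleton of the survey's general pipeline; every function name, signature, and placeholder value is illustrative rather than taken from any specific system.

```python
# Skeleton of the five-component report-generation workflow (illustrative stubs).
from dataclasses import dataclass

@dataclass
class Study:
    images: list          # e.g., chest X-ray views
    clinical_info: dict   # e.g., indication, history
    knowledge: dict       # e.g., retrieved templates or knowledge-graph facts

def acquire(study_id: str) -> Study:   # 1. multi-modality data acquisition
    return Study(images=["img0"], clinical_info={"indication": "cough"}, knowledge={})

def prepare(study: Study) -> Study:    # 2. data preparation (cleaning, tokenizing)
    return study

def learn_features(study: Study):      # 3. feature learning (image/text encoders)
    return [0.0], [0.0]                # placeholder image and text features

def fuse(img_f, txt_f, knowledge):     # 4. feature fusion and interaction
    return img_f + txt_f

def generate_report(fused) -> str:     # 5. report generation (decoder or LLM)
    return "Placeholder report."

def pipeline(study_id: str) -> str:
    study = prepare(acquire(study_id))
    img_f, txt_f = learn_features(study)
    return generate_report(fuse(img_f, txt_f, study.knowledge))

print(pipeline("study-001"))
```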

A Deep Learning-Driven Inhalation Injury Grading Assistant Using Bronchoscopy Images

Yifan Li, Alan W Pang, Jo Woon Chong

arXiv preprint · May 13, 2025
Inhalation injuries present a challenge in clinical diagnosis and grading because conventional grading methods, such as the Abbreviated Injury Score (AIS), are subjective and lack robust correlation with clinical parameters such as mechanical ventilation duration and patient mortality. This study introduces a novel deep learning-based diagnosis assistant for grading inhalation injuries from bronchoscopy images, designed to overcome subjective variability and improve the consistency of severity assessment. Our approach leverages data augmentation techniques, including graphic transformations, Contrastive Unpaired Translation (CUT), and CycleGAN, to address the scarcity of medical imaging data. We evaluate the classification performance of two deep learning models, GoogLeNet and Vision Transformer (ViT), on a dataset significantly expanded through these augmentation methods. The results show that GoogLeNet combined with CUT is the most effective configuration, achieving a classification accuracy of 97.8%. Histogram and frequency-domain analyses reveal the variations introduced by CUT, with shifts in the intensity distribution and in the texture detail of the frequency spectrum. PCA visualizations show that CUT substantially enhances class separability in the feature space. Moreover, Grad-CAM analyses provide insight into the decision-making process: the mean heatmap intensity for CUT-augmented data is 119.6, significantly exceeding the 98.8 of the original dataset. Our proposed tool uses the mechanical ventilation period as a novel grading standard, providing comprehensive diagnostic support.
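Since the abstract quantifies Grad-CAM heatmaps by their mean intensity (119.6 vs. 98.8), a compact sketch of that computation is shown below for a GoogLeNet backbone. The target layer, preprocessing, and random input are assumptions standing in for the fine-tuned grading model and real bronchoscopy frames.

```python
# Sketch: Grad-CAM heatmap for a GoogLeNet classifier and its mean intensity
# on a 0-255 scale, mirroring the reported comparison. Layer choice is an assumption.
import torch
from torchvision import models

model = models.googlenet(weights=None)  # stand-in for the fine-tuned grading model
model.eval()

store = {}
layer = model.inception5b  # last inception block as the CAM target (assumption)
layer.register_forward_hook(lambda m, i, o: store.__setitem__("act", o))
layer.register_full_backward_hook(lambda m, gi, go: store.__setitem__("grad", go[0]))

x = torch.randn(1, 3, 224, 224)          # placeholder for a preprocessed frame
logits = model(x)
logits[0, logits.argmax()].backward()    # backprop from the predicted class

weights = store["grad"].mean(dim=(2, 3), keepdim=True)           # channel weights
cam = torch.relu((weights * store["act"]).sum(dim=1)).squeeze()  # weighted activations
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)         # normalize to [0, 1]
print("mean heatmap intensity:", (cam * 255).mean().item())
```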

Evaluating the reference accuracy of large language models in radiology: a comparative study across subspecialties.

Güneş YC, Cesur T, Çamur E

PubMed · May 12, 2025
This study aimed to compare six large language models (LLMs) [Chat Generative Pre-trained Transformer (ChatGPT) o1-preview, ChatGPT-4o, ChatGPT-4o with canvas, Google Gemini 1.5 Pro, Claude 3.5 Sonnet, and Claude 3 Opus] in generating radiology references, assessing accuracy, fabrication, and bibliographic completeness. In this cross-sectional observational study, 120 open-ended questions were administered across eight radiology subspecialties (neuroradiology, abdominal, musculoskeletal, thoracic, pediatric, cardiac, head and neck, and interventional radiology), with 15 questions per subspecialty. Each question prompted the LLMs to provide responses containing four references with in-text citations and complete bibliographic details (authors, title, journal, publication year/month, volume, issue, page numbers, and PubMed Identifier). References were verified using Medline, Google Scholar, the Directory of Open Access Journals, and web searches. Each bibliographic element was scored for correctness, and a composite final score (FS; range 0-36) was calculated by summing the correct elements and multiplying the sum by a 5-point verification score for content relevance. The FS values were then categorized on a 5-point Likert-scale reference accuracy score (RAS: 0 = fabricated; 4 = fully accurate). Non-parametric tests (Kruskal-Wallis, Tamhane's T2, and the Wilcoxon signed-rank test with Bonferroni correction) were used for statistical comparisons. Claude 3.5 Sonnet demonstrated the highest reference accuracy, with 80.8% fully accurate references (RAS 4) and a fabrication rate of 3.1%, significantly outperforming all other models (P < 0.001). Claude 3 Opus ranked second, achieving 59.6% fully accurate references and a fabrication rate of 18.3% (P < 0.001). The ChatGPT-based models (ChatGPT-4o, ChatGPT-4o with canvas, and ChatGPT o1-preview) exhibited moderate accuracy, with fabrication rates ranging from 27.7% to 52.9% and fewer than 8% fully accurate references. Google Gemini 1.5 Pro had the lowest performance, achieving only 2.7% fully accurate references and the highest fabrication rate of 60.6% (P < 0.001). Reference accuracy also varied by subspecialty, with neuroradiology and cardiac radiology outperforming pediatric and head and neck radiology. In summary, Claude 3.5 Sonnet significantly outperformed all other models in generating verifiable radiology references, and Claude 3 Opus showed moderate performance. In contrast, the ChatGPT models and Google Gemini 1.5 Pro delivered substantially lower accuracy with higher rates of fabricated references, highlighting current limitations in automated academic citation generation. The high accuracy of Claude 3.5 Sonnet can support radiology literature reviews, research, and education with dependable references. The poor performance of the other models, with high fabrication rates, risks misinformation in clinical and academic settings and highlights the need for refinement before safe and effective use.
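The composite scoring can be sketched directly from the abstract's description: sum the correct bibliographic elements, then multiply by the 5-point verification score. The abstract does not spell out how the 0-36 range decomposes, so the sketch below assumes 9 scoreable elements (year and month counted separately) and a 0-4 verification score, since 9 × 4 = 36; the RAS binning is likewise an assumption.

```python
# Sketch of the FS/RAS scoring described above; element list, verification
# range, and RAS bin edges are assumptions consistent with the 0-36 range.
ELEMENTS = ["authors", "title", "journal", "year", "month",
            "volume", "issue", "pages", "pmid"]

def final_score(correct: dict[str, bool], verification: int) -> int:
    """FS = (# correct bibliographic elements) * verification score (0-4)."""
    assert 0 <= verification <= 4
    return sum(correct.get(e, False) for e in ELEMENTS) * verification

def reference_accuracy_score(fs: int) -> int:
    """Map FS (0-36) onto the 0-4 Likert RAS: 0 = fabricated, 4 = fully accurate."""
    return min(fs // 9, 4)

ref = {e: True for e in ELEMENTS} | {"month": False}  # one wrong element
print(final_score(ref, verification=4))               # 32
print(reference_accuracy_score(32))                   # 3
```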

Benchmarking Radiology Report Generation From Noisy Free-Texts.

Yuan Y, Zheng Y, Qu L

PubMed · May 12, 2025
Automatic radiology report generation can enhance diagnostic efficiency and accuracy. However, clean open-source imaging scan-report pairs are limited in scale and variety, and the vast amount of radiological text available online is often too noisy to be used directly. To address this challenge, we introduce a novel task, Noisy Report Refinement (NRR), which generates radiology reports from noisy free-texts. To this end, we propose a report refinement pipeline that leverages large language models (LLMs) enhanced with guided self-critique and report selection strategies. Because existing radiology report generation metrics cannot measure cleanliness, radiological usefulness, and factual correctness across the various report modalities in the NRR task, we introduce a new benchmark, NRRBench, for NRR evaluation. This benchmark includes two online-sourced datasets and four clinically explainable LLM-based metrics: two metrics evaluate the matching rates of radiology entities and modality-specific template attributes, respectively; one metric assesses report cleanliness; and a combined metric evaluates overall NRR performance. Experiments demonstrate that guided self-critique and report selection strategies significantly improve the quality of refined reports. Additionally, our proposed metrics correlate far more strongly with the noise rate and error count of reports than existing radiology report generation metrics do in evaluating NRR.
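The first NRRBench metric boils down to an entity matching rate; the sketch below shows that final computation with pre-extracted entities. In the benchmark itself an LLM performs the extraction and judging, so the stub sets and names here are purely illustrative.

```python
# Sketch of an entity matching-rate metric over pre-extracted radiology entities.
def entity_match_rate(pred_entities: set[str], ref_entities: set[str]) -> float:
    """Fraction of reference radiology entities recovered in the refined report."""
    if not ref_entities:
        return 1.0
    return len(pred_entities & ref_entities) / len(ref_entities)

refined = {"pleural effusion", "cardiomegaly"}               # from the refined report
reference = {"pleural effusion", "cardiomegaly", "atelectasis"}
print(round(entity_match_rate(refined, reference), 3))       # 0.667
```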

A comparison of performance of DeepSeek-R1 model-generated responses to musculoskeletal radiology queries against ChatGPT-4 and ChatGPT-4o - A feasibility study.

Uldin H, Saran S, Gandikota G, Iyengar KP, Vaishya R, Parmar Y, Rasul F, Botchu R

PubMed · May 12, 2025
Artificial intelligence (AI) has transformed society, and chatbots built on large language models (LLMs) are playing an increasing role in scientific research. This study assesses and compares the efficacy of the newer DeepSeek-R1 model against ChatGPT-4 and ChatGPT-4o in answering scientific questions about recent research. We compared the output generated by ChatGPT-4, ChatGPT-4o, and DeepSeek-R1 in response to ten standardized questions in the setting of musculoskeletal (MSK) radiology. The responses were independently analyzed by one MSK radiologist and one final-year MSK radiology trainee and graded on a Likert scale from 1 to 5 (1 = inaccurate, 5 = accurate). Five DeepSeek-R1 answers were significantly inaccurate and, when prompted, provided only fictitious references. All ChatGPT-4 and ChatGPT-4o answers were well written with good content, the latter including useful and comprehensive references. ChatGPT-4o generated structured research answers with useful references in all our cases, enabling reliable usage. DeepSeek-R1, in its current version, generates answers that may appear authentic to the unsuspecting eye but contain a higher amount of falsified and inaccurate information. Further iterations may improve these accuracies.

A Clinical Neuroimaging Platform for Rapid, Automated Lesion Detection and Personalized Post-Stroke Outcome Prediction

Brzus, M., Griffis, J. C., Riley, C. J., Bruss, J., Shea, C., Johnson, H. J., Boes, A. D.

medRxiv preprint · May 11, 2025
Predicting long-term functional outcomes for individuals with stroke is a significant challenge. Solving it would open new opportunities for improving stroke management by informing acute interventions and guiding personalized rehabilitation strategies. The location of the stroke is a key predictor of outcomes, yet no clinically deployed tools incorporate lesion location information for outcome prognostication. This study responds to that need by introducing a fully automated, three-stage neuroimaging processing and machine learning pipeline that predicts personalized outcomes from clinical imaging in adult ischemic stroke patients. In the first stage, the system automatically processes raw DICOM inputs, registers the brain to a standard template, and uses deep learning models to segment the stroke lesion. In the second stage, lesion location and automatically derived network features are fed into statistical models trained to predict long-term impairments from a large independent cohort of lesion patients. In the third stage, a large language model generates a structured PDF report describing the stroke's location, the arterial distribution, and personalized prognostic information. We demonstrate the viability of this approach in a proof-of-concept application predicting select cognitive outcomes in a stroke cohort. Brain-behavior models were pre-trained to predict chronic impairment on 28 different cognitive outcomes in a large cohort of patients with focal brain lesions (N = 604). The automated pipeline then used these models to predict outcomes from clinically acquired MRIs in an independent ischemic stroke cohort (N = 153). Starting from raw clinical DICOM images, the pipeline generates outcome predictions for individual patients in under 3 minutes, with 96% concordance relative to methods requiring manual processing. We also show that prediction accuracy is enhanced by models that incorporate lesion location, lesion-associated network information, and demographics. These results provide a strong proof of concept and lay the groundwork for imaging-based clinical tools for stroke outcome prognostication.
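A minimal skeleton of the three-stage pipeline, with the stages as described in the abstract; the function names, I/O types, and placeholder return values are assumptions, not the authors' code.

```python
# Illustrative skeleton of the three-stage stroke-outcome pipeline.
def stage1_segment(dicom_dir: str):
    """Stage 1: process raw DICOMs, register to a standard template, and segment
    the stroke lesion with deep learning models (toy mask returned here)."""
    return [[0, 1], [0, 0]]  # placeholder lesion mask in template space

def stage2_predict(lesion_mask, demographics: dict) -> dict:
    """Stage 2: derive lesion-location and lesion-network features, then apply
    pre-trained brain-behavior models (28 cognitive outcomes in the paper)."""
    return {"verbal_memory_impairment": 0.31}  # placeholder prediction

def stage3_report(predictions: dict, meta: dict) -> str:
    """Stage 3: have an LLM draft a structured report describing lesion location,
    arterial distribution, and prognosis (a string stands in for the PDF)."""
    return f"Predicted outcomes for {meta.get('id', 'patient')}: {predictions}"

def run(dicom_dir: str, demographics: dict, meta: dict) -> str:
    mask = stage1_segment(dicom_dir)
    preds = stage2_predict(mask, demographics)
    return stage3_report(preds, meta)

print(run("/path/to/dicoms", {"age": 67}, {"id": "stroke-153"}))
```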

APD-FFNet: A Novel Explainable Deep Feature Fusion Network for Automated Periodontitis Diagnosis on Dental Panoramic Radiography.

Resul ES, Senirkentli GB, Bostanci E, Oduncuoglu BF

PubMed · May 9, 2025
This study introduces APD-FFNet, a novel, explainable deep learning architecture for automated periodontitis diagnosis from panoramic radiographs. A total of 337 panoramic radiographs, annotated by a periodontist, served as the dataset. APD-FFNet combines custom convolutional and transformer-based layers within a deep feature fusion framework that captures both local and global contextual features. Performance was evaluated using accuracy, the F1 score, the area under the receiver operating characteristic curve (AUROC), the Jaccard similarity coefficient, and the Matthews correlation coefficient. McNemar's test confirmed statistical significance, and SHapley Additive exPlanations (SHAP) provided interpretability. APD-FFNet achieved 94% accuracy, a 93.88% F1 score, 93.47% AUROC, an 88.47% Jaccard similarity coefficient, and an 88.46% Matthews correlation coefficient, surpassing comparable approaches; McNemar's test validated these findings (p < 0.05). SHAP explanations highlighted important regions in each radiograph, supporting clinical applicability. By merging convolutional and transformer-based layers, APD-FFNet establishes a new benchmark in automated, interpretable periodontitis diagnosis, and its low hyperparameter sensitivity eases integration into routine dental practice. Its adaptable design suggests broader relevance to other medical imaging domains. This is the first feature fusion method devised specifically for periodontitis diagnosis, supported by an expert-curated dataset and explainable artificial intelligence; its robust accuracy and transparent outputs set a new standard for automated periodontal analysis.
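The reported metrics map directly onto standard scikit-learn and statsmodels calls; the sketch below shows the computation on toy labels, since the dataset itself is not public here. The McNemar contingency counts are placeholders.

```python
# Sketch: computing the paper's evaluation metrics on toy predictions.
from sklearn.metrics import (accuracy_score, f1_score, roc_auc_score,
                             jaccard_score, matthews_corrcoef)
from statsmodels.stats.contingency_tables import mcnemar

y_true  = [1, 0, 1, 1, 0, 1, 0, 0]   # periodontitis vs. healthy (toy labels)
y_pred  = [1, 0, 1, 0, 0, 1, 0, 1]   # classifier predictions (toy)
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.3, 0.6]  # predicted probabilities

print("accuracy:", accuracy_score(y_true, y_pred))
print("F1      :", f1_score(y_true, y_pred))
print("AUROC   :", roc_auc_score(y_true, y_score))
print("Jaccard :", jaccard_score(y_true, y_pred))
print("MCC     :", matthews_corrcoef(y_true, y_pred))

# McNemar's test compares two classifiers on the same cases via their 2x2
# agreement/disagreement table (toy counts below).
table = [[40, 5],    # both correct / only model A correct
         [15, 20]]   # only model B correct / both wrong
print("McNemar p-value:", mcnemar(table, exact=True).pvalue)
```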

Computationally Efficient Diffusion Models in Medical Imaging: A Comprehensive Review

Abdullah, Tao Huang, Ickjai Lee, Euijoon Ahn

arXiv preprint · May 9, 2025
The diffusion model has recently emerged as a potent approach in computer vision, demonstrating remarkable performance in generative artificial intelligence. Capable of producing high-quality synthetic images, diffusion models have been applied successfully across a range of applications. However, the high computational cost of training and sampling remains a significant challenge. This study focuses on the efficiency and inference time of diffusion-based generative models, highlighting their applications in both natural and medical imaging. We present the most recent advances in diffusion models by categorizing them into three key families: the Denoising Diffusion Probabilistic Model (DDPM), the Latent Diffusion Model (LDM), and the Wavelet Diffusion Model (WDM). These models play a crucial role in medical imaging, where producing fast, reliable, and high-quality images is essential for accurate analysis of abnormalities and disease diagnosis. We first review the general frameworks of DDPM, LDM, and WDM and discuss the computational-complexity gap these models fill in natural and medical imaging. We then discuss their current limitations as well as the opportunities and future research directions in medical imaging.
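As a concrete anchor for the DDPM framework the review starts from, here is a minimal sketch of the closed-form forward (noising) process, where x_t has mean sqrt(alpha_bar_t) x_0 and variance (1 - alpha_bar_t) I, together with the noise-prediction training target. The linear beta schedule is the standard choice; the denoising U-Net is stubbed.

```python
# Minimal sketch of the DDPM forward process and training target.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # standard linear noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)    # cumulative product of alphas

def q_sample(x0, t, eps):
    """Sample x_t from q(x_t | x_0) in closed form."""
    a = alpha_bar[t].sqrt().view(-1, 1, 1, 1)
    s = (1.0 - alpha_bar[t]).sqrt().view(-1, 1, 1, 1)
    return a * x0 + s * eps

x0 = torch.randn(4, 1, 64, 64)                   # a batch of toy grayscale images
t = torch.randint(0, T, (4,))                    # random timesteps per image
eps = torch.randn_like(x0)                       # Gaussian noise to be predicted
x_t = q_sample(x0, t, eps)
# Training step (U-Net stubbed): loss = F.mse_loss(unet(x_t, t), eps)
```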