
AI-Driven Integrated System for Burn Depth Prediction With Electronic Medical Records: Algorithm Development and Validation.

Rahman MM, Masry ME, Gnyawali SC, Xue Y, Gordillo G, Wachs JP

PubMed · Aug 15, 2025
Burn injuries represent a significant clinical challenge due to the complexity of accurately assessing burn depth, which directly influences the course of treatment and patient outcomes. Traditional diagnostic methods primarily rely on visual inspection by experienced burn surgeons. Studies report diagnostic accuracies of around 76% for experts, dropping to nearly 50% for less experienced clinicians. Such inaccuracies can result in suboptimal clinical decisions, delaying vital surgical interventions in severe cases or initiating unnecessary treatments for superficial burns. This diagnostic variability not only compromises patient care but also strains health care resources and increases the likelihood of adverse outcomes. Hence, a more consistent and precise approach to burn classification is urgently needed. The objective was to determine whether a multimodal integrated artificial intelligence (AI) system for accurate classification of burn depth can preserve diagnostic accuracy and provide an important resource when used as part of the electronic medical record (EMR). This study used a novel multimodal AI system, integrating digital photographs and ultrasound tissue Doppler imaging (TDI) data to accurately assess burn depth. These imaging modalities were accessed and processed through an EMR system, enabling real-time data retrieval and AI-assisted evaluation. TDI was instrumental in evaluating the biomechanical properties of subcutaneous tissues, using color-coded images to identify burn-induced changes in tissue stiffness and elasticity. The collected imaging data were uploaded to the EMR system (DrChrono), where they were processed by a vision-language model built on the GPT-4 architecture. This model received expert-formulated prompts describing how to interpret both digital and TDI images, guiding the AI in making explainable classifications. This study evaluated whether a multimodal AI classifier, designed to identify first-, second-, and third-degree burns, could be effectively applied to imaging data stored within an EMR system. The classifier achieved an overall accuracy of 84.38%, significantly surpassing human performance benchmarks typically cited in the literature. This highlights the potential of the AI model to serve as a robust clinical decision support tool, especially in settings lacking highly specialized expertise. In addition to accuracy, the classifier demonstrated strong performance across multiple evaluation metrics. The classifier's ability to distinguish between burn severities was further validated by the area under the receiver operating characteristic curve: 0.97 for first-degree, 0.96 for second-degree, and a perfect 1.00 for third-degree burns, each with narrow 95% CIs. The storage of multimodal imaging data within the EMR, along with the ability for post hoc analysis by AI algorithms, offers significant advancements in burn care, enabling real-time burn depth prediction on currently available data. Using digital photos for superficial burns, easily diagnosed through physical examination, reduces reliance on TDI, while TDI helps distinguish deep second- and third-degree burns, enhancing diagnostic efficiency.
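A minimal sketch of the prompting pattern the abstract describes: sending a digital photo plus a TDI image to a GPT-4-class vision-language model with an expert-style instruction. The prompt wording, file paths, and use of GPT-4o as the model are illustrative assumptions, not the study's actual protocol.

```python
# Hedged sketch: burn-depth classification via a vision-language model.
# EXPERT_PROMPT, the image paths, and the model name are assumptions.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def encode_image(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

EXPERT_PROMPT = (
    "You are assisting with burn depth assessment. Using the digital photo "
    "and the tissue Doppler image (color-coded stiffness/elasticity), "
    "classify the burn as first-, second-, or third-degree and explain the "
    "visual evidence behind your classification."
)

def classify_burn(photo_path: str, tdi_path: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # stand-in for the study's GPT-4-based model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": EXPERT_PROMPT},
                {"type": "image_url", "image_url": {
                    "url": f"data:image/jpeg;base64,{encode_image(photo_path)}"}},
                {"type": "image_url", "image_url": {
                    "url": f"data:image/jpeg;base64,{encode_image(tdi_path)}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```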

Prospective validation of an artificial intelligence assessment in a cohort of applicants seeking financial compensation for asbestosis (PROSBEST).

Smesseim I, Lipman KBWG, Trebeschi S, Stuiver MM, Tissier R, Burgers JA, de Gooijer CJ

PubMed · Aug 15, 2025
Asbestosis, a rare pneumoconiosis marked by diffuse pulmonary fibrosis, arises from prolonged asbestos exposure. Its diagnosis, guided by the Helsinki criteria, relies on exposure history, clinical findings, radiology, and lung function. However, interobserver variability complicates diagnoses and financial compensation. This study prospectively validated the sensitivity of an AI-driven assessment for asbestosis compensation in the Netherlands. Secondary objectives included evaluating specificity, accuracy, predictive values, area under the curve of the receiver operating characteristic (ROC-AUC), area under the precision-recall curve (PR-AUC), and interobserver variability. Between September 2020 and July 2022, 92 adult compensation applicants were assessed using both AI models and pulmonologists' reviews based on Dutch Health Council criteria. The AI model assigned an asbestosis probability score: negative (< 35), uncertain (35-66), or positive (≥ 66). Uncertain cases underwent additional reviews for a final determination. The AI assessment demonstrated sensitivity of 0.86 (95% confidence interval: 0.77-0.95), specificity of 0.85 (0.76-0.97), accuracy of 0.87 (0.79-0.93), ROC-AUC of 0.92 (0.84-0.97), and PR-AUC of 0.95 (0.89-0.99). Despite strong metrics, the sensitivity target of 98% was unmet. Pulmonologist reviews showed moderate to substantial interobserver variability. The AI-driven approach demonstrated robust accuracy but insufficient sensitivity for validation. Addressing interobserver variability and incorporating objective fibrosis measurements could enhance future reliability in clinical and compensation settings. The AI-driven assessment for financial compensation of asbestosis showed adequate accuracy but did not meet the required sensitivity for validation. We prospectively assessed the sensitivity of an AI-driven assessment procedure for financial compensation of asbestosis. The AI-driven asbestosis probability score underperformed across all metrics compared to internal testing. The AI-driven assessment procedure achieved a sensitivity of 0.86 (95% confidence interval: 0.77-0.95). It did not meet the predefined sensitivity target.
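The triage rule stated in the abstract maps directly to code: scores below 35 are negative, 35 to just under 66 are uncertain and routed to additional pulmonologist review, and 66 or above are positive. A minimal sketch of that logic follows; the function and example scores are illustrative.

```python
# Sketch of the abstract's three-way probability-score triage rule.
def triage(score: float) -> str:
    if score < 35:
        return "negative"
    if score < 66:  # 35-66: uncertain, needs additional expert review
        return "uncertain -> refer for pulmonologist review"
    return "positive"

for s in (12.0, 48.5, 71.2):  # toy example scores
    print(s, triage(s))
```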

Comprehensive analysis of [<sup>18</sup>F]MFBG biodistribution normal patterns and variability in pediatric patients with neuroblastoma.

Wang P, Chen X, Yan X, Yan J, Yang S, Mao J, Li F, Su X

PubMed · Aug 15, 2025
[<sup>18</sup>F]-meta-fluorobenzylguanidine ([<sup>18</sup>F]MFBG) PET/CT is a promising imaging modality for neural crest-derived tumors, particularly neuroblastoma. Accurate interpretation necessitates an understanding of normal biodistribution and variations in physiological uptake. This study aimed to systematically characterize the physiological distribution and variability of [<sup>18</sup>F]MFBG uptake in pediatric patients to enhance clinical interpretation and differentiate normal from pathological uptake. We retrospectively analyzed [<sup>18</sup>F]MFBG PET/CT scans from 169 pediatric neuroblastoma patients, including 20 in confirmed remission, for detailed biodistribution analysis. Organ uptake was quantified using both manual segmentation and deep learning (DL)-based automatic segmentation methods. Patterns of physiological uptake variants were categorized and illustrated using representative cases. [<sup>18</sup>F]MFBG demonstrated consistent physiological uptake in the salivary glands (SUVmax 9.8 ± 3.3), myocardium (7.1 ± 1.7), and adrenal glands (4.6 ± 0.9), with low activity in bone (0.6 ± 0.2) and muscle (0.8 ± 0.2). DL-based analysis confirmed uniform, mild uptake across vertebral and peripheral skeletal structures (SUVmean 0.47 ± 0.08). Three physiological liver uptake patterns were identified: uniform (43%), left-lobe predominant (31%), and marginal (26%). Asymmetric uptake in the pancreatic head, transient brown adipose tissue activity, gallbladder excretion, and symmetric epiphyseal uptake were also recorded. These variants were not associated with structural abnormalities or clinical recurrence and showed distinct patterns from pathological lesions. This study establishes a reference for normal [<sup>18</sup>F]MFBG biodistribution and physiological variants in children. Understanding these patterns is essential for accurate image interpretation and the avoidance of diagnostic pitfalls in pediatric neuroblastoma patients.
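A minimal sketch of the organ-level quantification step implied above: given a voxelwise SUV volume and an organ mask (manual or DL-generated), compute SUVmax and SUVmean. The synthetic arrays stand in for real PET/CT data; all names here are illustrative assumptions.

```python
# Hedged sketch: per-organ SUV statistics from a PET volume and an organ mask.
import numpy as np

def organ_suv(pet_suv_volume: np.ndarray, organ_mask: np.ndarray) -> dict:
    """pet_suv_volume: voxelwise SUV values; organ_mask: boolean organ mask."""
    voxels = pet_suv_volume[organ_mask]
    return {"SUVmax": float(voxels.max()), "SUVmean": float(voxels.mean())}

# Synthetic data standing in for an [18F]MFBG PET/CT scan and a salivary-gland mask
rng = np.random.default_rng(0)
vol = rng.gamma(shape=2.0, scale=0.5, size=(64, 64, 64))
mask = np.zeros_like(vol, dtype=bool)
mask[20:30, 20:30, 20:30] = True
print(organ_suv(vol, mask))
```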

Deep learning radiomics of elastography for diagnosing compensated advanced chronic liver disease: an international multicenter study.

Lu X, Zhang H, Kuroda H, Garcovich M, de Ledinghen V, Grgurević I, Linghu R, Ding H, Chang J, Wu M, Feng C, Ren X, Liu C, Song T, Meng F, Zhang Y, Fang Y, Ma S, Wang J, Qi X, Tian J, Yang X, Ren J, Liang P, Wang K

PubMed · Aug 15, 2025
Accurate, noninvasive diagnosis of compensated advanced chronic liver disease (cACLD) is essential for effective clinical management but remains challenging. This study aimed to develop a deep learning-based radiomics model using international multicenter data and to evaluate its performance against the two-dimensional shear wave elastography (2D-SWE) cut-off method across multiple countries or regions, etiologies, and ultrasound device manufacturers. This retrospective study included 1937 adult patients with chronic liver disease due to hepatitis B, hepatitis C, or metabolic dysfunction-associated steatotic liver disease. All patients underwent 2D-SWE imaging and liver biopsy at 17 centers across China, Japan, and Europe using devices from three manufacturers (SuperSonic Imagine, General Electric, and Mindray). The proposed generalized deep learning radiomics of elastography model integrated both elastographic images and liver stiffness measurements and was trained and tested on stratified internal and external datasets. A total of 1937 patients with 9472 2D-SWE images were included in the statistical analysis. Compared to 2D-SWE, the model achieved a higher area under the receiver operating characteristic curve (AUC) (0.89 vs 0.83, P = 0.025). It also achieved highly consistent diagnostic performance across all subanalyses (P values: 0.21-0.91), whereas 2D-SWE exhibited different AUCs in the country or region (P < 0.001) and etiology (P = 0.005) subanalyses but not in the manufacturer subanalysis (P = 0.24). The model demonstrated more accurate and robust performance in noninvasive cACLD diagnosis than 2D-SWE across different countries or regions, etiologies, and manufacturers.
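The abstract describes fusing elastographic images with scalar liver stiffness measurements (LSM). A minimal, hypothetical sketch of that fusion pattern is below: a small CNN encodes the elastogram, its features are concatenated with the LSM value, and a binary cACLD head produces a logit. All layer sizes are illustrative assumptions, not the paper's architecture.

```python
# Hedged sketch: image features + LSM scalar fused for binary cACLD prediction.
import torch
import torch.nn as nn

class ElastoFusionNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(            # toy elastogram encoder
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(                # image features + LSM scalar
            nn.Linear(32 + 1, 16), nn.ReLU(), nn.Linear(16, 1),
        )

    def forward(self, image: torch.Tensor, lsm_kpa: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(image)
        return self.head(torch.cat([feats, lsm_kpa], dim=1))  # cACLD logit

model = ElastoFusionNet()
logit = model(torch.randn(4, 3, 224, 224), torch.rand(4, 1) * 20)
print(torch.sigmoid(logit).shape)  # per-patient cACLD probability, shape (4, 1)
```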

From dictation to diagnosis: enhancing radiology reporting with integrated speech recognition in multimodal large language models.

Gertz RJ, Beste NC, Dratsch T, Lennartz S, Bremm J, Iuga AI, Bunck AC, Laukamp KR, Schönfeld M, Kottlors J

PubMed · Aug 15, 2025
This study evaluates the efficiency, accuracy, and cost-effectiveness of radiology reporting using audio multimodal large language models (LLMs) compared to conventional reporting with speech recognition software. We hypothesized that providing minimal audio input would enable a multimodal LLM to generate complete radiological reports. 480 reports from 80 retrospective multimodal imaging studies were reported by two board-certified radiologists using three workflows: a conventional workflow (C-WF) with speech recognition software to generate findings and impressions separately, and two LLM-based workflows (LLM-WF) using the state-of-the-art LLMs GPT-4o and Claude Sonnet 3.5. Outcome measures included reporting time, corrections, and personnel cost per report. Two radiologists assessed formal structure and report quality. Statistical analysis used ANOVA and Tukey's post hoc tests (p < 0.05). LLM-WF significantly reduced reporting time (GPT-4o/Sonnet 3.5: 38.9 s ± 22.7 s vs. C-WF: 88.0 s ± 60.9 s, p < 0.01), required fewer corrections (GPT-4o: 1.0 ± 1.1, Sonnet 3.5: 0.9 ± 1.0 vs. C-WF: 2.4 ± 2.5, p < 0.01), and lowered costs (GPT-4o: $2.3 ± $1.4, Sonnet 3.5: $2.4 ± $1.4 vs. C-WF: $3.0 ± $2.1, p < 0.01). Reports generated with Sonnet 3.5 were rated highest in quality, while GPT-4o and conventional reports showed no difference. Multimodal LLMs can generate high-quality radiology reports based solely on minimal audio input, with greater speed, fewer corrections, and reduced costs compared to conventional speech-based workflows. However, future implementation may involve licensing costs, and generalizability to broader clinical contexts warrants further evaluation. Question: How do the time, accuracy, cost, and report quality of reporting via the audio input functionality of GPT-4o and Claude Sonnet 3.5 compare with conventional reporting with speech recognition? Findings: Large language models enable radiological reporting via minimal audio input, reducing turnaround time and costs without quality loss compared to conventional reporting with speech recognition. Clinical relevance: Large language model-based reporting from minimal audio input has the potential to improve efficiency and report quality, supporting more streamlined workflows in clinical radiology.
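A hedged sketch of the audio-in, report-out pattern the study describes: a short dictation clip plus a structuring prompt is sent to an audio-capable chat model, which drafts a full report with Findings and Impression sections. The model name, file name, and prompt are assumptions for illustration; the study's exact pipeline is not specified in the abstract.

```python
# Hedged sketch: minimal audio dictation -> structured radiology report.
# "dictation.wav" and the model choice are hypothetical placeholders.
import base64
from openai import OpenAI

client = OpenAI()

with open("dictation.wav", "rb") as f:  # hypothetical dictation recording
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o-audio-preview",  # audio-capable stand-in model
    modalities=["text"],           # text-only output
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": (
                "From this dictated summary, draft a structured radiology "
                "report with separate Findings and Impression sections.")},
            {"type": "input_audio",
             "input_audio": {"data": audio_b64, "format": "wav"}},
        ],
    }],
)
print(response.choices[0].message.content)
```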

Fine-Tuned Large Language Model for Extracting Pretreatment Pancreatic Cancer According to Computed Tomography Radiology Reports.

Hirakawa H, Yasaka K, Nomura T, Tsujimoto R, Sonoda Y, Kiryu S, Abe O

PubMed · Aug 15, 2025
This study aimed to examine the performance of a fine-tuned large language model (LLM) in extracting pretreatment pancreatic cancer cases from computed tomography (CT) radiology reports and to compare it with that of human readers. This retrospective study included 2690, 886, and 378 CT reports for the training, validation, and test datasets, respectively. The clinical indication, image finding, and imaging diagnosis sections of each radiology report (used as input data) were reviewed and categorized into groups 0 (no pancreatic cancer), 1 (after treatment for pancreatic cancer), and 2 (pretreatment pancreatic cancer present) (used as reference data). A pre-trained Japanese Bidirectional Encoder Representations from Transformers (BERT) model was fine-tuned on the training and validation datasets. Because of group imbalance, group 1 data were undersampled and group 2 data were oversampled in the training dataset. The best-performing model from the validation set was then assessed on the test dataset. Additionally, three readers (readers 1, 2, and 3) classified the reports in the test dataset. The fine-tuned LLM and readers 1, 2, and 3 demonstrated an overall accuracy of 0.942, 0.984, 0.979, and 0.947; sensitivity for differentiating groups 0/1/2 of 0.944/0.960/0.921, 0.976/1.000/0.976, 0.984/0.984/0.968, and 1.000/1.000/0.841; and total classification times of 49 s, 2689 s, 3496 s, and 4887 s, respectively. The fine-tuned LLM effectively extracted patients with pretreatment pancreatic cancer from CT radiology reports, with performance comparable to that of the readers in a much shorter time.
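A minimal sketch of the setup described above: a pre-trained Japanese BERT checkpoint with a three-class sequence-classification head (0 = no pancreatic cancer, 1 = after treatment, 2 = pretreatment cancer). The checkpoint name and the toy report text are assumptions; the abstract does not name the exact model, and class rebalancing (undersampling group 1, oversampling group 2) would happen at dataset construction time.

```python
# Hedged sketch: 3-class report classification with a Japanese BERT.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "cl-tohoku/bert-base-japanese"  # assumed Japanese BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=3  # groups 0 / 1 / 2 from the abstract
)

report = "膵頭部に造影不良域あり。治療歴なし。"  # toy CT report text
inputs = tokenizer(report, truncation=True, max_length=512, return_tensors="pt")
pred = model(**inputs).logits.argmax(dim=-1).item()
print(["no cancer", "after treatment", "pretreatment cancer"][pred])
```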

A novel interpreted deep network for Alzheimer's disease prediction based on inverted self attention and vision transformer.

Ibrar W, Khan MA, Hamza A, Rubab S, Alqahtani O, Alouane MT, Teng S, Nam Y

PubMed · Aug 15, 2025
Alzheimer's disease (AD) is the most common cause of dementia worldwide. AD causes memory loss and progressive impairment of mental function in aging people, placing a significant burden on patients as well as on society. There is currently no treatment that can cure AD; however, early diagnosis can slow the disease's progression. Deep learning has shown substantial success in diagnosing AD, but challenges remain due to limited data, improper model selection, and extraction of irrelevant features. In this work, we propose a fully automated framework based on the fusion of a vision transformer and a novel inverted residual bottleneck with self-attention (IRBwSA) for AD diagnosis. In the first step, data augmentation was performed to balance the selected dataset. After that, the vision model was designed and modified according to the dataset, and a new inverted bottleneck self-attention model was developed. The designed models were trained on the augmented dataset, and the extracted features were fused using a novel search-based approach. Moreover, the designed models were interpreted using an explainable artificial intelligence technique named LIME. The fused features were finally classified using a shallow wide neural network and other classifiers. The experimental process was conducted on an augmented MRI dataset, yielding 96.1% accuracy and a 96.05% precision rate. Comparison with several recent techniques shows the proposed framework's better performance.
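A hypothetical sketch of what an "inverted residual bottleneck with self-attention" block could look like, in the spirit of the abstract: expand channels with a 1x1 convolution, apply multi-head self-attention over the spatial tokens, project back, and add a residual connection. The paper's exact layout may differ; all sizes here are illustrative.

```python
# Hedged sketch of an inverted-bottleneck block with self-attention.
import torch
import torch.nn as nn

class InvertedBottleneckSA(nn.Module):
    def __init__(self, channels: int, expansion: int = 4, heads: int = 4):
        super().__init__()
        hidden = channels * expansion
        self.expand = nn.Conv2d(channels, hidden, kernel_size=1)   # 1x1 expand
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.project = nn.Conv2d(hidden, channels, kernel_size=1)  # 1x1 project
        self.norm = nn.LayerNorm(hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        z = self.expand(x)
        tokens = self.norm(z.flatten(2).transpose(1, 2))  # (B, H*W, hidden)
        attn_out, _ = self.attn(tokens, tokens, tokens)   # spatial self-attention
        z = attn_out.transpose(1, 2).reshape(b, -1, h, w)
        return x + self.project(z)                        # residual connection

block = InvertedBottleneckSA(channels=32)
print(block(torch.randn(2, 32, 14, 14)).shape)  # torch.Size([2, 32, 14, 14])
```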

Is ChatGPT-5 Ready for Mammogram VQA?

Qiang Li, Shansong Wang, Mingzhe Hu, Mojtaba Safari, Zachary Eidex, Xiaofeng Yang

arXiv preprint · Aug 15, 2025
Mammogram visual question answering (VQA) integrates image interpretation with clinical reasoning and has the potential to support breast cancer screening. We systematically evaluated the GPT-5 family and the GPT-4o model on four public mammography datasets (EMBED, InBreast, CMMD, CBIS-DDSM) for BI-RADS assessment, abnormality detection, and malignancy classification tasks. GPT-5 was consistently the best-performing model but lagged behind both human experts and domain-specific fine-tuned models. On EMBED, GPT-5 achieved the highest scores among GPT variants in density (56.8%), distortion (52.5%), mass (64.5%), calcification (63.5%), and malignancy (52.8%) classification. On InBreast, it attained 36.9% BI-RADS accuracy, 45.9% abnormality detection, and 35.0% malignancy classification. On CMMD, GPT-5 reached 32.3% abnormality detection and 55.0% malignancy accuracy. On CBIS-DDSM, it achieved 69.3% BI-RADS accuracy, 66.0% abnormality detection, and 58.2% malignancy accuracy. Compared with human expert estimates, GPT-5 exhibited lower sensitivity (63.5%) and specificity (52.3%). While GPT-5 exhibits promising capabilities for screening tasks, its performance remains insufficient for high-stakes clinical imaging applications without targeted domain adaptation and optimization. However, the substantial improvement in performance from GPT-4o to GPT-5 shows a promising trend in the potential for general large language models (LLMs) to assist with mammography VQA tasks.
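A minimal sketch of the evaluation protocol such a benchmark implies: ask the model a fixed VQA question per mammogram and score exact-match accuracy against ground-truth labels. The dataset fields, `model_answer_fn` callback, and naive digit parsing are assumptions for illustration, not the paper's harness.

```python
# Hedged sketch: exact-match BI-RADS accuracy over a VQA case list.
def evaluate_birads(model_answer_fn, cases: list) -> float:
    """cases: [{'image': ..., 'birads': '4'}, ...]; model_answer_fn(image, q) -> text."""
    question = "What is the BI-RADS assessment category for this mammogram?"
    correct = 0
    for case in cases:
        answer = model_answer_fn(case["image"], question)
        digits = [ch for ch in answer if ch.isdigit()]  # naive reply parsing
        if digits and digits[0] == case["birads"]:
            correct += 1
    return correct / len(cases)

# Usage with a stub model that always answers "BI-RADS 4":
print(evaluate_birads(lambda img, q: "BI-RADS 4",
                      [{"image": None, "birads": "4"},
                       {"image": None, "birads": "2"}]))  # 0.5
```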

Efficient Image-to-Image Schrödinger Bridge for CT Field of View Extension

Zhenhao Li, Long Yang, Xiaojie Yin, Haijun Yu, Jiazhou Wang, Hongbin Han, Weigang Hu, Yixing Huang

arXiv preprint · Aug 15, 2025
Computed tomography (CT) is a cornerstone imaging modality for non-invasive, high-resolution visualization of internal anatomical structures. However, when the scanned object exceeds the scanner's field of view (FOV), projection data are truncated, resulting in incomplete reconstructions and pronounced artifacts near FOV boundaries. Conventional reconstruction algorithms struggle to recover accurate anatomy from such data, limiting clinical reliability. Deep learning approaches have been explored for FOV extension, with diffusion generative models representing the latest advances in image synthesis. Yet, conventional diffusion models are computationally demanding and slow at inference due to their iterative sampling process. To address these limitations, we propose an efficient CT FOV extension framework based on the image-to-image Schrödinger Bridge (I²SB) diffusion model. Unlike traditional diffusion models that synthesize images from pure Gaussian noise, I²SB learns a direct stochastic mapping between paired limited-FOV and extended-FOV images. This direct correspondence yields a more interpretable and traceable generative process, enhancing anatomical consistency and structural fidelity in reconstructions. I²SB achieves superior quantitative performance, with root-mean-square error (RMSE) values of 49.8 HU on simulated noisy data and 152.0 HU on real data, outperforming state-of-the-art diffusion models such as conditional denoising diffusion probabilistic models (cDDPM) and patch-based diffusion methods. Moreover, its one-step inference enables reconstruction in just 0.19 s per 2D slice, over a 700-fold speedup compared to cDDPM (135 s) and surpassing diffusionGAN (0.58 s), the second fastest. This combination of accuracy and efficiency makes I²SB highly suitable for real-time or clinical deployment.
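A heavily simplified, hypothetical sketch of the paired-bridge training idea the abstract describes: interpolate between an extended-FOV target x0 and its paired limited-FOV counterpart x1 along a Brownian-bridge-like path, and train a network to recover x0 from the interpolated state. This illustrates the concept of learning a direct mapping between paired images rather than from pure noise; it is not the authors' I²SB implementation, and the toy network, schedule, and loss are placeholders.

```python
# Hedged sketch: one training step on a simplified paired image bridge.
import torch
import torch.nn as nn

net = nn.Sequential(  # toy denoiser standing in for the paper's network
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 1, 3, padding=1)
)
opt = torch.optim.Adam(net.parameters(), lr=1e-4)

def bridge_step(x0: torch.Tensor, x1: torch.Tensor) -> torch.Tensor:
    """x0: extended-FOV CT slice (target); x1: paired limited-FOV slice."""
    t = torch.rand(x0.size(0), 1, 1, 1)                  # random bridge time
    noise = torch.randn_like(x0) * (t * (1 - t)).sqrt()  # bridge variance
    xt = (1 - t) * x0 + t * x1 + noise                   # point on the bridge
    loss = ((net(xt) - x0) ** 2).mean()                  # recover the target
    opt.zero_grad(); loss.backward(); opt.step()
    return loss

loss = bridge_step(torch.randn(4, 1, 64, 64), torch.randn(4, 1, 64, 64))
print(float(loss))
```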

Automating the Referral of Bone Metastases Patients With and Without the Use of Large Language Models.

Sangwon KL, Han X, Becker A, Zhang Y, Ni R, Zhang J, Alber DA, Alyakin A, Nakatsuka M, Fabbri N, Aphinyanaphongs Y, Yang JT, Chachoua A, Kondziolka D, Laufer I, Oermann EK

PubMed · Aug 15, 2025
Bone metastases affect more than 4.8% of patients with cancer annually, and spinal metastases in particular require urgent intervention to prevent neurological complications. However, the current process of manually reviewing radiological reports leads to potential delays in specialist referrals. We hypothesized that natural language processing (NLP) review of routine radiology reports could automate the referral process for timely multidisciplinary care of spinal metastases. We assessed 3 NLP models (a rule-based regular expression [RegEx] model, GPT-4, and a specialized Bidirectional Encoder Representations from Transformers [BERT] model, NYUTron) for automated detection and referral of bone metastases. Study inclusion criteria targeted patients with active cancer diagnoses who underwent advanced imaging (computed tomography, MRI, or positron emission tomography) without previous specialist referral. We defined 2 separate tasks: identifying clinically significant bone metastatic terms (lexical detection) and identifying cases needing specialist follow-up (clinical referral). Models were developed using 3754 hand-labeled advanced imaging studies in 2 phases: phase 1 focused on spine metastases, and phase 2 generalized to all bone metastases. Standard performance metrics were evaluated and compared across all stages and tasks. In lexical detection, the simple RegEx model achieved the highest performance (sensitivity 98.4%, specificity 97.6%, F1 = 0.965), followed by NYUTron (sensitivity 96.8%, specificity 89.9%, F1 = 0.787). For the clinical referral task, RegEx also demonstrated superior performance (sensitivity 92.3%, specificity 87.5%, F1 = 0.936), followed by a fine-tuned NYUTron model (sensitivity 90.0%, specificity 66.7%, F1 = 0.750). An NLP-based automated referral system can accurately identify patients with bone metastases requiring specialist evaluation. A simple RegEx model, combining syntax-based identification with expert-informed rule generation, recommended referral patients more efficiently than the advanced NLP models. This system could significantly reduce missed follow-ups and enhance timely intervention for patients with bone metastases.
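A minimal sketch of the rule-based arm: a regular-expression screen over report text for bone or spine metastasis phrasing. The pattern below is illustrative; the study's expert-curated rules are not published in this abstract.

```python
# Hedged sketch: RegEx screen for bone/spine metastasis language in reports.
import re

METASTASIS_PATTERN = re.compile(
    r"\b(osseous|bony|bone|spinal|vertebral)\s+(met(astas[ie]s|astatic)?"
    r"|lesion)s?\b",
    re.IGNORECASE,
)

def flag_report(report_text: str) -> bool:
    """True if the report contains phrasing worth routing to referral review."""
    return bool(METASTASIS_PATTERN.search(report_text))

print(flag_report("Multiple osseous metastases involving the thoracic spine."))  # True
print(flag_report("No acute intracranial abnormality."))                         # False
```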
