
LLM-Based Extraction of Imaging Features from Radiology Reports: Automating Disease Activity Scoring in Crohn's Disease.

Dehdab R, Mankertz F, Brendel JM, Maalouf N, Kaya K, Afat S, Kolahdoozan S, Radmard AR

PubMed · Aug 8, 2025
Large Language Models (LLMs) offer a promising solution for extracting structured clinical information from free-text radiology reports. The Simplified Magnetic Resonance Index of Activity (sMARIA) is a validated scoring system used to quantify Crohn's disease (CD) activity based on Magnetic Resonance Enterography (MRE) findings. This study aims to evaluate the performance of two advanced LLMs in extracting key imaging features and computing sMARIA scores from free-text MRE reports. This retrospective study included 117 anonymized free-text MRE reports from patients with confirmed CD. ChatGPT (GPT-4o) and DeepSeek (DeepSeek-R1) were prompted using a structured input designed to extract four key radiologic features relevant to sMARIA: bowel wall thickness, mural edema, perienteric fat stranding, and ulceration. LLM outputs were evaluated against radiologist annotations at both the segment and feature levels. Segment-level agreement was assessed using accuracy, mean absolute error (MAE) and Pearson correlation. Feature-level performance was evaluated using sensitivity, specificity, precision, and F1-score. Errors including confabulations were recorded descriptively. ChatGPT achieved a segment-level accuracy of 98.6%, MAE of 0.17, and Pearson correlation of 0.99. DeepSeek achieved 97.3% accuracy, MAE of 0.51, and correlation of 0.96. At the feature level, ChatGPT yielded an F1-score of 98.8% (precision 97.8%, sensitivity 99.9%), while DeepSeek achieved 97.9% (precision 96.0%, sensitivity 99.8%). LLMs demonstrate near-human accuracy in extracting structured information and computing sMARIA scores from free-text MRE reports. This enables automated assessment of CD activity without altering current reporting workflows, supporting longitudinal monitoring and large-scale research. Integration into clinical decision support systems may be feasible in the future, provided appropriate human oversight and validation are ensured.
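For orientation, the sMARIA arithmetic the models must reproduce is simple once the four features are extracted. Below is a minimal illustrative sketch (not the study's code), assuming the standard sMARIA weighting of 1 point each for wall thickness > 3 mm, mural edema, and perienteric fat stranding, and 2 points for ulceration, applied per bowel segment:

```python
# Illustrative per-segment sMARIA calculation from features an LLM might extract.
# Assumes the standard weighting: 1 pt each for wall thickness > 3 mm, mural edema,
# and perienteric fat stranding; 2 pts for ulceration. Not the study's actual code.
from dataclasses import dataclass

@dataclass
class SegmentFindings:
    wall_thickness_mm: float
    mural_edema: bool
    fat_stranding: bool
    ulceration: bool

def smaria_score(seg: SegmentFindings) -> int:
    score = 0
    if seg.wall_thickness_mm > 3.0:
        score += 1
    if seg.mural_edema:
        score += 1
    if seg.fat_stranding:
        score += 1
    if seg.ulceration:
        score += 2
    return score

# A thickened, edematous segment with ulceration scores 4.
print(smaria_score(SegmentFindings(5.2, True, False, True)))
```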

Value of artificial intelligence in neuro-oncology.

Voigtlaender S, Nelson TA, Karschnia P, Vaios EJ, Kim MM, Lohmann P, Galldiks N, Filbin MG, Azizi S, Natarajan V, Monje M, Dietrich J, Winter SF

PubMed · Aug 8, 2025
CNS cancers are complex, difficult-to-treat malignancies that remain insufficiently understood and mostly incurable, despite decades of research efforts. Artificial intelligence (AI) is poised to reshape neuro-oncological practice and research, driving advances in medical image analysis, neuro-molecular-genetic characterisation, biomarker discovery, therapeutic target identification, tailored management strategies, and neurorehabilitation. This Review examines key opportunities and challenges associated with AI applications along the neuro-oncological care trajectory. We highlight emerging trends in foundation models, biophysical modelling, synthetic data, and drug development and discuss regulatory, operational, and ethical hurdles across data, translation, and implementation gaps. Near-term clinical translation depends on scaling validated AI solutions for well-defined clinical tasks. In contrast, more experimental AI solutions offer broader potential but require technical refinement and resolution of data and regulatory challenges. Addressing both general and neuro-oncology-specific issues is essential to unlock the full potential of AI and ensure its responsible, effective, and needs-based integration into neuro-oncological practice.

GPT-4 vs. Radiologists: who advances mediastinal tumor classification better across report quality levels? A cohort study.

Wen R, Li X, Chen K, Sun M, Zhu C, Xu P, Chen F, Ji C, Mi P, Li X, Deng X, Yang Q, Song W, Shang Y, Huang S, Zhou M, Wang J, Zhou C, Chen W, Liu C

PubMed · Aug 8, 2025
Accurate mediastinal tumor classification is crucial for treatment planning, but diagnostic performance varies with radiologists' experience and report quality. This study aimed to evaluate GPT-4's diagnostic accuracy in classifying mediastinal tumors from radiological reports of varying quality compared to radiologists of different experience levels. We conducted a retrospective study of 1,494 patients from five tertiary hospitals with mediastinal tumors diagnosed via chest CT and pathology. Radiological reports were categorized into low-, medium-, and high-quality based on predefined criteria assessed by experienced radiologists. Six radiologists (two residents, two attending radiologists, and two associate senior radiologists) and GPT-4 evaluated the chest CT reports. Diagnostic performance was analyzed overall, by report quality, and by tumor type using Wald χ² tests and 95% CIs calculated via the Wilson method. GPT-4 achieved an overall diagnostic accuracy of 73.3% (95% CI: 71.0-75.5), comparable to associate senior radiologists (74.3%, 95% CI: 72.0-76.5; p > 0.05). For low-quality reports, GPT-4 outperformed associate senior radiologists (60.8% vs. 51.1%, p < 0.001). In high-quality reports, GPT-4 was comparable to attending radiologists (80.6% vs. 79.4%, p > 0.05). Diagnostic performance varied by tumor type: GPT-4 was comparable to radiology residents for neurogenic tumors (44.9% vs. 50.3%, p > 0.05), similar to associate senior radiologists for teratomas (68.1% vs. 65.9%, p > 0.05), and superior in diagnosing lymphoma (75.4% vs. 60.4%, p < 0.001). GPT-4 demonstrated interpretation accuracy comparable to associate senior radiologists, excelling in low-quality reports and outperforming them in diagnosing lymphoma. These findings underscore GPT-4's potential to enhance diagnostic performance in challenging diagnostic scenarios.
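As a side note, the Wilson method mentioned above has a simple closed form that can be checked against the reported intervals. The sketch below (illustrative only) reproduces the 71.0-75.5% interval around GPT-4's 73.3% overall accuracy on 1,494 reports:

```python
# Wilson score 95% interval for a diagnostic accuracy proportion (illustrative).
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# GPT-4's overall accuracy: roughly 73.3% of 1,494 reports classified correctly.
lo, hi = wilson_ci(round(0.733 * 1494), 1494)
print(f"{lo:.3f} - {hi:.3f}")  # ~0.710 - 0.755, matching the reported 71.0-75.5%
```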

Transformer-Based Explainable Deep Learning for Breast Cancer Detection in Mammography: The MammoFormer Framework

Ojonugwa Oluwafemi Ejiga Peter, Daniel Emakporuena, Bamidele Dayo Tunde, Maryam Abdulkarim, Abdullahi Bn Umar

arXiv preprint · Aug 8, 2025
Breast cancer detection through mammography interpretation remains difficult because the abnormalities experts must identify are often subtle and interpretations vary between readers. CNNs for medical image analysis face two limitations: they do not adequately capture both local detail and wide contextual information, and they lack the explainable AI (XAI) capabilities clinicians need to accept them in clinics. The authors developed MammoFormer, a framework that unites transformer-based architectures with multi-feature enhancement components and XAI functionality. Seven architectures, spanning CNNs, Vision Transformer, Swin Transformer, and ConvNeXt, were tested alongside four enhancement techniques: original images, negative transformation, adaptive histogram equalization (AHE), and histogram of oriented gradients (HOG). MammoFormer addresses critical clinical adoption barriers of AI mammography systems through (1) systematic optimization of transformer architectures via architecture-specific feature enhancement, achieving up to 13% performance improvement, (2) comprehensive explainable AI integration providing multi-perspective diagnostic interpretability, and (3) a clinically deployable ensemble system combining CNN reliability with transformer global context modeling. With suitable feature enhancement, the transformer models match or outperform the CNN approaches: ViT reaches 98.3% accuracy with AHE, and Swin Transformer gains 13.0% from HOG enhancement.
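For readers curious what the four input variants look like in practice, here is a small, hypothetical preprocessing sketch using scikit-image (not the MammoFormer code); the normalization choices and library calls are assumptions:

```python
# Hypothetical sketch of the four enhancement variants described (original,
# negative, adaptive histogram equalization, HOG map) for a 2-D grayscale mammogram.
import numpy as np
from skimage import exposure, feature

def enhancement_variants(img: np.ndarray) -> dict[str, np.ndarray]:
    img = img.astype(np.float64)
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)   # scale to [0, 1]
    negative = 1.0 - img                                       # negative transformation
    ahe = exposure.equalize_adapthist(img)                     # adaptive histogram equalization
    _, hog_map = feature.hog(img, visualize=True)              # histogram of oriented gradients
    return {"original": img, "negative": negative, "ahe": ahe, "hog": hog_map}
```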

Structured Report Generation for Breast Cancer Imaging Based on Large Language Modeling: A Comparative Analysis of GPT-4 and DeepSeek.

Chen K, Hou X, Li X, Xu W, Yi H

PubMed · Aug 7, 2025
The purpose of this study is to compare the performance of GPT-4 and DeepSeek large language models in generating structured breast cancer multimodality imaging integrated reports from free-text radiology reports including mammography, ultrasound, MRI, and PET/CT. A retrospective analysis was conducted on 1358 free-text reports from 501 breast cancer patients across two institutions. The study design involved synthesizing multimodal imaging data into structured reports with three components: primary lesion characteristics, metastatic lesions, and TNM staging. Input prompts were standardized for both models, with GPT-4 using predesigned instructions and DeepSeek requiring manual input. Reports were evaluated based on physician satisfaction using a Likert scale, descriptive accuracy including lesion localization, size, SUV, and metastasis assessment, and TNM staging correctness according to NCCN guidelines. Statistical analysis included McNemar tests for binary outcomes and correlation analysis for multiclass comparisons with a significance threshold of P < .05. Physician satisfaction scores showed strong correlation between the models (r = 0.665 and 0.558, P < .001). Both models demonstrated high accuracy in data extraction and integration. The mean accuracy for primary lesion features was 91.7% for GPT-4 and 92.1% for DeepSeek, while feature synthesis accuracy was 93.4% for GPT-4 and 93.9% for DeepSeek. Metastatic lesion identification showed comparable overall accuracy at 93.5% for GPT-4 and 94.4% for DeepSeek. GPT-4 performed better in pleural lesion detection with 94.9% accuracy compared to 79.5% for DeepSeek, whereas DeepSeek achieved higher accuracy in mesenteric metastasis identification at 87.5% vs. 43.8% for GPT-4. TNM staging accuracy exceeded 92% for T-stage and 94% for M-stage, with N-stage accuracy improving beyond 90% when supplemented with physical exam data. Both GPT-4 and DeepSeek effectively generate structured breast cancer imaging reports with high accuracy in data mining, integration, and TNM staging. Integrating these models into clinical practice is expected to enhance report standardization and physician productivity.
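The McNemar comparison mentioned above operates on paired correct/incorrect calls from the two models; a minimal sketch with made-up counts (using statsmodels, which is an assumption about tooling) looks like this:

```python
# McNemar test on paired model outcomes; the counts below are purely illustrative.
from statsmodels.stats.contingency_tables import mcnemar

# Rows: GPT-4 correct / wrong; columns: DeepSeek correct / wrong.
table = [[430, 38],
         [25, 8]]
result = mcnemar(table, exact=True)  # exact binomial test on the discordant pairs (38 vs. 25)
print(result.statistic, result.pvalue)
```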

Generative Artificial Intelligence in Medical Imaging: Foundations, Progress, and Clinical Translation

Xuanru Zhou, Cheng Li, Shuqiang Wang, Ye Li, Tao Tan, Hairong Zheng, Shanshan Wang

arXiv preprint · Aug 7, 2025
Generative artificial intelligence (AI) is rapidly transforming medical imaging by enabling capabilities such as data synthesis, image enhancement, modality translation, and spatiotemporal modeling. This review presents a comprehensive and forward-looking synthesis of recent advances in generative modeling, including generative adversarial networks (GANs), variational autoencoders (VAEs), diffusion models, and emerging multimodal foundation architectures, and evaluates their expanding roles across the clinical imaging continuum. We systematically examine how generative AI contributes to key stages of the imaging workflow, from acquisition and reconstruction to cross-modality synthesis, diagnostic support, and treatment planning. Emphasis is placed on both retrospective and prospective clinical scenarios, where generative models help address longstanding challenges such as data scarcity, standardization, and integration across modalities. To promote rigorous benchmarking and translational readiness, we propose a three-tiered evaluation framework encompassing pixel-level fidelity, feature-level realism, and task-level clinical relevance. We also identify critical obstacles to real-world deployment, including generalization under domain shift, hallucination risk, data privacy concerns, and regulatory hurdles. Finally, we explore the convergence of generative AI with large-scale foundation models, highlighting how this synergy may enable the next generation of scalable, reliable, and clinically integrated imaging systems. By charting technical progress and translational pathways, this review aims to guide future research and foster interdisciplinary collaboration at the intersection of AI, medicine, and biomedical engineering.
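As a rough illustration of what the pixel-level tier of such a framework could involve (the feature-level and task-level tiers additionally require a feature extractor and a downstream clinical task), here is a minimal sketch using scikit-image metrics, not drawn from the review itself:

```python
# Pixel-level fidelity metrics for a synthetic image against a real reference
# (2-D grayscale assumed); feature- and task-level tiers are not shown here.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def pixel_level_fidelity(real: np.ndarray, synthetic: np.ndarray) -> dict[str, float]:
    data_range = float(real.max() - real.min())
    return {
        "psnr": peak_signal_noise_ratio(real, synthetic, data_range=data_range),
        "ssim": structural_similarity(real, synthetic, data_range=data_range),
    }
```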

Improving Radiology Report Generation with Semantic Understanding.

Ahn S, Park H, Yoo J, Choi J

PubMed · Aug 7, 2025
This study proposes RRG-LLM, a model designed to enhance radiology report generation (RRG) by learning the medical domain effectively with minimal computational resources. First, the LLM is fine-tuned with LoRA, enabling efficient adaptation to the medical domain. Subsequently, only the linear projection layer that projects image features into the text embedding space is fine-tuned, so that important information is extracted from the radiology image and mapped onto the text dimension. The proposed model demonstrated notable improvements in report generation: ROUGE-L improved by 0.096 (51.7%) and METEOR by 0.046 (42.85%) over the baseline model.
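A minimal sketch of the two-stage tuning described, assuming a Hugging Face / PEFT-style setup; the model name, target modules, and vision feature dimension are placeholders, not details from the paper:

```python
# Stage 1: LoRA adapters for domain adaptation; Stage 2: train only the
# image-to-text projection layer. Names and dimensions are assumptions.
import torch.nn as nn
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder LLM

# Stage 1: attach LoRA adapters to the attention projections and fine-tune them.
lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
llm = get_peft_model(llm, lora_cfg)
# ... fine-tune on radiology-domain text here ...

# Stage 2: freeze the LLM; train only the linear projection from image features
# into the LLM's text embedding space.
vision_dim = 1024                                  # assumed vision encoder width
projection = nn.Linear(vision_dim, llm.config.hidden_size)
for p in llm.parameters():
    p.requires_grad = False
for p in projection.parameters():
    p.requires_grad = True
```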

Coarse-to-Fine Joint Registration of MR and Ultrasound Images via Imaging Style Transfer

Junyi Wang, Xi Zhu, Yikun Guo, Zixi Wang, Haichuan Gao, Le Zhang, Fan Zhang

arXiv preprint · Aug 7, 2025
We developed a pipeline for registering pre-surgery Magnetic Resonance (MR) images and post-resection Ultrasound (US) images. Our approach leverages unpaired style transfer using 3D CycleGAN to generate synthetic T1 images, thereby enhancing registration performance. Additionally, our registration process employs both affine and local deformable transformations for a coarse-to-fine registration. The results demonstrate that our approach improves the consistency between MR and US image pairs in most cases.
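A rough sketch of what the coarse-to-fine stage could look like with SimpleITK is shown below; the CycleGAN synthesis step that produces the synthetic T1 image is omitted, and none of the parameter choices are from the paper:

```python
# Coarse (affine) then fine (B-spline deformable) registration with SimpleITK.
# Parameter values are illustrative, not the authors' settings.
import SimpleITK as sitk

def coarse_to_fine_register(fixed: sitk.Image, moving: sitk.Image) -> sitk.Transform:
    fixed = sitk.Cast(fixed, sitk.sitkFloat32)
    moving = sitk.Cast(moving, sitk.sitkFloat32)

    # Coarse stage: affine registration driven by mutual information.
    init = sitk.CenteredTransformInitializer(
        fixed, moving, sitk.AffineTransform(3),
        sitk.CenteredTransformInitializerFilter.GEOMETRY)
    reg = sitk.ImageRegistrationMethod()
    reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
    reg.SetInterpolator(sitk.sitkLinear)
    reg.SetOptimizerAsGradientDescent(learningRate=1.0, numberOfIterations=200)
    reg.SetInitialTransform(init, inPlace=False)
    affine_tx = reg.Execute(fixed, moving)

    # Fine stage: local B-spline deformable registration on top of the affine result.
    bspline = sitk.BSplineTransformInitializer(fixed, transformDomainMeshSize=[8, 8, 8])
    reg2 = sitk.ImageRegistrationMethod()
    reg2.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
    reg2.SetInterpolator(sitk.sitkLinear)
    reg2.SetOptimizerAsLBFGSB()
    reg2.SetMovingInitialTransform(affine_tx)
    reg2.SetInitialTransform(bspline, inPlace=False)
    deformable_tx = reg2.Execute(fixed, moving)

    # Combine both stages into a single composite transform.
    composite = sitk.CompositeTransform(3)
    composite.AddTransform(affine_tx)
    composite.AddTransform(deformable_tx)
    return composite
```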

MoMA: A Mixture-of-Multimodal-Agents Architecture for Enhancing Clinical Prediction Modelling

Jifan Gao, Mahmudur Rahman, John Caskey, Madeline Oguss, Ann O'Rourke, Randy Brown, Anne Stey, Anoop Mayampurath, Matthew M. Churpek, Guanhua Chen, Majid Afshar

arXiv preprint · Aug 7, 2025
Multimodal electronic health record (EHR) data provide richer, complementary insights into patient health compared to single-modality data. However, effectively integrating diverse data modalities for clinical prediction modeling remains challenging due to the substantial data requirements. We introduce a novel architecture, Mixture-of-Multimodal-Agents (MoMA), designed to leverage multiple large language model (LLM) agents for clinical prediction tasks using multimodal EHR data. MoMA employs specialized LLM agents ("specialist agents") to convert non-textual modalities, such as medical images and laboratory results, into structured textual summaries. These summaries, together with clinical notes, are combined by another LLM ("aggregator agent") to generate a unified multimodal summary, which is then used by a third LLM ("predictor agent") to produce clinical predictions. Evaluated on three prediction tasks using real-world datasets with different modality combinations and prediction settings, MoMA outperforms current state-of-the-art methods, highlighting its enhanced accuracy and flexibility across various tasks.
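A schematic sketch of the agent chain described, with a placeholder call_llm standing in for whichever LLM backend and prompts are actually used:

```python
# Schematic MoMA-style chain: specialist agents -> aggregator agent -> predictor agent.
# call_llm and the prompts are placeholders, not the authors' implementation.
def call_llm(instruction: str, content: str) -> str:
    raise NotImplementedError("plug in an LLM client here")

def moma_predict(imaging_input: str, lab_results: str, clinical_notes: str) -> str:
    # Specialist agents: turn non-textual modalities into structured textual summaries.
    imaging_summary = call_llm("Summarize the key imaging findings.", imaging_input)
    lab_summary = call_llm("Summarize the key laboratory abnormalities.", lab_results)

    # Aggregator agent: fuse the modality summaries with the clinical notes.
    fused = call_llm(
        "Combine these sources into one concise multimodal summary.",
        "\n\n".join([imaging_summary, lab_summary, clinical_notes]),
    )

    # Predictor agent: produce the clinical prediction from the unified summary.
    return call_llm("Predict the outcome of interest and briefly justify it.", fused)
```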

Response Assessment in Hepatocellular Carcinoma: A Primer for Radiologists.

Mroueh N, Cao J, Srinivas Rao S, Ghosh S, Song OK, Kongboonvijit S, Shenoy-Bhangle A, Kambadakone A

PubMed · Aug 7, 2025
Hepatocellular carcinoma (HCC) is the third leading cause of cancer-related deaths worldwide, necessitating accurate and early diagnosis to guide therapy, along with assessment of treatment response. Response assessment criteria have evolved from traditional morphologic approaches, such as the WHO criteria and the Response Evaluation Criteria in Solid Tumors (RECIST), to more recent methods focused on evaluating viable tumor burden, including the European Association for the Study of the Liver (EASL) criteria, modified RECIST (mRECIST), and the Liver Imaging Reporting and Data System (LI-RADS) Treatment Response (LI-TR) algorithm. This shift reflects the complex and evolving landscape of HCC treatment in the context of emerging systemic and locoregional therapies. Each of these criteria has its own nuanced strengths and limitations in capturing the detailed characteristics of HCC treatment and response assessment. The emergence of functional imaging techniques, including dual-energy CT and perfusion imaging, along with the rising use of radiomics, is enhancing the capabilities of response assessment. Growth in the realm of artificial intelligence and machine learning models provides an opportunity to refine the precision of response assessment by facilitating analysis of complex imaging data patterns. This review article provides a comprehensive overview of existing criteria, discusses functional and emerging imaging techniques, and outlines future directions for advancing HCC tumor response assessment.