
GPT-4 for automated sequence-level determination of MRI protocols based on radiology request forms from clinical routine.

Terzis R, Kaya K, Schömig T, Janssen JP, Iuga AI, Kottlors J, Lennartz S, Gietzen C, Gözdas C, Müller L, Hahnfeldt R, Maintz D, Dratsch T, Pennig L

pubmed logopapersAug 8 2025
This study evaluated GPT-4's accuracy in MRI sequence selection based on radiology request forms (RRFs), comparing its performance to radiology residents. This retrospective study included 100 RRFs across four subspecialties (cardiac imaging, neuroradiology, musculoskeletal, and oncology). GPT-4 and two radiology residents (R1: 2 years, R2: 5 years MRI experience) selected sequences based on each patient's medical history and clinical questions. Considering imaging society guidelines, five board-certified specialized radiologists assessed protocols based on completeness, quality, and utility in consensus, using 5-point Likert scales. Clinical applicability was rated binarily by the institution's lead radiographer. GPT-4 achieved median scores of 3 (1-5) for completeness, 4 (1-5) for quality, and 4 (1-5) for utility, comparable to R1 (3 (1-5), 4 (1-5), 4 (1-5); each p > 0.05) but inferior to R2 (4 (1-5), 5 (1-5); p < 0.01, respectively, and 5 (1-5); p < 0.001). Subspecialty protocol quality varied: GPT-4 matched R1 (4 (2-4) vs. 4 (2-5), p = 0.20) and R2 (4 (2-5); p = 0.47) in cardiac imaging; showed no differences in neuroradiology (all 5 (1-5), p > 0.05); scored lower than R1 and R2 in musculoskeletal imaging (3 (2-5) vs. 4 (3-5); p < 0.01, and 5 (3-5); p < 0.001); and matched R1 (4 (1-5) vs. 2 (1-4), p = 0.12) as well as R2 (5 (2-5); p = 0.20) in oncology. GPT-4-based protocols were clinically applicable in 95% of cases, comparable to R1 (95%) and R2 (96%). GPT-4 generated MRI protocols with notable completeness, quality, utility, and clinical applicability, excelling in standardized subspecialties like cardiac and neuroradiology imaging while yielding lower accuracy in musculoskeletal examinations.

Question: Long MRI acquisition times limit patient access, making accurate protocol selection crucial for efficient diagnostics, though it's time-consuming and error-prone, especially for inexperienced residents.

Findings: GPT-4 generated MRI protocols of remarkable yet inconsistent quality, performing on par with an experienced resident in standardized fields, but moderately in musculoskeletal examinations.

Clinical relevance: The large language model can assist less experienced radiologists in determining detailed MRI protocols and counteract increasing workloads. The model could function as a semi-automatic tool, generating MRI protocols for radiologists' confirmation, optimizing resource allocation, and improving diagnostics and cost-effectiveness.
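The statistics above report medians (ranges) of paired 5-point Likert ratings plus pairwise p-values. As a rough illustration of how such paired ordinal ratings could be summarized and compared, here is a minimal sketch; the Wilcoxon signed-rank test and the rating values are assumptions, since the abstract does not state which test was used.

```python
# Minimal sketch: summarize paired 5-point Likert ratings (e.g., protocol quality
# for GPT-4 vs. a resident on the same 100 RRFs) and compare them.
# The test and the data below are assumptions, not the study's actual analysis.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
gpt4_quality = rng.integers(1, 6, size=100)      # hypothetical ratings, 1-5
resident_quality = rng.integers(2, 6, size=100)  # hypothetical ratings, 1-5

def summarize(scores):
    """Return 'median (min-max)' in the style reported in the abstract."""
    return f"{int(np.median(scores))} ({scores.min()}-{scores.max()})"

print("GPT-4:", summarize(gpt4_quality), "| Resident:", summarize(resident_quality))

# Paired, non-parametric comparison of ordinal ratings (assumed choice of test).
stat, p = stats.wilcoxon(gpt4_quality, resident_quality)
print(f"Wilcoxon signed-rank: W={stat:.1f}, p={p:.3f}")
```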

LLM-Based Extraction of Imaging Features from Radiology Reports: Automating Disease Activity Scoring in Crohn's Disease.

Dehdab R, Mankertz F, Brendel JM, Maalouf N, Kaya K, Afat S, Kolahdoozan S, Radmard AR

pubmed logopapersAug 8 2025
Large Language Models (LLMs) offer a promising solution for extracting structured clinical information from free-text radiology reports. The Simplified Magnetic Resonance Index of Activity (sMARIA) is a validated scoring system used to quantify Crohn's disease (CD) activity based on Magnetic Resonance Enterography (MRE) findings. This study aims to evaluate the performance of two advanced LLMs in extracting key imaging features and computing sMARIA scores from free-text MRE reports. This retrospective study included 117 anonymized free-text MRE reports from patients with confirmed CD. ChatGPT (GPT-4o) and DeepSeek (DeepSeek-R1) were prompted using a structured input designed to extract four key radiologic features relevant to sMARIA: bowel wall thickness, mural edema, perienteric fat stranding, and ulceration. LLM outputs were evaluated against radiologist annotations at both the segment and feature levels. Segment-level agreement was assessed using accuracy, mean absolute error (MAE) and Pearson correlation. Feature-level performance was evaluated using sensitivity, specificity, precision, and F1-score. Errors including confabulations were recorded descriptively. ChatGPT achieved a segment-level accuracy of 98.6%, MAE of 0.17, and Pearson correlation of 0.99. DeepSeek achieved 97.3% accuracy, MAE of 0.51, and correlation of 0.96. At the feature level, ChatGPT yielded an F1-score of 98.8% (precision 97.8%, sensitivity 99.9%), while DeepSeek achieved 97.9% (precision 96.0%, sensitivity 99.8%). LLMs demonstrate near-human accuracy in extracting structured information and computing sMARIA scores from free-text MRE reports. This enables automated assessment of CD activity without altering current reporting workflows, supporting longitudinal monitoring and large-scale research. Integration into clinical decision support systems may be feasible in the future, provided appropriate human oversight and validation are ensured.
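A minimal sketch of the segment-level (accuracy, MAE, Pearson correlation) and feature-level (precision, sensitivity, F1-score) agreement metrics described above; the score and label arrays are illustrative placeholders, not the study's data.

```python
# Segment- and feature-level agreement between LLM output and radiologist reference.
import numpy as np
from scipy.stats import pearsonr

# Hypothetical per-segment sMARIA scores (radiologist vs. LLM).
ref = np.array([0, 2, 5, 3, 0, 6, 2, 4])
llm = np.array([0, 2, 5, 4, 0, 6, 2, 4])

accuracy = np.mean(ref == llm)            # exact segment-level agreement
mae = np.mean(np.abs(ref - llm))          # mean absolute error
r, _ = pearsonr(ref, llm)                 # Pearson correlation
print(f"accuracy={accuracy:.3f}, MAE={mae:.2f}, r={r:.2f}")

# Hypothetical binary feature labels (e.g., ulceration present/absent per segment).
ref_feat = np.array([1, 0, 1, 1, 0, 1, 0, 0])
llm_feat = np.array([1, 0, 1, 1, 0, 1, 1, 0])
tp = np.sum((ref_feat == 1) & (llm_feat == 1))
fp = np.sum((ref_feat == 0) & (llm_feat == 1))
fn = np.sum((ref_feat == 1) & (llm_feat == 0))
precision = tp / (tp + fp)
sensitivity = tp / (tp + fn)
f1 = 2 * precision * sensitivity / (precision + sensitivity)
print(f"precision={precision:.2f}, sensitivity={sensitivity:.2f}, F1={f1:.2f}")
```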

Transformer-Based Explainable Deep Learning for Breast Cancer Detection in Mammography: The MammoFormer Framework

Ojonugwa Oluwafemi Ejiga Peter, Daniel Emakporuena, Bamidele Dayo Tunde, Maryam Abdulkarim, Abdullahi Bn Umar

arxiv logopreprintAug 8 2025
Breast cancer detection through mammography interpretation remains difficult because of the subtle abnormalities that experts need to identify and the variable interpretations between readers. The potential of CNNs for medical image analysis faces two limitations: they fail to process both local information and wide contextual data adequately, and they do not provide the explainable AI (XAI) operations that doctors need to accept them in clinics. The authors developed the MammoFormer framework, which unites transformer-based architecture with multi-feature enhancement components and XAI functionalities within one framework. Seven different architectures consisting of CNNs, Vision Transformer, Swin Transformer, and ConvNext were tested alongside four enhancement techniques: original images, negative transformation, adaptive histogram equalization, and histogram of oriented gradients. The MammoFormer framework addresses critical clinical adoption barriers of AI mammography systems through: (1) systematic optimization of transformer architectures via architecture-specific feature enhancement, achieving up to 13% performance improvement, (2) comprehensive explainable AI integration providing multi-perspective diagnostic interpretability, and (3) a clinically deployable ensemble system combining CNN reliability with transformer global context modeling. The combination of transformer models with suitable feature enhancements enables them to achieve equal or better results than CNN approaches: ViT achieves 98.3% accuracy with AHE, while the Swin Transformer gains a 13.0% improvement through HOG enhancement.
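Two of the enhancement techniques named above, adaptive histogram equalization and histogram of oriented gradients, are available in scikit-image. The sketch below applies them to a stand-in grayscale array (not mammography data) to show the kind of preprocessing the framework evaluates before feeding images to the classifiers.

```python
# Preprocessing sketch: AHE and HOG on a placeholder grayscale image.
import numpy as np
from skimage import exposure, feature

image = np.random.rand(256, 256)  # placeholder image in [0, 1], not a mammogram

# Adaptive histogram equalization (CLAHE variant in scikit-image).
ahe = exposure.equalize_adapthist(image, clip_limit=0.03)

# Negative transformation is a simple intensity inversion.
negative = 1.0 - image

# Histogram of oriented gradients; the feature vector or the HOG image can be used.
hog_vec, hog_image = feature.hog(
    image,
    orientations=9,
    pixels_per_cell=(16, 16),
    cells_per_block=(2, 2),
    visualize=True,
)
print(ahe.shape, negative.shape, hog_vec.shape, hog_image.shape)
```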

GPT-4 vs. Radiologists: who advances mediastinal tumor classification better across report quality levels? A cohort study.

Wen R, Li X, Chen K, Sun M, Zhu C, Xu P, Chen F, Ji C, Mi P, Li X, Deng X, Yang Q, Song W, Shang Y, Huang S, Zhou M, Wang J, Zhou C, Chen W, Liu C

pubmed logopapersAug 8 2025
Accurate mediastinal tumor classification is crucial for treatment planning, but diagnostic performance varies with radiologists' experience and report quality. To evaluate GPT-4's diagnostic accuracy in classifying mediastinal tumors from radiological reports compared to radiologists of different experience levels using radiological reports of varying quality. We conducted a retrospective study of 1,494 patients from five tertiary hospitals with mediastinal tumors diagnosed via chest CT and pathology. Radiological reports were categorized into low-, medium-, and high-quality based on predefined criteria assessed by experienced radiologists. Six radiologists (two residents, two attending radiologists, and two associate senior radiologists) and GPT-4 evaluated the chest CT reports. Diagnostic performance was analyzed overall, by report quality, and by tumor type using Wald χ2 tests and 95% CIs calculated via the Wilson method. GPT-4 achieved an overall diagnostic accuracy of 73.3% (95% CI: 71.0-75.5), comparable to associate senior radiologists (74.3%, 95% CI: 72.0-76.5; p > 0.05). For low-quality reports, GPT-4 outperformed associate senior radiologists (60.8% vs. 51.1%, p < 0.001). In high-quality reports, GPT-4 was comparable to attending radiologists (80.6% vs. 79.4%, p > 0.05). Diagnostic performance varied by tumor type: GPT-4 was comparable to radiology residents for neurogenic tumors (44.9% vs. 50.3%, p > 0.05), similar to associate senior radiologists for teratomas (68.1% vs. 65.9%, p > 0.05), and superior in diagnosing lymphoma (75.4% vs. 60.4%, p < 0.001). GPT-4 demonstrated interpretation accuracy comparable to associate senior radiologists, excelling in low-quality reports and outperforming them in diagnosing lymphoma. These findings underscore GPT-4's potential to enhance diagnostic performance in challenging diagnostic scenarios.
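The abstract reports 95% CIs calculated via the Wilson method. A minimal sketch of that interval follows, using assumed counts that roughly reproduce GPT-4's overall accuracy of 73.3% on 1,494 reports; the exact counts are not given in the abstract.

```python
# Wilson score interval for a binomial proportion (e.g., diagnostic accuracy).
from math import sqrt

def wilson_ci(correct: int, total: int, z: float = 1.96):
    """95% Wilson score interval for correct/total."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half

# Assumed counts (~73.3% accuracy on 1,494 reports), for illustration only.
lo, hi = wilson_ci(correct=1095, total=1494)
print(f"95% CI: {lo:.3f}-{hi:.3f}")
```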

Synthesized myelin and iron stainings from 7T multi-contrast MRI via deep learning.

Pittayapong S, Hametner S, Bachrata B, Endmayr V, Bogner W, Höftberger R, Grabner G

pubmed logopapersAug 8 2025
Iron and myelin are key biomarkers for studying neurodegenerative and demyelinating brain diseases. Multi-contrast MRI techniques, such as R2* and QSM, are commonly used for iron assessment, with histology as the reference standard, but non-invasive myelin assessment remains challenging. To address this, we developed a deep learning model to generate iron and myelin staining images from in vivo multi-contrast MRI data, with a resolution comparable to ex vivo histology macro-scans. A cadaver head was scanned using a 7T MR scanner to acquire T1-weighted and multi-echo GRE data for R2* and QSM processing, followed by histological staining for myelin and iron. To evaluate the generalizability of the model, a second cadaver head and two in vivo MRI datasets were included. After MRI-to-histology registration in the training subject, a self-attention generative adversarial network (GAN) was trained to synthesize myelin and iron staining images using various combinations of MRI contrasts. The model achieved optimal myelin prediction when combining T1w, R2*, and QSM images. Incorporating the synthesized myelin images improved the subsequent prediction of iron staining. The generated images displayed fine details similar to those in histology data and demonstrated generalizability across healthy control subjects. Synthesized myelin images clearly differentiated myelin concentration between white and gray matter, while synthesized iron staining presented distinct patterns such as particularly high deposition in deep gray matter. This study shows that deep learning can transform MRI data into histological feature images, offering ex vivo insights from in vivo data and contributing to advancements in brain histology research.
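As a rough sketch of the conditional-GAN idea described above (co-registered T1w, R2*, and QSM maps stacked as input channels, a histology staining image as target), the following PyTorch snippet runs one discriminator and one generator update on dummy tensors. The architectures, losses, and weights are simplified assumptions and omit the self-attention blocks of the actual model.

```python
# Conditional image-to-image GAN sketch: MRI channels in, staining image out.
import torch
import torch.nn as nn

gen = nn.Sequential(  # stand-in for the self-attention generator
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),
)
disc = nn.Sequential(  # stand-in conditional discriminator (MRI + staining -> score map)
    nn.Conv2d(4, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(32, 1, 4, stride=2, padding=1),
)
opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

mri = torch.rand(2, 3, 128, 128)     # [T1w, R2*, QSM] channels (dummy data)
stain = torch.rand(2, 1, 128, 128)   # registered staining target (dummy data)

# Discriminator step: real vs. synthesized staining, conditioned on the MRI input.
fake = gen(mri)
d_real = disc(torch.cat([mri, stain], dim=1))
d_fake = disc(torch.cat([mri, fake.detach()], dim=1))
loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: fool the discriminator and stay close to the histology target.
d_fake = disc(torch.cat([mri, fake], dim=1))
loss_g = bce(d_fake, torch.ones_like(d_fake)) + 100.0 * l1(fake, stain)
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
print(loss_d.item(), loss_g.item())
```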

Text Embedded Swin-UMamba for DeepLesion Segmentation

Ruida Cheng, Tejas Sudharshan Mathai, Pritam Mukherjee, Benjamin Hou, Qingqing Zhu, Zhiyong Lu, Matthew McAuliffe, Ronald M. Summers

arxiv logopreprintAug 8 2025
Segmentation of lesions on CT enables automatic measurement for clinical assessment of chronic diseases (e.g., lymphoma). Integrating large language models (LLMs) into the lesion segmentation workflow offers the potential to combine imaging features with descriptions of lesion characteristics from the radiology reports. In this study, we investigate the feasibility of integrating text into the Swin-UMamba architecture for the task of lesion segmentation. The publicly available ULS23 DeepLesion dataset was used along with short-form descriptions of the findings from the reports. On the test dataset, a high Dice score of 82% and a low Hausdorff distance of 6.58 pixels were obtained for lesion segmentation. The proposed Text-Swin-UMamba model outperformed prior approaches: a 37% improvement over the LLM-driven LanGuideMedSeg model (p < 0.001), and gains of 1.74% and 0.22% over the purely image-based xLSTM-UNet and nnUNet models, respectively. The dataset and code can be accessed at https://github.com/ruida/LLM-Swin-UMamba
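A minimal sketch of the two reported metrics, Dice score and Hausdorff distance in pixels, computed on toy binary lesion masks with NumPy and SciPy:

```python
# Dice score and symmetric Hausdorff distance between two binary masks.
import numpy as np
from scipy.spatial.distance import directed_hausdorff

pred = np.zeros((64, 64), dtype=bool); pred[20:40, 20:40] = True  # toy prediction
ref  = np.zeros((64, 64), dtype=bool); ref[22:42, 18:38] = True   # toy reference

dice = 2 * np.logical_and(pred, ref).sum() / (pred.sum() + ref.sum())

# Hausdorff distance over the foreground pixel coordinates of each mask.
p_pts, r_pts = np.argwhere(pred), np.argwhere(ref)
hd = max(directed_hausdorff(p_pts, r_pts)[0], directed_hausdorff(r_pts, p_pts)[0])
print(f"Dice={dice:.3f}, Hausdorff={hd:.2f} px")
```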

Response Assessment in Hepatocellular Carcinoma: A Primer for Radiologists.

Mroueh N, Cao J, Srinivas Rao S, Ghosh S, Song OK, Kongboonvijit S, Shenoy-Bhangle A, Kambadakone A

pubmed logopapersAug 7 2025
Hepatocellular carcinoma (HCC) is the third leading cause of cancer-related deaths worldwide, necessitating accurate and early diagnosis to guide therapy, along with assessment of treatment response. Response assessment criteria have evolved from traditional morphologic approaches, such as WHO criteria and Response Evaluation Criteria in Solid Tumors (RECIST), to more recent methods focused on evaluating viable tumor burden, including European Association for Study of Liver (EASL) criteria, modified RECIST (mRECIST), and the Liver Imaging Reporting and Data System (LI-RADS) Treatment Response (LI-TR) algorithm. This shift reflects the complex and evolving landscape of HCC treatment in the context of emerging systemic and locoregional therapies. Each of these criteria has its own nuanced strengths and limitations in capturing the detailed characteristics of HCC treatment and response assessment. The emergence of functional imaging techniques, including dual-energy CT and perfusion imaging, along with the rising use of radiomics, is enhancing the capabilities of response assessment. Growth in artificial intelligence and machine learning models provides an opportunity to refine the precision of response assessment by facilitating analysis of complex imaging data patterns. This review article provides a comprehensive overview of existing criteria, discusses functional and emerging imaging techniques, and outlines future directions for advancing HCC tumor response assessment.

Generative Artificial Intelligence in Medical Imaging: Foundations, Progress, and Clinical Translation

Xuanru Zhou, Cheng Li, Shuqiang Wang, Ye Li, Tao Tan, Hairong Zheng, Shanshan Wang

arxiv logopreprintAug 7 2025
Generative artificial intelligence (AI) is rapidly transforming medical imaging by enabling capabilities such as data synthesis, image enhancement, modality translation, and spatiotemporal modeling. This review presents a comprehensive and forward-looking synthesis of recent advances in generative modeling, including generative adversarial networks (GANs), variational autoencoders (VAEs), diffusion models, and emerging multimodal foundation architectures, and evaluates their expanding roles across the clinical imaging continuum. We systematically examine how generative AI contributes to key stages of the imaging workflow, from acquisition and reconstruction to cross-modality synthesis, diagnostic support, and treatment planning. Emphasis is placed on both retrospective and prospective clinical scenarios, where generative models help address longstanding challenges such as data scarcity, standardization, and integration across modalities. To promote rigorous benchmarking and translational readiness, we propose a three-tiered evaluation framework encompassing pixel-level fidelity, feature-level realism, and task-level clinical relevance. We also identify critical obstacles to real-world deployment, including generalization under domain shift, hallucination risk, data privacy concerns, and regulatory hurdles. Finally, we explore the convergence of generative AI with large-scale foundation models, highlighting how this synergy may enable the next generation of scalable, reliable, and clinically integrated imaging systems. By charting technical progress and translational pathways, this review aims to guide future research and foster interdisciplinary collaboration at the intersection of AI, medicine, and biomedical engineering.

Improving Radiology Report Generation with Semantic Understanding.

Ahn S, Park H, Yoo J, Choi J

pubmed logopapersAug 7 2025
This study proposes RRG-LLM, a model designed to enhance radiology report generation (RRG) by effectively learning the medical domain with minimal computational resources. Initially, the LLM is fine-tuned with LoRA, enabling efficient adaptation to the medical domain. Subsequently, only the linear projection layer that projects the image into the text embedding space is fine-tuned to extract important information from the radiology image and map it onto the text dimension. The proposed model demonstrated notable improvements in report generation: ROUGE-L improved by 0.096 (51.7%) and METEOR by 0.046 (42.85%) compared to the baseline model.
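A minimal PyTorch sketch of the second stage described above, in which the language model is kept frozen and only the linear projection mapping image features into the text-embedding space is trained. The module sizes, the stand-in decoder, and the placeholder loss are assumptions for illustration, not the paper's implementation.

```python
# Train only an image-to-text projection in front of a frozen language model.
import torch
import torch.nn as nn

text_dim, img_dim = 768, 1024
llm_embed = nn.Embedding(32000, text_dim)            # stand-in for the LLM's token embeddings
decoder = nn.TransformerEncoder(                      # stand-in for the (frozen) LLM
    nn.TransformerEncoderLayer(d_model=text_dim, nhead=8, batch_first=True),
    num_layers=2,
)
for p in list(llm_embed.parameters()) + list(decoder.parameters()):
    p.requires_grad = False                           # the language model stays frozen

proj = nn.Linear(img_dim, text_dim)                   # the only trainable module
opt = torch.optim.AdamW(proj.parameters(), lr=1e-4)

img_feats = torch.rand(2, 49, img_dim)                # dummy vision-encoder features
report_ids = torch.randint(0, 32000, (2, 32))         # dummy report tokens

# Prepend projected image tokens to the report embeddings and run the frozen model.
inputs = torch.cat([proj(img_feats), llm_embed(report_ids)], dim=1)
hidden = decoder(inputs)
loss = hidden.pow(2).mean()                           # placeholder for the language-model loss
opt.zero_grad(); loss.backward(); opt.step()
print(loss.item())
```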

MedCLIP-SAMv2: Towards universal text-driven medical image segmentation.

Koleilat T, Asgariandehkordi H, Rivaz H, Xiao Y

pubmed logopapersAug 7 2025
Segmentation of anatomical structures and pathologies in medical images is essential for modern disease diagnosis, clinical research, and treatment planning. While significant advancements have been made in deep learning-based segmentation techniques, many of these methods still suffer from limitations in data efficiency, generalizability, and interactivity. As a result, developing robust segmentation methods that require fewer labeled datasets remains a critical challenge in medical image analysis. Recently, the introduction of foundation models like CLIP and Segment-Anything-Model (SAM), with robust cross-domain representations, has paved the way for interactive and universal image segmentation. However, further exploration of these models for data-efficient segmentation in medical imaging is an active field of research. In this paper, we introduce MedCLIP-SAMv2, a novel framework that integrates the CLIP and SAM models to perform segmentation on clinical scans using text prompts, in both zero-shot and weakly supervised settings. Our approach includes fine-tuning the BiomedCLIP model with a new Decoupled Hard Negative Noise Contrastive Estimation (DHN-NCE) loss, and leveraging the Multi-modal Information Bottleneck (M2IB) to create visual prompts for generating segmentation masks with SAM in the zero-shot setting. We also investigate using zero-shot segmentation labels in a weakly supervised paradigm to enhance segmentation quality further. Extensive validation across four diverse segmentation tasks and medical imaging modalities (breast tumor ultrasound, brain tumor MRI, lung X-ray, and lung CT) demonstrates the high accuracy of our proposed framework. Our code is available at https://github.com/HealthX-Lab/MedCLIP-SAMv2.
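One step in the pipeline described above is converting a CLIP-derived saliency map (from M2IB attribution on BiomedCLIP) into a visual prompt for SAM. Below is a minimal sketch of that conversion, using a dummy saliency map and a simple threshold-plus-bounding-box heuristic; the heuristic is an assumption, not the paper's exact procedure.

```python
# Turn a saliency/attribution map into a bounding-box prompt for SAM.
import numpy as np

saliency = np.zeros((256, 256))
saliency[90:160, 110:200] = np.random.rand(70, 90)   # dummy "lesion" saliency blob

mask = saliency > 0.5 * saliency.max()               # threshold the attribution map
ys, xs = np.nonzero(mask)
box = np.array([xs.min(), ys.min(), xs.max(), ys.max()])  # [x0, y0, x1, y1]
print("visual prompt (box):", box)

# This box would then be supplied to a SAM predictor as the prompt
# (e.g., predictor.predict(box=box) in the segment-anything API) to obtain the mask.
```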