
Enhancing 3D Medical Image Understanding with Pretraining Aided by 2D Multimodal Large Language Models.

Chen Q, Yao X, Ye H, Hong Y

PubMed | Sep 15 2025
Understanding 3D medical image volumes is critical in the medical field, yet existing 3D medical convolution and transformer-based self-supervised learning (SSL) methods often lack deep semantic comprehension. Recent advancements in multimodal large language models (MLLMs) provide a promising approach to enhance image understanding through text descriptions. To leverage these 2D MLLMs for improved 3D medical image understanding, we propose Med3DInsight, a novel pretraining framework that integrates 3D image encoders with 2D MLLMs via a specially designed plane-slice-aware transformer module. Additionally, our model employs a partial optimal transport-based alignment, demonstrating greater tolerance to the noise potentially introduced by LLM-generated content. Med3DInsight introduces a new paradigm for scalable multimodal 3D medical representation learning without requiring human annotations. Extensive experiments demonstrate our state-of-the-art performance on two downstream tasks, i.e., segmentation and classification, across various public datasets with CT and MRI modalities, outperforming current SSL methods. Med3DInsight can be seamlessly integrated into existing 3D medical image understanding networks, potentially enhancing their performance. Our source code, generated datasets, and pre-trained models will be available upon acceptance.
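
The partial optimal transport alignment mentioned above can be sketched in a few lines. The snippet below is a hypothetical illustration of the general idea, not the authors' implementation: only a fraction of the total mass is transported between slice features and text features, so noisy text embeddings can remain unmatched. It assumes the POT library and uses random matrices in place of the real encoders.

```python
# Minimal sketch of a partial optimal transport alignment between 3D-volume slice
# features and MLLM-generated text features. Hypothetical illustration only;
# the feature extractors are faked with random tensors.
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)

rng = np.random.default_rng(0)
slice_feats = rng.normal(size=(32, 256))   # e.g., plane-slice-aware transformer outputs
text_feats = rng.normal(size=(24, 256))    # e.g., 2D MLLM description embeddings

# Uniform marginals and a pairwise cost matrix (squared Euclidean by default).
a, b = ot.unif(len(slice_feats)), ot.unif(len(text_feats))
M = ot.dist(slice_feats, text_feats)

# Transport only 80% of the mass: unreliable text features can stay unmatched,
# which is the intuition behind the noise tolerance claimed above.
plan = ot.partial.partial_wasserstein(a, b, M, m=0.8)
alignment_cost = float(np.sum(plan * M))
print(f"partial-OT alignment cost: {alignment_cost:.3f}")
```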

Evaluating the role of LLMs in supporting patient education during the informed consent process for routine radiology procedures.

Einspänner E, Schwab R, Hupfeld S, Thormann M, Fuchs E, Gawlitza M, Borggrefe J, Behme D

PubMed | Sep 15 2025
This study evaluated three LLM chatbots (GPT-3.5-turbo, GPT-4-turbo, and GPT-4o) on their effectiveness in supporting patient education by answering common patient questions for CT, MRI, and DSA informed consent, assessing their accuracy and clarity. Two radiologists formulated 90 questions categorized as general, clinical, or technical. Each LLM answered every question five times. Radiologists then rated the responses for medical accuracy and clarity, while medical physicists assessed technical accuracy using a Likert scale. semantic similarity was analyzed with SBERT and cosine similarity. Ratings improved with newer model versions. Linear mixed-effects models revealed that GPT-4 models were rated significantly higher than GPT-3.5 (p < 0.001) by both physicians and physicists. However, physicians' ratings for GPT-4 models showed a significant performance decrease for complex modalities like DSA and MRI (p < 0.01), a pattern not observed in physicists' ratings. SBERT analysis revealed high internal consistency across all models. SBERT analysis revealed high internal consistency across all models. Variability in ratings revealed that while models effectively handled general and technical questions, they struggled with contextually complex medical inquiries requiring personalized responses and nuanced understanding. Statistical analysis confirms that while newer models are superior, their performance is modality-dependent and perceived differently by clinical and technical experts. This study evaluates the potential of LLMs to enhance informed consent in radiology, highlighting strengths in general and technical questions while noting limitations with complex clinical inquiries, with performance varying significantly by model type and imaging modality.
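
As a rough illustration of the SBERT-based consistency check described above, one can embed the five repeated answers to a question and average their pairwise cosine similarities. This is a hypothetical sketch using the sentence-transformers library with a generic model name and placeholder answers, not the study's exact pipeline.

```python
# Sketch: internal consistency of repeated chatbot answers via SBERT embeddings.
# Model name and answer texts are placeholders, not the study's data.
from itertools import combinations
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed general-purpose SBERT model

answers = [
    "MRI uses strong magnetic fields and radio waves; it involves no ionizing radiation.",
    "An MRI scan relies on magnets and radio waves, so there is no ionizing radiation.",
    "MRI works with magnetic fields and radiofrequency pulses, not X-rays.",
    "The MRI machine uses a magnetic field and radio waves instead of radiation.",
    "MRI does not use X-rays; it creates images with magnets and radio waves.",
]

emb = model.encode(answers, convert_to_tensor=True, normalize_embeddings=True)
sims = [float(util.cos_sim(emb[i], emb[j])) for i, j in combinations(range(len(answers)), 2)]
print(f"mean pairwise cosine similarity: {sum(sims) / len(sims):.3f}")
```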

Toward Next-generation Medical Vision Backbones: Modeling Finer-grained Long-range Visual Dependency

Mingyuan Meng

arXiv preprint | Sep 14 2025
Medical Image Computing (MIC) is a broad research topic covering both pixel-wise (e.g., segmentation, registration) and image-wise (e.g., classification, regression) vision tasks. Effective analysis demands models that capture both global long-range context and local subtle visual characteristics, necessitating fine-grained long-range visual dependency modeling. Compared to Convolutional Neural Networks (CNNs) that are limited by intrinsic locality, transformers excel at long-range modeling; however, due to the high computational loads of self-attention, transformers typically cannot process high-resolution features (e.g., full-scale image features before downsampling or patch embedding) and thus face difficulties in modeling fine-grained dependency among subtle medical image details. Concurrently, Multi-layer Perceptron (MLP)-based visual models are recognized as computation/memory-efficient alternatives for modeling long-range visual dependency but have yet to be widely investigated in the MIC community. This doctoral research advances deep learning-based MIC by investigating effective long-range visual dependency modeling. It first presents the innovative use of transformers for both pixel- and image-wise medical vision tasks. The focus then shifts to MLPs, pioneering the development of MLP-based visual models to capture fine-grained long-range visual dependency in medical images. Extensive experiments confirm the critical role of long-range dependency modeling in MIC and reveal a key finding: MLPs provide feasibility in modeling finer-grained long-range dependency among higher-resolution medical features containing enriched anatomical/pathological details. This finding establishes MLPs as a superior paradigm over transformers/CNNs, consistently enhancing performance across various medical vision tasks and paving the way for next-generation medical vision backbones.
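
For readers unfamiliar with how an MLP can model long-range dependency over high-resolution features, the sketch below shows a generic MLP-Mixer-style token-mixing layer: a dense layer applied across the token dimension lets every spatial position interact with every other one without attention. This is a generic illustration under assumed shapes, not the thesis's specific architecture.

```python
# Generic token-mixing MLP block (MLP-Mixer style), illustrating how MLPs capture
# long-range dependency: each token exchanges information with all others through
# a dense layer over the token axis. Not the architecture proposed in the thesis.
import torch
import torch.nn as nn

class TokenMixingMLP(nn.Module):
    def __init__(self, num_tokens: int, dim: int, hidden: int = 256):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mix = nn.Sequential(  # operates over the token dimension
            nn.Linear(num_tokens, hidden), nn.GELU(), nn.Linear(hidden, num_tokens)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, tokens, dim)
        y = self.norm(x).transpose(1, 2)   # (batch, dim, tokens)
        y = self.mix(y).transpose(1, 2)    # mix information across all tokens
        return x + y                       # residual connection

x = torch.randn(2, 1024, 64)               # e.g., 1024 high-resolution patch tokens
print(TokenMixingMLP(num_tokens=1024, dim=64)(x).shape)  # torch.Size([2, 1024, 64])
```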

No Modality Left Behind: Dynamic Model Generation for Incomplete Medical Data

Christoph Fürböck, Paul Weiser, Branko Mitic, Philipp Seeböck, Thomas Helbich, Georg Langs

arXiv preprint | Sep 14 2025
In real-world clinical environments, training and applying deep learning models on multi-modal medical imaging data often struggle with partially incomplete data. Standard approaches either discard missing samples, require imputation, or repurpose dropout learning schemes, limiting robustness and generalizability. To address this, we propose a hypernetwork-based method that dynamically generates task-specific classification models conditioned on the set of available modalities. Instead of training a fixed model, a hypernetwork learns to predict the parameters of a task model adapted to the available modalities, enabling training and inference on all samples, regardless of completeness. We compare this approach with (1) models trained only on complete data, (2) state-of-the-art channel dropout methods, and (3) an imputation-based method, using artificially incomplete datasets to systematically analyze robustness to missing modalities. Results demonstrate superior adaptability of our method, outperforming state-of-the-art approaches with an absolute increase in accuracy of up to 8% when trained on a dataset with 25% completeness (75% of training data with missing modalities). By enabling a single model to generalize across all modality configurations, our approach provides an efficient solution for real-world multi-modal medical data analysis.
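
A minimal sketch of the hypernetwork idea follows, under assumed shapes and names: a small network maps the binary modality-availability mask to the weights of a per-sample linear classifier, so one model serves every modality configuration. This illustrates the general mechanism only, not the authors' implementation.

```python
# Sketch: a hypernetwork that generates classifier weights conditioned on which
# modalities are present. Shapes, names, and the toy forward pass are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityHyperNet(nn.Module):
    def __init__(self, num_modalities: int, feat_dim: int, num_classes: int):
        super().__init__()
        self.feat_dim, self.num_classes = feat_dim, num_classes
        # Maps the availability mask to a full weight matrix + bias of a linear head.
        self.hyper = nn.Sequential(
            nn.Linear(num_modalities, 128), nn.ReLU(),
            nn.Linear(128, feat_dim * num_classes + num_classes),
        )

    def forward(self, features: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # features: (batch, feat_dim) fused from available modalities (missing ones zeroed)
        # mask: (batch, num_modalities), 1 = modality present, 0 = missing
        params = self.hyper(mask.float())
        w = params[:, : self.feat_dim * self.num_classes].view(-1, self.num_classes, self.feat_dim)
        b = params[:, self.feat_dim * self.num_classes :]
        return torch.einsum("bcf,bf->bc", w, features) + b  # per-sample generated classifier

model = ModalityHyperNet(num_modalities=4, feat_dim=32, num_classes=2)
feats = torch.randn(8, 32)
mask = (torch.rand(8, 4) > 0.3).float()      # simulate incomplete modality sets
logits = model(feats, mask)
print(F.cross_entropy(logits, torch.randint(0, 2, (8,))))
```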

Adapting Medical Vision Foundation Models for Volumetric Medical Image Segmentation via Active Learning and Selective Semi-supervised Fine-tuning

Jin Yang, Daniel S. Marcus, Aristeidis Sotiras

arXiv preprint | Sep 13 2025
Medical Vision Foundation Models (Med-VFMs) have superior capabilities in interpreting medical images due to the knowledge learned from self-supervised pre-training with extensive unannotated images. To improve their performance on downstream evaluations, especially segmentation, a few samples from target domains are typically selected at random for fine-tuning. However, little work has explored how to adapt Med-VFMs to achieve optimal performance on target domains efficiently. An efficient fine-tuning strategy that selects informative samples to maximize adaptation performance on target domains is therefore highly desirable. To achieve this, we propose an Active Source-Free Domain Adaptation (ASFDA) method to efficiently adapt Med-VFMs to target domains for volumetric medical image segmentation. ASFDA employs a novel Active Learning (AL) method to select the most informative samples from target domains for fine-tuning Med-VFMs without access to source pre-training samples, thus maximizing performance with a minimal selection budget. In this AL method, we design an Active Test Time Sample Query strategy to select samples from the target domains via two query metrics: Diversified Knowledge Divergence (DKD) and Anatomical Segmentation Difficulty (ASD). DKD is designed to measure the source-target knowledge gap and intra-domain diversity. It utilizes the knowledge of pre-training to guide the querying of source-dissimilar and semantically diverse samples from the target domains. ASD is designed to evaluate the difficulty of segmenting anatomical structures by adaptively measuring predictive entropy over foreground regions. Additionally, our ASFDA method employs Selective Semi-supervised Fine-tuning to improve the performance and efficiency of fine-tuning by identifying samples with high reliability among unqueried ones.
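
The ASD metric is described as predictive entropy measured over foreground regions. Below is a rough, hypothetical sketch of such an entropy-based query score; the foreground thresholding and mean aggregation are assumptions for illustration, not the paper's exact definition.

```python
# Sketch: ranking unlabeled volumes by predictive entropy over predicted foreground,
# in the spirit of the Anatomical Segmentation Difficulty query metric.
import numpy as np

def foreground_entropy_score(probs: np.ndarray, fg_threshold: float = 0.5) -> float:
    """probs: (num_classes, D, H, W) softmax output for one volume; class 0 = background."""
    eps = 1e-8
    entropy = -np.sum(probs * np.log(probs + eps), axis=0)          # voxel-wise entropy
    foreground = probs[1:].sum(axis=0) > fg_threshold               # predicted foreground mask
    if not foreground.any():
        return 0.0
    return float(entropy[foreground].mean())                        # higher = harder to segment

rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 3, 16, 64, 64))                        # 5 candidate volumes, 3 classes
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax over classes
scores = [foreground_entropy_score(p) for p in probs]
query_order = np.argsort(scores)[::-1]                              # query the most difficult first
print(query_order)
```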

Epicardial and Pericardial Adipose Tissue: Anatomy, Physiology, Imaging, Segmentation, and Treatment Effects.

Demmert TT, Klambauer K, Moser LJ, Mergen V, Eberhard M, Alkadhi H

PubMed | Sep 13 2025
Epicardial (EAT) and pericardial adipose tissue (PAT) are increasingly recognized as distinct fat depots with implications for cardiovascular disease. This review discusses their anatomical and physiological characteristics, as well as their pathophysiological roles. EAT, in direct contact with the myocardium, exerts local inflammatory and metabolic effects on the heart, while PAT influences cardiovascular health more systemically. We discuss the imaging modalities currently used to assess these fat compartments (CT, MRI, and echocardiography), emphasizing their advantages, limitations, and the urgent need for standardization of both scanning and image reconstruction. Advances in image segmentation, particularly deep learning-based approaches, have improved the accuracy and reproducibility of EAT and PAT quantification. This review also explores the role of EAT and PAT as risk factors for cardiovascular outcomes, summarizing conflicting evidence across studies. Finally, we summarize the effects of medical therapy and lifestyle interventions on reducing EAT volume. Understanding and accurately quantifying EAT and PAT is essential for cardiovascular risk stratification and may open new pathways for therapeutic interventions.
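
As a small practical aside on the quantification step, once a segmentation model has produced an EAT (or PAT) mask, the depot volume follows directly from the voxel count and spacing. The snippet below is a generic sketch with assumed array shapes and spacing, not tied to any specific tool discussed in the review.

```python
# Sketch: adipose tissue volume from a binary segmentation mask and voxel spacing.
# Mask contents and spacing values are illustrative assumptions.
import numpy as np

mask = np.zeros((200, 512, 512), dtype=bool)      # e.g., EAT mask from a CT segmentation model
mask[90:110, 200:260, 200:260] = True             # placeholder segmented region
spacing_mm = (1.0, 0.7, 0.7)                      # (slice thickness, row, column) in mm

voxel_volume_ml = np.prod(spacing_mm) / 1000.0    # mm^3 -> mL
eat_volume_ml = mask.sum() * voxel_volume_ml
print(f"EAT volume: {eat_volume_ml:.1f} mL")
```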

PET-Computed Tomography in the Management of Sarcoma by Interventional Oncology.

Yazdanpanah F, Hunt SJ

PubMed | Sep 13 2025
PET-computed tomography (CT) has become essential in sarcoma management, offering precise diagnosis, staging, and response assessment by combining metabolic and anatomic imaging. Its high accuracy in detecting primary, recurrent, and metastatic disease guides personalized treatment strategies and enhances interventional procedures like biopsies and ablations. Advances in novel radiotracers and hybrid imaging modalities further improve diagnostic specificity, especially in complex and pediatric cases. Integrating PET-CT with genomic data and artificial intelligence (AI)-driven tools promises to advance personalized medicine, enabling tailored therapies and better outcomes. As a cornerstone of multidisciplinary sarcoma care, PET-CT continues to transform diagnostic and therapeutic approaches in oncology.

Updates in Cerebrovascular Imaging.

Ali H, Abu Qdais A, Chatterjee A, Abdalkader M, Raz E, Nguyen TN, Al Kasab S

PubMed | Sep 12 2025
Cerebrovascular imaging has undergone significant advances, enhancing the diagnosis and management of cerebrovascular diseases such as stroke, aneurysms, and arteriovenous malformations. This chapter explores key imaging modalities, including non-contrast computed tomography, computed tomography angiography, magnetic resonance imaging (MRI), and digital subtraction angiography. Innovations such as high-resolution vessel wall imaging, artificial intelligence (AI)-driven stroke detection, and advanced perfusion imaging have improved diagnostic accuracy and treatment selection. Additionally, novel techniques like 7-T MRI, molecular imaging, and functional ultrasound provide deeper insights into vascular pathology. AI and machine learning applications are revolutionizing automated detection and prognostication, expediting treatment decisions. Challenges remain in standardization, radiation exposure, and accessibility. However, continued technological advances, multimodal imaging integration, and AI-driven automation promise a future of precise, non-invasive cerebrovascular diagnostics, ultimately improving patient outcomes.

Assessing accuracy and legitimacy of multimodal large language models on Japan Diagnostic Radiology Board Examination.

Hirano Y, Miki S, Yamagishi Y, Hanaoka S, Nakao T, Kikuchi T, Nakamura Y, Nomura Y, Yoshikawa T, Abe O

PubMed | Sep 12 2025
To assess and compare the accuracy and legitimacy of multimodal large language models (LLMs) on the Japan Diagnostic Radiology Board Examination (JDRBE). The dataset comprised questions from JDRBE 2021, 2023, and 2024, with ground-truth answers established through consensus among multiple board-certified diagnostic radiologists. Questions without associated images and those lacking unanimous agreement on answers were excluded. Eight LLMs were evaluated: GPT-4 Turbo, GPT-4o, GPT-4.5, GPT-4.1, o3, o4-mini, Claude 3.7 Sonnet, and Gemini 2.5 Pro. Each model was evaluated under two conditions: with image input (vision) and without (text-only). Performance differences between the conditions were assessed using McNemar's exact test. Two diagnostic radiologists (with 2 and 18 years of experience) independently rated the legitimacy of responses from four models (GPT-4 Turbo, Claude 3.7 Sonnet, o3, and Gemini 2.5 Pro) using a five-point Likert scale, blinded to model identity. Legitimacy scores were analyzed using Friedman's test, followed by pairwise Wilcoxon signed-rank tests with Holm correction. The dataset included 233 questions. Under the vision condition, o3 achieved the highest accuracy at 72%, followed by o4-mini (70%) and Gemini 2.5 Pro (70%). Under the text-only condition, o3 topped the list with an accuracy of 67%. The addition of image input significantly improved the accuracy of two models (Gemini 2.5 Pro and GPT-4.5), but not the others. Both o3 and Gemini 2.5 Pro received significantly higher legitimacy scores than GPT-4 Turbo and Claude 3.7 Sonnet from both raters. Recent multimodal LLMs, particularly o3 and Gemini 2.5 Pro, have demonstrated remarkable progress on JDRBE questions, reflecting their rapid evolution in diagnostic radiology. Eight multimodal large language models were evaluated on the Japan Diagnostic Radiology Board Examination. OpenAI's o3 and Google DeepMind's Gemini 2.5 Pro achieved high accuracy rates (72% and 70%) and received good legitimacy scores from human raters, demonstrating steady progress.
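
For readers who want to reproduce the style of analysis described above, the sketch below shows the standard SciPy/statsmodels calls for McNemar's exact test, Friedman's test, and pairwise Wilcoxon signed-rank tests with Holm correction, applied to synthetic placeholder data. Only the choice of tests mirrors the abstract; all numbers are fabricated.

```python
# Sketch of the statistical tests named above, run on synthetic placeholder data.
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon
from statsmodels.stats.contingency_tables import mcnemar
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)

# McNemar's exact test: vision vs. text-only correctness for one model
# (2x2 table: [both correct, only vision correct; only text correct, both wrong]).
table = np.array([[120, 35], [18, 60]])
print("McNemar p =", mcnemar(table, exact=True).pvalue)

# Friedman's test across four models' legitimacy scores on the same questions,
# followed by pairwise Wilcoxon signed-rank tests with Holm correction.
scores = rng.integers(1, 6, size=(233, 4))        # 233 questions x 4 models (Likert 1-5)
print("Friedman p =", friedmanchisquare(*scores.T).pvalue)

pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]
pvals = [wilcoxon(scores[:, i], scores[:, j]).pvalue for i, j in pairs]
reject, p_holm, _, _ = multipletests(pvals, method="holm")
print(list(zip(pairs, np.round(p_holm, 4), reject)))
```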

The impact of U-Net architecture choices and skip connections on the robustness of segmentation across texture variations.

Kamath A, Willmann J, Andratschke N, Reyes M

PubMed | Sep 12 2025
Since its introduction in 2015, the U-Net architecture has become popular for medical image segmentation. U-Net is known for its "skip connections," which transfer image details directly to its decoder branch at various levels. However, it is unclear how these skip connections affect the model's performance when the texture of input images varies. To explore this, we tested six types of U-Net-like architectures in three groups: Standard (U-Net and V-Net), No-Skip (U-Net and V-Net without skip connections), and Enhanced (AGU-Net and UNet++, which have extra skip connections). Because convolutional neural networks (CNNs) are known to be sensitive to texture, we defined a novel texture disparity (TD) metric and ran experiments with synthetic images, adjusting this measure. We then applied these findings to four real medical imaging datasets, covering different anatomies (breast, colon, heart, and spleen) and imaging types (ultrasound, histology, MRI, and CT). The goal was to understand how the choice of architecture impacts the model's ability to handle varying TD between foreground and background. For each dataset, we tested the models with five categories of TD, measuring their performance using the Dice Score Coefficient (DSC), Hausdorff distance, surface distance, and surface DSC. Our results on synthetic data with varying textures show differences between the performance of architectures with and without skip connections, especially when trained under hard textural conditions. Translated to medical data, these results indicate that training datasets with a narrow texture range negatively impact the robustness of architectures that include more skip connections. The robustness gap between architectures narrows when models are trained on a larger TD range. In the harder TD categories, models from the No-Skip group performed best in 5/8 cases (based on DSC) and 7/8 (based on Hausdorff distance). When measuring robustness using the coefficient of variation of the DSC, the No-Skip group performed best in 7 out of 16 cases, outperforming the Enhanced (6/16) and Standard (3/16) groups. These findings suggest that skip connections offer performance benefits, usually at the expense of robustness, depending on the degree of texture disparity between foreground and background and the range of texture variations present in the training set. This warrants careful evaluation of their use in robustness-critical tasks like medical image segmentation. Combinations with texture-aware architectures should be investigated to achieve better performance-robustness characteristics.
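
The robustness comparison above relies on the coefficient of variation of the Dice score across texture-disparity categories. The snippet below is a minimal sketch of that summary statistic on made-up per-category DSC values; a lower coefficient of variation indicates more stable (robust) performance.

```python
# Sketch: coefficient of variation (CV) of Dice scores as a robustness summary.
# The per-texture-category DSC values below are made-up placeholders.
import numpy as np

dsc_by_td_category = {
    "U-Net (Standard)":  [0.91, 0.88, 0.83, 0.74, 0.62],
    "U-Net (No-Skip)":   [0.87, 0.85, 0.83, 0.79, 0.75],
    "UNet++ (Enhanced)": [0.92, 0.89, 0.82, 0.71, 0.58],
}

for name, dsc in dsc_by_td_category.items():
    dsc = np.asarray(dsc)
    cv = dsc.std(ddof=1) / dsc.mean()     # lower CV = more robust across texture disparity
    print(f"{name:<20} mean DSC = {dsc.mean():.3f}  CV = {cv:.3f}")
```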
