
Imaging in chronic thromboembolic pulmonary hypertension: review of the current literature.

Hekimoglu K, Gopalan D, Onur MR, Kahraman G, Akay T

PubMed · Sep 23, 2025
Chronic thromboembolic pulmonary hypertension (CTEPH) is a severe, life-threatening complication of pulmonary embolism characterized by pulmonary hypertension (PH). The combination of incomplete resolution of thrombi after pulmonary embolism and accompanying microvascular disease results in PH. Advances in imaging can offer better insight into CTEPH diagnosis and management, but a lack of disease awareness among radiologists has been shown to contribute to misdiagnosis or delayed diagnosis of CTEPH. This review highlights imaging features pertinent to CTEPH diagnosis. The primary focus is on the different modalities with their distinctive signs and on newly developed technologies employing artificial intelligence systems.

Advanced Image-Guidance and Surgical-Navigation Techniques for Real-Time Visualized Surgery.

Fan X, Liu X, Xia Q, Chen G, Cheng J, Shi Z, Fang Y, Khadaroo PA, Qian J, Lin H

PubMed · Sep 23, 2025
Surgical navigation is a rapidly evolving multidisciplinary field that plays a crucial role in precision medicine. Surgical-navigation systems have substantially enhanced modern surgery by improving the precision of resection, reducing invasiveness, and enhancing patient outcomes. However, clinicians, engineers, and professionals in other fields often view the technology only from their own perspectives, which tends to produce a one-sided picture. This article aims to provide a thorough overview of the recent advancements in surgical-navigation systems and categorizes them on the basis of their unique characteristics and applications. Established techniques (e.g., radiography, intraoperative computed tomography [CT], magnetic resonance imaging [MRI], and ultrasound) and emerging technologies (e.g., photoacoustic imaging and near-infrared [NIR]-II imaging) are systematically analyzed, highlighting their underlying mechanisms, methods of use, and respective advantages and disadvantages. Despite substantial progress, the existing navigation systems face challenges, including limited accuracy, high costs, and extensive training requirements for surgeons. Addressing these limitations is crucial for widespread adoption of these technologies. The review emphasizes the need for developing more intelligent, minimally invasive, precise, personalized, and radiation-free navigation solutions. By integrating advanced imaging modalities, machine learning algorithms, and real-time feedback mechanisms, next-generation surgical-navigation systems can further enhance surgical precision and patient safety. By bridging the knowledge gap between clinical practice and engineering innovation, this review not only provides valuable insights for surgeons seeking optimal navigation strategies, but also offers engineers a deeper understanding of clinical application scenarios.

Citrus-V: Advancing Medical Foundation Models with Unified Medical Image Grounding for Clinical Reasoning

Guoxin Wang, Jun Zhao, Xinyi Liu, Yanbo Liu, Xuyang Cao, Chao Li, Zhuoyun Liu, Qintian Sun, Fangru Zhou, Haoqiang Xing, Zhenhong Yang

arXiv preprint · Sep 23, 2025
Medical imaging provides critical evidence for clinical diagnosis, treatment planning, and surgical decisions, yet most existing imaging models are narrowly focused and require multiple specialized networks, limiting their generalization. Although large-scale language and multimodal models exhibit strong reasoning and multi-task capabilities, real-world clinical applications demand precise visual grounding, multimodal integration, and chain-of-thought reasoning. We introduce Citrus-V, a multimodal medical foundation model that combines image analysis with textual reasoning. The model integrates detection, segmentation, and multimodal chain-of-thought reasoning, enabling pixel-level lesion localization, structured report generation, and physician-like diagnostic inference in a single framework. We propose a novel multimodal training approach and release a curated open-source data suite covering reasoning, detection, segmentation, and document understanding tasks. Evaluations demonstrate that Citrus-V outperforms existing open-source medical models and expert-level imaging systems across multiple benchmarks, delivering a unified pipeline from visual grounding to clinical reasoning and supporting precise lesion quantification, automated reporting, and reliable second opinions.
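
The abstract does not describe an inference API, so the following is a minimal sketch, assuming hypothetical names (LesionFinding, CitrusVOutput, run_unified_inference), of what a single-call interface returning pixel-level grounding, chain-of-thought steps, and a structured report together might look like.

```python
# Illustrative sketch only: Citrus-V does not publish this API. Class and field
# names are hypothetical, showing how one unified model call could return
# grounding (boxes, masks) together with chain-of-thought text and a report.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LesionFinding:
    label: str                      # e.g. "nodule" (hypothetical label set)
    box_xyxy: List[float]           # pixel-level bounding box
    mask_rle: Optional[str] = None  # run-length-encoded segmentation mask


@dataclass
class CitrusVOutput:
    findings: List[LesionFinding] = field(default_factory=list)
    reasoning_chain: List[str] = field(default_factory=list)  # chain-of-thought steps
    report: str = ""                                           # structured report text


def run_unified_inference(image_path: str, prompt: str) -> CitrusVOutput:
    """Placeholder for a single forward pass of a unified grounding+reasoning model."""
    # A real implementation would derive detection, segmentation, and generated text
    # from shared multimodal features; here we simply return a mocked result.
    return CitrusVOutput(
        findings=[LesionFinding(label="lesion", box_xyxy=[120.0, 88.0, 164.0, 131.0])],
        reasoning_chain=["Locate abnormality", "Characterize margins", "Infer likely diagnosis"],
        report="Impression: single well-defined lesion; recommend follow-up imaging.",
    )


if __name__ == "__main__":
    out = run_unified_inference("chest_ct.nii.gz", "Describe and localize any lesions.")
    print(len(out.findings), "finding(s);", out.report)
```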

Improving the performance of medical image segmentation with instructive feature learning.

Dai D, Dong C, Huang H, Liu F, Li Z, Xu S

PubMed · Sep 23, 2025
Although deep learning models have greatly automated medical image segmentation, they still struggle with complex samples, especially those with irregular shapes, notable scale variations, or blurred boundaries. One key reason for this is that existing methods often overlook the importance of identifying and enhancing the instructive features tailored to various targets, thereby impeding optimal feature extraction and transmission. To address these issues, we propose two innovative modules: an Instructive Feature Enhancement Module (IFEM) and an Instructive Feature Integration Module (IFIM). IFEM synergistically captures rich detailed information and local contextual cues within a unified convolutional module through flexible resolution scaling and extensive information interplay, thereby enhancing the network's feature extraction capabilities. Meanwhile, IFIM explicitly guides the fusion of encoding-decoding features to create more discriminative representations through sensitive intermediate predictions and omnipresent attention operations, thus refining contextual feature transmission. These two modules can be seamlessly integrated into existing segmentation frameworks, significantly boosting their performance. Furthermore, to achieve superior performance with substantially reduced computational demands, we develop an effective and efficient segmentation framework (EESF). Unlike traditional U-Nets, EESF adopts a shallower and wider asymmetric architecture, achieving a better balance between fine-grained information retention and high-order semantic abstraction with minimal learning parameters. Ultimately, by incorporating IFEM and IFIM into EESF, we construct EE-Net, a high-performance and low-resource segmentation network. Extensive experiments across six diverse segmentation tasks consistently demonstrate that EE-Net outperforms a wide range of competing methods in terms of segmentation performance, computational efficiency, and learning ability. The code is available at https://github.com/duweidai/EE-Net.
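
As a rough illustration of the two ideas described above, the sketch below implements an IFEM-like block (full- and half-resolution convolutional branches that exchange information) and an IFIM-like block (an intermediate prediction that gates encoder-decoder fusion). This is a simplified, assumption-laden reading of the abstract, not the authors' code; the official implementation is at the linked repository.

```python
# Simplified sketch of the ideas in the abstract; not the authors' code
# (see github.com/duweidai/EE-Net for the official implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class InstructiveFeatureEnhancement(nn.Module):
    """IFEM-like block: full- and half-resolution branches exchange information."""
    def __init__(self, channels: int):
        super().__init__()
        self.full = nn.Conv2d(channels, channels, 3, padding=1)
        self.half = nn.Conv2d(channels, channels, 3, padding=1)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        detail = F.relu(self.full(x))                                  # local detail cues
        ctx = F.interpolate(F.relu(self.half(F.avg_pool2d(x, 2))),
                            size=x.shape[2:], mode="bilinear",
                            align_corners=False)                       # local context cues
        return self.fuse(torch.cat([detail, ctx], dim=1)) + x          # residual fusion


class InstructiveFeatureIntegration(nn.Module):
    """IFIM-like block: an intermediate prediction gates encoder/decoder fusion."""
    def __init__(self, channels: int, num_classes: int = 1):
        super().__init__()
        self.pred = nn.Conv2d(channels, num_classes, 1)   # sensitive intermediate prediction
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, enc_feat, dec_feat):
        gate = torch.sigmoid(self.pred(dec_feat))                      # foreground-likelihood map
        guided_enc = enc_feat * gate.max(dim=1, keepdim=True).values   # attention-guided skip
        return self.fuse(torch.cat([guided_enc, dec_feat], dim=1))


if __name__ == "__main__":
    x = torch.randn(2, 32, 64, 64)
    y = InstructiveFeatureEnhancement(32)(x)
    z = InstructiveFeatureIntegration(32)(x, y)
    print(y.shape, z.shape)
```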

Diagnostic accuracy and consistency of ChatGPT-4o in radiology: influence of image, clinical data, and answer options on performance.

Atakır K, Işın K, Taş A, Önder H

PubMed · Sep 22, 2025
This study aimed to evaluate the diagnostic accuracy of Chat Generative Pre-trained Transformer (ChatGPT) version 4 Omni (ChatGPT-4o) in radiology across seven information input combinations (image, clinical data, and multiple-choice options), to assess the consistency of its outputs across repeated trials, and to compare its performance with that of human radiologists. We tested 129 distinct radiology cases under seven input conditions (varying presence of imaging, clinical context, and answer options). Each case was processed by ChatGPT-4o for seven different input combinations on three separate accounts. Diagnostic accuracy was determined by comparison with ground-truth diagnoses, and interobserver consistency was measured using Fleiss' kappa. Pairwise comparisons were performed with the Wilcoxon signed-rank test. Additionally, the same set of cases was evaluated by nine radiology residents to benchmark ChatGPT-4o's performance against human diagnostic accuracy. ChatGPT-4o's diagnostic accuracy was lowest for "image only" (19.90%) and "options only" (20.67%) conditions. The highest accuracy was observed in the "image + clinical information + options" (80.88%) and "clinical information + options" (75.45%) conditions. The highest interobserver agreement was observed in the "image + clinical information + options" condition (κ = 0.733) and the lowest in the "options only" condition (κ = 0.023), suggesting that more information improves consistency. However, post-hoc analysis showed no additional benefit from adding imaging data when clinical data and answer options were already provided. In the human comparison, ChatGPT-4o outperformed radiology residents in text-based configurations (75.45% vs. 42.89%), whereas residents showed slightly better performance in image-based tasks (64.13% vs. 61.24%). Notably, when residents were allowed to use ChatGPT-4o as a support tool, their image-based diagnostic accuracy increased from 63.04% to 74.16%. ChatGPT-4o performs well when provided with rich textual input but remains limited in purely image-based diagnoses. Its accuracy and consistency increase with multimodal input, yet adding imaging does not significantly improve performance beyond clinical context and diagnostic options alone. The model's superior performance to residents in text-based tasks underscores its potential as a diagnostic aid in structured scenarios. Furthermore, its integration as a support tool may enhance human diagnostic accuracy, particularly in image-based interpretation. Although ChatGPT-4o is not yet capable of reliably interpreting radiologic images on its own, it demonstrates strong performance in text-based diagnostic reasoning. Its integration into clinical workflows, particularly for triage, structured decision support, or educational purposes, may augment radiologists' diagnostic capacity and consistency.
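
For readers who want to reproduce this style of analysis, the sketch below computes Fleiss' kappa across three raters (standing in for the three ChatGPT accounts) and a Wilcoxon signed-rank test between two input conditions. It uses synthetic data, not the study's cases; the condition names and probabilities are illustrative only.

```python
# Illustrative analysis sketch with synthetic data (not the study's data):
# Fleiss' kappa for agreement across three accounts, and a Wilcoxon signed-rank
# test comparing per-case correctness between two input conditions.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
n_cases = 129

# Diagnoses from 3 accounts, coded as categorical labels per case (synthetic).
ratings = rng.integers(0, 5, size=(n_cases, 3))
table, _ = aggregate_raters(ratings)          # cases x categories count table
print("Fleiss' kappa:", round(fleiss_kappa(table), 3))

# Per-case correctness (1/0) under two conditions, e.g. "options only" vs
# "image + clinical information + options" (synthetic success probabilities).
options_only = rng.binomial(1, 0.21, n_cases)
full_input = rng.binomial(1, 0.81, n_cases)
stat, p = wilcoxon(options_only, full_input)
print("Wilcoxon signed-rank p-value:", p)
```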

Visual Instruction Pretraining for Domain-Specific Foundation Models

Yuxuan Li, Yicheng Zhang, Wenhao Tang, Yimian Dai, Ming-Ming Cheng, Xiang Li, Jian Yang

arXiv preprint · Sep 22, 2025
Modern computer vision is converging on a closed loop in which perception, reasoning and generation mutually reinforce each other. However, this loop remains incomplete: the top-down influence of high-level reasoning on the foundational learning of low-level perceptual features remains underexplored. This paper addresses this gap by proposing a new paradigm for pretraining foundation models in downstream domains. We introduce Visual insTruction Pretraining (ViTP), a novel approach that directly leverages reasoning to enhance perception. ViTP embeds a Vision Transformer (ViT) backbone within a Vision-Language Model and pretrains it end-to-end using a rich corpus of visual instruction data curated from target downstream domains. ViTP is powered by our proposed Visual Robustness Learning (VRL), which compels the ViT to learn robust and domain-relevant features from a sparse set of visual tokens. Extensive experiments on 16 challenging remote sensing and medical imaging benchmarks demonstrate that ViTP establishes new state-of-the-art performance across a diverse range of downstream tasks. The code is available at github.com/zcablii/ViTP.
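
The abstract does not specify how the sparse visual tokens are selected, so the sketch below assumes a simple random token-keeping scheme as a stand-in for VRL; the function name and keep ratio are hypothetical, and the official code is at the linked repository.

```python
# Minimal sketch of feeding only a sparse subset of ViT patch tokens onward,
# assuming random selection; this is not the official VRL implementation
# (see github.com/zcablii/ViTP).
import torch


def keep_sparse_tokens(patch_tokens: torch.Tensor, keep_ratio: float = 0.25):
    """patch_tokens: [batch, num_tokens, dim] from a ViT backbone."""
    b, n, d = patch_tokens.shape
    n_keep = max(1, int(n * keep_ratio))
    scores = torch.rand(b, n, device=patch_tokens.device)   # random score per token
    keep_idx = scores.topk(n_keep, dim=1).indices            # tokens to keep
    keep_idx, _ = keep_idx.sort(dim=1)                       # preserve spatial order
    gathered = torch.gather(
        patch_tokens, 1, keep_idx.unsqueeze(-1).expand(-1, -1, d))
    return gathered, keep_idx


if __name__ == "__main__":
    tokens = torch.randn(2, 196, 768)                        # e.g. 14x14 ViT patches
    sparse, idx = keep_sparse_tokens(tokens, keep_ratio=0.25)
    print(sparse.shape)                                      # torch.Size([2, 49, 768])
```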

Medical AI Consensus: A Multi-Agent Framework for Radiology Report Generation and Evaluation

Ahmed T. Elboardy, Ghada Khoriba, Essam A. Rashed

arXiv preprint · Sep 22, 2025
Automating radiology report generation poses a dual challenge: building clinically reliable systems and designing rigorous evaluation protocols. We introduce a multi-agent reinforcement learning framework that serves as both a benchmark and evaluation environment for multimodal clinical reasoning in the radiology ecosystem. The proposed framework integrates large language models (LLMs) and large vision models (LVMs) within a modular architecture composed of ten specialized agents responsible for image analysis, feature extraction, report generation, review, and evaluation. This design enables fine-grained assessment at both the agent level (e.g., detection and segmentation accuracy) and the consensus level (e.g., report quality and clinical relevance). We demonstrate an implementation using ChatGPT-4o on public radiology datasets, where LLMs act as evaluators alongside feedback from medical radiologists. By aligning evaluation protocols with the LLM development lifecycle, including pretraining, finetuning, alignment, and deployment, the proposed benchmark establishes a path toward trustworthy deviance-based radiology report generation.
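
As a rough sketch of how such a modular agent pipeline could be wired, the example below chains a few hypothetical agents over a shared case state and reports both agent-level and consensus-level scores; the agent names, stand-in outputs, and scoring are assumptions, not the paper's implementation.

```python
# Hypothetical wiring of a modular multi-agent pipeline; agent names and the
# two-level scoring are illustrative, not the paper's code.
from typing import Callable, Dict, List

Agent = Callable[[Dict], Dict]   # each agent reads and extends a shared case state


def image_analysis(state: Dict) -> Dict:
    state["findings"] = ["opacity in right lower lobe"]               # stand-in for an LVM call
    return state


def report_generation(state: Dict) -> Dict:
    state["report"] = "Findings: " + "; ".join(state["findings"])     # stand-in for an LLM call
    return state


def reviewer(state: Dict) -> Dict:
    state["agent_scores"] = {"detection": 0.9, "report_quality": 0.8}  # agent-level metrics
    return state


def consensus_evaluator(state: Dict) -> Dict:
    scores = state["agent_scores"].values()
    state["consensus_score"] = sum(scores) / len(state["agent_scores"])  # consensus-level metric
    return state


def run_pipeline(agents: List[Agent], case: Dict) -> Dict:
    for agent in agents:
        case = agent(case)
    return case


if __name__ == "__main__":
    result = run_pipeline(
        [image_analysis, report_generation, reviewer, consensus_evaluator],
        {"image_path": "cxr_0001.png"},
    )
    print(result["report"], "| consensus:", result["consensus_score"])
```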

Multimodal Health Risk Prediction System for Chronic Diseases via Vision-Language Fusion and Large Language Models

Dingxin Lu, Shurui Wu, Xinyi Huang

arXiv preprint · Sep 22, 2025
With the rising global burden of chronic diseases and the multimodal and heterogeneous clinical data (medical imaging, free-text recordings, wearable sensor streams, etc.), there is an urgent need for a unified multimodal AI framework that can proactively predict individual health risks. We propose VL-RiskFormer, a hierarchical stacked visual-language multimodal Transformer with a large language model (LLM) inference head embedded in its top layer. The system builds on the dual-stream architecture of existing visual-linguistic models (e.g., PaLM-E, LLaVA) with four key innovations: (i) pre-training with cross-modal comparison and fine-grained alignment of radiological images, fundus maps, and wearable device photos with corresponding clinical narratives using momentum update encoders and debiased InfoNCE losses; (ii) a time fusion block that integrates irregular visit sequences into the causal Transformer decoder through adaptive time interval position coding; (iii) a disease ontology map adapter that injects ICD-10 codes into visual and textual channels in layers and infers comorbid patterns with the help of a graph attention mechanism. On the MIMIC-IV longitudinal cohort, VL-RiskFormer achieved an average AUROC of 0.90 with an expected calibration error of 2.7 percent.
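
Innovation (ii) is stated only at a high level, so the sketch below shows one plausible reading of adaptive time-interval position coding: sinusoidal embeddings driven by irregular inter-visit gaps (in days), added to visit embeddings before the causal decoder. The frequency scaling and embedding dimension are assumptions, not taken from the paper.

```python
# Sketch of sinusoidal position encoding driven by irregular inter-visit gaps;
# one plausible reading of "adaptive time interval position coding", with an
# assumed frequency schedule.
import torch


def time_interval_encoding(delta_days: torch.Tensor, dim: int = 64) -> torch.Tensor:
    """delta_days: [batch, num_visits] gap (in days) since the previous visit."""
    half = dim // 2
    freqs = torch.exp(torch.arange(half, dtype=torch.float32)
                      * (-torch.log(torch.tensor(10000.0)) / half))
    angles = delta_days.unsqueeze(-1).float() * freqs                 # [B, V, half]
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)  # [B, V, dim]


if __name__ == "__main__":
    gaps = torch.tensor([[0.0, 30.0, 7.0, 365.0]])                    # one patient, four visits
    enc = time_interval_encoding(gaps, dim=64)
    visit_embeddings = torch.randn(1, 4, 64) + enc                    # added before the decoder
    print(enc.shape, visit_embeddings.shape)
```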
