Page 26 of 78779 results

Response Assessment in Hepatocellular Carcinoma: A Primer for Radiologists.

Mroueh N, Cao J, Srinivas Rao S, Ghosh S, Song OK, Kongboonvijit S, Shenoy-Bhangle A, Kambadakone A

PubMed | papers | Aug 7 2025
Hepatocellular carcinoma (HCC) is the third leading cause of cancer-related deaths worldwide, necessitating accurate and early diagnosis to guide therapy, along with assessment of treatment response. Response assessment criteria have evolved from traditional morphologic approaches, such as WHO criteria and Response Evaluation Criteria in Solid Tumors (RECIST), to more recent methods focused on evaluating viable tumor burden, including European Association for the Study of the Liver (EASL) criteria, modified RECIST (mRECIST), and the Liver Imaging Reporting and Data System (LI-RADS) Treatment Response (LI-TR) algorithm. This shift reflects the complex and evolving landscape of HCC treatment in the context of emerging systemic and locoregional therapies. Each of these criteria has its own nuanced strengths and limitations in capturing the detailed characteristics of HCC treatment and response assessment. The emergence of functional imaging techniques, including dual-energy CT and perfusion imaging, and the rising use of radiomics are enhancing the capabilities of response assessment. Growth in the realm of artificial intelligence and machine learning models provides an opportunity to refine the precision of response assessment by facilitating analysis of complex imaging data patterns. This review article provides a comprehensive overview of existing criteria, discusses functional and emerging imaging techniques, and outlines future directions for advancing HCC tumor response assessment.

MedCLIP-SAMv2: Towards universal text-driven medical image segmentation.

Koleilat T, Asgariandehkordi H, Rivaz H, Xiao Y

PubMed | papers | Aug 7 2025
Segmentation of anatomical structures and pathologies in medical images is essential for modern disease diagnosis, clinical research, and treatment planning. While significant advancements have been made in deep learning-based segmentation techniques, many of these methods still suffer from limitations in data efficiency, generalizability, and interactivity. As a result, developing robust segmentation methods that require fewer labeled datasets remains a critical challenge in medical image analysis. Recently, the introduction of foundation models like CLIP and Segment-Anything-Model (SAM), with robust cross-domain representations, has paved the way for interactive and universal image segmentation. However, further exploration of these models for data-efficient segmentation in medical imaging is an active field of research. In this paper, we introduce MedCLIP-SAMv2, a novel framework that integrates the CLIP and SAM models to perform segmentation on clinical scans using text prompts, in both zero-shot and weakly supervised settings. Our approach includes fine-tuning the BiomedCLIP model with a new Decoupled Hard Negative Noise Contrastive Estimation (DHN-NCE) loss, and leveraging the Multi-modal Information Bottleneck (M2IB) to create visual prompts for generating segmentation masks with SAM in the zero-shot setting. We also investigate using zero-shot segmentation labels in a weakly supervised paradigm to enhance segmentation quality further. Extensive validation across four diverse segmentation tasks and medical imaging modalities (breast tumor ultrasound, brain tumor MRI, lung X-ray, and lung CT) demonstrates the high accuracy of our proposed framework. Our code is available at https://github.com/HealthX-Lab/MedCLIP-SAMv2.

On the effectiveness of multimodal privileged knowledge distillation in two vision transformer based diagnostic applications

Simon Baur, Alexandra Benova, Emilio Dolgener Cantú, Jackie Ma

arXiv | preprint | Aug 6 2025
Deploying deep learning models in clinical practice often requires leveraging multiple data modalities, such as images, text, and structured data, to achieve robust and trustworthy decisions. However, not all modalities are always available at inference time. In this work, we propose multimodal privileged knowledge distillation (MMPKD), a training strategy that utilizes additional modalities available solely during training to guide a unimodal vision model. Specifically, we used a text-based teacher model for chest radiographs (MIMIC-CXR) and a tabular metadata-based teacher model for mammography (CBIS-DDSM) to distill knowledge into a vision transformer student model. We show that MMPKD can improve the zero-shot ability of the resulting attention maps to localize ROI in input images, although this effect does not generalize across domains, contrary to what prior research suggested.
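
The abstract does not state which distillation objective MMPKD uses. As a rough illustration of the general teacher-to-student idea only, the sketch below implements the classic temperature-softened KL distillation loss; the function names, the default temperature, and the choice of this particular loss are assumptions, not the authors' method.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - log_norm for z in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T**2 factor is the usual rescaling that keeps gradient magnitudes
    comparable across temperatures. Hypothetical helper for illustration.
    """
    if len(student_logits) != len(teacher_logits):
        raise ValueError("student and teacher must score the same classes")
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    return temperature ** 2 * kl


# Example: a student that disagrees with the teacher gets a positive loss.
loss = distillation_loss([0.2, 0.1, 0.0], [2.0, 0.5, -1.0])
```

In a privileged-information setting like the one described, the teacher logits would come from a model that sees the extra modality during training, while the student sees only the image.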

A Comprehensive Framework for Uncertainty Quantification of Voxel-wise Supervised Models in IVIM MRI

Nicola Casali, Alessandro Brusaferri, Giuseppe Baselli, Stefano Fumagalli, Edoardo Micotti, Gianluigi Forloni, Riaz Hussein, Giovanna Rizzo, Alfonso Mastropietro

arXiv | preprint | Aug 6 2025
Accurate estimation of intravoxel incoherent motion (IVIM) parameters from diffusion-weighted MRI remains challenging due to the ill-posed nature of the inverse problem and high sensitivity to noise, particularly in the perfusion compartment. In this work, we propose a probabilistic deep learning framework based on Deep Ensembles (DE) of Mixture Density Networks (MDNs), enabling estimation of total predictive uncertainty and its decomposition into aleatoric (AU) and epistemic (EU) components. The method was benchmarked against non-probabilistic neural networks, a Bayesian fitting approach, and a probabilistic network with a single Gaussian parametrization. Supervised training was performed on synthetic data, and evaluation was conducted on both simulated data and an in vivo dataset. The reliability of the quantified uncertainties was assessed using calibration curves, output distribution sharpness, and the Continuous Ranked Probability Score (CRPS). MDNs produced better calibrated and sharper predictive distributions for the diffusion coefficient D and fraction f parameters, although slight overconfidence was observed in the pseudo-diffusion coefficient D*. The Robust Coefficient of Variation (RCV) indicated smoother in vivo estimates for D* with MDNs compared to the Gaussian model. Despite the training data covering the expected physiological range, elevated EU in vivo suggests a mismatch with real acquisition conditions, highlighting the importance of incorporating EU, which is made possible by DE. Overall, we present a comprehensive framework for IVIM fitting with uncertainty quantification, which enables the identification and interpretation of unreliable estimates. The proposed approach can also be adopted for fitting other physical models through appropriate architectural and simulation adjustments.
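
To make the aleatoric/epistemic split concrete, here is a minimal sketch of the standard deep-ensemble decomposition via the law of total variance. It assumes, for simplicity, that each ensemble member outputs a single Gaussian (mean and variance) for one voxel and one parameter; the paper's members are mixture density networks, so this is a simplification, and the function name is hypothetical.

```python
from statistics import fmean


def decompose_uncertainty(means, variances):
    """Split ensemble predictive variance into aleatoric and epistemic parts.

    means[i], variances[i]: Gaussian prediction of ensemble member i.
    aleatoric = average of the members' predicted variances (data noise)
    epistemic = variance of the members' means (model disagreement)
    total     = aleatoric + epistemic (law of total variance)
    """
    if not means or len(means) != len(variances):
        raise ValueError("need one (mean, variance) pair per ensemble member")
    ensemble_mean = fmean(means)
    aleatoric = fmean(variances)
    epistemic = fmean((m - ensemble_mean) ** 2 for m in means)
    return ensemble_mean, aleatoric, epistemic, aleatoric + epistemic


# Example with made-up numbers: three members that disagree on the mean.
mean, au, eu, total = decompose_uncertainty([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

Under this decomposition, a high epistemic term on in vivo data with a low term on simulated data is the kind of signal the abstract interprets as a train/acquisition mismatch.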

NEARL-CLIP: Interacted Query Adaptation with Orthogonal Regularization for Medical Vision-Language Understanding

Zelin Peng, Yichen Zhao, Yu Huang, Piao Yang, Feilong Tang, Zhengqin Xu, Xiaokang Yang, Wei Shen

arXiv | preprint | Aug 6 2025
Computer-aided medical image analysis is crucial for disease diagnosis and treatment planning, yet limited annotated datasets restrict medical-specific model development. While vision-language models (VLMs) like CLIP offer strong generalization capabilities, their direct application to medical imaging analysis is impeded by a significant domain gap. Existing approaches to bridge this gap, including prompt learning and one-way modality interaction techniques, typically focus on introducing domain knowledge to a single modality. Although this may offer performance gains, it often causes modality misalignment, thereby failing to unlock the full potential of VLMs. In this paper, we propose NEARL-CLIP (iNteracted quEry Adaptation with oRthogonaL Regularization), a novel cross-modality interaction VLM-based framework that contains two contributions: (1) Unified Synergy Embedding Transformer (USEformer), which dynamically generates cross-modality queries to promote interaction between modalities, thus fostering the mutual enrichment and enhancement of multi-modal medical domain knowledge; (2) Orthogonal Cross-Attention Adapter (OCA). OCA introduces an orthogonality technique to decouple the new knowledge from USEformer into two distinct components: the truly novel information and the incremental knowledge. By isolating the learning process from the interference of incremental knowledge, OCA enables a more focused acquisition of new information, thereby further facilitating modality interaction and unleashing the capability of VLMs. Notably, NEARL-CLIP achieves these two contributions in a parameter-efficient style, which only introduces 1.46M learnable parameters.

Clinical information prompt-driven retinal fundus image for brain health evaluation.

Tong N, Hui Y, Gou SP, Chen LX, Wang XH, Chen SH, Li J, Li XS, Wu YT, Wu SL, Wang ZC, Sun J, Lv H

PubMed | papers | Aug 6 2025
Brain volume measurement serves as a critical approach for assessing brain health status. Considering the close biological connection between the eyes and brain, this study aims to investigate the feasibility of estimating brain volume through retinal fundus imaging integrated with clinical metadata, and to offer a cost-effective approach for assessing brain health. Based on clinical information, retinal fundus images, and neuroimaging data derived from a multicenter, population-based cohort study, the KaiLuan Study, we proposed a cross-modal correlation representation (CMCR) network to elucidate the intricate co-degenerative relationships between the eyes and brain for 755 subjects. Specifically, individual clinical information, which has been followed up for as long as 12 years, was encoded as a prompt to enhance the accuracy of brain volume estimation. Independent internal validation and external validation were performed to assess the robustness of the proposed model. Root mean square error (RMSE), peak signal-to-noise ratio (PSNR), and structural similarity index measure (SSIM) metrics were employed to quantitatively evaluate the quality of synthetic brain images derived from retinal imaging data. The proposed framework yielded average RMSE, PSNR, and SSIM values of 98.23, 35.78 dB, and 0.64, respectively, which significantly outperformed 5 other methods: multi-channel Variational Autoencoder (mcVAE), Pixel-to-Pixel (Pixel2pixel), transformer-based U-Net (TransUNet), multi-scale transformer network (MT-Net), and residual vision transformer (ResViT). The two- (2D) and three-dimensional (3D) visualization results showed that the shape and texture of the synthetic brain images generated by the proposed method most closely resembled those of actual brain images. Thus, the CMCR framework accurately captured the latent structural correlations between the fundus and the brain. 
The average difference between predicted and actual brain volumes was 61.36 cm³, with a relative error of 4.54%. When all of the clinical information (including age and sex, daily habits, cardiovascular factors, metabolic factors, and inflammatory factors) was encoded, the difference was decreased to 53.89 cm³, with a relative error of 3.98%. Based on the synthesized brain MR images from retinal fundus images, the volumes of brain tissues could be estimated with high accuracy. This study provides an innovative, accurate, and cost-effective approach to characterize brain health status through readily accessible retinal fundus images. NCT05453877 (https://clinicaltrials.gov/).
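
For readers unfamiliar with the reported metrics, the sketch below spells out the textbook definitions of RMSE, PSNR, and percentage relative error on flat lists of values. It is not the authors' evaluation code: the function names and the `data_range` argument (the maximum possible intensity, which the abstract does not specify) are assumptions, and SSIM is omitted because it needs windowed statistics.

```python
import math


def rmse(pred, ref):
    """Root mean square error between two equally sized value lists."""
    if not pred or len(pred) != len(ref):
        raise ValueError("inputs must be non-empty and the same length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))


def psnr(pred, ref, data_range):
    """Peak signal-to-noise ratio in dB: 20 * log10(data_range / RMSE)."""
    err = rmse(pred, ref)
    if err == 0:
        return math.inf  # identical images
    return 20 * math.log10(data_range / err)


def relative_error_percent(predicted, actual):
    """Absolute difference as a percentage of the actual value."""
    return abs(predicted - actual) / abs(actual) * 100


# Example with made-up numbers (not from the study):
example_error = relative_error_percent(predicted=95.0, actual=100.0)
```

Note that an average of per-subject relative errors, as reported in the abstract, is generally not the same as the average absolute difference divided by the average volume.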

Beyond the type 1 pattern: comprehensive risk stratification in Brugada syndrome.

Kan KY, Van Wyk A, Paterson T, Ninan N, Lysyganicz P, Tyagi I, Bhasi Lizi R, Boukrid F, Alfaifi M, Mishra A, Katraj SVK, Pooranachandran V

PubMed | papers | Aug 6 2025
Brugada Syndrome (BrS) is an inherited cardiac ion channelopathy associated with an elevated risk of sudden cardiac death, particularly due to ventricular arrhythmias in structurally normal hearts. Affecting approximately 1 in 2,000 individuals, BrS is most prevalent among middle-aged males of Asian descent. Although diagnosis is based on the presence of a Type 1 electrocardiographic (ECG) pattern, either spontaneous or induced, accurately stratifying risk in asymptomatic and borderline patients remains a major clinical challenge. This review explores current and emerging approaches to BrS risk stratification, focusing on electrocardiographic, electrophysiological, imaging, and computational markers. Non-invasive ECG indicators such as the β-angle, fragmented QRS, S wave in lead I, early repolarisation, aVR sign, and transmural dispersion of repolarisation have demonstrated predictive value for arrhythmic events. Adjunctive tools like signal-averaged ECG, Holter monitoring, and exercise stress testing enhance diagnostic yield by capturing dynamic electrophysiological changes. In parallel, imaging modalities, particularly speckle-tracking echocardiography and cardiac magnetic resonance, have revealed subclinical structural abnormalities in the right ventricular outflow tract and atria, challenging the paradigm of BrS as a purely electrical disorder. Invasive electrophysiological studies and substrate mapping have further clarified the anatomical basis of arrhythmogenesis, while risk scoring systems (e.g., Sieira, BRUGADA-RISK, PAT) and machine learning models offer new avenues for personalised risk assessment. Together, these advances underscore the importance of an integrated, multimodal approach to BrS risk stratification. Optimising these strategies is essential to guide implantable cardioverter-defibrillator decisions and improve outcomes in patients vulnerable to life-threatening arrhythmias.

Foundation models for radiology: the position of the AI for Health Imaging (AI4HI) network.

de Almeida JG, Alberich LC, Tsakou G, Marias K, Tsiknakis M, Lekadir K, Marti-Bonmati L, Papanikolaou N

PubMed | papers | Aug 6 2025
Foundation models are large models trained on big data that can be used for downstream tasks. In radiology, these models can potentially address several gaps in fairness and generalization, as they can be trained on massive datasets without labelled data and adapted to tasks requiring data with a small number of descriptions. This reduces one of the limiting bottlenecks in clinical model construction (data annotation), as these models can be trained through a variety of techniques that require little more than radiological images with or without their corresponding radiological reports. However, foundation models may be insufficient, as they are affected (to a smaller extent than traditional supervised learning approaches) by the same issues that lead to underperforming models, such as a lack of transparency/explainability, and biases. To address these issues, we advocate that the development of foundation models should not only be pursued but also accompanied by the development of a decentralized clinical validation and continuous training framework. This does not guarantee the resolution of the problems associated with foundation models, but it enables developers, clinicians and patients to know when, how and why models should be updated, creating a clinical AI ecosystem that is better capable of serving all stakeholders. CRITICAL RELEVANCE STATEMENT: Foundation models may mitigate issues like bias and poor generalization in radiology AI, but challenges persist. We propose a decentralized, cross-institutional framework for continuous validation and training to enhance model reliability, safety, and clinical utility. KEY POINTS: Foundation models trained on large datasets reduce annotation burdens and improve fairness and generalization in radiology. Despite improvements, they still face challenges like limited transparency, explainability, and residual biases. 
A decentralized, cross-institutional framework for clinical validation and continuous training can strengthen reliability and inclusivity in clinical AI.

Controllable Mask Diffusion Model for medical annotation synthesis with semantic information extraction.

Heo C, Jung J

PubMed | papers | Aug 5 2025
Medical segmentation, a prominent task in medical image analysis utilizing artificial intelligence, plays a crucial role in computer-aided diagnosis and depends heavily on the quality of the training data. However, the availability of sufficient data is constrained by strict privacy regulations associated with medical data. To mitigate this issue, research on data augmentation has gained significant attention. Medical segmentation tasks require paired datasets consisting of medical images and annotation images, also known as mask images, which represent lesion areas or radiological information within the medical images. Consequently, it is essential to apply data augmentation to both image types. This study proposes a Controllable Mask Diffusion Model, a novel approach capable of controlling and generating new masks. This model leverages the binary structure of the mask to extract semantic information, namely, the mask's size, location, and count, which is then applied as multi-conditional input to a diffusion model via a regressor. Through the regressor, newly generated masks conform to the input semantic information, thereby enabling input-driven controllable generation. Additionally, a technique that analyzes correlation within semantic information was devised for large-scale data synthesis. The generative capacity of the proposed model was evaluated against real datasets, and the model's ability to control and generate new masks based on previously unseen semantic information was confirmed. Furthermore, the practical applicability of the model was demonstrated by augmenting the data with the generated data, applying it to segmentation tasks, and comparing the performance with and without augmentation. Additionally, experiments were conducted on single-label and multi-label masks, yielding superior results for both types. This demonstrates the potential applicability of this study to various areas within the medical field.
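
The abstract names three semantic attributes read off a binary mask: size, location, and count. How the authors compute them is not stated; one plausible reading, sketched below, treats each 4-connected foreground region as one object, its pixel count as size, and its centroid as location. The function name, the connectivity choice, and the output layout are all assumptions.

```python
from collections import deque


def mask_semantics(mask):
    """Extract count, per-object size, and centroid from a 2D binary mask.

    mask: list of rows, each a list of 0/1 values.
    Objects are 4-connected foreground regions found by breadth-first search.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            size = len(pixels)
            centroid = (
                sum(p[0] for p in pixels) / size,  # mean row index
                sum(p[1] for p in pixels) / size,  # mean column index
            )
            components.append({"size": size, "centroid": centroid})
    return {"count": len(components), "components": components}


# Example: a 2x2 blob in the top-left corner and a single pixel bottom-right.
toy_mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
semantics = mask_semantics(toy_mask)
```

Values like these could then serve as the conditioning vector for a generator, which is the role the abstract assigns to the extracted semantic information.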

Augmenting Continual Learning of Diseases with LLM-Generated Visual Concepts

Jiantao Tan, Peixian Ma, Kanghao Chen, Zhiming Dai, Ruixuan Wang

arXiv | preprint | Aug 5 2025
Continual learning is essential for medical image classification systems to adapt to dynamically evolving clinical environments. The integration of multimodal information can significantly enhance continual learning of image classes. However, while existing approaches do utilize textual modality information, they solely rely on simplistic templates with a class name, thereby neglecting richer semantic information. To address these limitations, we propose a novel framework that harnesses visual concepts generated by large language models (LLMs) as discriminative semantic guidance. Our method dynamically constructs a visual concept pool with a similarity-based filtering mechanism to prevent redundancy. Then, to integrate the concepts into the continual learning process, we employ a cross-modal image-concept attention module, coupled with an attention loss. Through attention, the module can leverage the semantic knowledge from relevant visual concepts and produce class-representative fused features for classification. Experiments on medical and natural image datasets show our method achieves state-of-the-art performance, demonstrating the effectiveness and superiority of our method. We will release the code publicly.
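
The similarity-based filtering step can be pictured as follows: a candidate concept joins the pool only if its embedding is not too close to any concept already there. This is a minimal sketch under assumptions; the cosine metric, the 0.9 threshold, the function names, and the toy embeddings are illustrative and not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def add_if_novel(pool, name, embedding, threshold=0.9):
    """Append (name, embedding) to pool unless a near-duplicate exists.

    Returns True if the concept was added, False if it was filtered out.
    """
    for _, existing in pool:
        if cosine_similarity(embedding, existing) >= threshold:
            return False
    pool.append((name, embedding))
    return True


# Example with hypothetical concept names and 2D toy embeddings.
pool = []
add_if_novel(pool, "irregular border", [1.0, 0.0])
add_if_novel(pool, "uneven border", [0.99, 0.1])    # near-duplicate, rejected
add_if_novel(pool, "blue-white veil", [0.0, 1.0])   # dissimilar, accepted
```

A lower threshold gives a smaller, more diverse pool; a higher one keeps more fine-grained variants.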