Leveraging multi-modal foundation model image encoders to enhance brain MRI-based headache classification.

Rafsani F, Sheth D, Che Y, Shah J, Siddiquee MMR, Chong CD, Nikolova S, Ross K, Dumkrieger G, Li B, Wu T, Schwedt TJ

PubMed · Sep 26, 2025
Headaches are a nearly universal human experience, traditionally diagnosed based solely on symptoms. Recent advances in imaging techniques and artificial intelligence (AI) have enabled the development of automated headache detection systems, which can enhance clinical diagnosis, especially when symptom-based evaluations are insufficient. Current AI models often require extensive data, limiting their clinical applicability where data availability is low. However, deep learning models, particularly pre-trained models fine-tuned with smaller, targeted datasets, can potentially overcome this limitation. Leveraging BioMedCLIP, a pre-trained foundation model that combines a vision transformer (ViT) image encoder with a PubMedBERT text encoder, we fine-tuned the ViT image encoder to classify headaches and detect biomarkers from brain MRI data. The dataset consisted of 721 individuals: 424 healthy controls (HC) from the IXI dataset and 297 local participants, including individuals with migraine (n = 96), acute post-traumatic headache (APTH, n = 48), and persistent post-traumatic headache (PPTH, n = 49), plus additional HC (n = 104). The model achieved high accuracy across multiple balanced test sets, including 89.96% for migraine versus HC, 88.13% for APTH versus HC, and 83.13% for PPTH versus HC, all validated through five-fold cross-validation for robustness. Brain regions identified by Gradient-weighted Class Activation Mapping (Grad-CAM) analysis as responsible for migraine classification included the postcentral cortex, supramarginal gyrus, superior temporal cortex, and precuneus cortex; for APTH, the rostral middle frontal and precentral cortices; and, for PPTH, the cerebellar cortex and precentral cortex. To our knowledge, this is the first study to leverage a multimodal biomedical foundation model for headache classification and biomarker detection using structural MRI, offering complementary insights into the causes of and brain changes associated with headache disorders.
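
The fine-tuning recipe described above, a pre-trained ViT image encoder with a small classification head for a binary headache-versus-control task, follows a standard transfer-learning pattern. The sketch below uses a generic timm ViT as a stand-in for the BioMedCLIP image encoder; the head, learning rates, and dummy batch are illustrative assumptions, not the authors' settings.

```python
import timm
import torch
import torch.nn as nn

# A generic pre-trained ViT stands in for the BioMedCLIP image encoder
# (the study fine-tunes BioMedCLIP's ViT); all hyperparameters are illustrative.
class HeadacheClassifier(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        # num_classes=0 makes timm return pooled features instead of ImageNet logits
        self.encoder = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=0)
        self.head = nn.Linear(self.encoder.num_features, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(x))

model = HeadacheClassifier()
# Fine-tune the encoder with a small learning rate and the new head with a larger one.
optimizer = torch.optim.AdamW([
    {"params": model.encoder.parameters(), "lr": 1e-5},
    {"params": model.head.parameters(), "lr": 1e-3},
])
criterion = nn.CrossEntropyLoss()

# One training step on a dummy batch (B x 3 x 224 x 224 slices; labels 0 = HC, 1 = migraine).
images, labels = torch.randn(8, 3, 224, 224), torch.randint(0, 2, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```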

Secure and fault tolerant cloud based framework for medical image storage and retrieval in a distributed environment.

Amaithi Rajan A, V V, M A, R PK

PubMed · Sep 26, 2025
In the evolving field of healthcare, centralized cloud-based medical image retrieval faces challenges related to security, availability, and adversarial threats. Existing deep learning-based solutions improve retrieval but remain vulnerable to adversarial attacks and quantum threats, necessitating a shift to more secure distributed cloud solutions. This article proposes SFMedIR, a secure and fault-tolerant medical image retrieval framework that incorporates adversarial-attack-resistant federated learning for hash-code generation, utilizing a ConvNeXt-based model to improve accuracy and generalizability. The framework integrates quantum-chaos-based encryption for security, dynamic threshold-based shadow storage for fault tolerance, and a distributed cloud architecture to mitigate single points of failure. Unlike conventional methods, this approach significantly improves security and availability in cloud-based medical image retrieval systems, providing a resilient and efficient solution for healthcare applications. The framework is validated on Brain MRI and Kidney CT datasets, achieving a 60-70% improvement in retrieval accuracy for adversarial queries and an overall retrieval accuracy of 90%, outperforming existing models by 5-10%. The results demonstrate superior performance in terms of both security and retrieval efficiency, making this framework a valuable contribution to the future of secure medical image management.
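
The retrieval side of such a framework typically reduces each image to a compact binary hash code and ranks stored images by Hamming distance to the query code. The NumPy sketch below shows only that generic ranking step under assumed 64-bit codes; SFMedIR's federated hashing network, encryption, and shadow-storage components are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: 64-bit binary hash codes produced by a (federated) hashing network.
database_codes = rng.integers(0, 2, size=(10_000, 64), dtype=np.uint8)  # stored images
query_code = rng.integers(0, 2, size=(64,), dtype=np.uint8)             # query image

# Hamming distance = number of differing bits between the query and each stored code.
hamming = np.count_nonzero(database_codes != query_code, axis=1)

# Retrieve the indices of the k nearest images in Hamming space.
k = 10
top_k = np.argsort(hamming)[:k]
print(top_k, hamming[top_k])
```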

Decipher-MR: A Vision-Language Foundation Model for 3D MRI Representations

Zhijian Yang, Noel DSouza, Istvan Megyeri, Xiaojian Xu, Amin Honarmandi Shandiz, Farzin Haddadpour, Krisztian Koos, Laszlo Rusko, Emanuele Valeriano, Bharadwaj Swaninathan, Lei Wu, Parminder Bhatia, Taha Kass-Hout, Erhan Bas

arXiv preprint · Sep 25, 2025
Magnetic Resonance Imaging (MRI) is a critical medical imaging modality in clinical diagnosis and research, yet its complexity and heterogeneity pose challenges for automated analysis, particularly in scalable and generalizable machine learning applications. While foundation models have revolutionized natural language and vision tasks, their application to MRI remains limited due to data scarcity and narrow anatomical focus. In this work, we present Decipher-MR, a 3D MRI-specific vision-language foundation model trained on a large-scale dataset comprising 200,000 MRI series from over 22,000 studies spanning diverse anatomical regions, sequences, and pathologies. Decipher-MR integrates self-supervised vision learning with report-guided text supervision to build robust, generalizable representations, enabling effective adaptation across broad applications. To support robust and diverse clinical tasks with minimal computational overhead, Decipher-MR adopts a modular design in which lightweight, task-specific decoders are tuned on top of a frozen pretrained encoder. Following this setting, we evaluate Decipher-MR across diverse benchmarks including disease classification, demographic prediction, anatomical localization, and cross-modal retrieval, demonstrating consistent performance gains over existing foundation models and task-specific approaches. Our results establish Decipher-MR as a scalable and versatile foundation for MRI-based AI, facilitating efficient development across clinical and research domains.
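
The modular frozen-encoder design means a downstream task only trains a small decoder on top of fixed embeddings. A minimal PyTorch sketch of that pattern follows; the encoder here is a placeholder module, and the embedding size and classification head are assumptions for illustration rather than Decipher-MR's actual architecture.

```python
import torch
import torch.nn as nn

EMB_DIM = 768  # assumed embedding size of the pretrained 3D MRI encoder

# Placeholder for a pretrained 3D vision encoder such as Decipher-MR's.
pretrained_encoder = nn.Sequential(nn.Flatten(), nn.Linear(8 * 16 * 16, EMB_DIM))

# Freeze the encoder: only the lightweight task decoder is trained.
for p in pretrained_encoder.parameters():
    p.requires_grad = False

# Lightweight task-specific decoder, e.g. a binary disease classifier.
decoder = nn.Sequential(nn.LayerNorm(EMB_DIM), nn.Linear(EMB_DIM, 2))
optimizer = torch.optim.AdamW(decoder.parameters(), lr=1e-3)

volumes = torch.randn(4, 1, 8, 16, 16)     # dummy MRI volumes
labels = torch.randint(0, 2, (4,))
with torch.no_grad():                      # frozen encoder: no gradients needed
    feats = pretrained_encoder(volumes)
loss = nn.functional.cross_entropy(decoder(feats), labels)
loss.backward()
optimizer.step()
```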

Citrus-V: Advancing Medical Foundation Models with Unified Medical Image Grounding for Clinical Reasoning

Guoxin Wang, Jun Zhao, Xinyi Liu, Yanbo Liu, Xuyang Cao, Chao Li, Zhuoyun Liu, Qintian Sun, Fangru Zhou, Haoqiang Xing, Zhenhong Yang

arXiv preprint · Sep 23, 2025
Medical imaging provides critical evidence for clinical diagnosis, treatment planning, and surgical decisions, yet most existing imaging models are narrowly focused and require multiple specialized networks, limiting their generalization. Although large-scale language and multimodal models exhibit strong reasoning and multi-task capabilities, real-world clinical applications demand precise visual grounding, multimodal integration, and chain-of-thought reasoning. We introduce Citrus-V, a multimodal medical foundation model that combines image analysis with textual reasoning. The model integrates detection, segmentation, and multimodal chain-of-thought reasoning, enabling pixel-level lesion localization, structured report generation, and physician-like diagnostic inference in a single framework. We propose a novel multimodal training approach and release a curated open-source data suite covering reasoning, detection, segmentation, and document understanding tasks. Evaluations demonstrate that Citrus-V outperforms existing open-source medical models and expert-level imaging systems across multiple benchmarks, delivering a unified pipeline from visual grounding to clinical reasoning and supporting precise lesion quantification, automated reporting, and reliable second opinions.

Improved pharmacokinetic parameter estimation from DCE-MRI via spatial-temporal information-driven unsupervised learning.

He X, Wang L, Yang Q, Wang J, Xing Z, Cao D, Cai C, Cai S

PubMed · Sep 23, 2025
Objective: Pharmacokinetic (PK) parameters derived from dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) provide quantitative characterization of tissue perfusion and permeability. However, existing deep learning methods for PK parameter estimation rely on either temporal or spatial features alone, overlooking the integrated spatial-temporal characteristics of DCE-MRI data. This study aims to remove this barrier by fully leveraging spatial and temporal information to improve parameter estimation.
Approach: A spatial-temporal information-driven unsupervised deep learning method (STUDE) was proposed. STUDE combines convolutional neural networks (CNNs) and a customized Vision Transformer (ViT) to separately capture spatial and temporal features, enabling comprehensive modelling of contrast agent dynamics and tissue heterogeneity. In addition, a spatial-temporal attention (STA) feature fusion module was proposed to enable adaptive focus on both dimensions for more effective feature fusion. Moreover, the extended Tofts model imposed physical constraints on PK parameter estimation, enabling unsupervised training of STUDE. The accuracy and diagnostic value of STUDE were compared with the orthodox non-linear least squares (NLLS) method and representative deep learning-based methods (i.e., GRU, CNN, U-Net, and VTDCE-Net) on a numerical brain phantom and 87 glioma patients, respectively.
Main results: On the numerical brain phantom, STUDE produced PK parameter maps with the lowest systematic and random errors, even under low SNR conditions (SNR = 10 dB). On glioma data, STUDE generated parameter maps with reduced noise compared to NLLS and superior structural clarity compared to the other methods. Furthermore, STUDE outperformed all other methods in identifying glioma isocitrate dehydrogenase (IDH) mutation status, achieving area under the receiver operating characteristic curve (AUC) values of 0.840 and 0.908 for Ktrans and Ve, respectively. Combining all PK parameters improved the AUC to 0.926.
Significance: STUDE advances spatial-temporal information-driven and physics-informed learning for precise PK parameter estimation, demonstrating its potential clinical significance.
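
The physical constraint mentioned above comes from the extended Tofts model, which expresses the tissue concentration curve as a plasma term plus a convolution of the arterial input function with an exponential kernel: Ct(t) = vp·Cp(t) + Ktrans ∫ Cp(τ)·exp(-(Ktrans/ve)(t-τ)) dτ. The NumPy sketch below evaluates this forward model on a uniform time grid; the input-function shape and parameter values are illustrative assumptions, not values from the paper.

```python
import numpy as np

def extended_tofts(t, cp, ktrans, ve, vp):
    """Forward extended Tofts model: tissue concentration from PK parameters.

    t      : uniformly sampled time points (min)
    cp     : arterial input function Cp(t)
    ktrans : volume transfer constant (1/min)
    ve     : extravascular extracellular volume fraction
    vp     : plasma volume fraction
    """
    kep = ktrans / ve                        # efflux rate constant
    dt = t[1] - t[0]
    kernel = np.exp(-kep * t)                # exponential impulse response
    # Discrete convolution of Cp with the kernel, truncated to the time grid.
    conv = np.convolve(cp, kernel)[: len(t)] * dt
    return vp * cp + ktrans * conv

# Illustrative example with an assumed bi-exponential arterial input function.
t = np.arange(0, 5, 1 / 60)                              # 5 min sampled every second
cp = 5.0 * (np.exp(-0.5 * t) - np.exp(-4.0 * t))         # assumed AIF shape
ct = extended_tofts(t, cp, ktrans=0.1, ve=0.2, vp=0.05)  # assumed tissue parameters
```

In an unsupervised setting of this kind, the network predicts (Ktrans, ve, vp), the forward model regenerates the concentration curve, and the training loss penalizes the mismatch with the measured DCE-MRI signal, so no ground-truth parameter maps are needed.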

CPT-4DMR: Continuous sPatial-Temporal Representation for 4D-MRI Reconstruction

Xinyang Wu, Muheng Li, Xia Li, Orso Pusterla, Sairos Safai, Philippe C. Cattin, Antony J. Lomax, Ye Zhang

arXiv preprint · Sep 22, 2025
Four-dimensional MRI (4D-MRI) is a promising technique for capturing respiratory-induced motion in radiation therapy planning and delivery. Conventional 4D reconstruction methods, which typically rely on phase binning or separate template scans, struggle to capture temporal variability, complicate workflows, and impose heavy computational loads. We introduce a neural representation framework that models respiratory motion as a smooth, continuous deformation steered by a 1D surrogate signal, completely replacing the conventional discrete sorting approach. The new method fuses motion modeling with image reconstruction through two synergistic networks: the Spatial Anatomy Network (SAN) encodes a continuous 3D anatomical representation, while a Temporal Motion Network (TMN), guided by Transformer-derived respiratory signals, produces temporally consistent deformation fields. Evaluation using a free-breathing dataset of 19 volunteers demonstrates that our template- and phase-free method accurately captures both regular and irregular respiratory patterns, while preserving vessel and bronchial continuity with high anatomical fidelity. The proposed method significantly improves efficiency, reducing the total processing time from approximately five hours required by conventional discrete sorting methods to just 15 minutes of training. Furthermore, it enables inference of each 3D volume in under one second. The framework accurately reconstructs 3D images at any respiratory state, achieves superior performance compared to conventional methods, and demonstrates strong potential for application in 4D radiation therapy planning and real-time adaptive treatment.
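
Conceptually, one network maps a spatial coordinate to a reference intensity while a second maps (coordinate, surrogate value) to a displacement, so a volume at any respiratory state can be rendered without phase binning. The PyTorch sketch below is a heavily simplified stand-in: plain coordinate MLPs replace SAN and TMN, and all sizes are arbitrary assumptions.

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=128):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

# Spatial anatomy network: 3D coordinate -> reference intensity (stand-in for SAN).
anatomy_net = mlp(3, 1)
# Temporal motion network: (coordinate, 1D respiratory surrogate) -> 3D displacement
# (stand-in for TMN; the paper derives the surrogate with a Transformer).
motion_net = mlp(3 + 1, 3)

def render(coords: torch.Tensor, surrogate: torch.Tensor) -> torch.Tensor:
    """Intensity at `coords` for the respiratory state given by `surrogate`."""
    s = surrogate.expand(coords.shape[0], 1)
    displacement = motion_net(torch.cat([coords, s], dim=-1))
    return anatomy_net(coords + displacement)        # sample the deformed anatomy

coords = torch.rand(1024, 3)                         # normalized voxel coordinates
intensity = render(coords, torch.tensor([[0.3]]))    # arbitrary respiratory state
```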

Neural Network-Driven Direct CBCT-Based Dose Calculation for Head-and-Neck Proton Treatment Planning

Muheng Li, Evangelia Choulilitsa, Lisa Fankhauser, Francesca Albertini, Antony Lomax, Ye Zhang

arXiv preprint · Sep 22, 2025
Accurate dose calculation on cone beam computed tomography (CBCT) images is essential for modern proton treatment planning workflows, particularly when accounting for inter-fractional anatomical changes in adaptive treatment scenarios. Traditional CBCT-based dose calculation suffers from image quality limitations, requiring complex correction workflows. This study develops and validates a deep learning approach for direct proton dose calculation from CBCT images using extended Long Short-Term Memory (xLSTM) neural networks. A retrospective dataset of 40 head-and-neck cancer patients with paired planning CT and treatment CBCT images was used to train an xLSTM-based neural network (CBCT-NN). The architecture incorporates energy token encoding and beam's-eye-view sequence modelling to capture spatial dependencies in proton dose deposition patterns. Training utilized 82,500 paired beam configurations with Monte Carlo-generated ground truth doses. Validation was performed on 5 independent patients using gamma analysis, mean percentage dose error assessment, and dose-volume histogram comparison. The CBCT-NN achieved gamma pass rates of 95.1 $\pm$ 2.7% using 2mm/2% criteria. Mean percentage dose errors were 2.6 $\pm$ 1.4% in high-dose regions ($>$90% of max dose) and 5.9 $\pm$ 1.9% globally. Dose-volume histogram analysis showed excellent preservation of target coverage metrics (Clinical Target Volume V95% difference: -0.6 $\pm$ 1.1%) and organ-at-risk constraints (parotid mean dose difference: -0.5 $\pm$ 1.5%). Computation time is under 3 minutes without sacrificing Monte Carlo-level accuracy. This study demonstrates the proof-of-principle of direct CBCT-based proton dose calculation using xLSTM neural networks. The approach eliminates traditional correction workflows while achieving comparable accuracy and computational efficiency suitable for adaptive protocols.
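The core idea, treating the voxels along each beam's-eye-view ray as a sequence whose elements carry CBCT intensity plus an energy token and regressing dose per voxel, can be sketched as below. A vanilla LSTM stands in for the paper's xLSTM blocks (which are not part of stock PyTorch), and all feature sizes and the dummy inputs are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BeamDoseNet(nn.Module):
    """Maps a beam's-eye-view voxel sequence to a per-voxel dose profile.

    A standard LSTM stands in for the xLSTM blocks; sizes are assumed.
    """
    def __init__(self, num_energies: int = 50, emb: int = 32, hidden: int = 128):
        super().__init__()
        self.energy_embed = nn.Embedding(num_energies, emb)  # energy token encoding
        self.rnn = nn.LSTM(input_size=1 + emb, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, ct_values: torch.Tensor, energy_idx: torch.Tensor) -> torch.Tensor:
        # ct_values: (B, L, 1) CBCT intensities along each ray; energy_idx: (B,)
        L = ct_values.shape[1]
        e = self.energy_embed(energy_idx).unsqueeze(1).expand(-1, L, -1)
        out, _ = self.rnn(torch.cat([ct_values, e], dim=-1))
        return self.head(out).squeeze(-1)                    # (B, L) dose along the ray

model = BeamDoseNet()
dose = model(torch.randn(2, 256, 1), torch.tensor([12, 30]))  # dummy rays and energies
```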

Uncovering genetic architecture of the heart via genetic association studies of unsupervised deep learning derived endophenotypes.

You L, Zhao X, Xie Z, Patel KA, Chen C, Kitkungvan D, Mohammed KK, Narula N, Arbustini E, Cassidy CK, Narula J, Zhi D

PubMed · Sep 20, 2025
Recent genome-wide association studies (GWAS) have effectively linked genetic variants to quantitative traits derived from time-series cardiac magnetic resonance imaging, revealing insights into cardiac morphology and function. Deep learning approaches, however, generally require extensive supervised training on manually annotated data. In this study, we developed a novel framework using a 3D U-architecture autoencoder (cineMAE) to learn deep image phenotypes from cardiac magnetic resonance (CMR) imaging for genetic discovery, focusing on long-axis two-chamber and four-chamber views. We trained a masked autoencoder to develop Unsupervised Derived Image Phenotypes for the heart (Heart-UDIPs). These representations were found to be informative indicators of various heart-specific phenotypes (e.g., left ventricular hypertrophy) and diseases (e.g., hypertrophic cardiomyopathy). GWAS on Heart-UDIPs identified 323 lead SNPs and 628 SNP-prioritized genes, exceeding the yield of previous methods. The genes identified by the method described herein exhibited significant associations with cardiac function and showed substantial enrichment in pathways related to cardiac disorders. These results underscore the utility of our Heart-UDIP approach in enhancing the discovery potential for genetic associations, without the need for clinically defined phenotypes or manual annotations.
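
Once the masked autoencoder yields a latent vector per participant, each latent dimension is treated as a quantitative trait and tested for association with genotype dosages. The sketch below shows that association step for a single latent dimension and a single SNP using ordinary least squares with covariates; the data are simulated and the covariate set is an assumption (real cardiac GWAS also adjust for ancestry principal components and other confounders).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5_000

# Simulated inputs: one Heart-UDIP latent dimension, one SNP dosage (0/1/2), covariates.
phenotype = rng.normal(size=n)                        # latent phenotype per participant
dosage = rng.binomial(2, 0.3, size=n).astype(float)   # genotype dosage at one SNP
covariates = np.column_stack([rng.normal(size=n),     # e.g. age
                              rng.integers(0, 2, n)]) # e.g. sex

# Linear model: phenotype ~ dosage + covariates; the dosage p-value feeds the GWAS.
X = sm.add_constant(np.column_stack([dosage, covariates]))
fit = sm.OLS(phenotype, X).fit()
beta, pval = fit.params[1], fit.pvalues[1]            # effect size and p-value for the SNP
print(f"beta={beta:.4f}, p={pval:.3g}")
```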

Visual language model-assisted spectral CT reconstruction by diffusion and low-rank priors from limited-angle measurements.

Wang Y, Liang N, Ren J, Zhang X, Shen Y, Cai A, Zheng Z, Li L, Yan B

PubMed · Sep 19, 2025
Spectral computed tomography (CT) is a critical tool in clinical practice, offering capabilities in multi-energy spectrum imaging and material identification. The limited-angle (LA) scanning strategy has attracted attention for its advantages in fast data acquisition and reduced radiation exposure, aligning with the as-low-as-reasonably-achievable principle. However, most deep learning-based methods require separate models for each LA setting, which limits their flexibility in adapting to new conditions. In this study, we developed a novel Visual-Language model-assisted Spectral CT Reconstruction (VLSR) method to address LA artifacts and enable multi-setting adaptation within a single model. The VLSR method integrates the image-text perception ability of visual-language models with the image generation potential of diffusion models. Prompt engineering is introduced to better represent LA artifact characteristics, further improving the accuracy of artifact characterization. Additionally, a collaborative sampling framework combining data consistency, low-rank regularization, and image-domain diffusion models is developed to produce high-quality and consistent spectral CT reconstructions. The performance of VLSR is superior to that of the comparison methods: at scanning angles of 90° and 60° on simulated data, VLSR improves the peak signal-to-noise ratio by at least 0.41 dB and 1.13 dB, respectively, compared with the other methods. The VLSR method can reconstruct high-quality spectral CT images under diverse LA configurations, allowing faster and more flexible scans with dose reductions.
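
Within such a collaborative sampling loop, a low-rank prior is typically enforced by reshaping the multi-energy image stack into a channel-by-pixel matrix and soft-thresholding its singular values between the diffusion and data-consistency steps. The NumPy sketch below shows only that singular-value-thresholding step on a dummy stack; the stack size and threshold are illustrative assumptions, and the diffusion and data-consistency updates of VLSR are not reproduced.

```python
import numpy as np

def low_rank_prox(images: np.ndarray, tau: float) -> np.ndarray:
    """Singular value thresholding across energy channels.

    images: (n_energy, H, W) spectral CT image stack.
    tau   : soft-threshold on singular values (strength of the low-rank prior).
    """
    n_energy, H, W = images.shape
    mat = images.reshape(n_energy, H * W)              # channels x pixels matrix
    U, s, Vt = np.linalg.svd(mat, full_matrices=False)
    s = np.maximum(s - tau, 0.0)                       # shrink the singular values
    low_rank = (U * s) @ Vt                            # rebuild the shrunk matrix
    return low_rank.reshape(n_energy, H, W)

# Dummy 4-energy stack; in a full reconstruction loop this step would alternate
# with diffusion sampling and a projection-domain data-consistency update.
stack = np.random.default_rng(0).normal(size=(4, 64, 64))
denoised = low_rank_prox(stack, tau=5.0)
```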