
MedBridge: Bridging Foundation Vision-Language Models to Medical Image Diagnosis

Yitong Li, Morteza Ghahremani, Christian Wachinger

arXiv preprint · May 27, 2025
Recent vision-language foundation models deliver state-of-the-art results on natural image classification but falter on medical images due to pronounced domain shifts. At the same time, training a medical foundation model requires substantial resources, including extensive annotated data and high computational capacity. To bridge this gap with minimal overhead, we introduce MedBridge, a lightweight multimodal adaptation framework that re-purposes pretrained VLMs for accurate medical image diagnosis. MedBridge comprises three key components. First, a Focal Sampling module extracts high-resolution local regions to capture subtle pathological features and compensate for the limited input resolution of general-purpose VLMs. Second, a Query Encoder (QEncoder) injects a small set of learnable queries that attend to the frozen feature maps of each VLM, aligning them with medical semantics without retraining the entire backbone. Third, a Mixture-of-Experts mechanism, driven by the learnable queries, harnesses the complementary strengths of diverse VLMs to maximize diagnostic performance. We evaluate MedBridge on five medical imaging benchmarks across three key adaptation tasks, demonstrating its superior performance in both cross-domain and in-domain adaptation settings, even under varying levels of training data availability. Notably, MedBridge achieved a 6-15% improvement in AUC over state-of-the-art VLM adaptation methods in multi-label thoracic disease diagnosis, underscoring its effectiveness in leveraging foundation models for accurate and data-efficient medical diagnosis. Our code is available at https://github.com/ai-med/MedBridge.
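
The query-based adaptation idea lends itself to a compact sketch. Below is a minimal, hypothetical PyTorch rendering of a QEncoder-style adapter: a small set of learnable queries cross-attends to frozen VLM feature maps and feeds a classification head, and only the adapter trains. All module names, dimensions, and the pooling choice are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a learnable-query adapter over a frozen VLM.
# Names and dimensions are illustrative, not the MedBridge code.
import torch
import torch.nn as nn

class QueryAdapter(nn.Module):
    def __init__(self, num_queries=16, dim=768, num_heads=8, num_classes=14):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.head = nn.Linear(dim, num_classes)  # e.g., multi-label chest findings

    def forward(self, frozen_feats):
        # frozen_feats: (B, N, dim) patch tokens from a frozen VLM encoder
        B = frozen_feats.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)    # (B, Q, dim)
        out, _ = self.attn(q, frozen_feats, frozen_feats)  # cross-attention
        return self.head(out.mean(dim=1))                  # pooled logits

# Only the adapter's parameters are trained; the backbone stays frozen.
with torch.no_grad():
    feats = torch.randn(2, 196, 768)  # stand-in for frozen ViT features
logits = QueryAdapter()(feats)
```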

Automatic identification of Parkinsonism using clinical multi-contrast brain MRI: a large self-supervised vision foundation model strategy.

Suo X, Chen M, Chen L, Luo C, Kemp GJ, Lui S, Sun H

PubMed paper · May 27, 2025
Valid non-invasive biomarkers for Parkinson's disease (PD) and Parkinson-plus syndrome (PPS) are urgently needed. Building on our recent self-supervised vision foundation model, the Shifted-window UNEt TRansformer (Swin UNETR), which uses clinical multi-contrast whole-brain MRI, we aimed to develop an efficient and practical model ('SwinClassifier') for discriminating PD from PPS using routine clinical MRI scans. We used 75,861 clinical head MRI scans, including T1-weighted, T2-weighted, and fluid-attenuated inversion recovery imaging, as a pre-training dataset to develop a foundation model, using self-supervised learning with a cross-contrast context-recovery task. Clinical head MRI scans from n = 1992 participants with PD and n = 1989 participants with PPS were then used as the downstream PD vs PPS classification dataset. We assessed SwinClassifier's performance using confusion matrices, comparing it against a self-supervised vanilla Vision Transformer (ViT) autoencoder ('ViTClassifier') and against two convolutional neural networks (DenseNet121 and ResNet50) trained from scratch. SwinClassifier showed very good performance (F1 score 0.83, 95% confidence interval [CI] 0.79-0.87, AUC 0.89) in PD vs PPS discrimination on independent test datasets (n = 173 participants with PD and n = 165 with PPS). This self-supervised classifier with pretrained weights outperformed the ViTClassifier and the convolutional classifiers trained from scratch (F1 score 0.77-0.82, AUC 0.83-0.85). Occlusion sensitivity mapping in the correctly classified cases (n = 160 PD and n = 114 PPS) highlighted the brain regions guiding discrimination, mainly sensorimotor and midline structures including the cerebellum, brainstem, ventricles, and basal ganglia. Our self-supervised model based on routine clinical head MRI discriminated PD from PPS with good accuracy and sensitivity. With incremental improvements, the approach may be diagnostically useful in early disease. Funding: National Key Research and Development Program of China.
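
Occlusion sensitivity mapping, used above to localize discriminative regions, has a simple core: blank out a patch of the input, re-run the classifier, and record the drop in the target-class probability. The sketch below assumes a generic `predict_fn` returning class probabilities for a 3D volume; the patch size, stride, and zero-fill are illustrative choices, not the study's settings.

```python
# Minimal occlusion-sensitivity sketch for a 3D volume classifier.
# `predict_fn` is an assumed interface mapping a volume to class probs.
import numpy as np

def occlusion_sensitivity(predict_fn, volume, target_class, patch=16, stride=16):
    base = predict_fn(volume)[target_class]      # unoccluded probability
    heat = np.zeros_like(volume, dtype=float)
    D, H, W = volume.shape
    for z in range(0, D - patch + 1, stride):
        for y in range(0, H - patch + 1, stride):
            for x in range(0, W - patch + 1, stride):
                occluded = volume.copy()
                occluded[z:z+patch, y:y+patch, x:x+patch] = 0.0
                drop = base - predict_fn(occluded)[target_class]
                heat[z:z+patch, y:y+patch, x:x+patch] = drop
    return heat  # high values mark regions the classifier relies on
```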

Evaluating Large Language Models for Enhancing Radiology Specialty Examination: A Comparative Study with Human Performance.

Liu HY, Chen SJ, Wang W, Lee CH, Hsu HH, Shen SH, Chiou HJ, Lee WJ

PubMed paper · May 27, 2025
The radiology specialty examination assesses clinical decision-making, image interpretation, and diagnostic reasoning. With the expansion of medical knowledge, traditional test design faces challenges in maintaining accuracy and relevance. Large language models (LLMs) have demonstrated potential in medical education. This study evaluates LLM performance on radiology specialty exams, explores their role in assessing question difficulty, and investigates their reasoning processes, aiming to develop a more objective and efficient framework for exam design. We compared the performance of LLMs and human examinees on a radiology specialty examination. Three LLMs (GPT-4o, o1-preview, and GPT-3.5-turbo-1106) were evaluated under zero-shot conditions. Exam accuracy, examinee accuracy, the discrimination index, and the point-biserial correlation were used to assess the LLMs' ability to predict question difficulty and their reasoning processes. Data provided by the Taiwan Radiological Society ensure comparability between AI and human performance. In terms of accuracy, GPT-4o (88.0%) and o1-preview (90.9%) outperformed human examinees (76.3%), whereas GPT-3.5-turbo-1106 showed markedly lower accuracy (50.2%). Question difficulty analysis revealed that the newer LLMs excel at complex questions, while GPT-3.5-turbo-1106 exhibited greater performance variability. Discrimination index and point-biserial correlation analyses demonstrated that GPT-4o and o1-preview accurately identified key differentiating questions, closely mirroring human reasoning patterns. These findings suggest that advanced LLMs can assess medical examination difficulty, offering potential applications in exam standardization and question evaluation, and that they could be utilized as tools for assessing exam question difficulty and assisting in the standardized development of medical examinations.
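
For readers unfamiliar with the item statistics used above, a short worked example: the point-biserial correlation relates per-item correctness to total score, and the discrimination index contrasts the top and bottom scorer groups. The data below are synthetic; `scipy.stats.pointbiserialr` is the relevant library call.

```python
# Worked example of the two item statistics on synthetic exam data.
import numpy as np
from scipy.stats import pointbiserialr

rng = np.random.default_rng(0)
responses = rng.integers(0, 2, size=(200, 50))  # examinees x items, 0/1
totals = responses.sum(axis=1)                  # each examinee's total score

item = responses[:, 0]                          # one item's correctness vector
r_pb, p_value = pointbiserialr(item, totals)    # point-biserial correlation

# Discrimination index: proportion correct in top 27% minus bottom 27%
order = np.argsort(totals)
k = int(0.27 * len(totals))
disc = item[order[-k:]].mean() - item[order[:k]].mean()
print(f"r_pb={r_pb:.2f}, discrimination={disc:.2f}")
```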

Can intraoperative improvement of radial endobronchial ultrasound imaging enhance the diagnostic yield in peripheral pulmonary lesions?

Nishida K, Ito T, Iwano S, Okachi S, Nakamura S, Chrétien B, Chen-Yoshikawa TF, Ishii M

PubMed paper · May 26, 2025
Data regarding the diagnostic efficacy of radial endobronchial ultrasound (R-EBUS) findings obtained via transbronchial needle aspiration (TBNA)/biopsy (TBB) with endobronchial ultrasonography with a guide sheath (EBUS-GS) for peripheral pulmonary lesions (PPLs) are lacking. We evaluated whether intraoperative probe repositioning improves R-EBUS imaging and affects the diagnostic yield and safety of EBUS-guided sampling for PPLs. We retrospectively studied 363 patients with PPLs who underwent TBNA/TBB (83 lesions) or TBB (280 lesions) using EBUS-GS. Based on the R-EBUS findings before and after these procedures, patients were categorized into three groups: improved R-EBUS image (n = 52), unimproved R-EBUS image (n = 69), and initially within-lesion (n = 242). The impact of improved R-EBUS findings on diagnostic yield and complications was assessed using multivariable logistic regression, adjusting for lesion size, lesion location, and the presence of a bronchus leading to the lesion on CT. A separate exploratory random-forest model with SHAP analysis was used to explore factors associated with successful repositioning in lesions where the probe was not initially within the lesion. The diagnostic yield in the improved R-EBUS group was significantly higher than that in the unimproved R-EBUS group (76.9% vs. 46.4%, p = 0.001). The regression model revealed that improvement in intraoperative R-EBUS findings was associated with a high diagnostic yield (odds ratio: 3.55, 95% confidence interval: 1.57-8.06, p = 0.002). Machine learning analysis indicated that inner lesion location and radiographic visibility were the most influential predictors of successful repositioning. Complication rates were similar across all groups (total complications: 5.8% vs. 4.3% vs. 6.2%, p = 0.943). Improved R-EBUS findings during TBNA/TBB or TBB with EBUS-GS were associated with a high diagnostic yield without an increase in complications, even when the initial R-EBUS findings were inadequate. This suggests that repeated intraoperative probe repositioning can safely improve diagnostic outcomes.
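
The reported odds ratio comes from a standard multivariable logistic regression, where each covariate's odds ratio is the exponential of its fitted coefficient. The sketch below uses synthetic data and placeholder column names to illustrate the calculation with statsmodels; it is not the study's analysis code.

```python
# Illustrative multivariable logistic regression with odds ratios.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
# Placeholder covariates: one row per lesion, binary diagnostic outcome.
df = pd.DataFrame({
    "diagnosed": rng.integers(0, 2, 300),
    "improved_rebus": rng.integers(0, 2, 300),
    "lesion_size_mm": rng.uniform(8, 40, 300),
    "bronchus_sign": rng.integers(0, 2, 300),
})
X = sm.add_constant(df[["improved_rebus", "lesion_size_mm", "bronchus_sign"]])
fit = sm.Logit(df["diagnosed"], X).fit(disp=0)

odds_ratios = np.exp(fit.params)   # OR = exp(coefficient)
conf_int = np.exp(fit.conf_int())  # 95% CI on the OR scale
print(odds_ratios, conf_int, sep="\n")
```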

Clinical, radiological, and radiomics feature-based explainable machine learning models for prediction of neurological deterioration and 90-day outcomes in mild intracerebral hemorrhage.

Zeng W, Chen J, Shen L, Xia G, Xie J, Zheng S, He Z, Deng L, Guo Y, Yang J, Lv Y, Qin G, Chen W, Yin J, Wu Q

PubMed paper · May 26, 2025
The risks and prognosis of patients with mild intracerebral hemorrhage (ICH) are easily overlooked by clinicians. Our goal was to use machine learning (ML) methods to predict neurological deterioration (ND) and 90-day prognosis in patients with mild ICH. This prospective study recruited 257 patients with mild ICH. After exclusions, 148 patients were included in the ND study and 144 patients in the 90-day prognosis study. We trained five ML models using filtered data, including clinical, traditional imaging, and radiomics indicators based on non-contrast computed tomography (NCCT). Additionally, we incorporated the Shapley Additive Explanations (SHAP) method to display key features and visualize the model's decision-making process for each individual. A total of 21 (14.2%) patients with mild ICH developed ND, and 35 (24.3%) had a poor 90-day prognosis. In the validation set, the support vector machine (SVM) models achieved an AUC of 0.846 (95% confidence interval (CI), 0.627-1.000) and an F1-score of 0.667 for predicting ND, and an AUC of 0.970 (95% CI, 0.928-1.000) and an F1-score of 0.846 for predicting 90-day prognosis. The SHAP analysis indicated that several clinical features, the island sign, and the radiomics features of the hematoma were of significant value in predicting ND and 90-day prognosis. The ML models, constructed using clinical, traditional imaging, and radiomics indicators, demonstrated good classification performance in predicting ND and 90-day prognosis in patients with mild ICH, and have the potential to serve as an effective tool in clinical practice.
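
Pairing SHAP with an SVM, as done here, is typically handled with a model-agnostic explainer since SVMs expose no native feature attributions. Below is a minimal sketch with invented feature matrices and a small background sample to keep the kernel explainer tractable; it illustrates the technique, not the study's pipeline.

```python
# Model-agnostic SHAP explanations for an SVM classifier (illustrative).
import numpy as np
import shap
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.random((148, 10))          # clinical + imaging + radiomics features
y = rng.integers(0, 2, 148)       # 1 = neurological deterioration

model = SVC(probability=True).fit(X, y)

# KernelExplainer works with any predict function; a small background
# set keeps the approximation fast.
background = shap.sample(X, 25)
explainer = shap.KernelExplainer(model.predict_proba, background)
shap_values = explainer.shap_values(X[:5])  # per-feature contributions

# shap.summary_plot(...) would then visualize per-feature impact.
```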

An Explainable Diagnostic Framework for Neurodegenerative Dementias via Reinforcement-Optimized LLM Reasoning

Andrew Zamai, Nathanael Fijalkow, Boris Mansencal, Laurent Simon, Eloi Navet, Pierrick Coupe

arXiv preprint · May 26, 2025
The differential diagnosis of neurodegenerative dementias is a challenging clinical task, mainly because of the overlap in symptom presentation and the similarity of patterns observed in structural neuroimaging. To improve diagnostic efficiency and accuracy, deep learning-based methods such as Convolutional Neural Networks and Vision Transformers have been proposed for the automatic classification of brain MRIs. However, despite their strong predictive performance, these models have limited clinical utility due to their opaque decision-making. In this work, we propose a framework that integrates two core components to enhance diagnostic transparency. First, we introduce a modular pipeline for converting 3D T1-weighted brain MRIs into textual radiology reports. Second, we explore the potential of modern Large Language Models (LLMs) to assist clinicians in the differential diagnosis between frontotemporal dementia subtypes, Alzheimer's disease, and normal aging based on the generated reports. To bridge the gap between predictive accuracy and explainability, we employ reinforcement learning to incentivize diagnostic reasoning in LLMs. Without requiring supervised reasoning traces or distillation from larger models, our approach enables the emergence of structured diagnostic rationales grounded in neuroimaging findings. Unlike post-hoc explainability methods that retrospectively justify model decisions, our framework generates diagnostic rationales as part of the inference process, producing causally grounded explanations that inform and guide the model's decision-making. In doing so, our framework matches the diagnostic performance of existing deep learning methods while offering rationales that support its diagnostic conclusions.
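
Outcome-based reinforcement learning of this kind typically needs only a verifiable reward on the final answer, letting the rationale emerge without supervised traces. Below is a hedged sketch of such a reward function; the answer-tag format, label set, and penalty values are assumptions for illustration, not the paper's exact scheme.

```python
# Hypothetical outcome-based reward for RL over diagnostic completions:
# only the final diagnosis is checked, so reasoning must emerge on its own.
import re

DIAGNOSES = {"AD", "bvFTD", "svPPA", "nfvPPA", "CN"}  # illustrative label set

def reward(completion: str, gold_label: str) -> float:
    # Expect a structured answer such as "... <answer>AD</answer>".
    m = re.search(r"<answer>\s*(\w+)\s*</answer>", completion)
    if m is None:
        return -1.0  # malformed output is penalized
    pred = m.group(1)
    fmt_bonus = 0.1 if pred in DIAGNOSES else 0.0  # reward valid label format
    return (1.0 if pred == gold_label else 0.0) + fmt_bonus
```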

Multimodal integration of longitudinal noninvasive diagnostics for survival prediction in immunotherapy using deep learning.

Yeghaian M, Bodalal Z, van den Broek D, Haanen JBAG, Beets-Tan RGH, Trebeschi S, van Gerven MAJ

PubMed paper · May 26, 2025
Immunotherapies have revolutionized the landscape of cancer treatment. However, our understanding of response patterns in advanced cancers treated with immunotherapy remains limited. By leveraging routinely collected noninvasive longitudinal and multimodal data with artificial intelligence, we could unlock the potential to transform immunotherapy for cancer patients, paving the way for personalized treatment approaches. In this study, we developed a novel artificial neural network architecture, the multimodal transformer-based simple temporal attention (MMTSimTA) network, building on a combination of recent successful developments. We integrated pre- and on-treatment blood measurements, prescribed medications, and CT-based organ volumes from a large pan-cancer cohort of 694 patients treated with immunotherapy to predict mortality at 3, 6, 9, and 12 months. Different variants of our extended MMTSimTA network were implemented and compared to baseline methods incorporating intermediate and late fusion-based integration. The strongest prognostic performance was demonstrated by a variant of the MMTSimTA model, with areas under the curve (AUCs) of 0.84 ± 0.04, 0.83 ± 0.02, 0.82 ± 0.02, and 0.81 ± 0.03 for 3-, 6-, 9-, and 12-month survival prediction, respectively. Our findings show that integrating noninvasive longitudinal data using our novel architecture yields improved multimodal prognostic performance, especially for short-term survival prediction. Our study demonstrates that multimodal longitudinal integration of noninvasive data using deep learning may offer a promising approach for personalized prognostication in immunotherapy-treated cancer patients.
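
A "simple temporal attention" pooling can be sketched compactly: a learned scorer weights each longitudinal visit embedding, and the weighted sum feeds per-horizon risk heads. The module below is a hypothetical PyTorch illustration of that idea; dimensions, masking, and the sigmoid heads are assumptions, not the MMTSimTA implementation.

```python
# Hypothetical temporal-attention pooling over longitudinal embeddings.
import torch
import torch.nn as nn

class SimpleTemporalAttention(nn.Module):
    def __init__(self, dim=256, horizons=4):  # 3-, 6-, 9-, 12-month heads
        super().__init__()
        self.score = nn.Linear(dim, 1)   # one attention weight per visit
        self.head = nn.Linear(dim, horizons)

    def forward(self, visits, mask):
        # visits: (B, T, dim) fused blood/medication/CT embeddings per timepoint
        # mask:   (B, T) True where a visit exists
        w = self.score(visits).squeeze(-1)               # (B, T)
        w = w.masked_fill(~mask, float("-inf")).softmax(dim=-1)
        pooled = (w.unsqueeze(-1) * visits).sum(dim=1)   # (B, dim)
        return torch.sigmoid(self.head(pooled))          # risk per horizon

risk = SimpleTemporalAttention()(torch.randn(2, 5, 256),
                                 torch.ones(2, 5, dtype=torch.bool))
```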

[Clinical value of medical imaging artificial intelligence in the diagnosis and treatment of peritoneal metastasis in gastrointestinal cancers].

Fang MJ, Dong D, Tian J

PubMed paper · May 25, 2025
Peritoneal metastasis is a key factor in the poor prognosis of patients with advanced gastrointestinal cancer. Traditional radiological diagnosis faces challenges such as insufficient sensitivity. Through technologies like radiomics and deep learning, artificial intelligence can deeply analyze tumor heterogeneity and microenvironment features in medical images, revealing markers of peritoneal metastasis and enabling high-precision predictive models. These technologies have demonstrated advantages in tasks such as predicting peritoneal metastasis, assessing the risk of peritoneal recurrence, and identifying small metastatic foci during surgery. This paper summarizes representative progress and application prospects of medical imaging artificial intelligence in the diagnosis and treatment of peritoneal metastasis, and discusses potential directions for development, such as multimodal data fusion and large models. The integration of medical imaging artificial intelligence with clinical practice is expected to advance personalized and precision medicine in the diagnosis and treatment of peritoneal metastasis in gastrointestinal cancers.

CardioCoT: Hierarchical Reasoning for Multimodal Survival Analysis

Shaohao Rui, Haoyang Su, Jinyi Xiang, Lian-Ming Wu, Xiaosong Wang

arXiv preprint · May 25, 2025
Accurate prediction of major adverse cardiovascular event (MACE) recurrence risk in acute myocardial infarction patients, based on postoperative cardiac MRI and associated clinical notes, is crucial for precision treatment and personalized intervention. Existing methods primarily focus on risk stratification capability while overlooking the need for robust intermediate reasoning and model interpretability in clinical practice. Moreover, end-to-end risk prediction using LLMs/VLMs faces significant challenges due to data limitations and modeling complexity. To bridge this gap, we propose CardioCoT, a novel two-stage hierarchical reasoning-enhanced survival analysis framework designed to improve both model interpretability and predictive performance. In the first stage, we employ an evidence-augmented self-refinement mechanism to guide LLMs/VLMs in generating robust hierarchical reasoning trajectories based on associated radiological findings. In the second stage, we integrate the reasoning trajectories with imaging data for risk model training and prediction. CardioCoT demonstrates superior performance in MACE recurrence risk prediction while providing interpretable reasoning processes, offering valuable insights for clinical decision-making.
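
The second stage's fusion of reasoning trajectories with imaging admits a simple sketch: embed the stage-one rationale as text, concatenate it with an MRI embedding, and train a risk head. The code below is a hypothetical illustration; the encoders, dimensions, and single-logit head are assumptions rather than CardioCoT's architecture.

```python
# Hypothetical fusion of a reasoning-text embedding with an image embedding.
import torch
import torch.nn as nn

class ReasoningFusionHead(nn.Module):
    def __init__(self, text_dim=768, image_dim=512, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(text_dim + image_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # MACE recurrence risk score
        )

    def forward(self, text_emb, image_emb):
        return self.mlp(torch.cat([text_emb, image_emb], dim=-1))

risk = ReasoningFusionHead()(torch.randn(4, 768), torch.randn(4, 512))
```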

MedITok: A Unified Tokenizer for Medical Image Synthesis and Interpretation

Chenglong Ma, Yuanfeng Ji, Jin Ye, Zilong Li, Chenhui Wang, Junzhi Ning, Wei Li, Lihao Liu, Qiushan Guo, Tianbin Li, Junjun He, Hongming Shan

arXiv preprint · May 25, 2025
Advanced autoregressive models have reshaped multimodal AI. However, their transformative potential in medical imaging remains largely untapped due to the absence of a unified visual tokenizer: one capable of capturing fine-grained visual structures for faithful image reconstruction and realistic image synthesis, as well as rich semantics for accurate diagnosis and image interpretation. To this end, we present MedITok, the first unified tokenizer tailored for medical images, encoding both low-level structural details and high-level clinical semantics within a unified latent space. To balance these competing objectives, we introduce a novel two-stage training framework: a visual representation alignment stage that cold-starts the tokenizer's reconstruction learning with a visual semantic constraint, followed by a textual semantic representation alignment stage that infuses detailed clinical semantics into the latent space. Trained on a meticulously collected large-scale dataset with over 30 million medical images and 2 million image-caption pairs, MedITok achieves state-of-the-art performance on more than 30 datasets across 9 imaging modalities and 4 different tasks. By providing a unified token space for autoregressive modeling, MedITok supports a wide range of tasks in clinical diagnostics and generative healthcare applications. Model and code will be made publicly available at: https://github.com/Masaaki-75/meditok.
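
The tension between the two objectives can be made concrete as a combined loss: a reconstruction term for structural fidelity plus an alignment term pulling latents toward semantic embeddings (visual in stage one, textual in stage two). The weighting and loss forms below are assumptions for illustration, not MedITok's exact recipe.

```python
# Illustrative combined tokenizer objective: reconstruction + alignment.
import torch
import torch.nn.functional as F

def tokenizer_loss(decoded, image, latent, semantic_target, alpha=0.5):
    recon = F.mse_loss(decoded, image)  # faithful image reconstruction
    # Cosine-distance alignment of latents to semantic embeddings
    # (visual targets in stage one, textual targets in stage two).
    align = 1 - F.cosine_similarity(latent, semantic_target, dim=-1).mean()
    return recon + alpha * align
```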