Page 313 of 3863859 results

MedBridge: Bridging Foundation Vision-Language Models to Medical Image Diagnosis

Yitong Li, Morteza Ghahremani, Christian Wachinger

arXiv preprint, May 27 2025
Recent vision-language foundation models deliver state-of-the-art results on natural image classification but falter on medical images due to pronounced domain shifts. At the same time, training a medical foundation model requires substantial resources, including extensive annotated data and high computational capacity. To bridge this gap with minimal overhead, we introduce MedBridge, a lightweight multimodal adaptation framework that re-purposes pretrained VLMs for accurate medical image diagnosis. MedBridge comprises three key components. First, a Focal Sampling module extracts high-resolution local regions to capture subtle pathological features and compensate for the limited input resolution of general-purpose VLMs. Second, a Query Encoder (QEncoder) injects a small set of learnable queries that attend to the frozen feature maps of the VLM, aligning them with medical semantics without retraining the entire backbone. Third, a Mixture of Experts mechanism, driven by learnable queries, harnesses the complementary strengths of diverse VLMs to maximize diagnostic performance. We evaluate MedBridge on five medical imaging benchmarks across three key adaptation tasks, demonstrating its superior performance in both cross-domain and in-domain adaptation settings, even under varying levels of training data availability. Notably, MedBridge achieved a 6-15% improvement in AUC compared to state-of-the-art VLM adaptation methods in multi-label thoracic disease diagnosis, underscoring its effectiveness in leveraging foundation models for accurate and data-efficient medical diagnosis. Our code is available at https://github.com/ai-med/MedBridge.

Scalable Segmentation for Ultra-High-Resolution Brain MR Images

Xiaoling Hu, Peirong Liu, Dina Zemlyanker, Jonathan Williams Ramirez, Oula Puonti, Juan Eugenio Iglesias

arXiv preprint, May 27 2025
Although deep learning has shown great success in 3D brain MRI segmentation, achieving accurate and efficient segmentation of ultra-high-resolution brain images remains challenging due to the lack of labeled training data for fine-scale anatomical structures and high computational demands. In this work, we propose a novel framework that leverages easily accessible, low-resolution coarse labels as spatial references and guidance, without incurring additional annotation cost. Instead of directly predicting discrete segmentation maps, our approach regresses per-class signed distance transform maps, enabling smooth, boundary-aware supervision. Furthermore, to enhance scalability, generalizability, and efficiency, we introduce a scalable class-conditional segmentation strategy, where the model learns to segment one class at a time conditioned on a class-specific input. This novel design not only reduces memory consumption during both training and testing, but also allows the model to generalize to unseen anatomical classes. We validate our method through comprehensive experiments on both synthetic and real-world datasets, demonstrating its superior performance and scalability compared to conventional segmentation approaches.
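
As a rough illustration of the regression target described in the abstract above, the sketch below computes a signed distance map for one class of a tiny 2D mask by brute force. The sign convention (negative inside the class, positive outside), the 2D setting, and the function names `signed_distance_map` and `to_labels` are assumptions made for this example; the paper's own implementation may differ.

```python
import math

def signed_distance_map(mask):
    """Brute-force signed Euclidean distance map for a small 2D binary mask.

    Assumed convention: negative inside the class, positive outside.
    The magnitude is the distance to the nearest pixel of the opposite label.
    """
    h, w = len(mask), len(mask[0])
    coords = [(r, c) for r in range(h) for c in range(w)]
    inside = [p for p in coords if mask[p[0]][p[1]]]
    outside = [p for p in coords if not mask[p[0]][p[1]]]
    if not inside or not outside:
        raise ValueError("mask must contain both foreground and background")
    sdm = [[0.0] * w for _ in range(h)]
    for r, c in coords:
        # Measure to the opposite label so the zero level set sits on the boundary.
        others = outside if mask[r][c] else inside
        d = min(math.hypot(r - rr, c - cc) for rr, cc in others)
        sdm[r][c] = -d if mask[r][c] else d
    return sdm

def to_labels(sdm):
    """Recover a discrete mask from a (predicted) signed distance map."""
    return [[1 if v < 0 else 0 for v in row] for row in sdm]

mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
sdm = signed_distance_map(mask)
```

Thresholding the map at zero recovers the discrete segmentation, which is why a smooth regression target of this kind can still yield a label map at inference time.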

Multi-instance Learning as Downstream Task of Self-Supervised Learning-based Pre-trained Model

Koki Matsuishi, Tsuyoshi Okita

arXiv preprint, May 27 2025
In deep multi-instance learning, the number of applicable instances depends on the data set. In histopathology images, deep multi-instance learners usually assume there are hundreds to thousands of instances in a bag. However, when the number of instances in a bag increases to 256 in brain hematoma CT, learning becomes extremely difficult. To address this drawback, we propose using a model pre-trained with self-supervised learning, with the multi-instance learner as the downstream task. With this method, even when the original target task suffers from the spurious correlation problem, we show improvements of 5% to 13% in accuracy and 40% to 55% in the F1 measure for the hypodensity marker classification of brain hematoma CT.

An orchestration learning framework for ultrasound imaging: Prompt-Guided Hyper-Perception and Attention-Matching Downstream Synchronization.

Lin Z, Li S, Wang S, Gao Z, Sun Y, Lam CT, Hu X, Yang X, Ni D, Tan T

PubMed paper, May 27 2025
Ultrasound imaging is pivotal in clinical diagnostics due to its affordability, portability, safety, real-time capability, and non-invasive nature. It is widely utilized for examining various organs, such as the breast, thyroid, ovary, heart, and more. However, the manual interpretation and annotation of ultrasound images are time-consuming and prone to variability among physicians. While single-task artificial intelligence (AI) solutions have been explored, they are not ideal for scaling AI applications in medical imaging. Foundation models, although a trending solution, often struggle with real-world medical datasets due to factors such as noise, variability, and the inability to flexibly align prior knowledge with task adaptation. To address these limitations, we propose an orchestration learning framework named PerceptGuide for general-purpose ultrasound classification and segmentation. Our framework incorporates a novel orchestration mechanism based on prompted hyper-perception, which adapts to the diverse inductive biases required by different ultrasound datasets. Unlike self-supervised pre-trained models, which require extensive fine-tuning, our approach leverages supervised pre-training to directly capture task-relevant features, providing a stronger foundation for multi-task and multi-organ ultrasound imaging. To support this research, we compiled a large-scale Multi-task, Multi-organ public ultrasound dataset (M²-US), featuring images from 9 organs and 16 datasets, encompassing both classification and segmentation tasks. Our approach employs four specific prompts (Object, Task, Input, and Position) to guide the model, ensuring task-specific adaptability. Additionally, a downstream synchronization training stage is introduced to fine-tune the model for new data, significantly improving generalization capabilities and enabling real-world applications. Experimental results demonstrate the robustness and versatility of our framework in handling multi-task and multi-organ ultrasound image processing, outperforming both specialist models and existing general AI solutions. Compared to specialist models, our method improves segmentation from 82.26% to 86.45% and classification from 71.30% to 79.08%, while also significantly reducing model parameters.

Evaluating Large Language Models for Enhancing Radiology Specialty Examination: A Comparative Study with Human Performance.

Liu HY, Chen SJ, Wang W, Lee CH, Hsu HH, Shen SH, Chiou HJ, Lee WJ

PubMed paper, May 27 2025
The radiology specialty examination assesses clinical decision-making, image interpretation, and diagnostic reasoning. With the expansion of medical knowledge, traditional test design faces challenges in maintaining accuracy and relevance. Large language models (LLMs) demonstrate potential in medical education. This study evaluates LLM performance in radiology specialty exams, explores their role in assessing question difficulty, and investigates their reasoning processes, aiming to develop a more objective and efficient framework for exam design. This study compared the performance of LLMs and human examinees in a radiology specialty examination. Three LLMs (GPT-4o, o1-preview, and GPT-3.5-turbo-1106) were evaluated under zero-shot conditions. Exam accuracy, examinee accuracy, discrimination index, and point-biserial correlation were used to assess LLMs' ability to predict question difficulty and reasoning processes. The data provided by the Taiwan Radiological Society ensures comparability between AI and human performance. In terms of accuracy, GPT-4o (88.0%) and o1-preview (90.9%) outperformed human examinees (76.3%), whereas GPT-3.5-turbo-1106 showed significantly lower accuracy (50.2%). Question difficulty analysis revealed that newer LLMs excel in solving complex questions, while GPT-3.5-turbo-1106 exhibited greater performance variability. Discrimination index and point-biserial correlation analyses demonstrated that GPT-4o and o1-preview accurately identified key differentiating questions, closely mirroring human reasoning patterns. These findings suggest that advanced LLMs can assess medical examination difficulty, offering potential applications in exam standardization and question evaluation. This study evaluated the problem-solving capabilities of GPT-3.5-turbo-1106, GPT-4o, and o1-preview in a radiology specialty examination. LLMs should be utilized as tools for assessing exam question difficulty and assisting in the standardized development of medical examinations.
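
The two item-analysis statistics named in the abstract above can be sketched with textbook definitions. The study's exact computation is not given here, so the 27% upper/lower group split, the use of the population standard deviation, and the function names are assumptions made for illustration.

```python
import math

def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Proportion correct in the top-scoring group minus the bottom-scoring group.

    `fraction` (27% here) is a common convention, assumed for this sketch.
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: total_scores[i])
    low, high = order[:k], order[-k:]
    p_high = sum(item_correct[i] for i in high) / k
    p_low = sum(item_correct[i] for i in low) / k
    return p_high - p_low

def point_biserial(item_correct, total_scores):
    """Point-biserial correlation between a 0/1 item and the total exam score."""
    n = len(total_scores)
    ones = [s for s, c in zip(total_scores, item_correct) if c]
    zeros = [s for s, c in zip(total_scores, item_correct) if not c]
    if not ones or not zeros:
        raise ValueError("item must have both correct and incorrect responses")
    mean = sum(total_scores) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in total_scores) / n)  # population SD
    if sd == 0:
        raise ValueError("total scores have zero variance")
    p = len(ones) / n
    m1, m0 = sum(ones) / len(ones), sum(zeros) / len(zeros)
    return (m1 - m0) / sd * math.sqrt(p * (1 - p))

# Hypothetical data: ten examinees, one item answered correctly only by high scorers.
scores = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
item = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
di = discrimination_index(item, scores)
rpb = point_biserial(item, scores)
```

Both statistics are high when an item separates strong from weak examinees, which is the property the abstract refers to as identifying key differentiating questions.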

Interpretable Machine Learning Models for Differentiating Glioblastoma From Solitary Brain Metastasis Using Radiomics.

Xia X, Wu W, Tan Q, Gou Q

PubMed paper, May 27 2025
To develop and validate interpretable machine learning models for differentiating glioblastoma (GB) from solitary brain metastasis (SBM) using radiomics features from contrast-enhanced T1-weighted MRI (CE-T1WI), and to compare the impact of low-order and high-order features on model performance. A cohort of 434 patients with histopathologically confirmed GB (226 patients) and SBM (208 patients) was retrospectively analyzed. Radiomic features were derived from CE-T1WI, with feature selection conducted through minimum redundancy maximum relevance and least absolute shrinkage and selection operator regression. Machine learning models, including GradientBoost and LightGBM (LGBM), were trained using low-order and high-order features. The performance of the models was assessed through receiver operating characteristic analysis and computation of the area under the curve (AUC), along with other indicators, including accuracy, specificity, and sensitivity. SHapley Additive Explanations (SHAP) analysis was used to measure the influence of each feature on the model's predictions. The performances of various machine learning models on both the training and validation datasets were notably different. For the training group, the LGBM, CatBoost, multilayer perceptron (MLP), and GradientBoost models achieved the highest AUC scores, all exceeding 0.9, demonstrating strong discriminative power. The LGBM model exhibited the best stability, with a minimal AUC difference of only 0.005 between the training and test sets, suggesting strong generalizability. Among the validation group results, the GradientBoost classifier achieved the maximum AUC of 0.927, closely followed by random forest at 0.925. GradientBoost also demonstrated high sensitivity (0.911) and negative predictive value (NPV, 0.889), effectively identifying true positives. The LGBM model showed the highest test accuracy (86.2%) and performed excellently in terms of sensitivity (0.911), NPV (0.895), and positive predictive value (PPV, 0.837). The models utilizing high-order features outperformed those based on low-order features in all the metrics. SHAP analysis further enhanced model interpretability, providing insights into feature importance and contributions to classification decisions. Machine learning techniques based on radiomics can effectively distinguish GB from SBM, with gradient boosting tree-based models such as LGBM demonstrating superior performance. High-order features significantly improve model accuracy and robustness. SHAP technology enhances the interpretability and transparency of models for distinguishing brain tumors, providing intuitive visualization of the contribution of radiomic features to classification.

Improving Breast Cancer Diagnosis in Ultrasound Images Using Deep Learning with Feature Fusion and Attention Mechanism.

Asif S, Yan Y, Feng B, Wang M, Zheng Y, Jiang T, Fu R, Yao J, Lv L, Song M, Sui L, Yin Z, Wang VY, Xu D

PubMed paper, May 27 2025
Early detection of malignant lesions in ultrasound images is crucial for effective cancer diagnosis and treatment. While traditional methods rely on radiologists, deep learning models can improve accuracy, reduce errors, and enhance efficiency. This study explores the application of a deep learning model for classifying benign and malignant lesions, focusing on its performance and interpretability. In this study, we proposed a feature fusion-based deep learning model for classifying benign and malignant lesions in ultrasound images. The model leverages advanced architectures such as MobileNetV2 and DenseNet121, enhanced with feature fusion and attention mechanisms to boost classification accuracy. The clinical dataset comprises 2171 images collected from 1758 patients between December 2020 and May 2024. Additionally, we utilized the publicly available BUSI dataset, consisting of 780 images from female patients aged 25 to 75, collected in 2018. To enhance interpretability, we applied Grad-CAM, Saliency Maps, and SHapley Additive Explanations (SHAP) techniques to explain the model's decision-making. A comparative analysis with radiologists of varying expertise levels was also conducted. The proposed model exhibited the highest performance, achieving an area under the curve (AUC) of 0.9320 on our private dataset and an AUC of 0.9834 on the public dataset, significantly outperforming traditional deep convolutional neural network models. It also exceeded the diagnostic performance of radiologists, showcasing its potential as a reliable tool for medical image classification. The model's success can be attributed to its incorporation of advanced architectures, feature fusion, and attention mechanisms. The model's decision-making process was further clarified using interpretability techniques like Grad-CAM, Saliency Maps, and SHAP, offering insights into its ability to focus on relevant image features for accurate classification. The proposed deep learning model offers superior accuracy in classifying benign and malignant lesions in ultrasound images, outperforming traditional models and radiologists. Its strong performance, coupled with interpretability techniques, demonstrates its potential as a reliable and efficient tool for medical diagnostics. The datasets generated and analyzed during the current study are not publicly available due to the nature of this research and participants of this study, but may be available from the corresponding author on reasonable request.

Automatic identification of Parkinsonism using clinical multi-contrast brain MRI: a large self-supervised vision foundation model strategy.

Suo X, Chen M, Chen L, Luo C, Kemp GJ, Lui S, Sun H

PubMed paper, May 27 2025
Valid non-invasive biomarkers for Parkinson's disease (PD) and Parkinson-plus syndrome (PPS) are urgently needed. Based on our recent self-supervised vision foundation model, the Shift Window UNET TRansformer (Swin UNETR), which uses clinical multi-contrast whole brain MRI, we aimed to develop an efficient and practical model ('SwinClassifier') for the discrimination of PD vs PPS using routine clinical MRI scans. We used 75,861 clinical head MRI scans including T1-weighted, T2-weighted and fluid attenuated inversion recovery imaging as a pre-training dataset to develop a foundation model, using self-supervised learning with a cross-contrast context recovery task. Then clinical head MRI scans from n = 1992 participants with PD and n = 1989 participants with PPS were used as a downstream PD vs PPS classification dataset. We then assessed SwinClassifier's performance using confusion matrices, comparing it to a self-supervised vanilla Vision Transformer (ViT) autoencoder ('ViTClassifier') and to two convolutional neural networks (DenseNet121 and ResNet50) trained from scratch. SwinClassifier showed very good performance (F1 score 0.83, 95% confidence interval [CI] [0.79-0.87], AUC 0.89) in PD vs PPS discrimination in independent test datasets (n = 173 participants with PD and n = 165 participants with PPS). This self-supervised classifier with pretrained weights outperformed the ViTClassifier and convolutional classifiers trained from scratch (F1 score 0.77-0.82, AUC 0.83-0.85). Occlusion sensitivity mapping in the correctly-classified cases (n = 160 PD and n = 114 PPS) highlighted the brain regions guiding discrimination mainly in sensorimotor and midline structures including cerebellum, brain stem, ventricle and basal ganglia. Our self-supervised digital model based on routine clinical head MRI discriminated PD vs PPS with good accuracy and sensitivity. With incremental improvements the approach may be diagnostically useful in early disease.
Funding: National Key Research and Development Program of China.

Estimation of time-to-total knee replacement surgery with multimodal modeling and artificial intelligence.

Cigdem O, Hedayati E, Rajamohan HR, Cho K, Chang G, Kijowski R, Deniz CM

PubMed paper, May 27 2025
Current methods for predicting time-to-total knee replacement (TKR) do not provide enough information to make robust and accurate predictions. The aim was to develop and evaluate an artificial intelligence-based model for predicting time-to-TKR by analyzing longitudinal knee data and identifying key features associated with accelerated knee osteoarthritis progression. A total of 547 subjects underwent TKR in the Osteoarthritis Initiative over nine years, and their longitudinal data were used for model training and testing. For external testing, 518 subjects from the Multi-Center Osteoarthritis Study and 164 subjects from internal hospital data were used. The clinical variables, magnetic resonance (MR) images, radiographs, and quantitative and semi-quantitative assessments from images were analyzed. Deep learning (DL) models were used to extract features from radiographs and MR images. DL features were combined with clinical and image assessment features for survival analysis. A Lasso Cox feature selection method combined with a random survival forest model was used to estimate time-to-TKR. Utilizing only clinical variables for time-to-TKR predictions yielded an estimation accuracy of 60.4% and a C-index of 62.9%. Combining DL features extracted from radiographs and MR images with clinical, quantitative, and semi-quantitative image assessment features achieved the highest accuracy of 73.2% (p = .001) and a C-index of 77.3% for predicting time-to-TKR. The proposed predictive model demonstrated the potential of DL models and multimodal data fusion in accurately predicting time-to-TKR surgery, which may help physicians personalize treatment strategies and improve patient outcomes.
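
For readers unfamiliar with the C-index reported in the abstract above, the following sketch computes Harrell's concordance index on toy survival data. It assumes that higher risk scores should correspond to shorter time-to-event and that tied scores count as half-concordant; the function name and the data are hypothetical and not taken from the study. The function returns a fraction in [0, 1], whereas the abstract reports percentages.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs whose risk ordering matches outcomes.

    A pair (i, j) is comparable when subject i has the shorter observed time
    and actually experienced the event (events[i] is truthy, i.e. not censored).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event had the higher predicted risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical follow-up times in years; 1 = surgery observed, 0 = censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.5, 0.1]
c_perfect = concordance_index(times, events, risk)
c_reversed = concordance_index(times, events, risk[::-1])
c_constant = concordance_index(times, events, [0.5] * 4)
```

A value of 0.5 corresponds to uninformative predictions and 1.0 to a perfect ranking of subjects by time-to-event.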

DeepMultiConnectome: Deep Multi-Task Prediction of Structural Connectomes Directly from Diffusion MRI Tractography

Marcus J. Vroemen, Yuqian Chen, Yui Lo, Tengfei Xu, Weidong Cai, Fan Zhang, Josien P. W. Pluim, Lauren J. O'Donnell

arXiv preprint, May 27 2025
Diffusion MRI (dMRI) tractography enables in vivo mapping of brain structural connections, but traditional connectome generation is time-consuming and requires gray matter parcellation, posing challenges for large-scale studies. We introduce DeepMultiConnectome, a deep-learning model that predicts structural connectomes directly from tractography, bypassing the need for gray matter parcellation while supporting multiple parcellation schemes. Using a point-cloud-based neural network with multi-task learning, the model classifies streamlines according to their connected regions across two parcellation schemes, sharing a learned representation. We train and validate DeepMultiConnectome on tractography from the Human Connectome Project Young Adult dataset ($n = 1000$), labeled with 84-region and 164-region gray matter parcellation schemes. DeepMultiConnectome predicts multiple structural connectomes from a whole-brain tractogram containing 3 million streamlines in approximately 40 seconds. DeepMultiConnectome is evaluated by comparing predicted connectomes with traditional connectomes generated using the conventional method of labeling streamlines using a gray matter parcellation. The predicted connectomes are highly correlated with traditionally generated connectomes ($r = 0.992$ for an 84-region scheme; $r = 0.986$ for a 164-region scheme) and largely preserve network properties. A test-retest analysis of DeepMultiConnectome demonstrates reproducibility comparable to traditionally generated connectomes. The predicted connectomes perform similarly to traditionally generated connectomes in predicting age and cognitive function. Overall, DeepMultiConnectome provides a scalable, fast model for generating subject-specific connectomes across multiple parcellation schemes.
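
One plausible way to obtain a correlation like the $r$ values quoted above is to compare the edge weights of two connectome matrices with Pearson's correlation. The abstract does not say which entries are compared or whether weights are transformed first, so the use of the raw upper triangle (excluding the diagonal) and the function names below are assumptions for illustration only.

```python
import math

def upper_triangle(matrix):
    """Flatten the strict upper triangle of a symmetric connectome matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("an input has zero variance")
    return sxy / math.sqrt(sxx * syy)

def connectome_correlation(predicted, reference):
    """Correlate edge weights (e.g. streamline counts) of two connectomes."""
    return pearson_r(upper_triangle(predicted), upper_triangle(reference))

# Hypothetical 3-region connectomes holding streamline counts per region pair.
reference = [[0, 10, 2], [10, 0, 5], [2, 5, 0]]
predicted = [[0, 20, 4], [20, 0, 10], [4, 10, 0]]  # same pattern, doubled counts
r = connectome_correlation(predicted, reference)
```

Because Pearson's correlation is invariant to scaling, it captures agreement in connectivity pattern rather than in absolute streamline counts.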