Adapting and Evaluating Multimodal Large Language Models for Adolescent Idiopathic Scoliosis Self-Management: A Divide and Conquer Framework

Zhaolong Wu, Pu Luo, Jason Pui Yin Cheung, Teng Zhang

arXiv preprint · Sep 15, 2025
This study presents the first comprehensive evaluation of Multimodal Large Language Models (MLLMs) for Adolescent Idiopathic Scoliosis (AIS) self-management. We constructed a database of approximately 3,000 anteroposterior X-rays with diagnostic texts and evaluated five MLLMs through a 'Divide and Conquer' framework consisting of a visual question-answering task, a domain knowledge assessment task, and a patient education counseling assessment task. Our investigation revealed limitations in MLLMs' ability to interpret complex spinal radiographs and to comprehend AIS care knowledge. To address these, we pioneered enhancing MLLMs with spinal keypoint prompting and compiled an AIS knowledge base for retrieval augmented generation (RAG), respectively. Results showed varying effectiveness of visual prompting across different architectures, while RAG substantially improved models' performance on the knowledge assessment task. Our findings indicate that current MLLMs are far from capable of serving as personalized assistants in AIS care. The greatest challenge lies in their ability to accurately detect spinal deformity locations (best accuracy: 0.55) and directions (best accuracy: 0.13).
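A minimal sketch of the RAG step described above: retrieve the most relevant AIS knowledge snippets and prepend them to the question before querying an MLLM. The knowledge snippets, embedding model, and prompt template are illustrative placeholders, not the authors' implementation.

```python
# Minimal retrieval-augmented generation (RAG) sketch for an AIS knowledge base.
# Snippets, embedder, and prompt template are hypothetical stand-ins.
import numpy as np
from sentence_transformers import SentenceTransformer

knowledge_base = [
    "Adolescent idiopathic scoliosis is typically monitored with standing radiographs.",
    "Bracing is commonly considered for moderate curves in skeletally immature patients.",
    "The Cobb angle quantifies curve magnitude on anteroposterior radiographs.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical choice of embedder
kb_embeddings = encoder.encode(knowledge_base, normalize_embeddings=True)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k knowledge snippets most similar to the question."""
    q = encoder.encode([question], normalize_embeddings=True)[0]
    scores = kb_embeddings @ q                      # cosine similarity (normalized vectors)
    return [knowledge_base[i] for i in np.argsort(scores)[::-1][:k]]

def build_prompt(question: str) -> str:
    """Prepend retrieved context to the question before querying an MLLM."""
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("When is bracing considered for AIS?"))
```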

Accuracy of AI-Based Algorithms in Pulmonary Embolism Detection on Computed Tomographic Pulmonary Angiography: An Updated Systematic Review and Meta-analysis.

Nabipoorashrafi SA, Seyedi A, Bahri RA, Yadegar A, Shomal-Zadeh M, Mohammadi F, Afshari SA, Firoozeh N, Noroozzadeh N, Khosravi F, Asadian S, Chalian H

PubMed · Sep 15, 2025
Several artificial intelligence (AI) algorithms have been designed for the detection of pulmonary embolism (PE) using computed tomographic pulmonary angiography (CTPA). Given the rapid development of this field and the lack of an updated meta-analysis, we aimed to systematically review the available literature on the accuracy of AI-based algorithms for diagnosing PE via CTPA. We searched EMBASE, PubMed, Web of Science, and Cochrane for studies assessing the accuracy of AI-based algorithms. Studies that reported sensitivity and specificity were included. The R software was used for univariate meta-analysis and for drawing summary receiver operating characteristic (sROC) curves based on bivariate analysis. To explore the source of heterogeneity, subgroup analysis was performed (PROSPERO: CRD42024543107). A total of 1722 articles were found, and after removing duplicate records, 1185 were screened. Twenty studies with 26 AI models/populations met the inclusion criteria, encompassing 11,950 participants. Univariate meta-analysis showed a pooled sensitivity of 91.5% (95% CI 85.5-95.2) and specificity of 84.3% (95% CI 74.9-90.6) for PE detection. Additionally, in the bivariate sROC analysis, the pooled area under the curve (AUC) was 0.923 out of 1, indicating very high accuracy of AI algorithms in the detection of PE. Subgroup meta-analysis identified geographical area as a potential source of heterogeneity: the I² values for sensitivity and specificity in the Asian-study subgroup were 60% and 6.9%, respectively. These findings highlight the promising role of AI in accurately diagnosing PE while also emphasizing the need for further research to address regional variations and improve generalizability.
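To make the pooling step concrete, the sketch below shows a simplified fixed-effect, inverse-variance pooling of per-study sensitivities on the logit scale. The study counts are fabricated, and the review itself used R with bivariate sROC models, so this only illustrates the underlying arithmetic.

```python
# Simplified pooling of per-study sensitivities on the logit scale.
# The (TP, FN) counts below are fabricated examples, not data from the review.
import math

studies = [(85, 10), (120, 8), (60, 9)]  # (true positives, false negatives)

def logit(p): return math.log(p / (1 - p))
def expit(x): return 1 / (1 + math.exp(-x))

weights, estimates = [], []
for tp, fn in studies:
    sens = tp / (tp + fn)
    var = 1 / tp + 1 / fn              # approximate variance of logit(sensitivity)
    weights.append(1 / var)            # inverse-variance weight (fixed-effect)
    estimates.append(logit(sens))

pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
se = math.sqrt(1 / sum(weights))
print(f"pooled sensitivity: {expit(pooled):.3f} "
      f"(95% CI {expit(pooled - 1.96 * se):.3f}-{expit(pooled + 1.96 * se):.3f})")
```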

Enhancing Radiographic Disease Detection with MetaCheX, a Context-Aware Multimodal Model

Nathan He, Cody Chen

arXiv preprint · Sep 15, 2025
Existing deep learning models for chest radiology often neglect patient metadata, limiting diagnostic accuracy and fairness. To bridge this gap, we introduce MetaCheX, a novel multimodal framework that integrates chest X-ray images with structured patient metadata to replicate clinical decision-making. Our approach combines a convolutional neural network (CNN) backbone with metadata processed by a multilayer perceptron, joined through a shared classifier. Evaluated on the CheXpert Plus dataset, MetaCheX consistently outperformed radiograph-only baseline models across multiple CNN architectures. Integrating metadata significantly improved overall diagnostic accuracy, as measured by an increase in AUROC. The results of this study demonstrate that metadata reduces algorithmic bias and enhances model generalizability across diverse patient populations. MetaCheX advances clinical artificial intelligence toward robust, context-aware radiographic disease detection.
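A minimal sketch of the fusion pattern described above: a CNN backbone encodes the radiograph, an MLP encodes structured metadata, and a shared classifier operates on the concatenated features. The dimensions, metadata fields, and backbone choice are assumptions, not the authors' configuration.

```python
# Image-plus-metadata fusion sketch: CNN backbone + metadata MLP + shared classifier.
import torch
import torch.nn as nn
from torchvision import models

class MetadataFusionModel(nn.Module):
    def __init__(self, num_labels: int = 14, metadata_dim: int = 8):
        super().__init__()
        backbone = models.resnet18(weights=None)   # CNN image encoder (assumed choice)
        feat_dim = backbone.fc.in_features         # 512 for ResNet-18
        backbone.fc = nn.Identity()
        self.backbone = backbone
        self.metadata_mlp = nn.Sequential(         # encodes e.g. age, sex, view position
            nn.Linear(metadata_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
        )
        self.classifier = nn.Linear(feat_dim + 64, num_labels)  # shared head

    def forward(self, image: torch.Tensor, metadata: torch.Tensor) -> torch.Tensor:
        img_feat = self.backbone(image)
        meta_feat = self.metadata_mlp(metadata)
        return self.classifier(torch.cat([img_feat, meta_feat], dim=1))  # logits

model = MetadataFusionModel()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 8))
print(logits.shape)  # torch.Size([2, 14])
```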

A Fully Open and Generalizable Foundation Model for Ultrasound Clinical Applications

Hongyuan Zhang, Yuheng Wu, Mingyang Zhao, Zhiwei Chen, Rebecca Li, Fei Zhu, Haohan Zhao, Xiaohua Yuan, Meng Yang, Chunli Qiu, Xiang Cong, Haiyan Chen, Lina Luan, Randolph H. L. Wong, Huai Liao, Colin A Graham, Shi Chang, Guowei Tao, Dong Yi, Zhen Lei, Nassir Navab, Sebastien Ourselin, Jiebo Luo, Hongbin Liu, Gaofeng Meng

arXiv preprint · Sep 15, 2025
Artificial intelligence (AI) that can effectively learn ultrasound representations by integrating multi-source data holds significant promise for advancing clinical care. However, the scarcity of large labeled datasets in real-world clinical environments and the limited generalizability of task-specific models have hindered the development of generalizable clinical AI models for ultrasound applications. In this study, we present EchoCare, a novel ultrasound foundation model for generalist clinical use, developed via self-supervised learning on our curated, publicly available, large-scale dataset EchoCareData. EchoCareData comprises 4.5 million ultrasound images sourced from over 23 countries across 5 continents and acquired with a diverse range of imaging devices, thus encompassing global cohorts that are multi-center, multi-device, and multi-ethnic. Unlike prior studies that adopt off-the-shelf vision foundation model architectures, we introduce a hierarchical classifier into EchoCare to enable joint learning of pixel-level and representation-level features, capturing both global anatomical contexts and local ultrasound characteristics. With minimal training, EchoCare outperforms state-of-the-art comparison models across 10 representative ultrasound benchmarks of varying diagnostic difficulty, spanning disease diagnosis, lesion segmentation, organ detection, landmark prediction, quantitative regression, image enhancement, and report generation. The code and pretrained model are publicly released, making EchoCare accessible for fine-tuning and local adaptation and supporting extensibility to additional applications. EchoCare provides a fully open and generalizable foundation model to boost the development of AI technologies for diverse clinical ultrasound applications.
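The hierarchical classifier is described only at a high level; the sketch below shows one plausible way a single head could learn jointly at the pixel level and the representation level. The encoder output shape, head sizes, and pooling choice are assumptions for illustration, not EchoCare's architecture.

```python
# Sketch of a head producing both pixel-level and representation-level predictions.
import torch
import torch.nn as nn

class JointPixelRepresentationHead(nn.Module):
    def __init__(self, in_channels: int = 256, num_pixel_classes: int = 16,
                 num_global_classes: int = 16):
        super().__init__()
        self.pixel_head = nn.Conv2d(in_channels, num_pixel_classes, kernel_size=1)
        self.global_head = nn.Linear(in_channels, num_global_classes)

    def forward(self, feature_map: torch.Tensor):
        pixel_logits = self.pixel_head(feature_map)   # B x C_pix x H x W (local detail)
        pooled = feature_map.mean(dim=(2, 3))         # global average pool over space
        global_logits = self.global_head(pooled)      # B x C_glob (global context)
        return pixel_logits, global_logits

head = JointPixelRepresentationHead()
pix, glob = head(torch.randn(2, 256, 14, 14))
print(pix.shape, glob.shape)
```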

Trade-Off Analysis of Classical Machine Learning and Deep Learning Models for Robust Brain Tumor Detection: Benchmark Study.

Tian Y

PubMed · Sep 15, 2025
Medical image analysis plays a critical role in brain tumor detection, but training deep learning models often requires large, labeled datasets, which can be time-consuming and costly to assemble. This study presents a comparative analysis of machine learning and deep learning models for brain tumor classification, focusing on whether deep learning models are necessary for small medical datasets and whether self-supervised learning can reduce annotation costs. The primary goal is to evaluate trade-offs between traditional machine learning and deep learning, including self-supervised models, on small medical image datasets. The secondary goal is to assess model robustness, transferability, and generalization by evaluating unseen data both within and across domains. Four models were compared: (1) a support vector machine (SVM) with histogram of oriented gradients (HOG) features, (2) a convolutional neural network based on ResNet18, (3) a transformer-based model using a vision transformer (ViT-B/16), and (4) a self-supervised learning approach using Simple Contrastive Learning of Visual Representations (SimCLR). These models were selected to represent diverse paradigms: SVM+HOG represents traditional feature engineering with low computational cost, ResNet18 serves as a well-established convolutional neural network with strong baseline performance, ViT-B/16 leverages self-attention to capture long-range spatial features, and SimCLR enables learning from unlabeled data, potentially reducing annotation costs. The primary dataset consisted of 2870 brain magnetic resonance images across 4 classes: glioma, meningioma, pituitary, and nontumor. All models were trained under consistent settings, including data augmentation, early stopping, and 3 independent runs with different random seeds to account for performance variability. Performance metrics included accuracy, precision, recall, F1-score, and convergence. To assess robustness and generalization capability, evaluation was performed on unseen test data from both the primary dataset and a cross-domain dataset. No retraining or test-time augmentations were applied to the external data, reflecting realistic deployment conditions. The models demonstrated consistently strong performance in both within-domain and cross-domain evaluations, and the results revealed distinct trade-offs. ResNet18 achieved the highest validation accuracy (mean 99.77%, SD 0.00%) and the lowest validation loss, along with a weighted test accuracy of 99% within-domain and 95% cross-domain. SimCLR reached a mean validation accuracy of 97.29% (SD 0.86%) and achieved up to 97% weighted test accuracy within-domain and 91% cross-domain, despite requiring a two-stage training procedure of contrastive pretraining followed by linear evaluation. ViT-B/16 reached a mean validation accuracy of 97.36% (SD 0.11%), with a weighted test accuracy of 98% within-domain and 93% cross-domain. SVM+HOG maintained a competitive validation accuracy of 96.51%, with 97% within-domain test accuracy, though its accuracy dropped to 80% cross-domain. The study reveals meaningful trade-offs between model complexity, annotation requirements, and deployment feasibility, which are critical factors for selecting models in real-world medical imaging applications.
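A minimal SVM+HOG baseline in the spirit of the comparison above can be assembled with scikit-image and scikit-learn; the image size, HOG parameters, and stand-in data below are placeholders, not the study's exact settings.

```python
# SVM + HOG baseline sketch for multi-class brain MRI slice classification.
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

def extract_hog(image: np.ndarray) -> np.ndarray:
    """Resize a grayscale slice and compute HOG features."""
    image = resize(image, (128, 128), anti_aliasing=True)
    return hog(image, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

# Stand-in data: 40 random grayscale "slices" with balanced 4-class labels
# (glioma / meningioma / pituitary / nontumor in the actual study).
images = [np.random.rand(256, 256) for _ in range(40)]
labels = np.repeat(np.arange(4), 10)

X = np.stack([extract_hog(img) for img in images])
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0, stratify=labels)

clf = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```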

Fractal-driven self-supervised learning enhances early-stage lung cancer GTV segmentation: a novel transfer learning framework.

Tozuka R, Kadoya N, Yasunaga A, Saito M, Komiyama T, Nemoto H, Ando H, Onishi H, Jingu K

PubMed · Sep 15, 2025
We aimed to develop and evaluate a novel deep learning strategy for automated early-stage lung cancer gross tumor volume (GTV) segmentation, utilizing pre-training with mathematically generated, non-natural fractal images. This retrospective study included 104 patients (36-91 years old; 81 males, 23 females) with peripheral early-stage non-small cell lung cancer who underwent radiotherapy at our institution from December 2017 to March 2025. First, we utilized encoders from a Convolutional Neural Network and a Vision Transformer (ViT), pre-trained with four learning strategies: from scratch, ImageNet-1K (1,000 classes of natural images), FractalDB-1K (1,000 classes of fractal images), and FractalDB-10K (10,000 classes of fractal images), with the latter three using publicly available models. Second, the models were fine-tuned using CT images and physician-created contour data. Model accuracy was then evaluated using the volumetric Dice Similarity Coefficient (vDSC), surface Dice Similarity Coefficient (sDSC), and 95th percentile Hausdorff Distance (HD95) between the predicted and ground truth GTV contours, averaged across fourfold cross-validation. Additionally, segmentation accuracy was compared between simple and complex groups, categorized by the surface-to-volume ratio, to assess the impact of GTV shape complexity. Pre-training with FractalDB-10K yielded the best segmentation accuracy across all metrics. For the ViT model, the vDSC, sDSC, and HD95 results were 0.800 ± 0.079, 0.732 ± 0.152, and 2.04 ± 1.59 mm for FractalDB-10K; 0.779 ± 0.093, 0.688 ± 0.156, and 2.72 ± 3.12 mm for FractalDB-1K; and 0.764 ± 0.102, 0.660 ± 0.156, and 3.03 ± 3.47 mm for ImageNet-1K, respectively. Between the FractalDB-1K and ImageNet-1K conditions, there was no significant difference in the simple group, whereas in the complex group FractalDB-1K showed a significantly higher vDSC (0.743 ± 0.095 vs 0.714 ± 0.104, p = 0.006). Pre-training with fractal structures achieved comparable or superior accuracy to ImageNet pre-training for early-stage lung cancer GTV auto-segmentation.
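The primary metric, the volumetric Dice similarity coefficient (vDSC), is straightforward to compute from binary masks; the sketch below uses toy 3D masks rather than the study's CT volumes and physician contours.

```python
# Volumetric Dice similarity coefficient (vDSC) for binary GTV masks.
import numpy as np

def volumetric_dice(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-8) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) over boolean 3D masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + truth.sum() + eps))

# Toy example with two overlapping cubic masks
pred = np.zeros((32, 32, 32), dtype=bool);  pred[8:20, 8:20, 8:20] = True
truth = np.zeros((32, 32, 32), dtype=bool); truth[10:22, 10:22, 10:22] = True
print(f"vDSC = {volumetric_dice(pred, truth):.3f}")
```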

Multi Anatomy X-Ray Foundation Model

Nishank Singla, Krisztian Koos, Farzin Haddadpour, Amin Honarmandi Shandiz, Lovish Chum, Xiaojian Xu, Qing Jin, Erhan Bas

arXiv preprint · Sep 15, 2025
X-ray imaging is ubiquitous in radiology, yet most existing AI foundation models are limited to chest anatomy and fail to generalize across broader clinical tasks. In this work, we introduce XR-0, a multi-anatomy X-ray foundation model trained with self-supervised learning on a large private dataset of 1.15 million images spanning diverse anatomical regions, and evaluated across 12 datasets and 20 downstream tasks, including classification, retrieval, segmentation, localization, visual grounding, and report generation. XR-0 achieves state-of-the-art performance on most multi-anatomy tasks and remains competitive on chest-specific benchmarks. Our results demonstrate that anatomical diversity and supervision are critical for building robust, general-purpose medical vision models, paving the way for scalable and adaptable AI systems in radiology.

MambaDiff: Mamba-Enhanced Diffusion Model for 3D Medical Image Segmentation.

Liu Y, Feng Y, Cheng J, Zhan H, Zhu Z

PubMed · Sep 15, 2025
Accurate 3D medical image segmentation is crucial for diagnosis and treatment. Diffusion models demonstrate promising performance in medical image segmentation tasks due to the progressive nature of the generation process and the explicit modeling of data distributions. However, the weak guidance of conditional information and insufficient feature extraction in diffusion models lead to the loss of fine-grained features and structural consistency in the segmentation results, thereby affecting the accuracy of medical image segmentation. To address this challenge, we propose a Mamba-Enhanced Diffusion Model for 3D Medical Image Segmentation. We extract multilevel semantic features from the original images using an encoder and tightly integrate them with the denoising process of the diffusion model through a Semantic Hierarchical Embedding (SHE) mechanism, capturing the intricate relationship between noisy labels and image data. Meanwhile, we design a Global-Slice Perception Mamba (GSPM) layer, which integrates multi-dimensional perception mechanisms to endow the model with comprehensive spatial reasoning and feature extraction capabilities. Experimental results show that our proposed MambaDiff achieves more competitive performance than prior methods with substantially fewer parameters on four public medical image segmentation datasets, including BraTS 2021, BraTS 2024, LiTS, and MSD Hippocampus. The source code of our method is available at https://github.com/yuliu316316/MambaDiff.

Enhancing 3D Medical Image Understanding with Pretraining Aided by 2D Multimodal Large Language Models.

Chen Q, Yao X, Ye H, Hong Y

PubMed · Sep 15, 2025
Understanding 3D medical image volumes is critical in the medical field, yet existing 3D medical convolution- and transformer-based self-supervised learning (SSL) methods often lack deep semantic comprehension. Recent advancements in multimodal large language models (MLLMs) provide a promising approach to enhancing image understanding through text descriptions. To leverage these 2D MLLMs for improved 3D medical image understanding, we propose Med3DInsight, a novel pretraining framework that integrates 3D image encoders with 2D MLLMs via a specially designed plane-slice-aware transformer module. Additionally, our model employs a partial optimal transport based alignment, demonstrating greater tolerance to potential noise in LLM-generated content. Med3DInsight introduces a new paradigm for scalable multimodal 3D medical representation learning without requiring human annotations. Extensive experiments demonstrate state-of-the-art performance on two downstream tasks, i.e., segmentation and classification, across various public datasets with CT and MRI modalities, outperforming current SSL methods. Med3DInsight can be seamlessly integrated into existing 3D medical image understanding networks, potentially enhancing their performance. Our source code, generated datasets, and pre-trained models will be made available upon acceptance.
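The paper's alignment uses partial optimal transport; the sketch below instead uses plain entropic (Sinkhorn) optimal transport between two small sets of normalized features to illustrate the general alignment idea, and is not the partial-OT formulation from the paper.

```python
# Entropic optimal transport (Sinkhorn) between image-slice and text features,
# as a simplified stand-in for the partial-OT alignment described above.
import torch

def sinkhorn(cost: torch.Tensor, eps: float = 0.1, n_iters: int = 100) -> torch.Tensor:
    """Entropic OT plan between uniform marginals for an (n x m) cost matrix."""
    n, m = cost.shape
    mu = torch.full((n,), 1.0 / n)
    nu = torch.full((m,), 1.0 / m)
    K = torch.exp(-cost / eps)                 # Gibbs kernel
    u = torch.ones(n)
    for _ in range(n_iters):                   # alternating marginal scaling
        v = nu / (K.t() @ u)
        u = mu / (K @ v)
    return torch.diag(u) @ K @ torch.diag(v)   # transport plan

image_feats = torch.nn.functional.normalize(torch.randn(6, 128), dim=1)  # e.g. slice features
text_feats = torch.nn.functional.normalize(torch.randn(4, 128), dim=1)   # e.g. caption features
cost = 1.0 - image_feats @ text_feats.t()      # cosine distance
plan = sinkhorn(cost)
alignment_loss = (plan * cost).sum()           # transport cost as an alignment objective
print(plan.shape, float(alignment_loss))
```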

SGRRG: Leveraging radiology scene graphs for improved and abnormality-aware radiology report generation.

Wang J, Zhu L, Bhalerao A, He Y

PubMed · Sep 15, 2025
Radiology report generation (RRG) methods often lack sufficient medical knowledge to produce clinically accurate reports. A scene graph provides comprehensive information for describing the objects within an image. However, automatically generated radiology scene graphs (RSGs) may contain noisy annotations and highly overlapping regions, posing challenges for using RSGs to enhance RRG. To this end, we propose Scene Graph aided RRG (SGRRG), a framework that leverages an automatically generated RSG and copes with the noisy supervision problem in the RSG through a transformer-based module, effectively distilling medical knowledge in an end-to-end manner. SGRRG is composed of a dedicated scene graph encoder responsible for translating the radiograph into an RSG, and a scene graph-aided decoder that takes advantage of both patch-level and region-level visual information and mitigates the noisy annotation problem in the RSG. The incorporation of both patch-level and region-level features, alongside the integration of the essential RSG construction modules, enhances our framework's flexibility and robustness, enabling it to readily exploit prior advanced RRG techniques. A fine-grained, sentence-level attention method is designed to better distill the RSG information. Additionally, we introduce two proxy tasks to enhance the model's ability to produce clinically accurate reports. Extensive experiments demonstrate that SGRRG outperforms previous state-of-the-art methods in report generation and can better capture abnormal findings. Code is available at https://github.com/Markin-Wang/SGRRG.