Page 72 of 78773 results

MedBLIP: Fine-tuning BLIP for Medical Image Captioning

Manshi Limbu, Diwita Banerjee

arXiv preprint, May 20 2025
Medical image captioning is a challenging task that requires generating clinically accurate and semantically meaningful descriptions of radiology images. While recent vision-language models (VLMs) such as BLIP, BLIP-2, Gemini and ViT-GPT2 show strong performance on natural image datasets, they often produce generic or imprecise captions when applied to specialized medical domains. In this project, we explore the effectiveness of fine-tuning the BLIP model on the ROCO dataset for improved radiology captioning. We compare the fine-tuned BLIP against its zero-shot version, BLIP-2 base, BLIP-2 Instruct and a ViT-GPT2 transformer baseline. Our results demonstrate that domain-specific fine-tuning of BLIP significantly improves performance across both quantitative and qualitative evaluation metrics. We also visualize decoder cross-attention maps to assess interpretability and conduct an ablation study to evaluate the contributions of encoder-only and decoder-only fine-tuning. Our findings highlight the importance of targeted adaptation for medical applications and suggest that decoder-only fine-tuning (encoder-frozen) offers a strong performance baseline with 5% lower training time than full fine-tuning, while full model fine-tuning still yields the best results overall.

Dynadiff: Single-stage Decoding of Images from Continuously Evolving fMRI

Marlène Careil, Yohann Benchetrit, Jean-Rémi King

arXiv preprint, May 20 2025
Brain-to-image decoding has been recently propelled by the progress in generative AI models and the availability of large ultra-high field functional Magnetic Resonance Imaging (fMRI). However, current approaches depend on complicated multi-stage pipelines and preprocessing steps that typically collapse the temporal dimension of brain recordings, thereby limiting time-resolved brain decoders. Here, we introduce Dynadiff (Dynamic Neural Activity Diffusion for Image Reconstruction), a new single-stage diffusion model designed for reconstructing images from dynamically evolving fMRI recordings. Our approach offers three main contributions. First, Dynadiff simplifies training as compared to existing approaches. Second, our model outperforms state-of-the-art models on time-resolved fMRI signals, especially on high-level semantic image reconstruction metrics, while remaining competitive on preprocessed fMRI data that collapse time. Third, this approach allows a precise characterization of the evolution of image representations in brain activity. Overall, this work lays the foundation for time-resolved brain-to-image decoding.

Expert-guided StyleGAN2 image generation elevates AI diagnostic accuracy for maxillary sinus lesions.

Zeng P, Song R, Chen S, Li X, Li H, Chen Y, Gong Z, Cai G, Lin Y, Shi M, Huang K, Chen Z

PubMed paper, May 20 2025
The progress of artificial intelligence (AI) research in dental medicine is hindered by data acquisition challenges and imbalanced distributions. These problems are especially apparent when planning to develop AI-based diagnostic or analytic tools for various lesions, such as maxillary sinus lesions (MSL) including mucosal thickening and polypoid lesions. Traditional unsupervised generative models struggle to simultaneously control the image realism, diversity, and lesion-type specificity. This study establishes an expert-guided framework to overcome these limitations to elevate AI-based diagnostic accuracy. A StyleGAN2 framework was developed for generating clinically relevant MSL images (such as mucosal thickening and polypoid lesion) under expert control. The generated images were then integrated into training datasets to evaluate their effect on ResNet50's diagnostic performance. Here we show: 1) Both lesion subtypes achieve satisfactory fidelity metrics, with structural similarity indices (SSIM > 0.996) and maximum mean discrepancy values (MMD < 0.032), and clinical validation scores close to those of real images; 2) Integrating baseline datasets with synthetic images significantly enhances diagnostic accuracy for both internal and external test sets, particularly improving area under the precision-recall curve (AUPRC) by approximately 8% and 14% for mucosal thickening and polypoid lesions in the internal test set, respectively. The StyleGAN2-based image generation tool effectively addressed data scarcity and imbalance through high-quality MSL image synthesis, consequently boosting diagnostic model performance. This work not only facilitates AI-assisted preoperative assessment for maxillary sinus lift procedures but also establishes a methodological framework for overcoming data limitations in medical image analysis.
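
The abstract above reports maximum mean discrepancy (MMD) as a fidelity metric but does not say which kernel or feature space was used. As a rough illustration only, the sketch below computes a biased estimate of squared MMD between two small sets of feature vectors, assuming a Gaussian (RBF) kernel. The function names, the `gamma` parameter, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between two samples of feature vectors.

    Assumption: an RBF kernel with bandwidth parameter gamma; the source
    abstract does not specify the kernel.
    """
    def mean_kernel(a, b):
        total = sum(rbf_kernel(u, v, gamma) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)


# Toy example: identical samples give zero, well-separated samples do not.
real = [[0.0], [1.0]]
synthetic_close = [[0.0], [1.0]]
synthetic_far = [[5.0], [6.0]]
print(mmd_squared(real, synthetic_close))
print(mmd_squared(real, synthetic_far))
```

A value near zero indicates that the two samples are hard to tell apart under the chosen kernel, which is the sense in which a low MMD is read as high fidelity.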

AI-powered integration of multimodal imaging in precision medicine for neuropsychiatric disorders.

Huang W, Shu N

PubMed paper, May 20 2025
Neuropsychiatric disorders have complex pathological mechanisms, pronounced clinical heterogeneity, and a prolonged preclinical phase, which present a challenge for early diagnosis and the development of precise intervention strategies. With the development of large-scale multimodal neuroimaging datasets and the advancement of artificial intelligence (AI) algorithms, the integration of multimodal imaging with AI techniques has emerged as a pivotal avenue for early detection and individualized treatment of neuropsychiatric disorders. To support these advances, in this review, we outline multimodal neuroimaging techniques, AI methods, and strategies for multimodal data fusion. We highlight applications of multimodal AI based on neuroimaging data in precision medicine for neuropsychiatric disorders, discussing challenges in clinical adoption, their emerging solutions, and future directions.

Diagnosis of early idiopathic pulmonary fibrosis: current status and future perspective.

Wang X, Xia X, Hou Y, Zhang H, Han W, Sun J, Li F

PubMed paper, May 19 2025
The standard approach to diagnosing idiopathic pulmonary fibrosis (IPF) includes identifying the usual interstitial pneumonia (UIP) pattern via high-resolution computed tomography (HRCT) or lung biopsy and excluding known causes of interstitial lung disease (ILD). However, limitations of manual interpretation of lung imaging, along with other factors such as lack of relevant knowledge and non-specific symptoms, have hindered the timely diagnosis of IPF. This review proposes a definition of early IPF, emphasizes the diagnostic urgency of early IPF, and highlights current diagnostic strategies and future prospects for early IPF. The integration of artificial intelligence (AI), specifically machine learning (ML) and deep learning (DL), is revolutionizing the diagnostic procedure of early IPF by standardizing and accelerating the interpretation of thoracic images. Innovative bronchoscopic techniques such as transbronchial lung cryobiopsy (TBLC), genomic classifier, and endobronchial optical coherence tomography (EB-OCT) provide less invasive diagnostic alternatives. In addition, chest auscultation, serum biomarkers, and susceptibility genes are pivotal for the indication of early diagnosis. Ongoing research is essential for refining diagnostic methods and treatment strategies for early IPF.

The effect of medical explanations from large language models on diagnostic decisions in radiology

Spitzer, P., Hendriks, D., Rudolph, J., Schläger, S., Ricke, J., Kühl, N., Hoppe, B., Feuerriegel, S.

medRxiv preprint, May 18 2025
Large language models (LLMs) are increasingly used by physicians for diagnostic support. A key advantage of LLMs is the ability to generate explanations that can help physicians understand the reasoning behind a diagnosis. However, the best-suited format for LLM-generated explanations remains unclear. In this large-scale study, we examined the effect of different formats for LLM explanations on clinical decision-making. For this, we conducted a randomized experiment with radiologists reviewing patient cases with radiological images (N = 2020 assessments). Participants received either no LLM support (control group) or were supported by one of three LLM-generated explanations: (1) a standard output providing the diagnosis without explanation; (2) a differential diagnosis comparing multiple possible diagnoses; or (3) a chain-of-thought explanation offering a detailed reasoning process for the diagnosis. We find that the format of explanations significantly influences diagnostic accuracy. The chain-of-thought explanations yielded the best performance, improving the diagnostic accuracy by 12.2% compared to the control condition without LLM support (P = 0.001). The chain-of-thought explanations are also superior to the standard output without explanation (+7.2%; P = 0.040) and the differential diagnosis format (+9.7%; P = 0.004). We further assessed the robustness of these findings across case difficulty and different physician backgrounds such as general vs. specialized radiologists. Evidently, explaining the reasoning for a diagnosis helps physicians to identify and correct potential errors in LLM predictions and thus improve overall decisions. Altogether, the results highlight the importance of how explanations in medical LLMs are generated to maximize their utility in clinical practice. By designing explanations to support the reasoning processes of physicians, LLMs can improve diagnostic performance and, ultimately, patient outcomes.

ChatGPT-4-Driven Liver Ultrasound Radiomics Analysis: Advantages and Drawbacks Compared to Traditional Techniques.

Sultan L, Venkatakrishna SSB, Anupindi S, Andronikou S, Acord M, Otero H, Darge K, Sehgal C, Holmes J

PubMed paper, May 18 2025
Artificial intelligence (AI) is transforming medical imaging, with large language models such as ChatGPT-4 emerging as potential tools for automated image interpretation. While AI-driven radiomics has shown promise in diagnostic imaging, the efficacy of ChatGPT-4 in liver ultrasound analysis remains largely unexamined. This study evaluates the capability of ChatGPT-4 in liver ultrasound radiomics, specifically its ability to differentiate fibrosis, steatosis, and normal liver tissue, compared to conventional image analysis software. Seventy grayscale ultrasound images from a preclinical liver disease model, including fibrosis (n=31), fatty liver (n=18), and normal liver (n=21), were analyzed. ChatGPT-4 extracted texture features, which were compared to those obtained using Interactive Data Language (IDL), a traditional image analysis software. One-way ANOVA was used to identify statistically significant features differentiating liver conditions, and logistic regression models were employed to assess diagnostic performance. ChatGPT-4 extracted nine key textural features (echo intensity, heterogeneity, skewness, kurtosis, contrast, homogeneity, dissimilarity, angular second moment, and entropy), all of which significantly differed across liver conditions (p < 0.05). Among individual features, echo intensity achieved the highest F1-score (0.85). When combined, ChatGPT-4 attained 76% accuracy and 83% sensitivity in classifying liver disease. ROC analysis demonstrated strong discriminatory performance, with AUC values of 0.75 for fibrosis, 0.87 for normal liver, and 0.97 for steatosis. Compared to Interactive Data Language (IDL) image analysis software, ChatGPT-4 exhibited slightly lower sensitivity (0.83 vs. 0.89) but showed moderate correlation (R = 0.68, p < 0.0001) with IDL-derived features. However, it significantly outperformed IDL in processing efficiency, reducing analysis time by 40%, highlighting its potential for high-throughput radiomic analysis.
Despite slightly lower sensitivity than IDL, ChatGPT-4 demonstrated high feasibility for ultrasound radiomics, offering faster processing, high-throughput analysis, and automated multi-image evaluation. These findings support its potential integration into AI-driven imaging workflows, with further refinements needed to enhance feature reproducibility and diagnostic accuracy.
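
Several of the texture features named in the abstract above are standard first-order statistics of the gray-level distribution. The sketch below shows one conventional way to compute four of them, plus mean intensity, from a flat list of pixel values. It is not the study's pipeline: the function name is hypothetical, and mapping "echo intensity" to the mean and "heterogeneity" to the standard deviation is an assumption, since the abstract does not define them. The co-occurrence features (contrast, homogeneity, dissimilarity, angular second moment) are omitted.

```python
import math
from collections import Counter


def first_order_features(pixels):
    """First-order texture statistics for a flat list of integer gray levels.

    Assumptions (not stated in the source): echo intensity is the mean gray
    level and heterogeneity is the population standard deviation.
    """
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    std = math.sqrt(variance)

    if std == 0:
        # A constant region has no spread, so shape statistics are set to 0.
        skewness, kurtosis = 0.0, 0.0
    else:
        skewness = sum((p - mean) ** 3 for p in pixels) / (n * std ** 3)
        kurtosis = sum((p - mean) ** 4 for p in pixels) / (n * std ** 4)

    # Shannon entropy (in bits) of the gray-level histogram.
    counts = Counter(pixels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    return {
        "echo_intensity": mean,
        "heterogeneity": std,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "entropy": entropy,
    }


# Toy region with two equally frequent gray levels.
print(first_order_features([0, 0, 255, 255]))
```

In a workflow like the one described, features of this kind would be computed per image and then passed to the statistical tests and logistic regression models.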

A Comprehensive Review of Techniques, Algorithms, Advancements, Challenges, and Clinical Applications of Multi-modal Medical Image Fusion for Improved Diagnosis

Muhammad Zubair, Muzammil Hussai, Mousa Ahmad Al-Bashrawi, Malika Bendechache, Muhammad Owais

arXiv preprint, May 18 2025
Multi-modal medical image fusion (MMIF) is increasingly recognized as an essential technique for enhancing diagnostic precision and facilitating effective clinical decision-making within computer-aided diagnosis systems. MMIF combines data from X-ray, MRI, CT, PET, SPECT, and ultrasound to create detailed, clinically useful images of patient anatomy and pathology. These integrated representations significantly advance diagnostic accuracy, lesion detection, and segmentation. This comprehensive review meticulously surveys the evolution, methodologies, algorithms, current advancements, and clinical applications of MMIF. We present a critical comparative analysis of traditional fusion approaches, including pixel-, feature-, and decision-level methods, and delve into recent advancements driven by deep learning, generative models, and transformer-based architectures. A critical comparative analysis is presented between these conventional methods and contemporary techniques, highlighting differences in robustness, computational efficiency, and interpretability. The article addresses extensive clinical applications across oncology, neurology, and cardiology, demonstrating MMIF's vital role in precision medicine through improved patient-specific therapeutic outcomes. Moreover, the review thoroughly investigates the persistent challenges affecting MMIF's broad adoption, including issues related to data privacy, heterogeneity, computational complexity, interpretability of AI-driven algorithms, and integration within clinical workflows. It also identifies significant future research avenues, such as the integration of explainable AI, adoption of privacy-preserving federated learning frameworks, development of real-time fusion systems, and standardization efforts for regulatory compliance.

SMFusion: Semantic-Preserving Fusion of Multimodal Medical Images for Enhanced Clinical Diagnosis

Haozhe Xiang, Han Zhang, Yu Cheng, Xiongwen Quan, Wanwan Huang

arXiv preprint, May 18 2025
Multimodal medical image fusion plays a crucial role in medical diagnosis by integrating complementary information from different modalities to enhance image readability and clinical applicability. However, existing methods mainly follow computer vision standards for feature extraction and fusion strategy formulation, overlooking the rich semantic information inherent in medical images. To address this limitation, we propose a novel semantic-guided medical image fusion approach that, for the first time, incorporates medical prior knowledge into the fusion process. Specifically, we construct a publicly available multimodal medical image-text dataset, upon which text descriptions generated by BiomedGPT are encoded and semantically aligned with image features in a high-dimensional space via a semantic interaction alignment module. During this process, a cross attention based linear transformation automatically maps the relationship between textual and visual features to facilitate comprehensive learning. The aligned features are then embedded into a text-injection module for further feature-level fusion. Unlike traditional methods, we further generate diagnostic reports from the fused images to assess the preservation of medical information. Additionally, we design a medical semantic loss function to enhance the retention of textual cues from the source images. Experimental results on test datasets demonstrate that the proposed method achieves superior performance in both qualitative and quantitative evaluations while preserving more critical medical information.