
Label-Efficient Chest X-ray Diagnosis via Partial CLIP Adaptation

Heet Nitinkumar Dalsania

arXiv preprint · Jul 9 2025
Modern deep learning implementations for medical imaging usually rely on large labeled datasets. These datasets are often difficult to obtain due to privacy concerns, high costs, and even scarcity of cases. In this paper, a label-efficient strategy is proposed for chest X-ray diagnosis that seeks to reflect real-world hospital scenarios. The experiments use the NIH Chest X-ray14 dataset and a pre-trained CLIP ViT-B/32 model. The model is adapted via partial fine-tuning of its visual encoder and then evaluated using zero-shot and few-shot learning with 1-16 labeled examples per disease class. The experiments demonstrate that CLIP's pre-trained vision-language features can be effectively adapted to few-shot medical imaging tasks, achieving over 20% improvement in mean AUC compared with the zero-shot baseline. A key aspect of this work is its attempt to simulate internal hospital workflows, where image archives exist but annotations are sparse. This work evaluates a practical and scalable solution for both common and rare disease diagnosis. Additionally, this research is intended for academic and experimental purposes only and has not been peer reviewed yet. All code is available at https://github.com/heet007-code/CLIP-disease-xray.
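As a rough illustration of the partial-adaptation recipe sketched in this abstract, the snippet below freezes a pre-trained CLIP ViT-B/32 and unfreezes only the last two vision-encoder blocks before scoring an image against disease prompts. It assumes the Hugging Face transformers CLIP implementation; the layers unfrozen, the prompt wording, and the disease list are illustrative choices rather than the paper's exact configuration.

```python
import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Freeze everything, then unfreeze only the last two transformer blocks of the
# vision tower (an illustrative notion of "partial fine-tuning").
for p in model.parameters():
    p.requires_grad = False
for block in model.vision_model.encoder.layers[-2:]:
    for p in block.parameters():
        p.requires_grad = True
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-5
)  # the few-shot fine-tuning loop itself is omitted here

# Zero-/few-shot style scoring: compare an image against one text prompt per class.
classes = ["atelectasis", "cardiomegaly", "effusion"]             # subset, for illustration
prompts = [f"a chest x-ray showing {c}" for c in classes]
image = Image.fromarray(np.zeros((224, 224, 3), dtype=np.uint8))  # placeholder X-ray
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
scores = model(**inputs).logits_per_image                         # image-text similarity per class
```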

Airway Segmentation Network for Enhanced Tubular Feature Extraction

Qibiao Wu, Yagang Wang, Qian Zhang

arXiv preprint · Jul 9 2025
Manual annotation of airway regions in computed tomography images is a time-consuming and expertise-dependent task. Automatic airway segmentation is therefore a prerequisite for enabling rapid bronchoscopic navigation and the clinical deployment of bronchoscopic robotic systems. Although convolutional neural network methods have gained considerable attention in airway segmentation, the unique tree-like structure of airways poses challenges for conventional and deformable convolutions, which often fail to focus on fine airway structures, leading to missed segments and discontinuities. To address this issue, this study proposes a novel tubular feature extraction network, named TfeNet. TfeNet introduces a novel direction-aware convolution operation that first applies spatial rotation transformations to adjust the sampling positions of linear convolution kernels. The deformed kernels are then represented as line segments or polylines in 3D space. Furthermore, a tubular feature fusion module (TFFM) is designed based on asymmetric convolution and residual connection strategies, enhancing the network's focus on subtle airway structures. Extensive experiments conducted on one public dataset and two datasets used in airway segmentation challenges demonstrate that the proposed TfeNet achieves more accurate and continuous airway structure predictions than existing methods. In particular, TfeNet achieves the highest overall score of 94.95% on the current largest airway segmentation dataset, Airway Tree Modeling (ATM22), and demonstrates advanced performance on the lung fibrosis dataset (AIIB23). The code is available at https://github.com/QibiaoWu/TfeNet.
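The PyTorch sketch below gives a rough flavor of the asymmetric-convolution-plus-residual idea behind a tubular feature fusion block: three line-like 3D kernels, one per axis, are summed, fused, and added back to the input. The direction-aware rotated-kernel convolution described in the abstract is not reproduced, and the layer sizes are illustrative rather than taken from TfeNet.

```python
import torch
import torch.nn as nn

class AsymTubularBlock(nn.Module):
    """Illustrative fusion block emphasising thin, line-like (tubular) structures."""

    def __init__(self, channels: int):
        super().__init__()
        # Three asymmetric kernels, one per spatial axis.
        self.conv_z = nn.Conv3d(channels, channels, kernel_size=(3, 1, 1), padding=(1, 0, 0))
        self.conv_y = nn.Conv3d(channels, channels, kernel_size=(1, 3, 1), padding=(0, 1, 0))
        self.conv_x = nn.Conv3d(channels, channels, kernel_size=(1, 1, 3), padding=(0, 0, 1))
        self.fuse = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.norm = nn.InstanceNorm3d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        asym = self.conv_z(x) + self.conv_y(x) + self.conv_x(x)
        out = self.act(self.norm(self.fuse(asym)))
        return out + x  # residual connection helps preserve fine airway detail

block = AsymTubularBlock(channels=16)
features = torch.randn(1, 16, 32, 64, 64)  # (batch, channels, D, H, W) CT feature map
print(block(features).shape)
```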

4KAgent: Agentic Any Image to 4K Super-Resolution

Yushen Zuo, Qi Zheng, Mingyang Wu, Xinrui Jiang, Renjie Li, Jian Wang, Yide Zhang, Gengchen Mai, Lihong V. Wang, James Zou, Xiaoyu Wang, Ming-Hsuan Yang, Zhengzhong Tu

arXiv preprint · Jul 9 2025
We present 4KAgent, a unified agentic super-resolution generalist system designed to universally upscale any image to 4K resolution (and even higher, if applied iteratively). Our system can transform images from extremely low resolutions with severe degradations, for example, highly distorted inputs at 256x256, into crystal-clear, photorealistic 4K outputs. 4KAgent comprises three core components: (1) Profiling, a module that customizes the 4KAgent pipeline based on bespoke use cases; (2) A Perception Agent, which leverages vision-language models alongside image quality assessment experts to analyze the input image and make a tailored restoration plan; and (3) A Restoration Agent, which executes the plan, following a recursive execution-reflection paradigm, guided by a quality-driven mixture-of-experts policy to select the optimal output for each step. Additionally, 4KAgent embeds a specialized face restoration pipeline, significantly enhancing facial details in portrait and selfie photos. We rigorously evaluate our 4KAgent across 11 distinct task categories encompassing a total of 26 diverse benchmarks, setting a new state of the art on a broad spectrum of imaging domains. Our evaluations cover natural images, portrait photos, AI-generated content, satellite imagery, fluorescence microscopy, and medical imaging like fundoscopy, ultrasound, and X-ray, demonstrating superior performance in terms of both perceptual (e.g., NIQE, MUSIQ) and fidelity (e.g., PSNR) metrics. By establishing a novel agentic paradigm for low-level vision tasks, we aim to catalyze broader interest and innovation within vision-centric autonomous agents across diverse research communities. We will release all the code, models, and results at: https://4kagent.github.io.
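As a schematic of the execution-reflection loop described above (not the 4KAgent implementation), the Python sketch below tries several restoration experts at each step, scores the candidates with a quality model, and keeps the best one; all expert and scoring callables here are hypothetical placeholders.

```python
from typing import Callable, List

def restore(image, experts: List[Callable], score: Callable, steps: int = 3):
    """Iteratively refine `image`: each step, every expert proposes a candidate
    and a quality model keeps the best one (stops early if nothing improves)."""
    current = image
    for _ in range(steps):
        candidates = [expert(current) for expert in experts]   # execution
        best = max(candidates, key=score)                       # reflection / selection
        if score(best) <= score(current):
            break  # no expert improved the image; stop early
        current = best
    return current

# Dummy experts and scorer standing in for denoising/deblurring/upscaling models
# and an image quality assessment metric.
experts = [lambda x: x + "|sharpened", lambda x: x + "|denoised"]
print(restore("input_image", experts, score=len))
```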

LangMamba: A Language-driven Mamba Framework for Low-dose CT Denoising with Vision-language Models

Zhihao Chen, Tao Chen, Chenhui Wang, Qi Gao, Huidong Xie, Chuang Niu, Ge Wang, Hongming Shan

arXiv preprint · Jul 8 2025
Low-dose computed tomography (LDCT) reduces radiation exposure but often degrades image quality, potentially compromising diagnostic accuracy. Existing deep learning-based denoising methods focus primarily on pixel-level mappings, overlooking the potential benefits of high-level semantic guidance. Recent advances in vision-language models (VLMs) suggest that language can serve as a powerful tool for capturing structured semantic information, offering new opportunities to improve LDCT reconstruction. In this paper, we introduce LangMamba, a Language-driven Mamba framework for LDCT denoising that leverages VLM-derived representations to enhance supervision from normal-dose CT (NDCT). LangMamba follows a two-stage learning strategy. First, we pre-train a Language-guided AutoEncoder (LangAE) that leverages frozen VLMs to map NDCT images into a semantic space enriched with anatomical information. Second, we synergize LangAE with two key components to guide LDCT denoising: a Semantic-Enhanced Efficient Denoiser (SEED), which enhances NDCT-relevant local semantics while capturing global features with an efficient Mamba mechanism, and a Language-engaged Dual-space Alignment (LangDA) loss, which ensures that denoised images align with NDCT in both perceptual and semantic spaces. Extensive experiments on two public datasets demonstrate that LangMamba outperforms conventional state-of-the-art methods, significantly improving detail preservation and visual fidelity. Remarkably, LangAE exhibits strong generalizability to unseen datasets, thereby reducing training costs. Furthermore, the LangDA loss improves explainability by integrating language-guided insights into image reconstruction and can be used in a plug-and-play fashion. Our findings shed new light on the potential of language as a supervisory signal to advance LDCT denoising. The code is publicly available at https://github.com/hao1635/LangMamba.
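The snippet below is a hedged PyTorch sketch of a dual-space alignment loss in the spirit of LangDA: the denoised image is pulled toward the normal-dose target both in a pixel/perceptual space and in a semantic space produced by a frozen encoder. The encoder and the loss weights are arbitrary placeholders, not the paper's LangAE or its exact formulation.

```python
import torch
import torch.nn.functional as F

def dual_space_alignment_loss(denoised, ndct, semantic_encoder, w_sem: float = 0.1):
    # Pixel/perceptual-space alignment with the normal-dose CT target.
    perceptual = F.l1_loss(denoised, ndct)
    # Semantic-space alignment using a frozen feature extractor.
    with torch.no_grad():
        target_feat = semantic_encoder(ndct)
    pred_feat = semantic_encoder(denoised)
    semantic = F.mse_loss(pred_feat, target_feat)
    return perceptual + w_sem * semantic

# Usage with a dummy encoder standing in for a language-guided autoencoder.
encoder = torch.nn.Conv2d(1, 8, kernel_size=3, padding=1)
denoised = torch.randn(2, 1, 64, 64, requires_grad=True)
ndct = torch.randn(2, 1, 64, 64)
loss = dual_space_alignment_loss(denoised, ndct, encoder)
loss.backward()
```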

Adaptive batch-fusion self-supervised learning for ultrasound image pretraining.

Zhang J, Wu X, Liu S, Fan Y, Chen Y, Lyu G, Liu P, Liu Z, He S

PubMed paper · Jul 8 2025
Medical self-supervised learning eliminates the reliance on labels, making feature extraction simple and efficient. However, the intricate design of pretext tasks in single-modal self-supervised analysis presents challenges, compounded by an excessive dependency on data augmentation, which creates a bottleneck in medical self-supervised learning research. Consequently, this paper reanalyzes the feature learnability introduced by data augmentation strategies in medical image self-supervised learning. We introduce an adaptive self-supervised learning data augmentation method from the perspective of batch fusion, and we propose a conv embedding block for learning the incremental representation between these batches. We tested five fused data tasks proposed by previous researchers; our method achieved a linear classification protocol accuracy of 94.25% with only 150 epochs of self-supervised feature training in a Vision Transformer (ViT), the best result among comparable methods. A detailed ablation study on previous augmentation strategies indicates that the proposed medical data augmentation strategy effectively represents ultrasound data features in the self-supervised learning process. The code and weights can be found here.
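For illustration only, the sketch below shows one simple way to fuse samples inside a batch to form new training views without hand-tuned per-image transforms. The mixing rule is a plain convex blend chosen for this example and is not the adaptive batch-fusion scheme proposed in the paper.

```python
import torch

def batch_fusion(images: torch.Tensor, alpha: float = 0.6) -> torch.Tensor:
    """images: (B, C, H, W) ultrasound batch; returns fused views of the same shape."""
    perm = torch.randperm(images.size(0))                       # pair each image with another
    lam = torch.empty(images.size(0), 1, 1, 1).uniform_(alpha, 1.0)
    return lam * images + (1.0 - lam) * images[perm]            # convex blend per pair

batch = torch.rand(8, 1, 224, 224)
fused = batch_fusion(batch)   # feed both `batch` and `fused` to the self-supervised objective
```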

Noise-inspired diffusion model for generalizable low-dose CT reconstruction.

Gao Q, Chen Z, Zeng D, Zhang J, Ma J, Shan H

PubMed paper · Jul 8 2025
The generalization of deep learning-based low-dose computed tomography (CT) reconstruction models to doses unseen in the training data is important and remains challenging. Previous efforts rely heavily on paired data to improve generalization performance and robustness, either by collecting diverse CT data for re-training or a few test data for fine-tuning. Recently, diffusion models have shown promising and generalizable performance in low-dose CT (LDCT) reconstruction; however, they may produce unrealistic structures because CT image noise deviates from a Gaussian distribution and because the guidance of noisy LDCT images provides imprecise prior information. In this paper, we propose a noise-inspired diffusion model for generalizable LDCT reconstruction, termed NEED, which tailors diffusion models to the noise characteristics of each domain. First, we propose a novel shifted Poisson diffusion model to denoise projection data, which aligns the diffusion process with the noise model in pre-log LDCT projections. Second, we devise a doubly guided diffusion model to refine reconstructed images, which leverages LDCT images and initial reconstructions to more accurately locate prior information and enhance reconstruction fidelity. By cascading these two diffusion models for dual-domain reconstruction, our NEED requires only normal-dose data for training and can be effectively extended to various unseen dose levels during testing via a time step matching strategy. Extensive qualitative, quantitative, and segmentation-based evaluations on two datasets demonstrate that NEED consistently outperforms state-of-the-art methods in reconstruction and generalization performance. Source code is available at https://github.com/qgao21/NEED.
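For context on the pre-log noise model the abstract builds on, the sketch below simulates noisy projection counts as Poisson photon statistics plus Gaussian electronic noise and forms the standard shifted-Poisson surrogate. The dose level and noise settings are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

I0 = 1e4          # incident photons per detector bin at the assumed low dose
sigma_e = 10.0    # standard deviation of additive electronic noise
line_integrals = rng.uniform(0.0, 4.0, size=1000)   # ideal projections (attenuation line integrals)

# Pre-log measurement model: Poisson photon statistics plus Gaussian electronic noise.
expected_counts = I0 * np.exp(-line_integrals)
measured = rng.poisson(expected_counts) + rng.normal(0.0, sigma_e, size=1000)

# Shifted-Poisson surrogate: adding the electronic noise variance yields a quantity
# whose mean approximately equals its variance, so Poisson-type modelling applies.
shifted = np.maximum(measured + sigma_e**2, 1.0)
post_log = -np.log(np.clip(measured, 1.0, None) / I0)  # standard post-log projection
```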

Mitigating Multi-Sequence 3D Prostate MRI Data Scarcity through Domain Adaptation using Locally-Trained Latent Diffusion Models for Prostate Cancer Detection

Emerson P. Grabke, Babak Taati, Masoom A. Haider

arXiv preprint · Jul 8 2025
Objective: Latent diffusion models (LDMs) could mitigate data scarcity challenges affecting machine learning development for medical image interpretation. The recent CCELLA LDM improved prostate cancer detection performance using synthetic MRI for classifier training, but it was limited to the axial T2-weighted (AxT2) sequence, did not investigate inter-institutional domain shift, and prioritized radiology over histopathology outcomes. We propose CCELLA++ to address these limitations and improve clinical utility. Methods: CCELLA++ extends CCELLA to simultaneous biparametric prostate MRI (bpMRI) generation, covering the AxT2, high b-value diffusion (HighB), and apparent diffusion coefficient (ADC) series. Domain adaptation was investigated by pretraining classifiers on real or LDM-generated synthetic data from an internal institution, followed by fine-tuning on progressively smaller fractions of an out-of-distribution external dataset. Results: Compared with CCELLA (0.060), CCELLA++ improved 3D FID for the HighB and ADC sequences but not AxT2 (0.013, 0.012, and 0.063, respectively). Classifier pretraining with CCELLA++ bpMRI outperformed real bpMRI in AP and AUC for all domain adaptation scenarios. CCELLA++ pretraining achieved the highest classifier performance when less than 50% (n=665) of the external dataset was used for fine-tuning. Conclusion: Synthetic bpMRI generated by our method can improve downstream classifier generalization and performance beyond real bpMRI or CCELLA-generated AxT2-only images. Future work should seek to quantify medical image sample quality, balance multi-sequence LDM training, and condition the LDM with additional information. Significance: The proposed CCELLA++ LDM can generate synthetic bpMRI that outperforms real data for domain adaptation with a limited target institution dataset. Our code is available at https://github.com/grabkeem/CCELLA-plus-plus
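The sketch below outlines the shape of the domain-adaptation protocol described in the Methods: pretrain once, then fine-tune copies of the classifier on progressively smaller fractions of an external dataset. Every object and routine here is a hypothetical placeholder, not the CCELLA++ code.

```python
import copy
import random

def fine_tune(model, cases):
    return model      # placeholder: a real training loop would go here

def evaluate(model):
    return 0.0        # placeholder: a real run would report AP / AUC on a held-out split

def domain_adaptation_study(pretrained_model, external_cases, fractions=(1.0, 0.5, 0.25, 0.1)):
    results = {}
    for frac in fractions:
        subset = random.sample(external_cases, k=max(1, int(frac * len(external_cases))))
        model = copy.deepcopy(pretrained_model)    # restart from the same pretrained weights
        results[frac] = evaluate(fine_tune(model, subset))
    return results

print(domain_adaptation_study(pretrained_model={"weights": None}, external_cases=list(range(100))))
```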

Wrist bone segmentation in X-ray images using CT-based simulations

Youssef ElTantawy, Alexia Karantana, Xin Chen

arXiv preprint · Jul 8 2025
Plain X-ray is one of the most common imaging modalities for clinical diagnosis (e.g., bone fracture, pneumonia, cancer screening). X-ray image segmentation is an essential step for many computer-aided diagnostic systems, yet it remains challenging. Deep-learning-based methods have achieved superior performance in medical image segmentation tasks but often require a large amount of high-quality annotated data for model training. Providing such an annotated dataset is not only time-consuming but also requires a high level of expertise. This is particularly challenging in wrist bone segmentation in X-rays, due to the interposition of multiple small carpal bones in the image. To overcome the data annotation issue, this work utilizes a large number of simulated X-ray images generated from Computed Tomography (CT) volumes with their corresponding 10 bone labels to train a deep learning-based model for wrist bone segmentation in real X-ray images. The proposed method was evaluated using both simulated images and real images. The method achieved Dice scores ranging from 0.80 to 0.92 for the simulated dataset generated from different view angles. Qualitative analysis of the segmentation results on the real X-ray images also demonstrated the superior performance of the trained model. The trained model and X-ray simulation code are freely available for research purposes: the link will be provided upon acceptance.
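As a minimal illustration of how a simulated (DRR-style) X-ray can be produced from a CT volume, the numpy sketch below integrates attenuation along one axis and applies a Beer-Lambert style intensity mapping. The HU-to-attenuation conversion, view geometry, and normalisation are crude placeholders; the paper's simulation pipeline is not reproduced here.

```python
import numpy as np

def simulate_xray(ct_hu: np.ndarray, axis: int = 1) -> np.ndarray:
    """ct_hu: CT volume in Hounsfield units, shape (D, H, W). Returns a 2D image in [0, 1]."""
    mu = np.clip((ct_hu + 1000.0) / 1000.0, 0.0, None)           # crude HU -> attenuation
    line_integral = mu.sum(axis=axis)                             # integrate along projection axis
    image = np.exp(-line_integral / (line_integral.max() + 1e-8)) # Beer-Lambert style intensity
    return 1.0 - (image - image.min()) / (np.ptp(image) + 1e-8)   # invert and normalise

volume = np.random.randint(-1000, 1500, size=(64, 128, 128)).astype(np.float32)  # stand-in CT
drr = simulate_xray(volume)
print(drr.shape, drr.min(), drr.max())
```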

Geometric-Guided Few-Shot Dental Landmark Detection with Human-Centric Foundation Model

Anbang Wang, Marawan Elbatel, Keyuan Liu, Lizhuo Lin, Meng Lan, Yanqi Yang, Xiaomeng Li

arXiv preprint · Jul 7 2025
Accurate detection of anatomic landmarks is essential for assessing alveolar bone and root conditions, thereby optimizing clinical outcomes in orthodontics, periodontics, and implant dentistry. Manual annotation of landmarks on cone-beam computed tomography (CBCT) by dentists is time-consuming, labor-intensive, and subject to inter-observer variability. Deep learning-based automated methods present a promising approach to streamline this process efficiently. However, the scarcity of training data and the high cost of expert annotations hinder the adoption of conventional deep learning techniques. To overcome these challenges, we introduce GeoSapiens, a novel few-shot learning framework designed for robust dental landmark detection using limited annotated CBCT of anterior teeth. Our GeoSapiens framework comprises two key components: (1) a robust baseline adapted from Sapiens, a foundation model that has achieved state-of-the-art performance in human-centric vision tasks, and (2) a novel geometric loss function that improves the model's capacity to capture critical geometric relationships among anatomical structures. Experiments conducted on our collected dataset of anterior teeth landmarks revealed that GeoSapiens surpassed existing landmark detection methods, outperforming the leading approach with an 8.18% higher success detection rate at a strict 0.5 mm threshold, a standard widely recognized in dental diagnostics. Code is available at: https://github.com/xmed-lab/GeoSapiens.
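The PyTorch sketch below shows one common way to add a geometric consistency term to landmark regression: besides matching coordinates, pairwise distances between predicted landmarks are matched to those of the ground truth. It is an illustrative stand-in for the geometric loss named in the abstract, not its exact form, and the loss weight is arbitrary.

```python
import torch
import torch.nn.functional as F

def geometric_landmark_loss(pred: torch.Tensor, gt: torch.Tensor, w_geo: float = 0.5):
    """pred, gt: (B, N, 3) landmark coordinates (e.g. in mm)."""
    coord_loss = F.l1_loss(pred, gt)                 # direct coordinate regression term
    pred_dist = torch.cdist(pred, pred)              # (B, N, N) pairwise distances
    gt_dist = torch.cdist(gt, gt)
    geo_loss = F.l1_loss(pred_dist, gt_dist)         # preserve inter-landmark geometry
    return coord_loss + w_geo * geo_loss

pred = torch.rand(2, 12, 3, requires_grad=True)
gt = torch.rand(2, 12, 3)
loss = geometric_landmark_loss(pred, gt)
loss.backward()
```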

MedGemma Technical Report

Andrew Sellergren, Sahar Kazemzadeh, Tiam Jaroensri, Atilla Kiraly, Madeleine Traverse, Timo Kohlberger, Shawn Xu, Fayaz Jamil, Cían Hughes, Charles Lau, Justin Chen, Fereshteh Mahvar, Liron Yatziv, Tiffany Chen, Bram Sterling, Stefanie Anna Baby, Susanna Maria Baby, Jeremy Lai, Samuel Schmidgall, Lu Yang, Kejia Chen, Per Bjornsson, Shashir Reddy, Ryan Brush, Kenneth Philbrick, Howard Hu, Howard Yang, Richa Tiwari, Sunny Jansen, Preeti Singh, Yun Liu, Shekoofeh Azizi, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Riviere, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean-bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Elena Buchatskaya, Jean-Baptiste Alayrac, Dmitry Lepikhin, Vlad Feinberg, Sebastian Borgeaud, Alek Andreev, Cassidy Hardin, Robert Dadashi, Léonard Hussenot, Armand Joulin, Olivier Bachem, Yossi Matias, Katherine Chou, Avinatan Hassidim, Kavi Goel, Clement Farabet, Joelle Barral, Tris Warkentin, Jonathon Shlens, David Fleet, Victor Cotruta, Omar Sanseviero, Gus Martins, Phoebe Kirk, Anand Rao, Shravya Shetty, David F. Steiner, Can Kirmizibayrak, Rory Pilgrim, Daniel Golden, Lin Yang

arXiv preprint · Jul 7 2025
Artificial intelligence (AI) has significant potential in healthcare applications, but its training and deployment face challenges due to healthcare's diverse data, complex tasks, and the need to preserve privacy. Foundation models that perform well on medical tasks and require less task-specific tuning data are critical to accelerate the development of healthcare AI applications. We introduce MedGemma, a collection of medical vision-language foundation models based on Gemma 3 4B and 27B. MedGemma demonstrates advanced medical understanding and reasoning on images and text, significantly exceeding the performance of similar-sized generative models and approaching the performance of task-specific models, while maintaining the general capabilities of the Gemma 3 base models. For out-of-distribution tasks, MedGemma achieves 2.6-10% improvement on medical multimodal question answering, 15.5-18.1% improvement on chest X-ray finding classification, and 10.8% improvement on agentic evaluations compared to the base models. Fine-tuning MedGemma further improves performance in subdomains, reducing errors in electronic health record information retrieval by 50% and reaching performance comparable to existing specialized state-of-the-art methods for pneumothorax classification and histopathology patch classification. We additionally introduce MedSigLIP, a medically-tuned vision encoder derived from SigLIP. MedSigLIP powers the visual understanding capabilities of MedGemma and, as a standalone encoder, achieves performance comparable to or better than specialized medical image encoders. Taken together, the MedGemma collection provides a strong foundation of medical image and text capabilities, with the potential to significantly accelerate medical research and the development of downstream applications. The MedGemma collection, including tutorials and model weights, can be found at https://goo.gle/medgemma.
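The snippet below is a hedged sketch of querying a MedGemma checkpoint through the Hugging Face image-text-to-text pipeline. The model identifier and chat message format follow common Hub conventions and are assumptions; consult the official release at https://goo.gle/medgemma for the supported usage (the checkpoints are gated and large).

```python
from PIL import Image
from transformers import pipeline

# Assumed Hub identifier for the 4B instruction-tuned checkpoint; verify against
# the official release before use.
pipe = pipeline("image-text-to-text", model="google/medgemma-4b-it")

image = Image.new("RGB", (448, 448))   # placeholder; a real chest X-ray would go here
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Describe the findings on this chest X-ray."},
    ],
}]
print(pipe(text=messages, max_new_tokens=200))
```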