Federated generative prompt learning with vision foundation models: universal efficient multi-center medical image analysis.
Authors
Affiliations (9)
- School of Computer Science, Shanghai Jiao Tong University, Shanghai, China.
- Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Shanghai, China.
- School of Computer Science, Shanghai Jiao Tong University, Shanghai, China.
- Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Shanghai, China.
- School of Computer Science, Faculty of Engineering, The University of Sydney, Sydney, Australia.
- School of Computer Science, Shanghai Jiao Tong University, Shanghai, China.
- Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Shanghai, China.
- Big Data Institute, Central South University, Hunan, China.
- National Engineering Research Center for Medical Big Data Application Technology, Changsha, China.
Abstract
Federated medical AI is revolutionizing multi-center collaboration, but communication cost, data scarcity, and heterogeneity still limit its practical deployment. Foundation models (FMs) offer a promising way to address these challenges, owing to their generalization capabilities and efficient adaptability to medical tasks. Here, we present Federated Generative Prompt Learning (Fed-GPL), a universal and efficient framework for multi-center medical image analysis. Fed-GPL collaboratively trains a prompt generator that produces a customized prompt for each patient, capturing patient-specific variations and enabling precise medical diagnosis. It is compatible with various vision FMs and medical tasks, such as Vision Transformer (ViT) for diabetic retinopathy and melanoma classification, and Segment Anything (SAM) for polyp and prostate segmentation. Fed-GPL outperforms traditional models and full fine-tuning methods while training only 8.26% and 6.55% of the total FM parameters for classification and segmentation tasks, respectively, and it converges within just 15 communication rounds. In low-resource settings, Fed-GPL maintains its performance with only 5% of the original training data.
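The communication pattern implied by the abstract can be illustrated with a minimal sketch: each center updates only the small prompt generator, the FM backbone stays frozen and is never transmitted, and a server combines the generator parameters each round. The abstract does not state the aggregation rule, so sample-weighted averaging (FedAvg) is an assumption here, and the names `fedavg`, `trainable_fraction`, and `gen.w` are hypothetical.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average prompt-generator parameters, weighted by local sample count.

    Assumption: FedAvg-style aggregation; the source does not specify
    the rule Fed-GPL actually uses.
    """
    total = sum(client_sizes)
    averaged: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


def trainable_fraction(num_trainable: int, num_frozen: int) -> float:
    """Share of all parameters that are trained (and hence communicated)."""
    return num_trainable / (num_trainable + num_frozen)


# Two hypothetical centers holding 1 and 3 local samples.
center_a = {"gen.w": [1.0, 2.0]}
center_b = {"gen.w": [3.0, 4.0]}
global_generator = fedavg([center_a, center_b], [1, 3])
```

Because only the generator parameters appear in the exchanged dictionaries, the per-round payload scales with the trainable fraction rather than with the full FM size, which is the efficiency argument the abstract makes.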