Adaptive multi-teacher knowledge distillation framework with foundation models for medical image analysis.
Authors
Affiliations (6)
- Intelligent Medical Computing Laboratory, Faculty of Applied Sciences, Macao Polytechnic University, Rua de Luís Gonzaga Gomes, 999078, Macao Special Administrative Region of China.
- Netherlands Cancer Institute, Plesmanlaan 121, Amsterdam, 1066 CX, Netherlands.
- Radboud University, Comeniuslaan 4, Nijmegen, 6525 HP, The Netherlands.
- Hangzhou Dianzi University, No. 1158, 2nd Street, Hangzhou, 310018, China.
- Case Western Reserve University, 10900 Euclid Avenue, Cleveland, OH 44106, USA.
- Intelligent Medical Computing Laboratory, Faculty of Applied Sciences, Macao Polytechnic University, Rua de Luís Gonzaga Gomes, 999078, Macao Special Administrative Region of China. Electronic address: [email protected].
Abstract
Foundation models (FMs) in medical imaging are increasingly specialized across vertical domains, yet their substantial computational demands and large parameter scales hinder deployment on resource-limited edge devices. Retraining these models for new tasks requires scarce high-quality data and significant computational resources, while cross-model knowledge transfer often introduces notable information loss. To address model coordination, capability migration, knowledge preservation, and practical edge deployment, we introduce MultiMedDistill, an adaptive multi-teacher distillation framework that integrates multiple heterogeneous FMs into a single lightweight student model. A dual-level gating mechanism enables dynamic teacher coordination, and a return decoder preserves semantic fidelity during feature projection. Across six benchmark datasets spanning ultrasound, endoscopy, fundus imaging, CT, and MRI, MultiMedDistill achieves Dice scores of 94.77% on BUSI and 97.06% on Kvasir-SEG (improvements of 25.76% and 13.04% over baselines, respectively) while compressing the student model to 8.8M parameters, an 18× reduction. Ablation studies show that adaptive gating and reconstruction-based knowledge preservation contribute gains of 3.2% and 1.4%, respectively. These results demonstrate the framework's effectiveness in transferring FM capabilities with minimal computational cost, enabling practical deployment on clinical edge devices.
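The multi-teacher coordination described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the per-teacher gate weights, and the use of a temperature-scaled KL objective are all assumptions; in the actual framework the gate weights would come from a learned dual-level gating network rather than being supplied by hand.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over a list of raw logits.
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q) between two discrete probability distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def gated_distillation_loss(student_logits, teacher_logits_list,
                            gate_weights, temperature=2.0):
    # Gate-weighted sum of KL terms, each matching the student's softened
    # distribution to one teacher's. Hypothetical stand-in for the
    # framework's adaptive teacher-coordination objective.
    student_p = softmax(student_logits, temperature)
    loss = 0.0
    for weight, teacher_logits in zip(gate_weights, teacher_logits_list):
        teacher_p = softmax(teacher_logits, temperature)
        loss += weight * kl_divergence(teacher_p, student_p)
    return loss

# Two hypothetical teachers over a 3-class problem; the gate favors teacher 1.
teachers = [[3.0, 1.0, 0.2], [2.5, 1.5, 0.5]]
student = [2.0, 1.2, 0.4]
gates = [0.7, 0.3]  # assumed normalized gate outputs
print(gated_distillation_loss(student, teachers, gates))
```

With positive gate weights the loss is non-negative and reaches zero only when the student's softened distribution matches every gated teacher, which is the behavior an adaptive gating mechanism exploits when shifting weight toward the teacher most relevant to the current input.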