
Adaptive multi-teacher knowledge distillation framework with foundation models for medical image analysis.

March 6, 2026 · PubMed

Authors

Liu D, Gao Y, Zhang N, Wang X, Zhang T, Fan M, Sun Y, Li S, Tan T

Affiliations (6)

  • Intelligent Medical Computing Laboratory, Faculty of Applied Sciences, Macao Polytechnic University, Rua de Luís Gonzaga Gomes, 999078, Macao Special Administrative Region of China.
  • Netherlands Cancer Institute, Plesmanlaan 121, Amsterdam, 1066 CX, Netherlands.
  • Radboud University, Comeniuslaan 4, Nijmegen, 6525 HP, The Netherlands.
  • Hangzhou Dianzi University, No. 1158, 2nd Street, Hangzhou, 310018, China.
  • Case Western Reserve University, 10900 Euclid Avenue, Cleveland, OH 44106, USA.
  • Intelligent Medical Computing Laboratory, Faculty of Applied Sciences, Macao Polytechnic University, Rua de Luís Gonzaga Gomes, 999078, Macao Special Administrative Region of China. Electronic address: [email protected].

Abstract

Foundation models (FMs) in medical imaging are increasingly specialized across vertical domains, yet their substantial computational demands and large parameter scales hinder deployment on resource-limited edge devices. Retraining these models for new tasks requires scarce high-quality data and significant computational resources, while cross-model knowledge transfer often introduces notable information loss. To address model coordination, capability migration, knowledge preservation, and practical edge deployment, we introduce MultiMedDistill, an adaptive multi-teacher distillation framework that integrates multiple heterogeneous FMs into a single lightweight student model. A dual-level gating mechanism enables dynamic teacher coordination, and a return decoder preserves semantic fidelity during feature projection. Across six benchmark datasets spanning ultrasound, endoscopy, fundus imaging, CT, and MRI, MultiMedDistill achieves 94.77% and 97.06% Dice on BUSI and Kvasir-SEG (improvements of 25.76% and 13.04% over baselines) while compressing the student model to 8.8M parameters (18× reduction). Ablation studies show that adaptive gating and reconstruction-based knowledge preservation contribute gains of 3.2% and 1.4%, respectively. These results demonstrate the framework's effectiveness in transferring FM capabilities with minimal computational cost, enabling practical deployment on clinical edge devices.
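The core idea in the abstract, adaptively weighting several teacher models' predictions before distilling them into a lightweight student, can be sketched in a few lines. The following is an illustrative minimal sketch only, not the paper's implementation: the function names, the simple softmax gate over scalar gate scores, and the KL-divergence distillation loss are all assumptions standing in for the paper's dual-level gating and full training objective.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def gated_teacher_target(teacher_logits, gate_scores):
    """Blend per-teacher class distributions using adaptive gate weights.

    teacher_logits: list of per-teacher logit vectors (same length each).
    gate_scores:    one scalar score per teacher; softmax turns them
                    into mixture weights (the 'gating' step).
    Returns the gated probability distribution the student is trained toward.
    """
    weights = softmax(gate_scores)
    probs = [softmax(t) for t in teacher_logits]
    num_classes = len(teacher_logits[0])
    return [sum(w * p[c] for w, p in zip(weights, probs))
            for c in range(num_classes)]

def kd_loss(student_logits, target_probs, temperature=2.0):
    """KL divergence from the gated teacher target to the
    temperature-softened student distribution."""
    student_probs = softmax([z / temperature for z in student_logits])
    return sum(t * math.log(t / s)
               for t, s in zip(target_probs, student_probs) if t > 0)
```

With two teachers that disagree symmetrically and equal gate scores, the gated target is uniform, and a student whose softened output matches the target incurs zero distillation loss; raising one teacher's gate score shifts the target toward that teacher, which is the behavior adaptive gating is meant to provide.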

Topics

Journal Article
