Benchmarking knowledge distillation for lightweight pneumonia detection: a multi-seed calibration study on PneumoniaMNIST.
Affiliations
- Independent Researcher, Singapore, Singapore. [email protected].
Abstract
This study benchmarks a very small convolutional neural network trained with and without knowledge distillation for pneumonia detection on PneumoniaMNIST, with emphasis on discrimination, calibration, efficiency, and external generalization. Using the predefined MedMNIST v2 splits and 5 random seeds, a ResNet-18 teacher achieved an area under the receiver operating characteristic curve (AUROC) of 0.9735 (95% confidence interval 0.9687-0.9782). A 60,642-parameter TinyCNN retained about 90% of the teacher's discrimination (AUROC 0.8818, 95% CI 0.8634-0.9001) while reducing parameters 184-fold, multiply-accumulate operations 10.9-fold, and batch-size-1 CPU inference latency 3.7-fold. Knowledge distillation did not yield consistent discrimination gains over vanilla training on this benchmark. The teacher was less well calibrated before temperature scaling, which improved its negative log-likelihood, Brier score, and expected calibration error. Zero-shot external evaluation on the Kermany dataset and a balanced RSNA subset showed domain-shift degradation for all models, with partial recovery after brief fine-tuning but persistent teacher-student gaps.
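For readers unfamiliar with the distillation setup being benchmarked, the sketch below shows the standard Hinton-style soft-target loss in PyTorch. This is a minimal illustration of the technique, not the paper's training code; the temperature `T=4.0` and mixing weight `alpha=0.7` are assumed placeholder values, not the reported hyperparameters.

```python
# Minimal sketch of a Hinton-style knowledge-distillation loss (assumed setup;
# T and alpha are illustrative, not the paper's reported hyperparameters).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.7):
    """Blend soft-target KL divergence against the teacher with hard-label CE."""
    # Soften both distributions with temperature T; the T^2 factor restores
    # gradient magnitude after softening (Hinton et al., 2015).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard
```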
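Temperature scaling, the post-hoc calibration step reported to improve the teacher's negative log-likelihood, Brier score, and expected calibration error, fits a single scalar on held-out validation logits. The following is a minimal sketch assuming cached PyTorch logits and labels; the function name and L-BFGS settings are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of post-hoc temperature scaling (assumed setup; optimizer
# settings are illustrative). Fit T on validation data, then divide test
# logits by T before applying softmax.
import torch
import torch.nn.functional as F

def fit_temperature(val_logits, val_labels, max_iter=50):
    """Fit a single scalar T > 0 by minimizing validation NLL."""
    log_t = torch.zeros(1, requires_grad=True)  # optimize log T to keep T positive
    opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=max_iter)

    def closure():
        opt.zero_grad()
        loss = F.cross_entropy(val_logits / log_t.exp(), val_labels)
        loss.backward()
        return loss

    opt.step(closure)
    return log_t.exp().item()
```

Because the scaling is a monotone transform of the logits, it leaves AUROC unchanged while reshaping the predicted probabilities, which is consistent with the abstract's finding that calibration improved without altering the discrimination ranking.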