Redefining lightweight vision models for healthcare AI.
Authors
Affiliations (2)
- Data Science and Artificial Intelligence Lab, Singapore General Hospital, Singapore, Singapore.
- Department of Endocrinology, Singapore General Hospital, Singapore, Singapore.
Abstract
Vision models for medical imaging often require tens of millions of parameters, raising the question of whether architectural efficiency can be achieved without sacrificing classification performance. We introduce MedLiT-seed (2.1 million parameters) and MedLiT-nano (0.75 million parameters), two ultra-lightweight vision transformers designed for efficient and scalable medical image analysis. MedLiT employs a streamlined Mixture-of-Experts (MoE) architecture with SwiGLU feedforward networks, grouped-query attention, and depth-wise scaling. Models were pre-trained using masked autoencoding on ImageNet and MedMNIST, then fine-tuned on 12 MedMNIST 2D subsets. We evaluated performance across multiple configurations and compared against benchmark models including ResNet, MedViT, and AutoML systems. MedLiT-seed achieved the highest area under the curve (AUC) on 4 subsets and the second-highest on 2 others, outperforming models with 10-20× more parameters. MedLiT-nano achieved results comparable to, and in several subsets exceeding, those of ResNet-18 and AutoML baselines. Transfer learning from ImageNet significantly improved convergence and generalization. Increasing embedding size yielded greater performance gains than increasing expert count. MedLiT demonstrates that MoE-based token routing is a viable architectural pathway to competitive accuracy relative to floating-point operations (FLOPs) across diverse medical imaging modalities, at a scale on the order of 2M parameters. These results suggest that selectively routing computation through specialised experts, rather than scaling model size, can serve as an effective design principle for more compact medical vision models. Such an architecture can be used in low-resource clinical environments and for scalable fine-tuning across diverse healthcare tasks, though limitations on multi-label tasks highlight clear directions for future architectural refinement.
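To make the idea of routing tokens through SwiGLU experts concrete, the following is a minimal pure-Python sketch of a single MoE feedforward layer. It is an illustration only and not the MedLiT implementation: the abstract does not state the routing scheme, so top-1 routing is assumed here, and the class names (`SwiGLUExpert`, `MoELayer`), the dimensions, and the initialisation are all hypothetical. Attention, normalisation, and training are omitted.

```python
import math
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def silu(z):
    """SiLU (swish) activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

class SwiGLUExpert:
    """SwiGLU feedforward block: W_down (silu(W_gate x) * (W_up x))."""
    def __init__(self, dim, hidden, rng):
        self.w_gate = rand_matrix(hidden, dim, rng)
        self.w_up = rand_matrix(hidden, dim, rng)
        self.w_down = rand_matrix(dim, hidden, rng)

    def __call__(self, x):
        gate = [silu(g) for g in matvec(self.w_gate, x)]
        up = matvec(self.w_up, x)
        return matvec(self.w_down, [g * u for g, u in zip(gate, up)])

    def num_params(self):
        mats = (self.w_gate, self.w_up, self.w_down)
        return sum(len(W) * len(W[0]) for W in mats)

class MoELayer:
    """Top-1 token routing over SwiGLU experts (assumed scheme, hypothetical sizes)."""
    def __init__(self, dim, hidden, num_experts, seed=0):
        rng = random.Random(seed)
        self.router = rand_matrix(num_experts, dim, rng)
        self.experts = [SwiGLUExpert(dim, hidden, rng) for _ in range(num_experts)]

    def route(self, x):
        # Softmax over router logits, then pick the most probable expert.
        logits = matvec(self.router, x)
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        best = max(range(len(probs)), key=probs.__getitem__)
        return best, probs[best]

    def __call__(self, tokens):
        out = []
        for x in tokens:
            idx, weight = self.route(x)
            # Only the selected expert runs for this token.
            y = self.experts[idx](x)
            out.append([weight * v for v in y])
        return out

layer = MoELayer(dim=8, hidden=16, num_experts=4)
rng = random.Random(1)
tokens = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]
outputs = layer(tokens)

total_expert_params = sum(e.num_params() for e in layer.experts)
active_expert_params = layer.experts[0].num_params()
print(len(outputs), len(outputs[0]))              # 5 8
print(total_expert_params, active_expert_params)  # 1536 384
```

In this toy configuration each token passes through one of four experts, so the expert weights actually used per token (384) are a quarter of the expert weights stored (1536). This is the general sense in which routing decouples per-token compute from total parameter count; the numbers say nothing about MedLiT's actual configuration.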