Redefining lightweight vision models for healthcare AI.

May 29, 2026

Authors

Lee L, Feng Z, Tan JH, Chng CL

Affiliations (2)

  • Data Science and Artificial Intelligence Lab, Singapore General Hospital, Singapore, Singapore.
  • Department of Endocrinology, Singapore General Hospital, Singapore, Singapore.

Abstract

Vision models for medical imaging often require tens of millions of parameters, raising the question of whether architectural efficiency can be achieved without sacrificing classification performance. We introduce MedLiT-seed (2.1 million parameters) and MedLiT-nano (0.75 million parameters), two ultra-lightweight vision transformers designed for efficient and scalable medical image analysis. MedLiT employs a streamlined Mixture-of-Experts (MoE) architecture with SwiGLU feedforward networks, grouped query attention, and depth-wise scaling. Models were pre-trained using masked autoencoding on ImageNet and MedMNIST, then fine-tuned on 12 MedMNIST 2D subsets. We evaluated performance across multiple configurations and compared against benchmark models including ResNet, MedViT, and AutoML systems. MedLiT-seed achieved the highest area under the curve (AUC) on 4 subsets and the second-highest on 2 others, outperforming models with 10-20× more parameters. MedLiT-nano achieved results comparable to, and in several subsets exceeding, ResNet-18 and AutoML baselines. Transfer learning from ImageNet significantly improved convergence and generalization. Increasing embedding size yielded greater performance gains than increasing expert count. MedLiT demonstrates that, at a scale on the order of 2M parameters, MoE-based token routing is a viable architectural pathway to competitive accuracy relative to floating-point operations (FLOPs) across diverse medical imaging modalities. These results suggest that selectively routing computation through specialised experts, rather than scaling model size, can serve as an effective design principle for more compact medical vision models. Such an architecture could be used in low-resource clinical environments and for scalable fine-tuning across diverse healthcare tasks, though limitations on multi-label tasks highlight clear directions for future architectural refinement.
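
To make the idea of routing tokens through SwiGLU experts concrete, the following is a minimal sketch in plain Python. It is not the MedLiT implementation: the abstract does not specify the routing rule, so top-1 routing is an assumption here, and the class names (`SwiGLUExpert`, `Top1MoE`), dimensions, and random initialisation are all hypothetical and chosen only for illustration.

```python
import math
import random


def silu(x):
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def rand_matrix(rows, cols, rng):
    """Small random weights; illustrative initialisation only."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class SwiGLUExpert:
    """Feedforward expert: W_down @ (silu(W_gate @ x) * (W_up @ x))."""

    def __init__(self, dim, hidden, rng):
        self.w_gate = rand_matrix(hidden, dim, rng)
        self.w_up = rand_matrix(hidden, dim, rng)
        self.w_down = rand_matrix(dim, hidden, rng)

    def __call__(self, x):
        gate = [silu(g) for g in matvec(self.w_gate, x)]
        up = matvec(self.w_up, x)
        return matvec(self.w_down, [g * u for g, u in zip(gate, up)])


class Top1MoE:
    """Sends each token to the one expert with the highest router score.

    Top-1 routing is an assumption of this sketch, not a detail from the paper.
    """

    def __init__(self, dim, hidden, num_experts, seed=0):
        rng = random.Random(seed)
        self.router = rand_matrix(num_experts, dim, rng)
        self.experts = [SwiGLUExpert(dim, hidden, rng) for _ in range(num_experts)]

    def route(self, token):
        # Softmax over router logits, then pick the most probable expert.
        logits = matvec(self.router, token)
        peak = max(logits)
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        best = max(range(len(probs)), key=probs.__getitem__)
        return best, probs[best]

    def __call__(self, tokens):
        outputs, choices = [], []
        for token in tokens:
            idx, weight = self.route(token)
            # Only the selected expert runs, so compute per token stays fixed
            # no matter how many experts the layer holds.
            expert_out = self.experts[idx](token)
            outputs.append([weight * v for v in expert_out])
            choices.append(idx)
        return outputs, choices


# Hypothetical sizes: 5 tokens of width 8, 4 experts with hidden width 16.
data_rng = random.Random(42)
tokens = [[data_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]
moe = Top1MoE(dim=8, hidden=16, num_experts=4)
outputs, choices = moe(tokens)
print(choices)  # index of the expert each token was routed to
```

In this sketch, adding experts increases the parameter count but not the work done per token, since each token passes through a single expert. That is the general sense in which routing decouples capacity from per-token compute; how MedLiT balances the two is described only at a high level in the abstract.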

Topics

Journal Article
