
Adaptive distribution-aware transformer for multi-scale visual representation learning on imbalanced and low-resolution data.

June 17, 2026

Authors

Ahammed S, Cui X, Lu W, Yap MH

Affiliations (4)

  • Department of Computing and Mathematics, Manchester Metropolitan University, The Dalton Building, Chester Street, Manchester, M1 5GD, UK. Electronic address: [email protected].
  • Department of Computing and Mathematics, Manchester Metropolitan University, The Dalton Building, Chester Street, Manchester, M1 5GD, UK. Electronic address: [email protected].
  • Department of Computing and Mathematics, Manchester Metropolitan University, The Dalton Building, Chester Street, Manchester, M1 5GD, UK. Electronic address: [email protected].
  • Department of Computing and Mathematics, Manchester Metropolitan University, The Dalton Building, Chester Street, Manchester, M1 5GD, UK. Electronic address: [email protected].

Abstract

Deep learning models often struggle with class imbalance and low-resolution medical images, where critical spatial details and minority-class features are underrepresented. We introduce the Adaptive Distribution-aware Vision Transformer (AdaptiveViT), a novel hybrid CNN-Transformer architecture that unifies fine-grained local feature extraction with global contextual modelling. AdaptiveViT incorporates a distribution-aware modulation mechanism that adaptively adjusts feature emphasis according to the severity of class imbalance. In addition, a Distribution-aware Adaptive (DA) Loss incorporates the dataset imbalance ratio into an adaptive focusing scheme, enhancing minority-class sensitivity. Experiments on five skin lesion datasets with varying image resolutions and imbalance ratios (as high as 1:10 for melanoma versus non-melanoma) demonstrate that AdaptiveViT consistently outperforms state-of-the-art CNN, Transformer, and hybrid baselines in F1 and AUC, while maintaining stable convergence across imbalance levels. Validation on gastrointestinal endoscopy datasets, which share similar imbalance characteristics, further demonstrates AdaptiveViT's domain-agnostic generalisation beyond skin lesion data. All experiments are conducted using patient-disjoint splits, with a threshold-free evaluation protocol to ensure fair, unbiased, and clinically reliable comparisons. Overall, AdaptiveViT establishes a hybrid framework for medical image classification under class imbalance and image-resolution variability. The code is available at https://github.com/mmu-dermatology-research/AdaptiveViT.
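The abstract does not give the formula for the DA Loss, so the following is only a minimal sketch of the general idea it describes: a focal-style binary loss whose focusing exponent depends on the dataset imbalance ratio. Everything specific here is an assumption made for illustration and is not taken from the paper or its repository: the function names `adaptive_gamma` and `da_style_focal_loss`, the logarithmic scaling rule, the `base_gamma` default, and the class weight `alpha_pos`.

```python
import math


def adaptive_gamma(imbalance_ratio, base_gamma=2.0):
    """Hypothetical rule: the focusing exponent grows with log(imbalance ratio).

    imbalance_ratio is majority count / minority count, so it must be >= 1.
    This scaling is an illustrative assumption, not the paper's definition.
    """
    if imbalance_ratio < 1.0:
        raise ValueError("imbalance_ratio must be >= 1 (majority / minority)")
    return base_gamma * (1.0 + math.log(imbalance_ratio))


def da_style_focal_loss(probs, labels, imbalance_ratio, base_gamma=2.0, eps=1e-7):
    """Mean focal-style binary loss with an imbalance-dependent exponent.

    probs  : predicted probabilities of the minority (positive) class
    labels : 1 for the minority class, 0 for the majority class
    """
    if len(probs) == 0 or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and equal in length")
    gamma = adaptive_gamma(imbalance_ratio, base_gamma)
    # Assumed class weighting: the minority class gets the larger share.
    alpha_pos = imbalance_ratio / (1.0 + imbalance_ratio)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)      # clamp to keep log() finite
        p_t = p if y == 1 else 1.0 - p       # probability assigned to the true class
        alpha_t = alpha_pos if y == 1 else 1.0 - alpha_pos
        # Focal term (1 - p_t)^gamma down-weights examples that are already easy.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)
```

With the 1:10 melanoma versus non-melanoma ratio quoted above, a call would pass `imbalance_ratio=10.0`. Under these assumed rules, a missed minority-class example is penalised more heavily than an equally confident mistake on the majority class, which is one plausible reading of "enhancing minority-class sensitivity".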

Topics

Journal Article
