
Capsule-enhanced hierarchical vision transformers for rare disease classification from medical images.

June 20, 2026

Authors

Krishna ESP, Mamidisetti G, Vellela SS, Lella KK, Duggineni V, Balakrishna N

Affiliations (6)

  • GITAM School of Computer Science and Engineering, GITAM University- Bengaluru Campus, Bengaluru, India.
  • Department of CSE, St. Peter's Engineering College, Hyderabad, India.
  • Department of CSE - Data Science, Chalapathi Institute of Technology, Guntur, 522016, India.
  • Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, 576104, India.
  • Department of Computer Science and Engineering, Lakireddy Bali Reddy College of Engineering, Mylavaram, 521230, India.
  • Department of AI & ML, School of Computing, Mohan Babu University, Tirupati, India.

Abstract

Automated medical image analysis plays a vital role in rare disease detection, yet existing deep learning models often struggle with severe class imbalance, limited labeled data, and subtle morphological variations. To address these challenges, this paper proposes Swin-CapsuleNet, a hybrid architecture tailored for rare disease classification. It combines a hierarchical Swin Transformer backbone for multi-scale contextual feature extraction with a capsule-based classification head that preserves part-whole spatial relationships through dynamic routing. A class-balanced capsule loss is introduced to improve sensitivity toward under-represented disease categories. Extensive experiments on a multi-center rare disease dataset show that Swin-CapsuleNet consistently outperforms state-of-the-art CNN, transformer, and capsule-based baselines. The proposed model achieves 94.1% accuracy, a 93.2% F1-score, and an AUC of 0.972, while attaining a macro-F1 of 0.899 for rare disease classes. Ablation studies validate the complementary contributions of hierarchical attention, capsule representations, and the proposed loss function. Furthermore, computational analysis shows that Swin-CapsuleNet offers a favorable balance between performance and efficiency, supporting its applicability in real-world clinical decision-support systems.
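
The abstract does not give the form of the class-balanced capsule loss, so the following is only an illustrative sketch of what such a loss could look like, not the authors' formulation. It assumes the standard capsule margin loss (target capsule length pushed above `m_pos`, other capsule lengths pushed below `m_neg`) scaled by a per-class weight derived from the "effective number of samples" heuristic. All names, the weighting scheme, and the default values of `beta`, `m_pos`, `m_neg`, and `lam` are assumptions made for illustration.

```python
import math


def effective_number_weights(class_counts, beta=0.999):
    """Per-class weights (1 - beta) / (1 - beta**n), rescaled to have mean 1.

    Assumption: the abstract does not state how classes are re-weighted;
    this is one common heuristic that gives rarer classes larger weights.
    """
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in class_counts]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


def capsule_lengths(capsules):
    """Euclidean length of each class capsule's output vector."""
    return [math.sqrt(sum(x * x for x in vec)) for vec in capsules]


def class_balanced_margin_loss(lengths, target, weights,
                               m_pos=0.9, m_neg=0.1, lam=0.5):
    """Margin loss for one sample, scaled by the weight of its true class.

    lengths: capsule lengths, one per class (each expected to lie in [0, 1])
    target:  index of the ground-truth class
    weights: per-class weights, e.g. from effective_number_weights
    """
    total = 0.0
    for k, length in enumerate(lengths):
        if k == target:
            # Penalise a target capsule that is shorter than m_pos.
            total += max(0.0, m_pos - length) ** 2
        else:
            # Penalise a non-target capsule that is longer than m_neg.
            total += lam * max(0.0, length - m_neg) ** 2
    return weights[target] * total


if __name__ == "__main__":
    # Hypothetical two-class setting: 1000 common vs. 10 rare training images.
    weights = effective_number_weights([1000, 10])
    lengths = capsule_lengths([[0.3, 0.4], [0.3, 0.4]])  # both capsules equally long
    print("loss if true class is common:", class_balanced_margin_loss(lengths, 0, weights))
    print("loss if true class is rare:  ", class_balanced_margin_loss(lengths, 1, weights))
```

Under these assumptions, an equally uncertain prediction costs more when the true class is rare, which is the general mechanism by which a re-weighted loss can raise sensitivity on under-represented categories.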

Topics

Journal Article
