Multi-OCT-SelfNet: integrating self-supervised learning with multi-source data fusion for enhanced multi-class retinal disease classification.
Authors
Affiliations (3)
- Department of Electrical and Computer Engineering, University of North Carolina at Charlotte, Charlotte, NC, United States.
- University of Illinois at Chicago, Chicago, IL, United States.
- Stanford University School of Medicine, Stanford, CA, United States.
Abstract
Acquiring large and diverse medical imaging datasets remains challenging because of privacy constraints, annotation cost, and institutional variability. This limitation can reduce the generalization ability of deep learning models, particularly when they are trained on small or domain-specific retinal datasets. To address this issue, we propose Multi-OCT-SelfNet, a self-supervised framework based on a SwinV2 transformer backbone for multi-class retinal disease classification from optical coherence tomography (OCT) images. The framework combines multi-source OCT datasets during masked autoencoder-based self-supervised pre-training to learn transferable image representations, followed by supervised fine-tuning on individual downstream datasets. We evaluated Multi-OCT-SelfNet on three benchmark OCT datasets (DS1, DS2, and DS3) and compared its performance with two baselines: ResNet-50 and traditional SwinV2 trained without the proposed self-supervised multi-source pre-training strategy. In on-domain evaluation, Multi-OCT-SelfNet-SwinV2 achieved AUC-ROC scores of 0.97 on DS1, 0.97 on DS2, and 0.89 on DS3, demonstrating competitive or improved performance compared with both baselines. The advantage of the proposed framework was more evident in cross-dataset evaluation, especially for smaller datasets. When trained on DS2 and tested on DS3, Multi-OCT-SelfNet-SwinV2 improved AUC-ROC from 0.59 with ResNet-50 and 0.61 with traditional SwinV2 to 0.90. Similarly, when trained on DS3 and tested on DS2, the proposed model achieved an AUC-ROC of 0.94, compared with 0.60 for ResNet-50 and 0.81 for traditional SwinV2. Under limited-data settings using only 50% of the training samples, Multi-OCT-SelfNet-SwinV2 was more robust than ResNet-50, achieving an AUC-ROC of 0.77 on DS2 compared with 0.68 for ResNet-50, and 0.76 on DS3 compared with 0.49 for ResNet-50.
Ablation analyses further showed that multi-source data fusion and self-supervised pre-training substantially improved generalization, particularly for DS2 and DS3. Statistical evaluation using the Wilcoxon signed-rank test also supported the consistency of the proposed model's improvements across paired train-test settings. These findings suggest that Multi-OCT-SelfNet-SwinV2 can learn more transferable OCT representations than conventional supervised baselines, making it a promising approach for robust AI-assisted retinal disease classification under data-limited and domain-shifted clinical conditions.
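As an illustration of the paired comparison mentioned above, the following is a minimal Python sketch of an exact Wilcoxon signed-rank test, applied only to the four Multi-OCT-SelfNet-SwinV2 versus ResNet-50 AUC-ROC pairs quoted in this abstract (two cross-dataset settings and two 50%-data settings). The function name `wilcoxon_signed_rank`, the choice of pairs, and the exact-enumeration approach are assumptions made for illustration; the abstract does not specify which train-test pairs or which implementation the authors used, so this does not reproduce their reported statistics.

```python
from itertools import product


def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed-rank test for small paired samples.

    Hypothetical helper for illustration. Zero differences are dropped and
    tied absolute differences receive average ranks.
    Returns (w_plus, w_minus, p_value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences (1-based), averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # Sum the ranks of positive and negative differences separately.
    w_plus = sum((r for r, d in zip(ranks, diffs) if d > 0), 0.0)
    w_minus = sum((r for r, d in zip(ranks, diffs) if d < 0), 0.0)
    observed = min(w_plus, w_minus)

    # Under the null hypothesis every sign pattern is equally likely,
    # so enumerate all 2**n patterns and count those at least as extreme.
    total = sum(ranks)
    hits = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= observed:
            hits += 1
    return w_plus, w_minus, hits / 2 ** n


# AUC-ROC pairs quoted in the abstract (proposed model vs. ResNet-50):
# DS2->DS3, DS3->DS2, 50% data on DS2, 50% data on DS3.
proposed = [0.90, 0.94, 0.77, 0.76]
resnet50 = [0.59, 0.60, 0.68, 0.49]

w_plus, w_minus, p_value = wilcoxon_signed_rank(proposed, resnet50)
print(w_plus, w_minus, p_value)
```

With only four pairs, the smallest attainable two-sided p-value is 2/16 = 0.125 even when every difference favors the proposed model, so a meaningful test needs the larger set of paired train-test settings that the full study presumably evaluates.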