MAYOCTransformer: Masked-Attention for Yielding Comprehensive Semantic Segmentation of Retinal Optical Coherence Tomography Images using Transformer-based Neural Networks

December 29, 2025

preprint

DOI: 10.1101/2025.07.08.663601

Authors

Ye, R. Z., Krivit, J., Reiter, G., Iezzi, R.

Affiliations (1)

Department of Ophthalmology, Mayo Clinic

Abstract

Purpose: Optical coherence tomography (OCT) is a widely used imaging modality in ophthalmology. Accurate semantic segmentation of these images is critical for both clinical and research applications, yet existing convolutional neural network (CNN)-based methods face challenges in generalizability and robustness. This study introduces MAYOCTransformer, the first transformer-based deep learning model for comprehensive semantic segmentation of OCT images, and evaluates its performance against CNN-based models.

Methods: A large dataset of 3,500 OCT images was manually segmented using an iterative deep learning-assisted workflow. The MAYOCTransformer model, based on the Mask2Former architecture, was trained and compared against CNN-based segmentation models, including U-Net, U-Net++, FPN, and DeepLabV3+. Comprehensive segmentation tasks included segmentation of 10 retinal layers, segmentation of choroid stroma and vessels, and identification of 9 types of discrete pathological findings, including intraretinal fluid (IRF), subretinal fluid (SRF), pigment epithelial detachment (PED), subretinal hyper-reflective material (SHRM), intraretinal hyper-reflective foci, and reticular pseudodrusen. Model performance was evaluated using the Dice similarity coefficient (DSC) on a hold-out test set with five-fold cross-validation. Additional validation was performed using external datasets, open-source segmentation models, and a randomized blinded expert evaluation.

Results: MAYOCTransformer outperformed CNN-based models in most segmentation tasks. Choroid segmentation performance was comparable between MAYOCTransformer and CNN models. External validation demonstrated the model's generalizability, achieving higher DSC scores than publicly available segmentation models. A blinded expert evaluation showed that MAYOCTransformer's segmentation was non-inferior to manual annotations.

Conclusion: MAYOCTransformer provides improved segmentation performance over CNN-based models. Its ability to generalize to external datasets suggests potential applicability in clinical and research settings.
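
For readers unfamiliar with the evaluation metric named in the abstract, the sketch below shows how a per-class Dice similarity coefficient can be computed from a predicted and a reference label map, using DSC = 2|P ∩ T| / (|P| + |T|). It is an illustration only, not the authors' evaluation code: the function name `dice_per_class`, the flattened-list input format, and the toy class labels are all assumptions made for this example.

```python
from typing import Dict, Sequence


def dice_per_class(
    pred: Sequence[int], truth: Sequence[int], num_classes: int
) -> Dict[int, float]:
    """Dice similarity coefficient for each class label.

    pred and truth are flattened label maps of equal length, holding one
    integer class label per pixel (hypothetical input format).
    Classes absent from both maps are skipped, since DSC is undefined there.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same number of pixels")

    scores: Dict[int, float] = {}
    for c in range(num_classes):
        pred_count = sum(1 for x in pred if x == c)    # |P|
        truth_count = sum(1 for x in truth if x == c)  # |T|
        overlap = sum(1 for a, b in zip(pred, truth) if a == c and b == c)
        if pred_count + truth_count == 0:
            continue  # class not present in either map
        scores[c] = 2.0 * overlap / (pred_count + truth_count)
    return scores


# Toy flattened label maps; 0 = background, 1 and 2 are hypothetical classes.
truth = [0, 0, 1, 1, 1, 2, 2, 0]
pred = [0, 1, 1, 1, 2, 2, 2, 0]
scores = dice_per_class(pred, truth, num_classes=3)
for label, dsc in scores.items():
    print(f"class {label}: DSC = {dsc:.3f}")
```

A score of 1.0 means the predicted and reference masks for that class coincide exactly, and 0.0 means they do not overlap. How per-class scores are aggregated across classes, images, and folds is a separate choice that the abstract does not specify.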

Topics

bioinformatics
