A multi-scale attention-based Swin transformer model for medical images segmentation.

November 6, 2025


DOI: 10.1038/s41598-025-22649-0 PMID: 41198764

Authors

Mirab Golkhatmi B, Houshmand M, Hosseini SA

Affiliations (2)

Department of Computer Engineering, Ma.C., Islamic Azad University, Mashhad, Iran.
Department of Electrical Engineering, Ma.C., Islamic Azad University, Mashhad, Iran.

Abstract

Medical image segmentation is crucial for accurately diagnosing diseases and helps physicians examine relevant regions. There is therefore a pressing need for an artificial intelligence-based model that can facilitate the diagnostic process and reduce errors. Existing networks often have large parameter counts, high computational cost (measured in GFLOPs), and low accuracy. This research addresses that gap by proposing a transformer-based architecture, with the overarching goal of maintaining high segmentation accuracy while reducing the number of network parameters and minimizing computational cost. We use a Swin transformer as the encoder to improve segmentation accuracy through deep feature extraction; by employing shifted windows and an attention mechanism, the encoder extracts key image features more accurately. In the decoder, we designed the dynamic feature fusion block (DFFB) to enhance the extracted features and enable multi-scale feature extraction. This allows the model to analyze the structural information of medical images at several levels, improving performance when segmenting complex regions. We also employed the dynamic attention enhancement block to further refine the DFFB output. This block uses spatial and channel attention mechanisms to emphasize key areas in the images, thereby improving the model's overall accuracy. The proposed model was evaluated on three medical datasets, obtaining a mean intersection over union (mIoU) of 0.9125 and a Dice score of 0.9542 on GlaS, an mIoU of 0.9174 and a Dice score of 0.9569 on PH2, and an mIoU of 0.9085 and a Dice score of 0.9521 on Kvasir-SEG. The experiments show that the proposed model outperforms previous methods, demonstrating its potential as an effective tool for medical image segmentation.
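
For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal sketch of how IoU and the Dice score are commonly computed for a pair of binary masks. It is not the authors' evaluation code: the function names, the toy masks, and the convention of returning 1.0 for two empty masks are assumptions made for illustration, and the abstract does not say how scores were averaged across images. For a single pair of binary masks the two metrics are linked by Dice = 2·IoU / (1 + IoU), and the reported pairs happen to be consistent with that identity.

```python
def confusion_counts(pred, target):
    """Count true positives, false positives, and false negatives
    for two binary masks given as equally sized lists of rows."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def iou(pred, target):
    """Intersection over union: TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else tp / denom


def dice(pred, target):
    """Dice score: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else 2 * tp / denom


def dice_from_iou(j):
    """Per-mask identity linking the two metrics."""
    return 2 * j / (1 + j)


# Hypothetical 3x4 masks, for illustration only.
pred = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
target = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

print(f"IoU  = {iou(pred, target):.4f}")
print(f"Dice = {dice(pred, target):.4f}")

# mIoU values quoted in the abstract, mapped through the identity.
for name, miou in [("GlaS", 0.9125), ("PH2", 0.9174), ("Kvasir-SEG", 0.9085)]:
    print(f"{name}: mIoU {miou} -> Dice {dice_from_iou(miou):.4f}")
```

Because Dice counts the overlap twice, it is never lower than IoU on the same masks, which is why the Dice figures in the abstract sit above the corresponding mIoU figures.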


Topics

Image Processing, Computer-Assisted; Image Interpretation, Computer-Assisted; Diagnostic Imaging; Journal Article
