Are Vision-xLSTM-embedded U-Nets better at segmenting medical images?

August 5, 2025

DOI: 10.1016/j.neunet.2025.107925; PMID: 40773779

Authors

Dutta P, Bose S, Roy SK, Mitra S

Affiliations (4)

Machine Intelligence Unit, Indian Statistical Institute, 203, B.T. Road, Kolkata, 700108, West Bengal, India. Electronic address: [email protected].
Department of Computer Science and Engineering, Jadavpur University, 188, Raja Subodh Chandra Mallick Rd, Kolkata, 700032, West Bengal, India.
Department of Computer Science and Engineering, Alipurduar Government Engineering and Management College, Alipurduar, 736206, West Bengal, India.
Machine Intelligence Unit, Indian Statistical Institute, 203, B.T. Road, Kolkata, 700108, West Bengal, India.

Abstract

Efficient segmentation strategies for medical images have evolved from an initial dependence on Convolutional Neural Networks (CNNs) to the current investigation of hybrid models that combine CNNs with Vision Transformers (ViTs). There is an increasing focus on architectures that are both high-performing and computationally efficient, so that they can be deployed on remote systems with limited resources. Although transformers can capture global dependencies in the input space, they incur high computational and storage costs. This research proposes Vision Extended Long Short-Term Memory (Vision-xLSTM) as an appropriate backbone for medical image segmentation, offering excellent performance at reduced computational cost. The study investigates the integration of CNNs with Vision-xLSTM by introducing the novel U-VixLSTM. The Vision-xLSTM blocks capture the temporal and global relationships within the patches extracted from the CNN feature maps. The convolutional feature reconstruction path then upsamples the output volume from the Vision-xLSTM blocks to produce the segmentation output. U-VixLSTM outperforms state-of-the-art networks on the publicly available Synapse, ISIC, and ACDC datasets. The findings suggest that U-VixLSTM is a promising alternative to ViTs for medical image segmentation, delivering effective performance without substantial computational burden. This makes it feasible to deploy in healthcare environments with limited resources for faster diagnosis. Code provided: https://github.com/duttapallabi2907/U-VixLSTM.
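
The abstract describes patches being extracted from CNN feature maps and passed to the Vision-xLSTM blocks as a sequence. The following is a minimal sketch of that patch-extraction step only, not the authors' implementation (see the linked repository for that). It assumes a single-channel 2D feature map and non-overlapping square patches; the function names `extract_patches` and `merge_patches`, the 4 x 4 map, and the patch size of 2 are all hypothetical choices made for illustration. The Vision-xLSTM blocks and the convolutional reconstruction path are omitted.

```python
from typing import List

Grid = List[List[float]]


def extract_patches(feature_map: Grid, patch: int) -> List[List[float]]:
    """Split an H x W map into non-overlapping patch x patch tiles.

    Each tile is flattened in row-major order into one token, and tokens
    are ordered left to right, top to bottom (an illustrative assumption).
    """
    height, width = len(feature_map), len(feature_map[0])
    if height % patch or width % patch:
        raise ValueError("map size must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append([
                feature_map[row][col]
                for row in range(top, top + patch)
                for col in range(left, left + patch)
            ])
    return tokens


def merge_patches(tokens: List[List[float]], height: int, width: int,
                  patch: int) -> Grid:
    """Inverse of extract_patches: place each token back on the grid."""
    grid = [[0.0] * width for _ in range(height)]
    patches_per_row = width // patch
    for index, token in enumerate(tokens):
        top = (index // patches_per_row) * patch
        left = (index % patches_per_row) * patch
        for offset, value in enumerate(token):
            grid[top + offset // patch][left + offset % patch] = value
    return grid


# Toy 4 x 4 "feature map" whose entries are 0.0 .. 15.0 in row-major order.
feature_map = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
tokens = extract_patches(feature_map, patch=2)
restored = merge_patches(tokens, height=4, width=4, patch=2)
```

Here `tokens` is the sequence a sequence model would consume, and `merge_patches` shows how a processed sequence can be laid back onto a spatial grid before any upsampling.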


Topics

Journal Article
