LTM-UNet: Linear Transformer-Mamba with Attention-Based U-Net for Context-Aware Breast Ultrasound Image Segmentation.
Authors
Affiliations (2)
- Department of Information Technology, Atal Bihari Vajpayee Indian Institute of Information Technology and Management, Gwalior 474015, MP, India.
- Centre for Biomedical Research, Atal Bihari Vajpayee Indian Institute of Information Technology and Management, Gwalior 474015, MP, India.
Abstract
<b>Background/Objectives</b>: Accurate breast lesion segmentation with deep learning models requires a precise understanding of both global context and fine lesion structure, which remains a challenge for existing convolutional and transformer-based approaches. This study aims to address these limitations by proposing a new segmentation model designed to improve context-aware dense segmentation of ultrasound images. <b>Method</b>: We propose LTM-UNet, a novel segmentation method that integrates transformer-based encoding with state-space-driven decoding in a U-Net-style framework. The architecture uses an efficient vision transformer encoder to extract multi-scale global representations. These features are refined through an attention-guided skip-fusion mechanism that incorporates spatial-channel attention, preserving finer spatial details and thereby minimizing the semantic gap between encoder and decoder features. Additionally, a direction-aware decoder based on a state-space model is introduced to efficiently capture long-range dependencies and enhance the reconstruction of relevant features. <b>Results</b>: Extensive experiments on benchmark ultrasound imaging datasets demonstrate the effectiveness of the proposed method. The model achieves Dice score coefficients of 82.41% on the BUSI dataset and 86.62% on Dataset B (UDIAT), outperforming several existing segmentation approaches in both Dice score coefficient and Intersection-over-Union (IoU). <b>Conclusions</b>: The integration of efficient transformer-based global feature extraction, attention-enhanced feature fusion, and state-space-driven decoding enables LTM-UNet to capture both structural details and contextual information, resulting in superior segmentation performance compared to existing methods.
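For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how the Dice score coefficient and IoU are commonly computed for a single pair of binary masks. It is not taken from the paper: the function name `dice_and_iou`, the smoothing term `eps`, and the convention that two empty masks score 1.0 are assumptions made for illustration, and the abstract does not say how the authors aggregate scores across images.

```python
from typing import Sequence, Tuple


def dice_and_iou(
    pred: Sequence[Sequence[int]],
    target: Sequence[Sequence[int]],
    eps: float = 1e-7,  # assumed smoothing term; avoids division by zero
) -> Tuple[float, float]:
    """Return (Dice, IoU) for two binary masks of identical shape.

    Hypothetical helper for illustration only; any nonzero pixel
    is treated as foreground.
    """
    if len(pred) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, target)
    ):
        raise ValueError("pred and target must have the same shape")

    intersection = 0  # pixels that are foreground in both masks
    pred_sum = 0      # foreground pixels in the prediction
    target_sum = 0    # foreground pixels in the ground truth
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p_bit, t_bit = int(bool(p)), int(bool(t))
            intersection += p_bit & t_bit
            pred_sum += p_bit
            target_sum += t_bit

    union = pred_sum + target_sum - intersection
    dice = (2 * intersection + eps) / (pred_sum + target_sum + eps)
    iou = (intersection + eps) / (union + eps)
    return dice, iou


if __name__ == "__main__":
    predicted = [[0, 1, 1],
                 [0, 1, 0]]
    ground_truth = [[0, 1, 0],
                    [0, 1, 1]]
    dice, iou = dice_and_iou(predicted, ground_truth)
    print(f"Dice = {dice:.4f}, IoU = {iou:.4f}")
```

For a single mask pair the two metrics are related by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU. This relation does not carry over exactly to averages taken over a dataset.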