
LTM-UNet: Linear Transformer-Mamba with Attention-Based U-Net for Context-Aware Breast Ultrasound Image Segmentation.

June 17, 2026 · PubMed

Authors

Kushwah SS, Chouhan SP, Punn NS, Bhattacharya M

Affiliations (2)

  • Department of Information Technology, Atal Bihari Vajpayee Indian Institute of Information Technology and Management, Gwalior 474015, MP, India.
  • Centre for Biomedical Research, Atal Bihari Vajpayee Indian Institute of Information Technology and Management, Gwalior 474015, MP, India.

Abstract

<b>Background/Objectives</b>: Accurate breast lesion segmentation with deep learning models requires a precise understanding of both global context and fine lesion structure, which remains a challenge for existing convolutional and transformer-based approaches. This study addresses these limitations by proposing a new segmentation model for context-aware dense segmentation of ultrasound images. <b>Method</b>: We propose LTM-UNet, a novel segmentation method that integrates transformer-based encoding with state-space-driven decoding in a U-Net-style framework. The architecture uses an efficient vision transformer encoder to extract multi-scale global representations. These features are refined through an attention-guided skip-fusion mechanism whose spatial-channel attention preserves fine spatial details and thereby reduces the semantic gap between encoder and decoder features. In addition, a direction-aware decoder based on a state-space model efficiently captures long-range dependencies and improves the reconstruction of relevant features. <b>Results</b>: Extensive experiments on benchmark ultrasound imaging datasets demonstrate the effectiveness of the proposed method. The model achieves Dice similarity coefficients of 82.41% on the BUSI dataset and 86.62% on Dataset B (UDIAT), outperforming several existing segmentation approaches on both the Dice coefficient and Intersection-over-Union (IoU) metrics. <b>Conclusions</b>: Combining efficient transformer-based global feature extraction, attention-enhanced feature fusion, and state-space-driven decoding enables LTM-UNet to capture both structural details and contextual information, yielding better segmentation performance than the compared methods.
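The results above are reported as Dice similarity coefficient and IoU. As a reference for readers comparing these numbers, the following is a minimal sketch of the standard per-mask definitions, Dice = 2TP / (2TP + FP + FN) and IoU = TP / (TP + FP + FN), for binary masks flattened to lists. The function names, the `eps` smoothing term, and the convention that two empty masks score 1.0 are assumptions made for illustration; the abstract does not state the paper's evaluation protocol (for example, thresholding or per-image versus dataset-level averaging).

```python
from typing import Sequence, Tuple


def _confusion_counts(pred: Sequence[int], target: Sequence[int]) -> Tuple[int, int, int]:
    """Return (TP, FP, FN) for two flattened binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")
    tp = fp = fn = 0
    for p, t in zip(pred, target):
        p, t = bool(p), bool(t)
        if p and t:
            tp += 1  # lesion pixel correctly predicted
        elif p:
            fp += 1  # background pixel predicted as lesion
        elif t:
            fn += 1  # lesion pixel missed
    return tp, fp, fn


def dice(pred: Sequence[int], target: Sequence[int], eps: float = 1e-7) -> float:
    """Dice = 2TP / (2TP + FP + FN); eps (an assumed smoothing term) makes two empty masks score 1.0."""
    tp, fp, fn = _confusion_counts(pred, target)
    return (2 * tp + eps) / (2 * tp + fp + fn + eps)


def iou(pred: Sequence[int], target: Sequence[int], eps: float = 1e-7) -> float:
    """IoU (Jaccard) = TP / (TP + FP + FN), with the same assumed smoothing."""
    tp, fp, fn = _confusion_counts(pred, target)
    return (tp + eps) / (tp + fp + fn + eps)


def iou_from_dice(d: float) -> float:
    """For a single mask pair, IoU = Dice / (2 - Dice)."""
    return d / (2.0 - d)


if __name__ == "__main__":
    pred = [1, 1, 0, 0]    # hypothetical 2x2 prediction, flattened
    target = [1, 0, 1, 0]  # hypothetical ground truth, flattened
    print(dice(pred, target), iou(pred, target))
```

The per-mask identity IoU = Dice / (2 - Dice) holds only for a single prediction and ground-truth pair; averaged dataset-level scores, such as those reported in the abstract, will not in general convert exactly between the two metrics.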

Topics

Journal Article
