Fog/Edge-Aware State Space Models for Multi-Task Chest X-ray Report Generation and Lesion Detection
Authors
Abstract
Artificial intelligence (AI) is transforming radiology, particularly in automating medical report generation and abnormality detection. Although recent AI systems have shown clear potential for reducing radiologists' workloads and improving diagnostic accuracy, they still suffer from high computational cost and limited efficiency when modeling long-range dependencies. To address these challenges, we propose two state space model (SSM)-based frameworks: MambaXray-CTL for medical report generation, and MambaXray-MTL for unified report generation and abnormality localization. Both frameworks integrate a lightweight Mamba-based vision encoder with a large language model (LLM) decoder and incorporate multi-stage contrastive learning to align visual and textual representations. MambaXray-CTL achieves state-of-the-art performance on the IU X-ray and CheXpertPlus datasets while substantially reducing computational overhead compared with Vision Transformer models. MambaXray-MTL further extends this capability through a multi-task learning design that produces clinically coherent reports and accurately localizes abnormalities. Experimental results demonstrate the effectiveness of combining state space models with contrastive learning to deliver efficient, interpretable, and deployable AI solutions for chest radiograph analysis.
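The efficiency claim above rests on the linear-time recurrence at the heart of state space models. As a rough illustration (not the authors' MambaXray implementation, which uses selective, input-dependent parameters), a minimal discretized SSM scan looks like the following; the matrices `A`, `B`, `C` and the function name are placeholders for exposition only:

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Toy linear state-space recurrence over a 1-D input sequence:
        h_t = A h_{t-1} + B x_t,   y_t = C h_t
    Each step reuses a fixed-size hidden state, so the scan runs in
    O(T) for sequence length T, versus O(T^2) for self-attention.
    """
    d_state = A.shape[0]
    h = np.zeros(d_state)          # hidden state carries long-range context
    ys = []
    for x_t in x:
        h = A @ h + B * x_t        # state update
        ys.append(C @ h)           # per-step readout
    return np.array(ys)

# Example: a decaying state (A = 0.5 I) yields an exponentially
# fading memory of past inputs.
A = 0.5 * np.eye(2)
B = np.ones(2)
C = np.ones(2)
y = ssm_scan(np.array([1.0, 0.0, 0.0]), A, B, C)
```

An impulse at t = 0 produces outputs that halve at each step (2, 1, 0.5), showing how the recurrence propagates information forward without ever attending over the full sequence.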