Hires-Diagnoser: A dual stream medical image diagnosis framework based on multi-level resolution adaptive sensing.
Authors
Affiliations (2)
- Guangdong Pharmaceutical University, 280 Waihuan East Road, University Town, Guangzhou, Guangdong, Guangzhou, 510006, CHINA.
- Guangdong Pharmaceutical University, No. 280, Outer Ring East Road, Guangzhou University, Guangzhou, Guangdong Province, Guangzhou, 510006, CHINA.
Abstract
Improving medical image diagnosis depends on jointly representing features across multiple scales while accurately capturing both local lesion characteristics and spatial context. Existing research has shown that conventional convolutional neural networks are constrained by a fixed local receptive field, which limits their ability to model global semantic relationships across different regions. Transformers based on self-attention can capture long-range contextual information, but they struggle to identify small lesions. To address these issues, this paper presents Hires-Diagnoser, a dual-stream medical image diagnosis framework that operates on multiple resolution levels. The framework runs ConvNeXt and Swin-Transformer branches in parallel: the ConvNeXt branch extracts local texture features through convolution, while the Swin-Transformer branch captures global contextual dependencies via window-based self-attention. In addition, a cross-modal correlation module (LCA) provides dynamic interaction and adaptive fusion of features across resolutions. Experiments on four datasets (RaabinWBC, Brain Tumor MRI, LC25000, and OCT-C8) yielded accuracies of 99.45%, 98.01%, 100%, and 97.58%, respectively, outperforming existing methods. By leveraging this cross-modal feature interaction mechanism, the framework achieves high accuracy and detailed pathological interpretation, offering an effective and adaptable solution for medical image diagnosis with significant application potential.
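The abstract does not specify how the LCA module is built, so the following is only a minimal sketch of one common way two feature streams can interact: cross-attention, in which each local (convolutional-stream) token attends over the global (transformer-stream) tokens and adds the attended context back as a residual. The function names `softmax` and `cross_attention_fuse` are hypothetical, there are no learned projections, and nothing here should be read as the authors' implementation. The two streams may hold different numbers of tokens, as they would at different resolutions, provided the feature dimension matches.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(local_tokens, global_tokens):
    """Illustrative fusion (assumption, not the paper's LCA module).

    Each local token acts as a query over the global tokens, which serve
    as both keys and values. The attended global context is added to the
    local token as a residual. No learned weights are involved.
    """
    dim = len(local_tokens[0])
    scale = 1.0 / math.sqrt(dim)  # scaled dot-product attention
    fused = []
    for query in local_tokens:
        scores = [
            scale * sum(q * k for q, k in zip(query, key))
            for key in global_tokens
        ]
        weights = softmax(scores)
        context = [
            sum(w * value[d] for w, value in zip(weights, global_tokens))
            for d in range(dim)
        ]
        fused.append([q + c for q, c in zip(query, context)])
    return fused


# Usage: 4 high-resolution local tokens, 2 low-resolution global tokens.
local_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
global_feats = [[2.0, 0.0], [0.0, 2.0]]
fused_feats = cross_attention_fuse(local_feats, global_feats)
print(len(fused_feats), len(fused_feats[0]))  # prints: 4 2
```

The output keeps the token count of the local stream, so in this sketch the high-resolution branch is enriched with global context rather than downsampled to match it.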