Multimodal text guided network for chest CT pneumonia classification.
Authors
Affiliations (2)
- College of Computer Science, Beijing University of Technology, Beijing, 100124, China.
- College of Computer Science, Beijing University of Technology, Beijing, 100124, China. [email protected].
Abstract
Pneumonia is a prevalent and serious respiratory disease worldwide. With advances in deep learning, automatic pneumonia diagnosis has attracted considerable research attention in medical image classification. However, current methods still face several challenges. First, because lesions are often visible in only a few slices, slice-based classification algorithms may overlook critical spatial context in CT sequences, and slice-level annotations are labor-intensive to obtain. Second, chest CT sequence-based pneumonia classification algorithms that rely solely on coarse-grained sequence-level labels remain limited, especially in integrating multi-modal information. To address these challenges, we propose a Multi-modal Text-Guided Network (MTGNet) for pneumonia classification using chest CT sequences. In this model, we design a sequential graph pooling network that encodes a CT sequence by progressively selecting important slice features to obtain a sequence-level representation. Additionally, a CT description encoder is developed to learn representations from textual reports. To simulate the clinical diagnostic process, we employ multi-modal training and single-modal testing. A modal transfer module is proposed to generate simulated textual features from CT sequences. Cross-modal attention is then employed to fuse the sequence-level and simulated textual representations, so that semantic information from textual descriptions enhances feature learning within the CT sequences. Furthermore, contrastive learning is applied to learn discriminative features by maximizing the similarity of positive sample pairs and minimizing the similarity of negative sample pairs. Extensive experiments on a self-constructed pneumonia CT sequence dataset demonstrate that the proposed model significantly improves classification performance.
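As a rough illustration of two ideas named in the abstract, the following NumPy sketch shows a generic scaled dot-product cross-modal attention step and an InfoNCE-style contrastive loss. It is not the authors' implementation. The function names, feature shapes, residual connection, temperature value, and the choice of which pairs count as positives are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(ct_feats, text_feats):
    """Hypothetical fusion step: CT features act as queries,
    (simulated) text features act as keys and values.

    ct_feats:   (n_ct, d) array
    text_feats: (n_text, d) array
    Returns the fused (n_ct, d) features and the attention weights.
    """
    d = ct_feats.shape[-1]
    weights = softmax(ct_feats @ text_feats.T / np.sqrt(d), axis=-1)
    # Residual connection (an assumption): keep the CT features and
    # add the text information each CT feature attends to.
    fused = ct_feats + weights @ text_feats
    return fused, weights

def contrastive_loss(a, b, temperature=0.1):
    """InfoNCE-style loss (an assumed form): row i of `a` and row i of `b`
    are a positive pair; every other row of `b` is a negative for a[i]."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # cosine similarities scaled by temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Minimizing this raises positive-pair similarity relative to negatives.
    return -np.mean(np.diag(log_probs))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ct = rng.normal(size=(6, 8))    # e.g. 6 selected slice features
    text = rng.normal(size=(4, 8))  # e.g. 4 simulated text features
    fused, weights = cross_modal_attention(ct, text)
    print(fused.shape, weights.sum(axis=1))
    print(contrastive_loss(ct, ct), contrastive_loss(ct, np.roll(ct, 1, axis=0)))
```

In this sketch the loss is lowest when matching rows are the most similar pair in each row of the similarity matrix, which is one common way to realize "maximize positive-pair similarity, minimize negative-pair similarity".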