CAM-interacted Vision GNN for Multi-label Medical Images
Authors
Abstract
The Vision Graph Neural Network (ViG) is designed to recognize different objects through graph-level processing. However, ViG constructs graphs from appearance-level neighbors and neglects category semantics. This oversight unintentionally connects patches that belong to different objects, weakening the distinctiveness of categories in multi-label medical image learning. Since pixel-level annotations are not readily available, category-aware graphs cannot be built directly. To address this problem, we localize category-specific regions using Class Activation Maps (CAMs), an effective way to highlight regions belonging to each category without requiring manual annotations. Specifically, we propose a CAM-interacted Vision GNN (CiV-GNN), in which category-aware graphs are formed to perform intra-category graph processing. CiV-GNN includes a Class-activated Patch Division (CAPD) module, which introduces CAMs as guidance for category-aware graph building. Furthermore, we develop a Multi-graph Interactive Processing (MIP) module to model the relations between category-aware graphs, promoting inter-category interaction learning. Experimental results show that CiV-GNN performs well in surgical tool localization and multi-label medical image classification. On m2cai16-localization, CiV-GNN improves mAP50 and mAP50-95 by 1.43% and 7.02%, respectively, over YOLOv8.
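To make the CAM-guided idea concrete, the sketch below shows the standard CAM computation (a weighted sum of final convolutional feature maps using the classifier weights) and how a per-category patch mask could be thresholded from it, as the CAPD module's guidance signal suggests. All names, shapes, and the threshold are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def class_activation_map(features, weights, cls):
    """CAM for class `cls`: classifier-weighted sum of conv feature maps.

    features: (K, H, W) final conv feature maps
    weights:  (C, K) linear classifier weights after global average pooling
    """
    cam = np.tensordot(weights[cls], features, axes=([0], [0]))  # (H, W)
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()  # normalize to [0, 1]
    return cam

def category_patch_mask(cam, threshold=0.5):
    """Binary mask of patches activated for one category.

    A mask like this could group patches into a category-aware graph;
    the threshold value here is an arbitrary illustrative choice.
    """
    return cam >= threshold

# Toy example on random features (purely illustrative)
rng = np.random.default_rng(0)
features = rng.random((8, 14, 14))   # K=8 feature maps on a 14x14 patch grid
weights = rng.random((3, 8))         # 3 categories
cam = class_activation_map(features, weights, cls=1)
mask = category_patch_mask(cam)
print(cam.shape, int(mask.sum()))
```

In a multi-label setting, one such mask per category would yield the per-category patch groups from which intra-category graphs can be built.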