Combating Medical Label Noise through more precise partition-correction and progressive hard-enhanced learning.
Affiliations (5)
- Imaging & Intelligence Lab, Taiyuan University of Technology, China.
- Imaging & Intelligence Lab, Taiyuan University of Technology, China; School of Software, North University of China, Taiyuan, China.
- Imaging & Intelligence Lab, Taiyuan University of Technology, China; School of Software, Taiyuan University of Technology, Taiyuan, China; College of Information, Jinzhong College of Information, Jinzhong, China.
- Jincheng Grand Hospital, Jincheng, China.
- Department of Pulmonary and Critical Care Medicine, The First Hospital of Shanxi Medical University, China.
Abstract
Computer-aided diagnosis systems based on deep neural networks rely heavily on datasets with high-quality labels. However, manual annotation for lesion diagnosis depends on image features and often requires professional expertise and complex image analysis. This inevitably introduces noisy labels, which can misguide the training of classification models. To address the challenges posed by label noise in medical images, we propose a novel noise-tolerant medical image classification framework consisting of two phases: fore-training correction and progressive hard-sample enhanced learning. In the first phase, we design a dual-branch sample partition detection scheme that classifies each instance into one of three subsets: clean, hard, or noisy. Alongside this scheme, we propose a hard-sample label refinement strategy based on class prototypes with confidence-perception weighting and a joint correction method for noisy samples, both of which yield higher-quality training data. In the second phase, we design a progressive hard-sample enhanced learning method that strengthens the model's ability to learn discriminative feature representations; it accounts for sample difficulty and mitigates the effects of label noise in medical datasets. Our framework achieves an accuracy of 82.39% on a pneumoconiosis dataset collected by our laboratory. On a five-class skin disease dataset under six label-noise rates (0, 0.05, 0.1, 0.2, 0.3, and 0.4), the average accuracy over the last ten epochs is 88.51%, 86.64%, 85.02%, 83.01%, 81.95%, and 77.89%, respectively. For binary polyp classification under noise rates of 0.2, 0.3, and 0.4, the average accuracy over the last ten epochs is 97.90%, 93.77%, and 89.33%, respectively. These results on three challenging datasets with both real and synthetic noise demonstrate the effectiveness of the proposed framework.
Experimental results further demonstrate the robustness of our method across varying noise rates.
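The abstract does not state how the dual-branch partition or the prototype-based refinement is computed. The pure-Python sketch below shows one plausible way to build such components, and every detail in it is an assumption. Each branch flags its lowest-loss fraction of samples as clean. This small-loss rule is swapped in for whatever detector the authors actually use. Agreement or disagreement between the two branches then yields the clean/hard/noisy split. Class prototypes are the normalised mean features of the clean samples. A hard sample's label becomes a confidence-weighted blend of its given one-hot label and a softmax over prototype similarities. All function names, `clean_ratio`, `temperature`, and the blending rule are hypothetical, so this is an illustration rather than the authors' implementation.

```python
import math
from typing import List, Sequence

# Hypothetical sketch, NOT the authors' implementation. Assumptions:
#   * each branch flags its lowest-loss `clean_ratio` fraction as clean
#     (a simple small-loss rule used here in place of the paper's detector);
#   * prototypes are normalised mean features of clean samples;
#   * a hard sample's label blends its given one-hot label with a softmax
#     over prototype similarities, weighted by a confidence in [0, 1].


def small_loss_mask(losses: Sequence[float], clean_ratio: float) -> List[bool]:
    """Return True for the `clean_ratio` fraction of samples with the smallest loss."""
    k = int(round(clean_ratio * len(losses)))
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    clean = set(order[:k])
    return [i in clean for i in range(len(losses))]


def partition(loss_a: Sequence[float], loss_b: Sequence[float],
              clean_ratio: float = 0.5) -> List[str]:
    """Split samples by agreement between two branches' per-sample losses."""
    mask_a = small_loss_mask(loss_a, clean_ratio)
    mask_b = small_loss_mask(loss_b, clean_ratio)
    subsets = []
    for ca, cb in zip(mask_a, mask_b):
        if ca and cb:
            subsets.append("clean")   # both branches consider it clean
        elif not ca and not cb:
            subsets.append("noisy")   # both branches consider it noisy
        else:
            subsets.append("hard")    # the branches disagree
    return subsets


def normalize(v: Sequence[float]) -> List[float]:
    """L2-normalise a vector (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def class_prototypes(features, labels, subsets, num_classes: int) -> List[List[float]]:
    """Direction of the mean normalised feature of each class's clean samples."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    for f, y, s in zip(features, labels, subsets):
        if s == "clean":
            for d, x in enumerate(normalize(f)):
                sums[y][d] += x
    return [normalize(vec) for vec in sums]


def softmax(z: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(x - m) for x in z]
    total = sum(e)
    return [x / total for x in e]


def refine_hard_label(feature, given_label: int, prototypes,
                      confidence: float, temperature: float = 0.1) -> List[float]:
    """Soft label = confidence * one-hot(given) + (1 - confidence) * prototype softmax."""
    f = normalize(feature)
    sims = [sum(a * b for a, b in zip(f, p)) for p in prototypes]  # cosine similarity
    proto_dist = softmax([s / temperature for s in sims])
    return [confidence * (1.0 if k == given_label else 0.0)
            + (1.0 - confidence) * proto_dist[k]
            for k in range(len(prototypes))]


if __name__ == "__main__":
    loss_a = [0.1, 0.2, 2.0, 0.3, 1.5, 1.8]
    loss_b = [0.2, 0.1, 1.9, 1.7, 0.4, 2.2]
    subsets = partition(loss_a, loss_b)
    print(subsets)  # ['clean', 'clean', 'noisy', 'hard', 'hard', 'noisy']
    feats = [[1, 0], [0, 1], [1, 1], [0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
    labels = [0, 1, 0, 1, 0, 1]
    protos = class_prototypes(feats, labels, subsets, num_classes=2)
    print(refine_hard_label(feats[3], labels[3], protos, confidence=0.2))
```

With these toy losses, samples 3 and 4 are marked hard because the two branches disagree. For sample 3, the low confidence (0.2) lets the prototype evidence move most of its label mass from the given class 1 to class 0. A Gaussian mixture fitted to per-sample losses is a common alternative to the fixed `clean_ratio` cut.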