DMformer: Difficulty-adapted Masked Transformer for Semi-Supervised Medical Image Segmentation.

November 26, 2025

Authors

Peng Z, Wang G, Xu Z, Yang X, Shen W

Abstract

The shared anatomy among different human bodies can serve as a strong prior for effectively leveraging unlabeled data in semi-supervised medical image segmentation. Inspired by the success of masked image modeling, we find that this prior can be explicitly realized by incorporating an auxiliary unsupervised gross anatomy reconstruction task into a teacher-student semi-supervised segmentation framework. In this auxiliary task, consistency is maintained between the student's predictions on masked images and the teacher's predictions on the original images. Despite its potential, we observe that the reconstruction difficulties of different organs/tissues can vary significantly, so reconstructing them requires tailored learning strategies. To address this issue, we introduce a difficulty-adapted mask mechanism based on the teacher-student framework, wherein the reconstruction difficulty is adapted to facilitate training. Specifically, we control the reconstruction difficulty by modulating two important factors: the masked region ratio and the masked class ratio. Accordingly, we design two corresponding mask strategies. 1) Region-based masking: randomly masks a fraction of each class according to an automatically computed mask ratio. 2) Class-based masking: masks the entire regions of specific classes according to the class confidence predicted by the teacher model. During training, a conflict-aware gradient computation strategy is introduced to mitigate potential optimization conflicts arising from modulating the two reconstruction factors simultaneously. Building on vision transformers, we develop the Difficulty-adapted Masked Transformer (DMformer) for semi-supervised medical image segmentation. Extensive experiments demonstrate the superiority of DMformer, which outperforms the previous SOTA in DSC by 9.53% on the ACDC dataset with 5% labeled images and by 4.63% on the Synapse dataset with 30% labeled images. Code is available at: https://github.com/SJTU-DeepVisionLab/DMformer.
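The two mask strategies lend themselves to a compact sketch. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the teacher provides a per-pixel pseudo-label map and per-class confidence scores, and all names (`region_based_mask`, `class_based_mask`, the confidence threshold) are hypothetical. See the linked repository for the actual method.

```python
# Minimal sketch of the two mask strategies described in the abstract.
# Assumptions (not from the paper): the teacher supplies a per-pixel
# pseudo-label map and per-class confidences; masked pixels are zeroed.
import torch


def region_based_mask(pseudo_labels: torch.Tensor, mask_ratio: float) -> torch.Tensor:
    """Randomly mask a `mask_ratio` fraction of the pixels of each class.

    pseudo_labels: (H, W) integer class map predicted by the teacher.
    Returns an (H, W) boolean mask, True where the input should be masked.
    """
    mask = torch.zeros_like(pseudo_labels, dtype=torch.bool)
    for c in pseudo_labels.unique():
        coords = (pseudo_labels == c).nonzero(as_tuple=False)  # (N, 2) pixel coords
        n_mask = int(mask_ratio * coords.shape[0])
        if n_mask > 0:
            chosen = coords[torch.randperm(coords.shape[0])[:n_mask]]
            mask[chosen[:, 0], chosen[:, 1]] = True
    return mask


def class_based_mask(pseudo_labels: torch.Tensor,
                     class_confidence: torch.Tensor,
                     threshold: float = 0.9) -> torch.Tensor:
    """Mask the entire regions of classes selected by teacher confidence.

    class_confidence: (num_classes,) mean teacher confidence per class.
    Selecting high-confidence classes is an assumption here; the abstract
    only says selection depends on the teacher's class confidence.
    """
    selected = (class_confidence >= threshold).nonzero(as_tuple=False).flatten()
    mask = torch.zeros_like(pseudo_labels, dtype=torch.bool)
    for c in selected:
        mask |= pseudo_labels == c
    return mask


# Usage: zero out masked pixels before feeding the student network.
image = torch.rand(1, 256, 256)                      # (C, H, W) input slice
pseudo = torch.randint(0, 4, (256, 256))             # teacher pseudo-labels
conf = torch.tensor([0.95, 0.60, 0.92, 0.40])        # per-class confidence
mask = region_based_mask(pseudo, mask_ratio=0.5) | class_based_mask(pseudo, conf)
masked_image = image.masked_fill(mask.unsqueeze(0), 0.0)
```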
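The abstract does not detail the conflict-aware gradient computation, but one widely used recipe for reconciling two conflicting task gradients is PCGrad-style projection (Yu et al., 2020): when two gradients have a negative inner product, each is projected onto the normal plane of the other before they are summed. The sketch below illustrates that general idea only; the strategy actually used in DMformer may differ.

```python
# Hedged sketch of a generic conflict-aware gradient combination
# (PCGrad-style projection). This illustrates the general idea, not
# the strategy used in DMformer, which the abstract does not specify.
import torch


def combine_gradients(g1: torch.Tensor, g2: torch.Tensor) -> torch.Tensor:
    """Sum two flattened task gradients, de-conflicting them first.

    If g1 and g2 conflict (negative inner product), project each onto
    the normal plane of the other so neither update undoes the other.
    """
    dot = torch.dot(g1, g2)
    if dot < 0:
        g1_proj = g1 - dot / (g2.norm().pow(2) + 1e-12) * g2
        g2_proj = g2 - dot / (g1.norm().pow(2) + 1e-12) * g1
        return g1_proj + g2_proj
    return g1 + g2


# Usage with two stand-in losses on a shared parameter vector:
w = torch.nn.Parameter(torch.randn(8))
loss_region = (w ** 2).sum()     # stand-in for the region-masking loss
loss_class = -w.sum()            # stand-in for the class-masking loss
g1 = torch.autograd.grad(loss_region, w, retain_graph=True)[0]
g2 = torch.autograd.grad(loss_class, w)[0]
w.grad = combine_gradients(g1.flatten(), g2.flatten()).view_as(w)
```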

Topics

Journal Article
