Controllable Mask Diffusion Model for medical annotation synthesis with semantic information extraction.
Authors
Affiliations (2)
Affiliations (2)
- Department of Information and Communication Engineering, Myongji University, Yongin, 17058, South Korea. Electronic address: [email protected].
- Department of Information and Communication Engineering, Myongji University, Yongin, 17058, South Korea. Electronic address: [email protected].
Abstract
Medical segmentation, a prominent task in medical image analysis utilizing artificial intelligence, plays a crucial role in computer-aided diagnosis and depends heavily on the quality of the training data. However, the availability of sufficient data is constrained by strict privacy regulations associated with medical data. To mitigate this issue, research on data augmentation has gained significant attention. Medical segmentation tasks require paired datasets consisting of medical images and annotation images, also known as mask images, which represent lesion areas or radiological information within the medical images. Consequently, it is essential to apply data augmentation to both image types. This study proposes a Controllable Mask Diffusion Model, a novel approach capable of controlling and generating new masks. This model leverages the binary structure of the mask to extract semantic information, namely, the mask's size, location, and count, which is then applied as multi-conditional input to a diffusion model via a regressor. Through the regressor, newly generated masks conform to the input semantic information, thereby enabling input-driven controllable generation. Additionally, a technique that analyzes correlation within semantic information was devised for large-scale data synthesis. The generative capacity of the proposed model was evaluated against real datasets, and the model's ability to control and generate new masks based on previously unseen semantic information was confirmed. Furthermore, the practical applicability of the model was demonstrated by augmenting the data with the generated data, applying it to segmentation tasks, and comparing the performance with and without augmentation. Additionally, experiments were conducted on single-label and multi-label masks, yielding superior results for both types. This demonstrates the potential applicability of this study to various areas within the medical field.