MedIENet: medical image enhancement network based on conditional latent diffusion model.
Authors
Affiliations (5)
Affiliations (5)
- School of Electronic and Information Engineering, Wuyi University, Jiangmen, Guangdong, 529020, China.
- School of Electronic and Information Engineering, Wuyi University, Jiangmen, Guangdong, 529020, China. [email protected].
- Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing, 100700, China.
- Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing, 100700, China.
- School of Electronic and Information Engineering, Wuyi University, Jiangmen, Guangdong, 529020, China. [email protected].
Abstract
Deep learning necessitates a substantial amount of data, yet obtaining sufficient medical images is difficult due to concerns about patient privacy and high collection costs. To address this issue, we propose a conditional latent diffusion model-based medical image enhancement network, referred to as the Medical Image Enhancement Network (MedIENet). To meet the rigorous standards required for image generation in the medical imaging field, a multi-attention module is incorporated in the encoder of the denoising U-Net backbone. Additionally Rotary Position Embedding (RoPE) is integrated into the self-attention module to effectively capture positional information, while cross-attention is utilised to embed integrate class information into the diffusion process. MedIENet is evaluated on three datasets: Chest CT-Scan images, Chest X-Ray Images (Pneumonia), and Tongue dataset. Compared to existing methods, MedIENet demonstrates superior performance in both fidelity and diversity of the generated images. Experimental results indicate that for downstream classification tasks using ResNet50, the Area Under the Receiver Operating Characteristic curve (AUROC) achieved with real data alone is 0.76 for the Chest CT-Scan images dataset, 0.87 for the Chest X-Ray Images (Pneumonia) dataset, and 0.78 for the Tongue Dataset. When using mixed data consisting of real data and generated data, the AUROC improves to 0.82, 0.94, and 0.82, respectively, reflecting increases of approximately 6%, 7%, and 4%. These findings indicate that the images generated by MedIENet can enhance the performance of downstream classification tasks, providing an effective solution to the scarcity of medical image training data.