Feasibility of improving vocal fold pathology image classification with synthetic images generated by DDPM-based GenAI: a pilot study.

May 17, 2025

DOI: 10.1007/s00405-025-09443-4 PMID: 40380991

Authors

Khazrak I, Zainaee S, M Rezaee M, Ghasemi M, C Green R

Affiliations (3)

Department of Computer Science, Bowling Green State University, Bowling Green, OH, 43403, USA.
Department of Communication Sciences and Disorders, Bowling Green State University, Bowling Green, OH, 43403, USA.
Department of Computer Science, Bowling Green State University, Bowling Green, OH, 43403, USA.

Abstract

Voice disorders (VD) are often linked to vocal fold structural pathologies (VFSP). Laryngeal imaging plays a vital role in assessing VFSPs and VD in clinical and research settings, but scarce and imbalanced datasets can limit the generalizability of findings. Denoising Diffusion Probabilistic Models (DDPMs), a subtype of generative AI, have gained attention for their ability to generate realistic, high-quality synthetic images that can help address these challenges. This study explores the feasibility of improving VFSP image classification by augmenting training data with DDPM-generated synthetic images. A total of 404 laryngoscopic images depicting vocal folds (VF) with and without VFSP were included. DDPMs were used to generate synthetic images to augment the original dataset. Two convolutional neural network architectures, VGG16 and ResNet50, were trained, first on the original dataset alone and then on the augmented datasets. Evaluation metrics were analyzed to assess model performance for both binary classification (with/without VFSP) and multi-class classification (seven specific VFSPs). The DDPMs produced realistic, high-quality synthetic images for dataset augmentation. The models failed to converge when trained only on the original dataset, but converged successfully, reaching low loss and high accuracy, when trained on the augmented datasets. For both binary and multi-class classification, the best performance was achieved by models trained on an augmented dataset. Generating realistic images of VFSP using DDPMs is feasible, can improve AI-based classification of VFSPs, and may support VD screening and diagnosis.
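The abstract does not specify the DDPM configuration used in the study. As a minimal sketch of how a DDPM produces synthetic images, assuming the standard formulation with a linear noise schedule (T = 1000 steps, beta rising from 1e-4 to 0.02, the values from the original DDPM formulation and not confirmed for this study), the NumPy code below implements the closed-form forward noising step, the simple noise-prediction loss, and the reverse sampling loop that turns Gaussian noise into an image. All function names are illustrative, and `eps_model` is a hypothetical placeholder for the trained denoising network (typically a U-Net); none of this is the authors' implementation.

```python
import numpy as np


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    # Assumed linear variance schedule (standard DDPM defaults); the study's
    # actual schedule is not reported in the abstract.
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)  # cumulative product: alpha_bar_t
    return betas, alphas, alpha_bar


def q_sample(x0, t, alpha_bar, noise):
    # Forward (noising) process in closed form:
    # x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise


def simple_loss(eps_pred, noise):
    # "Simple" DDPM training objective: MSE between true and predicted noise.
    return float(np.mean((eps_pred - noise) ** 2))


def p_sample_step(x_t, t, eps_pred, betas, alphas, alpha_bar, rng):
    # One reverse (generation) step x_t -> x_{t-1}, using sigma_t^2 = beta_t.
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_pred) / np.sqrt(alphas[t])
    if t == 0:
        return mean  # no noise is added at the final step
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)


def generate(shape, eps_model, betas, alphas, alpha_bar, rng):
    # Start from pure Gaussian noise and denoise step by step.
    # `eps_model(x, t)` is a hypothetical stand-in for a trained network
    # that predicts the noise present in x at step t.
    x = rng.standard_normal(shape)
    for t in reversed(range(len(betas))):
        x = p_sample_step(x, t, eps_model(x, t), betas, alphas, alpha_bar, rng)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    betas, alphas, alpha_bar = linear_beta_schedule()
    x0 = rng.uniform(-1.0, 1.0, size=(64, 64, 3))  # stand-in for an image scaled to [-1, 1]
    noise = rng.standard_normal(x0.shape)
    x_t = q_sample(x0, 500, alpha_bar, noise)
    print("noised image std:", float(x_t.std()))
    print("loss of a zero predictor:", simple_loss(np.zeros_like(noise), noise))
    # A zero predictor only shows that the loop runs; real samples need a trained network.
    sample = generate((8, 8, 3), lambda x, t: np.zeros_like(x), betas, alphas, alpha_bar, rng)
    print(sample.shape)
```

In a setup like the one the abstract describes, the denoising network would be trained on the laryngoscopic images with `simple_loss`, and `generate` would then be called once per synthetic image added to the training set of a classifier such as VGG16 or ResNet50. Class balancing, image resolution, and the number of synthetic images per class are not given in the abstract and would need to come from the full text.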

Topics

Journal Article