Feasibility of improving vocal fold pathology image classification with synthetic images generated by DDPM-based GenAI: a pilot study.

Authors

Khazrak I, Zainaee S, M Rezaee M, Ghasemi M, C Green R

Affiliations (3)

  • Department of Computer Science, Bowling Green State University, Bowling Green, OH, 43403, USA.
  • Department of Communication Sciences and Disorders, Bowling Green State University, Bowling Green, OH, 43403, USA.
  • Department of Computer Science, Bowling Green State University, Bowling Green, OH, 43403, USA.

Abstract

Voice disorders (VD) are often linked to vocal fold structural pathologies (VFSP). Laryngeal imaging plays a vital role in assessing VFSPs and VD in clinical and research settings, but challenges such as scarce and imbalanced datasets can limit the generalizability of findings. Denoising Diffusion Probabilistic Models (DDPMs), a subtype of Generative AI, have gained attention for their ability to generate high-quality, realistic synthetic images that can address these challenges. This study explores the feasibility of improving VFSP image classification by generating synthetic images with DDPMs. A total of 404 laryngoscopic images depicting vocal folds (VF) with and without VFSP were included. DDPMs were used to generate synthetic images to augment the original dataset. Two convolutional neural network architectures, VGG16 and ResNet50, were used for model training. The models were first trained only on the original dataset and then on the augmented datasets. Evaluation metrics were analyzed to assess model performance for both binary classification (with/without VFSPs) and multi-class classification (seven specific VFSPs). Realistic, high-quality synthetic images were generated for dataset augmentation. The models failed to converge when trained only on the original dataset, but they converged successfully and achieved low loss and high accuracy when trained on the augmented datasets. The best performance for both binary and multi-class classification was achieved when the models were trained on an augmented dataset. Generating realistic images of VFSP using DDPMs is feasible, can enhance the classification of VFSPs by an AI model, and may support VD screening and diagnosis.
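
For readers unfamiliar with DDPMs, the following is a minimal sketch of the forward (noising) process that such models learn to reverse. It is not the authors' implementation: the abstract does not describe the noise schedule, image size, or training code, so the linear schedule, the 1000 steps, the 64x64 placeholder image, and the function names `linear_beta_schedule` and `forward_diffuse` are all assumptions made for illustration.

```python
import numpy as np

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_1..beta_T (assumed linear schedule)."""
    return np.linspace(beta_start, beta_end, num_steps)

def forward_diffuse(x0, t, alpha_bar, rng):
    """Sample x_t from q(x_t | x_0) in closed form.

    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,
    where eps is standard Gaussian noise of the same shape as x_0.
    """
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

betas = linear_beta_schedule()
# Cumulative product of (1 - beta): the fraction of signal kept up to step t.
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
# Placeholder "image" scaled to [-1, 1]; hypothetical, not data from the study.
x0 = rng.uniform(-1.0, 1.0, size=(3, 64, 64))

x_early, _ = forward_diffuse(x0, 10, alpha_bar, rng)   # still close to x0
x_late, eps = forward_diffuse(x0, 999, alpha_bar, rng)  # close to pure noise
```

In a full DDPM, a neural network is trained to predict `eps` from `x_t` and `t`, and new images are produced by running the learned denoising steps from pure noise back to step 0. Synthetic images generated this way could then be added to the scarce classes of a training set, which is the kind of augmentation the abstract describes.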

Topics

Journal Article