Leveraging multimodal large language model chatbots in oral radiology: a comprehensive evaluation using questions from a Korean dental university.

December 12, 2025 (PubMed)

Authors

Jeong H, Jeon KJ, Lee C, Choi YJ, Jo GD, Han SS

Affiliations (3)

  • Department of Oral and Maxillofacial Radiology, Yonsei University College of Dentistry, Seoul, Republic of Korea.
  • Institute for Innovation in Digital Healthcare, Yonsei University, Seoul, Republic of Korea.
  • Oral Science Research Center, Yonsei University College of Dentistry, Seoul, Republic of Korea.

Abstract

This study aimed to comprehensively evaluate general-purpose multimodal large language model (LLM) chatbots in oral radiology. Ninety text- and image-based oral radiology questions from a Korean dental university were extracted and categorized into six educational content areas and two question types. ChatGPT-4o and Gemini 2.0 Flash were evaluated on the following items: accuracy, with group differences across the six content areas (Fisher's exact test with Bonferroni correction, p < 0.0167); answer consistency across ten repeated outputs (mean agreement and Fleiss' kappa coefficient); and hallucination (mean of the 5-point Global Quality Score assigned by two oral radiologists). Both chatbots performed excellently on text-based questions, with accuracy over 80%, but poorly on image-based tasks, with accuracy under 30%. Image-based tasks also showed high response variability, and hallucinations that presented incorrect information were frequently observed. These findings suggest that AI chatbots are not yet suitable for reliable use in oral radiology. The study provides timely insights into the capabilities and limitations of general-purpose multimodal LLM chatbots in oral radiology and will serve as a foundation for safer and more effective applications of AI chatbots in the field. This is the first study to comprehensively assess multimodal LLM chatbots in oral radiology. It provides key insights into performance benchmarks for AI chatbots in oral radiology, promoting the responsible and transparent integration of AI into dental education.
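
The consistency measures named in the abstract can be illustrated with a short sketch. It is not the authors' code: the function names, the toy answer labels, and the definition of "mean agreement" as the share of repeated runs that match the most frequent answer are all assumptions made here for illustration. Fleiss' kappa follows the standard formula, treating each repeated output as one rating of the question.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of questions, each a list of answer labels
    (one label per repeated run; every question needs the same run count)."""
    n_items = len(ratings)
    n_runs = len(ratings[0])
    if n_runs < 2 or any(len(r) != n_runs for r in ratings):
        raise ValueError("each question needs the same number (>= 2) of runs")

    category_totals = Counter()
    per_item = []
    for answers in ratings:
        counts = Counter(answers)
        category_totals.update(counts)
        # Proportion of agreeing run pairs for this question.
        agreeing = sum(c * c for c in counts.values()) - n_runs
        per_item.append(agreeing / (n_runs * (n_runs - 1)))

    p_bar = sum(per_item) / n_items  # observed agreement
    total = n_items * n_runs
    p_e = sum((t / total) ** 2 for t in category_totals.values())  # chance agreement
    if p_e == 1.0:
        return float("nan")  # undefined when every run gives the same label
    return (p_bar - p_e) / (1.0 - p_e)


def mean_modal_agreement(ratings):
    """Assumed definition: average share of runs matching the modal answer."""
    shares = [Counter(a).most_common(1)[0][1] / len(a) for a in ratings]
    return sum(shares) / len(shares)


# Toy data (hypothetical, not from the study): 3 questions, 10 runs each.
toy = [
    ["A"] * 10,
    ["B"] * 7 + ["C"] * 3,
    ["A"] * 4 + ["B"] * 3 + ["D"] * 3,
]
print(fleiss_kappa(toy), mean_modal_agreement(toy))
```

Kappa corrects the raw agreement for the agreement expected by chance, which is why a chatbot can show moderate raw agreement on image-based items yet a low kappa.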

Topics

Journal Article
