
Assessing the accuracy of multiple-choice questions using different Artificial Intelligence-driven tools - An observational study.

December 5, 2025

Authors

Najmuddin M

Affiliations (1)

  • Assistant Professor, Department of Oral and Maxillofacial Surgery and Diagnostic Sciences, Division of Diagnostic Sciences, College of Dentistry, Jazan University, Jazan, P.O. code 82848, KSA.

Abstract

The objective was to assess the accuracy of different AI tools in providing correct responses to multiple-choice questions (MCQs) and the time taken to complete the responses. The study included 80 MCQs related to oral radiology, each with four options and one correct answer, used to assess knowledge and skill across five domains. The accuracy of ChatGPT, ChatGPT-4o, Microsoft Co-pilot, DeepSeek, Gemini, and Meta AI was assessed and compared using the chi-square test; in addition, one-way ANOVA was used to compare response times between the AI chatbots. Overall, Microsoft Co-pilot and ChatGPT-4o had the highest accuracy, while ChatGPT had the fastest response time. Microsoft Co-pilot and DeepSeek had the highest accuracy for knowledge-based and skill-based queries, respectively, though the differences were not statistically significant. Across the five domains, Microsoft Co-pilot and ChatGPT-4o achieved 100% accuracy for radiographic safety, and DeepSeek was the most accurate for radiographic diagnosis. Students took longer to respond than the collective time taken by the AI chatbots. In summary, Microsoft Co-pilot had the highest overall accuracy, responded most accurately to knowledge-based questions, and was 100% accurate on queries related to radiographic safety; ChatGPT-4o had the second-highest accuracy, and DeepSeek performed best for radiographic diagnosis. This study is the first to systematically compare the accuracy and response time of multiple AI-driven tools in answering domain-specific MCQs in oral radiology. It highlights significant variability in performance across platforms, offering novel insights into the suitability of AI chatbots for educational use in dentistry.

Topics

Journal Article
