Evaluation of the performance of four different large language models (ChatGPT, DeepSeek, Copilot, and Gemini) in answering oral, and maxillofacial radiology questions: pilot study.

February 3, 2026

DOI: 10.1186/s12903-026-07790-0 PMID: 41629933

Authors

Haylaz E, Gumussoy I, Kalabalik F, Say Ş, Can Eren M, Geduk G

Affiliations (3)

Department of Oral and Maxillofacial Radiology, Faculty of Dentistry, Sakarya University, Sakarya, Turkey.
Department of Oral and Maxillofacial Radiology, Faculty of Dentistry, Sakarya University, Sakarya, Turkey.
Department of Oral and Maxillofacial Radiology, Faculty of Dentistry, Zonguldak Bülent Ecevit University, Zonguldak, Turkey.

Abstract

This study aimed to evaluate and compare the accuracy of four large language models (LLMs), ChatGPT-4, DeepSeek-R1, Google Gemini, and Microsoft Copilot, in answering multiple-choice questions on oral and maxillofacial radiology (OMFR), and to analyze how accuracy varied across topics and question types. The sample consisted of 240 multiple-choice questions that had been posed to dental students. Of these, 200 were text-based and covered 10 topics (20 questions each), and 40 were image-based questions accompanied by JPEG images. The topics were radiation physics, projection geometry, radiobiology, radiographic anatomy, imaging modalities, cysts and tumors, systemic diseases, paranasal sinus diseases, temporomandibular joint (TMJ) diseases, and salivary gland diseases. Each model's answers were evaluated to determine its accuracy rate. For text-based questions, the accuracy rates were ChatGPT 90.5%, DeepSeek 84.5%, Copilot 82.5%, and Gemini 81.0%. For image-based questions, ChatGPT achieved 90.0%, Gemini 65.0%, Copilot 32.5%, and DeepSeek 0.0%. ChatGPT demonstrated the highest overall accuracy and DeepSeek the lowest. Across topics, salivary gland diseases had the lowest success rate. These results point to a supportive and complementary role for LLMs in OMFR education. However, these models may not be sufficient on their own in cases requiring specialized expertise. LLMs can provide significant benefits in dentistry, including support for diagnosis and treatment planning, improved clinical decision-making, patient communication and education, contributions to academic research, greater efficiency in routine tasks, and applications in tele-dentistry.
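The abstract reports text-based and image-based accuracy separately but does not state the pooled figure behind the "highest overall" and "lowest overall" ranking. The sketch below is one way to reconstruct it, assuming that overall accuracy simply pools all 240 questions (200 text-based plus 40 image-based) and that each reported percentage corresponds to a whole number of correct answers. The pooled values are derived here for illustration and are not figures reported by the authors.

```python
# Question counts stated in the abstract.
N_TEXT = 200
N_IMAGE = 40

# Accuracy rates reported in the abstract, in percent: (text-based, image-based).
REPORTED = {
    "ChatGPT": (90.5, 90.0),
    "DeepSeek": (84.5, 0.0),
    "Copilot": (82.5, 32.5),
    "Gemini": (81.0, 65.0),
}


def overall_accuracy(text_pct, image_pct):
    """Pool both question sets into one accuracy over all questions.

    Assumption (not from the source): overall accuracy is the share of
    correct answers across the combined 240 questions.
    """
    correct = round(text_pct / 100 * N_TEXT) + round(image_pct / 100 * N_IMAGE)
    return correct, 100 * correct / (N_TEXT + N_IMAGE)


overall = {name: overall_accuracy(*rates) for name, rates in REPORTED.items()}

# Print models from highest to lowest pooled accuracy.
for name, (correct, pct) in sorted(overall.items(), key=lambda kv: -kv[1][1]):
    print(f"{name}: {correct}/{N_TEXT + N_IMAGE} correct, {pct:.1f}% overall")
```

Under these assumptions the pooled ordering is ChatGPT, Gemini, Copilot, DeepSeek, which is consistent with the abstract's statement that ChatGPT had the highest overall accuracy and DeepSeek the lowest. DeepSeek's drop from second place on text-based questions to last place overall comes entirely from its 0.0% score on the image-based questions.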

Topics

Journal Article
