Comparison of the diagnostic accuracy of dentists and ChatGPT in jawbone lesions.
Authors
Affiliations (5)
- Department of Dentomaxillofacial Radiology, Gazi University Faculty of Dentistry, Bishkek St. (8th St.) 1st St. No:8, Emek, Ankara, 06490, Turkey.
- Department of Dentomaxillofacial Radiology, Gazi University Faculty of Dentistry, Bishkek St. (8th St.) 1st St. No:8, Emek, Ankara, 06490, Turkey.
- Department of Dentomaxillofacial Surgery, Gazi University Faculty of Dentistry, Ankara, Turkey.
- Department of Statistics, Ankara University Faculty of Science, Ankara, Turkey.
- Department of Dentomaxillofacial Radiology, Autism and Developmental Disorders Research Center, Gazi University Faculty of Dentistry, Ankara, Turkey.
Abstract
Artificial intelligence (AI) is driving a significant paradigm shift in medical imaging and diagnostic sciences. In particular, Chat Generative Pre-trained Transformer (ChatGPT) is increasingly applied in diagnostic processes because of its ability to generate clinical outcomes. This study aims to evaluate the diagnostic accuracy of ChatGPT for jawbone lesions and to compare it with that of Oral and Maxillofacial Radiologists (OMFR), Oral and Maxillofacial Surgeons (OMFS), and general dentists. Thirty cases of jawbone lesions for which clinical information, panoramic radiographs, and histopathological diagnoses were available were selected. A questionnaire comprising the participants' (OMFR, OMFS, and general dentists) demographic information and the cases' clinical findings and panoramic radiographs was prepared and distributed via electronic communication channels. The same cases were submitted to ChatGPT-4, which was asked to generate a preliminary diagnosis. The data were statistically analyzed using the Wilcoxon signed-rank, Mann-Whitney U, and Kruskal-Wallis tests at a significance level of p < 0.05. Overall, ChatGPT's diagnostic accuracy was limited to 46.67%, while the OMFR (67.71%) and OMFS (58.96%) groups had significantly higher success rates than ChatGPT (p < 0.05). General dentists (37.85%) had lower or similar diagnostic accuracy compared to ChatGPT in most subgroups (gender, age, workplace, professional experience). ChatGPT thus demonstrated moderate diagnostic accuracy: OMFR and OMFS participants had significantly higher accuracy rates than ChatGPT, whereas ChatGPT generally outperformed general dentists. These results indicate that such AI systems cannot replace specialist clinicians but can make valuable contributions as supportive tools that enhance diagnosis.
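The accuracy figures and group comparisons described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The function names `accuracy_percent` and `mann_whitney_u` are hypothetical, the per-participant scores are invented for illustration, and the reading of 46.67% as 14 correct diagnoses out of 30 cases is an inference from the reported percentage rather than a figure stated in the abstract. Only the Mann-Whitney U statistic is computed; no p-value is derived.

```python
from typing import Sequence


def accuracy_percent(correct: int, total: int) -> float:
    """Diagnostic accuracy as a percentage, rounded to two decimals."""
    return round(100.0 * correct / total, 2)


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """Mann-Whitney U statistic for sample x against sample y.

    Counts the (xi, yj) pairs in which xi exceeds yj; tied pairs count 0.5.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Assumption: 46.67% on 30 cases corresponds to 14 correct diagnoses.
chatgpt_accuracy = accuracy_percent(14, 30)

# Hypothetical numbers of correct diagnoses (out of 30) per participant.
# These values are invented for illustration and are not study data.
specialist_scores = [20, 22, 19]
generalist_scores = [12, 14, 20]

u_specialist = mann_whitney_u(specialist_scores, generalist_scores)
u_generalist = mann_whitney_u(generalist_scores, specialist_scores)

print(chatgpt_accuracy)
print(u_specialist, u_generalist)
```

The two U values always sum to the product of the sample sizes, which is a quick sanity check on the pair counting. In practice a statistics library such as SciPy (`scipy.stats.mannwhitneyu`) would be used to obtain the statistic together with a p-value.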