Artificial Intelligence-Assisted Periapical Radiographic Assessment: Lesion Detection, Endodontic Complication Analysis, and Review of Clinical Treatment Recommendations.
Authors
Affiliations (2)
- Department of Endodontics, Faculty of Dentistry, Erciyes University, Kayseri, Türkiye.
- Department of Endodontics, Faculty of Dentistry, Erciyes University, Kayseri, Türkiye. Electronic address: [email protected].
Abstract
Artificial intelligence (AI) systems are increasingly used in dental radiology to support endodontic diagnosis. However, their diagnostic reliability across different clinical categories remains unclear. This study compared three vision-language AI models (ChatGPT-5 Plus, Gemini 2.5 Pro, and Copilot Pro) with expert endodontists by assessing sensitivity, specificity, overall diagnostic agreement, and Youden's Index across multiple endodontic conditions. In this retrospective diagnostic accuracy study, periapical radiographs were assessed for treatment decisions, procedural complications, and lesion detection. Expert endodontists served as the reference standard. Diagnostic categories included primary treatment selection, non-surgical retreatment, final treatment decisions, perforation, underfilling, overfilling, broken file, calcification, and periapical lesion detection. Agreement between the endodontists was almost perfect (κ = 0.95). Gemini 2.5 Pro demonstrated the highest diagnostic accuracy, particularly in periapical lesion detection (sensitivity 100%, specificity 88%), while ChatGPT-5 Plus showed similarly strong performance in treatment selection. Copilot Pro exhibited markedly low sensitivity for complications such as perforation and instrument fracture. Kappa values for preoperative and postoperative treatment decisions were high for Gemini 2.5 Pro and ChatGPT-5 Plus, but low for Copilot Pro. The Friedman test confirmed significant differences among the groups (p < 0.001). The AI systems demonstrated promising diagnostic accuracy in treatment selection and lesion detection, but performed less reliably in identifying complex procedural complications. Gemini 2.5 Pro showed the most balanced performance, whereas Copilot Pro displayed the highest variability across diagnostic categories.
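The abstract reports sensitivity, specificity, and Youden's Index without showing how they relate. The sketch below uses the standard definition J = sensitivity + specificity - 1 and applies it to the one sensitivity/specificity pair the abstract states (Gemini 2.5 Pro, periapical lesion detection). The function names are illustrative and not taken from the study, and the resulting J is derived arithmetic rather than a value reported by the authors.

```python
def rates_from_counts(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) from confusion-matrix counts.

    The reference standard (here, the expert endodontists) defines which
    cases are truly positive or negative.
    """
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity


def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's J: 0 means no better than chance, 1 means a perfect test."""
    return sensitivity + specificity - 1.0


# Figures stated in the abstract for Gemini 2.5 Pro, periapical lesion detection.
j_gemini_lesion = youden_index(sensitivity=1.00, specificity=0.88)
print(f"Youden's J = {j_gemini_lesion:.2f}")
```

Because J weights sensitivity and specificity equally, a model that misses most complications (low sensitivity) scores close to zero even when its specificity is high.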