Detection accuracy of an AI platform for dental treatment features on panoramic radiographs - tooth- and patient-level analyses.
Authors
Affiliations (7)
- Kazimierczak Clinic, Dworcowa 13/u6a, Bydgoszcz, 85-009, Poland.
- Faculty of Medicine, Collegium Medicum, Nicolaus Copernicus University in Torun, Jagiellońska 13-15, Bydgoszcz, 85-067, Poland.
- , Bydgoszcz, Poland.
- Department of Interdisciplinary Dentistry, Pomeranian Medical University in Szczecin, Szczecin, 70-111, Poland.
- Faculty of Medicine, Bydgoszcz University of Science and Technology, Kaliskiego 7, Bydgoszcz, 85-796, Poland.
- Kazimierczak Clinic, Dworcowa 13/u6a, Bydgoszcz, 85-009, Poland.
- Faculty of Medicine, Collegium Medicum, Nicolaus Copernicus University in Torun, Jagiellońska 13-15, Bydgoszcz, 85-067, Poland.
Abstract
Artificial intelligence (AI) has shown promise in dental imaging, yet its reliability for comprehensive diagnostic charting from panoramic radiographs (PAN) remains uncertain. We evaluated the diagnostic performance of a commercial AI platform (Diagnocat™, San Francisco, USA) in detecting common dental treatment features on PAN images. In this retrospective study, the AI output for 147 patients (4,148 teeth) was compared against the consensus of two experienced readers. Tooth-level performance was assessed for missing teeth, fillings, crowns, pontics, endodontic treatments, orthodontic appliances, and implants, with patient-clustered nonparametric bootstrapping used to account for within-patient correlation. At the tooth level, the AI achieved high diagnostic accuracy across features (94.9-99.9%), with nearly perfect results for missing teeth, crowns, pontics, and implants. However, under a stringent patient-level "perfect match" criterion requiring an error-free full-mouth report, the AI succeeded in only 56.5% of cases (95% CI, 48.4-64.2%). Errors were most often related to fillings (57.3%) and endodontic treatments (19.5%). These findings highlight a critical gap between high per-tooth accuracy and clinically meaningful patient-level performance: although the AI performs strongly at the tooth level, its patient-level performance is insufficient for autonomous diagnostic charting.
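The gap between tooth-level accuracy and the patient-level "perfect match" rate, and the idea of resampling whole patients rather than individual teeth, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy records are invented, the function names (`tooth_accuracy`, `perfect_match_rate`, `cluster_bootstrap_ci`) are hypothetical, and a simple percentile interval is used because the abstract does not say which bootstrap interval the authors computed. It is not the study's analysis code.

```python
import random
from collections import defaultdict

# Invented toy data, NOT study data: (patient_id, ai_label, reference_label) per tooth.
records = [
    ("p1", "filling", "filling"), ("p1", "sound", "sound"), ("p1", "crown", "crown"),
    ("p2", "sound", "filling"), ("p2", "implant", "implant"), ("p2", "sound", "sound"),
    ("p3", "endo", "endo"), ("p3", "sound", "sound"),
    ("p4", "sound", "endo"), ("p4", "filling", "filling"),
    ("p4", "missing", "missing"), ("p4", "sound", "sound"),
]

def tooth_accuracy(recs):
    """Share of teeth where the AI label equals the reference label."""
    return sum(ai == ref for _, ai, ref in recs) / len(recs)

def perfect_match_rate(recs):
    """Share of patients whose every tooth is labelled correctly."""
    by_patient = defaultdict(list)
    for pid, ai, ref in recs:
        by_patient[pid].append(ai == ref)
    return sum(all(flags) for flags in by_patient.values()) / len(by_patient)

def cluster_bootstrap_ci(recs, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI from a nonparametric bootstrap that resamples patients.

    Whole patients are drawn with replacement so that teeth from the same
    mouth stay together, which respects within-patient correlation.
    """
    rng = random.Random(seed)
    by_patient = defaultdict(list)
    for rec in recs:
        by_patient[rec[0]].append(rec)
    ids = list(by_patient)
    estimates = []
    for _ in range(n_boot):
        sample = []
        for new_id, pid in enumerate(rng.choices(ids, k=len(ids))):
            # Relabel so a patient drawn twice counts as two separate clusters.
            sample.extend((new_id, ai, ref) for _, ai, ref in by_patient[pid])
        estimates.append(stat(sample))
    estimates.sort()
    lo_idx = round((alpha / 2) * n_boot)
    hi_idx = round((1 - alpha / 2) * n_boot) - 1
    return estimates[lo_idx], estimates[hi_idx]

print("tooth-level accuracy:", tooth_accuracy(records))        # 10 of 12 teeth correct
print("perfect-match rate:  ", perfect_match_rate(records))    # 2 of 4 patients correct
print("95% CI (tooth level):", cluster_bootstrap_ci(records, tooth_accuracy))
print("95% CI (patient level):", cluster_bootstrap_ci(records, perfect_match_rate))
```

In the toy data, two wrong teeth out of twelve are enough to spoil half of the full-mouth reports. This is the same mechanism by which a per-tooth accuracy above 94% can coexist with a much lower patient-level success rate.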