Diagnostic performance of artificial intelligence in periapical radiography: a systematic review.
Authors
Affiliations (3)
- Faculty of Dentistry, Department of Pediatric Dentistry, Nigde Omer Halisdemir University, 51240, Nigde, Türkiye.
- Faculty of Dentistry, Department of Dentomaxillofacial Radiology, Necmettin Erbakan University, 42050, Meram, Konya, Türkiye.
- Department of Computer Engineering, Necmettin Erbakan University, 42090, Konya, Türkiye.
Abstract
This systematic review evaluated the diagnostic accuracy of artificial intelligence (AI) models in periapical radiography for detection, classification, and segmentation tasks compared with human experts, and critically appraised the methodological quality and risk of bias of the included studies. The review was conducted in accordance with PRISMA guidelines and registered in PROSPERO (CRD420251132033). Studies were excluded if they used datasets of insufficient size (< 200 images) without augmentation, had ambiguous reference standards, or reported only accuracy without complementary metrics. A comprehensive literature search was conducted on June 23, 2025, across five electronic databases: PubMed/MEDLINE, Scopus, ScienceDirect, Web of Science, and IEEE Xplore. The search strategy combined keywords related to "periapical radiography" (e.g., periapical X-ray), "artificial intelligence" (e.g., deep learning, neural networks), and specific diagnostic tasks, with no restrictions on publication date or language. Studies that applied AI models to diagnostic tasks on periapical radiographs and validated performance against a human reference standard (expert consensus) were eligible for inclusion. Of 544 identified records, 47 studies met the full eligibility criteria and were included in the qualitative synthesis. AI models show high diagnostic potential in periapical radiography, performing at a level comparable to human experts in tasks such as pathology detection, anatomical segmentation, and implant classification. However, their clinical applicability is currently limited by a high risk of bias, a lack of external validation, reliance on cropped datasets, and the use of human consensus as a surrogate reference standard. Future research should prioritize full-arch evaluations, anatomical region-specific performance reporting, and adherence to AI-specific standardized metrics.
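The exclusion of studies that report only accuracy can be motivated with a small worked example. The sketch below is illustrative only and is not taken from the review: the confusion-matrix counts are hypothetical, and the `ConfusionCounts` class and `diagnostic_metrics` function are names introduced here. It shows how a model can reach high accuracy on an imbalanced set of radiographs while missing most lesions, which becomes visible only once sensitivity, specificity, precision, and F1 are reported alongside accuracy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of AI predictions against the reference standard (hypothetical helper)."""
    tp: int  # lesion present, model says present
    fp: int  # lesion absent, model says present
    fn: int  # lesion present, model says absent
    tn: int  # lesion absent, model says absent


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a metric is undefined (empty denominator).
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute accuracy together with the complementary metrics."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),   # recall on diseased cases
        "specificity": _ratio(c.tn, c.tn + c.fp),   # recall on healthy cases
        "precision": _ratio(c.tp, c.tp + c.fp),     # positive predictive value
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
    }


# Hypothetical test set: 1000 radiographs, 50 of which contain a lesion.
# The model detects only 10 of the 50 lesions and raises 5 false alarms.
counts = ConfusionCounts(tp=10, fp=5, fn=40, tn=945)
metrics = diagnostic_metrics(counts)

for name, value in metrics.items():
    print(f"{name:12s} {value:.3f}")
```

With these assumed counts, accuracy is 0.955 while sensitivity is only 0.200, so an accuracy-only report would hide the fact that four out of five lesions were missed.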