Performance and Clinical Applicability of AI Models for Jawbone Lesion Classification: A Systematic Review with Meta-analysis and Introduction of a Clinical Interpretation Score.
Affiliations (4)
- OMFS-IMPATH Research Group, Department of Imaging and Pathology, Catholic University Leuven, Kapucijnenvoer 7, 3000 Leuven, Belgium.
- Department of Pathology, University Hospitals Leuven, Herestraat 49, 3000 Leuven, Belgium.
- Department of Oral and Maxillofacial Surgery, University Hospitals Leuven, Kapucijnenvoer 7, 3000 Leuven, Belgium.
- Department of Dentistry, Karolinska Institutet, Alfred Nobels Allé 8, 141 52 Huddinge, Sweden.
Abstract
Objectives: To evaluate the diagnostic accuracy and generalizability of artificial intelligence (AI) models for the radiographic classification of jawbone cysts and tumours, and to propose a Clinical Interpretation Score (CIS) that rates the transparency and real-world readiness of published AI tools.
Methods: Eligible studies reporting the sensitivity and specificity of AI classifiers on panoramic radiographs or cone-beam CT were retrieved. Two reviewers applied JBI risk-of-bias criteria and extracted 2 × 2 tables and relevant metrics. Pooled estimates were calculated with random-effects meta-analysis; heterogeneity was quantified with I².
Results: Nineteen studies were included, predominantly reporting convolutional neural networks. Pooled specificity was consistently high (≥0.90) across lesions, whereas sensitivity ranged widely (0.50-1.00). Stafne bone cavities achieved near-perfect metrics; ameloblastoma and odontogenic keratocyst showed moderate sensitivity (0.62-0.85) but retained high specificity. Cone-beam CT improved sensitivity relative to panoramic imaging. Substantial heterogeneity (I² > 50% in most comparisons) reflected variable prevalence, imaging protocols, and validation strategies.
Conclusions: AI models demonstrate promising diagnostic performance in classifying several jawbone lesions, though their accuracy is influenced by imaging modality, lesion type, and prevalence. Despite encouraging technical results, many studies lack transparent reporting and external validation, limiting their clinical interpretability. The CIS provides a structured framework for evaluating the methodological transparency and clinical readiness of AI tools, helping to distinguish technically sound models from those ready for integration into diagnostic workflows.
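
For readers unfamiliar with the pooling step described in the Methods, the sketch below illustrates how per-study sensitivities derived from 2 × 2 tables can be combined with a DerSimonian-Laird random-effects model, and how I² follows from Cochran's Q. The study counts are hypothetical, and the logit transform with a 0.5 continuity correction is one common convention; it is not necessarily the exact procedure used in this review.

    import math

    # Hypothetical per-study counts for one lesion type: (true positives, false negatives).
    studies = [
        (42, 8),   # study 1: TP, FN
        (30, 18),  # study 2
        (55, 5),   # study 3
    ]

    def logit(p):
        return math.log(p / (1 - p))

    def inv_logit(x):
        return 1 / (1 + math.exp(-x))

    # Logit-transformed sensitivities with a 0.5 continuity correction,
    # and approximate within-study variances 1/TP + 1/FN (delta method).
    effects, variances = [], []
    for tp, fn in studies:
        tp_c, fn_c = tp + 0.5, fn + 0.5
        effects.append(logit(tp_c / (tp_c + fn_c)))
        variances.append(1 / tp_c + 1 / fn_c)

    # Fixed-effect (inverse-variance) pooled estimate, needed to compute Q.
    w = [1 / v for v in variances]
    theta_fe = sum(wi * ei for wi, ei in zip(w, effects)) / sum(w)

    # Cochran's Q and the DerSimonian-Laird between-study variance tau^2.
    q = sum(wi * (ei - theta_fe) ** 2 for wi, ei in zip(w, effects))
    df = len(studies) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights and pooled logit sensitivity.
    w_re = [1 / (v + tau2) for v in variances]
    theta_re = sum(wi * ei for wi, ei in zip(w_re, effects)) / sum(w_re)

    # I^2: percentage of total variability attributable to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    print(f"Pooled sensitivity: {inv_logit(theta_re):.3f}")
    print(f"I^2: {i2:.1f}%")

Pooled specificity is obtained the same way by substituting (true negatives, false positives) for (TP, FN); an I² above 50%, as reported in most comparisons here, indicates that between-study variability dominates sampling error.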