
Performance and Clinical Applicability of AI Models for Jawbone Lesion Classification: A Systematic Review with Meta-analysis and Introduction of a Clinical Interpretation Score.

December 4, 2025

Authors

Ver Berne J, That MT, Jacobs R

Affiliations (4)

  • OMFS-IMPATH Research Group, Department of Imaging and Pathology, Catholic University Leuven, Kapucijnenvoer 7, 3000 Leuven, Belgium.
  • Department of Pathology, University Hospitals Leuven, Herestraat 49, 3000 Leuven, Belgium.
  • Department of Oral and Maxillofacial Surgery, University Hospitals Leuven, Kapucijnenvoer 7, 3000 Leuven, Belgium.
  • Department of Dentistry, Karolinska Institutet, Alfred Nobels Allé 8, 141 52 Huddinge, Sweden.

Abstract

To evaluate the diagnostic accuracy and generalizability of artificial-intelligence (AI) models for radiographic classification of jawbone cysts and tumours, and to propose a Clinical Interpretation Score (CIS) that rates the transparency and real-world readiness of published AI tools. Eligible studies reporting sensitivity and specificity of AI classifiers on panoramic radiographs or cone-beam CT were retrieved. Two reviewers applied JBI risk-of-bias criteria and extracted 2 × 2 tables and relevant metrics. Pooled estimates were calculated with random-effects meta-analysis; heterogeneity was quantified with I². Nineteen studies were included, predominantly reporting convolutional neural networks. Pooled specificity was consistently high (≥0.90) across lesions, whereas sensitivity ranged widely (0.50-1.00). Stafne bone cavities achieved near-perfect metrics; ameloblastoma and odontogenic keratocyst showed moderate sensitivity (0.62-0.85) but retained high specificity. Cone-beam CT improved sensitivity relative to panoramic imaging. Substantial heterogeneity (I² > 50% in most comparisons) reflected variable prevalence, imaging protocols and validation strategies. AI models demonstrate promising diagnostic performance in classifying several jawbone lesions, though their accuracy is influenced by imaging modality, lesion type, and prevalence. Despite encouraging technical results, many studies lack transparent reporting and external validation, limiting their clinical interpretability. The CIS provides a structured framework to evaluate the methodological transparency and clinical readiness of AI tools, helping to distinguish between technically sound models and those suitable for integration into diagnostic workflows.
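For readers unfamiliar with the statistics summarized above, the sketch below (not the authors' code) shows how per-study sensitivities derived from 2 × 2 tables can be pooled with a DerSimonian-Laird random-effects model and how I² quantifies heterogeneity. The study counts are hypothetical and serve only to make the routine runnable.

```
# Illustrative sketch: random-effects pooling of sensitivity and I-squared.
# TP/FN counts below are hypothetical, not extracted from the review.

import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

def pooled_sensitivity(studies):
    """studies: list of (TP, FN) tuples; returns (pooled sensitivity, I^2 in %)."""
    # Logit-transformed sensitivity and within-study variance
    # (0.5 continuity correction so zero cells remain usable).
    y, v = [], []
    for tp, fn in studies:
        tp, fn = tp + 0.5, fn + 0.5
        y.append(logit(tp / (tp + fn)))
        v.append(1 / tp + 1 / fn)

    # Fixed-effect weights and Cochran's Q.
    w = [1 / vi for vi in v]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    k = len(studies)

    # DerSimonian-Laird between-study variance and I^2 = (Q - (k-1)) / Q.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0

    # Random-effects pooling on the logit scale, then back-transform.
    w_re = [1 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    return inv_logit(pooled), i2

# Hypothetical per-study (TP, FN) counts for one lesion type.
example = [(45, 10), (30, 18), (60, 5), (22, 14)]
sens, i2 = pooled_sensitivity(example)
print(f"Pooled sensitivity: {sens:.2f}, I^2: {i2:.0f}%")
```

The logit transform keeps pooled estimates inside (0, 1); the same routine applies to specificity by substituting TN and FP counts from the 2 × 2 tables.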

Topics

Journal Article
