Transformer-based models in dentistry: a systematic review.
Authors
Affiliations (4)
- Department of Regenerative and Reconstructive Dental Medicine, Institute of Science Tokyo, 1-5-45 Yushima, Bunkyo-ku, Tokyo, 113-8510, Japan.
- Department of AI Technology Development, M&D Data Medical Center, Institute of Integrated Research, Institute of Science Tokyo, Tokyo, 1010062, Japan.
- Department of Chemistry, University of Florida, 214 Leigh Hall, Gainesville, FL, 32611, USA.
- Department of AI Technology Development, M&D Data Medical Center, Institute of Integrated Research, Institute of Science Tokyo, Tokyo, 1010062, Japan. [email protected].
Abstract
Transformer-based architectures have rapidly gained prominence in medical imaging due to their ability to model long-range dependencies and global contextual information more effectively than convolutional neural networks. In dentistry, their applications have expanded across diagnostic, predictive, and generative tasks, yet no comprehensive synthesis has systematically evaluated their performance and clinical relevance. This systematic review provides an up-to-date assessment of Transformer-based models in dental imaging and diagnostics. A structured search of Medline, Embase, Web of Science, Scopus, and Cochrane databases was performed for studies published from 2020 to August 2025, following the PRISMA-DTA guidelines for diagnostic test accuracy systematic reviews and a PROSPERO-registered protocol (CRD420251142603). Eligible studies applied Transformer-based architectures to dental clinical tasks and reported quantitative performance outcomes. Two independent reviewers conducted article screening, data extraction, and bias assessment, resolving discrepancies through consensus. A total of 112 studies met the inclusion criteria. We found 91 hybrid convolutional neural network (CNN)-Transformer architectures and 23 pure Transformer models for dental image segmentation, classification, anomaly detection, and multimodal fusion tasks. In subgroup analyses, hybrid CNN-Transformer models consistently outperformed pure Transformer architectures. These hybrid models demonstrated higher accuracy in tooth and anatomical segmentation, caries and lesion detection, orthopedic and orthodontic analysis, implant localization, and craniomaxillofacial assessment. A significant 5-8% performance gap was observed between internal and external validation.
Generative and predictive applications, such as 3D crown and bone reconstruction, dental age and sex estimation, artifact reduction, and implant position prediction, have achieved promising technical performance in controlled experimental settings, supporting further clinical translational research. Key advantages included enhanced global feature representation, improved robustness in heterogeneous imaging conditions, and the ability to incorporate multimodal inputs. However, major limitations remain: the scarcity of large annotated datasets, heterogeneous evaluation protocols, and limited prospective or real-world validation. Transformer-based models represent a significant methodological advance for dental AI, with superior technical performance in controlled experimental studies of dental imaging and diagnostic tasks, but their clinical translation remains at an early stage. Broader clinical adoption requires not only standardized datasets, harmonized benchmarks, and explainability frameworks, but also rigorous prospective multi-center clinical validation, regulatory and ethical evaluation, and real-world workflow integration assessment.