Diagnostic accuracy of artificial intelligence models for temporomandibular joint anomalies on MRI: a systematic review and meta-analysis.
Authors
Affiliations (3)
- Department of Medical Imaging Technology, Manipal College of Health Professions, Manipal Academy of Higher Education, Manipal, 576104, Karnataka, India.
- Department of Radio Diagnosis and Imaging, Kasturba Medical College, Manipal Academy of Higher Education, Manipal, 576104, Karnataka, India.
- Department of Medical Imaging Technology, Manipal College of Health Professions, Manipal Academy of Higher Education, Manipal, 576104, Karnataka, India.
Abstract
Artificial intelligence (AI) techniques are increasingly applied to magnetic resonance imaging (MRI) for detecting temporomandibular joint (TMJ) anomalies; however, their overall diagnostic accuracy and generalizability remain uncertain. This study aimed to systematically review and meta-analyze the diagnostic performance of AI models for TMJ anomaly detection on MRI and to identify factors influencing model performance. A comprehensive search of PubMed, Scopus, Embase, and Web of Science was conducted for studies published between January 2015 and September 2025. Two reviewers independently screened studies and extracted data. Eligible studies developed and tested AI, machine learning, or deep learning models on human TMJ MRI and reported quantitative performance metrics. Risk of bias was assessed using the QUADAS-2 tool. Pooled sensitivity and specificity were estimated using a bivariate random-effects model, while pooled accuracy was derived using logit transformation. Heterogeneity (I²) was explored through subgroup analyses by model architecture and validation strategy. Fourteen studies were included in the systematic review, of which six met the criteria for meta-analysis. Across these six studies, 18 models were analyzed for accuracy, 29 for sensitivity, and 24 for specificity. The pooled diagnostic accuracy was 0.487 (95% CI 0.403-0.571), with pooled sensitivity and specificity of 0.399 (95% CI 0.348-0.450) and 0.399 (95% CI 0.343-0.456), respectively, all showing substantial heterogeneity (I² > 90%). Subgroup analyses indicated that advanced architectures such as ResNet-18, Inception v3, and EfficientNet-b4 achieved higher and more consistent diagnostic performance. Overall, these advanced deep learning architectures demonstrated superior diagnostic performance for detecting TMJ anomalies on MRI.
These findings highlight the potential of AI-assisted MRI interpretation to improve diagnostic consistency, efficiency, and early detection of TMJ pathology. However, substantial heterogeneity and limited external validation currently limit clinical translation. Standardized multicenter studies and transparent model validation are essential to ensure reliable integration of AI tools into clinical TMJ imaging workflows.
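The abstract states that pooled accuracy was derived using logit transformation under a random-effects framework, with heterogeneity reported as I². The sketch below illustrates one common way such a pooled proportion can be computed. It assumes a DerSimonian-Laird between-study variance estimator and a 0.5 continuity correction, neither of which the abstract specifies, and the function name `pool_logit_proportions` and the input counts are hypothetical. It does not reproduce the authors' analysis or their bivariate model for sensitivity and specificity.

```python
import math
from dataclasses import dataclass


@dataclass
class PooledEstimate:
    pooled: float   # pooled proportion, back-transformed from the logit scale
    ci_low: float   # lower bound of the confidence interval
    ci_high: float  # upper bound of the confidence interval
    tau2: float     # between-study variance on the logit scale
    i2: float       # I^2 heterogeneity statistic, in percent


def _expit(x: float) -> float:
    """Inverse of the logit transformation."""
    return 1.0 / (1.0 + math.exp(-x))


def pool_logit_proportions(studies, z: float = 1.96) -> PooledEstimate:
    """Pool (events, total) pairs on the logit scale.

    Assumption: DerSimonian-Laird random-effects estimator; the abstract
    does not say which estimator the authors used.
    """
    if len(studies) < 2:
        raise ValueError("need at least two studies to pool")

    logits, variances = [], []
    for events, total in studies:
        if total <= 0 or not 0 <= events <= total:
            raise ValueError("each study needs 0 <= events <= total, total > 0")
        non_events = total - events
        # Assumed continuity correction so the logit stays finite.
        if events == 0 or non_events == 0:
            events, non_events = events + 0.5, non_events + 0.5
        logits.append(math.log(events / non_events))
        variances.append(1.0 / events + 1.0 / non_events)

    # Fixed-effect (inverse-variance) weights and Cochran's Q.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, logits)) / sum_w
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, logits))
    df = len(studies) - 1

    # DerSimonian-Laird estimate of the between-study variance.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # I^2: share of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights, pooled logit, and its standard error.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled_logit = sum(wi * yi for wi, yi in zip(w_re, logits)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    return PooledEstimate(
        pooled=_expit(pooled_logit),
        ci_low=_expit(pooled_logit - z * se),
        ci_high=_expit(pooled_logit + z * se),
        tau2=tau2,
        i2=i2,
    )


if __name__ == "__main__":
    # Hypothetical (correct classifications, total cases) per model;
    # these are NOT data from the reviewed studies.
    example = [(90, 100), (50, 100), (20, 100)]
    print(pool_logit_proportions(example))
```

Pooling on the logit scale keeps the back-transformed estimate and its confidence interval inside the 0 to 1 range, which a direct average of raw proportions does not guarantee.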