The performance of large language models in dentomaxillofacial radiology: a systematic review.

Authors

Liu Z, Nalley A, Hao J, H Ai QY, Kan Yeung AW, Tanaka R, Hung KF

Affiliations (2)

  • Oral and Maxillofacial Radiology, Applied Oral Sciences and Community Dental Care, Faculty of Dentistry, The University of Hong Kong, Hong Kong SAR, China.
  • Department of Diagnostic Radiology, The University of Hong Kong, Hong Kong SAR, China.

Abstract

This study aimed to systematically review the current performance of large language models (LLMs) in dentomaxillofacial radiology (DMFR). Five electronic databases were searched to identify studies that developed, fine-tuned, or evaluated LLMs for DMFR-related tasks. Data extracted included study purpose, LLM type, image/text source, language used, dataset characteristics, input and output, performance outcomes, evaluation methods, and reference standards. Customized assessment criteria adapted from the TRIPOD-LLM reporting guideline were used to evaluate the risk of bias in the included studies, specifically regarding the clarity of dataset origin, the robustness of performance evaluation methods, and the validity of the reference standards. The initial search yielded 1621 titles, of which nineteen studies were included. These studies investigated the use of LLMs for tasks including the production and answering of DMFR-related qualification exams and educational questions (n = 8), diagnosis and treatment recommendations (n = 7), and radiology report generation and patient communication (n = 4). LLMs demonstrated varied performance in diagnosing dental conditions, with accuracy ranging from 37% to 92.5% and expert ratings for differential diagnosis and treatment planning between 3.6 and 4.7 on a 5-point scale. For DMFR-related qualification exams and board-style questions, LLMs achieved correctness rates between 33.3% and 86.1%. Automated radiology report generation showed moderate performance, with accuracy ranging from 70.4% to 81.3%. LLMs demonstrate promising potential in DMFR, particularly for diagnostic, educational, and report generation tasks. However, their current accuracy, completeness, and consistency remain variable. Further development, validation, and standardization are needed before LLMs can be reliably integrated as supportive tools in clinical workflows and educational settings.

Topics

Journal Article
