A benchmark multimodal oro-dental dataset for large vision-language models

November 7, 2025

Authors

Haoxin Lv,Ijazul Haq,Jin Du,Jiaxin Ma,Binnian Zhu,Xiaobing Dang,Chaoan Liang,Ruxu Du,Yingjie Zhang,Muhammad Saqib

Abstract

The advancement of artificial intelligence in oral healthcare relies on the availability of large-scale multimodal datasets that capture the complexity of clinical practice. In this paper, we present a comprehensive multimodal dataset, comprising 8775 dental checkups from 4800 patients collected over eight years (2018-2025), with patients ranging from 10 to 90 years of age. The dataset includes 50000 intraoral images, 8056 radiographs, and detailed textual records, including diagnoses, treatment plans, and follow-up notes. The data were collected under standard ethical guidelines and annotated for benchmarking. To demonstrate its utility, we fine-tuned state-of-the-art large vision-language models, Qwen-VL 3B and 7B, and evaluated them on two tasks: classification of six oro-dental anomalies and generation of complete diagnostic reports from multimodal inputs. We compared the fine-tuned models with their base counterparts and GPT-4o. The fine-tuned models achieved substantial gains over these baselines, validating the dataset and underscoring its effectiveness in advancing AI-driven oro-dental healthcare solutions. The dataset is publicly available, providing an essential resource for future research in AI dentistry.

View Source Full Text PDF

Topics

cs.CV

A benchmark multimodal oro-dental dataset for large vision-language models

Authors

Abstract

Tags

Topics

Ready to Sharpen Your Edge?