Comparative analysis of transformer, CNN, and YOLO architectures for mandibular condyle segmentation on panoramic radiographs: a deep learning benchmark.

April 16, 2026

DOI: 10.1186/s12903-026-08228-3 PMID: 41992227

Authors

Yilmaz S, Ozturk HP, Ozgedik HS, Avsever IH, Senel B, Tasyurek M, Kurt MH

Affiliations (5)

Department of Oral and Maxillofacial Radiology, Faculty of Dentistry, Niğde Ömer Halisdemir University, Niğde, Türkiye.
Department of Oral and Maxillofacial Radiology, Health Sciences University, Gulhane Dentistry Faculty, Ankara, Türkiye.
Department of Oral and Maxillofacial Radiology, Health Sciences University, Gulhane Dentistry Faculty, Ankara, Türkiye.
Department of Computer Engineering, Faculty of Engineering, Kayseri University, Kayseri, Türkiye.
Department of Oral and Maxillofacial Radiology, Faculty of Dentistry, Ankara University, Ankara, Türkiye.

Abstract

This study aimed to perform the first multi-architecture comparison of pixel-level mandibular condyle segmentation on panoramic radiographs using transformer-based (RT-DETR), CNN-based (EfficientNet, Mask R-CNN, ConvNeXt), and YOLO-based (YOLOv9-Seg, YOLOv11-Seg) deep learning models. A dataset of 1,300 panoramic radiographs (2,600 condyles) was retrospectively curated. Ground-truth masks were annotated by a primary radiologist and reviewed by a senior radiologist; inter-observer agreement was quantified on a blinded 10% subset (Dice: 0.92 ± 0.03). Six state-of-the-art architectures were trained and evaluated on a fixed test set. Performance was assessed using Intersection over Union (IoU), Dice Similarity Coefficient (DSC), precision, recall, and F1-score. All models achieved high segmentation accuracy, with DSC values ranging from 0.819 to 0.866. The transformer-based RT-DETR model showed the highest numerical DSC (0.866), IoU (0.764), and F1-score (0.866), indicating a balanced overall segmentation profile. Among the one-stage detectors, YOLOv9-Seg provided competitive results (DSC: 0.862) with high recall (0.902), outperforming CNN-based alternatives. YOLOv11-Seg showed high sensitivity but lower precision compared to other architectures. Deep learning enables accurate and automated condylar segmentation on panoramic radiographs. While RT-DETR showed favorable anatomical fidelity for quantitative morphometry, YOLOv9-Seg presented a viable real-time alternative. This study establishes a benchmark for selecting segmentation architectures tailored to specific clinical needs in TMJ analysis.
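
For readers less familiar with the overlap metrics named in the abstract, the following is a minimal sketch of how IoU, DSC, precision, recall, and F1 can be computed at pixel level for one predicted mask against one ground-truth mask. It is not the authors' evaluation code: the function names (`segmentation_metrics`, `dice_from_iou`) and the toy masks are illustrative assumptions, and the abstract does not state whether metrics were averaged per image or pooled over all pixels. For a single mask pair, Dice and pixel-level F1 are the same quantity, and DSC = 2·IoU / (1 + IoU); the reported RT-DETR values (IoU 0.764, DSC and F1 both 0.866) are numerically consistent with that relation, although the identity is only approximate for per-image averages.

```python
from typing import Dict, Sequence


def segmentation_metrics(pred: Sequence[Sequence[int]],
                         truth: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Pixel-level overlap metrics for two binary masks of equal shape."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # predicted condyle, truly condyle
            elif p and not t:
                fp += 1          # predicted condyle, truly background
            elif t and not p:
                fn += 1          # missed condyle pixel

    def ratio(num: float, den: float) -> float:
        # Convention chosen for this sketch: an empty denominator yields 0.0.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "iou": ratio(tp, tp + fp + fn),
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


def dice_from_iou(iou: float) -> float:
    """Convert IoU to Dice for a single mask pair: DSC = 2*IoU / (1 + IoU)."""
    return 2 * iou / (1 + iou)


# Hypothetical 3x4 toy masks (1 = condyle pixel, 0 = background).
truth = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

metrics = segmentation_metrics(pred, truth)
print(metrics)
print(round(dice_from_iou(0.764), 3))  # IoU value reported for RT-DETR
```

In this toy case there are 3 true-positive, 1 false-positive, and 1 false-negative pixels, so IoU is 3/5 while Dice, precision, recall, and F1 are all 3/4.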

Topics

Journal Article
