Comparative analysis of transformer, CNN, and YOLO architectures for mandibular condyle segmentation on panoramic radiographs: a deep learning benchmark.
Affiliations (5)
- Department of Oral and Maxillofacial Radiology, Faculty of Dentistry, Niğde Ömer Halisdemir University, Niğde, Türkiye.
- Department of Oral and Maxillofacial Radiology, Health Sciences University, Gulhane Dentistry Faculty, Ankara, Türkiye.
- Department of Oral and Maxillofacial Radiology, Health Sciences University, Gulhane Dentistry Faculty, Ankara, Türkiye.
- Department of Computer Engineering, Faculty of Engineering, Kayseri University, Kayseri, Türkiye.
- Department of Oral and Maxillofacial Radiology, Faculty of Dentistry, Ankara University, Ankara, Türkiye.
Abstract
This study aimed to perform the first multi-architecture comparison of pixel-level mandibular condyle segmentation on panoramic radiographs. It covered transformer-based (RT-DETR), CNN-based (EfficientNet, Mask R-CNN, ConvNeXt), and YOLO-based (YOLOv9-Seg, YOLOv11-Seg) deep learning models.

A dataset of 1,300 panoramic radiographs (2,600 condyles) was retrospectively curated. Ground-truth masks were annotated by a primary radiologist and reviewed by a senior radiologist. Inter-observer agreement, quantified on a blinded 10% subset, yielded a Dice score of 0.92 ± 0.03. Six state-of-the-art architectures were trained and then evaluated on a fixed, held-out test set. Performance was assessed using Intersection over Union (IoU), Dice Similarity Coefficient (DSC), precision, recall, and F1-score.

All models achieved high segmentation accuracy, with DSC values ranging from 0.819 to 0.866. The transformer-based RT-DETR model achieved the numerically highest DSC (0.866), IoU (0.764), and F1-score (0.866), indicating a balanced overall segmentation profile. Among the one-stage detectors, YOLOv9-Seg yielded competitive results (DSC: 0.862) with high recall (0.902), outperforming the CNN-based models. YOLOv11-Seg showed high sensitivity but lower precision than the other architectures.

Deep learning enables accurate, automated condylar segmentation on panoramic radiographs. RT-DETR showed favorable anatomical fidelity for quantitative morphometry, whereas YOLOv9-Seg offered a viable real-time alternative. This study establishes a benchmark for selecting segmentation architectures tailored to specific clinical needs in temporomandibular joint (TMJ) analysis.

Clinical trial registration: Not applicable.
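The overlap metrics named in the abstract (IoU, DSC, precision, recall, F1) can all be derived from pixel-level true-positive, false-positive, and false-negative counts between a predicted mask and a ground-truth mask. The following is a minimal Python sketch, not the authors' evaluation code. The mask representation (equally sized 2-D lists of 0/1 values), the function names `segmentation_metrics` and `dice_from_iou`, and the convention that two empty masks score 1.0 are all hypothetical choices made for illustration.

```python
"""Pixel-level overlap metrics for binary segmentation masks (illustrative sketch).

Assumption: masks are equally sized 2-D lists of 0/1 values, where 1 marks a
condyle pixel. This is a hypothetical helper, not the study's evaluation code.
"""

from typing import Dict, List

Mask = List[List[int]]


def confusion_counts(pred: Mask, truth: Mask) -> Dict[str, int]:
    """Count true positives, false positives and false negatives pixel by pixel."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("prediction and ground-truth masks must have the same shape")
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # condyle pixel correctly predicted
            elif p:
                fp += 1  # predicted condyle pixel that is background in the ground truth
            elif t:
                fn += 1  # ground-truth condyle pixel that the model missed
    return {"tp": tp, "fp": fp, "fn": fn}


def _ratio(num: float, den: float) -> float:
    # Assumed convention: an empty denominator (nothing to compare) scores 1.0.
    return num / den if den else 1.0


def segmentation_metrics(pred: Mask, truth: Mask) -> Dict[str, float]:
    """Return IoU, Dice (DSC), precision, recall and F1 for one mask pair."""
    c = confusion_counts(pred, truth)
    tp, fp, fn = c["tp"], c["fp"], c["fn"]
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    # Harmonic mean of precision and recall; 0.0 when both are zero.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "iou": _ratio(tp, tp + fp + fn),
        "dice": _ratio(2 * tp, 2 * tp + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def dice_from_iou(iou: float) -> float:
    """Convert a single-mask IoU to the equivalent Dice score: DSC = 2*IoU / (1 + IoU)."""
    return 2 * iou / (1 + iou)


if __name__ == "__main__":
    # Toy 3x4 masks: 3 overlapping pixels, 1 false positive, 1 false negative.
    truth = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
    pred = [[0, 1, 1, 1], [0, 1, 0, 0], [0, 0, 0, 0]]
    print(segmentation_metrics(pred, truth))
    print(round(dice_from_iou(0.764), 3))
```

For a single mask pair, pixel-level F1 coincides with Dice, and Dice and IoU are linked by DSC = 2·IoU / (1 + IoU). Metrics averaged over many images need not satisfy that identity exactly, although the reported RT-DETR values (IoU 0.764, DSC 0.866) are consistent with it.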