Controlled comparative study of YOLOv8-Pose, YOLOv11-Pose, and Detectron2 for vertebrae detection and keypoint estimation.
Authors
Affiliations (1)
- Department of Cybernetics and Biomedical Engineering, VSB-Technical University of Ostrava, FEECS, Ostrava, Poruba, Czech Republic.
Abstract
Accurate vertebrae detection with precise keypoint localization is essential for medical image analysis and clinical applications, such as spinal alignment assessment. This study presents a controlled, task-driven comparison of four pose-based deep learning models for vertebrae detection and keypoint estimation: YOLOv8n-Pose, YOLOv11n-Pose, and Detectron2 (Keypoint R-CNN) with ResNet-50 and ResNet-101 backbones. All models were trained and evaluated under consistent settings on a single-class vertebrae dataset with bounding-box annotations and four anatomically meaningful keypoints per vertebra. Performance was assessed in terms of keypoint localization accuracy, detection precision, inference speed, and vertebrae-specific prediction behavior, including detection completeness and duplicate detections, which are critical for clinical usability. Results show that YOLOv11n-Pose provides the best balance between keypoint accuracy and inference efficiency, while YOLOv8n-Pose achieves the fastest inference with competitive performance. Detectron2-based models exhibit lower keypoint accuracy, slower inference, and frequent duplicate predictions that obscure correct anatomical landmarks. Additional experiments indicate that larger YOLO variants (particularly YOLOv8l-Pose) improve accuracy, albeit at increased computational cost. Overall, the results demonstrate that model capacity or architectural recency alone does not guarantee superior performance in vertebrae keypoint detection, highlighting the importance of anatomy-aware model selection for spine imaging applications.
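Keypoint localization accuracy in comparisons of this kind is commonly scored with COCO-style Object Keypoint Similarity (OKS). The abstract does not state the study's exact metric implementation, so the sketch below is illustrative: the four per-keypoint sigma values and the bounding-box-area scale are assumptions, not values from the paper.

```python
import numpy as np

def oks(pred, gt, visibility, area, sigmas):
    """COCO-style Object Keypoint Similarity between one predicted and one
    ground-truth vertebra.

    pred, gt   : (K, 2) arrays of keypoint coordinates
    visibility : (K,) array, >0 for annotated keypoints
    area       : object scale (e.g. ground-truth bounding-box area)
    sigmas     : (K,) per-keypoint falloff constants (illustrative here)
    """
    d2 = np.sum((pred - gt) ** 2, axis=-1)        # squared distance per keypoint
    k2 = (2.0 * sigmas) ** 2                      # per-keypoint variance term
    e = d2 / (2.0 * area * k2 + np.finfo(float).eps)
    vis = visibility > 0
    # Average the Gaussian similarity over annotated keypoints only.
    return float(np.sum(np.exp(-e)[vis]) / max(int(vis.sum()), 1))

# Hypothetical four-keypoint vertebra: a perfect prediction scores OKS = 1.0.
gt = np.array([[10.0, 10.0], [20.0, 10.0], [10.0, 30.0], [20.0, 30.0]])
sigmas = np.full(4, 0.05)                         # assumed, not the paper's values
score = oks(gt.copy(), gt, np.ones(4), area=400.0, sigmas=sigmas)
print(score)  # → 1.0
```

A shifted prediction decays smoothly toward 0, which is what makes OKS a softer alternative to thresholded pixel-error metrics when ranking models such as those compared here.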