UltraBones100k: A reliable automated labeling method and large-scale dataset for ultrasound-based bone surface extraction.
Wu L, Cavalcanti NA, Seibold M, Loggia G, Reissner L, Hein J, Beeler S, Viehöfer A, Wirth S, Calvet L, Fürnstahl P
Jun 4 2025
Ultrasound-based bone surface segmentation is crucial in computer-assisted orthopedic surgery. However, ultrasound images have limitations, including a low signal-to-noise ratio, acoustic shadowing, and speckle noise, which make interpretation difficult. Existing deep learning models for bone segmentation rely primarily on costly manual labeling by experts, which limits dataset size and model generalizability. Additionally, the complexity of ultrasound physics and acoustic shadowing makes the images difficult for humans to interpret, leading to incomplete labels in low-intensity and anechoic regions and limiting model performance. To advance the state of the art in ultrasound bone segmentation and establish effective model benchmarks, larger and higher-quality datasets are needed. We propose a methodology for collecting ex-vivo ultrasound datasets with automatically generated bone labels, including labels in anechoic regions. The labels are derived by accurately superimposing tracked bone Computed Tomography (CT) models onto the tracked ultrasound images, and these initial labels are then refined to account for ultrasound physics. To evaluate the method clinically, an expert physician from our university hospital specializing in orthopedic sonography assessed the quality of the generated bone labels. A neural network for bone segmentation is trained on the collected dataset, and its predictions are compared with expert manual labels in terms of accuracy, completeness, and F1-score. We collected UltraBones100k, the largest known dataset of its kind: 100k ex-vivo ultrasound images of human lower limbs with bone annotations, specifically targeting the fibula, tibia, and foot bones. A Wilcoxon signed-rank test with Bonferroni correction confirmed that bone alignment after our optimization pipeline significantly improved bone label quality (p < 0.001).
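The core of the automatic labeling idea is a chain of rigid transforms: a point on the tracked CT bone model is carried through the tracker's world frame into the probe frame and then into the 2D ultrasound image plane. The sketch below illustrates only that geometric step, under assumptions not stated in the abstract: 4x4 homogeneous rigid transforms named `T_a_b` (mapping frame b into frame a), an image frame whose x axis runs along columns, y along rows and z normal to the imaging plane (all in mm), a fixed `pixel_mm` spacing, and a hypothetical `slab_mm` tolerance for deciding which surface points lie in the imaging plane. All function names are hypothetical, and the paper's refinement for ultrasound physics and its alignment optimization are not modeled.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # 4x4 homogeneous rigid transform, row-major


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Compose two 4x4 transforms: the result applies b first, then a."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def rigid_inverse(t: Matrix) -> Matrix:
    """Invert a rigid transform [R | p]; the inverse is [R^T | -R^T p]."""
    rt = [[t[j][i] for j in range(3)] for i in range(3)]  # R transposed
    p = [t[i][3] for i in range(3)]
    q = [-sum(rt[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [rt[0] + [q[0]], rt[1] + [q[1]], rt[2] + [q[2]], [0.0, 0.0, 0.0, 1.0]]


def transform_point(t: Matrix, pt: Sequence[float]) -> Tuple[float, ...]:
    """Apply a 4x4 rigid transform to a 3D point."""
    x, y, z = pt
    return tuple(t[i][0] * x + t[i][1] * y + t[i][2] * z + t[i][3] for i in range(3))


def bone_label_mask(bone_pts_mm, T_world_bone, T_world_probe, T_probe_image,
                    pixel_mm, shape, slab_mm=0.5):
    """Hypothetical helper: rasterize CT bone-surface points lying in the image plane.

    Chain used: image <- probe <- world <- bone (CT model frame).
    Returns a rows x cols binary mask with 1 where a projected surface point lands.
    """
    T_probe_world = rigid_inverse(T_world_probe)
    T_image_probe = rigid_inverse(T_probe_image)
    T_image_bone = matmul(T_image_probe, matmul(T_probe_world, T_world_bone))
    rows, cols = shape
    mask = [[0] * cols for _ in range(rows)]
    for pt in bone_pts_mm:
        x, y, z = transform_point(T_image_bone, pt)
        if abs(z) > slab_mm:  # point is too far from the 2D imaging plane
            continue
        c, r = round(x / pixel_mm), round(y / pixel_mm)  # mm -> pixel indices
        if 0 <= r < rows and 0 <= c < cols:
            mask[r][c] = 1
    return mask
```

For example, with identity transforms and `pixel_mm=0.1`, the CT point `(10, 20, 0)` lands at row 200, column 100, while `(10, 20, 5)` is discarded because it lies 5 mm off the imaging plane. A real pipeline would use dense surface samples and calibrated transforms rather than this point-by-point loop.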
The model trained on UltraBones100k consistently outperforms manual labeling on all metrics, particularly in low-intensity regions (at a distance threshold of 0.5 mm: 320% improvement in completeness, 27.4% in accuracy, and 197% in F1-score). In conclusion, this work promises to facilitate research and clinical translation of ultrasound imaging in computer-assisted interventions, particularly for applications such as 2D bone segmentation, 3D bone surface reconstruction, and multi-modality bone registration.
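The abstract reports accuracy, completeness, and F1-score at a 0.5 mm distance threshold but does not define them. A common convention for surface comparison, assumed here rather than taken from the paper, treats accuracy as the fraction of predicted bone points within the threshold of some reference point (precision-like), completeness as the fraction of reference points within the threshold of some prediction (recall-like), and F1 as their harmonic mean. The sketch uses 2D points in mm and brute-force nearest-neighbour search; the function names are hypothetical, and reading the reported percentages as relative improvements over the baseline is also an assumption.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # 2D bone-surface point in mm


def _fraction_within(src: Sequence[Point], dst: Sequence[Point], tau_mm: float) -> float:
    """Fraction of src points whose nearest dst point is within tau_mm."""
    if not src or not dst:
        return 0.0
    hits = sum(1 for p in src if min(math.dist(p, q) for q in dst) <= tau_mm)
    return hits / len(src)


def surface_metrics(pred: Sequence[Point], ref: Sequence[Point], tau_mm: float = 0.5):
    """Return (accuracy, completeness, f1) under the assumed definitions."""
    accuracy = _fraction_within(pred, ref, tau_mm)      # how much of the prediction is correct
    completeness = _fraction_within(ref, pred, tau_mm)  # how much of the reference is recovered
    denom = accuracy + completeness
    f1 = 2 * accuracy * completeness / denom if denom > 0 else 0.0
    return accuracy, completeness, f1


def relative_improvement(new: float, baseline: float) -> float:
    """Percentage improvement of new over baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (new - baseline) / baseline
```

Under this reading, a 320% improvement in completeness means the new value is 4.2 times the baseline, since `relative_improvement(4.2, 1.0)` is 320.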