Beyond Benchmarks: Towards Robust Artificial Intelligence Bone Segmentation in Socio-Technical Systems

Authors

Xie, K., Gruber, L. J., Crampen, M., Li, Y., Ferreira, A., Tappeiner, E., Gillot, M., Schepers, J., Xu, J., Pankert, T., Beyer, M., Shahamiri, N., ten Brink, R., Dot, G., Weschke, C., van Nistelrooij, N., Verhelst, P.-J., Guo, Y., Xu, Z., Bienzeisler, J., Rashad, A., Flügge, T., Cotton, R., Vinayahalingam, S., Ilesan, R., Raith, S., Madsen, D., Seibold, C., Xi, T., Berge, S., Nebelung, S., Kodym, O., Sundqvist, O., Thieringer, F., Lamecker, H., Coppens, A., Potrusil, T., Kraeima, J., Witjes, M., Wu, G., Chen, X., Lambrechts, A., Cevidanes, L. H. S., Zachow, S., Hermans, A., Truhn, D., Alves,

Affiliations (1)

  • Department of Oral and Maxillofacial Surgery, University Hospital RWTH Aachen

Abstract

Despite the advances in automated medical image segmentation, AI models still underperform in various clinical settings, challenging real-world integration. In this multicenter evaluation, we analyzed 20 state-of-the-art mandibular segmentation models across 19,218 segmentations of 1,000 clinically resampled CT/CBCT scans. We show that segmentation accuracy varies by up to 25% depending on socio-technical factors such as voxel size, bone orientation, and patient conditions such as osteosynthesis or pathology. Higher sharpness, isotropic smaller voxels, and neutral orientation significantly improved results, while metallic osteosynthesis and anatomical complexity led to significant degradation. Our findings challenge the common view of AI models as "plug-and-play" tools and suggest evidence-based optimization recommendations for both clinicians and developers. This will in turn boost the integration of AI segmentation tools in routine healthcare.

Topics

radiology and imaging
