Evaluation of uncertainty estimation methods in medical image segmentation: Exploring the usage of uncertainty in clinical deployment.

Authors

Li S, Yuan M, Dai X, Zhang C

Affiliations (3)

  • Digital Medical Research Center, School of Basic Medical Science, Fudan University, Shanghai, 200032, China; Shanghai Key Lab of Medical Image Computing and Computer Assisted Intervention, Shanghai, 200032, China.
  • Shanghai Key Lab of Medical Image Computing and Computer Assisted Intervention, Shanghai, 200032, China; Digital Medical Research Center, Academy for Engineering and Technology, Fudan University, Shanghai, 200032, China.
  • Digital Medical Research Center, School of Basic Medical Science, Fudan University, Shanghai, 200032, China; Shanghai Key Lab of Medical Image Computing and Computer Assisted Intervention, Shanghai, 200032, China.

Abstract

Uncertainty estimation methods are essential for the application of artificial intelligence (AI) models in medical image segmentation, particularly in addressing reliability and feasibility challenges in clinical deployment. Despite their significance, the adoption of uncertainty estimation methods in clinical practice remains limited due to the lack of a comprehensive evaluation framework tailored to their clinical usage. To address this gap, a simulation of uncertainty-assisted clinical workflows is conducted, highlighting the roles of uncertainty in model selection, sample screening, and risk visualization. Furthermore, uncertainty evaluation is extended to pixel, sample, and model levels to enable a more thorough assessment. At the pixel level, the Uncertainty Confusion Metric (UCM) is proposed, utilizing density curves to improve robustness against variability in uncertainty distributions and to assess the ability of pixel uncertainty to identify potential errors. At the sample level, the Expected Segmentation Calibration Error (ESCE) is introduced to provide more accurate calibration aligned with Dice, enabling more effective identification of low-quality samples. At the model level, the Harmonic Dice (HDice) metric is developed to integrate uncertainty and accuracy, mitigating the influence of dataset biases and offering a more robust evaluation of model performance on unseen data. Using this systematic evaluation framework, five mainstream uncertainty estimation methods are compared on organ and tumor datasets, providing new insights into their clinical applicability. Extensive experimental analyses validate the practicality and effectiveness of the proposed metrics. This study offers clear guidance for selecting appropriate uncertainty estimation methods in clinical settings, facilitating their integration into clinical workflows and ultimately improving diagnostic efficiency and patient outcomes.
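
The abstract does not give the formula for ESCE, so the following is only a generic illustration of what sample-level calibration against Dice can look like, not the authors' metric. It is a minimal sketch that assumes each segmented sample comes with one scalar confidence in [0, 1] (for example, one minus a normalized uncertainty) and applies the standard binned expected-calibration-error recipe with Dice in place of classification accuracy. The function names `dice_score` and `sample_calibration_error`, the binning scheme, and the default `n_bins` are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice coefficient between two flattened binary masks (values 0 or 1)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def sample_calibration_error(
    confidences: Sequence[float],
    dices: Sequence[float],
    n_bins: int = 10,
) -> float:
    """Binned, ECE-style gap between per-sample confidence and per-sample Dice.

    Illustrative stand-in only; not the ESCE definition from the paper.
    """
    if len(confidences) != len(dices):
        raise ValueError("confidences and dices must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one sample is required")

    # Group samples into equal-width confidence bins over [0, 1].
    bins: List[List[Tuple[float, float]]] = [[] for _ in range(n_bins)]
    for conf, dice in zip(confidences, dices):
        idx = max(0, min(int(conf * n_bins), n_bins - 1))
        bins[idx].append((conf, dice))

    # Weighted average of |mean confidence - mean Dice| over non-empty bins.
    error = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        mean_dice = sum(d for _, d in members) / len(members)
        error += (len(members) / n) * abs(mean_conf - mean_dice)
    return error
```

Under these assumptions, a value near zero means that per-sample confidence tracks the achieved Dice, which is the property that would let low-confidence samples be flagged for manual review during sample screening.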

Topics

Journal Article
