CalDiff: calibrating uncertainty and assessing reliability of diffusion models for trustworthy lesion segmentation.

October 23, 2025

Authors

Wang X, Yang M, Tosun S, Nakamura K, Li S, Li X

Abstract

Low reliability has consistently been a challenge in the application of deep learning models to high-risk decision-making scenarios. In medical image segmentation, for instance, multiple expert annotations can be consulted to reduce subjective bias and reach a consensus, thereby enhancing segmentation accuracy and reliability. To develop a reliable lesion segmentation model, we leverage the uncertainty introduced by multiple annotations, enabling the model to better capture real-world diagnostic variability and provide more informative predictions. Since a reliable model should produce calibrated uncertainty estimates that align with actual predictive performance, we propose CalDiff, a novel framework designed to calibrate model uncertainty in lesion segmentation and mitigate the risk of overconfident yet incorrect predictions. To harness the superior generative ability of diffusion models, we propose a dual step-wise and sequence-aware calibration mechanism built on the sequential nature of the diffusion process. We evaluate the calibrated model through a comprehensive quantitative and visual analysis, thus addressing the previously overlooked challenge of assessing uncertainty calibration and model reliability in scenarios with multiple annotations and multiple predictions. Experimental results on two multi-annotated lesion segmentation datasets demonstrate that CalDiff produces uncertainty maps that highlight informative low-confidence areas, which in turn indicate where the model is likely to make false predictions. By calibrating uncertainty during training, the uncertain areas our model produces correlate more closely with the areas where it makes errors at inference. In summary, the uncertainty captured by our CalDiff framework can serve as a powerful indicator that helps mitigate the risks of adopting the model's outputs, allowing clinicians to prioritize reviewing areas or slices with higher uncertainty and enhancing the model's reliability and trustworthiness in real clinical practice.
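This page reproduces only the abstract, so no implementation details are available. As a hedged, minimal sketch of the evaluation idea the abstract describes (uncertainty maps whose high-uncertainty regions should coincide with segmentation errors), the Python snippet below estimates per-pixel uncertainty from multiple sampled segmentations and correlates it with an error map. All function names and the variance-based uncertainty measure are illustrative assumptions, not CalDiff's actual calibration mechanism.

```python
import numpy as np

def uncertainty_from_samples(samples: np.ndarray) -> np.ndarray:
    """Per-pixel predictive uncertainty from multiple sampled segmentations.

    samples: (num_samples, H, W) array of foreground probabilities, e.g. one
    map per reverse-diffusion run. Uncertainty here is the Bernoulli variance
    of the mean prediction; predictive entropy is another common choice.
    """
    p = samples.mean(axis=0)          # mean foreground probability per pixel
    return p * (1.0 - p)              # peaks at p=0.5, zero at p=0 or p=1

def error_map(samples: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Binary map of pixels where the thresholded mean prediction disagrees
    with a reference mask (e.g. a majority vote over expert annotations)."""
    pred = (samples.mean(axis=0) >= 0.5).astype(np.uint8)
    return (pred != reference).astype(np.uint8)

def uncertainty_error_correlation(samples: np.ndarray,
                                  reference: np.ndarray) -> float:
    """Pearson correlation between the uncertainty and error maps; under the
    abstract's reliability criterion, a well-calibrated model should score
    high, meaning uncertainty flags the pixels where the model errs."""
    u = uncertainty_from_samples(samples).ravel()
    e = error_map(samples, reference).ravel().astype(float)
    if u.std() == 0 or e.std() == 0:  # degenerate maps: correlation undefined
        return 0.0
    return float(np.corrcoef(u, e)[0, 1])
```

In a real evaluation, `samples` would come from repeated stochastic sampling runs of the trained segmentation model, and `reference` from a consensus over the multiple expert annotations mentioned in the abstract; the correlation score then quantifies how well uncertain areas track actual mistakes.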

Topics

Journal Article
