A framework for quantifying and leveraging uncertainty in pre-trained CT denoising models
Authors
Abstract
We developed an architecture-agnostic framework that estimates, calibrates, and leverages total uncertainty (aleatoric + epistemic) in pre-trained deep-learning denoising models for low-dose computed tomography (CT). Aleatoric and epistemic uncertainties were estimated using physics-based inference-time augmentation and training-free, post-hoc Monte Carlo dropout, respectively, followed by non-parametric re-calibration to improve uncertainty calibration. To leverage uncertainty, we explored adaptive local fusion (ALF) guided by the local mean-to-uncertainty ratio. As a proof of concept, the framework was assessed using pre-trained U-Net and ResNet-based models across datasets varying in CT task, radiation dose, and lesion characteristics. Uncertainty estimation and calibration were assessed in cadaver scans using normalized root-mean-square error (NRMSE) and normalized calibration error (NCE), respectively. ALF was evaluated on chest and liver exams using noise, structural similarity index (SSIM), and lesion detectability. Lesion detectability was quantified using a clinically validated deep-learning model observer, and significance was assessed with the Wilcoxon signed-rank test. The framework provided accurate uncertainty quantification and calibration: NRMSE range [1.2%, 2.4%], NCE range [0.9%, 2.2%]. Compared to the original pre-trained models, ALF yielded comparable or lower noise and improved lesion structural fidelity and detectability (p<0.05). For lung nodules: noise reduction up to 69.7%, SSIM range (ALF vs pre-trained) [0.92, 0.96] vs [0.78, 0.87], detectability improvement up to 12.9%. For liver metastases: noise reduction up to 35.0%, SSIM range (ALF vs pre-trained) [0.82, 0.99] vs [0.80, 0.98], detectability improvement up to 13.2%. Our framework effectively benchmarked and utilized total uncertainty to enhance diagnostic image quality with pre-trained CT denoising models.
This framework can facilitate performance monitoring and deployment optimization, and can help establish the trustworthiness of such models.
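To illustrate how total uncertainty and a local mean-to-uncertainty ratio might guide fusion, the following is a minimal sketch and not the implementation evaluated here. Several choices are assumptions made for illustration only: the aleatoric and epistemic components are treated as independent so their variances add; the sample stacks are assumed to come from repeated inference (augmented inputs and dropout-enabled passes, respectively); the fusion partner `reference`, the window size `k`, and the weight mapping `ratio / (1 + ratio)` are hypothetical; and the re-calibration step is omitted. The function names `total_uncertainty`, `box_mean`, and `adaptive_local_fusion` are likewise hypothetical.

```python
import numpy as np


def total_uncertainty(aleatoric_samples, epistemic_samples):
    """Per-pixel total standard deviation from two stacks of shape (N, H, W).

    Assumption: the two components are independent, so variances add.
    """
    var_a = aleatoric_samples.var(axis=0)
    var_e = epistemic_samples.var(axis=0)
    return np.sqrt(var_a + var_e)


def box_mean(img, k):
    """Local mean over a k x k window (k odd), using edge padding."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return windows.mean(axis=(-2, -1))


def adaptive_local_fusion(denoised, reference, sigma, k=5, eps=1e-6):
    """Blend two images with a weight driven by local mean / local uncertainty.

    Hypothetical mapping: w = r / (1 + r), where r is the local
    mean-to-uncertainty ratio. High r (confident) favors `denoised`;
    low r (uncertain) favors `reference`.
    """
    ratio = np.abs(box_mean(denoised, k)) / (box_mean(sigma, k) + eps)
    w = ratio / (1.0 + ratio)
    return w * denoised + (1.0 - w) * reference


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for repeated inference outputs (8 samples of a 16 x 16 patch).
    aleatoric = rng.normal(100.0, 1.0, (8, 16, 16))
    epistemic = rng.normal(100.0, 2.0, (8, 16, 16))
    sigma = total_uncertainty(aleatoric, epistemic)

    denoised = epistemic.mean(axis=0)
    reference = aleatoric.mean(axis=0)
    fused = adaptive_local_fusion(denoised, reference, sigma)
    print(fused.shape, float(sigma.mean()))
```

In this sketch the weight saturates toward the denoised image where the local signal is large relative to its uncertainty. Because CT numbers can be zero or negative, a practical version would likely need an offset or a different ratio definition; that detail is left open.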