Evaluation of domain shift sources and generalisability in AI-based prostate MRI autocontouring for radiotherapy.
Authors
Affiliations (4)
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London SE1 7EU, United Kingdom; Department of Biomedical Engineering, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia. Electronic address: [email protected].
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London SE1 7EU, United Kingdom; Department of Medical Physics and Clinical Engineering, Guy's and St Thomas' NHS Foundation Trust, London SE1 9RT, United Kingdom.
- Department of Clinical Haematology and Oncology, Guy's and St Thomas' NHS Foundation Trust, Guy's Cancer Centre, Guy's Hospital, Great Maze Pond, London SE1 9RT, United Kingdom.
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London SE1 7EU, United Kingdom.
Abstract
Deep learning (DL) models have been widely proposed to automate MRI-based delineation, but their use is hindered by differences in image characteristics between training and evaluation datasets (i.e. domain shift). This paper aims to (i) analyse the impacts of different sources of domain shift and (ii) externally evaluate a model trained using heterogeneous public data and compare it with an in-house model. The nnU-Net DL framework was trained for prostate autocontouring using axial T2-weighted (T2W) prostate MRIs from five public datasets. By controlling training set size, three sources of domain shift were evaluated: dataset, scanner vendor/field strength, and image acquisition/annotation protocol. A total of 66 prostate MRIs were used for external evaluation and for training/evaluation of an in-house model. The Dice Similarity Coefficient (DSC) and 95% Hausdorff distance (HD) were used to evaluate the model-produced contours. The performance gap (Δ) between intra- and inter-domain evaluation showed that domain shift from scanner vendor/field strength (ΔDSC = 0.33, Δ95% HD = 246.86 mm) and image acquisition/annotation protocols (ΔDSC = 0.20, Δ95% HD = 14.70 mm) had a greater impact than that from dataset (ΔDSC = 0.06, Δ95% HD = 3.69 mm), although all were significant (p < 0.05). External evaluation showed that the mixed-domain-trained model performed well but was less robust than the in-house model (median (IQR) DSC/95% HD = 0.87 (0.06)/3.75 (4.47) mm vs. 0.90 (0.03)/1.03 (1.29) mm, p < 0.05, respectively). We highlight for the first time the domain shift effect of image acquisition/annotation protocol, even with images acquired using the same scanner vendor/field strength. Understanding the effect of multiple sources of domain shift has enabled us to train a robust model that can be safely deployed clinically.
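For readers unfamiliar with the two evaluation metrics reported above, the sketch below illustrates how the DSC and the 95th-percentile Hausdorff distance between a reference and a model-produced binary mask are commonly computed. This is not the paper's evaluation code; it is a minimal NumPy/SciPy implementation, assuming 3D binary masks and a voxel spacing given in millimetres, with surface distances obtained via Euclidean distance transforms.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt, binary_erosion

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice Similarity Coefficient between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

def surface(mask: np.ndarray) -> np.ndarray:
    """Boundary voxels of a binary mask (mask minus its erosion)."""
    mask = mask.astype(bool)
    return mask & ~binary_erosion(mask)

def hd95(a: np.ndarray, b: np.ndarray, spacing=(1.0, 1.0, 1.0)) -> float:
    """95th-percentile symmetric Hausdorff distance (mm) between mask surfaces."""
    sa, sb = surface(a), surface(b)
    # Distance from every voxel to the nearest surface voxel of the other mask,
    # scaled by the physical voxel spacing so the result is in millimetres.
    dist_to_b = distance_transform_edt(~sb, sampling=spacing)
    dist_to_a = distance_transform_edt(~sa, sampling=spacing)
    d_ab = dist_to_b[sa]   # distances from A's surface points to B's surface
    d_ba = dist_to_a[sb]   # distances from B's surface points to A's surface
    return float(np.percentile(np.concatenate([d_ab, d_ba]), 95))
```

Taking the 95th percentile rather than the maximum of the surface distances makes the metric less sensitive to isolated outlier voxels, which is why the 95% HD is preferred over the plain Hausdorff distance when comparing autocontours against manual delineations.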