Interpreting convolutional neural network explainability for head-and-neck cancer radiotherapy organ-at-risk segmentation
Authors
Affiliations (1)
Affiliations (1)
- InselSpital Bern, Switzerland
Abstract
BackgroundConvolutional neural networks (CNNs) have emerged to reduce clinical resources and standardize auto-contouring of organs-at-risk (OARs). Although CNNs perform adequately for most patients, understanding when the CNN might fail is critical for effective and safe clinical deployment. However, the limitations of CNNs are poorly understood because of their black-box nature. Explainable artificial intelligence (XAI) can expose CNNs inner mechanisms for classification. Here, we investigate the inner mechanisms of CNNs for segmentation and explore a novel, computational approach to a-priori flag potentially insufficient parotid gland (PG) contours. MethodsFirst, 3D UNets were trained in three PG segmentation situations using (1) synthetic cases; (2) 1925 clinical computed tomography (CT) scans with typical and (3) more consistent contours curated through a previously validated auto-curation step. Then, we generated attribution maps for seven XAI methods, and qualitatively assessed them for congruency between simulated and clinical contours, and how much XAI agreed with expert reasoning. To objectify observations, we explored persistent homology intensity filtrations to capture essential topological characteristics of XAI attributions. Principal component (PC) eigenvalues of Euler characteristic profiles were correlated with spatial agreement (Dice-Sorensen similarity coefficient; DSC). Evaluation was done using sensitivity, specificity and the area under receiver operating characteristic (AUROC) curve on an external AAPM dataset, where as proof-of-principle, we regard the lowest 15% DSC as insufficient. ResultsPatternNet attributions (PNet-A) focused on soft-tissue structures, whereas guided backpropagation (GBP) highlighted both soft-tissue and high-density structures (e.g. mandible bone), which was congruent with synthetic situations. Both methods typically had higher/denser activations in better auto-contoured medial and anterior lobes. Curated models produced "cleaner" gradient class-activation mapping (GCAM) attributions. Quantitative analysis showed that PC{lambda}1 of guided GCAMs (GGCAM) Euler characteristic (EC) profile had good predictive value (sensitivity>0.85, specificity>0.9) of DSC for AAPM cases, with AUROC=0.66, 0.74, 0.94, 0.83 for GBP, GCAM, GGCAM and PNet-A. For for {lambda}1<-1.8e3 of GGCAMs EC-profile, 87% of cases were insufficient. ConclusionsGBP and PNet-A qualitatively agreed most with expert reasoning on directly (structure borders) and indirectly (proxies used for identifying structure borders) important features for PG segmentation. Additionally, this work investigated as proof-of-principle how topological data analysis could possibly be used for quantitative XAI signal analysis to a-priori mark potentially inadequate CNN-segmentations, using only features from inside the predicted PG. This work used PG as a well-understood segmentation paradigm and may extend to target volumes and other organs-at-risk.