Challenges in Using Deep Neural Networks Across Multiple Readers in Delineating Prostate Gland Anatomy.
Affiliations (8)
- Department of Machine Learning, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA.
- Department of Electrical Engineering, University of South Florida, Tampa, FL, USA.
- Department of Diagnostic Radiology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA.
- Department of Genitourinary Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA.
- Department of Machine Learning, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA. [email protected].
- Department of Electrical Engineering, University of South Florida, Tampa, FL, USA. [email protected].
- Department of Diagnostic Radiology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA. [email protected].
- Department of Genitourinary Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA. [email protected].
Abstract
Deep learning methods hold enormous promise for automating labor-intensive tasks such as medical image segmentation and for providing workflow assistance to clinical experts. Deep neural networks (DNNs) require a large number of training examples and a variety of expert opinions to capture nuance and context, a challenging proposition in oncological studies (H. Wang et al., Nature, vol. 620, no. 7972, pp. 47-60, Aug 2023). Inter-reader variability among clinical experts is a real-world problem that severely impacts the generalizability and reproducibility of DNNs. This study proposes quantifying the variability in DNN performance across expert opinions and explores strategies to train the network and adapt between them. We address the inter-reader variability problem in the context of prostate gland segmentation using a well-studied DNN, the 3D U-Net. Reference data comprise T2-weighted magnetic resonance imaging (MRI) with prostate glandular anatomy annotations from two expert readers (R#1, n = 342; R#2, n = 204). The 3D U-Net trained and tested on individual expert examples (R#1 and R#2) achieved average Dice coefficients of 0.825 (CI [0.81, 0.84]) and 0.85 (CI [0.82, 0.88]), respectively. Combined training with a representative cohort proportion (R#1, n = 100; R#2, n = 150) yielded enhanced model reproducibility across readers, achieving average test Dice coefficients of 0.863 (CI [0.85, 0.87]) for R#1 and 0.869 (CI [0.87, 0.88]) for R#2. Re-evaluating the model across gland volumes (large, small) showed improved performance for large glands, with average Dice coefficients of 0.846 (CI [0.82, 0.87]) and 0.872 (CI [0.86, 0.89]) for R#1 and R#2, respectively, estimated using fivefold cross-validation. Performance for small glands diminished, with average Dice coefficients of 0.8 (CI [0.79, 0.82]) and 0.8 (CI [0.79, 0.83]) for R#1 and R#2, respectively.
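The abstract reports agreement between predicted and expert segmentations using the Dice coefficient. As a minimal sketch (not the authors' implementation), the metric for a pair of binary 3D masks can be computed as follows; the function name and epsilon smoothing term are illustrative assumptions:

```python
import numpy as np

def dice_coefficient(pred, ref, eps=1e-7):
    """Dice similarity coefficient between two binary segmentation masks.

    Dice = 2 * |pred AND ref| / (|pred| + |ref|); ranges from 0 (no
    overlap) to 1 (perfect overlap). `eps` (an illustrative choice)
    avoids division by zero when both masks are empty.
    """
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    intersection = np.logical_and(pred, ref).sum()
    return 2.0 * intersection / (pred.sum() + ref.sum() + eps)

# Toy example: two 4-voxel masks sharing one foreground voxel.
d = dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0])
```

Here `d` is approximately 0.5: the intersection has one voxel and the masks hold two foreground voxels each, so Dice = 2*1/(2+2).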