Performance of a screening-trained DL model for pulmonary nodule malignancy estimation of incidental clinical nodules.
Authors
Affiliations (6)
- Department of Medical Imaging, Radboud University Medical Center, Nijmegen, The Netherlands.
- Department of Medical Imaging, Radboud University Medical Center, Nijmegen, The Netherlands.
- Department of Radiology, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands.
- Department of Radiology and Nuclear Medicine, Maastricht University Medical Center, Maastricht University, Maastricht, The Netherlands.
- Maastricht University, GROW, School of Oncology and Reproduction, Maastricht, The Netherlands.
- Department of Radiology, Meander Medical Center, Amersfoort, The Netherlands.
Abstract
To test the performance of a DL model developed and validated for screen-detected pulmonary nodules on incidental nodules detected in a clinical setting.
A retrospective dataset of incidental pulmonary nodules sized 5-15 mm was collected, and a subset of size-matched solid nodules was selected. The performance of the DL model was compared to the Brock model. AUCs with 95% CIs were compared using the DeLong method. Sensitivity and specificity were determined at various thresholds, using a 10% threshold for the Brock model as reference. The model's calibration was visually assessed.
The dataset included 49 malignant and 359 benign solid or part-solid nodules, and the size-matched dataset included 47 malignant and 47 benign solid nodules. In the complete dataset, AUCs [95% CI] were 0.89 [0.85, 0.93] for the DL model and 0.86 [0.81, 0.92] for the Brock model (p = 0.27). In the size-matched subset, AUCs of the DL and Brock models were 0.78 [0.69, 0.88] and 0.58 [0.46, 0.69] (p < 0.01), respectively. At a 10% threshold, the Brock model had a sensitivity of 0.49 [0.35, 0.63] and a specificity of 0.92 [0.89, 0.94]. At a threshold of 17%, the DL model matched the specificity of the Brock model at the 10% threshold, but had a higher sensitivity (0.57 [0.43, 0.71]). Calibration analysis revealed that the DL model overestimated the malignancy probability.
The DL model demonstrated good discriminatory performance in a dataset of incidental nodules and outperformed the Brock model, but may need recalibration for clinical practice.
Question What is the performance of a DL model for pulmonary nodule malignancy risk estimation developed on screening data in a dataset of incidentally detected nodules?
Findings The DL model performed well on a dataset of nodules from clinical routine care and outperformed the Brock model in a size-matched subset.
Clinical relevance This study provides further evidence about the potential of DL models for risk stratification of incidental nodules, which may improve nodule management in routine clinical practice.
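The threshold comparison reported in the abstract (fixing the Brock model at a 10% cut-off, then finding the DL cut-off with the same specificity and comparing sensitivities) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `sens_spec` and `threshold_matching_specificity` are hypothetical, and the scores below are synthetic toy values chosen only to make the procedure concrete.

```python
from typing import Optional, Sequence, Tuple


def sens_spec(scores: Sequence[float], labels: Sequence[int],
              threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= threshold is called malignant.

    labels: 1 = malignant, 0 = benign. Both classes must be present.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def threshold_matching_specificity(scores: Sequence[float],
                                   labels: Sequence[int],
                                   target_spec: float) -> Optional[float]:
    """Lowest observed score usable as a cut-off with specificity >= target.

    Specificity never decreases as the cut-off rises, so the first candidate
    that reaches the target also gives the highest sensitivity among the
    candidates that satisfy it. Returns None if no observed score qualifies.
    """
    for candidate in sorted(set(scores)):
        _, spec = sens_spec(scores, labels, candidate)
        if spec >= target_spec:
            return candidate
    return None


# Synthetic toy data (assumption, not study data): 4 malignant, 6 benign.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
brock = [0.30, 0.12, 0.08, 0.05, 0.11, 0.04, 0.03, 0.02, 0.06, 0.01]
dl = [0.60, 0.35, 0.25, 0.10, 0.30, 0.15, 0.08, 0.05, 0.12, 0.02]

# Reference operating point: Brock model at a 10% risk threshold.
brock_sens, brock_spec = sens_spec(brock, labels, 0.10)

# DL cut-off that reaches at least the same specificity.
dl_threshold = threshold_matching_specificity(dl, labels, brock_spec)
dl_sens, dl_spec = sens_spec(dl, labels, dl_threshold)

print(f"Brock @ 0.10: sensitivity={brock_sens:.2f}, specificity={brock_spec:.2f}")
print(f"DL @ {dl_threshold:.2f}: sensitivity={dl_sens:.2f}, specificity={dl_spec:.2f}")
```

Searching only over observed scores keeps the sketch simple; a real analysis would also need confidence intervals for each operating point, which this example does not compute.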