Bias in deep learning-based image quality assessments of T2-weighted imaging in prostate MRI.
Authors
Affiliations (4)
- Department of Radiology, Mayo Clinic, Rochester, USA.
- Department of Radiology, Mayo Clinic, Scottsdale, AZ, USA.
- Department of Radiology, Mayo Clinic, Jacksonville, FL, USA.
- Department of Radiology, Mayo Clinic, Rochester, USA. [email protected].
Abstract
To determine whether deep learning (DL)-based image quality (IQ) assessment of T2-weighted images (T2WI) could be biased by the presence of clinically significant prostate cancer (csPCa).

In this three-center retrospective study, five abdominal radiologists categorized the IQ of 2,105 transverse T2WI series as optimal or as showing mild, moderate, or severe degradation. An IQ classification model was developed using 1,719 series (development set). Agreement between the model and the radiologists was assessed on the remaining 386 series using the quadratic weighted kappa. The model was then applied to 11,723 examinations that were not included in the development set and had no documented prostate cancer at the time of MRI (patient age, 65.5 ± 8.3 years [mean ± standard deviation]). Examinations categorized as mildly to severely degraded served as target groups, whereas those categorized as optimal were used to construct matched control groups. Case-control matching was performed to mitigate the effects of pre-MRI confounding factors, such as age and prostate-specific antigen value. The proportion of patients with csPCa was compared between the target and matched control groups using the chi-squared test.

Agreement between the model and the radiologists was moderate (quadratic weighted kappa, 0.53). The groups with mild and moderate IQ degradation had significantly higher csPCa proportions than their matched control groups with optimal IQ: moderate (N = 126) vs. optimal (N = 504), 26.3% vs. 22.7%, respectively, difference = 3.6% [95% confidence interval: 0.4%, 6.8%], p = 0.03; mild (N = 1,399) vs. optimal (N = 1,399), 22.9% vs. 20.2%, respectively, difference = 2.7% [95% confidence interval: 0.7%, 4.7%], p = 0.008.

The DL-based IQ tended to be rated worse in patients with csPCa, raising concerns about the clinical application of DL-based IQ assessment.
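For readers unfamiliar with the agreement metric named above, the following is a minimal sketch of how a quadratic weighted kappa can be computed for ordinal categories. It is not the authors' code. The function name `quadratic_weighted_kappa`, the integer coding of the four IQ categories (0 = optimal, 1 = mild, 2 = moderate, 3 = severe), and the example ratings are all assumptions made for illustration, not data from the study.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, n_categories):
    """Cohen's kappa with quadratic weights for ordinal labels 0..n_categories-1.

    Hypothetical helper for illustration; not taken from the study.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts per (a, b) pair
    marginal_a = Counter(rater_a)              # per-category counts, rater A
    marginal_b = Counter(rater_b)              # per-category counts, rater B
    max_sq_dist = (n_categories - 1) ** 2

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            # Disagreement weight grows with the squared category distance.
            weight = (i - j) ** 2 / max_sq_dist
            weighted_observed += weight * observed[(i, j)] / n
            # Expected proportion if the two raters were independent.
            weighted_expected += weight * marginal_a[i] * marginal_b[j] / (n * n)

    if weighted_expected == 0.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - weighted_observed / weighted_expected


# Assumed coding: 0 = optimal, 1 = mild, 2 = moderate, 3 = severe degradation.
# These ratings are invented purely to show the call.
model_ratings = [0, 1, 1, 2, 3, 0, 2, 1]
radiologist_ratings = [0, 1, 2, 2, 3, 1, 1, 1]
kappa = quadratic_weighted_kappa(model_ratings, radiologist_ratings, 4)
print(f"quadratic weighted kappa = {kappa:.2f}")
```

Because the weights are quadratic, a disagreement between adjacent categories (for example, optimal vs. mild) is penalized far less than one between distant categories (optimal vs. severe), which suits an ordinal IQ scale.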