Pixel Tampering: Does Face Redaction Harm Medical AI Performance?
Authors
Affiliations (14)
- Dasa, São Paulo, Brazil.
- Universidade Federal de São Paulo, Unifesp, São Paulo, Brazil.
- Dasa, São Paulo, Brazil. [email protected].
- Universidade Federal de São Paulo, Unifesp, São Paulo, Brazil. [email protected].
- Division of Radiology and Biomedical Engineering, University of Tokyo, Tokyo, Japan.
- Department of Nephrology, Osaka University, Osaka, Japan.
- Siegen, North Rhine-Westphalia, Germany.
- IDOR, São Paulo, Brazil.
- AC Camargo, São Paulo, Brazil.
- UT Southwestern Medical Center, Dallas, TX, USA.
- Department of Radiology, University of North Carolina at Chapel Hill, Chapel Hill, USA.
- Universidade Estadual de Campinas, São Paulo, Brazil.
- Instituto do Coração - InCor, São Paulo, Brazil.
- Bunkerhill Health, San Francisco, CA, USA.
Abstract
Balancing data sharing and patient privacy is essential in medical imaging. Face redaction tools anonymize head CTs by removing identifiable facial features, but their impact on deep learning (DL) model performance remains a concern. We present an open-source face redaction tool designed to enhance data-sharing security while preserving DL performance, validated through a Kaggle competition on age prediction from brain CTs. This study evaluates whether models trained on redacted images perform comparably on redacted and non-redacted test sets, and how they compare to similar models trained on non-redacted images as reported in the literature. A Kaggle challenge was conducted between March 2 and April 30, 2024, to crowdsource age prediction models. The dataset comprised 2377 redacted head CT studies for training and 148 for testing, sourced from multiple institutions. In a post-hoc analysis, the top-performing models were evaluated on both redacted and non-redacted formats of the test set, with performance measured using mean absolute error (MAE). The two best models achieved MAEs of 2.8 and 3.4 years on redacted test data. On the non-redacted format, MAEs increased to 3.2 and 3.8 years, respectively. Paired t-tests showed a significant performance drop for one model (p = 0.038) but not the other (p = 0.051). There was no significant difference between the two models on the redacted test set (p = 0.610). Models trained on redacted data may show a minimal performance decline when applied to non-redacted images, yet still outperform existing benchmarks. Our tool enables secure data sharing with limited impact on DL accuracy.
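The post-hoc comparison described above (MAE on paired redacted vs. non-redacted test images, compared with a paired t-test) can be sketched as follows. This is a minimal illustration with synthetic placeholder predictions, not the study's actual data or code; the array names and error magnitudes are assumptions chosen only to mirror the abstract's setup.

```python
# Hypothetical sketch: evaluate one age-prediction model on the same 148 test
# studies in two formats (redacted vs. non-redacted), report MAE in years,
# and run a paired t-test on the per-study absolute errors.
# All values below are synthetic placeholders, not the study's results.
import numpy as np
from scipy import stats

def mae(y_true, y_pred):
    """Mean absolute error in years."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

rng = np.random.default_rng(0)
true_age = rng.uniform(20, 90, size=148)                # 148 test studies, as in the paper
pred_redacted = true_age + rng.normal(0.0, 3.5, 148)    # simulated predictions, redacted CTs
pred_original = true_age + rng.normal(0.4, 3.8, 148)    # simulated predictions, non-redacted CTs

print(f"MAE (redacted):     {mae(true_age, pred_redacted):.2f} years")
print(f"MAE (non-redacted): {mae(true_age, pred_original):.2f} years")

# Paired t-test is appropriate because both error samples come from the
# same studies, differing only in image format.
err_redacted = np.abs(true_age - pred_redacted)
err_original = np.abs(true_age - pred_original)
t_stat, p_value = stats.ttest_rel(err_redacted, err_original)
print(f"paired t-test: t = {t_stat:.2f}, p = {p_value:.3f}")
```

The same procedure, repeated per model, yields the per-model p-values quoted in the abstract; a two-sample comparison between models on the redacted set would use an unpaired or paired test depending on whether predictions are matched by study.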