Fed-ComBat: A Generalized Federated Framework for Batch Effect Harmonization in Collaborative Studies
Authors
Affiliations (1)
- Université Côte d'Azur, Inria Sophia Antipolis, Epione Research Group, France
Abstract
In neuroimaging research, multi-centric analyses are crucial for obtaining sufficient sample sizes and representative clinical populations. Data harmonization techniques are typically part of the pipeline in multi-centric studies to address systematic biases and ensure the comparability of the data. However, most multi-centric studies require centralized data, which may expose individual patient information. This poses a significant challenge for data governance and has led to regulations such as the GDPR and the CCPA, which attempt to address these concerns but also hinder data access for researchers. Federated learning offers a privacy-preserving alternative in machine learning, enabling models to be trained collaboratively on decentralized data without centralizing or sharing the data. In this paper, we present Fed-ComBat, a federated framework for batch effect harmonization on decentralized data. Fed-ComBat extends existing linear methods, both centralized (ComBat) and distributed (d-ComBat), as well as nonlinear approaches such as ComBat-GAM, to account for potentially nonlinear and multivariate covariate effects. Fed-ComBat thereby preserves nonlinear covariate effects without requiring centralization of data and, unlike ComBat-GAM, without prior knowledge of which variables should be treated as nonlinear or how they interact. We assessed Fed-ComBat and existing approaches on simulated data and on multiple cohorts comprising healthy controls (CN) and subjects with various disorders such as Parkinson's disease (PD), Alzheimer's disease (AD), and autism spectrum disorder (ASD). The results show that Fed-ComBat performs better than centralized ComBat when dealing with nonlinear effects and is on par with centralized methods like ComBat-GAM.
In experiments on synthetic data, Fed-ComBat reconstructs the target unbiased function more accurately than the other federated methods, achieving a 35% lower RMSE (0.5952) than d-ComBat (RMSE=0.9162) and a 12% lower RMSE than d-ComBat-GAM (RMSE=0.6751), our proposed federated version of ComBat-GAM. Additionally, Fed-ComBat achieves results comparable to centralized methods like ComBat-GAM for MRI-derived phenotypes without requiring prior knowledge of potential nonlinearities.
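The improvement percentages quoted above appear to be relative reductions in RMSE with respect to each baseline. The short Python sketch below recomputes them from the three RMSE values reported in the abstract; the function name `relative_improvement` and the variable names are illustrative and do not come from the paper.

```python
def relative_improvement(baseline_rmse: float, new_rmse: float) -> float:
    """Fractional reduction in RMSE relative to the baseline (assumed definition)."""
    return (baseline_rmse - new_rmse) / baseline_rmse


# RMSE values as reported in the abstract (synthetic-data experiment).
fed_combat = 0.5952
d_combat = 0.9162
d_combat_gam = 0.6751

vs_d_combat = relative_improvement(d_combat, fed_combat)
vs_d_combat_gam = relative_improvement(d_combat_gam, fed_combat)

print(f"vs d-ComBat:     {vs_d_combat:.1%}")      # about 35%
print(f"vs d-ComBat-GAM: {vs_d_combat_gam:.1%}")  # about 12%
```

Under this assumed definition, both figures round to the percentages stated in the abstract.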