Categorical and phenotypic image synthetic learning as an alternative to federated learning.
Authors
Affiliations (10)
- Department of Radiology, University of Texas Southwestern Medical Center, Dallas, TX, USA. [email protected].
- Department of Radiology, University of Texas Southwestern Medical Center, Dallas, TX, USA.
- Department of Pathology, University of Texas Southwestern Medical Center, Dallas, TX, USA.
- Department of Neurological Surgery, University of Texas Southwestern Medical Center, Dallas, TX, USA.
- Department of Bioengineering, University of Texas at Dallas, Richardson, TX, USA.
- Department of Radiology, NYU Grossman School of Medicine, New York, NY, USA.
- Department of Neurosurgery, NYU Grossman School of Medicine, New York, NY, USA.
- Department of Radiology, University of Wisconsin-Madison, Madison, WI, USA.
- Department of Radiology, Mayo Clinic, Rochester, MN, USA.
- Department of Radiology, University of Texas Southwestern Medical Center, Dallas, TX, USA. [email protected].
Abstract
Multi-center collaborations are crucial in developing robust and generalizable machine learning models in medical imaging. Traditional methods, such as centralized data sharing or federated learning (FL), face challenges, including privacy issues, communication burdens, and synchronization complexities. We present CATegorical and PHenotypic Image SyntHetic learnING (CATphishing), an alternative to FL using Latent Diffusion Models (LDM) to generate synthetic multi-contrast three-dimensional magnetic resonance imaging data for downstream tasks, eliminating the need for raw data sharing or iterative inter-site communication. Each institution trains an LDM to capture site-specific data distributions, producing synthetic samples aggregated at a central server. We evaluate CATphishing using data from 2491 patients across seven institutions for isocitrate dehydrogenase mutation classification and three-class tumor-type classification. CATphishing achieves accuracy comparable to centralized training and FL, with synthetic data exhibiting high fidelity. This method addresses privacy, scalability, and communication challenges, offering a promising alternative for collaborative artificial intelligence development in medical imaging.
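The one-shot workflow described above (each site fits a generative model locally, only synthetic samples reach the central server, and the downstream classifier is trained on the pooled synthetic set) can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: a per-class Gaussian stands in for the site-level LDM, a scalar feature stands in for multi-contrast 3D MRI, and a nearest-mean rule stands in for the downstream classifier. All function names, site shifts, and sample sizes are hypothetical.

```python
import random
import statistics


def make_site_data(shift, n_per_class, rng):
    """Simulate one site's private data: (scalar feature, binary label) pairs.

    `shift` mimics a site-specific distribution offset (hypothetical)."""
    data = []
    for label, center in ((0, 0.0), (1, 4.0)):
        for _ in range(n_per_class):
            data.append((rng.gauss(center + shift, 1.0), label))
    return data


def group_by_label(samples):
    """Collect feature values per class label."""
    by_label = {}
    for x, y in samples:
        by_label.setdefault(y, []).append(x)
    return by_label


def fit_site_generator(samples):
    """Fit a per-class Gaussian (mean, std) to a site's private data.

    Stand-in for training a site-local latent diffusion model."""
    return {
        y: (statistics.mean(xs), statistics.stdev(xs))
        for y, xs in group_by_label(samples).items()
    }


def sample_synthetic(generator, n_per_class, rng):
    """Draw labeled synthetic samples from a fitted site generator."""
    return [
        (rng.gauss(mu, sigma), y)
        for y, (mu, sigma) in generator.items()
        for _ in range(n_per_class)
    ]


def train_nearest_mean(samples):
    """Train a nearest-class-mean classifier (stand-in downstream model)."""
    means = {y: statistics.mean(xs) for y, xs in group_by_label(samples).items()}

    def predict(x):
        return min(means, key=lambda y: abs(x - means[y]))

    return predict


rng = random.Random(0)
site_shifts = [-0.5, 0.0, 0.5]  # three hypothetical sites

pooled = []    # synthetic samples aggregated at the central server
held_out = []  # real samples kept only for evaluation
for shift in site_shifts:
    private = make_site_data(shift, 200, rng)  # never leaves the site
    generator = fit_site_generator(private)    # trained locally, once
    pooled.extend(sample_synthetic(generator, 200, rng))
    held_out.extend(make_site_data(shift, 100, rng))

# The central model sees synthetic data only; no iterative communication.
predict = train_nearest_mean(pooled)
accuracy = sum(predict(x) == y for x, y in held_out) / len(held_out)
print(f"accuracy on held-out real data: {accuracy:.3f}")
```

The contrast with FL is structural: each site communicates once, by uploading synthetic samples, rather than exchanging model updates over many synchronized rounds.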