FairGen: preference-aligned diffusion for demographically equitable medical image synthesis.
Authors
Affiliations (6)
- Swanson School of Engineering, University of Pittsburgh, Pittsburgh, PA, USA.
- Department of Computer Science, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
- School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ, USA.
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, USA.
- Swanson School of Engineering, University of Pittsburgh, Pittsburgh, PA, USA.
- Department of Computer Science, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
Abstract
Medical imaging is central to modern diagnostics, and artificial intelligence (AI) systems are increasingly used to support image-based analysis by improving efficiency, accuracy, and access to care. However, inequities in healthcare access and differential disease prevalence create severe demographic imbalances in clinical image data. Such imbalances are compounded by the fact that diseases can manifest with distinct features across demographic groups, rendering certain phenotypic presentations naturally rare. AI models trained on such imbalanced data risk perpetuating diagnostic bias and widening healthcare disparities. Here we introduce FairGen, a fairness-aware diffusion framework that synthesizes demographically balanced medical images while preserving pathology-relevant visual features. By embedding physician-aligned preferences into the generation process, FairGen improves subgroup coverage during synthesis and downstream classification. Applied to dermatology, radiology, and neuroimaging benchmark tasks, FairGen achieves fairness improvements of 95.9% for skin images, 80.0% for chest radiography, and 35.2% for brain MRI, while maintaining competitive diagnostic accuracy relative to models trained on original clinical data. Clinician-facing expert review and external validation on independent cohorts further support that these gains extend beyond standard fidelity metrics and are not confined to the original in-distribution datasets.