Mitigating Data Bias in Healthcare AI with Self-Supervised Standardization
Authors
Abstract
The rapid advancement of artificial intelligence (AI) in healthcare has accelerated innovation in medical algorithms, yet its broader adoption faces critical ethical and technical barriers. A key challenge is algorithmic bias arising from medical data that are heterogeneous across institutions, equipment, and workflows; such bias may perpetuate disparities in AI-driven diagnoses and exacerbate inequities in patient care. Although AI's ability to extract deep features from large-scale data offers transformative potential, its effectiveness depends heavily on standardized, high-quality datasets. Current gaps in standardization not only limit model generalizability but also raise concerns about reliability and fairness in real-world clinical settings, particularly for marginalized populations. To address these issues, this paper proposes an ethical AI framework centered on a novel self-supervised method for medical image standardization. By integrating self-supervised image style conversion, channel attention mechanisms, and contrastive-learning-based loss functions, our approach improves structural and style consistency across diverse datasets while preserving patient privacy through decentralized learning paradigms. Experiments on multi-institutional medical image datasets demonstrate that our method significantly improves AI generalizability without requiring centralized data sharing. By bridging the data standardization gap, this work strengthens the technical foundations for trustworthy AI in healthcare.
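The abstract names two building blocks, channel attention and a contrastive-learning-based loss, without specifying their form. As a rough illustration only, the NumPy sketch below shows common generic versions of each: a squeeze-and-excitation style channel gate and an InfoNCE contrastive loss. These are assumptions chosen for illustration, not the authors' architecture or loss. The function names (`channel_attention`, `info_nce_loss`), the reduction ratio `r`, and the temperature are hypothetical, and the style-conversion component is not sketched.

```python
import numpy as np

# Illustrative assumption: generic squeeze-and-excitation (SE) style channel
# attention, NOT the specific module used in the paper.
def channel_attention(x, w1, w2):
    """x: feature map (C, H, W); w1: (C//r, C); w2: (C, C//r)."""
    squeezed = x.mean(axis=(1, 2))                  # global average pool -> (C,)
    hidden = np.maximum(0.0, w1 @ squeezed)         # ReLU bottleneck -> (C//r,)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid gate per channel -> (C,)
    return x * gates[:, None, None], gates          # rescale each channel by its gate

# Illustrative assumption: generic InfoNCE loss, NOT the paper's exact loss.
def info_nce_loss(anchors, positives, temperature=0.1):
    """Row i of anchors and positives is a positive pair; the other
    rows of positives in the batch serve as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature                # (N, N) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)     # shift rows for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))             # mean negative log-prob of positives

# Usage with random data (hypothetical sizes).
rng = np.random.default_rng(0)
C, r = 8, 2
x = rng.standard_normal((C, 16, 16))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
y, gates = channel_attention(x, w1, w2)

z = rng.standard_normal((4, 32))
aligned = info_nce_loss(z, z + 0.01 * rng.standard_normal((4, 32)))  # near-identical pairs
shuffled = info_nce_loss(z, rng.standard_normal((4, 32)))            # unrelated pairs
print(y.shape, aligned, shuffled)
```

In this sketch, the channel gate reweights feature channels from globally pooled statistics, and the contrastive loss is low when each anchor is most similar to its own positive. This is why nearly identical pairs yield a smaller loss than unrelated ones.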