Synthetic data generation in paediatrics and paediatric nursing: what, how, and why?
Authors
Affiliations (4)
- Department of Women's and Children's Health, University of Padova, Padova, Italy. Electronic address: [email protected].
- Pediatric Hematology-Oncology and Bone Marrow Transplant Unit, Department of Woman's and Child's Health, Azienda-Ospedale-Università di Padova, Padova, Italy.
- Department of Women's and Children's Health, University of Padova, Padova, Italy; Pediatric Hematology-Oncology and Bone Marrow Transplant Unit, Department of Woman's and Child's Health, Azienda-Ospedale-Università di Padova, Padova, Italy.
- IRCCS-Istituto delle Scienze Neurologiche, Bologna, Italy.
Abstract
This paper explores the potential benefits and limitations of synthetic data (SD) in paediatrics, addressing the challenges of data scarcity and privacy concerns in paediatric research. A narrative literature review was conducted by searching the PubMed and Scopus databases for relevant publications up to August 2025. The review focused on studies addressing the use, development, or application of SD in paediatric healthcare settings. SD offers numerous benefits in paediatrics, including enhancing dataset diversity, protecting patient privacy, and enabling AI model development, especially in areas with limited real datasets such as rare diseases. Applications of SD in paediatrics span various fields, including neonatology, oncology, radiology, and neurodevelopmental disorders. However, challenges persist, including potential data bias, difficulty ensuring accuracy and quality, privacy concerns, and the lack of standardized guidelines for data generation and validation. While SD demonstrates potential in specific paediatric applications, such as improving AI early warning systems and augmenting datasets for rare conditions, its use requires a structured, actionable framework for evaluation. Future efforts should focus, through multi-stakeholder engagement, on developing paediatric-specific guidelines, ensuring fair and safe use of SD, and addressing unique aspects of child development in data synthesis.