Scaling genetic discovery for organ volumes using machine learning-assisted imputation and bias-corrected GWAS
Authors
Affiliations (1)
- University of Southampton
Abstract
Background: MRI-derived organ and tissue volumes are powerful endophenotypes for studying complex disease, but their availability is limited by cost and throughput. We present a scalable framework that combines machine learning-based phenotypic imputation with probabilistic GWAS (POP-GWAS) to enable robust genetic discovery for imaging-derived phenotypes (IDPs).
Results: Using 37,589 UK Biobank MRI scans and 382 biomarkers, we imputed nine IDPs, including volumes of fat depots, muscle, pancreas, and lung, across ~450,000 individuals. The POP-GWAS framework integrated measured and imputed traits, correcting for imputation uncertainty and increasing effective sample size by up to 200%. We identified 452 independent loci associated with the nine IDPs. This approach uncovered new insights into the genetic architecture and disease relevance of organ volumes. For example, genetically higher abdominal subcutaneous fat was associated with higher risks of diabetes, polycystic ovary syndrome, cardiovascular disease, gout, osteoarthritis, asthma, and psoriasis; higher visceral fat with higher risks of cholelithiasis and reflux; higher muscle volume with higher risks of aortic aneurysm, atrial fibrillation, thrombotic events, and osteoarthritis, but a lower risk of depression; higher lung volume with a higher risk of aortic aneurysm, but lower risks of heart disease and reflux; and higher pancreas volume with a lower risk of diabetes. Tissue enrichment analyses revealed organ-specific patterns, e.g., brain tissue for fat traits and pancreatic tissue for pancreas volume.
Conclusions: Our study demonstrates that machine learning-assisted GWAS enables scalable discovery in imaging genetics. This framework advances understanding of organ-specific biology and provides a blueprint for leveraging the remaining >60,000 UK Biobank MRI scans to accelerate genetic discovery and uncover mechanisms of disease.
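The abstract does not spell out how measured and imputed traits are combined, so the following is only an illustrative sketch of one way such a combination could work for a single variant. It assumes a control-variate style estimator: the effect estimated on the measured trait is adjusted by the difference between the imputed-trait effects in the unscanned and scanned samples, with a weight chosen to minimise variance. The names `SnpStats` and `combine_estimates`, the correlation parameter `r`, and the example numbers are all hypothetical and are not taken from the study.

```python
import math
from dataclasses import dataclass


@dataclass
class SnpStats:
    beta: float  # per-variant effect estimate
    se: float    # standard error of the estimate


def combine_estimates(measured_lab: SnpStats,
                      imputed_lab: SnpStats,
                      imputed_unlab: SnpStats,
                      r: float) -> tuple[SnpStats, float]:
    """Combine measured and imputed GWAS estimates for one variant.

    ASSUMPTIONS (not from the source): the scanned ("lab") and unscanned
    ("unlab") samples are independent, and `r` is the correlation between
    the measured-trait and imputed-trait effect estimates within the
    scanned sample.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    # Covariance of the two estimates obtained from the same individuals.
    cov = r * measured_lab.se * imputed_lab.se
    # Variance of the difference (imputed_unlab.beta - imputed_lab.beta).
    var_imputed = imputed_lab.se ** 2 + imputed_unlab.se ** 2
    # Variance-minimising weight on the imputed-trait correction term.
    omega = cov / var_imputed
    beta = measured_lab.beta + omega * (imputed_unlab.beta - imputed_lab.beta)
    # Variance of the combined estimate at this weight.
    var = measured_lab.se ** 2 - cov ** 2 / var_imputed
    return SnpStats(beta=beta, se=math.sqrt(var)), omega


# Hypothetical summary statistics for a single variant.
measured = SnpStats(beta=0.050, se=0.020)       # measured IDP, scanned sample
imputed_scanned = SnpStats(beta=0.040, se=0.020)  # imputed IDP, scanned sample
imputed_rest = SnpStats(beta=0.045, se=0.006)   # imputed IDP, unscanned sample

combined, omega = combine_estimates(measured, imputed_scanned, imputed_rest, r=0.8)
# Ratio of variances approximates the gain in effective sample size.
gain = (measured.se / combined.se) ** 2
print(f"omega={omega:.3f} beta={combined.beta:.4f} se={combined.se:.4f} gain={gain:.2f}x")
```

Under these assumptions, the correction term has expectation zero whenever the imputed-trait effect is the same in both samples, so imputation error would change the precision of the combined estimate but not its target. When `r` is zero the weight is zero and the result reduces to the measured-trait estimate alone.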