Original Research Article
Data integration methods for phenotype harmonization in multi-cohort genome-wide association studies with behavioral outcomes
- 1 University of Notre Dame, United States
- 2 Netherlands Twin Register, Department of Biological Psychology and Amsterdam Public Health research institute, Faculty of Behavioural and Movement Sciences, Vrije Universiteit Amsterdam, Netherlands
- 3 Netherlands Twin Register, Department of Biological Psychology & Amsterdam Public Health research institute, Faculty of Behavioural and Movement Sciences, Vrije Universiteit Amsterdam, Netherlands
- 4 Department of Medical Epidemiology and Biostatistics, Karolinska Institutet (KI), Sweden
- 5 Gillberg Neuropsychiatry Centre, University of Gothenburg, Sweden
- 6 School of Medical Sciences, Örebro University, Sweden
- 7 Netherlands Twin Register, Department of Biological Psychology & Amsterdam Public Health research institute, Faculty of Behavioural and Movement Sciences, Vrije Universiteit Amsterdam, Netherlands
- 8 Amsterdam Neuroscience, Medical Center, VU University Amsterdam, Netherlands
Parallel meta-analysis is a popular approach for increasing the power to detect genetic effects in genome-wide association studies across multiple cohorts. Consortia studying the genetics of behavioral phenotypes often face systematic differences in phenotype measurement across cohorts, which introduce heterogeneity into the meta-analysis and reduce statistical power. This study investigated integrative data analysis (IDA) as an approach for jointly modeling the phenotype across multiple datasets. We put forth a bi-factor integration model (BFIM) that provides a single common phenotype score and accounts for sources of study-specific variability in the phenotype. To capitalize on this modeling strategy, a phenotype reference panel was used as a supplemental sample with complete data on all behavioral measures. A simulation study showed that mega-analysis of genetic variant effects in a BFIM was more powerful than meta-analysis of genetic effects on a cohort-specific sum score of items. Saving the factor scores from the BFIM and using them as the outcome in meta-analysis was also more powerful than the sum score in most simulation conditions, but this approach introduced a small degree of bias. The reference panel was necessary to realize these power gains. An empirical demonstration used the BFIM to harmonize aggression scores in 9-year-old children across the Netherlands Twin Register (NTR) and the Child and Adolescent Twin Study in Sweden, providing a template for applying the BFIM to a range of phenotypes. A supplemental data collection in the NTR served as a reference panel for phenotype modeling across both cohorts. Our results indicate that model-based phenotype harmonization for the study of complex traits is a useful step within genetic consortia.
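The abstract contrasts meta-analysis of per-cohort estimates with mega-analysis, which fits one model to the pooled individual-level data. It does not say how per-cohort estimates are combined, so the sketch below assumes inverse-variance weighted fixed-effect meta-analysis, a common default in GWAS consortia. It also computes Cochran's Q and I², which show how between-cohort heterogeneity (for example from differently measured phenotypes) surfaces in the combined estimate. The function name `fixed_effect_meta` and the input numbers are hypothetical illustrations, not the study's method or results.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis of one SNP.

    Hypothetical helper: combines per-cohort effect estimates (betas) and
    their standard errors (ses). Returns the pooled effect, its standard
    error, the z statistic, a two-sided p-value, Cochran's Q, and I^2.
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("need one standard error per effect estimate")
    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q: weighted squared deviations of cohort estimates from the pooled one.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2: share of total variation attributed to between-cohort heterogeneity.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return beta, se, z, p, q, i2


# Two hypothetical cohorts with similar estimates: little heterogeneity.
print(fixed_effect_meta([0.05, 0.03], [0.02, 0.02]))
# Two hypothetical cohorts whose estimates disagree: I^2 near 1.
print(fixed_effect_meta([0.10, -0.02], [0.01, 0.01]))
```

Under this weighting, heterogeneity that comes from inconsistent phenotype measurement is absorbed into Q and I² rather than removed, which is part of the motivation for harmonizing the phenotype before the genetic analysis.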
Keywords: Phenotype harmonization, Genome-wide association (GWA) study, Latent variable modeling (LVM), data integration, Consortia
Received: 04 Oct 2018;
Accepted: 05 Nov 2019.
Copyright: © 2019 Luningham, McArtor, Hendriks, van Beijsterveldt, Lichtenstein, Lundström, Larsson, Bartels, Boomsma and Lubke. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Dr. Justin M. Luningham, University of Notre Dame, Notre Dame, United States, email@example.com