# Including Phenotypic Causal Networks in Genome-Wide Association Studies Using Mixed Effects Structural Equation Models

^{1}Department of Animal Science, Faculty of Agriculture, Shahid Bahonar University of Kerman, Kerman, Iran^{2}Roslin Institute, University of Edinburgh, Midlothian, United Kingdom^{3}Department of Exact Sciences, University of São Paulo-Escola Superior de Agricultura Luiz de Queiroz, Piracicaba, Brazil^{4}Department of Animal Sciences, University of Wisconsin, Madison, WI, United States^{5}Department of Animal and Poultry Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States^{6}Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, United States^{7}Department of Dairy Science, University of Wisconsin, Madison, WI, United States

Network based statistical models accounting for putative causal relationships among multiple phenotypes can be used to infer single-nucleotide polymorphism (SNP) effect which transmitting through a given causal path in genome-wide association studies (GWAS). In GWAS with multiple phenotypes, reconstructing underlying causal structures among traits and SNPs using a single statistical framework is essential for understanding the entirety of genotype-phenotype maps. A structural equation model (SEM) can be used for such purposes. We applied SEM to GWAS (SEM-GWAS) in chickens, taking into account putative causal relationships among breast meat (BM), body weight (BW), hen-house production (HHP), and SNPs. We assessed the performance of SEM-GWAS by comparing the model results with those obtained from traditional multi-trait association analyses (MTM-GWAS). Three different putative causal path diagrams were inferred from highest posterior density (HPD) intervals of 0.75, 0.85, and 0.95 using the inductive causation algorithm. A positive path coefficient was estimated for BM → BW, and negative values were obtained for BM → HHP and BW → HHP in all implemented scenarios. Further, the application of SEM-GWAS enabled the decomposition of SNP effects into direct, indirect, and total effects, identifying whether a SNP effect is acting directly or indirectly on a given trait. In contrast, MTM-GWAS only captured overall genetic effects on traits, which is equivalent to combining the direct and indirect SNP effects from SEM-GWAS. Although MTM-GWAS and SEM-GWAS use the similar probabilistic models, we provide evidence that SEM-GWAS captures complex relationships in terms of causal meaning and mediation and delivers a more comprehensive understanding of SNP effects compared to MTM-GWAS. Our results showed that SEM-GWAS provides important insight regarding the mechanism by which identified SNPs control traits by partitioning them into direct, indirect, and total SNP effects.

## Introduction

Genome-wide association studies (GWAS) have become a standard approach for investigating relationships between common genetic variants in the genome (e.g., single-nucleotide polymorphisms, SNPs) and phenotypes of interest in human, plant, and animal genetics (Hayes and Goddard, 2010; Brachi et al., 2011; Wang et al., 2012). A typical GWAS is based on univariate linear or logistic regression of phenotypes on genotypes for each SNP individually while often adjusting for the presence of nuisance covariates (Hayes and Goddard, 2010; Sikorska et al., 2013). A statistically significant association indicates that SNPs may be in strong linkage disequilibrium (LD) with quantitative trait loci (QTL) that contribute to the trait etiology. Alternatively, multi-trait model GWAS (MTM-GWAS) can be used to test for genetic associations among a set of traits (Korte et al., 2012; O'Reilly et al., 2012; Zhou and Stephens, 2012). It has been established that MTM-GWAS reduces false positives and increases the statistical power of association tests, explaining the recent popularity of this method. MTM-GWAS can be used to study genetic associations among a set of traits. However, it does not consider various cryptic biological signals that may affect a trait of interest, either directly or indirectly through other intermediate traits.

Complex traits are the product of various cryptic biological signals that may affect a trait of interest either directly or indirectly through other intermediate traits (Falconer and Mackay, 1996). A standard regression cannot describe such complex relationships between traits and QTLs properly. For instance, some traits may simultaneously act as both dependent and independent variables. Structural equation modeling (SEM) is an extended version of Wright's path analysis (Wright, 1921; Gianola and Sorensen, 2004) that offers a powerful technique for modeling causal networks. In a complex genotype-phenotype setting involving many traits, a given trait can be influenced not only by genetic and systematic factors but also by other traits (as covariates). Here, QTLs may not affect the target trait directly; instead, the effects may be mediated by upstream traits in a causal network. Indirect effects may therefore constitute a proportion of perceived pleiotropy, and these concepts apply to sets of heritable traits, organized as networks, that are common in biological systems. An example from dairy cattle production systems, described by Gianola and Sorensen (2004), is that higher milk yield increases the risk of a particular disease, such as mastitis, while the prevalence of the disease may negatively affect milk yield As another example, Varona et al. (2007) explored a causal link from litter size to average piglet weight in two pig breeds. In humans, obesity is a key factor influencing insulin resistance, which subsequently causes type 2 diabetes. Lists of causal networks across human diseases and candidate genes are described in Kumar and Agrawal (2013) and Schadt (2016).

Although MTM-GWAS is a valuable approach, it only captures correlations or associations among traits and does not provide information about causal relationships. Knowledge of the causal structures underlying complex traits is essential, as correlation does not imply causation. For example, a correlation between two traits, T1 and T2, could be attributed to a direct effect of T1 on T2 or T2 on T1, or to additional variables that jointly influence both traits (Rosa et al., 2011). Likewise, if we know a “causal” SNP is linked to a QTL, we can imagine three possible scenarios with respect to T11: (1) causal (*SNP* → *T*1 → *T*2), (2) reactive (*SNP* → *T*2 → *T*1), or (3) independent (*T*1 ← *SNP* → *T*2). Scenarios (1) and (2) do not cause pleiotropy but produce association.

A SEM methodology has the ability to handle complex genotype-phenotype maps in GWAS, placing an emphasis on causal networks (Li et al., 2006). Therefore, SEM-based GWAS (SEM-GWAS) may provide a better understanding of biological mechanisms and of relationships among a set of traits than MTM-GWAS. SEM can potentially decompose the total SNP effect on a trait into direct and indirect (i.e., mediated) contributions. However, SEM-derived GWAS has yet not been discussed or applied fully in quantitative genetic studies yet. Our objective was to illustrate the potential utility of SEM-GWAS by using three production traits in broiler chickens genotyped for a battery of SNPs as a case example.

## Materials and Methods

### Data Set

The analysis included records for 1,351 broiler chickens provided by Aviagen Ltd. (Newbridge, Scotland) for three phenotypic traits: ultrasound of breast muscle (BM) at 35 days of age, body weight (BW), and hen-house egg production (HHP), defined as the total number of eggs laid between weeks 28 and 54 per bird. The sample consisted of 274 full-sib families, 326 sires, and 592 dams. More details regarding population and family structure were provided by Momen et al. (2017). A pre-correction procedure was performed on the phenotypes to account for systematic effects such as sex, hatch week, pen, and contemporary group for BM and BW. HHP was corrected for random hatch effects, with a general mean as the sole fixed effect.

Each bird was genotyped for 580,954 SNP markers with a 600k Affymetrix SNP (Kranis et al., 2013) chip (Affymetrix, Inc., Santa Clara, CA, USA). The Beagle software program (Browning and Browning, 2007) was used to impute missing SNP genotypes, and quality control was performed using PLINK version 1.9 (Purcell et al., 2007). Markers with minor allele frequencies (MAF) < 1%, call rate < 95%, and Hardy–Weinberg equilibrium (Chi-square test *p*-value threshold was 10^{−6}) were removed. The main reason for conducting the HWE test was to remove SNPs with potential genotyping error. Finally, 354,364 autosomal SNP markers were included in the analysis.

### Multiple-Trait Model for GWAS

MTM-GWAS is a single-trait GWAS model extended to multi-dimensional responses. When only considering additive effects of SNPs, the phenotype of a quantitative trait using the single-trait model can be described as:

where *y*_{i} is the phenotypic trait of individual *i*, *w*_{j} = (*w*_{1}, …, *w*_{p}) is the number of A alleles (i.e., *w*_{j} ∈ {0, 1, 2}) in the genotype of SNP marker *j*, and *s*_{j} is the allele substitution effect for SNP marker *j*. Strong LD between markers and QTLs coupled with an adequate marker density increases the chance of detecting marker and phenotype associations. Hypothesis testing is typically used to evaluate the strength of the evidence of a putative association. Typically, a *t*-test is applied to obtain *p*-values, and the statistic is${\text{}T}_{ij}=\frac{{\u015d}_{j}}{se({\u015d}_{j})}$, where ŝ is the point estimate of the *j-**th* SNP effect and *se*(ŝ_{j}) is its standard error.

The single locus model described above is naive for a complex trait because the data typically contain hidden population structure and individuals have varying degrees of genetic similarity (Listgarten et al., 2012; Gianola et al., 2016). Therefore, accounting for covariance structure induced by genetic similarity is expected to produce better inferences (Kennedy et al., 1992). Ignoring effects that reveal genetic relatedness inflates the residual terms and compromises the ability to detect association. A random effect *g*_{i}, including a covariance matrix reflecting pairwise similarities between additive genetic effects of individuals, can be included to control population stratification. The similarity metrics can be derived from pedigree information or from whole-genome marker genotypes. This model, extended for analysis of *t* traits, is given by:

where ** Y** is the pre-adjusted phenotypic value measured on each birds,

**as previously defined, represent the incidence matrix of genotype codes,**

*W***is the vector of additive marker effect,**

*s***is the vector of random polygenic effect, $g~N(0,{\displaystyle {\sum}_{g}\otimes K})$, and**

*g***ε**represents the residual vector, $\epsilon ~N(0,\text{\hspace{0.17em}}{\displaystyle {\sum}_{\epsilon}\otimes I})$. Here ⊗ denotes the Kronecker product. The covariance matrices were:

The positive definite matrix **K** may be a genomic relationship matrix (**G**) computed from marker data, or a pedigree-based matrix (**A**) computed from genealogical information. The **A** matrix describes the expected additive similarity among individuals, while **G** measures the realized fraction of alleles shared. Genomic relationship matrices can be derived in several ways (VanRaden, 2008; Yang et al., 2010; Forni et al., 2011). Here, we used the form proposed by VanRaden (2008):

where **M** is an *n* × *p* matrix of centered SNP genotypes and *p*_{j} and *q*_{j} = 1 − *p*_{j} are the allele frequencies at marker locus *j*. We evaluated both **A** and **G** in the present study.

### Structural Equation Model Association Analysis

A SEM consists of two essential parts: a measurement model and a structural model. The measurement model depicts the connections between observable variables and their corresponding latent variables (Anderson and Gerbing, 1988). The measurement model is also known as confirmatory factor analysis. The critical part of a SEM is the structural model, which can have three forms (Raykov and Marcoulides, 2012). The first consists of observable exogenous and endogenous variables. This model is a restricted version of a SEM known as path analysis (Wright, 1921). The second form explains the relationship between exogenous and endogenous variables that are only latent. The third type is a model consisting of both manifest and latent variables.

SEM can be applied to GWAS as an alternative to MTM-GWAS to study how different causal paths mediate SNP effects on each trait. The following SEM model was considered:

where **Λ** is a *t*×*t* matrix of regression coefficients or structural coefficients (typically lower-triangular) according to the learned causal structure from the residuals and the diagonal matrix filled with zeros:

The vectors **g** and **ε** are assumed to have a joint distribution $\left[\begin{array}{c}g\\ \epsilon \end{array}\right]=\text{}N\left\{\left[\begin{array}{c}0\\ 0\end{array}\right],\left[\begin{array}{ll}{\displaystyle {\sum}_{g}\otimes K}\hfill & 0\hfill \\ 0\hfill & \Psi \hfill \end{array}\right]\right\}$, and the residual covariance matrix is a diagonal as $\Psi =\left[\begin{array}{lll}{\sigma}_{{\epsilon}_{(BM)}}^{2}\hfill & 0\hfill & 0\hfill \\ 0\hfill & {\sigma}_{{\epsilon}_{(BW)}}^{2}\hfill & 0\hfill \\ 0\hfill & 0\hfill & {\sigma}_{{\epsilon}_{(HHP)}}^{2}\hfill \end{array}\right]$. The remaining terms are as presented earlier with one important difference: the SNP effects are not interpreted as overall effects on trait *t* but instead represent direct effects on trait *t*. Additional indirect effects from the same SNP may be mediated by phenotypic traits in *C*. Each marker is entered into Equation (4) separately, and its significance is tested. For a discussion of how SEM represents genetic signals on each trait through multiple causal paths, see Wu et al. (2010) and Jamrozik and Schaeffer (2011). Despite the difference in interpretation, the distribution of the vector of polygenic effects is assumed to be the same as in the MTM-GWAS model. The same applies to residual terms within a trait. We also consider trait-specific residuals to be independent within an individual. This restriction is required to render structural coefficients likelihood-identifiable. In addition, the interpretation of inferences as having a causal meaning requires imposing the restriction that the residuals' joint distribution be interpreted as the causal sufficiency assumption (Pearl, 2009). In the present study, all exogenous and endogenous variables were observable, and there was no latent variable. Hence, causal structure was assumed between the endogenous variables BM, BW, and HHP.

We considered the following GWAS models with their causal structures were recovered by the inductive causation (IC) algorithm (Pearl, 2009): (1) MTM-GWAS with pedigree-based kinship **A** (MTM-A) or marker-based kinship **G** (MTM-G), and (2) SEM-GWAS with **A** (SEM-A) or **G** (SEM-G). Although nuisance covariates such as environmental factors can be omitted in the graph, they may be incorporated into the models as exogenous variables. The SEM representation allowed us to decompose SNP effects into direct, indirect, and total effects.

A direct SNP effect is the path coefficient between a SNP as an exogenous variable and a dependent variable without any causal mediation by any other variable. The indirect effects of a SNP are those mediated by at least one other intervening endogenous variable. Indirect effects are calculated by multiplying path coefficients for each path linking the SNP to an associated variable, and then summing over all such paths (Mi et al., 2010a; Jiang et al., 2013). The overall effect is the sum of all direct and indirect effects. By explicitly accounting for complex relationship structure among traits in such a way, SEM provides a better understanding of a genome-wide SNP analysis by allowing us to decompose effects into direct, indirect, and overall effects within a predefined casual framework (Nock and Zhang, 2011). MTM-GWAS and SEM-GWAS were compared with the logarithm of the likelihood function (log L), Akaike's Information Criterion (AIC), and the Bayesian Information Criterion (BIC). The model providing the lowest values for these information criteria is considered to fit the data better. MTM-GWAS and SEM-GWAS were fitted using the SNP Snappy strategy (Meyer and Tier, 2012), which is implemented in the Wombat software program (Meyer, 2007). The outputs were a vector of multiple SNP effect estimates, ŝ = [ŝ_{BM}, ŝ_{BW}, ŝ_{HHP}], with corresponding standard errors and respective t-values.

### Searching for a Phenotypic Causal Network in a Mixed Model

In the SEM-GWAS formulation described earlier, the structure of the underlying causal phenotypic network needs to be known. Because this is not so in practice, we used a causal inference algorithm to infer the structure. Residuals are assumed to be independent in all SEM analyses, so associations between observed traits are viewed as due to causal links between traits and by correlations among genetic values (i.e., *g*_{1}, *g*_{2}, and *g*_{3}). Thus, to eliminate confounding problems when inferring the underlying network among traits, we used the approach of Valente et al. (2010) to search for acyclic causal structures through conditional independencies on the distribution of the phenotypes, given the genetic effects. A causal phenotypic network was inferred in two stages: (1) an MTM model (Henderson and Quaas, 1976) was employed to estimate covariance matrices of additive genetic effects and of residuals, and (2) the causal structure among phenotypes from the covariance matrix between traits, conditionally on additive genetic effects, was inferred by the IC algorithm. The residual (co)variance matrix was inferred using Bayesian Markov-chain Monte Carlo (Valente et al., 2010; Wu et al., 2010), with samples drawn from the posterior distribution. The reason for our use of the residual (co)covariances is that the residual structure could bear information from the joint distribution of all phenotypic traits conditional on their polygenic effects, such that they correct the confounding issues caused by such effects when the traits are genetically correlated (Pearl, 2009). For each query testing statistical independence between traits *y*_{t} and ${y}_{{t}^{{}^{\prime}}}$, the posterior distribution of the residual partial correlation ${\rho}_{{y}_{t},{y}_{{t}^{{}^{\prime}}}}|h$ was obtained, where *h* is a set of variables (traits) that are independent. Three highest posterior density (HPD) intervals of 0.75, 0.85, and 0.95 were used to make statistical decisions for SEM-GWAS. We thus considered SEM-A75 (HPD > 0.75), SEM-A85 (HPD > 0.85), SEM-A95 (HPD > 0.95), and SEM-G75 (HPD > 0.75). An HPD interval that does not contain zero declares *y*_{t} and ${y}_{{t}^{{}^{\prime}}}$ to be conditionally dependent.

## Results

Figure 1 shows phenotypic relationship structures recovered by the IC algorithm for the three different HPD intervals. Edges connecting two traits represent non-null partial correlations as indicated by HPD intervals. We compared the two MTM-GWAS and four SEM-GWAS by using the three chicken traits (BW, BM, and HHP). Fully recursive (there is at least one incoming OR outgoing edge for each node) SEM-A75 and SEM-G75 graphs revealed direct effects of BM on BW and HHP, and those of BW on HHP, as well as an indirect effect of BM on HHP. In addition, SEM-A85 detected a direct effect of BM on BW, the direct effect of BW on HHP, and the indirect effect of BM on HHP mediated by BW. Finally, SEM-A95 only identified a direct effect of BM on BW because of a statistically stringent HPD cutoff imposed. SEM-G85 and SEM-G95 were not explored further because they produced the same results as SEM-A85 and SEM-A95.

**Figure 1**. Causal graphs inferred using the IC algorithm among three traits: breast meat (BM), body weight (BW), and hen-house production (HHP) in the chicken data. SEM-A75 and SEM-G75 were the inferred fully recursive causal structures with HPD > 0.75 and corrected for genetic confounder using A (pedigree-based) and G (marker-based) matrices. SEM-A85 and SEM-A95 were obtained with HPD > 0.85 and HPD > 0.95, respectively, corrected with A. Arrows indicate direction of causal relationships. Dashed lines indicate negative coefficients, and the continuous arrows indicate positive coefficients.

Given the causal structures inferred from the IC algorithm, the following SEM was fitted:

Note that only a small number of the entries in the structural coefficient matrix (λ in Equation 5) are non-zero due to sparsity. These non-zero entries specify the effect of one phenotype on other phenotypes. The corresponding directed acyclic graph is shown in Figure 2 assuming the causal relationships among the three traits, where y_{1}, y_{2}, and y_{3} represent BM, BW, and HHP, respectively; SNP_{j} is the genotype of the *j-*th SNP; s_{jl} is the direct SNP effect on trait l; and the remaining variables are as presented earlier. This diagram depicts a fully recursive structure in which all recursive relationships among the three phenotypic traits are shown. Arrows represent causal connections, whereas double-headed arrows between polygenic effects are correlations.

**Figure 2**. A diagram for causal path analysis of SNP effects in a fully recursive structural equation model for three traits, *p* exogenous independent SNP variables, and three correlated polygenic effects. Arrows indicate the direction of causal effects and dashed lines represent associations among the three phenotypes. Genetic correlation between traits (*r*_{g}), polygenic effects (*g*_{t}), environmental effect on trait *t* (*e*_{t}), effects of *j th* SNP on *t th* trait (*S*_{j(yt)}), and recursive effect of phenotype *t*′ on phenotype *l* (${\lambda}_{t,{t}^{{}^{\prime}}}$). Dashed lines indicate negative coefficients and the continuous arrows indicate positive coefficients.

We examined the fit of each model implemented to assess how well it describes the data (Table 1). Varona et al. (2007) and recently Valente et al. (2013) showed that re-parametrization and reduction of a SEM mixed model yield the same joint probability distribution of observation as in MTM, suggesting that the expected likelihood of SEM and MTM should be similar. As expected, SEM-GWAS and MTM-GWAS showed very similar results (e.g., SEM-A75 vs. MTM-A and SEM-G75 vs. MTM-G). Among the models considered, those involving **G** exhibited slightly better fits. SEM-A85 and SEM-A95, sharing a subset of the SEM-A75 structure, presented almost identical AIC and BIC values. Since these results imply that the recursive model and standard mixed model for GWAS are statistically equivalent in terms of the fitting criteria, the focus of the remainder of the analysis will be on the modeling of SNP (or QTL) effects in the SEM context (SEM-A75 or SEM-G75) as an extension of MTM, which accounts for recursive links among the three measured traits.

**Table 1**. Model comparison criteria: logarithm of the restricted maximum likelihood function (log L), Akaike's information criteria (AIC), Schwarz Bayesian information criteria (BIC) were used evaluate model fit for two multiple trait models (MTM) and four structural equation models (SEM).

### Structural Coefficients

Table 2 presents the causal structural path coefficients for endogenous variables (BM, BW, and HHP). All models have positive effects for BM → BW, whereas the BM → HHP and BW → HHP relationships have negative path coefficients. The latter confirmed the fact that chicken breeding is divided into broiler and layer sections due to the negative genetic correlation between BW and HHP.

**Table 2**. Estimates of three causal structural coefficients (λ) derived from four different structural models.

Also shown in Table 2 are the magnitudes of the SEM structural coefficient reflecting the intensity of the causality. The positive coefficient λ_{21} quantifies the (direct) causal effect of BM on BW. This suggests that a 1-unit increase in BM results in a λ_{21}-unit increase in BW. Likewise, the negative causal effects λ_{31} and λ_{32} offer the same interpretation.

### Decomposition of SNP Effect Paths Using a Fully Recursive Model

We can decompose SNP effects into direct and indirect effects using Figure 2. The direct effect of the SNP j on y_{3} (HHP) is given by d_{SNPj → y3}: Ŝ_{j(y3)}, where d denotes the direct effect. Note there are only one direct and many indirect paths. We find three indirect paths from SNP_{j} to y_{3} mediated by y_{1} and y_{2} (i.e., the nodes formed by other traits). The first indirect effect is ind_{(1)SNPj → y3}: λ_{32}(λ_{21}Ŝ_{j(y1)}) in the path mediated by y_{1} and y_{2}, where ind denotes the indirect effect. The second indirect effect ind_{(2)SNPj → y3}: λ_{32}Ŝ_{j(y2)}, is mediated by y_{2}. The last indirect effect, is ind_{(3)SNPj → y3}: λ_{31}Ŝ_{j(y1)}, mediated by *y*_{1}. Therefore, the overall effect is given by summing all four paths, T_{SNPj → y3}: λ_{32}(λ_{21}Ŝ_{j(y1)}) +λ_{32}Ŝ_{j(y2)}+λ_{31}Ŝ_{j(y1)}+Ŝ_{j(y3)}. The fully recursive model of the overall SNP effect is then:

For y_{1} (BM), there is only one effect, so the overall effect is equal to the direct effect. For y_{2} (BW) and y_{3} (HHP), direct and indirect SNP effects are involved. There are two paths for y_{2}: one indirect, ind_{Sj → y2}:Ŝ_{j(y1)} → y_{1} → y_{2}, and one direct, d_{Sj → y2}:Ŝ_{j(y2)} → y_{2}. Here, the SNP effect is direct and mediated thorough other phenotypes according to causal networks in SEM-GWAS (Figures 1, 2). For instance, the overall SNP effect for y_{3} into four direct and indirect paths is T_{Ŝj → y3}:λ_{32}λ_{21}Ŝ_{j(y1)}+λ_{32}Ŝ_{j(y1)}+λ_{31}Ŝ_{j(y1)}+Ŝ_{j(y3)}.

The scatter plots in Figure 3 compare the estimated total effects for HHP (T_{Ŝj → y3}) obtained from SEM-GWAS and those from MTM-GWAS. We observed good agreement between SEM-GWAS and MTM-GWAS. The total SNP signals derived from SEM and MTM are the same but SEM provides biologically relevant additional information.

**Figure 3**. Comparison of multiple trait (MTM) and fully recursive overall SNP effects obtained with **A** (pedigree-based) and **G** (marker-based) from structural equation modeling (SEM)-based GWAS. Overall effects in SEM are the sum of all direct and indirect effects. HHP, hen-house egg production.

Figures S1–S4 present scatter plots of MTM-GWAS and SEM-GWAS signals (SEM-A75, SEM-G75, SEM-A85, and SEM-A95) for the BM → BW path, which was a common path across all SEM-GWAS considered. These two traits have a genetic correlation of 0.5 (results not shown). We partitioned the SEM causal link into direct, indirect, and overall effects based on directed links inferred from the IC algorithm with HPD > 0.85, whereas MTM-GWAS captures an overall SNP effect on BW. Scatter plots of the overall effects from SEM-GWAS and those of the total effects from MTM-GWAS indicated almost perfect agreement (top left plots, Figures S1–S4). We also observed concomitance between estimated overall and direct effects (top right plots, Figures S1–S4). In contrast, there was less agreement in the magnitude of the SNP effects when comparing overall vs. indirect effects (bottom left plots, Figures S1–S4). There was no linear relationship between the indirect and direct SNP effects (bottom right plots, Figures S1–S4). In short, genetic signals detected in SEM-GWAS were close to those of MTM-GWAS for overall effects because both models are based on a multivariate approach with the same covariance matrix. In all SEM-GWAS, results showed that direct effects contributed to overall effects more than the indirect effects.

### Manhattan Plot of Direct, Indirect, and Overall SNP Effects

Figure 4 depicts a Manhattan plot summarizing the magnitude of direct (SEM-75A), indirect (SEM-75A), and overall SNP effects (MTM-75A). We plotted the decomposed SNP effects on BW along chromosomes to visualize estimated marker effects from SEM-GWAS and MTM-GWAS. The indirect and direct effects provide a view of SNP effects from a perspective that is not available for the total effect of MTM-GWAS. For instance, there were two estimated SNP effects on chromosomes 1 and 2 that deserve particular attention. These two SNPs are highlighted with black circles and red ovals. The overall effect of the first SNP consisted of large indirect and small direct effects on BM, whereas the opposite pattern was observed for the second SNP, which showed large direct and small indirect effects. Although the overall effects of these SNPs were similar (top Manhattan plot, Figure 4), use of decomposition allowed us to determine that the trait of interest is affected in different manners: the second SNP effect acted directly on BW without any mediation by BM, whereas the first SNP reflected a large effect mediated by BM on BW. Collectively, new insight regarding the direction of SNP effects can be obtained using the SEM-GWAS methodology.

**Figure 4**. Manhattan plot showing overall, direct and indirect SNP effects using a full recursive model based on **A** matrix for body weight (BW).

The corresponding Manhattan plot based on –log_{10} (*p*-values) is shown in Figure S5. As with the magnitude of effect sizes, the results showed that –log_{10} (*p*-values) of estimated overall effects from SEM-A75 and those from MTM-A75 yielded the same significant peaks. We found that some significant indirect SNP effects reached genome-wide significance after correction for multiple-testing using a 5% FDR threshold level (2.752). The most significant SNPs were on chromosomes 1 and 4 (GGA1 and GGA4).

As an illustration, the six most significant SNPs with the highest –log_{10} (*p*-values) for each type of decomposed SNP effect are presented in Table 3. Seven candidate genes were identified near the significant SNPs derived from the SNP effects decomposition, with two on GGA7 (*OLA1* and *ZNF385B*), one on GGA3 (*EPHA7*), three on GGA4 (*LOC422264, LOC422265*, and *MAEA*), and one on GGA14 (*GRIN2A*). We found that only genes on GGA4 and GGA1 are linked to significant indirect SNP effects that impact HHP. Some studies reported QTLs for BM on GGA1 and for BW on GGA4, stating that these genomic regions contain QTLs related to abdominal fat and growth traits that were detected across diverse chicken populations (Sun et al., 2013; Van Goor et al., 2015). One of the two detected genes on GGA14, i.e., *GRIN2A*, which was linked to the SNP Gga_rs313620413, showed significant direct and overall SNPs effects using SEM as well as MTM. Collectively, Gga_rs15390496, Gga_rs16591372, and Gga_rs313620413 SNPs on GGA3, GGA7, and GGA14, which were linked to *EPHA7, OLA1*, and *GRIN2A*, respectively, represent candidate genes identified from overall effects of both SEM and MTM (Table 3).

**Table 3**. Six most significant SNPs selected according –log_{10} (*p*-values) and their effects, using the full recursive SEM (SEM-A75) and MTM (MTM-A75).

We noted that the six SNPs selected according to the –log_{10} (*p*-values) from the direct effect on HHP (i.e., d_{SNPj → y(HHP)}) had small indirect effects ranging from -0.9018 to 0.2983. These indirect effects were negligible compared with their corresponding direct and total effects. Also, exploring the indirect effect sizes of the six most significant SNPs showed that indirect effects that are transmitted through inferred causal networks have the ability to change the magnitude of overall SNP effects, even changing them to the opposite direction (i.e., from positive to negative or vice versa).

It should also be noted that the estimated additive SNP effects obtained from the four SEM-GWAS can be used for inferring pleiotropy. For instance, a pleiotropic QTL may have a large positive direct effect on BW but may exhibit a negative indirect effect coming from BM, which in turn reduces the total QTL effect on BW. Arguably, the methodology employed here would be most effective when the direct and indirect effects of a QTL are in opposite directions. If the direct and indirect QTL effects are in the same direction, the power of SEM-GWAS may be the same as the overall power of MTM-GWAS. The overall effect (T_{Ŝj → y(HHP)}) of a given SNP consisted of large indirect (ind_{Ŝj → y(HHP)}) and small direct (d_{Ŝj → y(HHP)}) effects on HHP, as observed for the top most significant indirect SNPs localized on GGA4 and GAA1, whereas the opposite pattern was observed for the most significant direct SNPs on GAA3, GGA7, and GGA14, which showed large direct and small indirect effects. Although the overall effects of these SNPs from SEM-GWAS and MTM-GWAS were similar, the use of decomposition allowed us to determine that the trait of interest is affected in different manners. For instance, a given SNP effect may largely act directly on HHP without any mediation by BM and BW, whereas another SNP may be transmitting a large effect through a causal path mediated by BM and BW. Collectively, new insight regarding the direction of SNP effects can be obtained using the SEM-GWAS methodology.

## Discussion

It is becoming increasingly common to analyze a set of traits simultaneously in GWAS by leveraging genetic correlations between traits (Gao et al., 2014; Wu and Pankow, 2017). In the present study, we illustrated the potential utility of a SEM-based GWAS approach for causal inference and mediation analysis of SNP effects, which has the potential advantage of embedding a pre-inferred causal structure across phenotypic traits (Valente et al., 2010). SEM-GWAS, as an extension of standard MTM, accounts for recursive linking of mediating variables that could be either dependent or independent with restriction on a residual covariance. This is a useful approach when multiple mediators influence the final outcomes via either common or distinct biological pathways (Barfield et al., 2017; Bellavia and Valeri, 2017). SEM-GWAS is achieved by first inferring the structure of networks between phenotypic traits. For this purpose, we used a modified version of the IC algorithm described by Pearl (2009) and modified for implementing in quantitative genetics by Valente et al. (2010). The IC algorithm was used to explore putative causal links among phenotypes obtained from a residual covariance matrix, in a model that accounted for systematic and genetic confounding factors such as polygenic additive effects. It then produced a posterior distribution of partial residual correlations between any possible pairs of variables. Three different causal path diagrams were inferred from HPD intervals of 0.75, 0.85, and 0.95. We observed that the number of identified paths decreased with an increase in the HPD interval value. Only a path connecting BM and BW was present in all HPD intervals considered. Moreover, we found that the partial residual correlation between BM and HHP was weaker than that between BM and BW. This may explain why the path between BM and HHP was not detected with HPD intervals larger than 0.75.

The primary purpose of estimating the goodness of fit criterions was to determine whether full recursive SEM and MTM models with different assumptions yield the same or nearly the same BIC and AIC scores. Because our results showed that SEM and MTM produced nearly the same goodness of fit criterions, we conclude that the essential difference between these models cannot be articulated in terms of an expressive power of joint distributions or goodness of fit (Valente et al., 2013).

Estimated path coefficients reflect the strength of each causal link, quantifying the proportion of direct and indirect effects of a given SNP or genes on the outcome of interest via the mediator phenotypic traits or the predefined causal pathway between a set of mediators and the target outcome. For instance, a positive path coefficient from BM to BW suggests that a unit increase in BM directly results in an increase in BW. Our results showed that MTM-GWAS and SEM-GWAS were similar in terms of the goodness of fit as per the AIC and BIC criteria. This finding is in agreement with theoretical work of Gianola and Sorensen (2004) and Varona et al. (2007) showing the equivalence between models. Thus, MTM-GWAS and SEM-GWAS produced the same marginal phenotypic distributions and goodness of fit values. A similar approach has been proposed by Li et al. (2006), Mi et al. (2010b), and Wang and van Eeuwijk (2014). The main difference between our approach and theirs is that they used SEM in the context of standard QTL mapping, whereas our SEM-GWAS is developed for GWAS based on a linear mixed model.

The results obtained in this study using the three economic traits in chickens suggest that causal inference and the SEM framework can be used for a set of phenotypes by considering both the raw and partial correlation relationships among traits in breeding programs. For example, in model SEM-A85, BM and HHP are unconditionally independent. However, conditioning on BW results in a non-zero partial correlation. Conditioning on BW breaks the causal chain from BM to HHP as observed in the case of full recursive models (SEM-A75 and SEM-G75) and their partial correlation becomes non-zero. This indicates that when all three variables are causally connected, both raw and partial correlations will all be non-zero, but they will change the magnitude depending on the signs of the path coefficients.

The advantage of SEM-GWAS over MTM-GWAS is that the former decomposes SNP effects by tracing inferred causal networks. Our results showed that by partitioning SNP effects into direct, indirect, and total components, an alternative perspective of SNP effects can be obtained. As shown in Table 3 and Figure 4, direct and indirect effects may differ in magnitude and sign, acting in the same direction or in an antagonistic manner. Note that the total SNP effects inferred from SEM-GWAS were the same as the estimated SNP effects from MT-GWAS (Table 3). However, knowledge derived from the decomposition of SNP effects may be critical for animal and plant breeders in breaking unfavorable indirect QTL effects by reducing the frequency of undesired alleles or obtaining better SNP effect estimates than those from MTM-GWAS (e.g., Mi et al., 2010b).

## Conclusion

SEM offers insights into how phenotypic traits relate to each other. We illustrated potential advantages of SEM-GWAS relative to the commonly used standard MTM-GWAS by using three chicken traits as an example. SNP effects pertaining to SEM-GWAS have a different meaning than those in MTM-GWAS. Our results showed that SEM-GWAS enabled the identification of whether a SNP effect is acting directly or indirectly, i.e., mediated, on given trait. In contrast, MTM-GWAS only captures overall genetic effects on traits, which is equivalent to combining direct and indirect SNP effects from SEM-GWAS together. Thus, SEM-GWAS offers more information and provides an alternative view of putative causal networks, enabling a better understanding of the genetic quiddity of traits at the genomic level.

## Author Contributions

MM carried out the study and wrote the first draft of the manuscript. GR and DG designed the experiment, supervised the study, and critically contributed to the final version of manuscript. GM contributed to the interpretation of results, provided critical insights, and revised the manuscript. BV and AA participated in discussion and reviewed the manuscript. MA, AK, and RM contributed materials and revised the manuscript. All authors read and approved the final manuscript.

## Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

## Acknowledgments

MM wishes to acknowledge the Ministry of Science, Research and Technology of Iran for financially supporting his visit to the University of Wisconsin-Madison. Work was partially supported by the Wisconsin Agriculture Experiment Station under hatch grant 142-PRJ63CV to DG.

## Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2018.00455/full#supplementary-material

## References

Anderson, J. C., and Gerbing, D. W. (1988). Structural equation modeling in practice: a review and recommended two-step approach. *Psychol. Bull.* 103:411. doi: 10.1037/0033-2909.103.3.411

Barfield, R., Shen, J., Just, A. C., Vokonas, P. S., Schwartz, J., Baccarelli, A. A., et al. (2017). Testing for the indirect effect under the null for genome-wide mediation analyses. *Genet. Epidemiol.* 41, 824–833. doi: 10.1002/gepi.22084

Bellavia, A., and Valeri, L. (2017). Decomposition of the total effect in the presence of multiple mediators and interactions. *Am. J. Epidemiol.* 187, 1311–1318. doi: 10.1093/aje/kwx355

Brachi, B., Morris, G. P., and Borevitz, J. O. (2011). Genome-wide association studies in plants: the missing heritability is in the field. *Genome Biol.* 12:232. doi: 10.1186/gb-2011-12-10-232

Browning, S. R., and Browning, B. L. (2007). Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. *Am. J. Hum. Genet.* 81, 1084–1097. doi: 10.1086/521987

Falconer, D. S., and Mackay, T. F. (1996). Introduction to quantitative genetics (4th edn). *Trends Genet.* 12:280.

Forni, S., Aguilar, I., and Misztal, I. (2011). Different genomic relationship matrices for single-step analysis using phenotypic, pedigree and genomic information. *Genet. Sel. Evol.* 43:1. doi: 10.1186/1297-9686-43-1

Gao, H., Zhang, T., Wu, Y., Wu, Y., Jiang, L., Zhan, J., et al. (2014). Multiple-trait genome-wide association study based on principal component analysis for residual covariance matrix. *Heredity* 113, 526–532. doi: 10.1038/hdy.2014.57

Gianola, D., Fariello, M. I., Naya, H., and Schön, C.-C. (2016). Genome-wide association studies with a genomic relationship matrix: a case study with wheat and arabidopsis. *G3: Genes Genomes Genet.* 6, 3241–3256. doi: 10.1534/g3.116.034256

Gianola, D., and Sorensen, D. (2004). Quantitative genetic models for describing simultaneous and recursive relationships between phenotypes. *Genetics* 167, 1407–1424. doi: 10.1534/genetics.103.025734

Hayes, B., and Goddard, M. (2010). Genome-wide association and genomic selection in animal breeding. *Genome* 53, 876–883. doi: 10.1139/G10-076

Henderson, C., and Quaas, R. (1976). Multiple trait evaluation using relatives' records. *J. Anim. Sci.* 43, 1188–1197. doi: 10.2527/jas1976.4361188x

Jamrozik, J., and Schaeffer, L. (2011). Alternative parameterizations of the multiple-trait random regression model for milk yield and somatic cell score via recursive links between phenotypes. *J. Animal Breed. Genet.* 128, 258–266. doi: 10.1111/j.1439-0388.2011.00918.x

Jiang, G., Chakraborty, A., Wang, Z., Boustani, M., Liu, Y., Skaar, T., et al. (2013). New aQTL SNPs for the CYP2D6 identified by a novel mediation analysis of genome-wide SNP arrays, gene expression arrays, and CYP2D6 activity. *BioMed. Res. Int.* 2013:493019. doi: 10.1155/2013/493019

Kennedy, B. W., Quinton, M., and van Arendonk, J. A. M. (1992). Estimation of effects of single genes on quantitative traits. *J. Animal Sci.* 70, 2000–2012. doi: 10.2527/1992.7072000x

Korte, A., Vilhjálmsson, B. J., Segura, V., Platt, A., Long, Q., and Nordborg, M. (2012). A mixed-model approach for genome-wide association studies of correlated traits in structured populations. *Nat. Genet.* 44, 1066–1071. doi: 10.1038/ng.2376

Kranis, A., Gheyas, A. A., Boschiero, C., Turner, F., Yu, L., Smith, S., et al. (2013). Development of a high density 600K SNP genotyping array for chicken. *BMC Genomics* 14:59. doi: 10.1186/1471-2164-14-59

Kumar, S., and Agrawal, S. (2013). “Disease-oriented causal networks,” in *Encyclopedia of Systems Biology* (New York, NY: Springer), 593–594.

Li, R., Tsaih, S.-W., Shockley, K., Stylianou, I. M., Wergedal, J., Paigen, B., et al. (2006). Structural model analysis of multiple quantitative traits. *PLoS Genet.* 2:e114. doi: 10.1371/journal.pgen.0020114

Listgarten, J., Lippert, C., Kadie, C. M., Davidson, R. I., Eskin, E., and Heckerman, D. (2012). Improved linear mixed models for genome-wide association studies. *Nat. Methods* 9, 525–526. doi: 10.1038/nmeth.2037

Meyer, K. (2007). WOMBAT: a tool for mixed model analyses in quantitative genetics by restricted maximum likelihood (REML). *J. Zhejiang Univ. Sci. B* 8, 815–821. doi: 10.1631/jzus.2007.B0815

Meyer, K., and Tier, B. (2012). “SNP Snappy”: a strategy for fast genome-wide association studies fitting a full mixed model. *Genetics* 190, 275–277. doi: 10.1534/genetics.111.134841

Mi, X., Eskridge, K., Wang, D., Baenziger, P. S., Campbell, B. T., Gill, K. S., et al. (2010a). Bayesian mixture structural equation modelling in multiple-trait QTL mapping. *Genet. Res.* 92, 239–250. doi: 10.1017/S0016672310000236

Mi, X., Eskridge, K., Wang, D., Baenziger, P. S., Campbell, B. T., Gill, K. S., et al. (2010b). Regression-based multi-trait QTL mapping using a structural equation model. *Stat. Appl. Genet. Mol. Biol.* 9:38. doi: 10.2202/1544-6115.1552

Momen, M., Mehrgardi, A. A., Sheikhy, A., Esmailizadeh, A., Fozi, M. A., Kranis, A., et al. (2017). A predictive assessment of genetic correlations between traits in chickens using markers. *Genet. Sel. Evol.* 49:16. doi: 10.1186/s12711-017-0290-9

Nock, N., and Zhang, L. (2011). Evaluating aggregate effects of rare and common variants in the 1000 genomes project exon sequencing data using latent variable structural equation modeling. *BMC Proc.* 5 (Suppl. 9):S47. doi: 10.1186/1753-6561-5-S9-S47

O'Reilly, P. F., Hoggart, C. J., Pomyen, Y., Calboli, F. C., Elliott, P., Jarvelin, M.-R., et al. (2012). MultiPhen: joint model of multiple phenotypes can increase discovery in GWAS. *PLoS ONE* 7:e34861. doi: 10.1371/journal.pone.0034861

Pearl, J. (2009). Causal inference in statistics: an overview. *Stat. Surveys* 3, 96–146. doi: 10.1214/09-SS057

Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M. A., Bender, D., et al. (2007). PLINK: a tool set for whole-genome association and population-based linkage analyses. *Am. J. Human Genet.* 81, 559–575. doi: 10.1086/519795

Raykov, T., and Marcoulides, G. A. (2012). *A first Course in Structural Equation Modeling*. Mahwah, NJ Routledge.

Rosa, G. J., Valente, B. D., de los Campos, G., Wu, X.-L., Gianola, D., and Silva, M. A. (2011). Inferring causal phenotype networks using structural equation models. *Genet. Sel. Evol.* 43:6. doi: 10.1186/1297-9686-43-6

Schadt, E. E. (2016). “Chapter 10 - Reconstructing causal network models of human disease A2 - Lehner, Thomas,” in *Genomics, Circuits, and Pathways in Clinical Neuropsychiatry*, eds B. L. Miller and M. W. State (San Diego, CA: Academic Press), 141–160.

Sikorska, K., Lesaffre, E., Groenen, P. F., and Eilers, P. H. (2013). GWAS on your notebook: fast semi-parallel linear and logistic regression for genome-wide association studies. *BMC Bioinform.* 14:166. doi: 10.1186/1471-2105-14-166

Sun, Y., Zhao, G., Liu, R., Zheng, M., Hu, Y., Wu, D., et al. (2013). The identification of 14 new genes for meat quality traits in chicken using a genome-wide association study. *BMC Genomics* 14:458. doi: 10.1186/1471-2164-14-458

Valente, B. D., Rosa, G. J., de los Campos, G., Gianola, D., and Silva, M. A. (2010). Searching for recursive causal structures in multivariate quantitative genetics mixed models. *Genetics* 185, 633–644. doi: 10.1534/genetics.109.112979

Valente, B. D., Rosa, G. J., Gianola, D., Wu, X.-L., and Weigel, K. (2013). Is structural equation modeling advantageous for the genetic improvement of multiple traits? *Genetics* 194, 561–572. doi: 10.1534/genetics.113.151209

Van Goor, A., Bolek, K. J., Ashwell, C. M., Persia, M. E., Rothschild, M. F., Schmidt, C. J., et al. (2015). Identification of quantitative trait loci for body temperature, body weight, breast yield, and digestibility in an advanced intercross line of chickens under heat stress. *Genet. Sel. Evol.* 47:96. doi: 10.1186/s12711-015-0176-7

VanRaden, P. M. (2008). Efficient methods to compute genomic predictions. *J. Dairy Sci.* 91, 4414–4423. doi: 10.3168/jds.2007-0980

Varona, L., Sorensen, D., and Thompson, R. (2007). Analysis of litter size and average litter weight in pigs using a recursive model. *Genetics* 177, 1791–1799. doi: 10.1534/genetics.107.077818

Wang, H., Misztal, I., Aguilar, I., Legarra, A., and Muir, W. (2012). Genome-wide association mapping including phenotypes from relatives without genotypes. *Genet. Res.* 94, 73–83. doi: 10.1017/S0016672312000274

Wang, H., and van Eeuwijk, F. A. (2014). A new method to infer causal phenotype networks using QTL and phenotypic information. *PLoS ONE* 9:e103997. doi: 10.1371/journal.pone.0103997

Wu, B., and Pankow, J. S. (2017). Genome-wide association test of multiple continuous traits using imputed SNPs. *GBA* 10, 379–386. doi: 10.4310/SII.2017.v10.n3.a2

Wu, X. L., Heringstad, B., and Gianola, D. (2010). Bayesian structural equation models for inferring relationships between phenotypes: a review of methodology, identifiability, and applications. *J. Anim. Breed. Genet.* 127, 3–15. doi: 10.1111/j.1439-0388.2009.00835.x

Yang, J., Benyamin, B., McEvoy, B., Gordon, S., Henders, A., and Nyholt, D. (2010). Common SNPs explain a large proportion of the heritability for human height. *Nat. Genet.* 42, 565–569. doi: 10.1038/ng.608

Keywords: causal structure, GWAS, multiple traits, path analysis, SEM, SNP effect

Citation: Momen M, Ayatollahi Mehrgardi A, Amiri Roudbar M, Kranis A, Mercuri Pinto R, Valente BD, Morota G, Rosa GJM and Gianola D (2018) Including Phenotypic Causal Networks in Genome-Wide Association Studies Using Mixed Effects Structural Equation Models. *Front. Genet*. 9:455. doi: 10.3389/fgene.2018.00455

Received: 21 June 2018; Accepted: 18 September 2018;

Published: 09 October 2018.

Edited by:

John Anthony Hammond, Pirbright Institute (BBSRC), United KingdomReviewed by:

Fabyano Fonseca Silva, Universidade Federal de Viçosa, BrazilGregor Gorjanc, University of Edinburgh, United Kingdom

Copyright © 2018 Momen, Ayatollahi Mehrgardi, Amiri Roudbar, Kranis, Mercuri Pinto, Valente, Morota, Rosa and Gianola. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Ahmad Ayatollahi Mehrgardi, mehrgardi@uk.ac.ir