Impact Factor 3.789

Frontiers reaches 6.4 on Journal Impact Factors

Original Research ARTICLE

Front. Genet., 25 September 2014 | https://doi.org/10.3389/fgene.2014.00328

A revised Fisher model on analysis of quantitative trait loci with multiple alleles

  • Division of Biostatistics, Institute for Health and Society, Medical College of Wisconsin, Milwaukee, WI, USA

Zeng et al. (2005) proposed a general two-allele (G2A) model to model bi-allelic quantitative trait loci (QTL). Comparing with the classical Fisher model, the G2A model can avoid using redundant parameters and be fitted directly using standard least square (LS) approach. In this study, we further extend the G2A model to general multi-allele (GMA) model. First, we propose a one-locus GMA model for phase known genotypes based on modeling the inheritance of paternal and maternal alleles. Next, we develop a one-locus GMA model for phase unknown genotypes by treating it as a special case of the phase known one-locus GMA model. Thirdly, we extend the one-locus GMA models to multiple loci. We discuss how the genetic variance components can be analyzed using these GMA models in equilibrium as well as disequilibrium populations. Finally, we apply the GMA model to a published experimental data set.

1. Introduction

Currently there are two types of statistical genetic models that are commonly used in genetic analysis of quantitative traits. One is the F type models that concentrate on direct modeling of the expected genotypic values at targeted quantitative trait loci (QTL) or genetic markers and association testing for various allelic effects and interactions (Fisher, 1918; Cheverud, 2000; Hansen and Wagner, 2001; Wang, 2011). Another is Fisher's analysis of variance (ANOVA) models that focus on assessing variations contributed by some genetic components (i.e., grouped allelic effects or allelic interactions) at targeted QTL or genetic markers (Fisher, 1918; Cockerham, 1954, 1963; Kempthorne, 1969; Weir and Cockerham, 1977; Wang and Zeng, 2006). A considerable amount of discussion has been made about the distinction between these two different types of genetic models (Zeng et al., 2005; Álvarez-Castro and Carlborg, 2007; Yang and Álvarez-Castro, 2008; Wang and Zeng, 2009). More recently, a comprehensive review of various genetic models was also given in Álvarez-Castro (2012).

In genetic association studies, we are often interested in direct comparison of the expected genotypic values at certain QTL or marker loci. The F models are appealing in this setting due to their simplicity in interpretation of their model parameters, which are often referred as the fixed genetic effects such as the additive and dominance effects or the allelic effects and allelic interactions in terms of the expected genotypic values. By applying the F models, we can compare the expected genotypic values via hypothesis tests on various fixed genetic effects. However, as pointed out in Wang and Zeng (2009), the p-value based association tests on these fixed genetic effects could highly depend upon the regression model, the distribution assumption and the available sample size. Besides, a statistically significant genetic effect with a small enough p-value may not necessarily imply a clinically important finding. On the other hand, the Fisher type models allow us to assess the genetic variations contributed by certain genetic components to the overall variation of a quantitative trait. By definition, these genetic variance components do not depend on the sample size, and they can provide additional information on better understanding the genetic etiology and assessing for clinical importance. Nonetheless, both the F and the Fisher type models form basis in the analysis of quantitative traits. They provide different perspectives in assessing the genetic effects of QTL or genetic markers.

The basic genetic model on assessing the genetic variance components was first proposed by Fisher (1918). Cockerham (1954, 1963) extended Fisher's one-locus model to two bi-allelic QTL with a particular focus on epistatic variance components. Kempthorne (1954, 1957) further extended the two-locus model to multiple alleles. Wang and Zeng (2006) also explored the Fisher type multi-allele two-locus model on partition of the genotypic variance in the presence of linkage disequilibrium (LD). With their model parameters referred as the average effect of the gene substitution (see Falconer and Mackay, 1996) or the average allelic effects and interactions (Wang and Zeng, 2009), these classical Fisher type models often contain constraints on their model parameters due to an over-parameterization of the expected genotypic values. These constraints could make it difficult to fit this type of models using standard least square (LS) regression approach. To avoid this problem, under a regression model framework, Zeng et al. (2005) proposed a general two-allele (G2A) model for variance component analysis on bi-allelic QTL. The G2A model retains the nice property of the classical Fisher's model on orthogonal partition of the genotypic variance in equilibrium populations. Meanwhile, without using redundant parameters, the G2A model can be fitted directly using the standard LS approach. Wang and Zeng (2009) further explored the origin of the G2A model and clarified its theoretical basis on pertaining the orthogonal partition of the genotypic variance in equilibrium populations. Álvarez-Castro and Carlborg (2007) also proposed a different way for re-parameterization of the expected genotypic values, which the authors referred as the natural and orthogonal interactions (NOIA) model. It appears that the NOIA model first defines the genetic effects in terms of the expected genotypic values, and then derives the design matrix for the genetic effects via an inverse of the matrix. Though the matrix formulation is helpful for facilitating the transformation between different parameterizations, making inverse of the matrix to construct the design matrix is difficult to implement analytically. For multiple QTL, how to define various genetic effects in terms of the expected genotypic values could also be a challenge especially when locus-by-locus interactions are involved. Besides, the NOIA model assumes that the design matrix is of full rank, which makes it unsuitable for reduced re-parameterization of the genotypic values. Currently, no NOIA model has been proposed for reduced re-parameterization of the expected genotypic values, or for multiple QTL in the presence of LD.

In this study, we further extend the G2A model to QTL with multiple alleles and multiple loci. In bi-allelic case, only one additive effect and one dominance effect are needed at each locus, and the locus-by-locus interactions can be easily included for constructing a full re-parameterization of the genotypic values. For one QTL with multiple alleles, how to define the dominance effects for various allelic interactions is not straightforward especially when phases of its genotypes are unknown. The extension to multiple loci is also cumbersome by the much more complex structure of locus-by-locus interactions. How to present the model and define various genetic variance components are not trivial tasks. To construct one-locus general multi-allele (GMA) model, we overcome the phase problem by appropriately merging the paternal and maternal allelic effects and allelic interactions in the phase-known situation. Typically, with phase unknown genotypes at a locus, we may have to assume that the paternal and maternal alleles have the same frequencies and contribute the same genetic effects so that we could merge them without distinguishing their parental origins. With phase-known genotypes, we can further break down the additive variance component into paternal and maternal variance components. For multiple QTL with multiple alleles, we develop concise expressions for constructing multi-locus GMA models and defining various genetic components. Explicit formulas for calculating various genetic variance components in equilibrium population are also derived.

The structure of this manuscript is organized as the following. First, we consider one multi-allele QTL with phase known genotypes. Following the same strategy as adopted in Wang and Zeng (2009), we start by introducing indicator variables that describe the inheritance of paternal and maternal alleles. Then we make mean corrections on these indicator variables and build one-locus GMA model based on the mean-corrected index variables. Next, we consider one QTL with phase unknown genotypes. We construct a one-locus GMA model for unphased genotypes by appropriately combining the paternal and maternal allelic effects in the phase known one-locus GMA model. We derive formulas on partitioning the genotypic variance into the additive and dominance variance components under Hardy-Weinberg equilibrium (HWE) as well as in Hardy-Weinberg disequilibrium (HWD) for both the phase-known and phase-unknown one-locus GMA models. Thirdly, we extend the one-locus GMA models to multiple loci with either phase known or phase unknown genotypes. Based on these multi-locus GMA models, we describe how the various genetic components can be defined. An orthogonal partition on the genotypic variance in an equilibrium population is also presented in both the phase known and unknown cases. In addition, we discuss how to construct the reduced multi-locus GMA models in practice. The difference in using F models and GMA models to build reduced models for the expected genotypic values is explored. Finally, we apply the GMA model to a published experimental data set.

2. Methods and Results

The variation of a quantitative trait Y is usually assumed to be contributed by both genetic and environmental effects. Let G denote the true (unobservable) genotypic value from the joint contribution of all the genetic factors to the quantitative trait Y. Given genotypes at targeted QTL or genetic marker loci, we focus on assessing the variations (i.e., variances) contributed by the allelic effects and interactions of the QTL or marker loci to the total genotypic variance VG.

2.1. One-locus Model for Phase Known

First, let us consider a single QTL with multiple alleles A1, …, Am (m ≥ 2). Suppose we know the parental origins of the alleles for observed genotypes. Then we can distinguish the paternal and maternal allelic effects separately. With m alleles, there are in total m2 possible phased genotypes: (Ai, Aj), i, j = 1, …, m, where Ai and Aj represent the paternal and maternal allele, respectively. Let Gij = E[G|g = (Ai, Aj)] denote the expected genotypic value for a phased genotype (Ai, Aj) in a study population. Then there are totally m2 possible expected genotypic values Gij, i, j = 1, …, m. To model these expected genotypic values, Fisher's classical one-locus model is given by

Gji=μ+αi+αj+δji, i,j=1,,m,    (1)

where αi (or αj) is the so-called average additive or main allelic effect of a paternal (or maternal) allele Ai (or Aj), and δij is the average allelic interaction between a paternal allele Ai and a maternal allele Aj. The above model is a typical two-way ANOVA model with the paternal and maternal gametes being treated as two independent risk factors. Although the paternal and maternal gametes often share (but do not have to) the same set of alleles A1, …, Am at the QTL, it allows the paternal and maternal gametes to have different allele frequencies and allelic effects at the QTL.

Let pi be the frequency of allele Ai on paternal gametes (i=1mpi=1), and pj be the frequency of allele Aj on the maternal gametes (j=1mpj=1). One nice feature of the Fisher model above is that it can assess the additive and dominance variance components VA and VD, which are defined as variations contributed by the additive allelic effects from the paternal and maternal alleles and the allelic interactions between the paternal or maternal alleles, respectively. For example, under HWE, it is well known that the total genotypic variance VG = ∑i,jpipj(Gij − μ)2 has an orthogonal partition VG = VA + VD, where

VA=ipi(αi)2+jpj(αj)2 ,   VD=i,jpipj(δji)2.

Note that there are in total m2 + 2m + 1 = (m + 1)2 parameters being involved in Fisher model (1) including the intercept μ, which is more than the total number m2 of the expected genotypic values Gij, i, j = 1, …, m. As a result, not all the model parameters are estimable. To avoid this problem, some constraints on the model parameters are often required. It is usually assumed that all the genetic effects are averaged to zero over any index; i.e.,

ipiαi=0,  jpjαj=0,  ipiδji=0,  jpjδji=0.    (2)

With these constraints, Fisher (1918) showed that the least square estimates (LSE) of the model parameters are given by

μ=E(G), αi=G.iG. · , αj=Gj ·G. · , δji=GjiG.iGj ·+G. ·

where G·. = E(G), Gi. = E[G|g = (Ai, −)] and G·j = E[G|g = (−, Aj)]. In simple cases, we could estimate the model parameters using these formulas. In general, however, those “irregular” constraints in (2) make it difficult to fit model (1) using the standard LS approach via commonly used software such as SAS (SAS Institute INC, Raleigh, NC), especially when we need to adjust for certain environmental covariates.

Here we propose a way to get rid of the redundant parameters. Let us first introduce the following indicator variables that describe the transmission inheritance of the paternal and maternal alleles.

zPi(g)={1,thepaternalalleleisAi0,thepaternalalleleisnotAi ,zMj(g)={1, thematernalalleleisAj0, thematernalalleleisnotAj ,

for i, j = 1, 2, …, m at the QTL. Next, using the same strategy as adopted in Wang and Zeng (2006), we further make mean corrections on these indicator variables zPi, zMj and define the following mean-corrected index variables

xPi(g)=zPi(g)E[zPi(g)]={1pi,thepaternalalleleisAipi,thepaternalalleleisnotAi ,xMj(g)=zMj(g)E[zMj(g)]={1pj,thepaternalalleleisAjpj,thepaternalalleleisnotAj ,

Then we can re-write the Fisher model (1) as

E(G|g)=μ+i=1mαixPi(g)+j=1mαjxMj(g)+i,jδjixPi(g)xMj(g),    (3)

where E(G|g) denotes the expected genotypic value given a genotype g at the targeted QTL. As shown in Wang and Zeng (2006), the above model is simply a different representation of the Fisher model (1) with the same constraints being applied on the model parameters. However, as the HWD measurements can be quantified as covariances between those mean-corrected index variables xPi's and xMj's, this model expression can facilitate derivation on examining the genetic variance components under HWD.

Now, we further exclude the redundant parameters in model (3). For a diploid subject such as human being, his or her genotype at a locus on a pair of homologous chromosomes consists of two alleles with one from the father and the other one from the mother. Therefore, we always have i=1mzPi(g)1 and j=1mzMj(g)1. Thus, xPm=i=1m1xPi and xMm=j=1m1xMj. So we can simply replace xPm by (i=1m1xPi), and xMm by (j=1m1xMj) in model (3), which leads to the following revised Fisher model

E(G|g)=μ+i=1m1αixPi(g)+j=1m1αjxMj(g)+i=1m1j=1m1δjixPi(g)xMj(g).    (4)

Model (4) provides a full re-parameterization of the m2 expected genotypic values Gij, i, j = 1, …, m, without using redundant parameters. We refer it as a revised one-locus Fisher model or a general multi-allele (GMA) model. We also refer its model parameters α*i (or α*j) as the average (additive) allelic effect of the paternal (or maternal) allele Ai (or Aj), and δ*i*j the average allelic interaction between the paternal allele Ai and maternal allele Aj, with respect to the baseline allele Am. Here, we choose allele Am as the baseline allele but it could be any other allele as well. It is easy to show that, in terms of the parameters in the original Fisher model (1), we have α*i = αi − αm, α*j = αj − αm and δ*i*j = δij − δmj − δim + δmm, for i, j = 1, …, m.

Model (4) retains most of the nice features of the original Fisher model (1) on partition of the genotypic variance. Based on this model, we have the genetic additive variance components VAP=Var(i=1m1αixPi) and VAM=Var(j=1m1αjxMj), which denote variations contributed by the additive effects of the paternal and maternal alleles, respectively. The genetic dominant variance component VD=Var(i=1m1i=1m1δjixPixMj), which accounts for a variation contributed by the interactions between paternal and maternal alleles in addition to the additive variance components. Under HWE, the paternal index variables {xPi, i = 1, …, m} are independent of the maternal index variables {xMj, j = 1, …, m}. Therefore, we have μ = E(G) and an orthogonal partition on the variance of the expected genotypic values: V(E(G|g)) = VAP + VAM + VD, where

VAP=i=1m1pi(αi)2(i=1m1piαi)2,VAM=j=1m1pj(αj)2(j=1m1pjαj)2  VD=i,j=1m1pipj(δji)2i=1m1pi(j=1m1pjδji)2j=1m1pj(i=1m1piδji)2+(i,j=1m1pipjδji)2.

In HWD, the disequilibrium measurements can be captured by the covariances between the index variables xPi's and xMj's. In this case, we no longer have an orthogonal partition on the variance of the genotypic values for the additive and dominance variance components (see Appendix A). It should be pointed out that although the model parameters defined in model (4) depend upon the choice of the reference allele Am, the additive and dominant variance components VAP, VAM, VD as well as their covariance components do not depend on such a choice.

Note that the partition of the total genotypic variance VG based on the Fisher model (1) assumes that the genotypic values are completely determined by the QTL. In general, beyond the targeted QTL, the genotypic value G could also be affected by other unobserved QTL. In this case, the total genotypic variance VG = V(E(G|g)) + E(V(G|g)), where V(G|g) is a variation that cannot be explained by the targeted QTL. So the Fisher model (1), or its revised model (4), actually provides a partition on the variance V(E(G|g)) of the expected genotypic values at the targeted QTL instead of the total genotypic variance VG. In practice, given a random sample from the study population, if we ignore the genetic by environmental interaction, a quantitative trait can typically be expressed in a linear regression model form as

yk=βzk+E(G|gk)+ϵk, k=1,,n,

where, for k = 1, …, n, gk is the observed QTL genotype of subject k, zk is a vector of subject k from some adjusted environmental covariates with fixed effects β, and ϵk is a model residual contributed by other environmental and genetic factors that cannot be captured by zk and gk at the targeted QTL. Assuming that ϵk, k = 1, …, n, are independent and identically distributed (i.i.d) with a variance σ2ϵ, we have Vy = V(E(G|g)) + σ2ϵ. To further partition V(E(G|g)) based on the QTL genotypes, we can first calculate the allele frequencies and compute the values of the index variables {xPi(gk), xMj(gk)} for each subject k = 1, …, n, in the sample. Then, by treating these index variables as fixed covariates, we can incorporate the GMA model (4) into the regression model above and fit the model using the standard LS approach. Based on the LSE α^i, α^j and δ^ji of the model parameters, we can compute the additive and dominance genetic components AP(k)=i=1m1α^ixPi(gk),AM(k)=j=1m1α^jxMj(gk) and D(k)=i=1m1j=1m1δ^jixPi(gk)xMj(gk), for k = 1, …, n. Finally, the genetic variance components VAP, VAM and VD can be estimated as the sample variances of {AP(k), k = 1, …, n}, {AM(k), k = 1, …, n}, and {D(k), k = 1, …, n}, respectively. Meanwhile, the genetic covariance components Cov(AP, AM), Cov(AP, D), and Cov(AM, D) can be estimated through the sample covariances

Cov(AP,AM)^=1nk=1n(AP(k)A¯P)(AM(k)A¯M),   Cov(AP,D)^=1nk=1n(AP(k)A¯P)(D(k)D¯),  Cov(AM,D)^=1nk=1n(AM(k)A¯M)(D(k)D¯),

respectively, where A¯P=k=1nAP(k)/n,A¯M=k=1nAM(k)/n and D¯=k=1nD(k)/n.

2.2. One-locus Model for Phase Unknown

In this subsection, we consider modeling a multi-allele QTL with phase unknown genotypes—a more common situation in practice. As we cannot distinguish the parental origins of alleles in QTL genotypes, as usual, we assume that the paternal and maternal gametes share the same set of alleles with the same allele frequencies. Let A1, …, Am (m ≥ 2) denote the alleles at a target QTL or marker locus with allele frequencies pi, i = 1, …, m. Ignoring the phases, there are m possible homozygous genotypes AiAi, i = 1 …, m, and m(m − 1)/2 possible heterozygous genotypes AiAj, ij. We also assume that these alleles contribute the same genetic effects regardless of their parental origins, which implies that the expected genotypic values Gij = E(G|g = AiAj) satisfy the symmetric property: Gij = Gji, for ij. So there are totally m(m + 1)/2 possible distinctive expected genotypic values Gij, i, j = 1, …, m. In this case, by treating the paternal and maternal gametes as two independent risk factors, the Fisher's ANOVA model for the expected genotypic values Gij can be written as

Gij=μ+αi+αj+δij, i,j=1,,m,    (5)

where αi is the average (additive) allelic effect of the paternal or maternal allele Ai (i = 1, …, m), and δij is the average allelic interaction between two alleles Ai and Aj (i, j, = 1, …, m). As pointed out in Wang (2011), the above model is different from the classical two-way ANOVA model in that the paternal and the maternal gametes share the same set of alleles and have the same allelic effects at the locus. To avoid the inestimability of model parameters in model (5) due to over-parameterization, the following constraints are often added on the model parameters

ipiαi=0,  ipiδij=0.

From the symmetric property of Gij, we also assume δjk's are symmetric. Similarly, the above constraints together with the symmetry property of δjk make it difficult to fit model (5) using the standard LS approach.

Note that model (5) can be treated as a special case of model (1) or (3) with αi = αi and δij = δji for i, j = 1, …, m. To construct a similar GMA model for model (5), we can combine the term α*ixPi with α*ixMi, and δ*i*jxPixMj with δ*j*ixPjxMi in model (4). By denoting α*i = α*i as α*i for i = 1, …, m − 1, and δ*j*i = δ*i*j as δ*ij for ij, we obtain the following GMA model

E(G|g)=μ+i=1m1αiwi(g)+i=1m1δiivii(g)+j=2m1i<jδijvij(g),    (6)

where, for i = 1, …, m − 1,

wi(g)=xPi+xMi={2(1pi), ifg=AiAi12pi, ifg=AiAic2pi, ifg=AicAic ,vii(g)=xPixMi={(1pi)2, ifg=AiAipi(1pi), ifg=AiAicpi2, ifg=AicAic ,

and for i < j

vij(g)=xPixMj+xPjxMi={(1pi)(1pj)+pipj, ifg=AiAj2pj(1pi), ifg=AiAipj(12pi), ifg=AiAijc2pi(1pj), ifg=AjAjpi(12pj), ifg=AjAijc2pipj, ifg=AijcAijc .

Here Acij denotes an allele which is different from Ai and Aj. Note that the combined index variables wi(g), vii(g), and vij(g) above are well defined on unphased genotypes, although xPi, xMj are not. We refer them as genotype coding variables, and the model parameter α*i as the average allelic effect of allele Ai (i = 1, …, m), and δ*ij as the average allelic interaction between two alleles Ai and Aj (i, j, = 1, …, m), with respect to the baseline allele Am. In terms of the parameters in the original model (5), we can show that α*i = αi − αm, for i = 1, …, m − 1; and δ*ij = δij − δim − δjm + δmm, for ij, i, j = 1, …, m − 1.

Model (6) is an extension of the one-locus G2A model proposed in Zeng et al. (2005) to one QTL with multiple alleles. Note that no vij(g)'s for i < j are needed in the bi-allelic case. The combined index variables vii(g), i = 1, …, m − 1, are also slightly different from the ones defined in Zeng et al. (2005) by a scalar of (−2) folds. Here we drop the scalar (−2) so that the coefficient δ*ii can keep the same interpretation as δ*i*i in model (4). In addition, the combined index variables vij (i < j) defined above are not exactly the same as the ones we suggested in the discussion section of Wang (2011). Based on the previously defined inheritance indicator variables, we define w*i = zPi + zMi and v*ij(g) = zPizMj + zPjzMi for i < j. Then here we have vij = v*ij − (piw*j + pjw*i) + 2pipj, for i < j.

Still, model (6) retains the nice feature of the classical Fisher's model on partition of the genotypic variance. The additive variance component VA=V(i=1m1αiwi(g)), which is contributed by the additive effects of both paternal and maternal alleles. The dominant variance component VD=V(i=1m1δiivii(g)+i<jδijvij(g)), which is contributed by all the interactions between paternal and maternal alleles. Under HWE, we have μ = E(G) and an orthogonal partition on the variance of the expected genotypic values V(E(G|g)) = VA + VD, where

VA=2i=1m1pi(αi)22(i=1m1piαi)2,VD=i,j=1m1pipj(δij)22j=1m1pj(i=1m1piδij)2+(i,j=1m1pipjδij)2.

Here we define δ*ji = δ*ij, for i < j. In HWD, the desired orthogonal partition on the variance of the expected genotypic values is no longer held. But the model can still allow us to capture the disequilibria via covariances between those index variables xPi's and xMj's (see Appendix B).

As an example, let us consider a QTL with 3 alleles A1, A2, and A3. By taking A3 as the baseline allele, model (6) leads to

E(G|g)=μ+α1w1(g)+α2w2(g)+δ11v11(g)+δ22v22(g)+δ12v12(g).    (7)

Or, in a matrix form, we have

(G11G12G22G13G23G33)=(12(1p1)2p2(1p1)2p222p2(1p1)112p112p2p1(1p1)p2(1p2)1p1p2+2p1p212p12(1p2)p12(1p2)22p1(1p2)112p12p2p1(1p1)p22p2(12p1)12p112p2p12p2(1p2)p1(12p2)12p12p2p12p222p1p2)    (μα1α2δ11δ22δ12)

If we choose A1 (or A2) instead of A3 as the baseline allele, we can obtain different re-parameterizations of the six expected genotypic values. But they all give the same partition on the variance of the expected genotypic values. Álvarez-Castro and Yang (2011) also presented a similar re-parameterization of the six expected genotypic values from their NOIA model. It appears that their one-locus 3-allele NOIA model and the GMA model (7) above are likely equivalent on partitioning the variance of the six expected genotypic values, as we will see later in the Example Subsection 2.4.

Similar to the phase-known case, we can estimate the additive, dominance variance components and the covariance Cov(A, D) from a random sample when the QTL genotypes are available. First, we can compute the values of the combined index variables wi(gk) and vij(gk) at the target QTL for each subject k = 1, …, n. Next, we can incorporate model (6) into a regression model with other possible adjusted covariates. By treating those combined index variables as regular fixed covariates, we can fit the regression model using the standard LS approach. Then, based on the fitted model, we compute the additive and dominance components A(k)=i=1m1α^iwi(gk) and D(k)=i=1m1δ^iivii(gk)+i<jδ^ijvij(gk) for each subject k = 1, …, n, where α^i,δ^ii and δ^ij are LSE of the model parameters. Finally, we can estimate VA and VD as the sample variances of {A(k), k = 1, …, n} and {D(k), k = 1, …, n}, respectively. Meanwhile, an estimate of Cov(A, D) is given by the sample covariance between {A(k), k = 1, …, n} and {D(k), k = 1, …, n}.

2.3. Multi-locus Models

The one-locus GMA models can be extended to multiple loci. Typically, for each locus k, we can create the mean-corrected index variables x(k)Pi, x(k)Mj (or combined index variables wk,i, vk,ij for phase-unknown genotypes) in the same way as in the one-locus case. Then we can build a multi-locus GMA model by including the within locus additive and dominance effects as well as the locus-by-locus interactions (i.e., epistases). The proposed multi-locus GMA model allows LD among multiple loci even though the LD may affect the partition on the variance of the expected genotypic values.

Consider L loci with alleles Ak1, …, Akmk at the k-th locus for k = 1, …, L. With phase known, there are totally m21··· m2L possible expected genotypic values: Gs1···sLt1···tL = E[G|g = (A1s1···ALsL, A1t1···ALtL)], where the joint genotype (A1s1···ALsL, A1t1···ALtL) is formed by the union of a paternal gamete A1s1···ALsL and a maternal gamete A1t1···ALtL with sk, tk ∈ {1, …, mk} for k = 1, …, L. Let pPksk and pMktk be the frequencies of the paternal allele Aksk and maternal allele Aktk, respectively. At each locus k = 1, …, L, we define the mean-corrected index variables x(k)Psk for paternal alleles Aksk (sk = 1, …, mk) and x(k)Mtk for maternal alleles Aktk (tk = 1, …, mk) in the same way as in the one-locus case. Then, when we choose A1m1, …, ALmL as the baseline alleles, a fully parameterized L-locus GMA model can be expressed as

E(G|g)=i1=01j1=01iL=01jL=01{s1,if i1=1}{t1,if j1=1}{sL,if iL=1}{tL,if jL=1}αt1tLs1sL·(xPs1(1))i1(xMt1(1))j1(xPsL(L))iL(xMtL(L))jL         (8)

where the summation of sk (or tk) is from 1 up to (mk − 1) for k = 1, …, L; sk (or tk) refers to a particular paternal allele Aksk (or maternal allele Aktk) at the k-th locus; and ik (or jk) is an indicator variable for the presence or absence of a paternal (or maternal) allele at locus k which is involved in a term. The coefficient αs*1···s*Lt*1···t*L in each term represents an average allelic effect of a single paternal or maternal allele, or an allelic interaction from a set of paternal and maternal alleles that are involved in this term with respect to the baseline alleles A1m1, …, ALmL. The superscripts (or subscripts) in αs*1···s*Lt*1···t*L are defined as s*k = ik · sk (or t*k = jk · tk), indicating which paternal (or maternal) allele at locus k is involved in this term. If the term does not have any paternal (or maternal) allele at locus k being involved, then we have s*k = 0 (or t*k = 0). Note that naP=k=1Lik (or naM=k=1Ljk) specifies the total number of paternal (or maternal) alleles being involved in a term. We refer the total number of both paternal and maternal alleles that are involved in a term na = naP + naM as the order of this term.

The multi-locus GMA model (8) provides a full re-parameterization of the m21···m2L expected genotypic values Gs1···sLt1···tL with phase-known genotypes without using redundant parameters. Note that E(G|g) can also be partitioned into a sum of the following genetic components

Cj1jLi1iL={s1,if i1=1}{t1,if j1=1}...{sL,if iL=1}{tL,if jL=1}αt1tLs1sL·(xPs1(1))i1(xMt1(1))j1(xPsL(L))iL(xMtL(L))jL,

where i1, j1, …, iL, jL = 0, 1 are indicator variables, which specify the parental origins of the contributing alleles. In other words, each genetic component consists of the terms with the same order and having their alleles all coming from the same subset of loci with the same parental origins but allowing varied allelic types. The classical genetic variance components can then be defined as variances of these genetic components. Note that the component C0···00···0 is a constant, which corresponds to an intercept without any alleles being involved. Therefore, for a L-locus GMA model with phase known genotypes, there are in total (22L−1) genetic variance components V(Cj1jLi1iL), where i1, j1, …, iL, jL = 0, 1 but not all being zeros. Among them, there are L paternal and L maternal variance components of order 1 with each having a single paternal or maternal allele being involved, L dominance variance components of order 2 with each having two alleles within the same locus being involved, L(L − 1)/2 paternal by paternal (or maternal by maternal) variance components of order 2 with two paternal (or maternal) alleles coming from different loci, and L(L − 1) paternal by maternal variance components of order 2 with one paternal and one maternal allele coming from two different loci. For examples, the within-locus paternal, maternal and dominance variance components at a locus k can be written as

VAP(k)=V(sk=1mk1α0000sk0·xPsk(k))=V(C000010),VAM(k)=V(tk=1mk1α0tk0000·xMtk(k))=V(C010000),VD(k)=V(sk=1mk1tk=1mk1α0tk00sk0·xPsk(k)xMtk(k))=V(C010010),

respectively (k = 1, …, L). For a pair of loci jk, the two-locus paternal by paternal variance component is

VAP(j)×AP(k)=V(sj=1mj1sk=  1mk1α00000sjsk0·xPsj(j)xPsk(k))          =V(C00000110),

and the two-locus paternal by maternal variance component is

VAP(j)×AM(k)=V(sj=1mj1tk=1mk1α00tk00sj00·xPsj(j)xMtk(k))          =V(C00100100).

The variance component of epistases with the highest order is

VD(1)××D(L)=V(s1=1m11t1=1m11sL=1mL1tL=1mL1αt1tLs1sLxPs1(1)xMt1(1)xPsL(L)xMtL(L))=V(C1111),

which has 2L alleles being involved with two alleles per locus.

Based on these genetic components, we can partition the variance of the expected genotypic values into the variances and covariances of the genetic components. Still, the coefficients αs*1···s*Lt*1···t*L are defined based on the baseline alleles A1m1, …, ALmL. But the variances and covariances of these genetic components do not depend on the choice of the baseline alleles. Under HWE, the within-locus paternal index variables {x(k)Psk, sk = 1, …, mk} are independent of the maternal index variables {x(k)Mtk, tk = 1, …, mk} for each k = 1, …, L. With LE, the index variables {x(j)Psj, x(j)Mtj, sj, tj = 1, …, mj} at a locus j are also independent of the index variables {x(k)Psk, x(k)Mtk, sk, tk = 1, …, mk} at a different locus k (kj). To achieve an orthogonal partition on the variance of the expected genotypic values, we need to further assume that all the paternal index variables {x(k)Psk, sk = 1, …, mk, k = 1, …, L} are independent of the maternal index variables {x(k)Mtk, tk = 1, …, mk, k = 1, …, L} across all the loci; i.e., the so-called gametic equilibrium (see Wang and Zeng, 2006). Under both the gametic and linkage equilibria, each genetic component has its mean E(Cj1jLi1iL)=0, and the covariances between different genetic components are zeros. Thus, we have E(G) = α0···00···0 and an orthogonal partition on the variance of the expected genotypic values is given by

V(E(G|g))=i1=01j1=01iL=01jL=01V(Cj1jLi1iL),    (9)

where

V(Cj1jLi1iL)={s1,s1,if i1=1}(p1s1P1{s1=s1}p1s1Pp1s1P)i1{t1,t1,if j1=1}(p1t1M1{t1=t1}p1t1Mp1t1M)j1{sL,sL,if iL=1}(pLsLP1{sL=sL}pLsLPpLsLP)iL{tL,tL,if jL=1}(pLtLM1{tL=tL}pLtLMpLtLM)jL·αt1tLs1sLαt1tLs1sL.

Similarly, we can construct multi-locus GMA models for QTL with phase unknown genotypes. Without distinguishing the parental origin of the alleles, there are totally ∏Lk = 1 mk(mk + 1)/2L possible expected genotypic values: Gs1t1···sLtL = E[G(g)|g = (A1s1A1t1, …, ALsLALtL)], for sk, tk = 1, …, mk and k = 1, …, L. We assume that the paternal and maternal allele frequencies are the same and denoted by pksk. We define the combined index variables wk, i(g) and vk,ij at each locus k in the same way as the one-locus case for k = 1, …, L. Then, when we choose A1m1, …, ALmL as the baseline alleles, a fully parameterized L-locus GMA model for QTL with phase unknown genotypes can be expressed as

E(G|g)=i1=02iL=02{s1,if i1=1}          {sL,ifiL=1}{s1t1,if i1=2}...{sLtL,if iL=2}αs1sL·(w1,s11{i1=1}·v1,s1t11{i1=2})(wL,sL1{iL=1}·vL,sLtL1{iL=2}),    (10)

where 1{ik=j} is the Kronecker function which equals 1 when ik = j and 0 otherwise, for k = 1, …, L and j = 1 or 2; the summation of sk (or tk) is from 1 up to mk − 1 with sk (or tk) referring to allele Aksk (or Aktk) at a locus k (k = 1, …, L); and ik specifies how many alleles at a locus k are involved in this term. If ik = 1, this term has only one allele Aksk (either paternal or maternal) being involved via wk,sk and we set sk* = sk. If ik = 2, both the paternal and maternal alleles Aksk and Aktk are involved via vk,sktk in this term and we set s*k = sktk regardless of the order of sk and tk. When ik = 0, this term does not have any alleles at locus k being involved and we have (wk,sk1{ik=1}·vk,sktk1{ik=2})=1. The coefficient αs*1···s*L represents the average allelic effect of a single allele, or an allelic interaction from all the alleles that are involved in this term, with respect to the baseline alleles A1m1, …, ALmL. We still refer the total number of alleles being involved in each term; i.e., na=k=1Lik, as the order of this term.

Based on the above model, we can define the genetic components as

Ci1iL={s1,if i1=1}...{sL,if iL=1}{s1t1,if i1=2}...{sLtL,if iL=2}αs1sL·(w1,s11{i1=1}·v1,s1t11{i1=2})(wL,sL1{iL=1}·vL,sLtL1{iL=2}),

for i1, …, iL = 0, 1, 2, with each component being a summation of the terms with their alleles all coming from the same subset of loci and having the same number of alleles from each locus. Excluding the one with i1 = ··· = iL = 0, which corresponds to an intercept, there are in total 3L − 1 genetic variance components V(Ci1···iL) for i1, …, iL = 0, 1, 2. Among them, there are L additive variance components with each having a single paternal or maternal allele being involved, L dominance variance components with each having both paternal and maternal alleles within the same locus being involved, L(L − 1)/2 additive by additive variance components with two alleles coming from different loci, etc. For examples, the within-locus additive and dominance variance components at a locus k can be written as

VA(k)=V(sk=1mk1α0sk0·wk,sk)=V(C010),VD(k)=V(sktkα0sktk0·vk,sktk)=V(C020),

respectively, for k = 1, …, L. For a pair of loci jk, the two-locus additive by additive interaction is

VA(j)×A(k)=V(sj=1mj1sk=1mk1α0sjsk0·wj,sjwk,sk)            =V(C0110).

The variance component of epistases with the highest order of 2L is given by

VD(1)××D(L)     =V(s1t1sLtLαs1t1,,sLtLv1,s1t1vL,sLtL)     =V(C22).

Under both the gametic and linkage equilibria, we have E(G) = α0···0 and an orthogonal partition on the variance of the expected genotypic values

V(E(G|g))=i1=02iL=02V(Ci1iL),    (11)

where

V(Ci1iL)={s1,s1,if i1=1}{2(p1s11{s1=s1}p1s1p1s1)}1{i1=1}  {sL,sL,if iL=1}{2(pLsL1{sL=sL}pLsLpLsL)}1{iL=1}  {s1,s1,t1,t1,if i1=2}{(p1s11{s1=s1}p1s1p1s1)           (p1t11{t1=t1}p1t1p1t1)}1{i1=2}  {sL,sL,tL,tL,if iL=2}{(pLsL1{sL=sL}pLsLpLsL)           (pLtL1{tL=tL}pLtLpLtL)}1{iL=2}αs1sLαs1sL.

Here, when ik = 2, we have s*k = sktk and s*′k = s′kt′k. We also define αs*1···s*L (or αs*′1···s*′L) to be the same if we switch the order of sk and tk in s*k (or sk' and tk' in s*′k) for k = 1, …, L. In the presence of disequilibria, the desired orthogonal partition may no longer hold. However, regardless of the equilibrium, the coefficients αs*1···s*L in model (10) are defined based on the baseline alleles A1m1, …, ALmL, while the variances and covariances of the genetic components do not depend on the choice of these baseline alleles.

In practice, we do not have to rely on the derived formula to estimate the genetic variance or covariance components. Similar to the one-locus case, given the observed QTL genotypes for a random sample from a study population, we can always incorporate model (8) or (10) into a regression model with other possible adjusted covariates and fit the model using standard LS approach. Then we can estimate various genetic variance components as well as the covariances among different genetic components based on the fitted model. A good fit of a fully parameterized GMA model often requires that the expected genotypic values for all possible joint genotypes of the QTL are estimable from the study sample. If certain genotypes are not observable or rarely present in subjects from the study sample, a situation which likely happens when the number of alleles or the number of QTL is large with moderate or small sample size, the design matrix for the genetic effects could become singular which implies that some genetic variance components cannot be estimated reliably. But we do not have to use fully parameterized GMA models to model the expected genotypic values. In this case, we may want to build a reduced GMA model that can provide a good approximation to the expected genotypic values overall and meanwhile has a less complicated model structure. The fact that two terms within the same genetic component are unavoidably correlated suggests that we should perhaps treat each genetic component as a whole and keep or drop its terms all at once in building a GMA model. As genetic components of lower orders tend to have bigger impact on the expected genotypic values than the higher order ones, one way to construct a reduced GMA model is perhaps to go through a stepwise forward selection procedure by hierarchically adding the lowest order genetic component that can achieve a nominal significance level (e.g., 5%) but has not yet been selected in the model into the model one at a time. Here, the classical likelihood ratio statistic can be used to assess each genetic component for entering into or dropping from the model.

It has been known that the model building procedure is often sensitive to potential confounding among the selected variables. A GMA model uses the mean-corrected index variables x(k)Psk, x(k)Mtk (or their combined index variables wk,sk, vk,sktk in the phase unknown case) as the basic units in constructing all its model terms. At least in an equilibrium population, these mean-corrected index variables can reduce the confounding among different genetic components and make them uncorrelated. This orthogonality implies that each genetic component can be assessed independently regardless of which components are presented in the model. On the other hand, an F model can be thought of directly using the inheritance indicator variables z(k)Psk and z(k)Mtk (or their merged genotype coding variables w*k,sk(g) = z(k)Psk + z(k)Msk and v*k,sktk = z(k)Pskz(k)Mtk + z(k)Ptkz(k)Msk in phase unknown case) as the basic units in constructing its terms (Wang and Zeng, 2009; Wang, 2011). Even in an equilibrium population, an F model could have its low-order terms being highly confounded with other high-order terms when they contain shared alleles (see Zeng et al., 2005). As the result, in building a predictive model based on F models, the stepwise forward selection procedure could be problematic because failing to include a significant higher order term (or component) in a reduced F model could make the assessment of some low-order terms (or components) unreliable. On selecting significant QTL from a set of loci without having the locus-by-locus interactions being involved, the choice of using GMA or F model in building a reduced model for the expected genotypic values should not matter much because using mean-corrected index variables mainly affects the intercept in this case. However, when we consider including epistases for a given set of QTL, the GMA model can appropriately use the orthogonal property among different genetic components to dissect the confounding at least in equilibrium populations, while it appears that the F models cannot make full use of the equilibrium information. When disequilibria are present, as Hardy-Weinberg equilibria are expected to be held in most of the human genomic regions and LD mainly present for closely linked loci, we would expect that most of the genetic components in a GMA model are likely uncorrelated. Therefore, in most cases, using the GMA model could still be preferable to using F model in building reduced models for expected genotypic values especially when epistases are involved.

2.4. Example

As an example, we apply the GMA model to a published experimental data set on the polymorphism at the human acid phosphatase locus (ACP1). The analysis of this data set was first conducted by Greene et al. (2000), and recently re-analyzed as an example in Álvarez-Castro and Yang (2011). The ACP1 gene involves 3 alleles A, B, C. Two phenotypic traits are considered, which measure the ACP1 enzyme activity (yac) and inhibition (yin). The estimates of the expected genotypic values and the genotype frequencies are summarized in Table 1.

TABLE 1
www.frontiersin.org

Table 1. Expected genotypic values and observed genotype frequencies.

From the genotype frequencies, we first estimate the allele frequencies as pA = 0.3534, pB = 0.5818, and pC = 0.0647. Then, for each trait, we fit a separate GMA model (7) to its expected genotypic values by taking C as the baseline allele and creating the combined index variables w1(g), w2(g), v11(g), v22(g), v12(g). For ACP1 enzyme activity, we obtain LSE of the model parameters as μ = 167.735, α*1 = −59.260, α*2 = −26.254, δ*11 = −4.800, δ*22 = 3.700 and δ*12 = −2.000. For ACP1 enzyme inhibition, we have μ = 39.386, α*1 = −16.149, α*2 = −19.714, δ*11 = −0.200, δ*22 = 4.200, and δ*12 = 2.100.

Next, for each trait separately, we calculate A(gk) = α*1w1(gk) + α*2w2(gk) and D(gk) = δ*11v11(gk) + δ*22v22(gk) + δ*12v12(gk), k = 1, …, 6, for the six ACP1 genotypes. Based on the genotype frequencies, we then calculate the genetic variance components. For ACP1 enzyme activity, we obtain VA = 658.868, VD = 0.973 and the covariance Cov(A, D) = −0.356. The total variance of the expected genotypic values is V(E(Gac|g)) = 659.129, and VA/V(E(Gac|g)) = 99.96%. For ACP1 enzyme inhibition, we have VA = 44.920, VD = 0.146 and the covariance Cov(A, D) = −0.217. The total variance of the expected genotypic values is V(E(Gin|g)) = 44.632, and VA/V(E(Gin|g)) = 100.65%. Note that the partition V(E(G|g)) = VA + VD + 2Cov(A, D), which is not orthogonal for both traits due to a slight deviation from HWE at ACP1 locus as we have observed from the genotype frequencies in Table 1. It is also interesting to see that for ACP1 enzyme inhibition, VA/V(E(Gin|g)) is bigger than 100% due to HWD and the fact that VD + 2Cov(A, D) < 0.

Using the same allele frequencies but assuming HWE, we would have slightly different genotype frequencies. Note that the LSE of the model parameters keep the same and do not depend on the genotype frequencies because they are completely determined by the allele frequencies and the six expected genotypic values when a fully-parameterized GMA is used. But the total variance of the expected genotypic values and its variance components will be different. For ACP1 enzyme activity, we obtain VA = 660.588, VD = 0.971, Cov(A, D) = 0.006, V(E(Gac|g)) = 661.573 and VA/V(E(Gac|g)) = 99.85%. For ACP1 enzyme inhibition, we have VA = 46.422, VD = 0.152, Cov(A, D) = 0.001, V(E(Gin|g)) = 46.573 and VA/V(E(Gin|g)) = 99.68%. The minor deviations from orthogonal partitions are likely due to some numerical round-off and the fact that the summation of the three allele frequencies is 0.9999 instead of exactly 1.

For this three-allele example, we also applied the NOIA model using the formulas (10) and (11) provided in Álvarez-Castro and Yang (2011), and in both HWE and HWD cases we obtained exactly the same results as above based on our own calculation, which however appear to be slightly different from what had been reported in Álvarez-Castro and Yang (2011). For example, based on the observed genotype frequencies, an estimate of VD = 1.15 for the trait of ACP1 enzyme inhibition was reported in Álvarez-Castro and Yang (2011), which is noticeably larger than VD = 0.146 that we obtain above.

3. Discussion

In the analysis of genetic variance components, a separation of the variations contributed by the additive allelic effects and allelic interactions is complicated by the fact that the observed genotypes are often phase-unknown. In this study, by appropriately merging the paternal and maternal allelic effects and allelic interactions in the phase-known situation, we propose a way to construct one-locus and multi-locus GMA models on analysis of genetic variance components for QTL with multiple alleles. In the same way as building a G2A model, we construct a GMA model by first specifying its design matrix for the genetic effects via some mean-corrected index variables. As these mean-corrected index variables are well defined based on the observed genotypes and allele frequencies, they can be treated as regular covariates for coding QTL genotypes. These one-locus or multi-locus GMA models can then be incorporated into standard regression models with other possible adjusted covariates and fitted using standard LS approach. Based on the fitted models, we can further estimate the genetic variance and covariance components through the sample variances and covariances of various genetic components. As we have pointed out, these GMA models can be applied to equilibrium populations as well as populations in Hardy-Weinberg and/or linkage disequilibria. By using the full set or a low-order subset of the index variables (or genetic components), the GMA model allows us to make either full or reduced re-parameterization of the genotypic values. When some loci have phase known genotypes while other loci have phase unknown genotypes (a possible hypothetical situation), a mixed GMA model could also be constructed by adopting the same modeling strategy.

Sometimes we may want to perform hypothesis tests on the existence of certain genetic variance and covariance components. Note that the GMA models have allele frequencies being involved in their design matrices. As allele frequencies often need to be estimated from the genotype data, they could contribute another source of variation in the LSE (β^) of the model parameters as well as the genetic variance components. When the allele frequencies can be accurately estimated, we could simply treat them as fixed constants. When the residuals in a regression model do not depend on the allele frequency estimates, based on the linear model theory (see Ravishanker and Dey, 2002), β^ are known to be unbiased with its covariance matrix cov(β^) = σ2E[(XX)−1], where X is the design matrix of a GMA model. In this case, we can assess the existence of variance components by performing traditional hypothesis tests on β^. In general, we could also assess the existence of these genetic variance and covariance components through a bootstrap procedure. By repeatedly drawing random samples of the same size from the observed random sample with replacement, we can estimate the genetic variance and covariance components for each bootstrap sample and meanwhile assess the variances in estimates of these genetic variance and covariance components and test for their existence.

In genetic studies, QTL with missing genotypes is a common phenomenon. GMA model can be used to fit QTL with missing genotypes. Rather than excluding patients with missing QTL genotypes, we could treat “missing” as an allele although this strategy may induce potential bias as we assume that all the missing alleles have the same genetic effect. GMA models could also be applied incorporation with various imputation methods. In recent years, there has been a great deal of interest in developing methodologies for QTL mapping using recombinant intercrosses from multiple inbred lines. In this case, the putative QTL often have their locations and genotypes unknown. But the allele frequencies of QTL could probably be inferred from the study design and the QTL genotypes might be imputable from their neighboring genetic markers. How to apply GMA models to this type of experimental crosses for QTL mapping could be a research topic for further exploration.

In summary, the analysis of genetic variance components for multi-allele QTL has been challenging due to complex allelic interactions and locus-by-locus interactions. In this study, we thoroughly explored the architecture of one-locus and multi-locus GMA models with either phase known or unknown genotypes. Particularly, we described in detail the architecture of the multi-locus GMA model, and how the model terms can be grouped into various genetic components. Under equilibria populations, we also derived formulas for orthogonal partition of the genetic variance components, which could be useful for analytical assessment of the variance components. Comparing to the classical Fisher model, the GMA models can estimate the genetic variance and covariance components more conveniently via standard LS approach for either one or multiple QTL with multiple alleles, in equilibrium as well as disequilibrium populations.

Conflict of Interest Statement

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

The author would like to acknowledge Dr. Zhao-Bang Zeng at Bioinformatics Research Center, North Carolina State University, for his thoughtful comments and suggestions on an earlier version of the manuscript.

References

Álvarez-Castro, J. M. (2012). Current applications of models of genetic effects with interactions across the genome. Curr. Genomics 13, 163–175. doi: 10.2174/138920212799860689

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Álvarez-Castro, J. M., and Carlborg, Ö. (2007). A unified model for functional and statistical epistasis and its application in quantitative trait loci analysis. Genetics 176, 1151–1167. doi: 10.1534/genetics.106.067348

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Álvarez-Castro, J. M., and Yang, R.-C. (2011). Multiallelic models of genetic effects and variance decomposition in non-equilibrium populations. Genetica 139, 1119–1134. doi: 10.1007/s10709-011-9614-9

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Cheverud, J. M. (2000). “Detecting epistasis among quantitative trait loci,” in Epistasis and the Evolutionary Process, eds J. B. Wolf, E. D. Brodie, and M. J. Wade (New York, NY: Oxford University Press), 58–81.

Cockerham, C. C. (1954). An extension of the concept of partitioning hereditary variance for analysis of covariances among relatives when epistasis is present. Genetics 39, 859–882.

Pubmed Abstract | Pubmed Full Text

Cockerham, C. C. (1963). “Estimation of genetic variances,” in Statistical Genetics and Plant Breeding, Vol. 982, eds W. D. Henson and H. F. Robinson (Washington, DC: National Academy of Science - National Research Council), 53–94.

Falconer, D. S., and Mackay, T. F. C. (1996). Introduction to Quantitative Genetics, 4th Edn. Harlow: Longman Group Ltd.

Fisher, R. A. (1918). The correlation between relatives on the supposition of mendelian inheritance. Trans. R. Soc. Edinburgh 52, 399–433. doi: 10.1017/S0080456800012163

CrossRef Full Text

Greene, L. S., Bottini, N., Borgiani, P., and Gloria-Bottini, F. (2000). Acid phosphatase locus 1 (acp1): possible relationship of allelic variation to body size and human population adaptation to thermal stressa theoretical perspective. Am. J. Hum. Biol. 12, 688–701. doi: 10.1002/1520-6300(200009/10)12:5<688::AID-AJHB14>3.0.CO;2-C

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Hansen, T. F., and Wagner, G. P. (2001). Modeling genetic architecture: a multilinear theory of gene interaction. Theor. Popul. Biol. 59, 61–86. doi: 10.1006/tpbi.2000.1508

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Kempthorne, O. (1954). The correlation between relatives in a random mating population. Proc. R. Soc. Lond. B Biol. Sci. 143, 103–113. doi: 10.1098/rspb.1954.0056

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Kempthorne, O. (1957). An Introduction to Genetic Statistics. New York, NY: John Wiley and Sons, Inc.

Kempthorne, O. (1969). An Introduction to Genetic Statistics. New Haven, CT: Iowa State University Press Ames.

Ravishanker, N., and Dey, D. K. (2002). A First Course in Linear Model Theory. Boca Raton, FL: Chapman and Hall, CRC.

Wang, T. (2011). On coding genotypes for genetic markers with multiple alleles in genetic association study of quantitative traits. BMC Genet. 12:82. doi: 10.1186/1471-2156-12-82

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Wang, T., and Zeng, Z. B. (2006). Models and partition of variance for quantitative trait loci with epistasis and linkage disequilibrium. BMC Genet. 7:9. doi: 10.1186/1471-2156-7-9

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Wang, T., and Zeng, Z. B. (2009). Contribution of genetic effects to genetic variance components with epistasis and linkage disequilibrium. BMC Genet. 10:52. doi: 10.1186/1471-2156-10-52

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Weir, B. S. (1996). Genetic Data Analysis II: Methods for Discrete Population Genetic Data. Sunderland, MA: Sinauer Associates, Inc.

Weir, B. S., and Cockerham, C. C. (1977). “Two-locus theory in quantitative genetics,” in Proceedings of the International Conference on Quantitative Genetics, eds E. Pollak, O. Kempthorne, and T. B. Bailey Jr. (Ames, IA: Iowa State University Press), 247–269.

Yang, R.-C., and Álvarez-Castro, J. M. (2008). Functional and statistical genetic effects with multiple alleles. Curr. Top. Genet. 3, 49–62.

Zeng, Z.-B., Wang, T., and Zou, W. (2005). Modeling quantitative trait loci and interpretation of models. Genetics 169, 1711–1725. doi: 10.1534/genetics.104.035857

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Appendix A. One-locus Model for Phase Known – in Hardy-Weinberg Disequilibrium

In HWD, we can represent the genotype frequencies as P(Ai,Aj) = pipj + Dij, where Dij measures the departure from HWE with constraints i=1mDji=0 for j = 1, …, m, and j=1mDji=0 for i = 1, …, m. Note that Dij = P(Ai,Aj)pipj = Cov(xPi, xMj), we can show that E(G)=μ+i=1m1j=1m1δjiDji, and the variance of the expected genotypic values becomes

V(E(G|g))=VAP+VAM+VD+2Cov(AP,AM)+2Cov(AP,D)+2Cov(AM,D)

where AP=i=1m1αixPi(g),AM=j=1m1αjxMj(g) and D=i=1m1j=1m1δjixPi(g)xMj(g). The formulas on calculating the additive variance components VAP and VAM are the same as in HWE case. But the formula for dominance variance component VD becomes

VD=i,j=1m1pipj(δji)2i=1m1pi(j=1m1pjδji)2j=1m1pj(i=1m1piδji)2+(i,j=1m1pipjδji)2    +i,j=1m1(δji)2Dij2i=1m1(j=1m1pjδji)(j=1m1Djiδji)2j=1m1(i=1m1piδji)(i=1m1Djiδji)    +2(i,j=1m1pipjδji)(i,j=1m1Djiδji)+2i,s=1m1pi(j=1m1pjδjs)(j=1m1Djsδji)(i,j=1m1δjiDji)2

For the covariances, we have

Cov(AP,AM)=i,j=1m1DjiαiαjCov(AP,D)=i,j=1m1Djiαiδji(i=1m1piαi)(j,s=1m1Djsδjs)j=1m1(i=1m1αiDji)(s=1m1psδjs)Cov(AM,D)=i,j=1m1Djiαjδji(j=1m1pjαj)(i,t=1m1Dtiδti)i=1m1(j=1m1αjDji)(t=1m1ptδti)

As the paternal and maternal alleles are correlated in HWD, we will likely have non-zero covariances among AP, AM and D.

Appendix B. One-locus Model for Phase Unknown – in Hardy-Weinberg Disequilibrium

In HWD, we can represent the genotype frequencies as PAiAi = p2i + Dii for j = 1, …, m, and PAiAj = 2pipj + 2Dij for ij. Since pi = PAiAi + ∑j≠iPAiAj/2, we have i=1mDij=0 for j = 1, …, m; j=1mDij=0 for i = 1, …, m; and Dij = Dji. As the disequilibrium measures can be represented as Dij = Cov(xPi, xMj) for i, j = 1, …, m, we can show that E(G)=μ+i=1m1δiiDii+2j=2m1i<jδijDij, and the variance partition of the expected genotypic values V(E(G|g)) = VA + VD + 2Cov(A, D), where

VA=2i=1m1pi(αi)22(i=1m1piαi)2+2(i=1m1Dii(αi)2+2i<jαiαjDij)VD=i,j=1m1pipj(δij)22j=1m1pj(i=1m1piδij)2+(i,j=1m1pipjδij)2+i,j=1m1Dij(δij)24j=1m1(i=1m1piδij)(i=1m1Dijδij)+2(i,j=1m1pipjδij)(i,j=1m1Dijδij)+2i,s=1m1pi(j=1m1pjδsj)(j=1m1Dsjδij)(i,j=1m1Dijδij)2Cov(A,D)=2i,j=1m1Dijαiδij2(i=1m1piαi)(s,j=1m1Dsjδsj)2j=1m1(i=1m1αiDij)(s=1m1psδsj)

In HWD, the genotype frequencies can also be parameterized as PAiAi = p2i + pi(1 − pi)f, and PAiAj = 2pipj(1 − f) for ij (see Weir, 1996). This is a special case of the above parameterization with Dii = pi(1 − pi)f and Dij = −pipjf, for ij.

Keywords: Fisher's genetic model, genetic variance components, general two-allele model, general multi-allele model, orthogonality, least square approach, average allelic effects and interactions

Citation: Wang T (2014) A revised Fisher model on analysis of quantitative trait loci with multiple alleles. Front. Genet. 5:328. doi: 10.3389/fgene.2014.00328

Received: 10 July 2014; Accepted: 03 September 2014;
Published online: 25 September 2014.

Edited by:

Qizhai Li, Chinese Academy of Sciences, China

Reviewed by:

Guohua Zou, Chinese Academy of Sciences, China
Tian-Qing Zheng, Chinese Academy of Agricultural Sciences, China

Copyright © 2014 Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Tao Wang, Division of Biostatistics, Institute for Health and Society, Medical College of Wisconsin, 8701 Watertown Plank Road, PO Box 26509, Milwaukee, WI 53226, USA e-mail: taowang@mcw.edu