Abstract
Estimating the resolution power of mapping analysis of linked quantitative trait loci (QTL) remains a difficult problem, which has previously been addressed mainly by Monte Carlo simulations. The analytical method for evaluating the expected LOD developed in this article extends the “deterministic sampling” approach to the case of two linked QTL for single- and two-trait analysis. Several complicated questions are addressed through this evaluation, including the dependence of QTL detection power on the QTL effects and on the residual correlation between the traits, and the effect of epistatic interaction between the QTL for one or both traits on the expected LOD (ELOD). Although this method gives only an asymptotic estimate of ELOD, it allows one to get an approximate assessment of a broad spectrum of mapping situations. A good correspondence was found between the ELODs predicted by the model and LOD values averaged over Monte Carlo simulations.
Many efforts have been devoted to increasing the efficiency of marker analysis of quantitative traits, including interval analysis (Lander and Botstein 1989; Knott and Haley 1992), selective sampling (Lebowitz et al. 1987; Darvasi and Soller 1992, 1994; Weller et al. 1997), replicated progeny testing (Soller and Beckmann 1990), and sequential experimentation (Boehnke and Moll 1989; Motro and Soller 1993). Recently, a general method to improve the efficiency of quantitative trait loci (QTL) mapping was proposed by taking into account simultaneous segregation at many genomic segments that affect the trait in question (Jansen and Stam 1994; Zeng 1994). A situation in which one QTL (or a chromosome segment) affects several traits simultaneously can also be considered to result in increased power (Korol et al. 1987, 1994, 1995, 1998; Jiang and Zeng 1995; Ronin et al. 1995; Zeng 1997). Such an analysis may be important in marker-assisted breeding strategies, dissecting heterosis as a multilocus multitrait phenomenon, developing optimized programs for evaluation and bioconservation of genetic resources, and revealing the genetic architecture of fitness systems in natural populations. Multiple-trait mapping analysis proved to be very useful within the framework of the selective genotyping design (Weller et al. 1997; Ronin et al. 1998).
The multiple-trait approach may help in coping with a complicated problem arising when the considered chromosome contains several QTL (e.g., Jiang and Zeng 1995; Korol et al. 1998). If one tries to fit a single-locus model to such a case, a ghost QTL can be detected in an interval that has no effect on the trait (Knott and Haley 1992; Martinez and Curnow 1992; Wright and Kong 1997). Especially difficult are situations with trans effects of linked QTL (Knott and Haley 1992; Luo and Kearsey 1992). That trans-association of QTL could be a common phenomenon even in interspecific crosses has been demonstrated by DeVicente and Tanksley (1993) in tomato: they found that up to 36% of the detected QTL had alleles with effects opposite to the direction expected from the parental differences.
The usual way of dealing with several linked QTL is multiple regression analysis or mixture model analysis that includes markers as regression cofactors to account for segregation of QTL of the same chromosome (Jansen and Stam 1994; Zeng 1994). The third possibility is to construct two- to three-interval mixture models, although this approach is rather cumbersome and needs intensive calculations. Employing Monte Carlo simulations with mixture models, we recently demonstrated the advantage of multiple-trait analysis in detection of linked QTL effects (Korol et al. 1998). The goal of this article is to elaborate an analytical model enabling us to evaluate in a general form the expected LOD values in cases of two linked QTL. Such a model can be used as a tool to predict the expected resolution in different complicated situations. As a practical application one can consider the possibility of calculating the minimum sample size needed to detect linked QTL with certain effects on either of the correlated quantitative traits or to prove the existence of epistasis for any of the traits. Likewise, the proposed analysis allows us to predict situations where a ghost QTL will be detected using interval analysis and to evaluate the minimum marker density needed to prevent such a possibility for given effects of the linked QTL. Recently, a similar technique, referred to as “deterministic sampling,” was applied to single-QTL situations in single-trait analysis, with the expected LOD values calculated numerically (Mackinnon and Weller 1995; Mackinnon et al. 1996; Wright and Kong 1997). Our major target here is analytical and numerical deterministic sampling for two-trait analysis with linked QTL. We first treat the case of a single-trait analysis and then generalize the results for the two-trait analysis.
The consideration will be based on a modification of the maximum-likelihood technique relevant to asymptotic properties of the LOD test, which will be referred to as “regression of the log-likelihood function.” For the case of single-marker analysis this modification is equivalent to the usual procedure of expected LOD (ELOD) calculation (Lander and Botstein 1989), with the only difference that it is a function of the variable position of the marker.
SINGLE-TRAIT ANALYSIS
The major target of our analysis is analytical and numerical deterministic sampling with linked QTL. Therefore, analytical expression of ELOD should be obtained that allows us to compare H_{2} (two linked QTL) and H_{1} (single QTL) for any set of parameter values.
Single-QTL models: Let a trait x be dependent on two linked loci Q_{1}/q_{1} and Q_{2}/q_{2} and let the trait values in the four QTL groups Q_{1}Q_{1}Q_{2}Q_{2}, Q_{1}Q_{1}q_{2}q_{2}, q_{1}q_{1}Q_{2}Q_{2}, and q_{1}q_{1}q_{2}q_{2} of a mapping population have normal densities f_{11}(x), f_{12}(x), f_{21}(x), and f_{22}(x) with (unknown) means
Consider a random sample of individuals genotyped for marker loci from the chromosome that carries the two QTL. With a dense molecular map, one analyzes consecutively a series of markers with five different locations relative to the linked QTL:
Clearly, situations (d) and (e) are equivalent (up to parameter replacement) to (b) and (a), respectively. Our intention is to evaluate how misspecification of the model (assumption of one QTL when actually two linked QTL reside on the chromosome) affects the parameter estimation. This is done by scanning across a large number of markers, so that besides situations (a), (c), and (e), one could also encounter situations close to those of (b) and (d). Moreover, in all of the cases we assume that the trial marker exactly coincides with the putative (single) QTL. Due to the foregoing assumptions, the true expected densities of the trait distribution in the alternative marker groups for an arbitrary marker will be
Clearly,
These results allow us to evaluate the consequences of model misspecification. The behavior of the score ELOD = V_{1}(·) – V_{0} as a function of trial marker position and the parameters characterizing the effect of Q_{1}/q_{1} and Q_{2}/q_{2} are represented in Table 1 and Figure 1. Clearly, V_{1} (·)–V_{0} reaches a local maximum when the marker coincides with one of the QTL. One can easily see from the presented illustrations that the possibility of finding an indication of the existence of two QTL by revealing two local maxima depends on linkage phase (coupling or repulsion), distance between the QTL, and magnitudes of the QTL effects and their ratio (see Figure 1 and Table 1).
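The single-marker scan described above can be sketched numerically. The following is a minimal illustration and not the expressions of this article: it assumes a dihaploid population with equally frequent marker classes, the Haldane mapping function, purely additive QTL effects d1 and d2 (half-differences between homozygotes), and a common residual variance sigma2 in all four QTL groups. It uses the standard fact that a Gaussian model fitted to data with variance s² attains an expected log-likelihood of −½ ln(2πe·s²) per individual. All function and parameter names are hypothetical.

```python
import math

def haldane_r(d):
    """Recombination fraction for map distance d in Morgans (no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * abs(d)))

def _same(r, a, b):
    """Probability of allele b at one locus given allele a at a locus r away."""
    return 1.0 - r if a == b else r

def qtl_probs_given_marker(m, q1, q2):
    """P(g1, g2 | marker allele M); g = +1 for the Q allele, -1 for q.

    Assumes q1 < q2 and a coupling-phase parent (M, Q1, Q2 on one chromosome).
    """
    r12 = haldane_r(q2 - q1)
    probs = {}
    for g1 in (1, -1):
        for g2 in (1, -1):
            if m <= q1:    # order: marker, Q1, Q2
                p = _same(haldane_r(q1 - m), 1, g1) * _same(r12, g1, g2)
            elif m >= q2:  # order: Q1, Q2, marker
                p = _same(haldane_r(m - q2), 1, g2) * _same(r12, g2, g1)
            else:          # order: Q1, marker, Q2
                p = _same(haldane_r(m - q1), 1, g1) * _same(haldane_r(q2 - m), 1, g2)
            probs[(g1, g2)] = p
    return probs

def group_moments(probs, d1, d2, sigma2):
    """Mean and variance of the trait mixture within one marker group."""
    means = {g: d1 * g[0] + d2 * g[1] for g in probs}
    mu = sum(p * means[g] for g, p in probs.items())
    var = sigma2 + sum(p * (means[g] - mu) ** 2 for g, p in probs.items())
    return mu, var

def elod_single_marker(m, q1, q2, d1, d2, n, sigma2=1.0):
    """Asymptotic ELOD (H1: one QTL at the marker, vs. H0: no QTL) for n individuals."""
    p_M = qtl_probs_given_marker(m, q1, q2)
    p_m = {(-g1, -g2): p for (g1, g2), p in p_M.items()}  # marker allele m, by symmetry
    mu_M, var_M = group_moments(p_M, d1, d2, sigma2)
    mu_m, var_m = group_moments(p_m, d1, d2, sigma2)
    mu = 0.5 * (mu_M + mu_m)
    var_T = 0.5 * (var_M + (mu_M - mu) ** 2) + 0.5 * (var_m + (mu_m - mu) ** 2)
    per_individual = 0.25 * (math.log10(var_T / var_M) + math.log10(var_T / var_m))
    return n * per_individual

# Scan a 1-Morgan chromosome with QTL at 0.3 and 0.7 M, equal effects, coupling phase.
positions = [i / 20 for i in range(21)]
profile = [elod_single_marker(m, 0.3, 0.7, 0.5, 0.5, 500) for m in positions]
```

Under these assumptions the repulsion phase can be mimicked by giving d2 the opposite sign to d1; with equal and opposite effects the profile drops to zero midway between the two QTL.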
Two linked QTL: ELOD for testing H_{2} vs. H_{1}: As before, consider a situation when the target trait x depends on the two linked loci Q_{1}/q_{1} and Q_{2}/q_{2} with normal trait densities f_{11}(x), f_{12}(x), f_{21}(x), f_{22}(x) in the QTL groups Q_{1}Q_{1}Q_{2}Q_{2}, Q_{1}Q_{1}q_{2}q_{2}, q_{1}q_{1}Q_{2}Q_{2}, and q_{1}q_{1}q_{2}q_{2} of the dihaploid mapping population characterized by unknown means
Clearly, other possible situations are equivalent to these four, up to a replacement of parameters. In the foregoing single-marker sliding, we had two discrepancies between the model specification and the real situation: (i) only one QTL was assumed, and (ii) the trial marker was treated as if its position coincides with that of the putative QTL. Now the model is improved, because the first assumption is removed. Therefore, we can consider a process of sliding with a pair of markers along the chromosome as a tool to locate the pair of QTL. Such a procedure is equivalent to two-interval mapping analysis (Haley and Knott 1992; Martinez and Curnow 1992; Jansen 1993; Korol et al. 1998) with vanishing lengths of the trial intervals. Because of the foregoing assumptions, the true expected densities of the trait distribution in four alternative marker groups for an arbitrary pair of trial markers can be written as
Comparison of the analytical and simulation results: The foregoing model allows us to deduce the expected LOD values in the QTL mapping analysis in the case of two linked QTL. However, these results are essentially asymptotic and may be biased at small samples. Therefore, it is important to assess how the obtained estimates converge to the expected parameter values as the sample size and marker density increase. To do that we employed Monte Carlo simulations. Chromosomes with two linked QTL were modeled for two population sizes (n = 500 and 2000). No crossing-over interference (Haldane mapping function) was assumed. For each sample, we employed two subsets of markers, using the information on intervals 12 and 48. Table 2 shows the behavior of the average LOD values and the discrepancy between the estimated and simulated QTL positions as dependent on sample size and number of markers. The main conclusion from the simulations is that the proposed method can indeed serve as a basis to get an approximate prediction of the expected LOD for interval mapping of two linked QTL (compare the average max LODs with max ELODs).
It follows from the presented results that the difference between predicted max ELOD and the averaged over simulations max LOD in repulsion phase is smaller than that in coupling phase, in spite of the fact that our theory predicts the same value for the two phases. In both cases the experimental LODs are smaller than the predicted ones; i.e., for the same combinations of parameters the simulated LODs were higher in repulsion phase. A simple explanation can be proposed for this effect. The simulated procedure includes analyses for two hypotheses, H_{1} and H_{2}. In Monte Carlo experiments with two linked QTL, we can consider two options for fitting parameters of the maximum-likelihood function to the H_{1} hypothesis (Lander and Botstein 1989; Haley and Knott 1992; Korol et al. 1998): (i) fixed position of the putative QTL, when its position is assumed to be known and coincides with either of the two simulated positions (which would not necessarily be true in the practical data analysis when these positions are unknown); (ii) variable position of the putative QTL that is assumed unknown, but can be found because it provides maximum value of the maximum-likelihood function. Certainly, the achievable maximum is higher in the second situation, resulting in an underestimation of the LOD value for H_{2} vs. H_{1} (not shown).
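The kind of comparison between asymptotic prediction and Monte Carlo averages discussed in this section can be illustrated with a much simplified sketch. It is not the interval-mapping mixture analysis used for Table 2: it simulates a dihaploid population with two linked additive QTL, places a single marker exactly at Q_{1}, computes the single-marker LOD (H_{1} vs. H_{0}) with group-specific means and variances, and compares the average over runs with the corresponding asymptotic ELOD. The effect sizes, unit residual variance, seed, and all names are assumptions made for illustration.

```python
import math
import random

def haldane_r(d):
    """Recombination fraction for map distance d in Morgans (no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * abs(d)))

def ml_var(xs):
    """Maximum-likelihood (biased) variance estimate."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def simulate_lod(n, r12, d1, d2, rng):
    """One simulated sample; LOD of H1 (marker at Q1) vs. H0 (no QTL)."""
    groups = {1: [], -1: []}
    for _ in range(n):
        g1 = rng.choice((1, -1))                   # allele at Q1 (= marker allele)
        g2 = g1 if rng.random() >= r12 else -g1    # allele at Q2 after recombination
        x = d1 * g1 + d2 * g2 + rng.gauss(0.0, 1.0)
        groups[g1].append(x)
    pooled = groups[1] + groups[-1]
    # Gaussian maximum log-likelihood is -n/2 * (ln(2*pi*var) + 1) for each group.
    log_ratio = 0.5 * (n * math.log(ml_var(pooled))
                       - sum(len(g) * math.log(ml_var(g)) for g in groups.values()))
    return log_ratio / math.log(10.0)

def expected_lod(n, r12, d1, d2):
    """Asymptotic ELOD for the same single-marker test (unit residual variance)."""
    mean_M = d1 + d2 * (1.0 - 2.0 * r12)           # marker-group mean (other group: -mean_M)
    var_M = 1.0 + 4.0 * d2 ** 2 * r12 * (1.0 - r12)
    var_T = var_M + mean_M ** 2
    return 0.5 * n * math.log10(var_T / var_M)

rng = random.Random(1)                             # fixed seed for reproducibility
r12 = haldane_r(0.4)                               # QTL 0.4 Morgans apart
lods = [simulate_lod(500, r12, 0.5, 0.5, rng) for _ in range(200)]
avg_lod = sum(lods) / len(lods)
elod = expected_lod(500, r12, 0.5, 0.5)
print(f"average LOD = {avg_lod:.2f}, ELOD = {elod:.2f}")
```

In finite samples the average LOD is expected to deviate somewhat from the asymptotic value, which is why such predictions are worth checking by simulation at the intended sample size.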
Applications: The proposed analytical tool allows us to evaluate easily, without the necessity of Monte Carlo simulations, the behavior of the ELOD values across all possible locations of the putative QTL, for any fixed sets of parameters (see Figure 2), which is important for designing mapping experiments. For example, using the obtained expression of max ELOD, we can get an estimate of the minimum sample size needed to discriminate between H_{1} and H_{2}, when H_{2} is true (i.e., when we have a pair of linked QTL with some effects d_{1} and d_{2}), with a certain preset test power. This is based on the fact that the expected LOD value is distributed as noncentral chi-square with degrees of freedom equal to the difference in the number of parameters specifying the alternatives (H_{2} and H_{1}) (Wald 1943). This tool enables us to compare different practical situations with respect to the foregoing prediction of the minimum sample size (see Lander and Botstein 1989). The usefulness of such an option is especially obvious for mapping of linked QTL, where the efficiency of the experimental design depends on many factors characterizing the unknown “configuration” of the problem: the distance between the putative QTL, their relative effects on trait mean value and variance, linkage phase (coupling vs. repulsion), and presence or absence of epistatic interaction, etc. We now consider two examples to illustrate the possibilities of the proposed analysis: the dependence of ELOD for H_{2} vs. H_{1} on epistasis and the detectability of epistasis provided H_{2} is already proved.
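The minimum-sample-size calculation mentioned above can be sketched as follows. This is an illustration under explicit assumptions rather than the procedure of this article: the likelihood-ratio statistic 2 ln(10)·LOD is taken to follow a noncentral chi-square distribution with df degrees of freedom and noncentrality 2 ln(10)·n·e, where e is the asymptotic ELOD per individual, and significance is declared at a fixed LOD threshold. The threshold of 3, the power of 0.9, and all function names are hypothetical choices. The noncentral chi-square tail is computed from its Poisson-mixture representation using only the standard library.

```python
import math

def gamma_p(a, x):
    """Regularized lower incomplete gamma function P(a, x), by power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-16 * total:        # all terms are positive
        k += 1
        term *= x / (a + k)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def ncx2_sf(x, df, lam):
    """P(X > x) for a noncentral chi-square X with df and noncentrality lam."""
    half = lam / 2.0
    weight = math.exp(-half)           # Poisson(half) probability of i = 0
    cum = 0.0
    cdf = 0.0
    for i in range(2000):
        cdf += weight * gamma_p(df / 2.0 + i, x / 2.0)  # central chi-square, df + 2i
        cum += weight
        if cum > 1.0 - 1e-12:
            break
        weight *= half / (i + 1)
    return 1.0 - cdf

def detection_power(n, elod_per_individual, df, lod_threshold=3.0):
    """Approximate probability that the LOD exceeds the threshold with n individuals."""
    scale = 2.0 * math.log(10.0)       # converts LOD units to likelihood-ratio units
    return ncx2_sf(scale * lod_threshold, df, scale * n * elod_per_individual)

def min_sample_size(elod_per_individual, df, lod_threshold=3.0, power=0.9):
    """Smallest n whose approximate detection power reaches the target."""
    if elod_per_individual <= 0.0:
        raise ValueError("ELOD per individual must be positive")
    hi = 1
    while detection_power(hi, elod_per_individual, df, lod_threshold) < power:
        hi *= 2
    lo = hi // 2
    while hi - lo > 1:                 # bisection; power is monotone in n
        mid = (lo + hi) // 2
        if detection_power(mid, elod_per_individual, df, lod_threshold) >= power:
            hi = mid
        else:
            lo = mid
    return hi

n_min = min_sample_size(0.01, df=2)
```

Here the value 0.01 for the ELOD per individual is arbitrary; in practice it would be the max ELOD for H_{2} vs. H_{1} divided by the sample size for which it was computed.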
The effect of epistatic interaction on ELOD for QTL detection: In the example on epistatic interactions the trait values in the four QTL classes were modeled as:
The detectability of epistasis (comparison of H_{2} under ϵ ≠ 0 vs. H_{2} under ϵ = 0): In the foregoing section we could see how epistasis affects the expected LOD values when single-QTL and two-QTL models are applied to the analysis. The proposed tool allows us also to predict the expected LOD for the situation when one wants to contrast two versions of the hypothesis H_{2} (two linked QTL in the chromosome): H_{2} (ϵ = 0), i.e., additive effects of the QTL, and H_{2} (ϵ ≠ 0), i.e., assuming epistasis. Testing for epistasis (coadaptation) and evaluating the magnitude of epistasis have recently become an important component of QTL mapping analysis (Doebley et al. 1995; Li et al. 1997). This meaningful subject has a long history in evolutionary genetics (Dobzhansky 1970; Wright 1977), theories of recombination and sex evolution (Barton 1995; Otto and Feldman 1997), and agricultural genetics (Yu et al. 1997). However, only with QTL mapping can epistatic effects be objectively detected and evaluated. Each of the foregoing alternative versions of H_{2}, without and with epistasis, can be compared to H_{0} (no QTL in the chromosome), using the proposed approximation. The difference between the resulting max ELODs will give us max ELOD for the presence of epistasis. An example presented in Table 3 illustrates the closeness between the predicted LOD values and the average LODs obtained in direct Monte Carlo simulations.
TWO-TRAIT ANALYSIS
As was shown in our previous simulation study (Korol et al. 1998), joint analysis of correlated quantitative traits may increase the mapping resolution in situations with linked QTL, i.e., when H_{2} (two linked QTL) and H_{1} (one QTL) are compared. The higher the residual correlation, the better the expected LOD. In two-trait analysis, the residual correlation between the traits in the QTL groups may be caused by nongenetic mechanisms, pleiotropy, or linkage of genes from other chromosomes affecting either of the traits, and by pleiotropy and linkage of genes from the chromosome under consideration.
As in single-trait analysis, to analyze two-QTL situations we calculate max ELOD for the alternative hypotheses: H_{2} vs. H_{1}. This means that we need to develop bivariate analogues of the foregoing single-QTL and two-QTL models based on single- and two-marker sliding procedures. Hence, the goal of the first part of this section is to obtain the regression of the log-likelihood function assuming that only one QTL resides in the chromosome that in fact carries two linked QTL. Let the traits x and y be dependent on two loci, Q_{1}/q_{1} and Q_{2}/q_{2}, residing in the marked chromosome and let the bivariate trait distributions in the four QTL groups Q_{1}Q_{1}Q_{2}Q_{2}, Q_{1}Q_{1}q_{2}q_{2}, q_{1}q_{1}Q_{2}Q_{2}, and q_{1}q_{1}q_{2}q_{2} of a dihaploid mapping population be normal densities f_{11}(x, y), f_{12}(x, y), f_{21}(x, y), and f_{22}(x, y) with unknown vectors of means
Consider a random sample of individuals each characterized for traits x and y and a set of marker loci from the chromosome in question. For an arbitrary marker, we take into account the same five situations, (a) through (e), as those considered above for the single-trait analysis. Then the true expected densities of the bivariate trait distribution in the alternative marker groups for an arbitrary scanning marker will be
Consider an arbitrary bivariate distribution with finite central moments (up to the fourth). Then the maximum of log-likelihood per individual for the Gaussian model will converge in probability to the maximum of the regression of the log-likelihood function per individual. Assume that the trial marker is exactly at the same position as our putative QTL and the trait distributions in the alternative groups MM and mm are bivariate normals. The regression of the log-likelihood will take the form
Consider now a process of scanning with a pair of markers along the chromosome. Because of the foregoing assumptions, the true expected densities of the trait distribution in four alternative marker groups for an arbitrary pair of trial markers can be written as
Assume independent variance effects of the linked QTL for each of the traits (i.e.,
To illustrate how the proposed model works we now address two questions concerning the dependence of the expected LOD value for discrimination between H_{1} and H_{2} on (i) the residual correlation between the traits and (ii) epistasis. Let us fix the effect of one of the QTL (say Q_{1}/q_{1}) and consider how max ELOD depends on the effect of the second QTL (Q_{2}/q_{2}) and on the residual correlation (ρ) between the quantitative traits. We are interested here in testing H_{2} vs. H_{1}, assuming additive effects of the two QTL. Two situations are considered: in the first, each trait depends on only one of the linked QTL (Figure 4a), whereas in the second case both QTL affect the first trait and one of the QTL affects the second trait (Figure 4b). One can conclude that the detection power increases with the residual correlation between the analyzed traits. An additional conclusion is that the power increases with the effect of Q_{2}/q_{2} up to some “saturation point.” In the first situation the saturation is reached when the effect of Q_{2}/q_{2} becomes equal to that of Q_{1}/q_{1} (see Figure 4a). The only difference in the second situation is that the saturation point depends on ρ: the larger abs(ρ), the earlier (at lower effects of Q_{2}/q_{2}) the saturation (see Figure 4b). For the second situation, let us consider the complication caused by epistasis. Namely, we allow for epistatic interaction between the QTL with respect to the trait x. As in the foregoing example on single-trait analysis, it is interesting here to evaluate how epistasis affects the expected detection power. Figure 5a demonstrates that epistasis may be helpful in discriminating between H_{2} (two linked QTL) and H_{1} (only a single QTL in the chromosome).
This effect is manifested for both positive and negative residual correlations, but the sign of ρ is important in determining the details of the behavior of max ELOD as a function of epistasis and the effect of Q_{2}/q_{2} on trait y (not shown). The second example (Figure 5b) demonstrates a situation in which the QTL interact epistatically for both traits. Clearly, the provided examples are no more than illustrations of the possibilities of the proposed analytical tool. Each of the questions discussed in these illustrations can be dealt with in necessary detail.
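The role of the residual correlation can be seen in a deliberately simplified setting, sketched below. It does not treat H_{2} vs. H_{1} with two linked QTL as in Figures 4 and 5; it considers a single QTL located exactly at the marker, H_{1} vs. H_{0}, in a dihaploid population with two equally frequent marker groups. The within-group covariance matrix is assumed equal in both groups, with residual standard deviations sx and sy and correlation rho, and the QTL shifts the trait means by ±dx and ±dy. The sketch relies on the standard result that a fitted bivariate Gaussian attains an expected log-likelihood of −½ ln((2πe)²·|Σ|) per individual. All names are hypothetical.

```python
import math

def elod_two_trait(dx, dy, rho, n, sx=1.0, sy=1.0):
    """Asymptotic two-trait ELOD (one QTL at the marker vs. no QTL) for n individuals."""
    cov = rho * sx * sy
    # Determinant of the within-group (residual) covariance matrix.
    det_within = sx ** 2 * sy ** 2 - cov ** 2
    # Total covariance = within-group covariance + outer product of (dx, dy),
    # because the two marker groups are equally frequent with means +/-(dx, dy).
    det_total = (sx ** 2 + dx ** 2) * (sy ** 2 + dy ** 2) - (cov + dx * dy) ** 2
    return 0.5 * n * math.log10(det_total / det_within)

# A QTL affecting only trait x: the ELOD grows with |rho|.
for rho in (0.0, 0.4, 0.8):
    print(rho, round(elod_two_trait(0.5, 0.0, rho, 500), 2))
```

When the QTL affects only trait x, the determinant ratio reduces to 1 + dx²/(sx²(1 − ρ²)), so the expected LOD increases with abs(ρ) in this simplified case; when both dx and dy are nonzero, the sign of ρ relative to the signs of the effects matters as well.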
CONCLUSION
Resolution power of mapping analysis of linked QTL remains a difficult problem, which was previously addressed mainly in terms of Monte Carlo simulations. This has restricted the possibilities of detailed evaluation and comparison of different mapping situations and experimental designs. The proposed analytical method of evaluating the expected LOD generalizes for the case of two linked QTL the corresponding estimates derived by Lander and Botstein (1989), Mackinnon and Weller (1995), and Mackinnon et al. (1996) (referred to as “deterministic sampling”). Our model allows us to analyze situations with variance effect and epistatic interaction between the putative QTL. We also developed here a two-locus analogue of our previous analytical predictor (see Korol et al. 1995) of the expected LOD for two-trait mapping analysis. Again, many complicated questions can be addressed, such as the dependence of the QTL detection power on residual correlation between the traits, the effect of epistatic interaction between the QTL for one or both traits, and the influence of variance effect for one or both traits on ELOD. Although this method gives only an asymptotic estimate of ELOD, it allows one to get an approximate assessment of a broad spectrum of specific mapping situations. Clearly, any asymptotic effect found by the proposed tool can (and should) be checked by Monte Carlo simulations for given sample sizes. Our comparisons made for a series of situations indeed show a good correspondence between the predicted max ELODs and LOD values averaged over Monte Carlo runs. An important point is that our results prove (for the considered class of situations) the theoretical fact of consistency of interval mapping analysis with two linked QTL (convergence of the parameter estimates to the true values with increasing sample size and vanishing interval length).
Acknowledgments
We thank Z-B. Zeng for valuable comments. Two anonymous reviewers provided useful criticisms and suggestions on the earlier version of the manuscript. This research was supported by the Israeli Ministry of Absorption and Ministry of Science.
Footnotes

Communicating editor: Z-B. Zeng
 Received January 20, 1998.
 Accepted September 28, 1998.
 Copyright © 1999 by the Genetics Society of America