Modification of Experimental Design and Statistical Method for Mapping Imprinted QTLs Based on Immortalized F2 Population

Genomic imprinting is an epigenetic phenomenon, which plays important roles in the growth and development of animals and plants. Immortalized F2 (imF2) populations generated by random cross between recombinant inbred (RI) or doubled haploid (DH) lines have been proved to have significant advantages for mapping imprinted quantitative trait loci (iQTLs), and statistical methods for this purpose have been proposed. In this paper, we propose a special type of imF2 population (R-imF2) for iQTL mapping, which is developed by random reciprocal cross between RI/DH lines. We also propose two modified iQTL mapping methods: two-step point mapping (PM-2) and two-step composite point mapping (CPM-2). Simulation studies indicated that: (i) R-imF2 cannot improve the results of iQTL mapping, but the experimental design can probably reduce the workload of population construction; (ii) PM-2 can increase the precision of estimating the position and effects of a single iQTL; and (iii) CPM-2 can precisely map not only iQTLs, but also non-imprinted QTLs. The modified experimental design and statistical methods will facilitate and promote the study of iQTL mapping.


INTRODUCTION
Genomic imprinting is a phenomenon found in animal and plant, in which two alleles of a gene show unequal expression depending on their parental origins. Genes involved in such phenomenon are called imprinted genes. Many imprinted genes have been found in animal and human (Nolan et al., 2001;Morison et al., 2005;Long and Cai, 2007;Babak et al., 2008;Hagan et al., 2009;Girardot et al., 2013;Pembrey et al., 2014;Hur et al., 2016;Jiang et al., 2017;Mackay and Temple, 2017). In plant, the first imprinted gene, which is involved in the coloration of maize kernel endosperm, was discovered as early as about half centuries ago (Kermicle, 1970). Compared with those in animal, however, the imprinted genes identified in plant so far are still very limited, of which most are found from Arabidopsis, rice and maize (Danilevskaya et al., 2003;Haun et al., 2007;Luo et al., 2009;Bauer and Fischer, 2011;Raissig et al., 2011;Zhang et al., 2011;Ikeda, 2012;Pei et al., 2019).
It has been found that many quantitative traits are affected by genomic imprinting (Spencer, 2002;Croteau and Croteau, 2004;Sandovici et al., 2005;Santure and Spencer, 2011;Wang et al., 2012). The quantitative trait loci (QTLs) showing imprinting effect are called imprinted QTL (iQTL). A number of different experimental designs and corresponding statistical methods have been proposed for mapping iQTLs (Knott et al., 1998;de Koning et al., 2000;Pratt et al., 2000;Strauch et al., 2000;Hanson et al., 2001;Haghighi and Hodge, 2002;Shete and Amos, 2002;Shete et al., 2003;Knapp and Strauch, 2004;Mantey et al., 2005;Cui et al., 2006, 2007, Cui, 2007Liu et al., 2007;Li et al., 2008Li et al., , 2012aHager et al., 2008;Yang et al., 2010;Zhou et al., 2012;Karami et al., 2019). F 2 (outbred or inbred) and BC 1 populations are usually used for iQTL mapping (Haley et al., 1994;de Koning et al., 2000;Cui et al., 2006;Li et al., 2012a), but they have obvious shortcomings, such as relatively low power in iQTL detection, low accuracy in estimating the positions and effects of iQTLs, inability of permanent preservation of the population, and unrepeatability. Besides, determination of the parental origins of alleles is also difficult or problematic under the F 2 and BC 1 designs (Wu et al., 2002;Cui et al., 2006;Wolf et al., 2008;Lawson et al., 2013). In addition, the imprinting effect cannot be separated from the maternal effect in the BC 1 design. Wen and Wu (2014) proposed statistical methods for iQTL mapping using an immortalized F 2 (abbreviated as imF 2 ) population generated from random crosses between recombinant inbred (RI) lines or doubled haploid (DH) lines. Compared with the previous designs, the imF 2 design has significant advantages for iQTL mapping. First, the parental origins of marker alleles in each imF 2 line can be exactly known from the cross. Second, analysis based on imF 2 lines can reduce environmental error so as to increase the statistical power of iQTL mapping. Third, a very large imF 2 population can be produced without increasing the cost of molecular marker assay. However, there are also shortcomings in the experimental design and mapping methods proposed by Wen and Wu (2014). In the experimental design, the work of constructing an imF 2 population is laborious. In the statistical methods, iQTL mapping is performed only by testing the imprinting effect without making use of the information of additive effect and dominance effect. This may reduce the precision of iQTL mapping.
In this paper, to overcome the above shortcomings, we propose a modified imF 2 design and modified statistical methods for iQTL mapping based on the work of Wen and Wu (2014). We demonstrate by simulation studies that the modified methods can map both iQTLs and non-imprinted QTLs (niQTLs) simultaneously as well as improve the accuracies of estimation of the positions and effects of iQTLs. In addition, the modified design can potentially reduce the workload in the construction of the imF 2 population.

Modification of Experimental Design
Suppose there is a DH or RI population derived from a cross between two pure lines, P 1 and P 2 . The experimental design proposed by Wen and Wu (2014) for iQTL mapping is to develop an imF 2 population by randomly crossing DH or RI lines, namely, Line i × Line j (i, j = 1, 2, 3, . . .; i = j). Consider a QTL with two alleles, Q 1 and Q 2 . The two alleles can form four genotypes: Q 1 Q 1 , Q 1 Q 2 , Q 2 Q 1 , and Q 2 Q 2 , with one allele (say, Q 1 ) from the male gamete and the other (Q 2 ) from the female gamete in each genotype. Let a, d and i be the additive effect, dominance effect and imprinting effect of the QTL, respectively. Thus, in an imF 2 population, the single-QTL model would be (Wen and Wu, 2014): where y j is the trait value of the j th combination (or hybrid line); µ is the population mean; x j , z j and t j are dummy variables taking values depending on the QTL genotype in the j th combination ( Table 1); and ε j is residual error following a normal distribution N(0, σ 2 ).
In the above design, the cross in each combination is "unidirectional, " namely, one line is used as female parent and the other as male parent. However, there can be an alternative genetic mating design, in which reciprocal crosses are performed for every combination, namely, Line i × Line j (positive cross, PC) and Line j × Line i (negative cross, NC; i, j = 1, 2, 3, . . .; i < j). This modified experimental design generates a special imF 2 population. We call it reciprocal-cross imF 2 (R-imF 2 ) population. For distinction, we shall call the usual imF 2 population as unidirectional-cross imF 2 (U-imF 2 ) population. Genetically, Eq.
(1) is still applicable to R-imF 2 . Therefore, the iQTL mapping methods for U-imF 2 are also applicable to R-imF 2 .

Modification of Point Mapping Method
Suppose the parental DH or RI population has been genotyped and therefore a genetic map has been constructed. Thus, the genotypes of imF 2 lines can be deduced from the parental DH or RI lines and the genetic map can be used for iQTL mapping. Suppose the genetic map is of ultrahigh density so that the markers can well represent the whole genome. Thus, iQTLs can be mapped by testing every marker throughout the genome. We call this approach point mapping (PM; Wen and Wu, 2014).
Suppose the size (total number of hybrid lines) of the imF 2 population is 2n (for R-imF 2 population, there are n PC and n NC hybrid lines, respectively). Let RSS 0 , RSS 1 and RSS 2 be the Frontiers in Genetics | www.frontiersin.org minimum residual sum of squares calculated based on Eq. (1) under the hypotheses H 0 : a = d = i = 0, H 1 : i = 0 but not a = d = 0, and H 2 : not a = d = i = 0, respectively. Thus, two approximate log-likelihood ratio tests can be performed as below: and The PM method proposed by Wen and Wu (2014) maps iQTLs by checking the imprinting effect of every marker in the genome using Eq.
(2). The LOD 1 significance threshold is estimated by permutation tests (Churchill and Doerge, 1994). A genomic region covered by a LOD 1 peak exceeding the threshold is thought to harbor an iQTL and the highest point of the peak is the most probable position of the iQTL. Obviously, this is a one-step method (denoted as PM-1), in which an iQTL is mapped based on its imprinting effect only. The modified PM method proposed here is a two-step method (denoted as PM-2). The first step is QTL mapping, namely, to map QTLs (including imprinted and non-imprinted) by testing every marker in the genome using Eq. (3). Similarly, the LOD 2 significance threshold used in this step can be estimated by permutation tests. The second step is iQTL identification, namely, to identify iQTLs among the mapped QTLs by checking the imprinting effect of each QTL using Eq. (2). A QTL is taken as an iQTL if its imprinting effect is significant. Otherwise, it is taken as a usual niQTL. The LOD 1 significance threshold used in the second step can also be estimated by permutation tests, but the tests are performed only on the mapped QTLs rather than on every marker in the genome.

Modification of Composite Point Mapping Method
The PM method can be extended to composite point mapping (CPM) by adding some markers as cofactors into Eq. (1), namely (Wen and Wu, 2014): where m 1 , m 2 , and m 3 , a * k 1 , d * k 2 , and i * k 3 , and x * k 1 j , z * k 2 j , and t * k 3 j are the numbers, effects and corresponding dummy variables of additive, dominance and imprinting cofactors, respectively; other symbols are the same as those in Eq. (1). Cofactors can be selected by stepwise regression. Note that the three effects of a marker are orthogonal or independent to each other, among which only the significant ones are selected by the stepwise regression. Therefore, the markers selected as cofactors based on different effects can be different (Zeng, 1994). The CPM method proposed by Wen and Wu (2014) is a one-step method, corresponding to PM-1. Similarly, there can be an alternative two-step CPM method (CPM-2). CPM-1 and CPM-2 have a similar procedure to that of PM-1 and PM-2, respectively. The only difference of CPM from PM is that the RSS 0 , RSS 1 and RSS 2 in Eqs. (2 and 3) are calculated based on Eq. (4) rather than on Eq. (1) under the corresponding hypotheses (H 0 , H 1 , and H 2 ).

Simulation Studies
To examine the feasibility and efficiency of the modified imF 2 design (R-imF 2 ) and the modified statistical methods (PM-2 and CPM-2) for iQTL mapping in comparison with the previous design (U-imF 2 ) and methods (PM-1 and CPM-1), two simulation studies were conducted. The first study was to compare the performances of R-imF 2 and U-imF 2 , and of PM-1 and PM-2 in the mapping of a single iQTL; the second study was to compare the performances of CPM-1 and CPM-2 as well as PM-1 and PM-2 in the mapping of multiple iQTLs. The programs for PM and CPM were written in Visual Basic 6.0 1 .

Simulation Study I
In this simulation study, we assumed that an iQTL was located at the middle of a chromosome, which was 100 cM in length and had one marker every cM. The iQTL segregated in a DH population of 100 lines, from which a U-imF 2 or R-imF 2 population comprising 800 hybrid lines was generated. The imprinting effect of the iQTL explained 2% of the phenotypic variance in the U-imF 2 or R-imF 2 population. Six different types of iQTL in terms of the effects (a, d, and i) were investigated, including the fulleffect type, in which all sorts of effect exist, and five partial-effect types, in which either additive effect or dominance effect, or both do not exist ( Table 2; Cheverud et al., 2008). The simulated data were analyzed with PM-1 and PM-2, respectively. Each case was simulated for 500 times. LOD thresholds for PM-1 and PM-2 at the overall significance level of 0.05 were estimated by simulation under the null hypothesis with 5,000 replicates. The results ( Table 2) showed: (i) When other conditions (iQTL type and mapping method) were fixed, the results (means and standard deviations of estimates and statistical powers) obtained under the two designs were all very similar, suggesting that the two designs are basically equivalent for iQTL mapping. (ii) Except in the case of type DIB, which had no additive and dominance effects, the power of QTL detection in PM-2 (step 1) was always higher than the power of iQTL detection in PM-1, and the difference was especially large in the cases of type FULL, PEP and PEM. However, the power of iQTL detection in PM-2 (step 2) was always lower than that in PM-1, and the difference was especially large in the case of type DIB. (iii) The mean estimates of iQTL position obtained by PM-1 and PM-2 were very close to the real value in all the cases, suggesting that a single iQTL can be unbiasedly mapped by both methods. However, the standard deviation of iQTL position obtained by PM-2 was always significantly smaller than that obtained by PM-1 except in the case of type DIB. Even for type DIB, the former was still a little  The data in parenthesis are the total proportion of phenotypic variance explained by the iQTL (the proportion of phenotypic variance explained by the imprinting effect of the iQTL is 0.02 in each type). b The estimates of position and effects are shown as "mean ± standard deviation." c The powers were estimated based on 500 times of simulation. The data in parenthesis are the power of QTL detection in PM-2 (step 1). The LOD thresholds for PM-1, step 1 of PM-2, and step 2 of PM-2 were 1.97, 3.25, and 1.79 in U-imF 2 and 1.97, 3.36, and 1.80 in R-imF 2 , respectively.
smaller than the latter. This suggests that PM-2 is more precise than PM-1 for iQTL mapping in general. (iv) The estimation results of imprinting effect obtained by PM-1 and PM-2 were similar in all the cases. Noticeably, the means were always a little larger than the real value, suggesting that both methods may slightly overestimate the imprinting effect. For the additive and dominance effects, the means obtained by PM-2 were very close to the real values, suggesting that the estimation is unbiased; but the means obtained by PM-1 were always obviously smaller than the real values, suggesting that PM-1 may underestimate the additive and dominance effects. These results suggest that PM-2 is better than PM-1 for estimating the additive and dominance effects of iQTL.

Simulation Study II
In this simulation study, we assumed that a species had three pairs of chromosomes, each of which was 150 cM in length and had one marker every cM. There were seven QTLs in the genome, including three iQTLs on chromosome 1, one iQTLs and one niQTL on chromosome 2, and two iQTLs on chromosome 3 ( Table 3). An R-imF 2 population comprising 800 hybrid lines was generated from a DH population of 100 lines. Each QTL accounted for ∼7% of the phenotypic variance in the R-imF 2 population. The simulated data were analyzed with PM-1, PM-2, CPM-1, and CPM-2, respectively. Cofactors for CPM-1 and CPM-2 were selected by stepwise regression at the significance level of 0.05. LOD thresholds at the overall significance level of 0.05 were estimated by permutation test with 1,000 replicates. The results (Figure 1 and Table 3) showed: (i) All of the iQTLs were precisely mapped by both CPM-1 and CPM-2, and the estimates of iQTL positions obtained by the two methods were almost completely the same (with only a slight difference at Q1-3). PM-1 and PM-2 also precisely mapped some of the iQTLs. These two a The data in parenthesis are the percentages of phenotypic variance explained by the QTL and its imprinting effect, respectively. b The estimates in parenthesis were obtained based on unremarkable or unclear LOD peaks. ns, not significant. The LOD thresholds for PM-1, step 1 of PM-2, step 2 of PM-2, CPM-1, step 1 of CPM-2, and step 2 of CPM-2 were 2.42, 3.82, 2.37, 2.85, 4.15, and 2.79, respectively. methods obtained exactly the same estimates of iQTL positions. However, the LOD peaks of PM-1 and PM-2 were broad. In addition, there were many small peaks, which may make it difficult to identify the peaks of true iQTLs (such as the peaks of Q1-2 in PM-1 and PM-2, and the peak of Q3-1 in PM-2) and result in ghost or false iQTLs (such as the peak on the left of Q2-1 and that between Q2-1 and Q2-2 in PM-1, and the peaks between Q1-1 and Q1-2, between Q2-1 and Q2-2, and between Q3-1 and Q3-1 in PM-2). (ii) Corresponding to the estimation of iQTL positions, the estimates of iQTL effects were also completely the same between PM-1 and PM-2 and almost completely the same between CPM-1 and CPM-2, respectively. In most of the cases, the estimates of effects obtained by CPM-1 and CPM-2 were closer to the real values than those obtained by PM-1 and PM-2. (iii) The LOD peaks of PM-2 were always higher than those of PM-1. This is consistent with the results of simulation study I. For the same reason, the LOD peaks of CPM-2 were higher than those of CPM-1 except for the DIB-type iQTL (Q3-1). In addition, as expected, the niQTL (Q2-1) was mapped only by PM-2 and CPM-2, respectively.

DISCUSSION
The advantages of iQTL mapping based on imF 2 populations have been demonstrated before (Wen and Wu, 2014). R-imF 2 is a special type of imF 2 population. Although the simulation study results suggest that R-imF 2 does not apparently improve the result of iQTL mapping in comparison with U-imF 2 ( Table 2), it is expected to be advantageous for experimental operation. For each cross combination, only one hybrid line is produced in U-imF 2 , while two hybrid lines are produced in R-imF 2 . Therefore, R-imF 2 only needs half of the number of combinations used in U-imF 2 . This makes the cross work more convenient and may, to some extent, alleviate the workload of developing the imF 2 population. According to the simulation study results, PM-2 can estimate the position and the additive and dominance effects of an iQTL more precisely than PM-1 ( Table 2). This is understandable. PM-1 estimates the position of an iQTL based on its imprinting effect only, while PM-2 estimates the position of an iQTL based on not only its imprinting effect, but also its additive and dominance effects. Obviously, the latter would have a higher statistical power as long as the additive and dominance effects exist ( Table 2). This would surely increase the estimation precision of iQTL position and therefore increase the estimation precision of iQTL effects.
Although PM-2 can noticeably improve the estimation of iQTL position and effects, the power of detecting iQTL in PM-2 is always lower than that in PM-1 ( Table 2). This means that there is a cost of losing power for gaining precision in PM-2. The reason may be that an iQTL detected by PM-1 is at the position, where the imprinting effect has the highest significance, while the iQTL position estimated by PM-2 may not be at the point, where the imprinting effect is the most significant. Nevertheless, the power difference of iQTL detection between PM-1 and PM-2 is not large except in the case of type DIB (Table 2). Therefore, the improvement of estimation precision achieved by PM-2 is cost worthy.
Although the PM methods behave well in the mapping of a single iQTL, they are not ideal for multiple iQTL mapping (Figure 1). In practice, therefore, it is more appropriate to use the CPM methods. PM-2 demonstrates the merit of two-step analysis. CPM-2 also exhibits the merit of higher LOD score in the identification of QTL position (Figure 1). However, the LOD peaks obtained by CPM-1 and CPM-2 usually have the same width for the same iQTL (Figure 1). This suggests that the two methods have similar resolution in iQTL mapping. Therefore, the higher LOD score of CPM-2 might have little help for increasing the precision of iQTL mapping, probably due to the role of cofactors. Nevertheless, CPM-2 still has an advantage over CPM-1, namely, it can map both iQTLs and niQTLs.
Considering that the basic principle and conclusions of iQTL mapping may not change with the density of markers (Wen and Wu, 2014), we did not consider in this paper the situation of iQTL mapping based on conventional low-density maps. The methods described above can be directly applied to the conventional map as long as the values of the dummy variables at the position to be tested in Eqs. (1 and 4) are replaced with the expected values calculated according to the flanking markers (Wen and Wu, 2014).
Power is the most frequently used index for evaluating the efficiency of a QTL mapping method, which can reflect the probability of type II error. Besides, false discovery rate (FDR) may be also an important index for the evaluation because it can reflect the probability of type I error (Li et al., 2012b). A good QTL mapping method should have higher power but lower FDR. Similar to the power, the FDR in QTL mapping can also be estimated by computer simulation (Li et al., 2012b). In the simulation study I of this study, one QTL was set at the center of a chromosome of 50 cM in length in each case. Since the QTL really existed, a single QTL detected on the chromosome could be always regarded as true, although the estimated QTL position was very imprecise (far from the real position) sometimes. Certainly, if there were two or more QTLs detected on the chromosome simultaneously, the additional QTL should be false. However, such situations did not occur in the simulations. Therefore, the FDR was always zero in our simulation study. In other words, the conditions assumed in our simulation study avoided the occurrence of false discovery. This was beneficial to the comparative study based on the power.
In this study, we only consider the iQTL mapping based on one-environment experiment. However, the genetic model can be easily extended to adapt the data of multi-environment experiment, from which the QTLby-environment interaction can be analyzed using suitable statistical methods such as the mixed linear model approach, which has been used for mapping QTLs with the digenic epistasis and QTL-by-environment interaction as well as additive and dominance effects based on imF 2 population (Gao and Zhu, 2007).

DATA AVAILABILITY STATEMENT
All datasets generated for this study are included in the article/supplementary material, further inquiries can be directed to the corresponding authors.