Optimized Use of Low-Depth Genotyping-by-Sequencing for Genomic Prediction Among Multi-Parental Family Pools and Single Plants in Perennial Ryegrass (Lolium perenne L.)

Ryegrass single plants, bi-parental family pools, and multi-parental family pools are often genotyped, based on allele-frequencies using genotyping-by-sequencing (GBS) assays. GBS assays can be performed at low-coverage depth to reduce costs. However, reducing the coverage depth leads to a higher proportion of missing data, and leads to a reduction in accuracy when identifying the allele-frequency at each locus. As a consequence of the latter, genomic relationship matrices (GRMs) will be biased. This bias in GRMs affects variance estimates and the accuracy of GBLUP for genomic prediction (GBLUP-GP). We derived equations that describe the bias from low-coverage sequencing as an effect of binomial sampling of sequence reads, and allowed for any ploidy level of the sample considered. This allowed us to combine individual and pool genotypes in one GRM, treating pool-genotypes as a polyploid genotype, equal to the total ploidy-level of the parents of the pool. Using simulated data, we verified the magnitude of the GRM bias at different coverage depths for three different kinds of ryegrass breeding material: individual genotypes from single plants, pool-genotypes from F2 families, and pool-genotypes from synthetic varieties. To better handle missing data, we also tested imputation procedures, which are suited for analyzing allele-frequency genomic data. The relative advantages of the bias-correction and the imputation of missing data were evaluated using real data. We examined a large dataset, including single plants, F2 families, and synthetic varieties genotyped in three GBS assays, each with a different coverage depth, and evaluated them for heading date, crown rust resistance, and seed yield. Cross validations were used to test the accuracy using GBLUP approaches, demonstrating the feasibility of predicting among different breeding material. Bias-corrected GRMs proved to increase predictive accuracies when compared with standard approaches to construct GRMs. Among the imputation methods we tested, the random forest method yielded the highest predictive accuracy. The combinations of these two methods resulted in a meaningful increase of predictive ability (up to 0.09). The possibility of predicting across individuals and pools provides new opportunities for improving ryegrass breeding schemes.


INTRODUCTION
Perennial ryegrass (Lolium Perenne L.) is the most valuable forage species in the temperate regions of northwest Europe, America, South Africa, Japan, Australia, and New Zealand (Humphreys et al., 2010). Traditionally, ryegrass breeding programs use recurrent selection based on genetic merit estimated from recorded phenotypes. This system results in a moderate genetic gain of about 7% per decade (Hayes et al., 2013). Efforts to reshape ryegrass breeding programs by introducing marker information, led to the conclusion that markers in candidate genes for complex traits generally explains only a small proportion of observed variance (Hayes et al., 2013), thus limiting the efficacy of marker-assisted selection (MAS) approaches (Jannink et al., 2010).
Genomic prediction (GP) (Meuwissen et al., 2001) allows one to perform selection based on genomic estimated breeding values (GEBVs) derived from dense genome-wide DNA markers. Recently, Fè et al. (2015Fè et al. ( , 2016 proposed GP as an effective way to use high-density marker information to overcome the limitations of MAS in ryegrass breeding. Fè et al. (2015Fè et al. ( , 2016 performed GBLUP-GP analysis on bi-parental familypools and reported medium-to high-predictive ability (PA) for both disease resistance and quantitative agronomic traits. The bi-parental families were genotyped using a genotyping-bysequencing (GBS) assay, which proved to be an efficient way to genotype ryegrass pools of heterogeneous individuals (Byrne et al., 2013). At each SNP locus of each sample, GBS provides a number of sequence-reads classified into two sets: sequences carrying the reference allele (S R ) and sequences carrying the alternative allele (S A ). The sum of S R and S A. is the coverage depth or the total number of sequences (S T ). The ratio between S A and S T gives an estimate of the true allele-frequency for the sample at the SNP locus. Such a frequency can be used as an SNP score, and for family pools, it should be interpreted as an estimate of the proportion of alternative alleles across all the individuals within the pool.
One premise of using GP in breeding programs is that samples can be genotyped at a lower cost than phenotyping them (Meuwissen et al., 2001;Goddard and Hayes, 2007). Increasing sample multiplexing is a straightforward way to reduce sequencing costs, but it results in a reduction of coverage depth (S T ) (Elshire et al., 2011). However, reducing S T in ryegrass family pools leads to a decline in the accuracy of the allele-frequency estimate . In addition,  showed that genomic relationship matrices (GRM) calculated by using low-S T SNPs, are biased toward higher diagonal values, resulting in underestimates in genomic heritability. This bias in diagonal values is a consequence of overestimating inbreeding and underestimating heterozygosity at low S T . A method to correct this bias is needed in order to utilize low-S T genomic data for GBLUP-GP.
Reducing coverage depth also increases the fraction of missing data. This has been reported to be one of the main problems working with GBS data (Beissinger et al., 2013). Imputation of missing genotypes has been shown to be an effective approach for both increasing power in association studies (Marchini and Howie, 2010) and mitigating losses in accuracy in GP (Poland et al., 2012;Rutkoski et al., 2013). Several highly-accurate methods have been developed to assign allelic states to missing values in genotype data (reviewed by Marchini and Howie, 2010). However, these methods require using a high-quality reference genome (with chromosomescale pseudomolecules), which is still not available for ryegrass. Efficient, haplotype-independent imputation methods exist, such as those implemented in Linkimpute (Money et al., 2015(Money et al., , 2017; however, such methods were developed for standard marker coding as counts of alleles, and so cannot be applied to pool allele-frequencies. The unavailability of a high-quality, reference genome for ryegrass (with chromosome-scale pseudomolecules) and our need to code markers as allele-frequencies, means that we must find alternative, haplotype-independent, imputation strategies for use in ryegrass.
Although GP has reportedly succeeded in ryegrass, additional studies are needed to determine how to efficiently use GP in breeding programs. Ryegrass breeders typically follow the following steps to develop new varieties: (1) parental individuals, selected from elite varieties, are crossed to generate F 1 progenies, (2) seeds from each F 1 are multiplied in isolation to generate F 2 families that are then phenotyped in several replicates as family pools, (3) single plants (SPs) from selected F 2 families are evaluated as individual genotypes, (4) synthetic varieties (SYNs) are constructed by polycrossing several SPs from the best performing F 2 families (generally between 6 and 10 parents), (5) SYNs are maintained and evaluated as family pools, and after selection, (7) the best-performing SYNs are submitted for official testing (Detailed reviews of breeding methods for grasses are presented by Vogel and Pedersen, 1993;Hayes et al., 2013). In the present work, we considered data for all three kinds of breeding material (SPs, F2-families, and SYNs). Careful considerations on steps to be improved by GP are still needed. One important contribution would be to develop procedures that are capable of predicting the performances of individuals from the pools. In particular, because certain phenotypes cannot be measured in individuals, it would be useful to predict individual SP from pool-data on F2-families and/or SYNs. This would increase the efficiency of selecting new SYN parents. Therefore, the objectives of this study were to: (1) Derive a method for calculating GRM, using allelefrequencies for various kinds of family-pools, and for quantifying the factors that bias GRM diagonal elements (including low S T ).
(2) Derive a method to correct for biased GRM diagonal elements (due to low S T ) . (3) Compare haplotype-independent methods for imputing missing genotypes scored as continuous allele-frequencies.
(4) Test the effectiveness of imputation strategies and bias correction on predictive ability within and across different breeding materials.

Genomic Data Simulation and Expected GRM
Genomic data were simulated for every ryegrass breeding material considered: single plants (SPs), F 1 and F 2 families, and synthetic varieties (SYNs). The simulated genomic data were used to calculate genomic relationship matrices (GRM). Various genomic data were produced to: (1) define the magnitude of unbiased GRM diagonal elements for individuals and various types of pools, (2) quantify the effects of factors influencing the GRM diagonal elements (such as small population size, number of contributing parents, inbreeding within the family-pools, and low S T ), and (3) test a method to correct for biases due to low S T . To do this, genomic data were simulated as follows: (1) We generated 5,000 independent SNP markers for 300 parent pairs (parents were assumed to be unrelated) with an allele-frequency (p) sampled from a β-distribution with parameters α = 2 and β = 8. (2) We created 300 F 1 family-pools by simulating crosses between each parent pair. Various F 1 family-pool sizes were tested, ranging from 5 to 100 individuals. (3) We generated 300 F 2 families, each created by simulating crossings between pairs of randomly selected F 1 individuals (within the same F 1 family pool). Each of the F 2 individuals created by crossing two F 1 plants was considered to be a single plant (SP). Pools of all the SPs originating from the same F 1 family crosses were considered to be members of the same F 2 family. Various F 2 family population sizes were tested, ranging from 5 to 100 individuals. The genotypes of SPs belonging to the same F 2 family were then averaged to generate F 2 -family allele frequencies.
(4) We generated 300 SYNs, each generated by crossing 8 SPs randomly selected from different F 2 families. (None of the crossed SPs were selected from the same F2 family.) We used 8 SPs because this was the average number of parents used to generate the SYNs in our real data (introduced later). The number of individuals generated by each cross was set to 50. This is because the results of F 1 and F 2 families showed that 50 individuals were enough to avoid allele drift due to small population size. Then the genotypes of individuals belonging to the same SYN were averaged to generate SYN allele frequencies.
Following this simulation procedure, true allele frequencies (p) were created for F 2 families, SPs and SYNs. Allele-frequency data (ranging between zero and 1.0) were continuous for pools of F 2 families and SYNs (resulting from average genotypes of several individuals), while data were discrete for individual SPs (equal to either zero or 1.0 for the two contrasting homozygous genotypes, and 0.5 for the heterozygous genotype). For each breeding material, the simulated, true allele frequencies (p) were used to produce estimated allele frequencies (p) for S T values ranging from 1 to 100. This was done by random sampling S T reads, wherein P(S A ) = p and P(S R ) = 1-p. Estimated allele-frequencies were calculated asp = S A /S T . (Missing values were not considered in the simulation.) GRM were computed using true allele frequencies and estimated allele frequencies at different S T values. We also computed GRM corrected for low S T inaccuracies, using the method described later.

Plant Material and Phenotyping
The phenotypic data we used were derived from a standard diploid ryegrass breeding program conducted at DLF Seeds A/S (Store Heddinge, Denmark). Three different kinds of breeding material, commonly produced in ryegrass breeding programs, were present: (1) SPs: 1,225 single plants, produced in 2014 from 50 different F 2 families. (2) F 2 families: 1,791 bi-parental F 2 families, phenotyped and genotyped as pools, produced between 2000 and 2012.
(3) SYNs: 127 multi-parental, synthetic families obtained by crossing from 6 to 10 randomly-selected single plants from superior F 2 families, phenotyped and genotyped as pools.
For F 2 families and SYNs, phenotypic measurements were based on replicated sward plots for each family, for which only family means were recorded. For SPs, individual phenotypes were obtained. The following agronomic traits were considered: (1) Heading date (HD), defined as days after May 1, in which plants start showing at least one spikelet per tiller. HD is available for all breeding material. (2) Crown rust resistance (CRR), measured by visual scoring during the period of maximum infection. The scale ranged from 1 (plant completely covered by rust) to 9 (no rust symptoms). CRR is available for all breeding material. (3) Seed yield (SY), expressed in g m −2 . This trait was scored only for F 2 families and SYNs.
The phenotype data for F 2 families and the SYNs were scored over several years across different locations. All fields were organized into trials that were further divided into plots. Detailed descriptions of phenotyping strategy and field design are given in Fè et al. (2015Fè et al. ( , 2016. SP fields were organized by sowing groups of 50 SPs collected from the same F 2 family in separate rows (There were no replicates of the genotype). The score for CRR was collected in 2014 in Les Alleuds (France). The SPs were sown during spring and the CRR was scored in September after a natural crown rust attack. Heading date (ear emergence date) was assessed in 2014 in Store Heddinge, Denmark. SPs were sown during late summer and scored during the following season.

Plant Genotyping
Sequence data were produced using GBS approach, with the methylation-sensitive restriction enzyme ApeKI to target the low copy fraction of the genome. Sampling and library preparation followed the protocol described by Byrne et al. (2013). F 2 families and SYNs were genotyped based on a pooled sample from the family.
Plant materials were genotyped in three rounds by using an Illumina HiSeq2000 (100 bp single-end) genome sequencer. Different multiplexing set-ups were used in these three assays: Assay 1: consisting of 16 libraries containing maximum 64 samples per library. Each library was sequenced using four lanes (995 F 2 families were included in this assay).
Assay 2: consisting of 14 libraries containing maximum 96 samples per library. Each library was sequenced using four lanes (all 1,225 SPs and 39 SYNs were included in this assay).
Assay 3: consisting of 16 libraries containing maximum 64 samples per library. Each library was sequenced using four lanes (796 F 2 families and 89 SYNs were included in this assay).
The sequencing data from the three assays were aligned against a draft sequence assembly to produce common SNP calls for all the samples (sensu Byrne et al., 2015). Markers with a missing rate above 0.5, and a MAF lower than 0.01 were excluded. A total of 897,426 SNP frequencies distributed across 26,384 scaffolds were available for further analyses.

Imputation Methods
Three methods were used to impute missing SNP data: mean imputation (MNi), k nearest neighbor imputation (kNNi), and random forest imputation (RFi). Imputation was carried out scaffold by scaffold for kNNi and RFi. For MNi, each missing data point x ij for pool i marker j was replaced with the mean x ij of the non-missing values for marker j of other individuals or pools. For kNNi, each missing data point was imputed by replacing it with the weighted average of the data points at the k closest markers (Troyanskaya et al., 2001). Specifically, for each marker j, all other markers were first sorted according to the Euclidean distance to marker j. Each marker was included twice, both in the original and flipped state (1 minus the pool or individual allele frequency), to ensure that markers in strong negative linkage disequilibrium were also considered to impute the marker under analysis. Subsequently, for each row i of marker j, the weighted average of the k closest markers at row i were used to estimating the marker value at data point x ij . The weight of each marker was assigned 1/d 2 where d was the Euclidean distance between marker j and the marker to be weighted. The k parameter was set to k = 6 after testing the accuracy of the imputation for each data set.
For RFi, missing marker values were estimated using a random forest regression algorithm (Breiman, 2001) as implemented in the R package "random Forest" (Liaw and Wiener, 2002). Random forest is a machine-learning algorithm that uses a group of decision trees to determine a classification or to predict a value for a new instance. RFi starts by first imputing all missing marker values using MNi. Subsequently, the algorithm estimates and updates missing markers as follows: (1) for the first marker j, a group average of 100 regression trees were grown (for each regression tree, the algorithm generated a bootstrap sample of non-missing individuals and a random sample of markers), (2) missing values for marker j were predicted as group averages of the 100 trees applied to the other markers, (3) the imputed marker j was updated on the marker matrix, (4) steps one to three were repeated for all the markers, and (5) steps one to four were repeated with new imputed markers, for a maximum of 10 iterations or until the difference between the newly-imputed and the last-imputed dataset began to diverge.
The imputation accuracy for MNi and RFi was estimated by masking 0.1% of observed values for each marker with missing values. After imputing these data points, the accuracy was described using R 2 defined as: where j was iterated across all the masked values. Ten replicates of this simulation were carried out on 10% of randomly-selected scaffolds.

GRM Calculation and Bias Correction
GRM calculations were based on VanRaden (2008), adapted to use allele frequencies (ranging between 0 and 1) rather than allele-variants. First, allele frequencies were arranged in a matrix F ij , with i indexing the samples and j indexing the markers. The matrix was then centered by the mean SNP frequencies (M j = F j -F j ). When working with allele-frequencies, the mean of all allele-frequency samples at a given SNP is equivalent to the minimum allele frequency (MAF orp). M was then used to compute G, as follows: where σ 2 G is a scaling parameter, corresponding to the sum of the expected SNP variance across genotypes, as computed by Ashraf et al. (2014): where m equals the number of markers,p j equals the frequency of the jth marker, and n represents the ploidy number of the breeding material under analysis. The average genotype of a family pool can be considered to be polyploidal genotype with a ploidy level equal to the sum of the ploidy number of the parents used to generate it (a conceptual demonstration of this assumption is shown for F 2 families by Ashraf et al., 2014). Ploidy levels of 2, 4, and 16 were considered for SPs, F 2 families, and SYNs, respectively. Using this scaling factor, the expected diagonal element of G is equal to one plus the inbreeding coefficient. When the GRM includes different breeding materials, we used n = 4, so that diagonal elements were scaled relative to the F 2 families. Because SNP frequencies are estimated by sequencing a finite number of reads (S T ), they are affected by binomial samplings that increase the variance of SNP frequencies (σ 2 G ). This extra variance can be derived using a normal approximation for the binomial distribution, as described in the following equations. The number of alternative alleles observed for each pool at each SNP is distributed as (normal approximation): where S T is the number of observed reads and p is the true marker frequency. The observed allele-frequency estimate (p) from sequence reads is obtained by dividing S A by S T . The sampling distribution ofp at a specific locus in a pool with a true allele frequency p can be described as: The binomial error variance for the genotype estimate is therefore p 1 − p /S T , while the expected binomial variance (σ 2 Bin ) is: which converges to zero as S T increases. For instance, the binomial error variance for a diploid F 2 family will be 3 4 p(1−p) S T . The same conclusion was reported by Ashraf et al. (2014) using a different derivation. Equation 6 generalizes the expression to any ploidy or any number of contributing parents.
When the GRM is computed, the diagonal element obtained represents the sum of squared allele-frequencies over the total number of SNPs (m). This diagonal is inflated because it includes the binomial variances of all the allele frequencies. The expected binomial variance from all m SNPs due to low sequencing depth (σ 2 Bin ) of the family pool i is equal to: wherep j is the observed allele frequency for SNP j, n is the assumed ploidy number of the pool, and S Tij is sequencing depth for pool i and SNP j. The observed marker variance (σ 2 G ) of the pool i is equals to: This variance is inflated due to binomial variance. The inflation (ω) can be defined for each sample as the fraction of the total marker variance that is due to binomial sampling, and is equal to: where Db i is the ith element of the biased diagonal element in G, while Dc i is the corrected element.

Statistical Models and Cross-Validation Schemes
The phenotypic data were analyzed using linear mixed models. Genomic information was incorporated by the Genomic Best Linear Unbiased Prediction (GBLUP) method (Habier et al., 2007;VanRaden, 2008). We adopted the following model: where y is a vector with phenotypic observations, µ is the overall mean, 1 is a vector of ones, X is the design matrix of fixed effects, and t is the vector of trial effects nested within location and year, Z 1 and Z 2 represent design matrices of random factors, i is a vector of genomic breeding values where i ∼ N(0, Gσ 2 i ) where G is the genomic relationship matrix, l is a vector of interaction effects of genotype by location by year where l ∼ N(0, Iσ 2 ily ), and e is a vector of random residuals where e ∼ N(0, Iσ 2 e ). Variance components were estimated using the restricted maximum likelihood method, using the software package DMU (Jensen et al., 1997;Madsen and Jensen, 2013). We compared models using GRMs calculated with genotype datasets imputed using three methods (MNi, kNNi, and RFi) with or without correction for diagonal bias. First, the phenotypes were corrected for fixed effects by running each model on the full dataset. Then, genomic estimated breeding values (GEBVs) were determined by masking phenotypes by using two different cross-validation procedures: a leave-one-out and an across-set cross-validation. In the leave-one-out procedure, one sample per iteration was excluded from the model training data and then the GEBV of the missing data point was predicted using the data from all other samples. In the acrossset validation procedure, the genotypes belonging to each of the three different breeding materials (SPs, F 2 families, and SYNs) were left out and GEBV of these data points then were predicted using the other two sets. The predictive abilities (PA) were computed as the Pearson's correlation coefficient between the phenotype corrected for the fixed effects (averaged across replicates for each sample) and the predicted GEBVs.

Analysis of Simulated Data
SNP datasets were simulated to study the diagonal elements of genomic relationship matrices (GRM) for the three different ryegrass breeding materials (individuals and two types of pools). The GRM diagonal elements reflected the variance of family pools and were affected by at least four factors: (1) genetic drift due to small population size in the pools, (2) the number of contributing parents of the family pool, (3) the extent of inbreeding created in the F 1 multiplication of the family pool, and (4) inaccuracies in the allelefrequency estimates due to low S T values for the genomic data.
We simulated different F 1 and F 2 family population sizes to investigate genetic drift effects when small population sizes are used (due to deviation from the Hardy-Weinberg (HW) equilibrium). GRM data were computed using true allele frequencies. An increase in GRM diagonal elements due to small population size was detectable (Figure 1). However, we found that around 50 individuals per family were sufficient to maintain the HW-equilibrium. Because breeding populations are typically much larger than 50 individuals, the effect of small population size was not further investigated and all simulations were produced using a population size of 50 individuals per family.
GRM values were also computed using simulated, true allele frequencies to investigate differences in the diagonal elements for different breeding materials. The average GRM diagonal for the single plants (SPs, coming from the F 2 families) was equal to 1.25 (Figure 2), reflecting an inbreeding coefficient of 0.25. This result can be explained by the fact that SPs in F 2 are generated by crossing F 1 full sibs with an average co-ancestry of 0.25. However, the average GRM diagonals for the F 2 families and for the SYNs were equal to 1.0, reflecting that there is no inbreeding effect on the mean family genotypes, i.e., the variance of means between the F 2 families and the SYNs is not affected by the inbreeding within the families and SYNs. A theoretical derivation for the SPs and F 2 family variances is given in Appendix I, based on standard quantitative genetic theory. This theoretically verifies the results for SPs and pools with two parents, accounting for both inbreeding and genetic drift. GRMs were also calculated using observed allele frequencies at each S T . The resulting averages of the diagonal elements are reported in Figure 2. A large inflation of the GRM diagonal was observed for a low S T . Moderate inflation was still observed at a rather high S T , and the difference between the expected and observed inbreeding coefficients remained substantial until S T was about 50. Diagonal elements of the GRM (calculated using observed allele-frequencies) were also corrected for binomial sampling error due to low S T , as described in Equation 9. The average corrected diagonal elements are displayed in Figure 2. They showed no inflation at all the S T values we considered.

Real Data Analysis
For practical application, we considered the three breeding materials, SPs (individuals), F 2 families (pools), and SYNs (pools), genotyped in three different assays (using different multiplexing sequencing parameters) to investigate the relevance of bias correction and missing data imputation on the GRM values.
The three assays exhibited different S T scores and different missing data fractions. Specifically, for Assay 1 S T was 12.6 and its missing fraction was 20.4%; for Assay 2, S T was 3.3 and its missing fraction was 58.5%, while for Assay 3, S T was 13.4 and its missing fraction was 9.7%.

GRM Bias Correction and Missing Genotype Imputation
In Figure 3, the diagonal elements of GRMs are displayed as a function of the average sample S T across all markers. This figure shows diagonal elements before ( Figure 3A) and after ( Figure 3B) the bias correction was performed. As described above, based on simulated data and use of pool frequencies  without error, the expected average diagonal value for F 2 families is 1.0. In the real data, the average diagonal value for F 2 families observed before correction was 1.81, and the diagonal elements were not randomly distributed around the mean, showing a correlation with the sample S T (Figure 3A, green dots). After correction, the average diagonal value decreased to 1.35 and the diagonal elements were randomly distributed around the mean (Figure 3B, green dots).
The expected average diagonal value of an SP would be 1.25, if scaled according to its own ploidy level. However, in our study, the GRMs were scaled to the ploidy level of F 2 families, which is twice the ploidy level of the SPs. With this different scaling, the expected average diagonal value of SPs would be 2.5. In the real data, the observed average diagonal value of SPs before bias correction was 3.17, while it declined to 2.47 after the bias correction.
The expected average diagonal value of an SYN would be 1.0, if scaled according to its own ploidy level. However, the GRMs in our study were scaled to the ploidy level of F 2 families, while SYNs were pools of genotypes derived from a polycross of 6 to 10 SPs. Therefore, SYNs have an assumed ploidy between 3-and-5 times higher than the F 2 families and the expected average diagonal value for SYNs with this different scaling should be between 0.33 and 0.20. The observed average diagonal elements of the GRM were divided in two groups, the first (SYNs genotyped in Assay 2 with low S T scores), showed an average value of 3.46, while the second group (SYNs genotyped in Assay 3 with high S T scores) showed an average value of 1.38 ( Figure 3A, blue dots). After correction for depth bias, all the diagonal elements of SYNs clustered together with an average value of 0.38 (Figure 3B, blue dots).
Accuracies associated with imputing missing SNPs were calculated as R 2 between the observed and imputed values and were: 0.5, 0.73, and 0.77 for MNi, kNNi, and RFi, respectively. The computation time required for calculating kNNi was relatively short (2.7 h for the full genomic dataset) on a standard computer, whereas the RFi computation required about 110 times more processing time.

Accuracies of Genomic Predictions
Predictive abilities (PAs) of GBLUP-GP were evaluated by cross validation using GRMs calculated using SNP data imputed with each of the three imputation methods (MNi, kNNi and RFi), and with and without correction for low S T scores. Results are presented for the leave-one-out cross validation strategy within each breeding material (Figure 4) and for the across-set, crossvalidation procedure (Figure 5). Using GRM corrected for low S T bias yielded higher PAs for each of the three breeding materials and for each of the three traits, regardless of the cross-validation strategy used. The RFi imputation was the method that led to the most accurate estimates for breeding values, followed by the kNNi and the MNi imputation methods. The highest PAs were observed when the RFi-imputed data were used together with a correction for low S T scores. We observed larger PAs in scenarios that used MNi imputations than in scenarios where no corrections were made.
When using the leave-one-out cross-validation procedure, the largest PA (predictive ability, correlation of GEBV with corrected phenotype) was for predicting crown rust resistance in F 2 families (PA range: 0.399-0.444), for predicting heading date for synthetic varieties (PA range: 0.719-0.813), and for predicting heading date for single plants (PA range: 0.703-0.742). In the across-set crossvalidation procedure, the largest PA was for seed yield for F 2 families (PA range: 0.379-0.445), seed yield for synthetic varieties (PA range: 0.348-0.492), and heading date for single plants (PA range: 0.52-0.574).
The leave-one-out procedure yielded the highest PAs for single plants and F 2 families, while the across-set cross-validation procedure yielded the highest PAs for synthetic varieties. Finally, we observed that the correction for low S T bias had a larger positive effect on PA than the effect of imputation strategy.

DISCUSSION
In our study, simulations were performed to depict the expected inbreeding of various ryegrass breeding materials (individual single plants, biparental F 2 families and multiparental synthetic varieties as pools) and to quantify bias introduced in estimating inbreeding when low sequence depth (S T ) GBS assays were used. We also derived methods for correcting for bias in genomic relationship matrices (GRM) that were calculated using genomic data at low S T. These simulations yielded unbiased estimates of the relative inbreeding level for different breeding materials.
Phenotypic and genotypic data for the different breeding materials were obtained from a commercial breeding program and were used to show the effectiveness of the proposed bias correction method to increase the predictive ability (PA) in cross validation. We showed that several ryegrass breeding materials could be combined into one unbiased GRM. Predictions of each breeding material were successfully carried out, based on information from the other types of breeding material.

Correction for Bias in GRM
Genotyping by sequencing (sensu Elshire et al., 2011) is a simple and robust genotyping approach that has been proposed to estimate allele frequencies in populations and family-pools (Byrne et al., 2013). The cost of plant phenotyping is relatively low compared to the costs of various genotyping approaches. Therefore, replacing some field evaluations with genotyping is only attractive when genotyping is also inexpensive. From this perspective, GBS is becoming advantageous because its cost per unit is low and it is continually declining. Furthermore, GBS can be made especially cost-effective when coverage depth is low (Barabaschi et al., 2015). However, if the depth of coverage is low, GBS will also produce a high amount of missing data (Beissinger et al., 2013). A low coverage depth (S T ) also decreases the accuracy of allele-frequency estimates in the samples considered, which introduces biases that negatively affect heritability estimates, mapping, and genomic prediction (Ashraf et al., 2014). Ashraf et al. (2014) showed that estimates of allele effects in association studies are biased downwards when allele frequencies are used that contain estimation errors due to low S T . Ashraf et al. (2016) were also the first to report that diagonal elements of GRMs are inflated when genotyping SNPs with low S T . This finding was confirmed in our study and can be explained by inflated diagonals in the GRMs that falsely indicate a high amount of inbreeding in the analyzed samples. Moreover, the simulations presented in our work showed that bias was high at low S T , but still detectable at medium-to-high coverage depths (∼50).
Measuring error at low S T represents a limitation to the routine use of genetic markers in ryegrass breeding schemes because: (1) low-S T assays are often planned to minimize the costs, (2) different breeding materials, which are differently affected by the magnitude of sequencing errors, will often have to be included in the same genomic prediction analysis, and (3) new rounds of genotyping assays, which may differ in their coverage depth, will have to be used every year. One important outcome of our work is that we were able to develop a method that can efficiently remove bias due to measuring errors in allele frequencies (even at very low coverage depths) and can be extended to all the different breeding materials (individuals and various pools) used in the ryegrass breeding pipeline.
Increasing multiplex sequencing can be a successful strategy for reducing the cost or for increasing the number of genotyped samples that will enter the training population. Several studies have shown that the size of the training population is more important for predictive abilities, than the bias and missing data due to low coverage (Bassi et al., 2015). Finding the balance between cost and the size of the training set will be one of the principal challenges hindering future implementation of genomic prediction in ryegrass breeding programs.
We found that PA increased when bias-corrected GRMs were used. Also, in almost all cases, the bias-correction in the GRM provided a greater improvement in PA, than the improvements obtained from replacing the simple mean imputation with more advanced methods for imputation of missing genotype data. This indicates that bias due to low depth of coverage was more important than the loss of information resulting from an increased amount of missing data associated with a low depth of coverage.

Imputation of Missing Genotype Data
In the procedure of VanRaden (2008) for building GRMs, missing genotype data is replaced with the mean value of the non-missing genotypes, which we called mean imputation (MNi) method. Other, more advanced, imputation methods can be applied, but because our data consisted of allele frequency estimates on pools of individuals, and because there is no high-quality reference genome for ryegrass, we could only consider haplotype-and map-independent methods. We compared the k-nearest-neighbor (kNNi) and the random-forest (RFi) imputation methods as alternatives for MNi, and we found that there were two main advantages in using the kNNi and RFi methods: they were both map-independent and they could be relatively accurate.
The main factors that affect imputation accuracy of RFi and kNNi are the minor allele frequency (MAF) of the markers, the degree of relatedness between samples, and the linkage disequilibrium (LD) between markers (Rutkoski et al., 2013). Several studies have shown that SNP datasets with low MAF are easier to impute. This is because, at low MAF, missing markers can be quite accurately inferred just by using the most-frequent allele in the dataset (Hickey et al., 2012;Rutkoski et al., 2013). Moreover, the presence of closely-related samples in the dataset allows one to impute a missing marker data point by using information from markers more related to it, thus increasing the accuracy of the imputation (Hickey et al., 2012;Rutkoski et al., 2013).
Our study was conducted on ryegrass breeding material that consisted of groups of related samples sharing one or two parental lines. Moreover, this material had already been subjected to several rounds of selection, which may have reduced the variance in marker frequencies, resulting in rather low MAFs. These two elements enabled us to obtain adequate imputation accuracies. The ryegrass LD has been shown to decay after a few hundred bp, no matter what breeding material is used (Fè et al., 2015). Short-ranging LD has been related to less accurate imputation performances in several studies (Hickey et al., 2012;Rutkoski et al., 2013). This is because there is a reduced chance of using highly-correlated markers to aid imputation of missing data-points. Despite the low LD in ryegrass, the high marker density in our panel ensured that at least a proportion of them were close enough to be highly correlated, permitting us to obtain high imputation accuracies.
Another important finding of our study was the improved PAs we obtained after using both RFi and kNNi imputation procedures, compared to the standard MNi imputation. Although the gain in PA due to the choice of imputation strategy was not as large as the one resulting from our bias correction, a gain was still observed in all the different scenarios we examined. This result should promote the routine use of one of these two imputation methods (over standard MNi) to increase PA, by only adding a limited computing cost.

Genomic Prediction
The PAs we obtained demonstrate that using the genomic prediction approach for ryegrass breeding is very promising. Similar genomic prediction performances were reported for F 2 families by Fè et al. (2015Fè et al. ( , 2016. However, we showed that it was possible to predict the breeding values of F 2 families, single plants, and synthetic varieties and still ensure medium to high PAs, both by using data from the same breeding material, as well as from using the other kinds of breeding material as training data. The potential for obtaining genomic-estimated breeding values (GEBV) for both individuals and pools was a key finding of ours that should lead to improvements in ryegrass breeding programs. For instance, it will be possible to accurately select single plants and generate synthetic varieties based on GEBVs of single plants. Phenotypic scores of SPs are difficult to generate for some key traits (e.g., yield-related traits), which can only be measured in plots consisting of several plants (Vogel and Pedersen, 1993). A precise evaluation of SP performances for these traits would require cloning each SP and use test-cross procedures (Hayes et al., 2013). However, cloning and maintaining a SP for the time needed to complete the evaluation of the test crosses is often considered too costly and too time-consuming. Therefore, SPs are often randomly selected from highly-performing F 2 families, assuming that the performance of the F 2 family reflects the SP's performance. Using SP GEBVs will increase the accuracy of SP selection for these traits that cannot be directly measured on single plants. Additionally, using SP GEBVs could allow a reduction in the generation interval required in breeding scenarios, for instance, because the phenotypic evaluation of F 2 families could be avoided.
Although the predictive abilities (PAs) of SPs was considerably lower than the PAs observed for pools (F 2 families and SYNs), the PA for a SYN resulting from crossing selected SPs would be higher. This is because the predicted breeding value of a SYN would be the average of the predicted breeding values of the SP parents of the SYN, and as explained previously, this average is more accurate due the averaging of the Mendelian-sampling genetic components in the SPs' breeding values.