Edited by: Charles Chen, Oklahoma State University, United States

Reviewed by: Changwei Shao, Yellow Sea Fisheries Research Institute (CAFS), China; Zibei Lin, La Trobe University, Australia

*Correspondence: Antoine Allier,

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

The implementation of genomic selection in recurrent breeding programs raises the concern that a higher inbreeding rate could compromise long-term genetic gain. An optimized mating strategy that maximizes progeny performance while maintaining diversity for long-term genetic gain is therefore essential. The optimal cross-selection approach aims to identify the set of crosses that maximizes the expected genetic value in the progeny under a constraint on genetic diversity in the progeny. Optimal cross-selection usually does not account for within-family selection, i.e., the fact that only a selected fraction of each family is used as parents of the next generation. In this study, we consider within-family variance accounting for linkage disequilibrium between quantitative trait loci to predict the expected mean performance and the expected genetic diversity in the selected progeny of a set of crosses. These predictions rely on the usefulness criterion parental contribution (UCPC) method. We compared UCPC-based optimal cross-selection with the optimal cross-selection approach in a long-term simulated recurrent genomic selection breeding program with overlapping generations. UCPC-based optimal cross-selection proved more efficient than optimal cross-selection at converting genetic diversity into short- and long-term genetic gains. We also showed that, with UCPC-based optimal cross-selection, long-term genetic gain can be increased with only a limited reduction in short-term commercial genetic gain.

Successful breeding requires strategies that balance immediate genetic gain with the maintenance of population diversity to sustain long-term progress (

Historically, breeders used to select the best individuals based on phenotypic observations, considered as a proxy of their breeding value, i.e., the expected value of their progeny. In order to better estimate the breeding value of individuals, phenotypic selection has been complemented by pedigree-based prediction of breeding values (

Several approaches have been suggested to balance the short- and long-term genetic gain while selecting crosses in GS. In line with

The CSI may consider crosses individually; i.e., the interest of a cross does not depend on the other crosses in the selected set. In classical recurrent GS, candidates with the highest GEBVs are selected and intercrossed to maximize the expected progeny mean in the next generation. In this case, the CSI is simply the mean of parental GEBVs. However, such an approach maximizes neither the expected response to selection in the progeny, which involves genetic variance generated by Mendelian segregation within each family, nor the long-term genetic gain. Alternative measures of the interest of a cross have been proposed to account for parent complementarity, based on within cross variability and expected response to selection. ^{2}) has been derived in

Alternatively, one can consider a more holistic CSI for which the interest of a cross depends on the other selected crosses. This is the case in optimal contribution selection (

In plant breeding, one typically has larger biparental families than in animal breeding. Especially with GS, within-family selection intensity can be greatly increased, so that plant breeders capitalize much more on the segregation variance within families than animal breeders do. In previous works, the genetic gain (

In this study, we propose to adjust

Schematic view of the simulated breeding program.

We simulated a breeding program to compare the effect of different CSIs on short- and long-term genetic gain in a realistic breeding context considering overlapping and connected generations (i.e., cohorts) and the use of doubled haploid (DH) technology to derive progeny (

Each simulation replicate started from a population of 40 founders sampled among 57 Iodent maize genotypes from the Amaizing project (

We initiated a virtual breeding program starting from the founder genotypes with a burn-in period of 20 years that mimicked recurrent phenotypic selection. Burn-in started by randomly crossing the 40 founders into 20 biparental families, i.e., two-way crosses, during the first 3 years to initiate three overlapping cohorts. In each cohort, 80 DH progeny genotypes per cross were simulated. Phenotypes were simulated considering the genotype at QTLs, an error variance corresponding to a trait repeatability of 0.4 in the founder population and no genotype by environment interactions. For phenotyping, every individual was evaluated in four environments in 1 year. Since no secondary trait was considered and sufficient seed production for extensive progeny testing was assumed, we simulated a unique within-family selection of the 5% best progeny (i.e., 4 DHs) that is a common selection intensity in maize breeding. During burn-in, we first considered within-family phenotypic selection and then used the 50 DHs with the largest phenotypic mean as potential parents of the next cohort. These were randomly mated, i.e., without any constraint on parental contributions, to generate 20 biparental families of 80 DH lines. After 20 years of burn-in, this created extensive linkage disequilibrium as often observed in elite plant breeding programs (e.g.,

We considered different scenarios for genome-wide marker effects and progeny evaluation. In order to eliminate the uncertainty caused by the estimation of marker effects, we first compared several CSI assuming that we have access to the positions and effects of the 1,000 QTLs (referred to as TRUE scenario). For a representative subset of the CSI showing differentiated results in the TRUE scenario, we also considered a more realistic scenario where the effects of QTLs are unknown and selection was based on the effects of 2,000 noncausal SNPs randomly sampled over the genome. In this scenario, marker effects were obtained by back-solving (

Considering ^{th}

where _{1} (respectively _{2}) is a (_{1}_{2}) is a (|

The (_{T}_{1},…,_{N}_{p}^{th}_{T}

We define the constraint on diversity (

where _{j}

In the OCS, as defined above, the progeny derived from the

Two inbred lines _{1} and _{2} are considered as parental lines for a candidate cross _{1}× _{2} and (_{1}, _{2})’ denotes their genotyping matrix. Following

where _{1}, _{2} and β_{T}

To follow parental contributions, we consider _{1} parental contribution as a normally distributed trait (_{1} and _{2} instead of using identity-by-descent parental contributions (_{C}_{1} to follow _{1} genome contribution at QTLs as _{1} contribution in the progeny before selection _{C}_{1}=0.5(_{1}β_{C}_{1}+_{2}β_{C}_{1}+1). The progeny variance _{1} contribution in the progeny before selection is computed using Eq. 4b by replacing β_{T}_{C}_{1} The progeny mean for _{2} contribution is then defined as _{C}_{2} = 1-_{C}_{1}.

Following _{1} contribution in progeny as follows:

The expected mean performance of the selected fraction of progeny, i.e., UC (_{1}×_{2} is as follows:

where _{1} and _{2} genome contributions in the selected fraction of progeny are as follows (
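The two per-cross quantities discussed here can be illustrated with a minimal sketch. It assumes the standard form of the usefulness criterion (progeny mean plus selection intensity times accuracy times progeny standard deviation) and the usual linear correlated-response formula under bivariate normality for the parental contribution. The `Cross` fields, the function names, the toy values, and the clamping of contributions to [0, 1] are hypothetical choices made for illustration, not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Cross:
    """Hypothetical container for the pre-selection parameters of one cross."""
    name: str
    mu_t: float      # progeny mean for the trait before selection
    sigma_t: float   # progeny standard deviation for the trait
    mu_c1: float     # expected contribution of the first parent before selection
    cov_c1_t: float  # covariance between first-parent contribution and the trait


def usefulness(cross, i, h=1.0):
    """Expected mean of the selected fraction: mu + i * h * sigma."""
    return cross.mu_t + i * h * cross.sigma_t


def post_selection_contributions(cross, i, h=1.0):
    """Correlated response of the first-parent contribution to selection on the trait."""
    c1 = cross.mu_c1 + i * h * cross.cov_c1_t / cross.sigma_t
    c1 = min(max(c1, 0.0), 1.0)  # assumption: keep the contribution in [0, 1]
    return c1, 1.0 - c1


# Toy crosses: A has the higher progeny mean, B the higher progeny variance.
cross_a = Cross("A", mu_t=10.0, sigma_t=1.0, mu_c1=0.5, cov_c1_t=0.02)
cross_b = Cross("B", mu_t=9.5, sigma_t=1.5, mu_c1=0.5, cov_c1_t=0.0)

for i in (0.0, 2.06):
    best = max((cross_a, cross_b), key=lambda c: usefulness(c, i))
    print(f"i = {i}: best cross is {best.name}")

print(post_selection_contributions(cross_a, i=2.06))
```

With i = 0 the ranking reduces to the parental mean, so cross A is preferred. With i = 2.06 the larger within-family variance of cross B outweighs its lower mean. In cross A the first parent is expected to be over-represented after selection, because its contribution covaries positively with the trait.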

Accounting for within-family selection intensity ^{(}^{i}^{)}(

The constraint on diversity ^{(}^{i}^{)}(

where ^{(}^{i}^{)} is defined like _{1} and _{2} by the post-selection parental contributions ^{(}^{i}^{= 0)}(^{(}^{i}^{= 0)}(

In practice, one does not evaluate only one set of crosses but several ones in order to find the optimal set of crosses to reach a specified target that is a function of ^{(}^{i}^{)}(^{(}^{i}^{)}(

where ^{*}] is the minimal diversity constraint at time ^{*}], where ^{*}∈ℕ^{*} is the time horizon when the genetic diversity ^{*}) = ^{*} should be reached. In this study,

where ^{0} is the initial diversity at

Targeted diversity trajectories for three different shape parameters (s = 1, linear trajectory; s = 2, quadratic trajectory; and s = 0.5, inverse quadratic trajectory) for fixed initial diversity (He^{0} = 0.3) at generation 0 and targeted diversity (He* = 0.01) at generation 60 (t* = 60). We considered in this study only linear trajectories (s = 1).
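The trajectories described in this caption can be sketched as follows. The functional form below is an assumption chosen to match the caption (linear for s = 1, quadratic for s = 2, square-root shaped for s = 0.5, starting at He^{0} and reaching He* at t*); the function and parameter names are hypothetical.

```python
def target_diversity(t, he0=0.3, he_star=0.01, t_star=60, s=1.0):
    """Minimal diversity allowed at generation t (assumed power-law trajectory)."""
    if t >= t_star:
        return he_star  # the target is held once the time horizon is reached
    return he0 + (t / t_star) ** s * (he_star - he0)


for s in (0.5, 1.0, 2.0):
    trajectory = [round(target_diversity(t, s=s), 4) for t in (0, 15, 30, 45, 60)]
    print(f"s = {s}: {trajectory}")
```

Under this assumed form, s > 1 preserves diversity longer before a steep late decline, whereas s < 1 spends diversity early.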

We considered different cross-selection approaches varying in the within-family selection intensity (^{(}^{i}^{)}(^{(}^{i}^{)}(^{(}^{i}^{)}(^{(}^{i}^{= 0)}(^{(}^{i}^{= 2.06)}(^{*} = {0.01, 0.10, 0.15} that should be reached in ^{*} = 60 years. We defined the OCS methods, further referred to as OCS-He*, with ^{(}^{i}^{= 0)}(^{(}^{i}^{= 0)}(^{(}^{i}^{= 2.06)}(^{(}^{i}^{= 2.06)}(

Summary of tested cross-selection indices (CSI) in TRUE scenario defined for a set of crosses

Cross-selection index | Within-family selection intensity in gain term | Within-family selection intensity in diversity term |
---|---|---|
PM | i = 0 | – |
OCS-He* (3 different He*) | i = 0 | i = 0 |
UC | i = 2.06 | – |
UCPC-He* (3 different He*) | i = 2.06 | i = 2.06 |

He* = {0.15; 0.10; 0.01} to be reached linearly (s = 1) at the end of simulation (t^{*} = 60 years). Gain and diversity terms are evaluated either without within-family selection (i = 0) or with a within-family selection intensity of i = 2.06.
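The within-family selection intensity i = 2.06 is consistent with selecting the best 5% of each family. A minimal sketch, assuming truncation selection on a normally distributed trait in a large family (the function name is hypothetical):

```python
from statistics import NormalDist


def selection_intensity(selected_fraction):
    """Standardized selection intensity under truncation selection.

    Infinite-population approximation: i = pdf(z) / p, where z is the
    truncation point leaving a fraction p of the distribution above it.
    """
    standard_normal = NormalDist()
    threshold = standard_normal.inv_cdf(1.0 - selected_fraction)
    return standard_normal.pdf(threshold) / selected_fraction


print(round(selection_intensity(0.05), 2))
```

For small families (e.g., 4 selected out of 80), the exact finite-sample intensity is slightly lower than this approximation.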

Simulation 1 aimed at evaluating the benefit of accounting for the effect of selection on parental contributions, i.e., post-selection parental contributions (using UCPC), compared with ignoring selection, i.e., ante-selection parental contributions (as in OCS), when predicting the genetic diversity (He) in the selected fraction of progeny of a set of 20 crosses (using Eqs. 9 and 3, respectively). We considered a within-family selection intensity corresponding to selecting the best-performing 5% of progeny. We used the same genotypes, genetic map, and known QTL effects as for the first simulation replicate of the PM CSI in the TRUE scenario (
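To make the link between parental contributions and predicted diversity concrete, the sketch below computes He for a set of inbred parents from their contributions. It assumes biallelic loci coded 0/1 in fully homozygous parents and He taken as the mean over loci of 2p(1 − p), where p is the contribution-weighted allele frequency. This is a generic expected-heterozygosity calculation with hypothetical names and toy genotypes, not a reproduction of Eqs. 3 and 9.

```python
def expected_heterozygosity(genotypes, contributions):
    """Mean over loci of 2p(1 - p), with p the contribution-weighted allele frequency.

    genotypes: one list of 0/1 alleles per inbred parent (same loci for all parents).
    contributions: expected genome contribution of each parent (sums to 1).
    """
    n_loci = len(genotypes[0])
    total = 0.0
    for locus in range(n_loci):
        p = sum(c * g[locus] for c, g in zip(contributions, genotypes))
        total += 2.0 * p * (1.0 - p)
    return total / n_loci


parents = [
    [1, 1, 0, 0],  # first parent
    [0, 1, 0, 1],  # second parent
]

# Equal contributions (no within-family selection) versus contributions
# shifted toward the first parent (as could happen after selection).
print(expected_heterozygosity(parents, [0.5, 0.5]))
print(expected_heterozygosity(parents, [0.8, 0.2]))
```

In this toy example, shifting contributions toward one parent lowers the predicted He, which is why a prediction based on ante-selection contributions may overestimate the diversity remaining after selection.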

We ran 10 independent simulation replicates of all eight CSI summarized in _{10}(ဃ

The value of long-term genetic gain depends on the ability to keep breeding in the long term, which in turn depends on the short-term economic success of breeding. Following this rationale, we penalized strategies that compromised the short-term commercial genetic gain using the discounted cumulative gain following _{T}^{T}_{T}_{∈[1,60]} = 1/60; i.e., the same importance was given to all cohorts. We compared different values of ρ and reported results for ρ = 0; ρ = 0.04, which gives approximately seven times more weight to short-term gain (after 10 years) than to long-term gain (after 60 years); and ρ = 0.2, which gives nearly no weight to gain after 30 years of breeding.
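The effect of ρ can be checked numerically. The sketch below assumes weights proportional to 1/(1 + ρ)^T, normalized to sum to one over the 60 years. This form is inferred from the statements that ρ = 0 gives a constant weight of 1/60 and that ρ = 0.04 weights year 10 about seven times more than year 60; the function names are hypothetical.

```python
def discount_weights(rho, horizon=60):
    """Normalized weights proportional to 1 / (1 + rho)^T for T = 1..horizon."""
    raw = [1.0 / (1.0 + rho) ** year for year in range(1, horizon + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def discounted_cumulative_gain(gains, rho):
    """Weighted sum of yearly gains; gains[0] is the gain in year 1."""
    weights = discount_weights(rho, horizon=len(gains))
    return sum(w * g for w, g in zip(weights, gains))


for rho in (0.0, 0.04, 0.2):
    w = discount_weights(rho)
    print(f"rho = {rho}: weight(year 10) / weight(year 60) = {w[9] / w[59]:.1f}")
```

With ρ = 0 the discounted cumulative gain is simply the mean gain across years, so strategies are compared on their average performance over the whole horizon.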

We also measured the additive genic variance at QTLs _{j}_{j}

Compared to the usual approach that ignores the effect of selection on parental contributions, accounting for the effect of within-family selection increased the squared correlation (

Squared correlations (

Mean prediction error (predicted − empirical) of predicting the genetic diversity (He) in the selected fraction of progeny of a set of 20 biparental crosses in the TRUE scenario depending on the mean difference of performance between parents (Delta true breeding value TBV). Mean prediction error is measured as the predicted He − empirical post-selection He, considering

Considering known QTL effects (TRUE scenario), we observed that UC yielded significantly higher short- and long-term genetic gain at commercial level (_{10}) than PM (on average, _{10} = 9.316 [±0.208] compared to 8.338 [±0.195] 10 years post burn-in and _{10} = 18.293 [±0.516] compared to 15.744 [±0.449] 60 years post burn-in;

Genetic gains for different cross-selection indices in the TRUE scenario (PM: parental mean, UC: usefulness criterion, OCS-He*: optimal cross-selection and UCPC-He*: UCPC-based optimal cross-selection) according to the generations. _{10}) measured as the mean of the 10 best progeny, and _{10} relative to selection based on parental mean (PM).

Genetic and genic additive variances for different cross-selection indices in the TRUE scenario (PM: parental mean, UC: usefulness criterion, OCS-He*: optimal cross-selection, and UCPC-He*: UCPC-based optimal cross-selection) according to the generations.

Genetic diversity at QTLs for different cross-selection indices in the TRUE scenario (PM: parental mean, UC: usefulness criterion, OCS-He*: optimal cross-selection, and UCPC-He*: UCPC-based optimal cross-selection) according to the generations.

Considering known QTL effects (TRUE scenario), the tested optimal cross-selection methods OCS-He* and UCPC-He* showed lower short-term genetic gain at the whole progeny level (_{10}; _{10}; _{10} = 21.925 [±0.532] and 21.892 [±0.525];

For all targeted diversities and all simulation replicates, accounting for within-family selection (UCPC-He*) yielded a significantly higher short-term commercial genetic gain (_{10}) after 5 and 10 years compared to OCS-He* [_{10}) after 60 years was also higher for UCPC-He* than for OCS-He* with He* = 0.01 in all 10 simulation replicates (on average, _{10} = 22.869 [±0.641] compared to 21.892 [±0.525]) and, to a lesser extent, with He* = 0.10 in nine out of 10 replicates (on average, _{10} = 22.474 [±0.645] compared to 21.925 [±0.532]). However, for He* = 0.15, UCPC-He* outperformed OCS-He* in the long term in only three out of 10 replicates (on average, _{10} = 20.665 [±0.573] compared to 20.938 [±0.553]) [

Discounted cumulative gain for three values of the parameter ρ, which give increasing weight to short-term gain, assuming known QTL effects (TRUE scenario).

Cross-selection index (CSI) | Discounted cumulative gain, ρ = 0 | Discounted cumulative gain, ρ = 0.04 | Discounted cumulative gain, ρ = 0.2 |
---|---|---|---|
UCPC - He* = 0.01 | 15.949 (±0.398) | 12.321 (±0.284) | 6.682 (±0.143) |
UCPC - He* = 0.10 | 15.174 (±0.386) | 11.788 (±0.280) | 6.593 (±0.158) |
UC | 14.408 (±0.355) | 11.689 (±0.266) | 6.822 (±0.145) |
OCS - He* = 0.01 | 15.148 (±0.346) | 11.675 (±0.262) | 6.360 (±0.149) |
OCS - He* = 0.10 | 14.630 (±0.349) | 11.278 (±0.264) | 6.230 (±0.149) |
UCPC - He* = 0.15 | 14.205 (±0.334) | 11.176 (±0.250) | 6.454 (±0.149) |
OCS - He* = 0.15 | 14.056 (±0.337) | 10.884 (±0.250) | 6.103 (±0.155) |
PM | 12.609 (±0.280) | 10.392 (±0.217) | 6.345 (±0.155) |

Mean discounted cumulative gain over the ten independent replicates with ρ = 0 (constant weight across years), ρ = 0.04 (decreasing weight across years), and ρ = 0.2 (nearly null weights after 30 years). CSI are ordered by decreasing discounted cumulative gain with ρ = 0.04.

For a given He*, the additive genic variance (

Using estimated marker effects (GS scenario) yielded lower genetic gain than using known marker effects [_{10} = 8.338 [±0.237] compared to 7.713 [±0.256] 10 years post burn-in and _{10} = 15.367 [±0.358] compared to 13.287 [±0.436] 60 years post burn-in; _{10} = 16.398 [±0.426] compared to 14.438 [±0.320] 40 years post burn-in and _{10} = 18.161 [±0.470] compared to 15.367 [±0.358] 60 years post burn-in; _{10} = 8.162 [±0.208] compared to 7.734 [±0.237] 10 years post burn-in and _{10} = 11.881 [±0.272] compared to 11.313 [±0.323] 20 years post burn-in; _{10} = 16.398 [±0.426] compared to 15.850 [±0.384] 40 years post burn-in and _{10} = 18.161 [±0.470] compared to 17.528 [±0.438] 60 years post burn-in; _{10} = 6.402 [±0.166] compared to 7.713 [±0.256] 10 years post burn-in and _{10} = 10.810 [±0.329] compared to 13.287 [±0.436] 60 years post burn-in;

Evolution of different variables for different cross-selection indices according to the generations in the GS scenario (PM, parental mean; UC, usefulness criterion; OCS-He*, optimal cross-selection; and UCPC-He*, UCPC-based optimal cross-selection for He* = 0.01) and in the PS scenario (PM, parental mean). _{10}), and _{10} relatively to PM (GS), genetic gain is measured on true breeding values.

Accounting for within-family selection increased the squared correlation and reduced the mean error of post-selection genetic diversity prediction (

In a first approach, we considered no constraint on diversity during cross-selection and compared cross-selection maximizing the UC or maximizing the PM in the TRUE scenario, assuming known QTL effects and positions. The UC yielded higher short-term genetic gain at commercial level (_{10}; _{10}) and whole progeny level (G) compared to intercrossing the best candidate parents (PM). This long-term gain was driven by a higher additive genic variance at QTLs (

Assuming known marker effects, we observed that considering a constraint on diversity, i.e., optimal cross-selection, always maximized the long-term genetic gain, at the cost of a variable penalty for short-term gain, compared to no constraint on diversity (e.g., UC). We further compared the OCS (

Short-term economic returns of a breeding program condition the resources invested to maintain or increase response to selection and therefore long-term competitive capacity. Hence, to take full advantage of the long-term benefits of breeding strategies, it is necessary to ensure that they do not overly compromise short-term commercial genetic gain. For this reason, we considered the discounted cumulative commercial gain following

In simulations, we first considered 1,000 QTLs with known additive effects sampled from a centered normal distribution. For a representative subset of CSIs (PM, UC, UCPC-He*, and OCS-He* with He* = 0.01; ^{2}) corresponded to the variance of the predicted breeding values, which are shrunk compared to TBVs, depending on the model accuracy (referred to as variance of posterior mean [VPM] in Lehermeier et al.). An alternative would be to consider the marker effects estimated at each sample of a Markov chain Monte Carlo process, e.g., using a Bayesian ridge regression, to obtain an improved estimate of the additive genetic variance (referred to as posterior mean variance [PMV] in

In practice, QTL effects are unknown, so the selection of progeny cannot be based on TBVs, and thus the selection accuracy (^{2}.

In this study, we assumed fully homozygous parents and two-way crosses. However, neither optimal cross-selection nor UCPC-based optimal cross-selection is restricted to homozygous parents. Considering heterozygous parents in optimal cross-selection is straightforward. Following the extension of UCPC to four-way crosses (

We considered a within-family selection intensity corresponding to the selection of the best-performing 5% of progeny as candidates for the next generation. Equal selection intensities were assumed for all families, but in practice, due to experimental constraints or optimized resource allocation (e.g., generating more progeny for crosses showing high progeny variance but low progeny mean), within-family selection intensity can vary. Different within-family selection intensities (see Eqs. 8 and 9) can be considered in UCPC-based optimal cross-selection, but optimizing the allocation of resources between the number of crosses and the selection intensities within crosses calls for further investigation. However, in marker-assisted selection schemes based on QTL detection results (

The proposed UCPC-based optimal cross-selection was compared to OCS in a targeted diversity trajectory context. We considered a linear trajectory, but any genetic diversity trajectory can be considered (e.g.,

We considered candidate parents coming from the three last overlapping cohorts (

Our simulations also assumed fixed environments and a single targeted trait over 60 years. However, in a climate change context and with rapidly evolving societal demands for sustainable agricultural practices, environments and breeders' objectives will likely change over time. In a multitrait context, the multiobjective optimization framework proposed in

Publicly available datasets were analyzed in this study. These data can be found here:

ST, CL, AC, and LM supervised the study. AA performed the simulations and wrote the manuscript. ST worked on the implementation in the simulator. All authors reviewed and approved the manuscript.

This research was funded by RAGT2n and the ANRT CIFRE grant no. 2016/1281 for AA.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The Supplementary Material for this article can be found online at: