ORIGINAL RESEARCH article

Front. Plant Sci., 08 October 2021

Sec. Plant Breeding

Volume 12 - 2021 | https://doi.org/10.3389/fpls.2021.720123

Increased Predictive Accuracy of Multi-Environment Genomic Prediction Model for Yield and Related Traits in Spring Wheat (Triticum aestivum L.)

  • 1. Borlaug Institute for South Asia, Ludhiana, India

  • 2. Department of Biological Sciences and Biotechnology, Institute of Advanced Research, Gandhinagar, India

  • 3. International Maize and Wheat Improvement Center, New Delhi, India

  • 4. Department of Plant Pathology, Kansas State University, Manhattan, KS, United States

  • 5. Department of Biotechnology, Thapar Institute of Engineering & Technology, Patiala, India

  • 6. Department of Plant Resources and Environment, Jeju National University, Jeju-si, South Korea

  • 7. Global Wheat Program, International Maize and Wheat Improvement Center, Texcoco, Mexico


Abstract

Genomic selection (GS) has the potential to improve selection gain for complex traits in crop breeding programs in resource-poor countries. We assessed GS model performance in multi-environment (ME) trials for 141 advanced breeding lines grown in four field environments, using cross-predictions. We compared the prediction accuracy (PA) of two GS models, with and without accounting for environmental variation, for four quantitative traits of major importance, i.e., grain yield (GRYLD), thousand-grain weight, days to heading, and days to maturity, under North and Central Indian conditions. For each trait, we estimated PA using two ME cross-validation (CV) schemes representing actual breeding scenarios: (1) predicting untested lines in tested environments with the ME model (ME_CV1) and (2) predicting tested lines in untested environments with the ME model (ME_CV2). The ME predictions were compared with a baseline single-environment (SE) GS model (SE_CV1) representing a breeding scenario in which relationships and interactions are not leveraged across environments. Our results suggest that ME models provide a clear advantage over SE models for robust trait prediction. Both ME models gave roughly 2.4- to 4.1-fold higher average prediction accuracies for all four traits across the four tested environments, highlighting the importance of accounting for environmental variance in GS models. While the improvement in PA from SE to ME models was significant, the CV1 and CV2 schemes did not differ clearly within ME, indicating that the ME model predicted untested environments and untested lines equally well. Overall, our results provide important insight into the impact of environmental variation on GS in smaller breeding programs, which can potentially increase the rate of genetic gain by leveraging ME wheat breeding trials.

Introduction

Wheat (Triticum aestivum L.) is an essential cereal for global food security (Curtis and Halford, 2014). Significant efforts are needed to accelerate the development of high-yielding varieties to meet global wheat demand by 2050 (Hellin et al., 2012). Hence, improving grain yield (GRYLD) is a foremost target for wheat breeders. GRYLD is a complex trait controlled by many small-effect loci with significant locus × locus interactions (Arzani and Ashraf, 2017; Sehgal et al., 2017). Moreover, GRYLD is subject to strong genotype × environment interaction, which makes its genetic improvement difficult.

Genomic selection (GS), first proposed by Meuwissen et al. (2001), is a form of marker-assisted selection that uses dense genome-wide markers. GS has proven to be a powerful tool for improving selection accuracy and the prediction of quantitative traits in crop breeding (Crossa et al., 2017). It uses a large set of markers, usually of unknown effect, spread across the whole genome so that every quantitative trait locus (QTL) is expected to be in linkage disequilibrium (LD) with at least one marker. GS is particularly beneficial for traits that cannot be evaluated on a few plants and for traits that are difficult to measure. Increasing the accuracy of genomic prediction for selecting advanced breeding lines remains a key issue for plant breeders.

GS has been widely used in wheat breeding to predict various traits, such as yield, disease resistance, grain weight, heading, iron and zinc contents, end-use quality, and physiological traits (Charmet et al., 2014; Velu et al., 2016; Hayes et al., 2017; Juliana et al., 2017a,b; Norman et al., 2017; Lozada et al., 2019; Tomar et al., 2021). GS therefore holds promise for the genetic improvement of both qualitative and quantitative traits. Many GS models have been tested across various breeding and trait scenarios. Earlier comparative studies of GS model predictions in wheat showed that Random Forest and Reproducing Kernel Hilbert Space models performed better for the traits of interest; however, no single GS model consistently outperformed the others (Pérez-Rodríguez et al., 2012; Charmet et al., 2014). Many interconnected factors affect overall model performance (Jannink et al., 2010; Heslot et al., 2012), including heritability, population structure, the statistical model (e.g., parametric or nonparametric), the cross-validation (CV) approach, the genetic architecture of the trait, training population size and composition, marker density, and LD between markers and QTLs (Jannink et al., 2010; Pérez-Rodríguez et al., 2012; Crossa et al., 2017; Norman et al., 2018; Lozada et al., 2019).

GS promises to accelerate genetic gain by increasing selection accuracy and shortening breeding cycles. However, GS is a relatively new method for smaller, low-resource South Asian wheat breeding programs. Substantial progress has previously been made in testing and validating various models for GRYLD and related traits in wheat in South Asia, albeit on larger breeding populations (De los Campos et al., 2009; Crossa et al., 2010, 2011, 2016; Heffner et al., 2011; Burgueño et al., 2012; Pérez-Rodríguez et al., 2012; Rutkoski et al., 2015; Juliana et al., 2017a,b, 2019; González-Camacho et al., 2018). These studies highlighted the impact of environment and genotype × environment interaction on GS model performance. Therefore, to optimize genetic gain from GS, the network of field-testing environments can be leveraged.

In this study, high-yielding advanced wheat breeding lines from the International Maize and Wheat Improvement Center (CIMMYT) were evaluated over two consecutive wheat seasons (2017 and 2018) to assess their adaptation to the diverse environments of North and Central India. To evaluate the performance of multi-environment (ME) GS models in our specific set of selection environments, we tested different GS CV schemes that mimic breeding scenarios in which predicting the performance of untested lines, and of lines in untested environments, is highly valuable for achieving the desired selection gains. This study is particularly relevant in the South Asian context, where trial sizes are relatively small and broadly adapted wheat lines are sought after.

Materials and Methods

Plant Material

A set of 141 South Asian spring wheat lines (T. aestivum L.) was selected from the International Yield Trial of CIMMYT International Nurseries (elite germplasm). These lines constitute a diverse association panel. Seeds of the 141 genotypes were obtained from the Germplasm Resource Unit, CIMMYT (Mexico). Detailed information, including the pedigree of each genotype, is given in Supplementary Table 1.

Field Trials and Phenotypic Evaluation

The panel was evaluated in field trials of the Borlaug Institute for South Asia (India) at Jabalpur (JBL) (23°14′00.6″N, 80°04′40.7″E) and Ludhiana (LDH) (30°59′28.74″N, 75°44′10.87″E) during the 2017 and 2018 crop seasons. Genotypes were phenotyped in all trials for four traits: days to maturity (DAYSMT), days to heading (DTHD), GRYLD, and thousand-grain weight (TGW) (Supplementary Table 2). Each trial was laid out in a Randomized Block Design (RBD) with two replications. Standard agronomic practices were followed for trial management, and the row-to-row spacing was 20 cm.

Genotyping-by-Sequencing and SNP Filtering

Genomic DNA was extracted from fresh leaves of wheat seedlings using a modified cetyltrimethylammonium bromide (CTAB) method (Dreisigacker et al., 2016). Genotyping-by-sequencing (GBS) was performed on an Illumina HiSeq 2500 following the protocol of Poland et al. (2012). Single nucleotide polymorphisms (SNPs) were called with the TASSEL-GBSv2 pipeline in TASSEL version 5.2.43 (Bradbury et al., 2007), and missing data were imputed with Beagle version 4.1 using default settings. After quality control (sample call rate > 0.8, minor allele frequency (MAF) ≥ 0.05, SNP call rate > 0.7), 14,563 polymorphic SNPs and 141 genotypes were retained for further analysis (Supplementary Table 3). Of the polymorphic SNPs, 40.66, 50.66, and 8.68% were from the A, B, and D genomes, respectively. With a genomic coverage of 13.9 Gb and 14,563 markers across the genome, the average marker density was one marker per 0.95 Mb. Marker density was highest on chromosome 2B (one marker per 0.54 Mb) and lowest on chromosome 4D (one marker per 6.854 Mb).
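The quality-control thresholds above can be expressed as a short filter. The sketch below is a minimal illustration, not the TASSEL or Beagle implementation: it assumes a lines × SNPs matrix coded 0/1/2 with NaN for missing calls, applies the sample filter before the SNP filters (an assumed order), and uses hypothetical helper names (`qc_filter`, `mean_spacing_mb`). It also reproduces the average marker spacing from the genome coverage and marker count.

```python
import numpy as np

def qc_filter(geno, min_sample_call=0.8, min_snp_call=0.7, min_maf=0.05):
    """Apply call-rate and MAF filters to a lines x SNPs matrix coded 0/1/2.

    Missing calls are NaN. Thresholds follow the text: sample call rate > 0.8,
    SNP call rate > 0.7, MAF >= 0.05. Filtering samples before SNPs is an
    assumption; pipelines differ in the order used.
    """
    observed = ~np.isnan(geno)
    keep_samples = observed.mean(axis=1) > min_sample_call
    geno, observed = geno[keep_samples], observed[keep_samples]

    snp_call = observed.mean(axis=0)
    p = np.nanmean(geno, axis=0) / 2.0       # allele frequency from observed calls
    maf = np.minimum(p, 1.0 - p)             # an all-missing SNP gives NaN and is dropped
    keep_snps = (snp_call > min_snp_call) & (maf >= min_maf)
    return geno[:, keep_snps], keep_samples, keep_snps

def mean_spacing_mb(genome_size_gb, n_markers):
    """Average physical distance between markers, in Mb."""
    return genome_size_gb * 1000.0 / n_markers

# Toy example (simulated, not the study data): 20 lines x 50 SNPs with injected problems
rng = np.random.default_rng(0)
geno = rng.integers(0, 3, size=(20, 50)).astype(float)
geno[:, 0] = 2.0          # monomorphic SNP (MAF = 0)
geno[:15, 1] = np.nan     # SNP called in only 5 of 20 lines
geno[0, :30] = np.nan     # line 0 called at only 20 of 50 SNPs
filtered, keep_samples, keep_snps = qc_filter(geno)
print(filtered.shape, round(mean_spacing_mb(13.9, 14563), 2))
```

Real pipelines may recompute call rates after each step; the single pass here is kept for readability.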

Statistical Analysis of Phenotypes

Each location-year combination was treated as a distinct environment. Broad-sense heritability for each trait-environment combination was estimated at the plot level, and raw phenotypic values were adjusted to best linear unbiased predictions (BLUPs) (Supplementary Table 4) for each trait in each environment using META-R (Alvarado et al., 2020), with the following model within environments:

$$Y_{ik} = \mu + Rep_i + Gen_k + \varepsilon_{ik}$$

where $Y_{ik}$ is the trait of interest, $\mu$ is the mean effect, $Rep_i$ is the effect of the ith replicate, $Gen_k$ is the effect of the kth genotype, and $\varepsilon_{ik}$ is the error associated with the ith replicate and the kth genotype, assumed to be normally and independently distributed with mean 0 and homoscedastic variance. Across environments, the model was

$$Y_{ijk} = \mu + Env_i + Rep_{j(Env_i)} + Gen_k + Env_i \times Gen_k + \varepsilon_{ijk}$$

where $Y_{ijk}$ is the trait response of the kth genotype in the jth replicate of the ith environment, $Env_i$ is the effect of the ith environment, $Rep_{j(Env_i)}$ is the effect of the jth replicate within the ith environment, and $Env_i \times Gen_k$ is the environment × genotype interaction. The analysis produced adjusted phenotypic values (BLUPs) within and across environments. Treating genotypes as random effects in the BLUP model reduces the influence of screening time and other environmental effects on the adjusted values.

In addition, the variance components of each trait within individual environments and across environments were extracted to calculate broad-sense heritability as follows.

Within an environment:

$$H^2 = \frac{\sigma^2_g}{\sigma^2_g + \sigma^2_e / nReps}$$

Across environments:

$$H^2 = \frac{\sigma^2_g}{\sigma^2_g + \sigma^2_{ge} / nEnvs + \sigma^2_e / (nReps \times nEnvs)}$$

where $\sigma^2_g$ and $\sigma^2_e$ are the genotype and error variance components, respectively, $\sigma^2_{ge}$ is the genotype × environment interaction variance, nEnvs is the number of environments, and nReps is the number of replicates. All effects were considered random for calculating the BLUPs (Supplementary Table 4) and broad-sense heritability. The BLUP distributions of GRYLD and the other traits in each environment were plotted to check normality assumptions. Phenotypic and genetic correlations were calculated for each trait and environment combination in R version 4.0.2 (R Core Team, 2019) using FactoMineR version 2.4 (Lê et al., 2008) and factoextra version 1.0.7 (Kassambara and Mundt, 2020).
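The heritability calculation can be checked against the variance components reported in Tables 1 and 2. The sketch below assumes the entry-mean formulas H² = σ²g / (σ²g + σ²e/nReps) within an environment and H² = σ²g / (σ²g + σ²ge/nEnvs + σ²e/(nReps × nEnvs)) across environments, with nReps = 2 and nEnvs = 4 as in this study; the function names are illustrative and not part of META-R.

```python
def h2_single_env(var_g, var_e, n_reps):
    """Broad-sense heritability within one environment (entry-mean basis)."""
    return var_g / (var_g + var_e / n_reps)

def h2_across_env(var_g, var_ge, var_e, n_envs, n_reps):
    """Broad-sense heritability across environments (entry-mean basis)."""
    return var_g / (var_g + var_ge / n_envs + var_e / (n_reps * n_envs))

# Variance components copied from Table 1 (JBL17, DTHD) and Table 2 (DTHD, combined)
print(round(h2_single_env(10.98, 4.04, n_reps=2), 2))             # Table 1 reports 0.84
print(round(h2_across_env(8.94, 2.29, 3.74, n_envs=4, n_reps=2), 2))  # Table 2 reports 0.90
```

Small differences from the tabulated values are expected because the published variance components are rounded.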

Baseline Single-Environment (SE) Genomic BLUP Model (GBLUP), CV Schemes, and Predictive Ability

The baseline SE genomic prediction analysis was implemented in the BWGS program (Charmet et al., 2020), which performs GBLUP using a marker-based relationship matrix. CV provides a largely unbiased assessment of GS model performance; we therefore used 5-fold CV to reduce unwanted bias (Kohavi, 1995). For each trait and environment separately, genotypes were randomly split into five equal-sized folds. Each fold in turn served as the validation set, whose phenotypic values were masked when fitting the SE_CV1 model, while the remaining four folds formed the training set. PA was calculated as Pearson's correlation between the predicted breeding values and the observed trait BLUPs of the masked genotypes. This 5-fold procedure was repeated 50 times, reshuffling the lines among folds each time.
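The repeated 5-fold scheme can be sketched as a small harness that reshuffles lines, masks one fold at a time, and records Pearson's correlation for each validation fold. This is not BWGS code: the function names are hypothetical, averaging the per-fold correlations is an assumption about how PA is summarized, and ridge regression on centred markers (RR-BLUP) is used only as a stand-in predictor on simulated data.

```python
import numpy as np

def repeated_kfold_pa(y, predict, n_folds=5, n_reps=50, seed=1):
    """Mean and SD of per-fold prediction accuracy over repeated k-fold CV.

    y       : adjusted phenotypes (BLUPs) of one trait in one environment
    predict : callable(train_idx, test_idx) -> predictions for test_idx,
              fitted on the phenotypes of train_idx only
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    accs = []
    for _ in range(n_reps):
        folds = np.array_split(rng.permutation(n), n_folds)  # reshuffle lines
        for k in range(n_folds):
            test = folds[k]                                   # masked fold
            train = np.concatenate([f for j, f in enumerate(folds) if j != k])
            y_hat = predict(train, test)
            accs.append(np.corrcoef(y_hat, y[test])[0, 1])    # Pearson's r
    return float(np.mean(accs)), float(np.std(accs))

def make_rrblup_predictor(y, Z, lam):
    """Stand-in predictor: ridge regression on centred markers (RR-BLUP)."""
    def predict(train, test):
        mu = y[train].mean()
        Zt = Z[train]
        beta = np.linalg.solve(Zt.T @ Zt + lam * np.eye(Z.shape[1]),
                               Zt.T @ (y[train] - mu))
        return mu + Z[test] @ beta
    return predict

# Simulated example (not the study data): 141 lines, 100 markers, heritability about 0.5
rng = np.random.default_rng(42)
M = rng.integers(0, 3, size=(141, 100)).astype(float)
Z = M - M.mean(axis=0)                       # centring uses genotypes only, no phenotypes
g = Z @ rng.normal(0.0, 0.05, size=100)
y = g + rng.normal(0.0, g.std(), size=141)
# lam is roughly the simulated residual-to-marker-variance ratio
mean_pa, sd_pa = repeated_kfold_pa(y, make_rrblup_predictor(y, Z, lam=60.0), n_reps=10)
print(round(mean_pa, 2), round(sd_pa, 2))
```

Any predictor with the same `predict(train, test)` shape, including a GBLUP solver, can be plugged into the harness.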

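A mixed model of the following simplified form was fitted for genomic prediction:

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{g} + \mathbf{e}$$

where $\mathbf{y}$ is a vector of adjusted phenotypes, $\mathbf{X}$ is a design matrix relating the fixed effects to each genotype, $\mathbf{b}$ is a vector of fixed effects, $\mathbf{Z}$ is a design matrix connecting records to genetic values, $\mathbf{g}$ is a vector of additive genetic effects of the genotypes, with $\mathbf{g} \sim N(\mathbf{0}, \mathbf{G}\sigma^2_g)$, where $\mathbf{G}$ is the marker-based genomic relationship matrix, and $\mathbf{e}$ is a vector of random normal deviates with $\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma^2_e)$.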
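For a known variance ratio, the GBLUP model y = Xb + Zg + e can be solved through Henderson's mixed model equations. The sketch below assumes the VanRaden (2008) method-1 relationship matrix, a fixed ratio λ = σ²e/σ²g instead of REML estimates, and a small value added to the diagonal of G, because centring the markers makes G singular. Lines without phenotypes (a masked validation fold) receive predictions through their genomic relationships with phenotyped lines. Function names are illustrative and do not reproduce the BWGS code.

```python
import numpy as np

def vanraden_G(M):
    """VanRaden (2008) method-1 genomic relationship matrix (M: lines x SNPs, 0/1/2)."""
    p = M.mean(axis=0) / 2.0                 # allele frequencies
    W = M - 2.0 * p                          # centred genotypes
    return W @ W.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_mme(y, X, Z, G, lam):
    """Solve Henderson's mixed model equations for y = Xb + Zg + e.

    Assumes g ~ N(0, G*s2g), e ~ N(0, I*s2e) and a known ratio lam = s2e/s2g;
    REML estimation of the variance components is omitted in this sketch.
    """
    Ginv = np.linalg.inv(G)
    lhs = np.block([[X.T @ X, X.T @ Z],
                    [Z.T @ X, Z.T @ Z + lam * Ginv]])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    sol = np.linalg.solve(lhs, rhs)
    nb = X.shape[1]
    return sol[:nb], sol[nb:]                # fixed effects b, genomic values g

# Simulated example: 60 lines, the last 12 unphenotyped (a masked validation fold)
rng = np.random.default_rng(7)
n, m, n_train = 60, 200, 48
M = rng.integers(0, 3, size=(n, m)).astype(float)
G = vanraden_G(M) + 0.01 * np.eye(n)         # small ridge so that G is invertible
g_true = rng.multivariate_normal(np.zeros(n), G)
y = 10.0 + g_true[:n_train] + rng.normal(size=n_train)
X = np.ones((n_train, 1))                    # intercept only
Z = np.eye(n)[:n_train]                      # maps records to phenotyped lines
b, g_hat = gblup_mme(y, X, Z, G, lam=1.0)
pa = np.corrcoef(g_hat[n_train:], g_true[n_train:])[0, 1]   # accuracy on masked lines
print(round(float(b[0]), 2), round(float(pa), 2))
```

The same solution can be written as ĝ = G Z′ V⁻¹ (y − Xb̂) with V = Z G Z′ + λI, which is often cheaper when there are fewer records than lines.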

Advanced ME GBLUP Model, CV Schemes, and Predictive Ability

The advanced ME genomic prediction analysis was implemented in the R package sommer (Solving Mixed Model Equations in R; Covarrubias-Pazaran, 2016). Two ME_CV schemes representing actual breeding scenarios were implemented. The first represents a use case in which some genotypes are missing from all environments (ME_CV1): the model was fitted with the phenotypic values of the test-set genotypes masked in every environment, and PA was calculated as the correlation between predicted and observed phenotypic values of the masked genotypes in each environment separately. In the second scenario, the entire trial, that is, all genotypes, is missing in one environment (ME_CV2): the model was fitted with the phenotypic values of all lines in that environment masked, and the trained model was then used to predict the breeding values of the lines in the missing environment. PA was calculated as the correlation between predicted and observed phenotypic values of these lines. The CV schemes are illustrated in Figure 1.
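The two ME masking schemes differ only in which cells of the line × environment phenotype matrix are hidden. In the sketch below, `True` marks a masked phenotype; the helper names are hypothetical, and the number of ME_CV1 folds (five) is an assumption, since the text does not state it.

```python
import numpy as np

def me_cv1_masks(n_lines, n_envs, n_folds=5, seed=1):
    """ME_CV1: each fold of lines is masked in ALL environments (untested lines)."""
    rng = np.random.default_rng(seed)
    for test_lines in np.array_split(rng.permutation(n_lines), n_folds):
        mask = np.zeros((n_lines, n_envs), dtype=bool)   # True = phenotype hidden
        mask[test_lines, :] = True
        yield mask

def me_cv2_masks(n_lines, n_envs):
    """ME_CV2: all lines are masked in ONE environment (an untested trial)."""
    for e in range(n_envs):
        mask = np.zeros((n_lines, n_envs), dtype=bool)
        mask[:, e] = True
        yield mask

# 141 lines x 4 environments (JBL17, JBL18, LDH17, LDH18)
cv1 = list(me_cv1_masks(141, 4))
cv2 = list(me_cv2_masks(141, 4))
print([int(m.sum()) for m in cv1], [int(m.sum()) for m in cv2])
```

Each ME_CV1 mask hides whole rows, so a masked line contributes no phenotypes anywhere; each ME_CV2 mask hides one whole column, so the model must borrow information from the other environments.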

Figure 1

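For ME genomic prediction, the SE model was extended as follows:

$$y_{ij} = \mu + E_i + g_j + gE_{ij} + \varepsilon_{ij}$$

where $y_{ij}$ is the response of the jth line in the ith environment ($i = 1, \ldots, I$; $j = 1, \ldots, J$); $\mu$ is the overall mean; $E_i$ is the effect of the ith environment; $g_j$ is the genomic effect of the jth line, with $\mathbf{g} = (g_1, \ldots, g_J)^T \sim N(\mathbf{0}, \mathbf{G}_g\sigma^2_g)$, where $\sigma^2_g$ is the genomic variance and $\mathbf{G}_g$ is the genomic relationship matrix; $gE_{ij}$ is a random term for the interaction between the genomic effect of the jth line and the ith environment, with $\mathbf{gE} = (gE_{11}, \ldots, gE_{IJ})^T \sim N(\mathbf{0}, (\mathbf{I}_I \otimes \mathbf{G}_g)\sigma^2_{gE})$, where $\sigma^2_{gE}$ is the interaction variance; and $\varepsilon_{ij} \sim N(0, \sigma^2_\varepsilon)$ is the random residual of the jth line in the ith environment, where $\sigma^2_\varepsilon$ is the residual variance.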
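Given variance components, the ME model implies a covariance between any two records: σ²g·G for the shared genomic main effect, plus σ²gE·G when both records come from the same environment. The sketch below predicts masked records by GLS/BLUP under this covariance. It is not the sommer implementation: the variance components are assumed known (sommer estimates them), the relationship matrix and phenotypes are simulated placeholders, and the function name is hypothetical. For an environment with no observed records the environment effect cannot be estimated, so only the genetic part is predicted there; this does not affect a within-environment correlation.

```python
import numpy as np

def me_gblup_predict(y, env, line, G, s2g, s2ge, s2e, observed):
    """BLUP predictions under y_ij = E_i + g_j + gE_ij + e_ij with KNOWN variances.

    y, env, line : 1-D arrays with one entry per (environment, line) record
    G            : genomic relationship matrix between lines
    observed     : boolean array; False marks a masked phenotype to be predicted
    """
    G_rec = G[np.ix_(line, line)]                           # relationships between records
    same_env = (env[:, None] == env[None, :]).astype(float)
    K = s2g * G_rec + s2ge * G_rec * same_env               # main effect + within-env G x E
    envs_obs = np.unique(env[observed])
    X = (env[:, None] == envs_obs[None, :]).astype(float)   # effects of observed environments

    o = observed
    V = K[np.ix_(o, o)] + s2e * np.eye(o.sum())
    Vi_X = np.linalg.solve(V, X[o])
    b = np.linalg.solve(X[o].T @ Vi_X, Vi_X.T @ y[o])       # GLS environment effects
    alpha = np.linalg.solve(V, y[o] - X[o] @ b)
    # Records in a fully masked environment have an all-zero row in X,
    # so only their genetic part is predicted.
    return X @ b + K[:, o] @ alpha

# Simulated placeholder data: 30 lines x 3 environments
rng = np.random.default_rng(3)
n_lines, n_envs = 30, 3
Wm = rng.integers(0, 3, size=(n_lines, 100)).astype(float)
Wm -= Wm.mean(axis=0)
G = Wm @ Wm.T / Wm.shape[1] + 0.1 * np.eye(n_lines)        # toy relationship matrix
line = np.tile(np.arange(n_lines), n_envs)
env = np.repeat(np.arange(n_envs), n_lines)
y = rng.normal(size=n_lines * n_envs) + env                 # placeholder phenotypes
observed = env != 2                                          # ME_CV2-style: environment 2 untested
pred = me_gblup_predict(y, env, line, G, s2g=1.0, s2ge=0.5, s2e=1.0, observed=observed)
```

Masking rows of lines across all environments instead of one environment reproduces the ME_CV1 setting with the same function.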

Results

Heritability, Correlations, and Trait Characterization

Considerable variation was detected for GRYLD and related traits across environments (LDH17, LDH18, JBL17, and JBL18). The highest mean GRYLD was observed at JBL18 (9.4 t/ha), followed by JBL17 (8.7 t/ha), LDH17 (8.2 t/ha), and LDH18 (7.9 t/ha). TGW also varied across environments, with the highest mean at JBL17 (69 g), followed by JBL18 (59.5 g), LDH17 (58.4 g), and LDH18 (53.5 g). We observed significant G × E interaction for GRYLD and DAYSMT in JBL18 and LDH17 (Tables 1, 2). Across all traits, broad-sense heritability ranged from 0.47 to 0.96. DTHD had the highest heritability (0.96, in LDH17), whereas GRYLD heritability was lowest in JBL18 (0.47) and highest in LDH17 (0.74). TGW showed the most stable and relatively high heritability (0.80–0.86) across environments.

Table 1

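| Loc# | Env | Trait## | H2 | G Var | R Var | Mean | LSD | CV | G Sig |
|---|---|---|---|---|---|---|---|---|---|
| JBL | JBL17 | DTHD | 0.84 | 10.98 | 4.04 | 81.96 | 3.65 | 2.45 | 0 |
| JBL | JBL17 | DAYSMT | 0.86 | 5.86 | 1.89 | 124.82 | 2.52 | 1.10 | 0 |
| JBL | JBL17 | GRYLD | 0.48 | 0.29 | 0.63 | 7.87 | 1.08 | 10.09 | 0.000151 |
| JBL | JBL17 | TGW | 0.86 | 26.59 | 8.92 | 54.66 | 5.47 | 5.47 | 0 |
| JBL | JBL18 | DTHD | 0.78 | 12.79 | 7.30 | 79.26 | 4.71 | 3.41 | 0 |
| JBL | JBL18 | DAYSMT | 0.71 | 4.89 | 3.96 | 124.67 | 3.32 | 1.60 | 9.08E-13 |
| JBL | JBL18 | GRYLD | 0.47 | 0.15 | 0.34 | 8.76 | 0.79 | 6.67 | 0.000172 |
| JBL | JBL18 | TGW | 0.80 | 12.53 | 6.34 | 46.22 | 4.45 | 5.45 | 0 |
| LDH | LDH17 | DTHD | 0.96 | 12.61 | 1.19 | 94.85 | 2.11 | 1.15 | 0 |
| LDH | LDH17 | DAYSMT | 0.74 | 4.79 | 3.29 | 148.73 | 3.09 | 1.22 | 6.88E-15 |
| LDH | LDH17 | GRYLD | 0.74 | 0.21 | 0.15 | 7.06 | 0.66 | 5.55 | 2.73E-14 |
| LDH | LDH17 | TGW | 0.81 | 15.42 | 7.03 | 45.48 | 4.73 | 5.83 | 0 |
| LDH | LDH18 | DTHD | 0.88 | 8.58 | 2.44 | 103.71 | 2.89 | 1.51 | 0 |
| LDH | LDH18 | DAYSMT | 0.88 | 8.18 | 2.25 | 144.52 | 2.80 | 1.04 | 0 |
| LDH | LDH18 | GRYLD | 0.62 | 0.16 | 0.20 | 7.26 | 0.69 | 6.11 | 1.92E-08 |
| LDH | LDH18 | TGW | 0.83 | 14.66 | 6.13 | 44.30 | 4.47 | 5.59 | 0 |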

Variability analysis of various yield-related agronomic traits for four environments at two locations.

#

Loc, location; Env, environment; H2, heritability; G Var, genotypic variance; R Var, residual variance; LSD, least significant difference; CV, coefficient of variation (%); G Sig, genotypic significance; LDH, Ludhiana; JBL, Jabalpur.

##

DTHD, days to heading; DAYSMT, days to maturity; GRYLD, grain yield; TGW, thousand-grain weight.
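The CV column in Table 1 is consistent with a coefficient of variation computed as 100 × √(R Var) / Mean. The short check below recomputes it for three rows, with values copied from Table 1; the helper name is illustrative.

```python
import math

def cv_percent(resid_var, mean):
    """Coefficient of variation (%) = 100 * residual SD / trial mean."""
    return 100.0 * math.sqrt(resid_var) / mean

# (R Var, Mean, reported CV) copied from Table 1
table1_rows = {
    "JBL17 DTHD": (4.04, 81.96, 2.45),
    "JBL17 GRYLD": (0.63, 7.87, 10.09),
    "LDH17 TGW": (7.03, 45.48, 5.83),
}
for name, (r_var, mean, reported) in table1_rows.items():
    print(f"{name}: computed {cv_percent(r_var, mean):.2f}, reported {reported}")
```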

Table 2

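| Traits | H2 | G Var | G × E Var | R Var | Mean | LSD | CV | n Rep | n Env | G Sig | G × E Sig |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DTHD | 0.90 | 8.94 | 2.29 | 3.74 | 89.94 | 2.69 | 2.15 | 2 | 4 | 8.93E-73 | 1.16E-18 |
| DAYSMT | 0.83 | 4.00 | 1.94 | 2.83 | 135.68 | 2.32 | 1.24 | 2 | 4 | 4.34E-44 | 2.01E-21 |
| GRYLD | 0.38 | 0.05 | 0.15 | 0.33 | 7.74 | 0.49 | 7.43 | 2 | 4 | 0.0003 | 3.69E-13 |
| TGW | 0.78 | 9.90 | 7.41 | 7.10 | 47.67 | 4.07 | 5.59 | 2 | 4 | 1.13E-33 | 4.23E-35 |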

Combined variability analysis of yield-related agronomic traits across four environments at two locations.

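H2, heritability; G Var, genotypic variance; G × E Var, genotype × environment interaction variance; R Var, residual variance; LSD, least significant difference; CV, coefficient of variation (%); n Rep, number of replicates; n Env, number of environments; G Sig, genotypic significance; G × E Sig, genotype × environment interaction significance; DTHD, days to heading; DAYSMT, days to maturity; GRYLD, grain yield; TGW, thousand-grain weight.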

The phenological traits DTHD and DAYSMT showed the strongest positive correlation (0.88), followed by a weak positive correlation between TGW and GRYLD (0.15), whereas GRYLD was negatively correlated with DTHD (−0.73). The most negative correlation was observed between GRYLD and DAYSMT (−0.76). Principal component analysis (PCA) facilitates interpretation of the relationships among traits and of genotypic differences across a set of traits; here, the first two PCs explained 92% of the total variation, with PC1 explaining 70.3% and PC2 21.7% (Figure 2).
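The proportion of variance explained by each principal component can be obtained from a singular value decomposition of the standardized genotype × trait matrix. This is a minimal numpy sketch, assuming PCA on standardized traits (comparable to the FactoMineR default of scaling variables to unit variance); it is not the FactoMineR code, and the trait matrix below is simulated.

```python
import numpy as np

def pca_variance_explained(traits):
    """Share of total variance explained by each PC of standardised traits.

    traits: genotypes x traits matrix (e.g., BLUPs of DTHD, DAYSMT, GRYLD, TGW).
    """
    Xs = (traits - traits.mean(axis=0)) / traits.std(axis=0, ddof=1)
    s = np.linalg.svd(Xs, compute_uv=False)   # singular values, descending
    var = s ** 2                              # proportional to PC variances
    return var / var.sum()

# Simulated traits: two correlated pairs, loosely mimicking the trait structure
rng = np.random.default_rng(5)
a, b = rng.normal(size=(2, 141))
traits = np.column_stack([a, a + 0.3 * rng.normal(size=141),
                          b, -b + 0.3 * rng.normal(size=141)])
pve = pca_variance_explained(traits)
print(np.round(100 * pve, 1))   # percent of total variance per PC
```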

Figure 2

Baseline SE Model: Performance of Untested Lines in the Same Environment

A GS scenario representing SE breeding programs was tested first. In this model, the PAs for each of the four traits were generated separately for each of the four tested environments; in other words, the environments were treated as independent. Overall, the SE model had the lowest PA of the three tested GS scenarios (Table 4; Figure 3). PA was highest for TGW (0.34) and lowest for GRYLD (0.18). DAYSMT and DTHD showed low-to-moderate PA (0.24–0.25). Among the tested environments, JBL18 had the lowest PA for DTHD and DAYSMT (0.01–0.02) compared with the other three environments (0.25–0.40). TGW was the only trait with a highly consistent, moderate PA (0.32–0.35) across all environments. PA for GRYLD was highest in LDH18 (0.32) and lowest in JBL17 (0.08).

Figure 3

Advanced ME Model: Performance of Tested Lines in Untested Environments and Untested Lines in Tested Environments

Including environmental information in ME models significantly improved PA over SE models across all traits and environments (Figure 3). Both ME models (ME_CV1 and ME_CV2) showed high and consistent PA, ranging from 0.69 to 0.85, for all traits and environments. The largest relative improvement due to ME modeling was observed for GRYLD, for which PA increased from 0.18 in the SE model to 0.69 (ME_CV1) and 0.73 (ME_CV2) (Table 4). The two ME models also ranked the traits identically, with DTHD ranked highest (0.85) and GRYLD lowest (0.69–0.73). While trait rankings were identical, the environments ranked slightly differently in the two models for all traits. For instance, both years (2017 and 2018) at LDH had higher overall PA than JBL for all traits.

Discussion

Crop breeders routinely evaluate genotype performance and collect data on multiple traits in various environments. Selection based on phenotypic and GBS marker information using genomic prediction models is gradually gaining acceptance in breeding with the advent of low-cost next-generation sequencing (NGS) technologies (Poland and Rife, 2012). Relatively few studies have used multi-environment genomic prediction (ME-GP) methods because of their complexity and higher computing requirements (Oakey et al., 2016; Rincent et al., 2017; Montesinos-López et al., 2018; Roorkiwal et al., 2018; Bhandari et al., 2019; Tolhurst et al., 2019; Pandey et al., 2020).

Trait Correlation and Characterization: A Vital Factor for Improving Accuracy in ME-GP

In this study, advanced breeding lines from the CIMMYT bread wheat program were evaluated under irrigated conditions at two locations (JBL and LDH) over 2 years (2017 and 2018), i.e., in four environments. Four traits (DTHD, DAYSMT, GRYLD, and TGW) were evaluated for use in an ME GP model. The traits formed two positively correlated pairs: DAYSMT with DTHD, and GRYLD with TGW (Figure 4). The positive correlation of GRYLD with TGW suggests that GRYLD in this panel was driven in part by TGW. The negative relationship between GRYLD and DTHD indicates that early-heading genotypes contribute to the yield stability of advanced breeding lines during grain filling, ultimately affecting the yield components (Sharma and Smith, 1986).

Figure 4

Yield and Related Trait Heritability Difference Among Environments

Our results showed that trait heritability ranged from moderate (GRYLD) to high (DAYSMT, DTHD, and TGW). The phenological traits (DTHD and DAYSMT) and TGW showed high and stable broad-sense heritability, ranging from 0.71 to 0.96, which suggests high-quality phenotypic measurements and good predictive potential for these traits. GRYLD, a highly quantitative and environmentally sensitive trait (Maphosa et al., 2014; Würschum et al., 2018), fluctuated considerably across environments, with relatively lower heritability at JBL (0.47–0.48) than at LDH (0.62–0.74). Genotypic variance was significant for all agronomic traits (Table 1), and the large G × E impact on GRYLD resulted in lower heritability than for the other traits. Lower heritability estimates for GRYLD were also expected because it is governed by numerous genes. The low heritability and yield variances could also reflect the smaller plot size and lower sowing density (Rode et al., 2011; Sallam et al., 2015; Thorwarth et al., 2017; Bhatta et al., 2018) (Tables 1, 2). The climates of the two locations differ considerably: the growing season is relatively short at JBL because of high overall temperatures, whereas LDH has a moderately colder climate and a longer growing season (Mondal et al., 2016). Although these highly variable environments present a challenging phenotypic landscape, they also offer a significant opportunity to leverage the ME trial framework for trait improvement (Lillemo et al., 2005; Braun et al., 2010). The significant genetic and environmental correlations in our experiments (i.e., the positive correlations between TGW and GRYLD and between DAYSMT and DTHD) led us to hypothesize that correlated traits and environmental relationships can be leveraged to improve selection accuracy through marker-based ME-GS models (Figure 4). We therefore applied the ME model to test this hypothesis on our selected set of lines (Table 3).

Table 3

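Genetic correlations:

| Traits | DTHD | DAYSMT | GRYLD |
|---|---|---|---|
| DAYSMT | 0.94 | | |
| GRYLD | −0.30 | −0.29 | |
| TGW | −0.33 | −0.26 | 0.22 |

Phenotypic correlations:

| Traits | DTHD | DAYSMT | GRYLD |
|---|---|---|---|
| DAYSMT | 0.83 | | |
| GRYLD | −0.22 | −0.08 | |
| TGW | −0.29 | −0.24 | 0.07 |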

Genetic and phenotypic correlations in agronomic important traits.

DTHD, days to heading; DAYSMT, days to maturity; GRYLD, grain yield; TGW, thousand-grain weight.

SE and ME Genomic Prediction Across Years and Sites and ME Model Utilities in Crop Breeding

While weak predictive ability remains a major obstacle to applying GS successfully (Crossa et al., 2013), numerous studies have shown that GS can be beneficial for low-heritability quantitative traits such as GRYLD, and that GS can be used in breeding programs even with low-to-moderate prediction accuracy in early-generation selection (Belamkar et al., 2018; Lado et al., 2018; Michel et al., 2018). Several factors influence the PA of GP models. Those most relevant to this ME study are the genetic relationship between the training and testing sets, training set size, heritability and trait architecture, and correlations among traits and environments (Asoro et al., 2011; Crossa et al., 2013; Heslot et al., 2013; Sallam et al., 2015; Zhang et al., 2015; Duangjit et al., 2016; Lado et al., 2016; Wang et al., 2016; Thorwarth et al., 2017; Akdemir and Isidro-Sánchez, 2019; Olatoye et al., 2020). Even though the population in our study was small, GP using correlated traits in the ME_CV1 and ME_CV2 schemes achieved higher PA, indicating that correlated traits can, to some extent, compensate for a small population size.

Models that leverage E and G × E components have been shown to improve genomic prediction accuracy for highly quantitative traits such as phenology and GRYLD (Burgueño et al., 2012; Dias et al., 2018). To evaluate the potential of genomic prediction in the highly productive but variable environments of JBL and LDH, we emulated three genomic prediction scenarios representing actual breeding programs. Comparing SE and ME models showed a roughly 2.4- to 4.1-fold improvement in average PA for all traits (Table 4; Figure 3). Among the four traits, GRYLD showed the largest relative increase in PA from SE to ME models (3.8-fold for ME_CV1 and 4.1-fold for ME_CV2), highlighting the importance of ME modeling for GRYLD prediction. In the SE model, TGW had the most consistent PA across the four environments (0.32–0.34), in agreement with its highly stable heritability and the lower fraction of G × E observed for this trait (Table 2; Figure 3). Interestingly, the PA of the two ME models (CV1 and CV2) did not differ appreciably, suggesting that the ME model predicted untested environments and untested lines equally well. A model can be highly predictive of untested environments when environments are highly correlated (Malosetti et al., 2016; Jarquín et al., 2017), which appears to be the case for our environments, as reflected by the low G × E and high heritability (Table 1; Figure 3). Similarly, the strong predictive performance of ME_CV1 can be partly attributed to the fact that our lines came from the same breeding program and that the sample size of 141 lines was moderate. From a breeding program perspective, the strong performance of the two ME models suggests that the program could increase its overall population size through sparse testing at these two locations without losing significant predictive power (Cullis et al., 2020; Jarquin et al., 2020).
A larger population enabled by sparse testing can deliver higher selection gain through increased selection intensity.
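The link between population size and selection intensity can be illustrated with the standard truncation-selection formula i = φ(x)/p, where p is the proportion selected and x is the standard normal truncation point; in the breeder's equation, response to selection per cycle is proportional to the product of i, selection accuracy, and the additive genetic standard deviation. The population sizes and the number of selected lines below are hypothetical, and the formula assumes a large, normally distributed population.

```python
from statistics import NormalDist

def selection_intensity(p):
    """Selection intensity i = phi(x) / p for truncation selection of the top
    fraction p of a large, normally distributed population."""
    nd = NormalDist()                 # standard normal
    x = nd.inv_cdf(1.0 - p)           # truncation point
    return nd.pdf(x) / p

# Hypothetical: advance the best 14 lines from 141 tested lines versus from
# 282 lines evaluated through sparse testing.
for n_candidates in (141, 282):
    p = 14 / n_candidates
    print(n_candidates, round(selection_intensity(p), 2))
```

Holding the number of selected lines fixed, doubling the candidate pool raises i by roughly 0.3 standard deviations, which is the mechanism behind the sparse-testing argument.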

Table 4

Average prediction accuracy

| Traits | SE_CV1 | ME_CV1 | ME_CV2 |
|---|---|---|---|
| DAYSMT | 0.24 | 0.78 | 0.78 |
| DTHD | 0.25 | 0.85 | 0.85 |
| GRYLD | 0.18 | 0.69 | 0.73 |
| TGW | 0.34 | 0.82 | 0.83 |

Genomic prediction accuracies averaged across four environments for four traits and three modeling scenarios (a) single-environment CV1 (SE_CV1), (b) multi-environment CV1 (ME_CV1), and (c) multi-environment CV2 (ME_CV2).

DTHD, days to heading; DAYSMT, days to maturity; GRYLD, grain yield; TGW, thousand-grain weight.
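The fold changes can be recomputed directly from the Table 4 averages. Relative to SE_CV1, average PA rises by about 2.4-fold (TGW) up to 3.8-fold (GRYLD, ME_CV1) and 4.1-fold (GRYLD, ME_CV2); in absolute terms, the largest gain is for DTHD (+0.60).

```python
# Average prediction accuracies (SE_CV1, ME_CV1, ME_CV2) copied from Table 4
pa = {
    "DAYSMT": (0.24, 0.78, 0.78),
    "DTHD": (0.25, 0.85, 0.85),
    "GRYLD": (0.18, 0.69, 0.73),
    "TGW": (0.34, 0.82, 0.83),
}

for trait, (se, me1, me2) in pa.items():
    # relative gain (fold change) and absolute gain over the SE baseline
    print(f"{trait}: {me1 / se:.2f}x / {me2 / se:.2f}x (ME_CV1 / ME_CV2), "
          f"+{me1 - se:.2f} / +{me2 - se:.2f}")
```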

Conclusion

Breeding for quantitative traits is challenging because of their complex genetic architecture and their strong G × E interactions in field trials. ME genomic prediction models offer a modeling strategy that can potentially address this challenge. In this study, we evaluated genomic prediction accuracies of advanced spring wheat lines in four diverse environments in two wheat-growing regions of India. The ME-GS models showed significant improvement in prediction accuracy over SE models. Our results suggest that ME models can be leveraged to improve breeding selection efficiency for major agronomic and phenological traits. Over the years, CIMMYT has established an extensive network of field-testing sites in South Asian countries, including India, Pakistan, Bangladesh, and Nepal. Our results suggest that wheat breeding programs in these countries could benefit greatly from GS through better modeling of environmental variance and sparse testing of larger cohorts of breeding lines. Future research will focus on incorporating high-throughput phenotyping traits such as plant height, Normalized Difference Vegetation Index (NDVI), and senescence into the genomic prediction framework to improve the selection efficiency of spring wheat in South Asian breeding programs.

Funding

This study was supported by the United States Agency for International Development (USAID), Feed the Future Innovation Lab for Applied Wheat Genomics (Cooperative Agreement No. AID-OAA-A-13-00051), and CGIAR Research Program on Wheat (CRP) Partner Grant to BISA (Grant Code: A5017.09.64).

Publisher's Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Statements

Data availability statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

Author contributions

VT and DS drafted the manuscript. VT, DS, GD, and YC analyzed the data. UK, JP, and RS designed the field trials, conducted genotyping, and provided breeding lines. VT and YG collected field data. UK, BT, JP, RS, and AJ supervised the overall study. All authors contributed to the article and approved the submitted version.

Acknowledgments

We would like to thank the field staff at the field sites of the Borlaug Institute for South Asia at Jabalpur and Ludhiana for their assistance with data collection.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2021.720123/full#supplementary-material

Supplementary Figure 1

Weather information of LDH17 and LDH18.

Supplementary Figure 2

Weather information of JBL17 and JBL18.

Supplementary Table 1

List of 141 genotypes with pedigree information used in this study.

Supplementary Table 2

List of traits that were evaluated during this study in the field trials.

Supplementary Table 3

GBS HapMap data used in this study.

Supplementary Table 4

Best linear unbiased predictions (BLUPs) data used in this study.

References

Akdemir, D., and Isidro-Sánchez, J. (2019). Design of training populations for selective phenotyping in genomic prediction. Sci. Rep. 9, 1–15. doi: 10.1038/s41598-018-38081-6

Alvarado, G., Rodríguez, F. M., Pacheco, A., Burgueño, J., Crossa, J., Vargas, M., et al. (2020). META-R: a software to analyze data from multi-environment plant breeding trials. Crop J. 8, 745–756. doi: 10.1016/j.cj.2020.03.010

Arzani, A., and Ashraf, M. (2017). Cultivated ancient wheats (Triticum spp.): a potential source of health-beneficial food products. Compr. Rev. Food Sci. Food Saf. 16, 477–488. doi: 10.1111/1541-4337.12262

Asoro, F. G., Newell, M. A., Beavis, W. D., Scott, M. P., and Jannink, J.-L. (2011). Accuracy and training population design for genomic selection on quantitative traits in elite North American oats. Plant Genome 4:007. doi: 10.3835/plantgenome2011.02.0007

Belamkar, V., Guttieri, M. J., Hussain, W., Jarquín, D., El-basyoni, I., Poland, J., et al. (2018). Genomic selection in preliminary yield trials in a winter wheat breeding program. G3 Genes|Genomes|Genetics 8, 2735–2747. doi: 10.1534/g3.118.200415

Bhandari, A., Bartholomé, J., Cao-Hamadoun, T.-V., Kumari, N., Frouin, J., Kumar, A., et al. (2019). Selection of trait-specific markers and multi-environment models improve genomic predictive ability in rice. PLoS ONE 14:e0208871. doi: 10.1371/journal.pone.0208871

Bhatta, M., Morgounov, A., Belamkar, V., and Baenziger, P. S. (2018). Genome-wide association study reveals novel genomic regions for grain yield and yield-related traits in drought-stressed synthetic hexaploid wheat. Int. J. Mol. Sci. 19:3011. doi: 10.3390/ijms19103011

Bradbury, P. J., Zhang, Z., Kroon, D. E., Casstevens, T. M., Ramdoss, Y., and Buckler, E. S. (2007). TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics 23, 2633–2635. doi: 10.1093/bioinformatics/btm308

Braun, H. J., Atlin, G., and Payne, T. (2010). "Multi-location testing as a tool to identify plant response to global climate change," in Climate Change and Crop Production, ed. M. P. Reynolds (CABI International), 115–138.

Burgueño, J., de los Campos, G., Weigel, K., and Crossa, J. (2012). Genomic prediction of breeding values when modeling genotype × environment interaction using pedigree and dense molecular markers. Crop Sci. 52, 707–719. doi: 10.2135/cropsci2011.06.0299

Charmet, G., Storlie, E., Oury, F. X., Laurent, V., Beghin, D., Chevarin, L., et al. (2014). Genome-wide prediction of three important traits in bread wheat. Mol. Breed. 34, 1843–1852. doi: 10.1007/s11032-014-0143-y

Charmet, G., Tran, L.-G., Auzanneau, J., Rincent, R., and Bouchet, S. (2020). BWGS: an R package for genomic selection and its application to a wheat breeding programme. PLoS ONE 15:e0222733. doi: 10.1371/journal.pone.0222733

Covarrubias-Pazaran, G. (2016). Genome-assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11:e0156744. doi: 10.1371/journal.pone.0156744

Crossa, J., de los Campos, G., Pérez, P., Gianola, D., Burgueño, J., Araus, J. L., et al. (2010). Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics 186, 713–724. doi: 10.1534/genetics.110.118521

Crossa, J., Jarquín, D., Franco, J., Pérez-Rodríguez, P., Burgueño, J., Saint-Pierre, C., et al. (2016). Genomic prediction of gene bank wheat landraces. G3 Genes|Genomes|Genetics 6:1819. doi: 10.1534/g3.116.029637

Crossa, J., Pérez, P., de los Campos, G., Mahuku, G., Dreisigacker, S., and Magorokosho, C. (2011). Genomic selection and prediction in plant breeding. J. Crop Improv. 25, 239–261. doi: 10.1080/15427528.2011.558767

Crossa, J., Pérez, P., Hickey, J., Burgueño, J., Ornella, L., Cerón-Rojas, J., et al. (2013). Genomic prediction in CIMMYT maize and wheat breeding programs. Heredity 112, 48–60. doi: 10.1038/hdy.2013.16

Crossa, J., Pérez-Rodríguez, P., Cuevas, J., Montesinos-López, O., Jarquín, D., de los Campos, G., et al. (2017). Genomic selection in plant breeding: methods, models, and perspectives. Trends Plant Sci. 22, 961–975. doi: 10.1016/j.tplants.2017.08.011

Cullis, B. R., Smith, A. B., Cocks, N. A., and Butler, D. G. (2020). The design of early-stage plant breeding trials using genetic relatedness. J. Agric. Biol. Environ. Stat. 25, 553–578. doi: 10.1007/s13253-020-00403-5

Curtis, T., and Halford, N. G. (2014). Food security: the challenge of increasing wheat yield and the importance of not compromising food safety. Ann. Appl. Biol. 164, 354–372. doi: 10.1111/aab.12108

de los Campos, G., Naya, H., Gianola, D., Crossa, J., Legarra, A., Manfredi, E., et al. (2009). Predicting quantitative traits with regression models for dense molecular markers and pedigree. Genetics 182, 375–385. doi: 10.1534/genetics.109.101501

Dias, K. O. D. G., Gezan, S. A., Guimarães, C. T., Nazarian, A., da Costa e Silva, L., Parentoni, S. N., et al. (2018). Improving accuracies of genomic predictions for drought tolerance in maize by joint modeling of additive and dominance effects in multi-environment trials. Heredity 121, 24–37. doi: 10.1038/s41437-018-0053-6

Dreisigacker, S., Deepmala, S., Jaimez, R. A., Luna-Garrido, B., Muñoz-Zavala, S., Núñez-Ríos, C., et al. (2016). CIMMYT Wheat Molecular Genetics: Laboratory Protocols and Applications to Wheat Breeding. Mexico, DF: CIMMYT.

Duangjit, J., Causse, M., and Sauvage, C. (2016). Efficiency of genomic selection for tomato fruit quality. Mol. Breed. 36, 1–16. doi: 10.1007/s11032-016-0453-3

González-Camacho, J. M., Ornella, L., Pérez-Rodríguez, P., Gianola, D., Dreisigacker, S., and Crossa, J. (2018). Applications of machine learning methods to genomic selection in breeding wheat for rust resistance. Plant Genome 11:170104. doi: 10.3835/plantgenome2017.11.0104

Hayes, B. J., Panozzo, J., Walker, C. K., Choy, A. L., Kant, S., Wong, D., et al. (2017). Accelerating wheat breeding for end-use quality with multi-trait genomic predictions incorporating near infrared and nuclear magnetic resonance-derived phenotypes. Theor. Appl. Genet. 130, 2505–2519. doi: 10.1007/s00122-017-2972-7

Heffner, E. L., Jannink, J.-L., Iwata, H., Souza, E., and Sorrells, M. E. (2011). Genomic selection accuracy for grain quality traits in biparental wheat populations. Crop Sci. 51, 2597–2606. doi: 10.2135/cropsci2011.05.0253

Hellin, J., Shiferaw, B., Cairns, J. E., Reynolds, M., Ortiz-Monasterio, I., Banziger, M., et al. (2012). Climate change and food security in the developing world: potential of maize and wheat research to expand options for adaptation and mitigation. J. Dev. Agric. Econ. 4, 311–321. doi: 10.5897/JDAE11.112

Heslot, N., Akdemir, D., Sorrells, M. E., and Jannink, J.-L. (2013). Integrating environmental covariates and crop modeling into the genomic selection framework to predict genotype by environment interactions. Theor. Appl. Genet. 127, 463–480. doi: 10.1007/s00122-013-2231-5

Heslot, N., Yang, H.-P., Sorrells, M. E., and Jannink, J.-L. (2012). Genomic selection in plant breeding: a comparison of models. Crop Sci. 52, 146–160. doi: 10.2135/cropsci2011.06.0297

Jannink, J.-L., Lorenz, A. J., and Iwata, H. (2010). Genomic selection in plant breeding: from theory to practice. Brief. Funct. Genomics 9, 166–177. doi: 10.1093/bfgp/elq001

Jarquin, D., Howard, R., Crossa, J., Beyene, Y., Gowda, M., Martini, J. W. R., et al. (2020). Genomic prediction enhanced sparse testing for multi-environment trials. G3 Genes|Genomes|Genetics 10, 2725–2739. doi: 10.1534/g3.120.401349

Jarquín, D., da Silva, C. L., Gaynor, R. C., Poland, J., Fritz, A., Howard, R., et al. (2017). Increasing genomic-enabled prediction accuracy by modeling genotype × environment interactions in Kansas wheat. Plant Genome 10:0130. doi: 10.3835/plantgenome2016.12.0130

Juliana, P., Montesinos-López, O. A., Crossa, J., Mondal, S., González Pérez, L., Poland, J., et al. (2019). Integrating genomic-enabled prediction and high-throughput phenotyping in breeding for climate-resilient bread wheat. Theor. Appl. Genet. 132, 177–194. doi: 10.1007/s00122-018-3206-3

Juliana, P., Singh, R. P., Singh, P. K., Crossa, J., Huerta-Espino, J., Lan, C., et al. (2017a). Genomic and pedigree-based prediction for leaf, stem, and stripe rust resistance in wheat. Theor. Appl. Genet. 130, 1415–1430. doi: 10.1007/s00122-017-2897-1

Juliana, P., Singh, R. P., Singh, P. K., Crossa, J., Rutkoski, J. E., Poland, J. A., et al. (2017b). Comparison of models and whole-genome profiling approaches for genomic-enabled prediction of Septoria tritici blotch, Stagonospora nodorum blotch, and tan spot resistance in wheat. Plant Genome 10:0082. doi: 10.3835/plantgenome2016.08.0082

Kassambara, A., and Mundt, F. (2020). factoextra: Extract and Visualize the Results of Multivariate Data Analyses. Available online at: https://cran.r-project.org/package=factoextra (accessed May 05, 2020).

Kohavi, R. (1995). "A study of cross-validation and bootstrap for accuracy estimation and model selection," in Proceedings of IJCAI'95, Vol. 2, 1137–1143.

Lado, B., Barrios, P. G., Quincke, M., Silva, P., and Gutiérrez, L. (2016). Modeling genotype × environment interaction for genomic selection with unbalanced data from a wheat breeding program. Crop Sci. 56, 2165–2179. doi: 10.2135/cropsci2015.04.0207

Lado, B., Vázquez, D., Quincke, M., Silva, P., Aguilar, I., and Gutiérrez, L. (2018). Resource allocation optimization with multi-trait genomic prediction for bread wheat (Triticum aestivum L.) baking quality. Theor. Appl. Genet. 131, 2719–2731. doi: 10.1007/s00122-018-3186-3

Lê, S., Josse, J., and Husson, F. (2008). FactoMineR: an R package for multivariate analysis. J. Stat. Softw. 25, 1–18. doi: 10.18637/jss.v025.i01

Lillemo, M., van Ginkel, M., Trethowan, R. M., Hernandez, E., and Crossa, J. (2005). Differential adaptation of CIMMYT bread wheat to global high temperature environments. Crop Sci. 45, 2443–2453. doi: 10.2135/cropsci2004.0663

Lozada, D. N., Mason, R. E., Sarinelli, J. M., and Brown-Guedira, G. (2019). Accuracy of genomic selection for grain yield and agronomic traits in soft red winter wheat. BMC Genet. 20, 1–12. doi: 10.1186/s12863-019-0785-1

Malosetti, M., Bustos-Korts, D., Boer, M. P., and van Eeuwijk, F. A. (2016). Predicting responses in multiple environments: issues in relation to genotype × environment interactions. Crop Sci. 56, 2210–2222. doi: 10.2135/cropsci2015.05.0311

Maphosa, L., Langridge, P., Taylor, H., Parent, B., Emebiri, L. C., Kuchel, H., et al. (2014). Genetic control of grain yield and grain physical characteristics in a bread wheat population grown under a range of environmental conditions. Theor. Appl. Genet. 127, 1607–1624. doi: 10.1007/s00122-014-2322-y

Meuwissen, T. H. E., Hayes, B. J., and Goddard, M. E. (2001). Prediction of total genetic value using genome-wide dense marker maps. Genetics 157, 1819–1829. doi: 10.1093/genetics/157.4.1819

Michel, S., Kummer, C., Gallee, M., Hellinger, J., Ametz, C., Akgöl, B., et al. (2018). Improving the baking quality of bread wheat by genomic selection in early generations. Theor. Appl. Genet. 131, 477–493. doi: 10.1007/s00122-017-2998-x

Mondal, S., Singh, R. P., Mason, E. R., Huerta-Espino, J., Autrique, E., and Joshi, A. K. (2016). Grain yield, adaptation and progress in breeding for early-maturing and heat-tolerant wheat lines in South Asia. Field Crops Res. 192, 78–85. doi: 10.1016/j.fcr.2016.04.017

Montesinos-López, A., Montesinos-López, O. A., Gianola, D., Crossa, J., and Hernández-Suárez, C. M. (2018). Multi-environment genomic prediction of plant traits using deep learners with dense architecture. G3 Genes|Genomes|Genetics 8, 3813–3828. doi: 10.1534/g3.118.200740

Norman, A., Taylor, J., Edwards, J., and Kuchel, H. (2018). Optimising genomic selection in wheat: effect of marker density, population size and population structure on prediction accuracy. G3 Genes|Genomes|Genetics 8, 2889–2899. doi: 10.1534/g3.118.200311

Norman, A., Taylor, J., Tanaka, E., Telfer, P., Edwards, J., Martinant, J.-P., et al. (2017). Increased genomic prediction accuracy in wheat breeding using a large Australian panel. Theor. Appl. Genet. 130, 2543–2555. doi: 10.1007/s00122-017-2975-4

Oakey, H., Cullis, B., Thompson, R., Comadran, J., Halpin, C., and Waugh, R. (2016). Genomic selection in multi-environment crop trials. G3 Genes|Genomes|Genetics 6, 1313–1326. doi: 10.1534/g3.116.027524

Olatoye, M. O., Clark, L. V., Labonte, N. R., Dong, H., Dwiyanti, M. S., Anzoua, K. G., et al. (2020). Training population optimization for genomic selection in Miscanthus. G3 Genes|Genomes|Genetics 10, 2465–2476. doi: 10.1534/g3.120.401402

Pandey, M. K., Chaudhari, S., Jarquin, D., Janila, P., Crossa, J., Patil, S. C., et al. (2020). Genome-based trait prediction in multi-environment breeding trials in groundnut. Theor. Appl. Genet. 133, 3101–3117. doi: 10.1007/s00122-020-03658-1

Pérez-Rodríguez, P., Gianola, D., González-Camacho, J. M., Crossa, J., Manès, Y., and Dreisigacker, S. (2012). Comparison between linear and non-parametric regression models for genome-enabled prediction in wheat. G3 Genes|Genomes|Genetics 2, 1595–1605. doi: 10.1534/g3.112.003665

Poland, J. A., Brown, P. J., Sorrells, M. E., and Jannink, J.-L. (2012). Development of high-density genetic maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing approach. PLoS ONE 7:e32253. doi: 10.1371/journal.pone.0032253

Poland, J. A., and Rife, T. W. (2012). Genotyping-by-sequencing for plant breeding and genetics. Plant Genome 5:005. doi: 10.3835/plantgenome2012.05.0005

R Core Team (2019). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Available online at: https://www.R-project.org/

Rincent, R., Kuhn, E., Monod, H., Oury, F.-X., Rousset, M., Allard, V., et al. (2017). Optimization of multi-environment trials for genomic selection based on crop models. Theor. Appl. Genet. 130, 1735–1752. doi: 10.1007/s00122-017-2922-4

Rode, J., Ahlemeyer, J., Friedt, W., and Ordon, F. (2011). Identification of marker-trait associations in the German winter barley breeding gene pool (Hordeum vulgare L.). Mol. Breed. 30, 831–843. doi: 10.1007/s11032-011-9667-6

Roorkiwal, M., Jarquin, D., Singh, M. K., Gaur, P. M., Bharadwaj, C., Rathore, A., et al. (2018). Genomic-enabled prediction models using multi-environment trials to estimate the effect of genotype × environment interaction on prediction accuracy in chickpea. Sci. Rep. 8, 1–11. doi: 10.1038/s41598-018-30027-2

Rutkoski, J., Singh, R. P., Huerta-Espino, J., Bhavani, S., Poland, J., Jannink, J.-L., et al. (2015). Genetic gain from phenotypic and genomic selection for quantitative resistance to stem rust of wheat. Plant Genome 8:74. doi: 10.3835/plantgenome2014.10.0074

Sallam, A. H., Endelman, J. B., Jannink, J.-L., and Smith, K. P. (2015). Assessing genomic selection prediction accuracy in a dynamic barley breeding population. Plant Genome 8:20. doi: 10.3835/plantgenome2014.05.0020

Sehgal, D., Autrique, E., Singh, R., Ellis, M., Singh, S., and Dreisigacker, S. (2017). Identification of genomic regions for grain yield and yield stability and their epistatic interactions. Sci. Rep. 7, 1–12. doi: 10.1038/srep41578

Sharma, R. C., and Smith, E. L. (1986). Selection for high and low harvest index in three winter wheat populations. Crop Sci. 26, 1147–1150. doi: 10.2135/cropsci1986.0011183X002600060013x

Thorwarth, P., Ahlemeyer, J., Bochard, A.-M., Krumnacker, K., Blümel, H., Laubach, E., et al. (2017). Genomic prediction ability for yield-related traits in German winter barley elite material. Theor. Appl. Genet. 130, 1669–1683. doi: 10.1007/s00122-017-2917-1

Tolhurst, D. J., Mathews, K. L., Smith, A. B., and Cullis, B. R. (2019). Genomic selection in multi-environment plant breeding trials using a factor analytic linear mixed model. J. Anim. Breed. Genet. 136, 279–300. doi: 10.1111/jbg.12404

Tomar, V., Dhillon, G. S., Singh, D., Singh, R. P., Poland, J., Chaudhary, A. A., et al. (2021). Evaluations of genomic prediction and identification of new loci for resistance to stripe rust disease in wheat (Triticum aestivum L.). Front. Genet. 12:710485 (in press). doi: 10.3389/fgene.2021.710485

Velu, G., Crossa, J., Singh, R. P., Hao, Y., Dreisigacker, S., Perez-Rodriguez, P., et al. (2016). Genomic prediction for grain zinc and iron concentrations in spring wheat. Theor. Appl. Genet. 129, 1595–1605. doi: 10.1007/s00122-016-2726-y

Wang, X., Li, L., Yang, Z., Zheng, X., Yu, S., Xu, C., et al. (2016). Predicting rice hybrid performance using univariate and multivariate GBLUP models based on North Carolina mating design II. Heredity 118, 302–310. doi: 10.1038/hdy.2016.87

Würschum, T., Leiser, W. L., Langer, S. M., Tucker, M. R., and Longin, C. F. H. (2018). Phenotypic and genetic analysis of spike and kernel characteristics in wheat reveals long-term genetic trends of grain yield components. Theor. Appl. Genet. 131, 2071–2084. doi: 10.1007/s00122-018-3133-3

Zhang, J., Song, Q., Cregan, P. B., and Jiang, G.-L. (2015). Genome-wide association study, genomic prediction and marker-assisted selection for seed weight in soybean (Glycine max). Theor. Appl. Genet. 129, 117–130. doi: 10.1007/s00122-015-2614-x

Keywords

single-environment, multi-environments, genotyping by sequencing, genomic selection (GS), genomics predictions, best linear unbiased predictions, wheat

Citation

Tomar V, Singh D, Dhillon GS, Chung YS, Poland J, Singh RP, Joshi AK, Gautam Y, Tiwari BS and Kumar U (2021) Increased Predictive Accuracy of Multi-Environment Genomic Prediction Model for Yield and Related Traits in Spring Wheat (Triticum aestivum L.). Front. Plant Sci. 12:720123. doi: 10.3389/fpls.2021.720123

Received

03 June 2021

Accepted

03 September 2021

Published

08 October 2021

Volume

12 - 2021

Edited by

Valentin Wimmer, KWS Saat, Germany

Reviewed by

Pedro José Martínez-García, Spanish National Research Council, Spain; Mian Abdur Rehman Arif, Nuclear Institute for Agriculture and Biology, Pakistan

*Correspondence: Vipin Tomar; Uttam Kumar

†Present address: Daljit Singh, The Climate Corporation, Bayer Crop Science, Creve Coeur, MO, United States

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science.
