Increased Predictive Accuracy of Multi-Environment Genomic Prediction Model for Yield and Related Traits in Spring Wheat (Triticum aestivum L.)

Genomic selection (GS) has the potential to improve the selection gain for complex traits in crop breeding programs from resource-poor countries. The GS model performance in multi-environment (ME) trials was assessed for 141 advanced breeding lines under four field environments via cross-predictions. We compared prediction accuracy (PA) of two GS models with or without accounting for the environmental variation on four quantitative traits of significant importance, i.e., grain yield (GRYLD), thousand-grain weight, days to heading, and days to maturity, under North and Central Indian conditions. For each trait, we generated PA using the following two different ME cross-validation (CV) schemes representing actual breeding scenarios: (1) predicting untested lines in tested environments through the ME model (ME_CV1) and (2) predicting tested lines in untested environments through the ME model (ME_CV2). The ME predictions were compared with the baseline single-environment (SE) GS model (SE_CV1) representing a breeding scenario, where relationships and interactions are not leveraged across environments. Our results suggested that the ME models provide a clear advantage over SE models in terms of robust trait predictions. Both ME models provided 2–3 times higher prediction accuracies for all four traits across the four tested environments, highlighting the importance of accounting environmental variance in GS models. While the improvement in PA from SE to ME models was significant, the CV1 and CV2 schemes did not show any clear differences within ME, indicating the ME model was able to predict the untested environments and lines equally well. Overall, our results provide an important insight into the impact of environmental variation on GS in smaller breeding programs where these programs can potentially increase the rate of genetic gain by leveraging the ME wheat breeding trials.

Genomic selection (GS) has the potential to improve the selection gain for complex traits in crop breeding programs from resource-poor countries. The GS model performance in multi-environment (ME) trials was assessed for 141 advanced breeding lines under four field environments via cross-predictions. We compared prediction accuracy (PA) of two GS models with or without accounting for the environmental variation on four quantitative traits of significant importance, i.e., grain yield (GRYLD), thousand-grain weight, days to heading, and days to maturity, under North and Central Indian conditions. For each trait, we generated PA using the following two different ME cross-validation (CV) schemes representing actual breeding scenarios: (1) predicting untested lines in tested environments through the ME model (ME_CV1) and (2) predicting tested lines in untested environments through the ME model (ME_CV2). The ME predictions were compared with the baseline single-environment (SE) GS model (SE_CV1) representing a breeding scenario, where relationships and interactions are not leveraged across environments. Our results suggested that the ME models provide a clear advantage over SE models in terms of robust trait predictions. Both ME models provided 2-3 times higher prediction accuracies for all four traits across the four tested environments, highlighting the importance of accounting environmental variance in GS models. While the improvement in PA from SE to ME models was significant, the CV1 and CV2 schemes did not show any clear differences within ME, indicating the ME model was able to predict the untested environments and lines equally well. Overall, our results provide an important insight into the impact of environmental variation on GS in smaller breeding programs where these programs can potentially increase the rate of genetic gain by leveraging the ME wheat breeding trials.

INTRODUCTION
Wheat (Triticum aestivum L.) is an essential cereal to secure global food security (Curtis and Halford, 2014). Significant efforts are needed to accelerate high-yielding varieties to fulfill future global wheat demand by 2050 (Hellin et al., 2012). Hence, the enhancement of grain yield (GRYLD) is a foremost target for wheat breeders. GRYLD is a complex trait administered by many small-effect loci with significant loci × loci interactions (Arzani and Ashraf, 2017;Sehgal et al., 2017). Moreover, the GRYLD trait is associated with strong genotype × environment interaction, which makes its genetic enhancement a difficult work.
Genomic selection (GS) integrates genome-wide dense markers and, as presented by Meuwissen et al. (2001), is a different marker-assisted selection approach. GS proves to be a powerful tool to improve the selection accuracy and prediction for quantitative traits in crop breeding (Crossa et al., 2017). GS utilizes a large set of, usually unidentified markers, spread over the whole genome in the same way as every quantitative trait locus (QTL) is in linkage disequilibrium (LD). GS is particularly beneficial for traits that cannot be evaluated on a few plants and for traits that are hard to estimate. It is still a vital issue for plant breeders to upsurge the accuracy of genomic prediction for selecting the advanced breeding lines.
The GS has been widely used in wheat breeding to predict various traits, such as yield, disease resistance, grain weight, heading, iron and zinc contents, end-use quality, and physiological traits (Charmet et al., 2014;Velu et al., 2016;Hayes et al., 2017;Juliana et al., 2017a,b;Norman et al., 2017;Lozada et al., 2019;Tomar et al., 2021). As such, GS embraces the prospects for the genomic enhancement of qualitative and quantitative traits. Many available GS models have been tested on various breeding and trait scenarios. Earlier numerous comparative studies of the GS model predictions in wheat showed that Random Forest and Reproducing Kernel Hilbert Space models performed better for traits of interest. However, any single GS model could not outperform other models (Pérez-Rodríguez et al., 2012;Charmet et al., 2014). Earlier studies have stated that many interconnected factors impact the overall model performance (Jannink et al., 2010;Heslot et al., 2012), such as heritability, population structure, statistical models, i.e., parametric and nonparametric models, cross-validation (CV) approaches, the genetics of traits, training population size and composition, marker density, and LD among markers and QTLs (Jannink et al., 2010;Pérez-Rodríguez et al., 2012;Crossa et al., 2017;Norman et al., 2018;Lozada et al., 2019).
The GS delivers the promise to accelerate genetic gain by increasing precision in selecting and shortening the breeding cycles. However, GS is a relatively new and advanced method for smaller and low-resource South Asian wheat breeding programs. Previously, substantial progress has been made in testing and validating various models for GRYLD and related traits in wheat in South Asia, albeit on larger breeding populations (De los Campos et al., 2009;Crossa et al., 2010Crossa et al., , 2011Crossa et al., , 2016Heffner et al., 2011;Burgueño et al., 2012;Pérez-Rodríguez et al., 2012;Rutkoski et al., 2015;Juliana et al., 2017aJuliana et al., ,b, 2019González-Camacho et al., 2018). These studies have highlighted the impact of environment and genotype × environment on the GS model performance. Therefore, to optimize the genetic gain from GS, the group of field-testing environments can be leveraged.
In this study, high-yielding, advanced wheat breeding lines from The International Maize and Wheat Improvement Center (CIMMYT) were evaluated for two consecutive wheat seasons (2017 and 2018) to adapt to the diverse environments of North and Central India. To evaluate the performance of multienvironment (ME) GS models on our specific set of selection environments, we tested different GS CV schemes mimicking the breeding schemes where untested lines and environmental performance are highly valuable to achieve the desired selection gains. This study is highly relevant particularly in the South Asian context where trial sizes are relatively small and broadly adapted wheat lines are sought after.

Plant Material
A set of 141 South Asian spring wheat lines (T. aestivum L.) were selected from the International Yield Trial of CIMMYT International Nurseries (elite germplasm). These lines constitute a diverse association panel. The seeds of 141 genotypes were obtained from the Germplasm Resource Unit, CIMMYT (Mexico). Detailed information with a pedigree for each genotype is given in Supplementary Table 1.

Field Trials and Phenotypic Evaluation
The panel of selected lines was evaluated in field trials at the Borlaug Institute for South Asia (India) at Jabalpur  Table 2). The experiment was conducted in two replications following the Randomized Block Design (RBD). The normal agronomic practice was followed for trial management. The row-to-row distance was maintained at 20 cm.

Genotyping-by-Sequencing and SNP Filtering
Genomic DNA was extracted from the fresh leaves of seedling wheat using the modified cetyltrimethylammonium bromide (CTAB) method . Genotyping-bysequencing (GBS) was performed in Illumina HiSeq 2500 using a protocol suggested by . Single nucleotide polymorphism (SNP) calling was performed using TASSEL version 5.2.43 (Bradbury et al., 2007) using the TASSEL-GBSv2 pipeline. Using Beagle version 4.1, the missing data were imputed with default settings. After quality control (filter criteria: sample call rate > 0.8, Minor allele frequency (MAF) ≥ 0.05, SNP call rate > 0.7), 14,563 polymorphic SNPs and 141 genotypes were selected for the follow-up analysis (Supplementary Table 3). Among polymorphic SNP markers, 40.66, 50.66, and 8.68% were from the A, B, and D genomes, respectively. With a genomic coverage of 13.9 GB and 14,563 markers across the genome, the average marker density was one marker per 0.95 Mb. The highest marker density with one marker per 0.54 Mb of chromosome 2B and the lowest marker density with one marker per 6.854 Mb at chromosome 4D were observed.

Statistical Analysis of Phenotypes
Each location-year combination is treated as a distinct environment for analysis purposes. Broad-sense heritability for each trait/environment combination was estimated at the plot level, and raw phenotypic values were adjusted to derive the best linear unbiased predictions (BLUPs) (Supplementary Table 4) for each trait at each environment using META-R (Alvarado et al., 2020) by using the following formula: where Y ik is the trait of interest, µ is the mean effect, Rep i is the effect of the ith replicate, Gen k is the effect of the kth genotype, ǫ ik is the error associated with the ith replication and the kth genotype, which is assumed to be normally and independently distributed, with mean 0 and homoscedastic variance. For across environments, Y ijk is the trait response and the ith environment, Rep j (Env i ) is the effect of jth Rep in the ith environment, and Env i × Gen k is the environment × genotype interaction. The resulting analysis produced the adjusted trait phenotypic values in the form of BLUPs within and across environments. The BLUPs model considers genotypes as random effects, minimizing the effect of screening time and other environmental effects.
In addition, the components of the phenotypic variance of a given trait at an individual environment and across environments were also extracted to calculate the broad-sense heritability using the formula as follows: e nReps (within environments) where σ 2 g and σ 2 e are the genotype and error variance components, respectively, σ 2 ge is genotype × environment interaction variance, nEnvs is the number of environments, and nReps is the number of replicates. All effects are considered random for calculating the BLUPs (Supplementary Table 4) and the broad-sense heritability. The BLUPs phenotypic distributions of GRYLD and other traits at each environment were plotted to check normality assumptions. Phenotypic and genetic correlations were calculated for each trait and environment combination in R software version 4.0.2. (R Core Team, 2019) using FactoMineR version 2.4 (Lê et al., 2008) and factoextra version 1.0.7 (Kassambara and Mundt, 2020).

Baseline Single-Environment (SE) Genomic BLUP Model (GBLUP), CV Schemes, and Predictive Ability
The baseline SE genomic prediction analysis was implemented in the BWGS program (Charmet et al., 2020). BWGS performs a GBLUP analysis using a marker-based relationship matrix. CV delivers an unbiased evaluation for the performance of a GS model; therefore, a 5-fold CV approach was implemented for reducing the unwanted bias (Kohavi, 1995), where the genotypes (for each trait separately) were randomly split into five equalsized folds. SE_CV1 model was fitted with missing phenotypic values for the tested individuals. Prediction accuracy (PA) was subsequently calculated as the correlation of predicted breeding values with the observed phenotypic values for the missing genotypes. This step was repeated for each environment and fold separately. The genomic PA was then calculated by iteratively assigning 1-fold as the validation set and the remaining folds as the training set. This five-fold validation process was repeated 50 times to randomly shuffle the lines in each fold. The accuracy of the genomic predictions was measured as the Pearson's correlation between the predicted and actual trait BLUPs.
A mixed model of the simplified form was fitted for genomic predictions as follows: where y is a vector of adjusted phenotypes, X is a design matrix relating the fixed effects to each genotype, b is a vector of fixed effects, Z is a design matrix connecting records to genetic values, g is a vector of additive genetic effects for a genotype, and e is a vector of random normal deviates with variance δ 2 e .

Advanced ME GBLUP Model, CV Schemes, and Predictive Ability
The advanced ME genomic prediction analysis was implemented in Solving Mixed Model Equations in the R (sommer) package (Covarrubias-Pazaran, 2016). Two types of ME_CV schemes representing actual breeding scenarios were implemented. The first scenario represents a use case where some genotypes are missing across all environments (ME_CV1). ME_CV1 was fitted by masking the phenotypic values of genotypes belonging to the test set across all environments. PA was calculated as the correlation of predicted and observed phenotypic values for the missing genotypes at each environment separately. In the second scenario, the entire trial or all genotypes are missing at one of the environments (ME_CV2). ME_CV2 was fitted by masking the phenotypic values of all lines in an SE. The trained model was then used to predict the breeding values of lines in the missing environment. PA was calculated as the correlation of predicted and observed phenotypic values of the tested lines. The CV schemes are illustrated in Figure 1.
In ME genomic predictions, the SE model was rewritten and implemented as follows: where y ij represents response of jth line in the ith environment (i = 1, 2,. . . . . . i, j = 1, 2,. . . . . . j; g j is the effect of jth line with FIGURE 1 | Prediction scheme for the single-environment (SE) and multi-environment (ME) genomic prediction models with two cross-validation schemes (CV1 and CV2) used in this study. SE_CV1 model: the SE prediction model with CV scheme 1 where a trait [e.g., grain yield (GRYLD)] is predicted at a time; we used 80% of individuals as the training set (phenotyped and genotyped, light green) and 20% of the individuals as the testing set (genotyped only, light gray with validation code for the trait to be predicted, yield as an example here). ME_CV1 model: the ME prediction model with CV scheme 1 for new un-phenotyped individuals; we used 80% of individuals as the training set (phenotyped for all traits and genotyped; light green) and 20% of the individuals as the validation set (genotyped but not phenotyped for any trait; light gray with validation code for the trait to be predicted, GRYLD as an example here). ME_CV2 model: the ME prediction model with CV scheme 2 where 100% of the information from other traits are available for the individuals to be predicted; we used 80% of individuals as the training set (phenotyped for all traits and genotyped; light green) and 20% of individuals as the validation set (phenotyped for associated traits but not for the targeted traits, and genotyped; light gray with predication code for the trait to be predicted, yield as an example here). Rectangles represent genotypes, and colors represent whether the phenotypic information was used (light green) or not (light gray with validation code for the trait to be predicted, GRYLD as an example) for the population. A similar scheme was applied for predicting days to heading (DTHD), days to maturity (DAYSMT), and thousand-grain weight (TGW). g = (g 1........ g j )T∼N(0, δ 2 1 Gg), δ 2 1 is the genomic variance, Gg is the genomic relationship matrix. E i represents the effect of the ith environment. gE ij is the random term that takes into account the interaction between the genomic effect of jth line and the ith environment with gE= (g 1 . . . . . . . . . g j )T∼N (0, δ 2 2 I I ⊗ G), where δ 2 2 is the interaction variance. Eij is a random residual effect of the jth line in the ith environment [N (0, δ 2 2 )], where δ 2 2 is the residual variance.

Heritability, Correlations, and Trait Characterization
A range of variation was detected for GRYLD and other related traits across environments/years (LDH17 and LDH18 and JBL17 and JBL18). The highest averaged GRYLD over environments/years was observed at JBL18 (9.4 ton/ha),  followed by JBL17 (8.7 ton/ha), LDH17 (8.2 ton/ha), and LDH18 (7.9 ton/ha). Similarly, the TGW trait also showed variation across environments. The highest averaged TGW over environments/years was observed at JBL17 (69 g), followed by JBL18 (59.5 g), LDH17 (58.4 g), and LDH18 (53.5 g). We observed significant G × E interaction for the GRYLD and DAYSMT in JBL18 and LDH17 (Tables 1, 2). For all traits, the broad-sense heritability ranged from 0.47 to 0.96. The broadsense heritability of DTHD was the highest (0.96) in LDH17, while GRYLD, the lowest (0.47) was in JBL18, and the highest (0.74) was in LDH17. TGW had the highest stability and relatively high heritability (0.80-0.86) across environments. The phenological traits DTHD and DAYSMT displayed the strongest positive correlation (0.88), followed by a weak positive correlation TGW-GRYLD (0.15), while GRYLD and DTHD (−0.73) demonstrated negative correlations. The lowest correlation was observed between GRYLD and DAYSMT (−0.76) traits. The principal component analysis (PCA) of multivariate analysis enables the easier understanding of effects and networks among different traits and elucidates genotypic difference among a set of given traits, i.e., the first two PCs explained 92% of the total variation. The PC1 explained 70.3% of the total variance and PC2 explained 21.7% of the total variance (Figure 2).

Baseline SE Model: Performance of Untested Lines in the Same Environment
A GS scenario representing SE breeding programs was tested. In this model, the PAs of the GS models for each of the four traits were separately generated for all four tested environments. In other words, the environments were treated as independent. Overall, the PA of the SE model was significantly lower among the three tested GS scenarios (Table 4; Figure 3). PA was the highest for TGW (0.34) and the lowest for GRYLD (0.18) traits. A relatively low moderate PA ranging between 0.24 and 0.25 was observed for DAYSMT and DTHD traits. Among the tested environments, JBL18 had the lowest overall PA (0.01-0.02) compared to the rest of the three environments for DTHD and DAYSMT (0.25-0.40). TGW was the only trait where a highly consistent and moderate PA (0.32-0.35) across all environments was observed. PA for GRYLD was the highest for LDH18 (0.32) and the lowest for JBL17 (0.08).

Advanced ME Model: Performance of Tested Lines in Untested Environments and Untested Lines in Tested Environments
The inclusion of environmental information in ME models significantly improved the PA over SE models across all traits and environments (Figure 3). A very high and consistent PA ranging from 0.69 to 0.85 was observed for all traits and environments for both ME models (ME_CV1 and ME_CV2). The most considerable improvement in PA due to ME was observed for the GRYLD trait, where PA increased from 0.18 to 0.73 for SE and ME models (Table 4). Interestingly, identical trait rankings were also observed for two ME models, where the DTHD ranked the highest (0.85) and GRYLD ranked the lowest (0.69-0.73) among all four traits. While the ME models had identical trait rankings, the environments ranked slightly differently for the two models for all traits. For instance, both years (2017 and 2018) at the LDH location had higher overall PA compared to JBL for all traits.

DISCUSSION
Crop breeders regularly evaluate the performance of genotypes and collect multiple traits data in various environments. The genotype-based selection on phenotypic and GBS marker information using genomic prediction models is gradually acquiring acceptance in breeding with the initiation of economical next-generation sequencing (NGS) technologies (Poland and Rife, 2012). Limited study has been conducted using the multi-environment genomic prediction (ME-GP) methods due to the complexity and higher computing requirements (Oakey et al., 2016;Rincent et al., 2017;Montesinos-López et al., 2018;Roorkiwal et al., 2018;Bhandari et al., 2019;Tolhurst et al., 2019;Pandey et al., 2020). FIGURE 3 | Bar plots showing the prediction accuracy (PA) of DAYSMT, DTHD, GRYLD, and TGW using SE and ME models from individual experiments across locations (LDH17, LDH18, JBL17, and JBL18). SE_CV1 predicting SE at a time, ME_CV1 predicting new lines with genotypic information only, and ME_CV2 predicting partially phenotyped lines by using genotypic and phenotypic information from all traits from individuals in the training set, and genotypic and correlated phenotypic traits in the testing set. Trait Correlation and Characterization: A Vital Factor for Improving Accuracy in ME-GP In this study, advanced breeding lines as part of the bread wheat program of CIMMYT were evaluated under irrigated conditions at two locations (JBL and LDH) for 2 years (2017 and 2018) (i.e., four environments). This study evaluated four traits (i.e., DTHD, DAYSMT, GRYLD, and TGW) for use in an ME trait GP model. GRYLD and related traits were positively correlated to each other in two sets (i.e., 1: DAYSMT and DTHD; and 2: GRYLD and TGW) (Figure 4). This positive correlation of GRYLD with TGW in this study points out that the GRYLD was mainly distinct by the TGW factor. The negative relationship between GRYLD and DTHD indicates that the early-headed genotypes play a vital role in the stability of advanced breeding line yield during grain filling and finally affecting the yield component (Sharma and Smith, 1986).

Yield and Related Trait Heritability Difference Among Environments
Our results showed that the heritability of the traits ranged from moderate (i.e., GRYLD) to high (i.e., DAYSMT, DTHD, and TGW). Among the four traits, the phenological traits (i.e., DTHD and DAYSMT) and TGW particularly showed high stable broad-sense heritability ranging from 0.71 to 0.96. It suggests the high quality of the phenotypic measurements and significant predictive potential of the traits. GRYLD, a highly quantitative and environmentally sensitive trait (Maphosa et al., 2014;Würschum et al., 2018), showed considerable fluctuation across environments with JBL environment having relatively lower heritability (0.47-0.48) compared to LDH (0.62-0.74). The variance explained by agronomic traits was significant (Table 1) and indicating a large G × E impact on GRYLD resulted in a lower heritability compared to other traits. Hence, lower heritability estimates for GRYLD were expected as numerous genes govern it. The low heritability and yield variances also could be the possible effect of the smaller plot size and lower sowing density (Rode et al., 2011;Sallam et al., 2015;Thorwarth et al., 2017;Bhatta et al., 2018) (Tables 1, 2). The climate in these two environments is considerably different. While the growing season length is relatively shorter in JBL due to the high overall temperature, the LDH location has a moderately colder climate and longer growing season (Mondal et al., 2016). On the one hand, these highly variable environments do underscore a highly challenging phenotypic landscape; it also presents a significant opportunity to leverage the ME trial framework for trait improvement (Lillemo et al., 2005;Braun et al., 2010). The presence of significant genetic and environmental correlations (i.e., positive correlation in TGW and GRYLD, and DAYSMT and DTHD) in our experiments led us to hypothesize that the correlated traits and environmental relationships can be leveraged to improve the selection accuracy through markerbased ME-GS models (Figure 4). Therefore, we proceeded with applying the ME model to test this hypothesis on our selected set of lines (Table 3).

SE and ME Genomic Prediction Across Years and Sites and ME Model Utilities in Crop Breeding
While weak predictive capability continues to be a major issue in successfully applying GS (Crossa et al., 2013), numerous studies have demonstrated that GS could be beneficial for quantitative traits such as GRYLD with low heritability and also on how GS can be utilized in a breeding program by using even low to moderate GP in early generation selection Lado et al., 2018;Michel et al., 2018). There are several aspects influencing the PA of GP models. Some of the crucial aspects associated with this study of ME were the genetic relationship between the testing and training sets, the size of the training set, heritability and trait architecture, and correlations among traits and environments (Asoro et al., 2011;Crossa et al., 2013;Heslot et al., 2013;Sallam et al., 2015;Zhang et al., 2015;Duangjit et al., 2016;Lado et al., 2016;Wang et al., 2016;Thorwarth et al., 2017;Akdemir and Isidro-Sánchez, 2019;Olatoye et al., 2020). Even though the size of the population was small in our study, the GP using correlated traits in the ME_CV1 and ME_CV2 schemes had higher PA, indicating that correlated traits up to some extent could balance the impact on the sizes of small population. Models that leverage E and G × E components have been shown to improve the genomic prediction accuracies for highly quantitative traits such as phenology and GRYLD (Burgueño et al., 2012;Dias et al., 2018). To evaluate the potential of genomic predictions in highly productive but variable environments of JBL and LDH, we simulated three different genomic prediction scenarios representing actual breeding programs. A comparison of single and ME models showed a 2-to 3-fold improvement in model performance for all traits (Table 4; Figure 3). Among the four traits, GRYLD showed the highest (3.8X) absolute increase in PA from SE to ME models, highlighting the significance of ME modeling in GRYLD predictions. For the SE model, TGW had the most consistent PA across four environments (0.32-0.34), which was in agreement with the highly stable heritability and a lower fraction of G × E observed for this trait (Table 2; Figure 3). Interestingly, the PA of the two ME models (CV1 and CV2) showed no significant change, suggesting that the ME model was able to predict well the untested environments and lines equally. A model can be highly predictive of untested environments in scenarios where environments are highly correlated (Malosetti et al., 2016;Jarquín et al., 2017), which seems to be the case for our environments as reflected by the low G × E and high heritability (Table 1; Figure 3). Similarly, a remarkable improvement in the predictive performance of ME_CV1 can be partially attributed to the fact that our sampled set of lines came from the same breeding program and the sample size of 141 lines was relatively moderate. From the perspective of a breeding program, the strong performance of the two ME models suggests that our breeding program can increase the overall population size without losing any significant predictive power through sparse testing at these two environments (Cullis et al., 2020; Jarquin et al., 2020). A high population size from the sparse testing framework here can deliver a high selection gain through increased selection intensity.

CONCLUSION
Breeding for quantitative traits is challenging due to the complex genetic architecture of traits that are highly affected by the complex G × E interactions in field trials. A suitable genomic prediction modeling strategy can potentially address this challenge through ME genomic prediction models. In this study, we evaluated genomic prediction accuracies of advanced spring wheat lines under four diverse environments in two wheat-growing regions in India. The ME-GS models showed significant improvement over SE models in terms of prediction accuracies. Our results suggest that ME can be leveraged to improve the breeding selection efficiency for major agronomic and phonological traits. Over the years, CIMMYT has established an extensive network of field-testing sites in South Asian countries including India, Pakistan, Bangladesh, and Nepal. Our results suggest that the wheat breeding programs in these countries can greatly benefit from GS through better modeling of environmental variance and sparse testing of a larger cohort of breeding lines. Future research efforts will be directed toward including high-throughput phenotyping traits such as plant height, Normalized Difference Vegetation Index (NDVI), and senescence into the genomic prediction framework to improve the selection efficiency of spring wheat in the South Asian breeding programs.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

AUTHOR CONTRIBUTIONS
VT and DS drafted the manuscript. VT, DS, GD, and YC analyzed the data. UK, JP, and RS designed the field trials, conducted genotyping, and provided breeding lines. VT and YG collected field data. UK, BT, JP, RS, and AJ supervised the overall study. All authors contributed to the article and approved the submitted version.