Genome-Wide Association Mapping and Genomic Prediction of Anther Extrusion in CIMMYT Hybrid Wheat Breeding Program via Modeling Pedigree, Genomic Relationship, and Interaction With the Environment

Anther extrusion (AE) is the most important male floral trait for hybrid wheat seed production. AE is a complex quantitative trait that is difficult to phenotype reliably in field experiments not only due to high genotype-by-environment effects but also due to the short expression window in the field condition. In this study, we conducted a genome-wide association scan (GWAS) and explored the possibility of applying genomic prediction (GP) for AE in the CIMMYT hybrid wheat breeding program. An elite set of male lines (n = 603) were phenotype for anther count (AC) and anther visual score (VS) across three field experiments in 2017–2019 and genotyped with the 20K Infinitum is elect SNP array. GWAS produced five marker trait associations with small effects. For GP, the main effects of lines (L), environment (E), genomic (G) and pedigree relationships (A), and their interaction effects with environments were used to develop seven statistical models of incremental complexity. The base model used only L and E, whereas the most complex model included L, E, G, A, and G × E and A × E. These models were evaluated in three cross-validation scenarios (CV0, CV1, and CV2). In cross-validation CV0, data from two environments were used to predict an untested environment; in random cross-validation CV1, the test set was never evaluated in any environment; and in CV2, the genotypes in the test set were evaluated in only a subset of environments. The prediction accuracies ranged from −0.03 to 0.74 for AC and −0.01 to 0.54 for VS across different models and CV schemes. For both traits, the highest prediction accuracies with low variance were observed in CV2, and inclusion of the interaction effects increased prediction accuracy for AC only. In CV0, the prediction accuracy was 0.73 and 0.45 for AC and VS, respectively, indicating the high reliability of across environment prediction. Genomic prediction appears to be a very reliable tool for AE in hybrid wheat breeding. Moreover, high prediction accuracy in CV0 demonstrates the possibility of implementing genomic selection across breeding cycles in related germplasm, aiding the rapid breeding cycle.

Anther extrusion (AE) is the most important male floral trait for hybrid wheat seed production. AE is a complex quantitative trait that is difficult to phenotype reliably in field experiments not only due to high genotype-by-environment effects but also due to the short expression window in the field condition. In this study, we conducted a genome-wide association scan (GWAS) and explored the possibility of applying genomic prediction (GP) for AE in the CIMMYT hybrid wheat breeding program. An elite set of male lines (n = 603) were phenotype for anther count (AC) and anther visual score (VS) across three field experiments in 2017-2019 and genotyped with the 20K Infinitum is elect SNP array. GWAS produced five marker trait associations with small effects. For GP, the main effects of lines (L), environment (E), genomic (G) and pedigree relationships (A), and their interaction effects with environments were used to develop seven statistical models of incremental complexity. The base model used only L and E, whereas the most complex model included L, E, G, A, and G × E and A × E. These models were evaluated in three cross-validation scenarios (CV0, CV1, and CV2). In cross-validation CV0, data from two environments were used to predict an untested environment; in random cross-validation CV1, the test set was never evaluated in any environment; and in CV2, the genotypes in the test set were evaluated in only a subset of environments. The prediction accuracies ranged from −0.03 to 0.74 for AC and −0.01 to 0.54 for VS across different models and CV schemes. For both traits, the highest prediction accuracies with low variance were observed in CV2, and inclusion of the interaction effects increased prediction accuracy for AC only. In CV0, the prediction accuracy was 0.73 and 0.45 for AC and VS, respectively, indicating the high reliability of across

INTRODUCTION
Hybrid wheat offers great promise in terms of higher grain yield and stability across a wide range of wheat-producing environments globally (Gowda et al., 2010Mühleisen et al., 2014;Zhao et al., 2015;Basnet et al., 2019;Adhikari et al., 2020a;Easterly et al., 2020). Hybrid wheat has gotten attention in private and public sector breeding programs since the 1950s, but it has not led to any considerable commercial success (Virmani and Edwards, 1983;Longin et al., 2012;Adhikari et al., 2020a). There has been a growing interest in hybrid wheat research again in the last decade in North America (Dreisigacker et al., 2005;Basnet et al., 2019;Adhikari et al., 2020a,b;Easterly et al., 2020) and Europe Longin et al., 2012;Zhao et al., 2015). However, hybrid wheat varieties are currently commercially marketed only in Europe and some parts of India and China, and acreages are fairly low in all three areas Gupta et al., 2019).
The most vital limitation for the commercial success of hybrid wheat has always been the complexity and cost of hybrid seed production even though chemical-based sterility, cytoplasmic male sterility, and genetic male sterility systems with varying levels of efficiency are available (Virmani and Edwards, 1983;Longin et al., 2012). Irrespective of the method of hybrid seed production, one of the major factors that determines hybrid seed set is the outcrossing ability, which is determined by pollen load released by male parents upon flowering outside the floral structure (De Vries, 1971;Whitford et al., 2013) and opening of flowers in the female parents . Wheat flowers are cleistogamous, and extrusion of anthers outside the floral structure is a highly correlated proxy of pollen mass release that determines hybrid seed production (Whitford et al., 2013;Langer et al., 2014;Boeven et al., 2016Boeven et al., , 2018. Hence, inheritance of anther extrusion (AE) in the context of hybrid seed production plays a key role and has, therefore, been extensively studied in Europe and North America (Langer et al., 2014;Boeven et al., 2016Boeven et al., , 2018Muqaddasi et al., 2017aMuqaddasi et al., ,c,b, 2019. AE was previously thought to be inherited in an oligogenic manner (Sage and De Isturiz, 1974). However, recent studies on the inheritance of AE with the use of high-density molecular markers have found the trait to be complex and quantitative (Boeven et al., 2016). Several groups have studied the inheritance of AE in the context of Fusarium head blight (caused by Fusarium species, including Fusarium graminearum, F. culmorum, F. avenaceum, F. poae, and Microdochium nivale) resistance and found similar results (Skinnes et al., 2010;Lu et al., 2013;Buerstmayr and Buerstmayr, 2015;Steiner et al., 2019). Genome-wide association studies (GWAS) and bi-parental mapping have identified dwarfing genes/alleles, also referred to as "reduced height loci" (Rht), present in modern wheat germplasm as the major loci or co-localized with major loci governing AE and pollen mass in wheat (Lu et al., 2013;Boeven et al., 2016;He et al., 2016). The Rht genes (Rht-B1 and Rh-tD1) that cause plant height reduction also shorten the anther filaments and negatively impact AE (Boeven et al., 2016). Another height reduction loci Rht24, which does not reduce AE, has been suggested as an alternative to the Rht1 loci for use in male lines in hybrid breeding . Other than the Rht genes, previous studies reported the presence of several minor effect loci positively and negatively associated with AE in European winter wheat (Muqaddasi et al., 2017a,b) and CIMMYT spring wheat (Muqaddasi et al., 2019). Despite the identification of marker-trait associations and QTL, their application in hybrid wheat breeding via marker-assisted selection (MAS) for extending AE has limited scope since they are of very modest effects. Moreover, the Rht alleles have a very important role in height reduction of the wheat plant that is paramount for maintaining grain yield via prevention of lodging. Rht alleles have been widely deployed since the green revolution and are ubiquitous in elite CIMMYT germplasm (Ogihara et al., 2013;Aisawi et al., 2015;Basnet et al., 2019). These genes cannot be excluded for the sole purpose of increasing AE. In this context, genome-wide prediction or genomic selection (GS) appears to be the best strategy to breed for AE by exploiting the cumulative effect of many effect loci scattered throughout the genome (Meuwissen et al., 2001).
GS via whole-genome regression methods uses the information from thousands of molecular markers to capture not only major-effect genes but also the contribution of genomic regions with small effects (Meuwissen et al., 2001). GS has been utilized extensively in animal breeding and plant breeding to predict traits with complex genetic architecture using the information from molecular markers and pedigree information (Hayes et al., 2009;Crossa et al., 2017). In the case of multienvironment trials (METs) in plant breeding, GS models did not explicitly model G × E information since the phenotypic data from METs were analyzed separately to derive single phenotypic estimates, and they were used as single trait in a genomic estimated best linear unbiased predictor (G-BLUP) model (Burgueño et al., 2012). G-BLUP models have been extended for a multi-environment setting by Burgueño et al. (2012), where genetic correlations are used to explicitly model G × E. Jarquín et al. (2014) extended the G × E model to include environmental covariates both as main effects and interaction effects with genotypes and locations in a reaction norm model. The reaction norm model has been used extensively to predict complex traits with multi-environment data in wheat and cotton and was found to reveal higher prediction accuracies than single-trait G-BLUP and multi-trait G-BLUP models (Jarquín et al., 2014;Pérez-Rodríguez et al., 2015;Sukumaran et al., 2017Sukumaran et al., , 2018. GS has been used to predict AE using single-trait G-BLUP (Boeven et al., 2016;Muqaddasi et al., 2017c). However, G × E and reaction norm models have not been tested thus far, despite AE showing high levels of G × E.
This study aims to (i) explore the wheat genome for major effect QTL associated with AE via GWAS and (ii) apply reaction norm G × E models to predict AE in a multi-environment setting with the goal of driving genetic gain for AE in the CIMMYT hybrid wheat breeding program.

Plant Materials and Field Experiments
The study consisted of 603 advanced parental lines from the CIMMYT hybrid wheat breeding program. The lines were planted in 2-m-long, double-row linear plots, with 20-cm interrow spacing at El Batan, Mexico (20.83 • N, 100.83 • W) in 2017 and 2019, and at Obregon, Mexico (27.48 • N,109.93 • W) in 2018 growing cycles. The trials within a location were unreplicated, and plants/spikes per plant were used as biological replicates (i.e., two to three individual plants per genotype and one to three spikes per plant). In each experiment, four or five random spikes from different plants in each plot were tagged prior to flowering, and two male floral traits, as they relate to AE, AE visual scores (VS), and extruded anther count (AC), were taken from the plots at flowering and post-flowering stages, respectively. Trapped anthers in two lateral florets (first and second florets) from six to eight middle spikelets were counted in each randomly tagged spike, and then the deduced AC was expressed as a percentage by using the formula by Boeven et al. (2016): Anther extrusion (%) = Total extruded anthers Total number of anthers ×100% (1) For VS, a score of 1 to 10 was assigned to each genotype during flowering based on visual observations, where 1 indicates no anther extrusion and 10 indicates maximum anther extrusion. AC data were collected from all three environments, whereas VS was collected from only 2018 and 2019 experiments. In 2017, we missed the critical time during flowering to collect reliable VS data.

Statistical Analysis of Phenotypic Data
Phenotypic data of AC from each trial were analyzed separately using software package META-R (Alvarado et al., 2015). Best linear unbiased estimates (BLUEs) were calculated for each genotype, considering the effects of genotypes as fixed and the effects of environments as random. In addition to the BLUEs, variance components were calculated considering all the factors as random. Variance components were estimated for the combined analysis of AC data from all three environments to get an estimate of G × E. For VS, since there were no repeated measurements, the environments were considered replications for the purpose of variance components estimation. The genomic prediction models were run using BLUEs from each experiment for AC and raw phenotypic data for VS.
Broad-sense heritability (H 2 ) within environment was estimated as: H 2 across environments was estimated as: where σ 2 G is the variance due to genotype, σ 2 GE is the variance due to genotype × environment, σ 2 e is the error variance, l is the number of environments, and r is the number of replications using multi-environment trial analysis.

DNA Extraction and Genotyping
Genomic DNA was extracted from freeze-dried leaves collected from five individual plants per line using a modified CTAB (cetyltrimethylammonium bromide) method described in CIMMYT laboratory protocols  and quantified using a NanoDrop 8000 spectrophotometer V 2.1.0.
The population was genotyped with the 20K Infinium is elect SNP array by TraitGenetics (Gatersleben, Germany). The marker dataset was filtered for polymorphism and minor allele frequency (<5%) and >50% missing data were removed. In addition to these filtering steps, markers with known genetic positions in the genetic map developed by Wang et al. (2014) were extracted for use in GWAS.

Genome-Wide Association Study
BLUEs from each individual environment were used for GWAS using the mixed linear model (MLM) model implemented in the Genome Association and Prediction Integrated Tool (GAPIT) (Tang et al., 2016). The population structure was assessed via principal component analysis, and the first three components were used as covariates in the population structure (Q) defined by the kinship (K) (Q + K) model. Linkage disequilibrium (LD) decay was assessed by plotting pairwise LD between marker pairs and their genetic distance. Bonferroni correction for multiple testing implemented in GAPIT was used to identify significant associations, which corresponded to a −log 10 (P) value of >5.

Statistical Models for Genomic Prediction
We used the conventional Genomic Best Linear Unbiased Prediction (GBLUP) extended by the genotype-by-environment interaction term using molecular markers (G × E) and pedigree information (A × E) via the reaction norm model (Jarquín et al., 2014).

Baseline Line Model
Consider that the trait performance (y ij ) of the ith line observed in the jth environment can be described as the sum of an overall mean common to all genotypes in all environments µ, plus random deviations as follows: where L i is the random effect of the ith line, E j is the random effect of the jth environment, LE ij is the interaction between the ith line and the jth environment, and e ij is the random error term accounting for non-explained variability. The effects are assumed to be independent and identically distributed outcomes following normal densities, such that , and e ij iid ∼ N(0, σ 2 e ), while the interaction term from properties of the multivariate density is distributed as follows: and σ 2 e are the associated variance components, and Z E and Z L are the incidence matrices that connect phenotypes with environments and lines, and # represents the Hadamar product (cell-by-cell product between two matrices). Z E and Z L are the transpose of the respective incidence matrices.
In the model above, the random effect of the line (L i ) can be replaced by g i , which is an approximation of the genetic value of the ith line from the genomic relationship matrix. Also the effects of the line (L i ) can be replaced by a i , which is the additive effect obtained from the pedigree information. In the models described below, we used either g i or a i or both g i and a i as well as their interactions with environment E j (gE ij , or aE ij ).

G × E Models for AC and VS Measured in Environments
We applied a sequence of reaction norm models similar to that used by Jarquín et al. (2014), with genomic-based relationship matrices, and by Pérez-Rodriguez et al. (2015), with pedigreebased relationship matrices. Model 1 included only the main effects of environment (E) and lines (L), whereas model 2 added genomic (G) genomic information to model 1. Model 3 included all three main effects of L, G, and E, and the genomic × environment interactions (G × E). Model 4 included main effects of environment (E), lines (L), and pedigree (A), whereas model 5 added the pedigree × environment (A × E) to the main effect terms of model 4. Model 6 included the main effects of environment (E), lines (L), genomic (G), and pedigree (A). Finally, we fitted model 7, which included all main effects and the two interactions G × E and A × E. A description of the seven models considered in this study is given below.

Main Effect Model 1
This simple main effect model considers the response of the ith wheat male line and the jth environment (y ij ) as a function of a random effect model that accounts for only the effect of the environment (E i ) and the accession (L j ), plus a residual (ε ij ): where µ is an intercept, and the random terms remaining are described as in the baseline model. The main effect of environment (E i ) models the environment information via the incidence matrix of genotypes (Z L ) observed in different environments. In this model, the effects of the lines are regarded as independent; therefore, there is no borrowing of information between untested and tested landrace accessions.

Main Effect Model 2
The next main effect model adds in Eq. 5, the random effect of genomic relationship g i , which is an approximation of the true genetic value of the ith male wheat line. This approximation is given by the jointly regression on marker covariates g i = p m=1 x im b m , where x im is the genotype of the ith line at the mth marker, and b m is the corresponding effect with the assumption . ., p) and σ 2 b is the variance of the marker effects. The vector g = (g 1 , . . . , g I ) contains the genomic values of all the lines, and it is assumed to follow a multivariate normal density with zero mean and covariance matrix Cov(g) = Gσ 2 g , where G is the genomic relationship matrix that describes the genomic similarities between pairs of lines and, σ 2 g which is proportional to σ 2 b (σ 2 g ∝ σ 2 b ), the genomic variance. Therefore, main effect model 2 becomes where the vector of random effects is assumed g ∼ N(0, Gσ 2 g ) to be, and the other random effects remain as described The random effects g = (g 1 , . . . , g J ) are correlated such that model 2 allows borrowing of information across L i tested and untested lines. The genomic matrix G given by where p m is the estimated frequency of the allele whose number of copies at the ith accession is counted in x im . Centering (i.e., subtracting 2p m from the genotype codes) and standardization (i.e., dividing by as a genomic variance. This model does not allow specific genomic effects for each environmental condition but rather a common effect for same lines across environments. Thus, the interaction between markers and environments is introduced in the next model.

Main Effect and Interaction Model 3
This model is obtained by extending the main effect model 2 (Eq. 6) to include interaction effects (gE ij ) between each marker SNP and each environment. This model can be written as where E j , L i , and g i have already been defined, gE ∼ is the interaction of the genome with environment, with σ 2 gE as the variance component of gE, and the other model terms are as defined previously.

Main Effect and Interaction Model 4
Model 4 is similar to model 2 (Eq. 6), but instead of including the random effect of genomic g i , it includes the random effect accounted for by the pedigree a i . This model adds the random effect that incorporates pedigree information by means of the additive relationship matrix (A) to model 1 (Eq. 5), where a i is a random additive effect of the line, which in this case accounts for pedigree relationships, where a = (a, . . . , a I ) contains the pedigree values of all the lines and is assumed to follow a multivariate normal density with zero mean and covariance matrix Cov (a) = Aσ 2 a , where A is the additive relationship matrix, and σ 2 a is the additive genetic variance. The random effects are correlated such that model 4 allows borrowing between tested and untested lines based on the numerical relationship matrix (A). Similarly, this model does not allow specific responses to each environment but instead common effects across environments. Thus, the interaction between lines and environments is introduced in the following model via pedigree information instead of marker data.

Main Effect and Interaction Model 5
This model is obtained by extending model 4 (Eq. 8) to include interaction effects (aE ij ). Thus, where,E j , L i , and g i have already been defined, aE ∼ N 0, Z L AZ L # Z E Z E σ 2 aE is the interaction of the genome with environment, with d σ 2 aE as the variance component of aE:

Main Effect and Interaction Model 6
Model 6 is similar to model 4 but adds the genomic relationship g i :

Main Effect and Interaction Model 7
This is the complete model with all main effects and interactions: where the terms are already defined.

Assessing Model Prediction Accuracy by Random Cross-Validation
The described models were fitted in various validation settings to estimate prediction accuracy within an environment (i.e., despite how training and testing set were configured, the correlation between predicted and observed values was computed within environments). For both traits measured, three different validation schemes were studied. We repeated the random crossvalidations from Burgueño et al. (2012) and Jarquín et al. (2014) and considered three prediction problems: (1) (CV1) prediction of 20% of wheat lines that have not been evaluated in any environment; (2) (CV2) prediction in incomplete field trials i.e., prediction of performance of lines that have been evaluated in some environments but not in others; and (3) prediction of performance of all lines in an untested environment, using performance data of those lines from correlated environments. CV1 was obtained by assigning accessions to folds; hence, when the phenotype of an accession is predicted, the corresponding training set contains no record of this accession. CV2 was obtained by assigning individual records of each accession to folds; hence, when one is predicting the ith line, there are records for the same accession that were part of the training set but observed in a different environment. In both prediction problems, a fivefold cross-validation was performed, where 80% of the accessions formed the training set and 20% of the accessions comprised the testing set for each partition. The assignation of training and testing sets was repeated 20 times (5 × 2 = 100 random partitions) for each one of these crossvalidation schemes (CV1 and CV2). We also evaluated cross-validation CV0, where all lines in one environment were fully predicted by the other environments.

Data Repository
The data repository can be found at http://hdl.handle.net/11529/ 10548495.

Phenotypic Variation and Correlation Between Traits
The AC in 2017 ranged from 14.3 to 95.8 with an average of 47.2 and a standard deviation (SD) of 18.1 (Figure 1 and Table 1 The genotypic variance was significant for both traits in all three trials ( Table 1). The broad-sense heritability estimates for AC ranged from 0.78 to 0.93 across the three environments, whereas the broad-sense heritability for combined VS was 0.32.
The BLUEs of AC were significantly correlated across the three experiments (0.6 -0.67, p < 0.01) (Supplementary Figure 1). The visual scores across the two environments (2018 and 2019) were also significantly correlated (0.44, p < 0.01). The VS data were also highly correlated with AC data from all three trials (Supplementary Figure 1).

Markers Retained After Quality Control
After filtering for polymorphism, minor allele frequency, and missing data, 10,534 markers were retained. The whole marker dataset was used to create a genomic relationship matrix used for GP. For GWAS, the marker dataset was additionally filtered for presence of known genetic positions in the consensus map by Wang et al. (2014). The number of markers retained for GWAS was 7,649.

Population Structure and Linkage Disequilibrium
Population structure was assessed via principal component analysis. Population structure observed was not very strong since the first three principal components (PCs) only explained 16% of the cumulative variance (Figure 2). Linkage disequilibrium (LD) was assessed by calculating pairwise LD decay over genetic distance ( Figure S2). Considering r 2 = 0.2 to be the extent of average intrachromosomal LD, in this population, LD calculated on a sliding window of 100 adjacent markers showed the LD blocks extended up to 25 cM (Figure 2).  ϕ Visual score data did not have replications within one environment, and data from environments were used as replication for estimating variance components. **p < 0.001. decreased the AC by 6.16%, whereas BS00065313_51 decreased the VS by 0.3. The MTA detected on chromosome 5B by the significant SNP marker RAC875_rep_c104919_902 increased the VS by 0.39. For three out of the four markers significantly associated with AC and VS, the favorable allele has a higher frequency in the elite male population we studied ( Table 2).

Genome-Wide Prediction
We used seven models of increasing complexity for the genomewide predictions. The prediction accuracy was assessed using three cross-validation scenarios.

Cross-Validation Scenario 1 (CV1)
For CV1, phenotypes of lines that have never been evaluated in the field were predicted using line information, environment information, genomic information, pedigree information, and interaction terms. Prediction accuracies of model 1 (E + L) were negative for both traits under CV1 (Table 3). Upon the addition of genomic information (model 2, E + L + G), the prediction accuracy increased and was in the range of 0.42-0.44 across environments for AC and 0.33-0.46 for VS ( Table 3). Inclusion of pedigree information without genomic relationship information decreased the prediction accuracy for both traits. Model 4 (E + L + A) had prediction accuracies in the range of 0.31-0.34 for AC and 0.22-0.34 for VS, which is lower than the prediction accuracies obtained by model 2 for both traits (Table 3). When pedigree information was added in the presence of genomic information, it improved the prediction accuracies. The prediction accuracies for AC were higher for model 6 (E + L + G + A) than those for model 4. In the case of VS, the inclusion of pedigree information along with genomic relationship slightly increased the prediction accuracies in 2018 (Table 3). Models with interaction terms generally have slightly higher accuracies as compared to models with main effects only. For example, for both traits model 3 (E + L + G + G × E) had slightly higher prediction accuracy compared to model 2 (E + L + G), and model 5 (E + L + A + A × E) had slightly higher prediction accuracy compared to model 4 (E + L + A). Model , which is the most complex model, had comparable prediction accuracies for both traits with model 3 (E + L + G + G × E). Compared to model 3, model 7 had higher prediction accuracy only in 2018 for AC and 2019 for VS (Table 3). In most cases, model 3 (E + L + G + G × E) had the highest prediction accuracies among the seven models tested in CV1 for both traits.

Cross-Validation Scenario 2 (CV2)
CV2 represents the case of incomplete field trials, where some lines are tested in one environment but are missing in other environments. The phenotypic record of one or more environments is used in conjunction with genomic, pedigree, line, and environment information to predict the missing phenotypic record of lines. Prediction accuracies were higher in CV2 for all seven models across two traits compared to CV1 (Table 3). In CV2, model 1, which had negative prediction accuracy, had comparable prediction accuracies with model 3. Very nominal increments in prediction accuracies were observed with increasing model complexities across both traits. Some variation was observed in prediction across environments. For example, prediction accuracies for AC were higher in 2018 and 2019 compared to 2017. Similarly, prediction accuracies for VS were higher in 2018 compared to 2019. As in CV1, the models predicted AC better than VS in CV2. In addition to higher prediction accuracies, the standard deviation (SD) of prediction accuracies was much smaller in CV2 compared to CV1 (Table 3).

Predicting an Untested Environment (CV0)
CV0 is a prediction scenario, where a full dataset from a correlated environment is used to predict the performance of lines in an untested environment. Under CV0, all seven prediction models performed very well across all three environments and two traits (  Table 4). The differences that were apparent between model 2 and model 4 via the inclusion of pedigree vs. genomic relationship in CV1 were not apparent in CV0. Similarly, the inclusion of interaction effects also did not make as much difference as it did in CV1.

DISCUSSION
The success of hybrid wheat breeding depends on reduced costs for hybrid seed production and grain yield heterosis. The presence of heterosis of grain yield in hybrid wheat has already been established over several decades, while newer studies have suggested, in addition, that the development of heterotic pools could increase the level of heterosis (Longin et al., 2013;Zhao et al., 2015;Rembe et al., 2019). Reducing the cost of hybrid seed production appears to be a more complex challenge. Methods of hybrid seed production, such as cytoplasmic male sterility, genetic male sterility, and chemical hybridization methods, need to be optimized. In addition, the floral biology of wheat needs to be redesigned to favor cross-pollination (Whitford et al., 2013;Boeven et al., 2016). The most important factor to facilitate crosspollination in wheat is higher AE, while ensuring that the male parent does not lose its ability to contribute to higher grain yield in the subsequent hybrid crosses. A recurrent selection scheme needs to be implemented within the male germplasm pool to develop superior male lines with desirable floral traits favoring cross-pollination along with other attributes for superior yield and quality. In the absence of large-effect loci, high G × E variance and labor-intensive phenotyping, MAS, and visual selection are inadequate. Hence, in this study, we demonstrate the utility of genome-wide prediction for AE via modeling for G × E and environmental covariates as a more reliable substitute for MAS and/or visual selection.

Phenotypic Evaluation of AE
AC and VS appear to be reliable measurements for AE, as demonstrated by their high heritability estimates. The heritability of AC in this study ranged from 0.79 to 0.93 which is comparable to heritability reported in similar previous studies (Boeven et al., 2016;Muqaddasi et al., 2017a). The heritability of VS in this 3 | Mean and standard deviation of genomic prediction accuracies from the reaction norm models (Jarquín et al., 2014) under the two cross-validation scenarios (CV1 and CV2) for two traits representing anther extrusion; anther count (%) and visual score were collected from a field experiment spanning three environments (El Batan 2017, Obregon 2018 and El Batan 2019).  study was quite modest compared to similar previous studies. Previous studies have reported heritabilities in the range of 0.5 to 0.8 for VS (Boeven et al., 2016;Muqaddasi et al., 2019). However, it should be noted that the error variance for VS was confounded with G × E due to the lack of replications within an environment. High positive correlations between VS and AC within the same environment indicated that VS could be a reliable trait for measuring AE. VS has been found to be an adequate trait to measure AE in several previous studies (Boeven et al., 2016Muqaddasi et al., 2017c,a). The continuous distribution of AC indicated that AE is a complex trait governed by cumulative effects of numerous minor effect loci, making it suitable for GS. Moreover, significant genotypic variances for the two traits measured indicated that these traits can be improved by breeding efforts ( Table 1; Boeven et al., 2016Boeven et al., , 2018.

Marker Trait Associations for AE
Height-reducing loci, such as Rht-B1 and Rht-D1, have been shown to reduce AE in several previous studies (Boeven et al., 2016;Würschum et al., 2018). None of the large effect Rht loci were identified in our analysis. CIMMYT spring wheat germplasm has been subjected previously to GWAS for AE and, similarly, Rht loci were not identified (Muqaddasi et al., 2017c).  This is most likely due to the fact that Rht loci, in particular Rht-B1, have been largely deployed in CIMMYT since the 1970s, and Rht-B1 is almost fixed in recent elite germplasm (Aisawi et al., 2015;Basnet et al., 2019). Two MTAs were identified in chromosome 5A, one on the distal and one on the proximal end of the chromosome, based on physical positions. Previous studies have reported QTL for AE with minor to moderate effects on chromosome 5A via linkage mapping in biparental populations (Lu et al., 2013;Buerstmayr and Buerstmayr, 2015;Muqaddasi et al., 2019). The previously reported QTL for AE are spread throughout chromosome 5A, and the MTAs in 5A share close proximity with several of these previously reported QTLs. For example, the MTAs identified in this study in the short arm of 5A is at 26.51 cM (98.94 Mb), which is in between the AE QTL identified by Buerstmayr and Buerstmayr (2015) at 20 cM and Lu et al. (2013) (Skinnes et al., 2010;Boeven et al., 2016;Muqaddasi et al., 2017b,a). Boeven et al. (2016) have reported an MTA for AE in 1B at 74.4 cM; Muqaddasi et al. (2017b) have reported an MTA at 56.4 cM; Muqaddasi et al. (2017a) have reported an MTA at 70.08 cM; and Skinnes et al. (2010) have reported a QTL with confidence interval of 86-102 cM. The MTA detected in 1B in this study at 60.62 cM lies close enough to these previously reported loci. Similarly, in 5B Muqaddasi et al. (2017b) have reported an MTA at 108.7 cM, whereas we identified an MTA at 117.84 cM in this study.
Since the phenotypic effects of these MTAs across studies were low, there is limited scope to use these MTAs in MAS. The only option might be to use these QTL together with the Rht loci in MAS, in the event that they can be successfully validated and concurrently do not show any negative effect on other traits, e.g., lodging or grain yield.
Despite having high heritabilities for VS and AC, the MTAs explain a very low amount of phenotypic variance. This signifies the highly polygenic nature of inheritance of AE and suggests that MAS is not the best strategy for driving genetic gains in AE. In this context, genomic prediction/selection can be an excellent strategy for making selection gains in breeding for AE.

Genome-Wide Predictions
Genome-wide prediction for AE has been previously conducted on an unrelated smaller subset of CIMMYT spring wheat germplasm (Muqaddasi et al., 2017c) and winter wheat germplasm from Western Europe (Boeven et al., 2016). Both studies used single-trait models without explicitly modeling G × E. Incorporating G × E effects and environmental covariates has previously shown higher prediction accuracy for grain yield (Burgueño et al., 2012;Jarquín et al., 2014;Sukumaran et al., 2017Sukumaran et al., , 2018Krause et al., 2019), micronutrient concentration (Velu et al., 2016), and lint yield in cotton (Pérez-Rodríguez et al., 2015). In this study, we attempted to implement the same approach for predicting AE.

Prediction of Performance of Untested Lines
In CV1, lines that were never tested before were predicted by borrowing information from closely related individuals, which is akin to the previous GS studies for AE (Boeven et al., 2016;Muqaddasi et al., 2017c). Boeven et al. (2016) reported prediction accuracies of 0.3 for VS and 0.6 for AC implementing ridge regression BLUP (RR-BLUP), whereas inclusion of weighted effects of Rht loci via weighted ridge regression BLUP (wRR-BLUP) increased the prediction accuracies to 0.5 for VS and 0.7 for AC. Muqaddasi et al. (2017c) reported a prediction accuracy of 0.6 for VS in a CIMMYT population using RR-BLUP. However, the prediction accuracy was standardized with the square root of heritability in the Muqaddasi et al. (2017c) study. The prediction accuracies of the main effect model (model 2) in CV1 for both VS and AC are comparable to these previous studies. When interaction effects were added in models 3 and 7, the prediction accuracies tended to increase slightly (1-3%). In CV1, models using the G matrix (models 2 and 3) always had higher prediction accuracies than models using pedigree-based A matrix (models 4 and 5), which has been reported previously by Campos et al. (2009Campos et al. ( , 2010 and Crossa et al. (2010Crossa et al. ( , 2011.
In model 1, where only the main effect of environments and lines are used for prediction, negative prediction accuracy is observed. Since neither pedigree information nor genomic information is included in this model, it does a very poor job of predicting performance. This is expected since genomic prediction is based on using information borrowed from related individuals via pedigree and genomic relationship, and in model 1 only incidence matrices for lines and environments are included. Once we start including pedigree and genomic information the prediction accuracies are positive and higher.

Prediction of Performance of Previously Tested Lines in Untested Environments
CV2 and CV0 were the scenarios where phenotypic information of the same line from one environment was used to predict VS and AC in another environment. CV2 is similar to what is also called sparse testing, where some of the lines are missing, whereas CV0 is the prediction of the whole population in a previously untested environment. In cases when a phenotypic record of the line being tested is used to train the model (CV2 and CV0), the prediction accuracy is higher compared to the case where the line has never been tested previously (CV1) (Sukumaran et al., 2017(Sukumaran et al., , 2018Basnet et al., 2019). We found similar results in the study, as expected. These CV scenarios can be very useful in the case of traits that are difficult to phenotype due to cost or the labor-intensive nature of phenotyping. The results for CV0 and CV2 indicate that untested sites, environments, and years can be predicted with high reliability. CV0 and CV2 scenarios can supplement the field evaluation efforts of breeding programs. In particular, sparse testing, i.e., the CV2 scenario, is already in practice in hybrid breeding for the development of heterotic pools in wheat (Zhao et al., 2015).
Inclusion of interaction effects such as G × E and A × E produced mixed results. For AC, the inclusion of interaction effects produced a very nominal increase in prediction accuracy (1-3%), whereas for VS it decreased nominally in most cases.

Implications for Hybrid Wheat Breeding
GS is promising for driving the genetic gain of AE. However, the prediction accuracies are also dependent on trait heritability values (Velu et al., 2016;Acosta-Pech et al., 2017). AC had higher heritability compared to VS in our study. VS is easier to phenotype than AC. Here, VS data were unreplicated, and the error variance was confounded with G × E. It is easier and costeffective to increase replications for VS than to collect data for AC routinely in the breeding program. Hence, the use of replicated trials can help increase the prediction accuracy for VS.
Inclusion of interaction terms had a very nominal advantage in prediction accuracy for AC, whereas it was sometimes counterproductive in VS, most likely due to increasing model complexity. Based on observations from this study, it is possible to predict AE with reasonable accuracy using pedigree data only, but the inclusion of genomic data should always be preferred. Inclusion of both pedigree and genomic data appears to work best but not in all cases.
Reciprocal recurrent selection is a promising strategy in hybrid wheat breeding (Rembe et al., 2019). For VS, sparse testing appears to be a good strategy, where a subset of lines would be tested in some environments but not all. Information on tested relatives in an environment can be used to predict untested lines. For AC, which is labor-intensive and expensive to phenotype, data from a subset of highly maintained trials can be used to predict performance in an untested environment.

DATA AVAILABILITY STATEMENT
The datasets for this study can be found in the CIMMYT Research Data & Software Repository Network (http://hdl. handle.net/11529/10548495).

AUTHOR CONTRIBUTIONS
AA collected and analyzed the data, and wrote the manuscript. BB conceived and designed the experiment, collected and analyzed phenotypic data, and wrote the manuscript. JC and DJ analyzed the data and wrote the manuscript. PB and FC did field experiments, collected phenotypic data, and reviewed the manuscript. SD generated and curated genotypic data, analyzed data, and reviewed the manuscript. YM and AI developed the concept and reviewed the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
The authors thank the CGIAR Research Program in Wheat (CRP-WHEAT https://wheat.org/) for funding this work. The authors also thank Syngenta for generous support in SNP genotyping of wheat lines used in this study under the CIMMYT-Syngenta partnership on hybrid wheat research.