GWAS to Identify Genetic Loci for Resistance to Yellow Rust in Wheat Pre-Breeding Lines Derived From Diverse Exotic Crosses

Yellow rust (YR) or stripe rust, caused by Puccinia striformis f. sp tritici Eriks (Pst), is a major challenge to resistance breeding in wheat. A genome wide association study (GWAS) was performed using 22,415 single nucleotide polymorphism (SNP) markers and 591 haplotypes to identify genomic regions associated with resistance to YR in a subset panel of 419 pre-breeding lines (PBLs) developed at International Center for Maize and Wheat Improvement (CIMMYT). The 419 PBLs were derived from an initial set of 984 PBLs generated by a three-way crossing scheme (exotic/elite1//elite2) among 25 best elites and 244 exotics (synthetics, landraces) from CIMMYT’s germplasm bank. For the study, 419 PBLs were characterized with 22,415 high-quality DArTseq-SNPs and phenotyped for severity of YR disease at five locations in Mexico. A population structure was evident in the panel with three distinct subpopulations, and a genome-wide linkage disequilibrium (LD) decay of 2.5 cM was obtained. Across all five locations, 14 SNPs and 7 haplotype blocks were significantly (P < 0.001) associated with the disease severity explaining 6.0 to 14.1% and 7.9 to 19.9% of variation, respectively. Based on average LD decay of 2.5 cM, identified 14 SNP–trait associations were delimited to seven quantitative trait loci in total. Seven SNPs were part of the two haplotype blocks on chromosome 2A identified in haplotypes-based GWAS. In silico analysis of the identified SNPs showed hits with interesting candidate genes, which are related to pathogenic process or known to regulate induction of genes related to pathogenesis such as those coding for glunolactone oxidase, quinate O-hydroxycinnamoyl transferase, or two-component histidine kinase. The two-component histidine kinase, for example, acts as a sensor in the perception of phytohormones ethylene and cytokinin. Ethylene plays a very important role in regulation of multiple metabolic processes of plants, including induction of defense mechanisms mediated by jasmonate. The SNPs linked to the promising genes identified in the study can be used for marker-assisted selection.


InTRODUCTIOn
Wheat (Triticum aestivum L.) is one of the most important cereal crops with an estimated production of 736.1 million tons in 2019 (FAO, 2018). The average wheat production in Mexico is 3.3 million tons (SIAP, 2019), while the consumption is 7.17 million tons (SAGARPA, 2018). Yellow rust (YR) or stripe rust caused by the fungus Puccinia striformis f. sp tritici Eriks is one of the main phytosanitary challenges of wheat cultivation in the world, causing yield losses of 30 to 100% (Bux et al., 2012). The fungus reproduces rapidly, can travel at very large distances, has high mutation rates, and adapts to different climatic zones (Ali et al., 2017). In the presence of rust the flow of carbohydrates through the phloem to the grains is considerably reduced (Chen, 2005) due to the reduction of the photosynthetic area (Chen et al., 2015). As a result, infected plants produce fewer spikelets and form less and shriveled grains (Chen et al., 2013), which are of low industrial quality and lower food value (Afzal et al., 2008;Abdennour et al., 2018).
Seedling or race-specific resistance (i.e. resistance controlled by major genes) provides highly effective protection against pathogenic races (Boyd, 2005). However, the fungus can mutate into new biotypes or can produce new physiological races in a short time span to overcome seedling resistance . An alternative to achieve greater durability of the resistance is to develop varieties that have durable resistance based on slow rusting genes also known as adult plant resistance (APR) genes .
The advances in next generation sequencing technologies and high-throughput genotyping systems have revolutionized the field of plant genomics, leading to easy availability of thousands of single nucleotide polymorphisms (SNPs) in crops. Wheat has particularly benefited from these advancements, in which marker number and density have been the limiting factors for a long time (Sehgal et al., 2017). Dense sets of SNPs now available from different marker platforms [90K Illumina iselect, genotyping by sequencing (GBS), DArTseq, high-density Affymetrix Axiom ® genotyping array] have significantly transformed the genetic toolkit available in wheat. It has become possible to untangle the genetic architecture of traits by genome-wide association study (GWAS), a leading approach for complex trait dissection and identification of novel and superior alleles to be utilized for breeding (Sun et al., 2017). Use of GWAS studies has allowed finding new allelic variation for a plethora of traits in crops (Suwarno et al., 2015;Pantalião et al., 2016). In wheat, GWAS has been used for various traits including resistance to a number of diseases (Crossa et al., 2007;Yu et al., 2012;Edae et al., 2014;Lopes et al., 2015Battenfield et al., 2016;Singh et al., 2016;Sehgal et al., 2017;Valluru et al., 2017;Singh et al., 2018). For YR, GWAS studies have led to significant advances in identification of genomic regions for both seedling resistance and APR (Rosewarne et al., 2013;Zegeye et al., 2014;Jighly et al., 2015).
The objective of this research was to identify genetic loci associated with resistance to YR in pre-breeding lines (PBLs) developed at International Maize and Wheat Improvement Center (CIMMYT). These lines were derived by a three-way crossing scheme (exotic/elite1//elite2) among CIMMYT's 25 best elites and exotics (synthetics, landraces) from gene bank (Singh et al., 2018).

Plant Material
Out of an initial set of 984 PBLs (Singh et al., 2018) In the FW cycle of 2015-2016, the sowing was done in blocks of 20 experimental plots with each plot consisting of two rows separated by 0.75 m. The blocks were separated by 1 m, in which the susceptible variety 'Morocco' was planted in the form of hill plots and was inoculated with MEX14.191 and MEX10.44 races. To ensure sufficient and homogenous distribution of Pst across the experimental trial, seeds of 'Morocco' were also planted as border rows all around the trial. The sowing density of experimental plots was 46 kg ha −1 . A total of five irrigations were applied at 0, 35, 65, 85, and 105 days after sowing and plots were fertilized with the dose 240-60-00 N-P-K; all the phosphorus and half of the nitrogen at sowing and the rest in the first auxiliary irrigation. For inoculation, urediniospores were collected from previous cycles with a mechanical collector and kept at -55°C in a freezer. Before use, frozen urediniospores received a thermal shock with water at 60°C for 7 min and then rehydrated for 4 h in a humid chamber (Mariscal et al., 2009). A suspension was made of hydrated urediniospores with water and an emulsifier at a concentration of 1 × 10 6 ml -1 to be injected in the stems of the susceptible cultivar Morocco. In the SS cycle of 2016, lines were sown in rows of 40 hill plots using 9 g per plot density. Seeds of variety 'Morocco' were sown only as border rows all around the trial. Plots were fertilized with the formula 120-60-00 N-P-K at sowing and no irrigation was applied. Crops were allowed to develop under natural rainfall conditions and under natural epidemic conditions in Texcoco and Tlaxcala, where races MEX10.44, MEX14.191, MEX14.146, and MEX14.141 predominate (Huerta-Espino et al., 2015). The most frequent Pst race in these locations is MEX14.191 (Huerta-Espino et al., 2015). The avirulence/virulence formula of these races is presented in Supplementary Table 2.
Rust scores were taken according to the modified Cobb scale (Peterson et al., 1948). The percentage of final severity of the lines was considered when the Morocco variety reached to the stage of 100% leaf damage. On the basis of 0-100 scale, accessions were grouped as highly resistant to resistant (0-20), intermediate , and susceptible to highly susceptible (51-100).

Analysis of Phenotypic Data
Best linear unbiased predictors (BLUPs) were calculated from two replications data for each of the five environments separately and across combined environments using Meta-R program V.6.0 (Alvarado et al., 2016). The following the mixed linear model was used: "y = Xr + Za + Wb + e", where y is the response variable, X, Z, and W are incidence matrices for replicate, genotype, and block nested in replicate effects, respectively; r, a, and b are the effects of replicate, lines, and block nested in replicate, respectively. Line and block effects were considered as random effects. e is the experimental error, which follows a normal distribution with zero mean and variance. The variance components were used for calculating broad sense heritability (h 2 ) for traits in each location and across combined locations. For individual location, h 2 was calculated as follows: where V g is genotypic variance and V err is the error variance and r = number of replications in a location. For combined locations, h 2 was calculated as follows: where V g is genotypic variance, V g × e is genotype × environment interaction variance, V err is the error variance, n = number of environments, and r = number of replications.
Histograms showing the dispersion of disease scores of the 419 lines were made for each location with a script executed in R version 3.5.1.

Genotyping
Genomic DNA was extracted from fresh leaves collected from the 419 individuals using the CTAB (cetyltrimethylammonium bromide) method but modified as described in Dreisigacker et al. (2016) in CIMMYT laboratory manual. DNA quality and concentration were determined by electrophoresis in 1% agarose gel. The samples were genotyped by DArTseq technology (Sansaloni et al., 2011) in the Genetic Analysis Service for Agriculture (SAGA) with current headquarters at CIMMYT, El Batan, Texcoco, Estado de Mexico, using 30 μl of DNA (80-100 ng μl −1 ) in 96-well plates. The FASTQ files (full reads of 77 bp) were quality filtered using a Phred quality score of 30, which represent a 90% of base call accuracy for at least 50% of the bases. More stringent filtering was also performed on barcode sequences using a Phred quality score of 10, which represent 99.9% of base call accuracy for at least 75% of the bases. A proprietary analytical pipeline developed by DArT P/L was used to generate allele calls for SNP and presence/absence variation (PAV) markers. Then, a set of filtering parameter was applied to select high quality markers for this specific study. One of the most important parameters is the average reproducibility of markers in technical replicates for a subset of samples which was set at 99.5%. Another critical quality parameter is call rate. This is the percentage of targets that could be scored as "0" or "1", the threshold was set at 50%. SNPs with missing data more than 20% and minor allele frequency <0.05 were eliminated. To obtain physical positions of SNPs, sequence reads of the SNPs were blasted to the reference genome of RefSeq V.1.0 with E value <10 −8 and identity >95%.

Population Structure and Linkage Disequilibrium (LD) Analysis
To determine the population structure fastStructure v.1.0 (Raj et al., 2014) was used using the Bayesian clustering method. The parameters set were 10,000 burning cycles and 10,000 iterations of Monte Carlo Markov Chains using the admixture model. Cluster values (K) ranging from 2 to 10 were chosen and five independent runs were conducted for each K. A zip archive containing all of the results-f files was created and used as an input in Structure Harvester program to estimate delta K (∆K). Additionally, the principal component analysis (PCA) was performed with the TASSEL software (Trait Analysis by Association, Evolution and Linkage version 5.2.50), using 22,415 high quality and filtered DArTseq markers (missing data <30% and minor allele frequency >0.05). The population structure was corroborated by a cladogram made with the TASSEL program. The kinship matrix was also estimated using the 22,415 markers using TASSEL. The PCA was drawn using the rgl package in R. GAPIT version 2.0 was used to obtain correlation estimates of the frequency of the squared allele of LD (r 2 ) for all pairwise comparisons. Pattern of LD decay was visualized by plotting pair-wise r 2 values against the distance (cM) between markers for A, B, and D genomes separately and for whole genome. A smooth line was fit to the data using second-degree locally weighted scatterplot smoothing (LOESS; Breseghello and Sorrells, 2006) as implemented in SAS. For the LOESS estimation of LD decay, genetic distance was estimated as the point where the LOESS curve first crosses the baseline r 2 of 0.1.

haplotype Block Construction
Haplotype blocks were generated considering the D′ linkage unbalance parameter using the R script described in Gabriel et al. (2002). When many markers had the same genetic position, only the first marker of those groups was taken. We calculated D′ 95% confidence intervals between SNPs and each comparison was categorized as "strong LD", "inconclusive", or "strong recombination". A haplotype block was created if 95% of the comparisons in one block were in "strong LD". For two or more SNPs to be classified in "strong LD", the minimum lower and upper confidence interval values were set to 0.7 and 0.98, respectively. The haplotypes were displayed as blocks of marker numbers and alleles. They were named as combinations of the prefix "H" followed by a number representing the chromosome, a point, and then a number that represents a particular allelic combination.

Association Analysis for Resistance to YR
The phenotypic BLUPs of each line were used for the association analysis using TASSEL program (version 5.2.50). A mixed linear model was used with PCA as a fixed variate and kinship as random. The quality of the association was analyzed by drawing Q-Q plots (Supplementary Figure 1). To declare significant associations, the criterion of false discovery rate was used and it was calculated using package Q-VALUE in R. The associations were represented by drawing Manhattan graphs in TASSEL.

In Silico Analysis
In silico analysis of the significant loci was conducted in Ensembl Plants sequence database (https://plants.ensembl. org/Triticum_aestivum/Info/Index). Based on BLAST in the server, quantitative trait loci (QTL) were located on short and long arms of chromosomes and candidate gene hits were reported.

Phenotypic Evaluation
The minimum and maximum YR scores for all individual locations and across five locations are listed in Supplementary  Table 3. Wide variation occurred among PBLs in all environments (Figure 1), with scores ranging from 0 to 70, 0 to 35, 2.5 to 70, 5 to 100, and 0 to 50% in Celaya, Jalisco, Texcoco, Tlaxcala, and Villagrán, respectively. A normal distribution of the trait was observed at all locations. 72.7, 92.3, 78.2, 33.1, and 93.5% of the lines were resistant, 15.5, 1.6, 10.2, 51.3, and 0.4% displayed intermediate reaction, and 11.6, 6.1, 11.4, 15.5, and 6.1% were moderately susceptible to very susceptible (51 to 100%). ANOVA analysis showed significant variation in lines at all locations (Supplementary Table 3). The zero estimates of genotype × environment interactions indicate stability of genotypic performance across environments. Location-wise heritability was high; ranging from 0.93 in Texcoco to 0.98 in Jalisco (Supplementary Table 3) with 0.84 across combined locations.
Genotyping 51,232 SNPs and PAV were generated across 419 lines after allele calling, of which 22,415 filtered SNPs were utilized for further analysis. PAV markers were not used in any analysis. 6,838 SNPs were found in genome A, 7,605 in genome B, and 7,972 in genome D. The chromosome with the highest number of SNPs was 7D with 1,718 SNPs, followed by chromosome 2B with 1,453 SNPs. Chromosomes with fewer SNPs were 6A with 790, 4D with 635, and 4B with 603 SNPs. A total of 591 genome-wide haplotype blocks were obtained, of which 253 were detected in genome A, 260 in genome B, and 78 in genome D. The number of SNPs in haplotype blocks ranged from 2 to 8.

LD Decay
LD was estimated by calculating the squared allele frequency correlation (r 2 ) among all possible pairs of markers for each of the 21 chromosomes. Obtained r 2 values were then plotted against genetic distance (cM) for each of the three genomes separately and across whole genome (Figure 4). LD decayed at 2.5, 5.0, and 2.5 cM for A, B, and D genomes, respectively at cut off r 2 = 0.1, while for whole genome, decay was observed at 2.5 cM. Based on this average LD decay, size of QTL was estimated i.e. all significant SNPs within 2.5 cM were considered as part of the same QTL.
The analysis by location showed variable R 2 as well as level of significance of 14 SNPs. Jalisco was the locality where the markers explained lower percentage of the phenotypic variation for YR resistance with values of 5.9 to 9%; Villagrán, on the other hand, showed the highest values with a range of 7.8 to 13.8% (Table 1). When comparing some of the best lines (0 to 5% severity) with some of the worst (36 to 54% severity) for their response to rust, it was observed that the resistant ones carried practically all 14 resistant alleles, however, the susceptible lines had very few resistant alleles (Supplementary Tables 4 and 5). A general trend that emerged was that the more resistant alleles the lines had, the greater was their resistance, suggesting an additive effect of the loci involved.
Out of 591 haplotype blocks obtained genome-wide, seven showed significant association (P-value <10 −6 ) with YR resistance ( Table 2) and were located on chromosomes 1B (one), 2A (five), and 2B (one). The haplotype block on chromosome 1B, H2.5, was detected in Jalisco. The haplotype blocks H4.1, H4.2, and H4.18 on chromosome 2A were found to be significant in five locations, while blocks H4.4 and H4.5 were significant in one or two locations. The block H5.20 on chromosome 2B was found to be significant only in the state of Jalisco. Of the seven haplotype blocks, five are made up of two SNPs (H2.5, H4.18, H4.4, H4.5, and H5.20) and remaining two (H4.1 and H4.2) with five and four SNPs, respectively. The allelic effect of these haplotypes on percentage of severity of YR ranged between 7.9 and 19.9% among locations and 8.1 and 24.0% across locations. Of the 14 significant SNPs identified in   (Figure 7) were detected between the favorable and unfavorable alleles. The allelic combination that showed the lowest percentage of rust was GGGGC. Haplotype H4.2 detected differences in Celaya of 18.17%, in Tlaxcala of 32.83%, in Texcoco of 14.29%, and in the average of the five locations of 15.67%. The allelic combination that showed lower percentage of rust was AAGA.
In silico analysis of the 14-associated SNPs revealed hits with four candidate genes, which are related to pathogenic processes or known to regulate induction of genes related to pathogenesis (Supplementary Table 6).

DISCUSSIOn
The prevalence and severity of YR in the selected environments for this study presented variations: in environments with humid and fresh climate, there was higher severity, while in warmer and drier environments the severity was lower; on the other hand, rust races present naturally were not always the same. However, in all cases the behavior of wheat lines against rust infection maintained the same pattern i.e. lines with higher resistance or susceptibility were resistant or susceptible in almost all environments. Frequencies of lines in the histograms showed a skewed distribution to the left where the resistant lines are represented, which could be attributed to the selective pressure exerted by the breeders through directed crosses.
Population structure can play as a confounding factor in GWAS analysis and it should be dealt with to avoid false associations. STRUCTURE and PCA are two widely used tools to infer cryptic population structure from genome-wide data such as high density SNPs (Abraham and Inouye, 2014). Both analysis showed three subpopulations/subgroups (SP 1-SP 3) (Figures 2 and 3). SP1 grouped 131 lines that averaged the highest severity value of YR (21.6%) and a range of 4 to 54%. The lines of SP1 shared parents Baj#1, Reedling#1, Villa Juarez F2009, and Tacupeto F2001 in their pedigree. SP2 formed a group of 110 lines with an average severity of 14.8% and a range of 5.1 to 47.5%. Seri.1b//Kauz/  Hevo/3/Amad*2/4/Kiritati was the common parent shared by 45.9% of the group. SP3 was the largest group with 178 lines. This group recorded the lowest average severity of YR (12.6%), with a range of 4 to 32%. This group had parent Fret2*2/4/Sni/Trap#1/3/ Kauz*2/Trap//Kauz/5/Kachu common in pedigrees of most of the lines. LD is reported to decay at short distance and hence faster in outcrossing crop species, such as maize (Dinesh et al., 2016), and at large distance (up to 40 cM) and hence slower in self-pollinated crop species, such as wheat (Yu et al., 2014). Extent of LD decay depends upon genetic distance and it determines the number of markers required for association mapping (Mather et al., 2007). LD decay varies widely among wheat populations, ranging from 5 or 10 cM (Würschum et al., 2013;Edae et al., 2014;Zegeye et al., 2014;Sehgal et al., 2017) to 20 or >20 cM (Crossa et al., 2007;Somers et al., 2007;Benson et al., 2012;Chen et al., 2012). In the present study, the extent of LD decay (r 2 = 0.1) was 2.5 cM for whole genome (Figure 4), which suggests higher genetic diversity of the investigated pre-breeding panel as compared to previous studies This could be attributed to the way the panel has been generated. A three way crossing scheme (exotic/elite1//elite2) among exotics and elites was employed to generate PBLs in such a way that each PBL acquired approximately 25% of the exotic genome and 75% of the elite genome at an early stage (Singh et al., 2018). Therefore, exotic alleles were incorporated into elite backgrounds even before their trait values were identified. This strategy enabled investigation of greater number of genetic variants at a time and allowed recombination between exotic and elite genomes. In addition, many previous studies reported the slowest LD decay in D genome as compared to A and B genomes (Chao et al., 2007). In this study, a faster LD decay in the D genome was observed; comparable to A genome and faster than B genome (Figure 4), which could be attributed to the use of synthetics for developing PBLs, driving more recombination in the D genome. Rosewarne et al. (2013) reported metaQTL genomic regions for YR resistance by using information of 140 QTL from over 60 publications. We compared the locations of the significant QTL identified in the present study with metaQTL in Rosewarne et al. (2013) and with latest studies (Naruoka et al., 2015;Bulli et al., 2016;Long et al., 2019). We also used interactive map containing information of all stripe rust genes in MASwheat (https://maswheat.ucdavis.edu/protocols/ YellowRust/YellowRustMap.html). Four QTL were identified on chromosome 2A on short arm in the present study. Of these, Yr2A.1PBL and Yr2A.2PBL overlapped with the metaQTL QRYr2A.1 on 2AS (Rosewarne et al., 2013) where Yr56 and Yr17 are located. However, when we compared physical positions, Yr17 was 4 to 15 Mb away from markers in Yr2A.1PBL and Yr2A.2PBL. The two QTL reported recently by Long et al. (2019) and Bulli et al. (2016) on 2AS were also part of metaQTL QRYr2A.1. Of the remaining two QTL, Yr2A.4PBL is more towards distal end of 2AS where Long et al. (2019) reported a novel locus QDL.sicau-2A based on disease severity scores. Hence, it is highly likely that out of the four, Yr2A.3PBL is a novel locus. BLAST analysis of significant SNPs on 2A revealed hits with three candidate genes (Supplementary Table 6). The marker 1092886 showed similarity with glunolactone oxidase gene that regulates the final stage of the synthesis of ascorbic acid, which is related to the pathogenic processes and the activation of reactive oxygen species (Smirnoff, 1996). The marker 7940374 identified a gene that produces an acyl transferase, such as quinate O-hydroxycinnamoyl transferase, which regulates the production of secondary metabolites derived from the tyrosine and phenylalanine routes (Hirschmann et al., 2014). This is one of the essential enzymes for the synthesis of hydroxycinnamic acid amides (HCAAs), which are known to regulate fundamental processes such as the responses of  plants to biotic and abiotic stress (Campos et al., 2014). It has been shown that the increase in HCAAs is accompanied by an increase in salicylic acid levels and induction of genes related to pathogenesis (Macoy et al., 2015). SNP 1028859 is related to functions of proteins with F-box and DUF domains, silencing of which in Arabidopsis has shown to confer drought tolerance mediated by ABA (Kim et al., 2012). The role of F-box proteins in pathogenic defense is still unknown. On 2B, Yr2B.1PBL (identified by SNP 5411524) on short arm reduced YR severity by 43%. There are three metaQTL on 2BS, of which metaQTL QRYr2B.2 particularly is a gene rich region containing a number of seedling and APR genes, for example, Yr27 and Yr31. The Yr2B.1PBL identified here is part of metaQTL QRYr2B.2; while the two QTL reported by Long et al. (2019) were part of metaQTL QRYr2B.3. Based on physical position, Yr27 and Yr31 are 53 to 111Mb away from SNP 5411524. In silico analysis of the SNP 5411524 showed candidate hits with TraesCS2B02G045300, a NB-ARC domain. The NB-ARC domain is a functional ATPase domain, and its nucleotide-binding state is proposed to regulate activity of the R protein. It is proposed that binding and hydrolysis of ATP by this domain induces conformational changes in the overall protein, leading to formation of the apoptosome, which leads to cell death. Many genes have been reported to contribute to wheat resistance against stripe rust fungus by regulating cell death. Studies of expression profiles of NB-ARC genes in wheat demonstrated their participation in response to leaf rust P. triticina (Chandra et al., 2017). Hence, it could be possible that this SNP is tightly linked to a seedling resistance gene. These results warrant an in-depth analysis (allelism test) to determine whether it is a novel gene or an allele of one of previously mapped genes on 2BS.
There are three metaQTL regions associated with YR resistance on chromosome 2D. The one reported on 2DS was identified in a single study (Lu et al., 2009). Naruoka et al. (2015) recently reported a new stable APR locus on 2DS (stable in nine environments) in a winter wheat association panel.
Here, we identified two QTL, Yr2D.1PBL and Yr2D.2PBL, both   (Lorenzo et al., 2003;McGrath et al., 2005;Pré et al., 2008;Wasternack, 2015). Use of multi-allelic haplotypes has significantly improved the power and robustness of GWAS studies in major crops including soybean , barley (Lorenz et al., 2010), and maize (Lu et al., 2012). A recent study in durum wheat also showed that the haplotype-based analysis resulted in an increase of the phenotypic variance explained (50.4% on average) and the allelic effect (33.7% on average) when compared to single marker analysis (N'Diaye et al., 2017). When we compared the results obtained by haplotype-based GWAS with SNP-based GWAS, we found that the percentage variation (R 2 ) explained by haplotypes ranged from 7.9 to 19.9% in different locations, while for the associated 14 significant SNPs it ranged from 6.0 to 14%, resulting in an average 6.5% higher R 2 by haplotypes compared to single SNPs. This clearly demonstrates the advantage provided by the former approach in identifying genomic regions that control YR resistance.

COnCLUSIOnS
This research identified 14 SNP markers and 7 haplotypes associated with YR resistance in a diverse set of pre-breeding lines. These associations were delimited to seven QTLs based on average LD decay of 2.5 cM. Three of these QTLs are likely to be novel and represent attractive targets for markerassisted selection. In silico analysis of the SNPs revealed four candidate genes known to regulate induction of genes related to pathogenesis. In depth upstream analysis of these genes can help dissect the underlying mechanism of YR resistance in PBLs.

DATA AVAILABILITY STATEMEnT
The datasets generated for this study can be found in the CIMMYT Research Data & Software Repository Network https://data. cimmyt.org/dataset.xhtml?persistentId=hdl:11529/10548313.