Population Diversity, Dynamics, and Differentiation of Wheat Stripe Rust Pathogen Puccinia striiformis f. sp. tritici From 2010 to 2017 and Comparison With 1968 to 2009 in the United States

Stripe rust, caused by Puccinia striiformis f. sp. tritici (Pst), is a serious disease on wheat in the United States, especially after 2000. In the present study, 2,247 Pst isolates collected over all stripe rust epidemiological regions in the United States from 2010 to 2017 were genotyped at 14 simple sequence repeat (SSR) loci to investigate the population diversity, dynamics, and differentiation. A total of 1,454 multilocus genotypes (MLGs) were detected. In general, the populations in the west (regions 1–6) had more MLGs and higher diversities than the populations in the east (regions 7–12). The populations of 2010 and 2011 were more different from the other years. Genetic variation was higher among years than among regions, indicating the fast changes of the population. The divergence (Gst) was bigger between the west population and east population than among regions within either the west or east population. Gene flow was stronger among the regional populations in the east than in the west. Clustering analyses revealed 3 major molecular groups (MGs) and 10 sub-MGs by combining the genotypic data of 2010–2017 isolates with those of 1968–2009. MG1 contained both 1968–2009 isolates (23.1%) and 2010–2017 isolates (76.9%). MG2 had 99.4% of isolates from 1968–2009. MG3, which was the most recent and distinct group, had 99.1% of isolates from 2010–2017. Of the 10 sub-MGs, 5 (MG1-3, MG1-5, MG3-2, MG3-3, and MG3-4) were detected only from 2011 to 2017. The SSR genotypes had a moderate, but significant correlation (r = 0.325; p < 0.0001) with the virulence phenotype data. The standard index values of association (rbarD = 0.11) based on either regional or yearly populations suggest clonal reproduction. This study indicated high diversity, fast dynamics, and various levels of differentiation of the Pst population over the years and among epidemiological regions, and the results should be useful for managing wheat stripe rust.


INTRODUCTION
The fungal pathogen Puccinia striiformis Westend. f. sp. tritici Erikss. (Pst) causing stripe (yellow) rust is one of the most destructive pathogens threatening global wheat production (Chen, 2005(Chen, , 2020Wellings, 2011). The United States is one of the most serious stripe rust epidemic countries. In the United States, wheat stripe rust was first recognized in 1915, but its epidemic mainly occurred in the west, especially the Pacific Northwest and California before 2000 (Carleton, 1915;Line and Qayoum, 1992;Line, 2002). Since 2000, however, wheat stripe rust has become more frequent and caused severe epidemics across the continental United States including the states east of the Rocky Mountains (Chen, 2005(Chen, , 2007Chen et al., 2010).
To understand the disease epidemiology and pathogen variation, a series of studies have been done on the Pst pathogen from virulence testing, population structure, reproductive system, mutation, somatic recombination to genome sequencing, and functional genomics (Chen et al., , 2010Chen, 2012, 2014;Cheng and Chen, 2014;Cheng et al., 2016;Xia et al., 2016Xia et al., , 2018aXia et al., ,b, 2020Lei et al., 2017;Liu et al., 2017Liu et al., , 2021Li et al., 2019bLi et al., ,c, 2020. Previous studies on virulence and molecular analyses support asexual reproduction of Pst in the United States Pacific Northwest (Cheng and Chen, 2014;Liu et al., 2021).  identified higher diversity and higher genetic differentiation in the west population than the east population of the United States using simple sequence repeat (SSR) markers. Similarly, Xia et al. (2016) also found that the west population was more diverse than the east population using secreted-protein single-nucleotide polymorphism (SP-SNP) markers. However, these studies, which used a relatively small number of isolates collected in 2010 and/or 2011, are considered as pilot studies to demonstrate the usefulness of different types of molecular markers for a snapshot of the population structure. Because of this reason, these studies could not provide long-term dynamics of the Pst population. Using both SSR and SP-SNP markers, more than 1,000 isolates representing collections from 1968 to 2009 were characterized (Liu et al., 2021). This study of Pst isolates from over 40 years identified several events of introduction and a trend of the Pst populations changing from relatively simple to complex. However, the isolates collected since 2010 have been tested only for virulence and identified to races 1 . These collections have not been molecularly characterized. It is not clear whether the US Pst population has undergone any major genetic changes since 2010.
In 2010, wheat stripe rust caused a widespread epidemic across the United States in the recorded history, resulting in large-scale application of foliar fungicides, especially in the Great Plains and the western states . In 2011 and 2012, stripe rust continued causing severe damage in the Pacific Northwest (Chen, 2014;. Furthermore, the stripe rust epidemics in both 2015 and 2016 were severe, causing country-wide yield losses of 8.2 and 5.6%, respectively (Chen and Kang, 2017;Chen, 2020). In addition to the favorable weather conditions, the high number of new races and widespread predominant races identified in these years were attributed at least partially to the epidemics Chen, 2012, 2016;Wan et al., , 2017. However, it was not clear whether the pathogen population had significant changes in genetic structures. Therefore, this study is conducted to characterize the population diversity and dynamics of Pst from 2010-2017 and compare the populations in these years with those of 1969-2009 characterized by Liu et al. (2021) using the selected SSR markers to provide the long-term dynamics of the pathogen in the United States.

Sample Collection and Urediniospore Multiplication
During the growing seasons from 2010 to 2017, stripe rust samples were collected from commercial wheat fields, breeding nurseries, variety trials, and disease screening nurseries across all 12 regions of stripe rust epidemiology throughout the United States as previously described (Chen et al., 2010) (Table 1 and Supplementary Figure 1). A total of 2,247 isolates were obtained, and the urediniospores were multiplied for DNA extraction. In brief, each leaf sample was washed with water, placed on a wet filter paper in a Petri dish, and incubated at 4 • C overnight. The fresh urediniospores were transferred with a fine brush onto two-leaf-stage seedlings of wheat cultivar "Nugaines" that is susceptible at the seedling stage to all Pst races identified thus far in the United States. The inoculated plants were incubated in a dew chamber at 10 • C for 24 h without light and grown in a growth chamber at a diurnal cycle gradually changing from 4 • C at 2:00 AM to 20 • C at 2:00 PM with 8-h dark/16h light corresponding to the low/high temperature periods (Chen et al., , 2010. To prevent cross-contamination, plants inoculated with different isolates were separated with plastic booths. Urediniospores were vacuum collected with a custommade glass collector and kept in a desiccator for later use.

SSR Markers and PCR Amplification
Fourteen pairs of SSR primers were selected based on their codominant polymorphisms among Pst isolates in previous studies (Cheng et al., 2012(Cheng et al., , 2016Cheng and Chen, 2014;Sharma-Poudyal et al., 2020;Liu et al., 2021). They were CPS02, CPS04, CPS08, and CPS13 PstP001, PstP002, PstP003, PstP005, PstP006, and PstP029 (Cheng et al., 2012); and RJ18, RJ20, RJ21, and RJ8N (Enjalbert et al., 2002;Bahri et al., 2009a). These markers, except PstP001 and PstP005, were previously used by Liu et al. (2021) to characterize the Pst collections from 1968 to 2009. They used two additional markers (PstP004 and PstP033). For unresolved issues, PstP004 and PstP033 did not produce fragments in many isolates in the preliminary tests of the present study, and therefore, these two markers were replaced with PstP001 and PstP005. The sequences, annealing temperatures, and amplified fragments of the primers are provided in Supplementary Table 1. To use fluorescence for detecting PCR products, an M13 tag (5 -CACGACGTTGTAAAACGAC) was added to the 5 end of each forward primer (Schuelke, 2000). For each SSR locus, the forward primer was labeled with black, green, or blue florescent dye, with the red fluorescent dye used for the size marker. SSR loci with the same or close allele sizes were labeled with different florescent dyes to achieve the maximum possible number of the loci per run in the sequencer. Amplification of PCR was performed in a Bio-Rad iCycler (Bio-Rad, Hercules, CA, United States) following the protocol described in previous studies (Cheng et al., 2012;Cheng and Chen, 2014). Each reaction (12 µL) contained 1.2 µL of 10x reaction buffer with 15 mM MgCl 2 , 0.96 µL 2.5 mM dNTP, 0.12 µL of 5 µM forward primer, 0.6 µL 5 µM reverse primer, 0.24 µL of 5 µM M13 universal primer, 0.2 µL 5 U/µL Taq polymerase, 4 µL DNA (total 20 ng), and 4.68 µL sterile ddH 2 O. The amplification cycles and conditions were 94 • C for 5 min for initial denaturation; 42 cycles of 94 • C for 30 s, 45-54 • C for 30 s depending on primers, and 72 • C for 45 s; and 7 min of final extension at 72 • C. The sizes of the PCR products were estimated using capillary electrophoresis on an ABI3730 Genotyper (Applied Biosystems, Foster City, CA, United States). The internal molecular weight standard for ABI3730 was Genescan 445-LIZ (Applied Biosystems). Allele sizes in base pairs were scored and analyzed using software GeneMarker V2.2 (Softgenetics, State College, PA, United States).

Analyses of Multilocus Genotypes and Genotypic Diversity
If isolates in the present study have identical alleles across all 14 SSR loci, they were assigned to the same multilocus genotype (MLG). The MLGs were named in continuation with the previous MLGs identified from the isolates of 1968 to 2009. As 12 SSR markers were commonly used in the present study and the previous study (Liu et al., 2021), we used the 12 markers to find isolates that have identical alleles with previously designated MLGs and used the previous MLGs to designate the isolates in the present study. For isolates that were identical at the 12 common marker loci, but different at the two additional marker loci used in the present study, they could be numbered the same MLG plus extension numbers to indicate the differences. Actually, we did not encounter any of such cases in the present study.
As Pst is a dikaryotic fungus at the uredinial stage, each isolate was scored for two alleles to determine homozygous or heterozygous for each SSR locus using GeneMarkerV2.2 (Softgenetics). The data file generated by GeneMarker was converted to a "csv" file to be analyzed with the R package. The sufficiency of markers to describe the population structure was assessed through the detection of MLGs plotted against the number of loci, and a graph was generated using the "poppr" package in the R program (Kamvar et al., 2014). The genetic relationships among individual MLGs were analyzed based on Bruvo's distance and presented by a minimum spanning network in "poppr." Genotypic diversity was determined by estimating both genotypic richness (the number of observed MLGs) and evenness (the distribution of genotype abundance). Stoddart and Taylor's index and Shannon's diversity index for MLG diversity were calculated as G = 1/ P i 2 and H = P i lnP i , respectively, where P i is the observed frequency of the ith MLG in a population (Shannon, 1948(Shannon, , 2001Stoddart and Taylor, 1988). Moreover, evenness (E 5 ) (values of E 5 ranging from 0 to 1, with low values indicating that a certain genotype dominates in the collection of isolates) (Grünwald et al., 2003) and Nei's unbiased gene diversity that is also referred as average heterozygosity (H exp , corrected for sample size) (Nei, 1978) were calculated for each region (R1-R12), the west (R1-R6), and east (R7-R12) regions, and each year (2010-2017).

Spatial and Temporal Population Variation, Differentiation, and Phylogenetic Relationships
Analysis of molecular variance (AMOVA), which allows the hierarchical partitioning of genetic variation within and between regions and years (Excoffier et al., 1992), was conducted by defining the Pst isolates based on stripe rust epidemiological regions in the United States from 2010 to 2017 using function "poppr.amova" in the poppr R program version 4.0.3 (Kamvar et al., 2014;Meirmans and Liu, 2018). Population differentiation measured by the standardized coefficient of gene differentiation (Gst) (Hedrick, 2005) among the 12 regions and 8 years was analyzed using the software GenAlEx 6.503 (Peakall and Smouse, 2006), and the Gst values of the 12 regional populations were also used for generating a migration network using the "diversity" package in the R program to visualize the gene flow patterns among the epidemiological regions. To further investigate the population differentiation at the spatial and temporal levels, phylogenetic trees were generated based on Nei's genetic distance for the 12 epidemiological regions and 8 years in bootstrap analysis with 1,000 replicates using the function "aboot" in the "poppr" program.

Identification of Molecular Groups
To identify putative clusters of genetically related isolates in the present study (2,247 isolates) in comparison with the 1,083 isolates from 1968 to 2009 (Liu et al., 2021), hierarchical clustering analysis based on the 12 SSR markers (CPS02, CPS04, CPS08, CPS13, RJ18, RJ20, RJ21, RJ8N, PstP002, PstP003, PstP006, and PstP029), which were commonly used by Liu et al. (2021) and in the present study, was conducted using the dissimilarity values and the "ward.D2" method with the "hclust" function in the R stats 4.0.3 program (Murtagh and Legendre, 2014). The parameters for hierarchical cluster analysis were the same as previously described (Sharma-Poudyal et al., 2020). The discriminant analysis of principal components (DAPC), a no model-based method developed and implemented in the "adegenet" R package, was performed to generate a scatterplot about the membership probability of each isolate in the corresponding molecular groups (MGs) (Jombart, 2008;Jombart et al., 2010). To assess how the temporal populations differ from each other, the DAPC was also performed based on the yearly populations. The DAPC was carried out for all loci, and an α score optimization was used to determine the number of principal components to retain.

Reproduction Mode
The mode of reproduction was assessed by evaluating observed linkage among SSR loci against expected distributions from permutation using the index of association (IA) (Brown et al., 1980a,b;Goss et al., 2014). Unfortunately, IA has been reported to increase with the number of loci and thus is not suitable for comparisons across studies (Agapow and Burt, 2001). To remedy this, adjusted/standard IA (rbarD) has been introduced to force the index to lie between 0 (linkage equilibrium) and 1 (full disequilibrium) (Agapow and Burt, 2001). Therefore, in the present study, rbarD was calculated using the "poppr" program, and the null hypothesis rbarD = 0 was tested with 1,000 permutations (Kamvar et al., 2014). If an observed rbarD value was outside the distribution expected from unlinked loci at p < 0.001, the population was considered clonal (Goss et al., 2014;Kamvar et al., 2014).

Determination of Correlation Between Molecular Genotypes and Virulence Phenotypes
The 2,221 of 2,247 isolates were previously studied for virulence and identified to races 2 . Mantel tests were conducted for the correlation coefficient between the molecular data and virulence data. The genetic distance matrix based on SSR markers among all isolates was obtained using function "genetic_distance" in the "gstudio" package, and the mantel function was obtained using the "vegan" package in the R Program 4.0.3 (Mantel, 1967;Dyer et al., 2004;Legendre and Legendre, 2012).

The Number, Frequency, and Distribution of MLGs
The marker sufficiency test found that 97.8% (1,422 of 1,454) of the MLGs could be identified with 13 markers (CPS02, CPS04, CPS08, CPS13, PstP001, PstP002, PstP003, PstP005, PstP006, RJ18, RJ20, RJ21, and RJ8N), and the inclusion of one more marker (PstP029) did not significantly change the number of MLGs (Supplementary Figure 2). The result indicated that the 14 markers were sufficient for studying the genetic structure of Pst population.
Among the 12 regions, R1 had the highest number of MLGs (730) and the highest number of private MLGs (639), whereas R4 had the lowest number (17) of MLGs and the lowest number (7) of private MLGs ( Table 2). These results were related to the highest number of isolates (973) in R1 and the lowest number of isolates (22) in R4. However, when the number of isolates was considered, these two regions had relatively low MLG/isolate (g/N) ratios (0.75 and 0.77, respectively), whereas R8, R10, and R12 had the highest g/N ratios (0.92, 0.94, and 0.95, respectively). Across the west regions (R1-R6), 1,118 MLGs were identified from 1,667 isolates, whereas 420 MLGs were identified from 580 isolates across the east region (R7-R12). The east region had a relatively high g/N ratio (0.72) compared to the west region (0.67). Overall, the whole country had a g/N ratio of 0.65; fewer than two samples needed to identify one MLG.
At the temporal level, the number of MLGs varied from year to year, ranging from 99 in 2010 to 290 in 2013 ( Table 3)

Genotypic and Gene Diversities
The genotypic diversity G (Stoddart and Taylor's index of MLGs diversity) and H (Shannon-Wiener index of MLGs diversity) values and their respective 95% confidence intervals by bootstrap statistics for the regional populations are given in Table 2, and the index values and confidence intervals for the yearly populations are given in Table 3. Among the 12 epidemiological regions, R1 had the highest values of G (209.13, confidence interval = 166.33-251.93) and H (6.29, confidence interval = 6.22-6.36), and the confidence intervals did not overlap with those of the other regions ( Table 2). In contrast, R4 had the lowest G (12.74, confidence interval = 11.73-13.74) and H (2.71, confidence interval = 2.40-3.02) values. Generally, the west regions had relatively high genotypic diversity than the east regions. The G-value in the west (across R1-R6) was 162.2 with a confidence interval of 125.52-198.87, which was higher than the G-value of 97.6 with a confidence interval of 75.88-119.25 in the east (across R7-R12). A similar result was obtained for the H-values, 6.13 (confidence interval = 6.44-6.58) for the west and 5.24 (confidence interval = 5.54-5.76) for the east.
The evenness (E 5 ) was determined for the distributions of MLGs in the regional population ( Table 2). The E 5 values ranged from 0.39 in R1 to 0.83 in R4 in the west with an overall value of 0.24 across all six west regions (R1-R6), whereas the E 5 values ranged from 0.40 in R7 to 0.94 in R10 with an overall value of 0.34 across all six east regions (R7-R12). The expected heterozygosity (H exp ), which represents gene diversity, was estimated for each regional population. The overall heterozygosity across the east region (H exp = 0.44) was slightly higher than that of the west region (H exp = 0.41). These results indicated that the Pst populations in the west were less even, but more diverse than those of the east.
Among the yearly populations, the 2010 and 2011 populations had the lowest G-and H-values and confidence intervals than those of the other years ( Table 3). The 2010 and 2011 populations also had the lowest evenness. Moreover, the 2011 population also had the lowest expected heterozygosity (gene diversity) (H exp = 0.33). These results indicated that the 2010 and 2011 populations were less diverse than those of 2012 to 2017.
The AMOVA results revealed significant variations by year and by region ( Table 4). Variations among regions, among years  (Stoddart and Taylor, 1988) calculated using the "poppr" package in the R program 4.3. b Shannon-Wiener Index of MLG diversity and confidence interval (Shannon, 1948) calculated using the "poppr" package in the R program 4.3. c Evenness, E 5 . (Grünwald et al., 2003). d Nei's unbiased gene diversity (Nei, 1978) over 14 SSR loci. e Standardized index of association, rbarD. p < 0.001 indicates significant disequilibrium due to linkage among loci, indicating populations are clonal (Agapow and Burt, 2001).
within regions, and within years within regions, as well as among years, among regions within years, and within regions within years were all significant (p < 0.001). However, the highest variation was within years within regions or within regions within years, indicating that the Pst population in a single region of any year was highly diverse.

Spatial Population Differentiation and Genetic Relationships
The gene differentiation Gst values of the 12 regions are shown in Supplementary Figure 6. The Gst values (0.005-0.129) were generally high between the west and east populations, but low (0.001-0.051) between the west populations (R1-R6) and even lower (−0.003 to 0.043) between east populations (R7-R12). A migration network was generated based on the Gst values, showing higher migration rates (lower differentiation) between the east populations than between the west populations (Figure 2). The gene flow (N m ) among the west populations was 17.25, but much higher among the east populations (N m = 28.41), indicating that the west populations were more diverse and differentiated than the east populations.
To determine the genetic relationships among the populations of different regions, the phylogenetic analysis was conducted using the MLGs based on Nei's genetic distance (Figure 3). Among the 12 regional populations, the west populations (R1-R6) were clearly separated from the west populations (R7-R12). Among the west populations, R1, R3, and R5 were more closely related, whereas R2, R4, and R6 were more closely related. Among the east populations, R7, R11, and R12 were more closely related, whereas R8, R9, and R10 were more closely related.

Temporal Population Differentiation and Genetic Relationships
Similarly, the gene differentiation (Gst) values of the eight yearly population were calculated and are presented in Table 5.   (Nei, 1978) over 14 SSR loci. e Standardized index of association, rbarD. p < 0.001 indicates significant disequilibrium due to linkage among loci, indicating populations are clonal (Agapow and Burt, 2001).     To identify the marker patterns for the 10 sub-MGs, one isolate was selected based on MLG and first year of detection to represent each subgroup ( Table 6). MG1-1, MG1-3, MG1-4, and MG1-5 that contained isolates mainly from 2010 to 2017 were heterozygous (H) at a half or more of marker loci. MG3-1, MG3-2, MG3-3, and MG3-4 that also mainly contained the isolates from 2010 to 2017 were homozygous for the major alleles (A) at a half or more loci. MG1-2 differed from other sub-MGs by its homozygous with the minor allele (B) at the RJ21 locus and possible at the RJ8N locus. MG2 differed from other MGs mainly by its homozygosity at the CPS13 locus (Table 6 and  Supplementary Table 5).

Mode of Reproduction
The standard IA (rbarD) values ranged from 0.06 in R12 to 0.19 in R3 ( Table 2) and 0.07 in 2017 to 0.37 in 2010 (Table 3), with  the mean of 0.1 for both the west and east populations and a value of 0.107 for the overall population. All populations had p < 0.001 and located outside the bell curve of the simulated distribution (expected from unlinked loci) of a randomly mating population for the Pst population (Supplementary Figure 9). Thus, the data rejected the hypothesis of no linkage among markers but supported the clonal mode of reproduction in the Pst population.

Correlation Between the Molecular and Virulence Data
Of the 2,247 isolates genotyped with the SSR markers, 2,221 were tested for virulence phenotypes (Supplementary Table 6). Races in the 18 predominant MLGs associated with multiple races and 50 MLGs each associated with only one race, together with years detected, are listed in Table 9. The major races associated with the 18 predominant MLGs and their distributions in the epidemiological regions and across the years of 2010-2017 are also shown in Supplementary Figures 3, 5. Correlation analysis was conducted using the phenotypic data and SSR marker data. The overall correlation coefficient (r), when all of the 2,221 isolates were considered without any grouping, was 0.325 (p < 0.001) between the marker genotypes and virulence phenotypes. There were 50 MLGs with more than two isolates each identified as a single race, of which 17 were race PSTv-37 and 11 were race PSTv-52, indicating that many isolates from different regions in different years were identified as the same MLGs or races (Table 9). Interestingly, for all 18 predominant MLGs, more than 60% of isolates were identified as two or three predominant races. Predominant MLGs (MLG2026, MLG1989, and MLG2018) clustered in MG3-1 had 90, 75, and 77% of the isolates, respectively, identified as PSTv-11 and PSTv-14, which are closely related races, and all three MLGs shared common alleles at 12 of the 14 SSR loci, except PstP002 that was different between MLG2026 and MLG1989 and RJ8N that was different between MLG2026 and MLG2018. Two predominant MLGs (MLG500 and MLG27) in MG1 had 60 and 82% of the isolates identified as PSTv-37, PSTv-35, and PSTv-36, also closely related races, and MLG500 shared alleles at all 14 marker loci except for RJ21 with MLG27. Both MLG671 and MLG650 were in MG1-3 and shared the same alleles at all 14 marker loci except the CPS04 locus. The isolates of the two MLGs were identified as the same or similar races with 82% of isolates of MLG671 as PSTv-37, PSTv-31, and PSTv-34 and 82% of isolates of MLG650 as PSTv-37, PSTv-31, and PSTv-30. Four MLGs (MLG500, MLG27, MLG650, and MLG671) that had similar races shared all marker loci except CPS02, CPS04, and RJ21, whereas MLG1355 clustered together with MLG500 and MLG27 in MG1 shared only four loci (CPS08, RJ20, Pstp001, and Pstp006) with 60% of isolates identified as old complex races PSTv-18 and PSTv-19. Eight MLGs (MLG1055, MLG1099, MLG1623, MLG1082, MLG1550, MLG989, MLG1106, and MLG909) in MG1 (MG1-4 and MG1-3) had close genetic relationships as they shared 10 of the 14 marker loci (except for CPS02, CPS08, Pstp003, and Pstp006 that were also shared by most of these MLGs with different alleles at only one to four loci). Similarly, these MLGs had 67-100% of their isolates identified as closely related races PSTv-37 and PSTv-52. Moreover, MLG1743 and MLG2023 in MG3 (MG3-2 and MG3-1, respectively), which shared the same alleles at all 14 loci except the CPS08 locus, had 92 and 83% of isolates identified as closely related races PSTv-4 and PSTv-53 (Table 9).

DISCUSSION
Fourteen SSR markers used in the present study produced adequate number of MLGs in the Pst isolates collected throughout the United States from 2010 to 2017. For example, with 13 markers, 97.8% of the MLGs identified with the 14 markers could be identified. Identified MLGs were different in both numbers and genotypes between the west and east regions. More MLGs identified in the west regions than in the east regions were due to the differences in climatic conditions for the survival of the pathogen, the frequent occurrence and development of stripe rust epidemics, number of cultivars, and resistance genes utilized in different regions. Only 85 MLGs (6%) were commonly detected in the west and east regions, indicating few exchanges of inocula between the two big regions. Also, many shared MLGs that mainly distributed in the east were likely from the west, because Pst can be more likely spread from west to east than from east to west because of the prevailing wind direction from the west to the east (Chen, 2005). Eight MLGs shared mainly between R6 in the west and R7 in the east demonstrate that inoculum exchanges have occurred between these two regions. Of the 13 MLGs that were detected only in the east regions, 9 were shared by all six east regions (R6-R12), and 4 MLGs were detected only in R7. In contrast, of the 93 MLGs detected only in the west regions, 33 were shared by all six regions (R1-R6), and the other 60 MLGs were detected in only one region, especially R1. This comparison indicates that inoculum exchanges have  (74) occurred more frequently among the east regions than among the west regions. This may be related to the lack of natural barriers, relatively uniform crops, and wide growth of the same or similar cultivars in the east regions (Chen et al., 2010;Wan and Chen, 2012;.
In the present study, 6 of the 18 MLGs with 10 or more isolates appeared only in 2010 and/or 2011 and then completely disappeared, indicating that these large MLGs might be totally eradicated or reduced to an undetectable level afterward. The disappearance of these MLGs could be attributed to the changes of wheat cultivars, large-scale application of foliar fungicides, and cold winter and/or dry summer conditions that are unfavorable to the pathogen in some of the years (Chen, 2014(Chen, , 2020. The diversity of the overall population and those of individual epidemiological regions were high, and obvious differences in diversity were observed among the 12 regions (R1-R12) and between the west and east regions. The Stoddart and Taylor's index values of MLG diversity (G), together with the 95% confidence interval, demonstrated that the west populations had a significant higher diversity than the east populations. Especially, R1 (eastern Washington, northeastern Oregon, and northern Idaho), which is one of the most important wheat production regions and the most frequent and severe stripe rust epidemic regions in the United States, had the highest G-value. This region also has the highest number of diverse Pst races every year (Line and Qayoum, 1992;Chen et al., 2002, 2010Line, 2002Chen, 2012, 2014;. The evenness (E 5 ) was low in this region and also relatively low in the west regions (R1-R6) compared with the east regions (R7-R12) as the MLGs were not evenly distributed, and some MLGs were highly predominate in the west, especially in R1. The high diversity and differentiation in the west regions can be attributed to the high diversity of wheat cultivars with different resistance genes; cropping systems; climatic conditions more favorable to the pathogen survival and infection; and geographic barriers as previously pointed out (Chen et al., 2010;Chen, 2012, 2014;Liu et al., 2020;Mu et al., 2020). In contrast, the east regions had overall relatively low diversity as measured by either Stoddart and Taylor's MLGs diversity and Shannon-Wiener index of MLG diversity, and relatively high evenness and gene flow. The low diversity and differentiation of Pst in the east are consistent with the virulence data Wan et al., unpublished data). Compared to the west, the relative small number of samples may limit the detection of genotypes at low frequencies in the east regions. However, the vast majority of the east region is flat, and the same or similar wheat cultivars are grown in large acreages, which selects similar races or genotypes adapted to the cultivars and environment. The relatively uniform geographic topology and lack of mountainous barriers allow long-distance dissemination of Pst genotypes from relatively few overseason sources.
Although there were some differences among the epidemiological regions in heterozygosity that measures gene diversity, the values of the different regions were all below 0.5, and the overall heterozygosity was low (0.43). The low heterozygosity or gene diversity could be due to the clonal production of Pst as discussed below, as inbreeding increases the frequency of homozygotes at the expense of heterozygotes (Allendorf and Utter, 1979). The west population had relatively low heterozygosity (0.41) compared to the east population (0.43). This could be related to the fact that a group of races represented by PSTv-14 was predominant in 2010 and 2011 in the west but seldom detected in the east Chen, 2012, 2014;, and the group is more homozygous than the races predominant in the east based on genome sequences (Cuomo et al., 2012). This group of races is represented by MLG-2026 in MG3-1 (Supplementary Figure 3), which is homozygous at all 14 SSR loci ( Table 6). In contrast, MLG-27 in MG1-1, corresponding to the race group represented by PSTv-37, is heterozygous at 8 of the 14 SSR loci.
The analysis of the Pst populations over the 8 years from 2010 to 2017 allowed us to monitor dynamic changes, and the comparison with the previously identified MLGs from the 1968-2009 collections (Liu et al., 2021) provided a clear picture of dynamics for a much longer period. To track changes of MLGs over the years, all MLGs in the present study were named continuously including those identified from the 1968-2009 collections (Liu et al., 2021). Ideally, the 14 SSR markers used in the previous study (Liu et al., 2021) should have been used in the present study. Unfortunately, 2 (PstP004 and PstP033) of their 14 markers did not amplify any fragments for many isolates in the initial screening and were replaced by PstP001 and PstP005 in the present study, as we attempted to use markers all codominant and no missing loci. As we identified several sub-MGs appeared after 2010 in the present study, and deletion was found in Pst isolates for a genomic region carrying avirulence cluster , we hypothesize that the PstP004 and PstP033 loci could be deleted from some of the recent isolates. Further studies are needed to test the hypothesis for understanding the pathogen evolution. Nevertheless, we named the different MLGs based on marker types at the 12 commonly used markers to be consistent for the MLGs from 1968 to the present. A total of 2,060 MLGs were identified, including 614 MLGs identified from the 1968-2017 collections in the previous study (Liu et al., 2021)  were detected in both studies. MLG27 was first detected in 2003 as one isolate (Liu et al., 2021), but was not detected until 2010 when it was detected from 12 isolates in that year. Similarly, MLG500 was first detected in 2004 also as one isolate (Liu et al., 2021) but suddenly detected from 81 isolates identified in 2010. Most isolates (99.4%) of the 1,454 MLGs identified in the present study were not detected in the previous study. Furthermore, 1,436 (98.8%) of the MLGs were detected from <10 isolates in the present study. Both percentages of new MLGs and private MLGs over the years were very high, all greater than 80% (Table 3) in individual years and across the 8 years. The AMOVA revealed that the genetic variation was higher among years than that among regions. All these results indicate that the Pst population changes fast.
The clonal reproduction of Pst populations in the United States was confirmed based on the test of linkage disequilibrium in the present study. The IA was originally developed as a measure of multilocus linkage disequilibrium (Brown et al., 1980a,b) and was found to be able to detect signatures of sexual reproduction and population structure (Brown et al., 1980a,b;Smith et al., 1993). Based on the values of rbarD, which is a standardization of IA (Agapow and Burt, 2001), the present study did not support sex reproduction for the Pst population in the United States as the observed rbarD values were far from the theoretical distribution region of the rbarD values (p < 0.001). The result of the present study is consistent with the previous reports of lack of sexual reproduction for the Pst population in the United States (Cheng and Chen, 2014;Liu et al., 2021) and in the world (Bahri et al., 2009b;Ali et al., 2010). Thus, sexual recombination cannot be a major mechanism for the diversity observed in the Pst population. Because Pst is a dikaryon, clonal reproduction protects the polymorphism at each single locus within isolates, due to the fixed heterozygosity through the "Meselson effect" (Balloux et al., 2003). Much of the genetic variation observed in the present study was maintained in individual isolates.
Mutation should be the major mechanism for explaining the high diversity as most MLGs in the same MG or sub-MGs had different alleles at one or few marker loci, and the stepwise mutation pattern could be commonly found for MLGs in the same groups (Supplementary Table 3). For example, MLG496-1, which was first detected in 2009, differed from MLG130-1, which was detected in 2007, which differed only at the RJ18 locus (changing from 358/364 bp in MLG130-1 to 358/358 in MLG496-1). Using ethyl-methanesulfonate mutagenesis, different genotypes or races can be generated from a single Pst isolate (Li et al., 2019b(Li et al., , 2020. Somatic recombination is a plausible mechanism contributing some part of the genetic variation, especially for the formation of new sub-MGs detected in the present study. In our laboratory, somatic recombination has been demonstrated to generate new races or genotypes under controlled conditions (Lei et al., 2017). In Puccinia graminis f. sp. tritici, the wheat stem rust pathogen, races virulent to resistance gene Sr50 were found to be produced by somatic recombination that caused loss of AvrSr50 , and somatic hybridization caused emergence of the Ug99 race lineage (Li et al., 2019a). As Pst migration and incursion from country to country, or from continent to continent, are well known (Wellings and McIntosh, 1990;Steele et al., 2001;Hovmøller et al., 2002Hovmøller et al., , 2008Hovmøller et al., , 2016Sharma-Poudyal et al., 2013, 2020Ali et al., 2014), some of the MLGs, especially those in the sub-MGs that suddenly appeared in 2010 and 2011 found in the present study ( Figure 5) were likely introduced to the United States.
The correlation coefficient (0.325) between the molecular and virulence data in the present study was comparable with the value of 0.34 reported between the virulence phenotypes and molecular genotype data for the United States Pst populations of 2010 and 2011 using secreted protein gene-derived SNP markers (Xia et al., 2016). Based on the MLG relationships with phenotypes, 50 MLGs with more than two isolates from different years and regions were identified as a single race. For the 18 predominant MLGs (Supplementary Figures 3, 5), more than 60% of their isolates were identified as two or three predominant races. For example, MLG2026, MLG1989, and MLG2018 were identified mainly as PSTv-11 and PSTv-14, and they were clustered into MG3-1 and shared 12 homozygous loci out of the 14 loci. The high homozygosity of this race group was consistent with the study by Cheng et al. (2016). Eight MLGs (MLG1055, MLG1099, MLG1623, MLG1082, MLG1550, MLG989, MLG1106, and MLG909) with isolates mainly identified as two closely related races (PSTv-37 and PSTv-52) were clustered into MG1 (MG1-4 and MG1-3) and shared alleles at 10 of 14 loci. Therefore, races PSTv-37 and PSTv-52, which have been predominant throughout the United States in the last decade Liu et al., 2017Liu et al., , 2021, are genetically similar. The results show that, although isolates in the same MLG can be different races and vice versa, correspondence existed between molecular characterization and virulence phenotypes. Thus, it is possible to quickly identify isolates into race groups using molecular markers for providing information that can be immediately used for managing stripe rust based on resistance gene in growing wheat cultivars.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
XC, AW, and MW collected stripe rust samples. QB, AW, and MW participated in spore multiplication and DNA extraction, and SSR marker genotyping. QB was a major contributor in DNA extraction, marker genotyping, analyzed and interpreted the data, and in writing the manuscript. DS provided equipment for genotyping, guided data collection, and revised manuscript. XC conceived the study, coordinated stripe rust sample collection, designed the experiments, and wrote the manuscript. All authors reviewed and approved the final manuscript.