Genetic diversity and population structure of Epichloë fungal pathogens of plants in natural ecosystems

Understanding the population genetic processes driving the evolution of plant pathogens is of central interest to plant pathologists and evolutionary biologists alike. However, most studies focus on host-pathogen associations in agricultural systems of high genetic and environmental homogeneity and less is known about the genetic structure of pathogen populations infecting wild plants in natural ecosystems. We performed parallel population sampling of two pathogenic Epichloë species occurring sympatrically on different host grasses in natural and seminatural grasslands in Europe: E. typhina infecting Dactylis glomerata and E. clarkii infecting Holcus lanatus. We sequenced 422 haploid isolates and generated genome-wide SNP datasets to investigate genetic diversity and population structure. In both species geographically separated populations formed genetically distinct groups, however, population separation was less distinct in E. typhina compared to E. clarkii. The patterns of among population admixture also differed between species across the same geographic range: we found higher levels of population genetic differentiation and a stronger effect of isolation by distance in E. clarkii compared to E. typhina, consistent with lower levels of gene flow in the former. This pattern may be explained by the different dispersal abilities of the two pathogens and is expected to be influenced by the genetic structure of host populations. In addition, genetic diversity was higher in E. typhina populations compared to E. clarkii, indicative of higher effective population size in E. typhina. These results suggest that the effect of genetic drift and the efficacy of selection may differ in the two species. Our study provides evidence of how ecologically similar species occupying the same geographical space can experience different evolutionary contexts, which could influence local adaptation and co-evolutionary dynamics of these fungal pathogens.


Introduction
Fungal plant pathogens are fundamental and ubiquitous components of natural biodiversity (Bradley et al., 2008;Burdon and Laine, 2019). And yet, fungal plant pathogens are not commonly studied in natural ecosystems but rather in an agricultural setting, because they cause yield losses in important crop plants and may have negative impacts on human or animal health (Fisher et al., 2012). Evolutionary research in the field of plant pathology therefore heavily focuses on mechanisms leading to disease outbreaks in agricultural ecosystems usually associated with the rapid evolution of virulence or fungicide resistance related traits (McDonald and Linde, 2002). Agricultural systems are useful for the study of antagonistic coevolution because these systems are characterized by strong selection and rapid adaptation, providing, in a way, short-term evolutionary "experiments" (Stukenbrock and McDonald, 2008). A potential drawback, however, is the fact that the starting conditions of all these "experiments" are essentially the same: pathogens emerge under environmental and genetic uniformity. How plant pathogens evolve under more natural conditions remains less well understood. The impact of fungal plant pathogens on wild plants has received some attention in the case of invasive diseases [for example Hymenoscyphus fraxineus causing ash-dieback (McMullan et al., 2018), or the chestnut blight fungus Cryphonectria parasitica (Milgroom et al., 2008)], but mechanisms underlying persistence and evolutionary dynamics of host-pathogen associations in native systems remain much more elusive. Exploring the genetic structure of populations is central to the study of the evolutionary interactions between fungal pathogens and their host plants. The distribution of genetic variation within populations (genetic diversity) and among populations (genetic differentiation/patterns of gene flow) can provide insights into the evolutionary histories, distributions and disease dynamics of interacting species and determines their evolutionary potential (Thompson, 2005, Chapter 2, Raw Materials for Coevolution I; Barrett et al., 2008). For instance, the lack of genetic structure in host populations of cultivated plants favors the contagious spread of few (or a single) fungal pathogen genotypes and has been linked to devastating epidemics in agricultural systems (Stukenbrock and McDonald, 2008;Persoons et al., 2017). In contrast, persistent antagonistic coevolution of pathogens with their hosts as it is observed in natural systems requires generation and maintenance of genetic variation in both partners. For example the analysis of natural populations of the flax rust pathogen Melampsora lini in Australia showed extensive genetic variation and population differentiation at loci associated with pathogenicity, suggesting that local evolutionary processes such as selection and genetic drift maintain high diversity between fungal pathogen populations (Barrett et al., 2009). In another well studied model pathosystem, the anther-smut fungus Microbotryum lynchnidis-dioicae infecting Silene latifolia, patterns of genetic structure were consistent with recolonization from ancient glacial refugia (Vercken et al., 2010;Badouin et al., 2017), and the population structure of the fungal pathogen was strongly congruent with the population structure of its host plant across the distribution range, a pattern likely driven by co-evolutionary processes (Feurtey et al., 2016). It appears that genetic structure of both the pathogen and the host underlies dynamics of disease prevalence at temporal and spatial scales and, for example in the case of powdery mildew Podosphaera plantaginis infecting Plantago lanceolata, this genetic and environmental heterogeneity among populations may buffer large-scale disease outbreaks (Jousimo et al., 2014). In order to increase our understanding of the evolutionary forces governing plant-pathogen interactions, more empirical studies on natural populations of fungal pathogens are needed that consider populations across a broad geographic range and encompass multiple comparable species.
Members of the ascomycete genus Epichloë, notably the E. typhina species complex, are an intriguing system in which to investigate the diversity and population structure of fungal pathogens in natural environments (Bultman et al., 2011). Fungi of the E. typhina complex reproduce sexually and sterilize their host plant while doing so. Since infections are systemic and long-lasting, the effect on host plant fitness is detrimental and infected plants may have no more reproductive output for the rest of their lifetime. In sterilizing systems such as this, interaction partners are engaged in a tight arms-race with strong selection pressures acting on the host plant to develop resistance and, on the pathogen, to overcome this resistance (Ashby and Gupta, 2014;Tellier et al., 2014). While different Epichloë species are usually highly specialized on a single host grass species, these pathogens often occur in sympatric settings in natural grasslands hosting a variety of different grass species. This system thus provides an exceptional opportunity to sample sympatric populations and study ecologically similar pathogen species occupying the same geographic space.
In this study we investigated the population genetic structure of sympatric populations of two sibling species of the E. typhina complex occurring on distinct host grasses from natural grasslands across Europe. Using genome-wide SNPs we found differences in the amount of population genetic structure between the two species. To fully explore how these species differ and to provide insight into the causes and consequences of observed differences in population genetic structure, we addressed the following questions: (1) How does the population genetic structure differ between two species that have overlapping distributions? (2) Do patterns of gene flow, isolation by distance and genetic differentiation (F ST ) differ between the two species? (3) How do genetic diversity statistics (π, Tajima's D) vary between these species? The parallel sampling of multiple populations from two species allows us to investigate population genetic structure within-species and compare between two species across the same geographic range. Our study provide insights into how two species with similar lifestyles can experience distinct evolutionary contexts, and increases our understanding of how population genetic processes can affect fungal pathogens in natural ecosystems.

Materials and methods
2.1. Study system 2.1.1. Fungal pathogens The sexually reproducing members of the ascomycete genus Epichloë are highly specialized biotrophic endophytes and pathogens of many pooid grasses forming long-lasting systemic infections (Craven et al., 2001;Leuchtmann et al., 2014).
Whole plants are usually infected by a single haploid genotype (Leuchtmann and Clay, 1997), and obligate outcrossing takes place between two genotypes of opposite mating types, each infecting a different host individual (bipolar heterothallic mating system) (White and Bultman, 1987;Schardl et al., 2014). As endophytes they form systemic and symptomless infections in vegetative tissues of the host grasses and then hijack grass flowering tillers to produce gametes and complete their own sexual cycle (Clay and Schardl, 2002). The formation of the fungal reproductive structure (stroma) around the grass inflorescence causes choke disease, usually affecting all host inflorescences and therefore sterilizing the entire plant by preventing production of pollen and seed (Chung and Schardl, 1997).
Here we focus on two closely related Epichloë species within the E. typhina species complex occurring on different Poaceae hosts: E. typhina infecting Dactylis glomerata and E. clarkii infecting Holcus lanatus. Hereafter we use the name E. typhina meaning solely the lineage specialized on D. glomerata. Note, however, that this name has also been used to refer to the aggregate of genetically distinct lineages from different hosts which are phylogenetically not clearly resolved (Treindl et al., 2021a). The two taxa are currently assigned the taxonomic rank of subspecies based on their sexual compatibility, i.e., their ability to hybridize in experimental crosses (Leuchtmann and Schardl, 1998;Leuchtmann et al., 2014). Indeed, hybrid ascospores between E. typhina and E. clarkii have been observed in natural sympatric populations (Bultman et al., 2011), indicating that at least the potential for gene-flow is retained. However, there is no evidence of host grasses infected by hybrid strains in natural populations suggesting the existence of postzygotic barriers. In line with our previous work which provided evidence of reproductive isolation and genome-wide high levels of genetic differentiation (Schirrmann et al., 2015(Schirrmann et al., , 2018Treindl and Leuchtmann, 2019), we consider E. typhina and E. clarkii as closely related yet distinct species at an advanced stage of divergence.
In a previous study, we generated high-quality genome assemblies of reference strains for both species which provide an essential genomic resource for the work conducted here (Treindl et al., 2021b). Analyses of the structural organization of Epichloë genomes revealed a clear pattern of genome compartmentalization into gene-rich and AT-rich compartments, consistent with the so called "two-speed" model of genome evolution (see also Winter et al., 2018). This particular genome organization shared by some but not all plant-pathogenic fungi is thought to play an important role in driving rapid adaptation in the co-evolutionary arms-race between host and pathogen (Raffaele and Kamoun, 2012;Möller and Stukenbrock, 2017;Torres et al., 2020). A comparison of the E. typhina and E. clarkii reference genomes showed a high degree of genomic synteny between the sibling species, however, also revealed striking differences in genome size and compartmentalization likely related to an underlying difference in effective population size (Treindl et al., 2021b). Besides the differences in genome structure and the obvious distinction based on host specificity, E. typhina and E. clarkii also differ in morphological characteristics including the size and disarticulation patterns of their ascospores (White, 1993;Leuchtmann and Clay, 1997).

Host grasses
The host species D. glomerata (orchard grass or cocksfoot), and H. lanatus (Yorkshire fog) are both long-lived perennial grasses native to Europe (Tutin et al., 1980). They are windpollinated outcrossers, flowering typically May to July and are common and widely distributed in nutrient rich grasslands, where they often co-occur. Cultivars of D. glomerata may also be sown as part of commercial forage grass mixtures (e.g., see UFA-Samen, Herzogenbuchsee, Switzerland, and BARENBRUG France, Montévrain), while H. lanatus is not commercially sown. Although choke disease caused by E. typhina infections has been shown to cause yield reductions in commercial seed production of D. glomerata (Large, 1954;Pfender and Alderman, 2006), to our knowledge there has been no attempt to control pathogen prevalence in natural, semi-natural or pasture grasslands in Europe besides measures of selective breeding in commercial fields of seed producers and recommendations concerning mowing practices (Raynal, 1991).

Sample collection and isolation of fungi
Sampling was performed at seven locations across Western Europe between May and July 2016-2018. Because sympatric populations are not common in Europe, selection of sites was governed by availability and chance while they covered a good portion of the hosts' North-South distribution. At these locations large numbers (>20) of infected host grasses (D. glomerata and H. lanatus or, in one location H. mollis) occurred intermixed or in close proximity less than 50 meters apart as evidenced by the presence of fungal reproductive structure on flowering tillers (large circles in Figure 1 and Table 1). The majority of the sampled populations were located in permanent natural meadows with no apparent evidence of recent high-intensity agricultural use such as regular sowing, fertilizing or grazing by livestock. One population (So) was located in a permanent semi-natural meadow which may be under low-intensity agricultural use. It is noteworthy that the transition between completely natural (no management) and semi-natural (extensive management) grasslands is fluid and even natural meadows often experience anthropogenic influences such as occasional mowing (e.g., in the Aub and Kew populations). However, stroma forming grasses were not observed in intensively managed meadows such as highly fertilized meadows or grass sown as an annual crop under crop rotation for high-intensity silage or hay production although single infected plants were observed around the rims of such fields (personal observation). It appears that agricultural practices that involve regular mowing (i.e., usually before host grasses flower) negatively affect the pathogen prevalence. We consider our populations to be «natural populations» in the sense that they are part of permanent grasslands not under strong human disturbance. From each host species 21-33 infected plants per location were sampled by picking one choked flowering tiller per plant individual. Each sample represents a different haploid genotype, since an individual plant is assumed to be infected by a single fungal strain resulting from horizontal transmission of meiotic ascospores (Leuchtmann and Clay, 1997). In Kew Gardens (London, UK) E. clarkii was sampled from a previously unknown host species, Holcus mollis, occurring in sympatry with E. typhina on D. glomerata. Holcus mollis, unlike H. lanatus, can reproduce clonally through the formation of  Table 1.
rhizomes in addition to normal sexual reproduction and prefers less nutrient rich sites in natural meadows or light woodland (Ovington and Scurfield, 1956). Therefore, we cannot exclude the possibility that some isolates from this population originate from a single fungal genotype systemically propagated via rhizomes within the population of H. mollis, although tillers were collected from plants at a distance of at least five meters. The sympatric population in Western Switzerland (Aub) had been sampled previously 13 years prior to this study in 2005 (Steinebrunner and Leuchtmann, unpubl. data). We included these for sequencing and hereafter designate the isolates as Aub (sampled in 2018) and AubX (sampled in 2005). At locations where only one infected host species was found (small circles in Figure 1 and Table 1) we collected infected flowering tillers from 1 to 4 individuals per population (Bret, Other 1-4) and refer to these as "additional" samples.
Fungal strains were isolated from freshly collected tillers by splitting open the stromata with a sterile blade to expose the interstitial fungal mycelium between undeveloped inflorescence and leaf blades. Pieces of mycelium were then transferred with sterile tweezers and subsequently grown on supplemented malt extract-agar (SMA) plates containing 1% malt extract, 1% glucose, 0.25% bacto peptone, 0.25% yeast extract, 1.5% bacto agar and 0.005% oxytetracycline (Pfizer, New York, NY, USA). Isolates were carefully checked under the dissecting scope for purity and Epichloë identity as described previously (Christensen et al., 1993). Pure cultures were grown in V8 liquid media on a rotary shaker at 120 rpm for 10-14 days at room temperature. Mycelium was vacuum filtered, freeze dried for 24 h and then ground using liquid nitrogen. All live axenic cultures were also stored on SMA in tubes covered with mineral oil at 4 • C at ETH Zürich.

DNA preparation and sequencing
Genomic DNA was extracted from 10 mg freeze dried material using the sbeadex mini plant kit (LGC Genomics, Berlin, Germany), but replacing the lysis buffer with PVP (sbeadex livestock kit) as this buffer had previously shown improved DNA yields. Extraction was automated on a KingFisher system following the manufacturers protocol (Thermo Fisher Scientific, Waltham, MA, USA). The DNA concentration of extracted samples was quantified using a Spark R Multimode Microplate Reader (Tecan Trading AG, Switzerland). Paired-end libraries of 2 × 150 bp fragments with an insert size of 330 bp were prepared with NEBNext R Ultra II DNA Library Prep Kits (New England Biolabs, Ipswich, MA, USA), and sequencing was performed on a HiSeq4000 Illumina sequencer (Illumina, San Diego, CA, USA), at 16× coverage on average. We sequenced whole genomes of 240 individuals of E. typhina and 240 individuals of E. clarkii of which 211 each yielded sufficient data for the analyses. All Illumina

Mapping and variant calling
Reads were mapped against the high-quality reference genomes of E. typhina (Ety_1756, BioProject ID PRJNA533210) and E. clarkii (Ecl_1605_22, BioProject ID PRJNA533212) (Treindl et al., 2021b). Mapping was performed with BWA (Li and Durbin, 2009) and variant calling was done with the Genome Analysis Toolkit (McKenna et al., 2010), using the GATK Short Variant Discovery best practice workflow. 1 Details and examples of how each command was run are provided on github. 2 In brief, the raw reads were trimmed and adaptors removed using trimmomatic v0.35 (Bolger et al., 2014). The trimmed reads were mapped to the respective reference genomes using bwa mem v0.7.17 (Li and Durbin, 2009). Low quality alignments (qual >10) were removed and duplicates were marked. Mapping statistics were obtained using sambaba v0.6.6 (Tarasov et al., 2015). After mapping, variants were identified using HaplotypeCaller, GatherVCFs and JointGenotyper.

Filtering
Variants were filtered with custom scripts using vcftools (Danecek et al., 2011), bcftools , and R (R Core Team, 2013), [for example scripts see github repository (see text footnote 2)]. We removed mitochondrial sequences and indels, thus only SNPs in nuclear DNA (Chr1-7) were used in all subsequent analysis. The following hard-filters were applied using vcftools v1.2.8 minor allele count (-mac 3) and maximum mean depth values (-max-meanDP 100). The latter filter removed variants with excessive coverage (mean depth > mean read depth) across all samples as these are indicative of either repetitive regions or paralogs. We filtered out SNPs that were not well sampled across populations by removing SNPs with >30% missing within a single population and removed isolates that had >80% missing data. We then filtered by read quality and read depth following GATK best practice recommendations (Van der Auwera et al., 2013), and based on our evaluation of the data. We removed sites with low mapping quality (MQ value <40, the Root Mean Square of the mapping quality of the reads across all samples) and with a high relative variance in read depth [variance in read depth divided by mean read depth >80 for E. typhina and >30 in E. clarkii (Supplementary Figure 1)]. We visually inspected regions at different SNP densities using Integrative Genomics Viewer [IGV (Robinson et al., 2011)] to assess the variant calls and based on these observations chose to excluded sites where ≥3 SNPs had been called within 10 bp. Lastly, we removed sites that previous filtering steps had rendered monomorphic. Summary statistics including missingness and mean depth of this dataset are shown in Supplementary Figure 2. For some downstream analyses additional filters were applied and are described below and listed in Supplementary Table 1.

Analyses of genetic structure
We analyzed genetic structure of 211 E. typhina and 211 E. clarkii isolates using three approaches that provide insight into allele frequency variation (see McVean, 2009): Principal component analysis (PCA), Bayesian clustering analysis (STRUCTURE) and discriminant analyses of principal component (DAPC, for details see Supplementary Notes). PCA, was implemented in the R package SNPRelate (Zheng et al., 2012). PCAs were done for the complete SNP datasets, LD-pruned datasets (see below) and datasets with no missing data. We repeated analyses after removing the MdR population from the datasets, as these individuals are so genetically distinct in both species that they obscure the genetic structure of the other populations. We also visualized the amount of sub-structure within each population using PCA (see Supplementary Notes: Population substructure). For the STRUCTURE analysis, we applied a minor allele frequency filter of 0.05 using vcftools (-maf 0.05) to exclude rare alleles and singletons, because they can strongly impact clustering by STRUCTURE (Linck and Battey, 2019). Bayesian assignment assumes that loci are independent, so we further removed SNPs that are in linkage disequilibrium (LD), pruning variants with a pairwise r 2 greater than 0.3 in 10 kb windows using bcftools v1.9. The LD-pruned datasets included 152,883 biallelic SNPs in E. typhina and 79,352 SNPs in E. clarkii. STRUCTURE v2.3.4 (Pritchard et al., 2000), was run for K = 1-12 with 20,000 Monte Carlo Markov Chain (MCMC) iterations following a 10,000 iteration burn-in for each of twelve replicate runs per K. We used an admixture ancestry model with correlated allele frequencies and no prior information about the demography. The output of the STRUCTURE analysis was processed using STRUCTURE HARVESTER (Earl and vonHoldt, 2012). The model with the maximized rate of change in log likelihood values ( K) was considered the optimal number of clusters based on methods described by Evanno et al. (2005). We assessed probability of assignment and genetic admixture among populations by plotting the proportion of membership to each of the inferred ancestry clusters for every individual, using the optimal model parameters (the best run) for each K.

Summary statistics of genetic variation
Population genetic summary statistics were calculated with vcftools v1.16, using a patch that allows computation with haploid datasets. 3 Calculations were performed for the seven sympatric populations only (additional samples from allopatric locations were removed), leaving 191 individuals in E. typhina and 208 individuals in E. clarkii. After removing "additional" isolates we again removed any monomorphic variants. Populations sampled in different years from the Aubonne location (Aub and AubX) were analyzed as separate populations.

Nucleotide diversity π, Tajimas D
We quantified within-population genetic diversity by calculating nucleotide diversity in 10 kb non-overlapping windows [π, the average number of differences between individuals (Takahata and Nei, 1985)], and Tajima's D in 10 and 40 kb nonoverlapping windows (Tajima, 1989). We calculated the average nucleotide diversity and Tajima's D per population as the mean across all windows. Within each species we tested if population means were significantly different from each other by performing an ANOVA and least significant difference (LSD) test.

F ST
To assess relative genetic differentiation between populations we calculated pairwise fixation indexes F ST (Weir and Cockerham, 1984), per site between all populations using vcftools (haploid switch, see above). We then calculated mean F ST in nonoverlapping windows of 5 kb (10 kb) along each chromosome using R version 4.0.2 (R Core Team, 2013), and genome-wide weighted mean F ST between population pairs across all SNPs. To investigate patterns of isolation-by-distance (IBD) in E. typhina and E. clarkii we tested for a positive correlation between genetic distance (the mean weighted F ST ) and geographic distance. We used the R package lme4 (Bates et al., 2015), to fit a linear mixed-effects model to the data with populations as random effects and distance as a fixed effect. We also calculated divergence as the number of fixed differences between population pairs (SNPs with an F ST of 1).

Analysis of linkage disequilibrium
We removed loci with a minor allele frequency below 0.05 (within a population) because differences in allele frequencies can (upwardly) bias estimation of LD. We calculated the coefficient of correlation (r 2 ) between pairs of SNPs that were up to 20 kb apart using plink (v. 1.90) and calculated mean LD within 100 bp windows. We visualized the decay of r 2 over physical distance using a locally weighed polynomial regression (LOESS) model using the function geom_smooth in the R package ggplot2 (Wickham, 2016).

Effective population size N e
To compare effective population sizes within and between species we estimated effective population sizes (N e ) from polymorphism data with the parameter θ = 2N e µ (for more details see Supplementary Notes: Calculation of effective population size). Estimates were based on Watterson's estimators of θ (Watterson, 1975), which is based on the number of segregating sites in a population relative to the sequence length, and an assumed mutation rate (µ) of 3.3 × 10 −8 per site. This mutation rate is based on estimates in yeast (Lynch et al., 2008), and has been used in other studies (Stukenbrock et al., 2011). The number of segregating sites per populations can be found in Supplementary  Table 2. From this we calculated the proportion of segregating sites within each population relative to the total variation among all populations, as a measure of genetic diversity comparable between species (Supplementary Table 3).
We also analyzed the distribution of genetic variation separately for distinct genomic compartments in the Epichloë genomes.

Genetic structure of populations of two Epichloë species
After removal of isolates with low coverage and replicates we retained genome-wide polymorphic sites for 211 isolates in each of the species. We obtained 658,021 biallelic SNPs in the E. typhina dataset and 400,033 SNPs in the E. clarkii dataset after initial filtering (complete dataset, Supplementary Table 1).
We found that population structure of the two species reflected the geographic origin of sampled populations. Looking at the PCA first, the most obvious result was that in both species the Spanish population was highly differentiated (Supplementary Figures 3A,  D). In order to better visualize the genetic structure in the PCA plots we removed the Spanish population (Figures 2, 3). In E. typhina, the first PCA axis represents the West-to-East distribution (explaining 3.93% of the variation), and the second axis represents the South-to-North distribution (2.58% variation explained, Figure 2B). Spatially intermediate "additional" samples (Bret and Other 1-4) were also genetically intermediate. In E. clarkii the clustering was more distinct, there were seven clusters each corresponding to a single geographic population, with the exception of the Kew population, which consisted of two genetically distinct groups (Figure 3B). In both species, isolates sampled 13 years apart from the same Western Switzerland location did not form genetically distinct groups (AubX and Aub sampled 2005 and 2018), suggesting little genetic differentiation has occurred during this time. These patterns were robust to different filtering approaches as outlined in the methods section (complete, LD pruned, no missing data) (Supplementary Figures 3, 4). We also visualized the genetic structure using PCA of each population separately and the two populations that were the most spread out in principal component space (E. typhina Cev and E. clarkii Kew) also showed evidence of sub-structure (see Supplementary Notes and Supplementary Figures 5, 6).
The STRUCTURE results were largely consistent with the PCA as expected. In both species the Spanish population was assigned to a distinct cluster (K) irrespective of the inferred number of clusters (K = 2 to K = 12), and there was no genetic differentiation between isolates sampled at different time points in the population from Aubonne (AubX and Aub sampled 2005 and 2018). Similar to the PCA results, genetic differentiation was more pronounced in E. clarkii than E. typhina. In E. typhina not all populations formed distinct genetic clusters. The most likely number of clusters was K = 6 as inferred by the maximized likelihood and K values across twelve replicate runs (Figures 2A, C, D). At K = 6, only two of the six clusters contained all isolates from a single location (red: MdR, green: So), whereas three populations were predominantly assigned to the same cluster. Isolates from most locations appeared to have ancestry from the same orange cluster, to varying degrees. Isolates from more Northern locations (El, Kew, Bret) and three isolates from Plymouth showed the highest ancestry proportion to this cluster. The exception was isolates from Cev, the majority of which had genotypes assigned to two distinct clusters (yellow and dark green) but also contained admixed isolates. Additionally, the population from Western Switzerland contained seven isolates assigned to a sixth cluster (purple), three of which were sampled in 2005 (AubX) and four in 2018 (Aub) (see Supplementary Figure 7), but the majority of isolates had mixed ancestry. A number of isolates from Aub and Cev as well as six additional isolates collected in the Southeastern tip of France showed a higher proportion of mixed ancestry with membership proportion <0.5 for any of the inferred clusters (gray symbols in PCA). The relationship among genetic clusters and location did not get much clearer at K = 7.
In E. clarkii the clustering was more distinct and the number of genetic clusters matched the number of sampling locations and nearly all clusters consisted of isolates from a single sampling location. The exceptions were populations from Southern France (Cev) and Southern Switzerland (So), which were assigned to Population structure analyses of 211 E. typhina isolates based on genome-wide SNPs. The LD pruned dataset including 152,883 SNPs was used in the STRUCTURE analyses whereas the complete dataset of 658,021 SNPs was used for the PCA. (A) Sampling locations of E. typhina isolates infecting D. glomerata; pie charts represent Bayesian cluster membership proportion for K = 6, diameters reflect sample size (range n = 3-33). (B) Principal component analysis of genetic differentiation among isolates excluding the population from Spain (MdR); the percentage of variance explained by the first two principal components is shown in parentheses; symbols indicate sampling locations of populations and colors correspond to the genetic clusters at K = 6 assigned to each individual by Bayesian clustering for isolates with ≥0.5 membership to a single cluster; gray symbols are isolates not assigned to any cluster when using the threshold of 0.5. (C) Delta K plot of the STRUCTURE Bayesian clustering analyses for Ks of 2-12. (D) Bar plots of ancestry membership proportions (y-axis) as inferred by STRUCTURE for K = 2, K = 6, and K = 7; each isolate displayed as a vertical bar and populations are indicated on the x-axis; each color represents a cluster and isolates are ordered according to longitude (West to East); sample sizes are: MdR = 33, Bret = 4, Kew = 14, Auv = 27, Cev = 31, Aub = 22, AubX = 16, El = 26, So = 22, Other 1-4 = 3 + 3 + 6 + 4. For code and origin of populations see Table 1.
the same cluster (Figures 3A, C, D). At lower and higher Ks (K = 5, K = 6, K = 8, Supplementary Figure 8), however, these two populations were assigned to distinct clusters. The population collected at Kew Gardens (Kew) consisted of genotypes from two distinct clusters as well as a single admixed isolate. Although Central European populations contained a higher number of individuals with mixed ancestry, admixture proportions were lower than in E. typhina. Overall, ten isolates were not assigned to a cluster as they showed <0.5 membership proportion to any single cluster. Of these admixed isolates, all except for three genotypes collected in Western France (Bret) and the admixed isolate from Kew shared the majority of their ancestry with their population of origin. Population structure analyses of 211 E. clarkii isolates based on genome-wide SNPs. The LD pruned dataset including 79,352 SNPs was used in the STRUCTURE analyses whereas the complete dataset of 400,033 SNPs was used for the PCA. (A) Sampling locations of E. clarkii individuals infecting H. lanatus and H. mollis (Kew and Bret); pie charts represent Bayesian cluster membership proportion for K = 7, diameters reflect sample size (range n = 3-30). (B) Principal component analysis of genetic differentiation among isolates excluding the population from Spain (MdR); The percentage of variance explained by the first two principal components is shown in parentheses; symbols indicate sampling locations of populations and colors correspond to the genetic clusters at K = 7 assigned to each isolate by Bayesian clustering for isolates with ≥ 0.5 membership to a single cluster; gray symbols are isolates not assigned to any cluster when using the threshold of 0.5. (C) Delta K plot of the STRUCTURE Bayesian clustering analyses for Ks of 2-12. (D) Bar plots of ancestry membership proportions (y-axis) as inferred by STRUCTURE for K = 3, K = 7, and K = 8; each isolate displayed as a vertical bar and populations are indicated on the x-axis; each color represents a cluster and isolates are ordered according to longitude (West to East); sample sizes are: MdR = 29, Bret = 3, Kew = 19, Auv = 30, Cev = 28, Aub = 27, AubX = 18, El = 27, So = 30. For code and origin of populations see Table 1.  Table 2). Population pairs with the highest F ST values also had the highest number of fixed differences between them, and overall, there were more fixed differences between E. clarkii populations than E. typhina (Supplementary Table 4). We found a positive correlation between genetic differentiation and geographic distance consistent with a pattern of isolation-by-distance (IBD) in both species (Figure 4). A comparison of the slopes of the two linear models showed that the effect of IBD was stronger in E. clarkii compared to E. typhina with higher genetic differentiation between  Isolation by distance among Epichloë populations. Relationship between pairwise F ST and geographic distance (air distance between population pairs in km) for E. typhina (green) and E. clarkii (purple). Correlations for both species were significant (E. typhina: p > 0.001; E. clarkii: p > 0.001) with increasing geographic distance having a stronger effect in E. clarkii p > 0.001.

E. typhina has higher genetic diversity within populations
Mean genome wide nucleotide diversity (π) ranged from 0.0024 to 0.0027 in E. typhina populations and from 0.0008 to 0.0012 in E. clarkii populations (Supplementary Figure 9,  Supplementary Table 5). This was consistent with a previous study which found higher diversity in E. typhina compared to E. clarkii in one sympatric population pair (Schirrmann et al., 2018). Mean Tajima's D was not significantly different from zero across populations of E. typhina, ranging from −0.369 to −0.061 Figure 10, Supplementary Table 6). In E. clarkii mean Tajima's D was also not different from zero, but there was greater variation between populations ranging from −0.354 to 1.318. Kew and MdR had the highest values with 0.903 (± 0.05) and 1.318 (± 0.05), respectively, and these were significantly different from other population means (see Supplementary Table 6).

Analysis of linkage disequilibrium
Linkage Disequilibrium (LD) decayed quickly with distance, r 2 < 0.2 in 1-2 kb in all E. typhina populations and most E. clarkii populations indicating ongoing recombination (Supplementary Figure 11). Two populations in E. clarkii were exceptions (Supplementary Figure 12): In Kew, LD decayed more slowly with distance, but followed the expected pattern, in contrast in the Spanish population MdR LD began to increase with distance from 200-7000 bp, before declining again.

Higher effective population size N e in E. typhina
We used Watterson's estimators of θ (Watterson, 1975) and an approximate mutation rate of 3.3 × 10 −8 per site to estimate and compare effective population sizes within and between Epichloë species. Note that the estimates of N e inferred here contain an uncertainty given that the mutation rate was not directly estimated in this system. Actual mutation rates may be lower which would lead to higher estimates of N e . Effective population sizes in E. typhina were between twofold and fourfold larger than in E. clarkii populations, which also had more variance in N e estimates among populations (Table 3: Estimates of effective population size N e ). In E. typhina the population from Western Switzerland (Aub) had the largest N e (43,941) and the British population (Kew) had the smallest (37,034). In E. clarkii the population from Southern Switzerland (So) had the largest N e (18,809) and the Spanish population (MdR) had the smallest (9,251).

Discussion
Our parallel study of two closely related fungal pathogens revealed that even ecologically similar sympatric species can experience distinct evolutionary contexts. Despite the two species sharing a highly specialized host-sterilizing lifestyle, we identified striking differences in their genetic differentiation, genetic variation and effective population size. This has implications for adaptive evolution in these species. In the following we discuss the effects of gene flow on genetic structure and relate this to differences in the pathogen's biology and phylogeographic history.

Different levels of gene flow among spatially structured populations
In our analysis of genome-wide variation we found a clear pattern of biogeographic structure in both E. typhina and E. clarkii. Spatially separated populations were genetically differentiated, and there was evidence of clinal variation across the distribution. In both species genetic distance increased with geographic distance and the most remote population in Spain was genetically very distinct. Despite the similarities in population structure at a broad scale, patterns of gene-flow among populations were different between the two species. E. typhina populations had higher genetic diversity and less differentiation between populations, indicating greater gene flow particularly among Central European and Northwestern populations. In contrast, we found lower diversity and higher levels of genetic differentiation among E. clarkii populations and a stronger effect of isolation by distance. Reduced gene flow and greater isolation between populations of E. clarkii could result in these populations experiencing greater effects of drift. The lower genetic diversity and lower effective population size within E. clarkii populations found here supports this hypothesis.
In addition, comparison of genome organization and size between the reference genomes of the two species suggested that genome expansion in E. clarkii compared to E. typhina may be linked to a reduction in effective population size leading to stronger effects of drift compared to E. typhina (Treindl et al., 2021b). Of the total variation assessed in each species, a similar proportion of sites were segregating within most populations indicating that the two species likely have similar mutation rates (Supplementary Table 3). In spite of a lower overall diversity, E. clarkii populations still harbor a considerable amount of standing genetic variation which is not surprising given the obligately outcrossing lifestyle of the pathogen. Indeed, most populations showed a rapid LD decay similar to E. typhina populations reflecting frequent sexual recombination. The exceptions were two populations, Kew and MdR, which were found to have significantly less genetic diversity and high levels of linkage disequilibrium. Variation in LD across all loci is driven by multiple processes, including population substructure, bottlenecks and admixture. However, if high LD is coupled with low genetic diversity, it is unlikely the result of admixture, as this would increase genetic diversity. In Kew, LD decayed more slowly with distance compared to central European populations, a slower decay and lower genetic diversity is most likely the result of a bottleneck and/or increased prevalence of asexual reproduction. In MdR LD decayed with distance until ∼200 bp, but then increased with distance until ∼7 kb. Long range LD (e.g., between loci on different chromosomes) could be the result of a recent strong bottleneck (founder effect), high prevalence of asexual production or admixture. As genetic diversity was also lower in the MdR population we think the observed pattern of LD is unlikely to be driven by admixture. Further genetic studies of these populations are needed to determine what is driving these patterns.
Although Tajima's D values in these two populations were higher than in other populations, the genome-wide means were not significantly different from zero and so did not support evidence of past reductions in population size or a lack of low frequency of alleles. Nevertheless, such demographic processes affecting genome-wide variation will need to be taken into consideration in future studies when investigating the impact of selection in these populations.

Pathogen dispersal ability affects geographical and genetic structure
The distribution of Epichloë pathogens is determined by the range and availability of suitable host plants, and the ability of pathogens to reach hosts and establish an infection primarily by means of ascospore dispersal. While both host grass species are widespread across our study area and often co-occur, the abundance of their obligate pathogens differs and Epichloë infections are not everywhere: E. clarkii can be very common locally, such is the case of sympatric populations sampled in this study, but generally has a patchier distribution, whereas infections of E. typhina on D. glomerata are more widely distributed. The population genetic structure would suggest differences in the effective long-range dispersal of these species. We hypothesize that differences in dispersal are related to their spore sizes. E. typhina produces long, thread-like spores which may facilitate efficient long-distance dispersal by wind, whereas E. clarkii produces shorter and wider part-spores (White, 1993;Leuchtmann and Clay, 1997). Higher numbers of part-spores may be more dispersed over shorter distances and increase disease pressure locally. Different dispersal abilities could explain the differences in pathogen distribution and are consistent with our genetic data, suggesting larger population sizes and higher levels of gene flow in E. typhina compared to E. clarkii populations.
Another type of propagule produced by the fungus that could potentially mediate dispersal are spermatia (mitotic spores formed on reproductive tissue) or conidia (formed on vegetative tissue). Spermatia are carried by Botanophila flies among fungal reproductive structures (stromata) within populations but are unlikely to be wind-dispersed and cause new infections in distant host plants (Leuchtmann and Clay, 1997). Spermatia or conidia may also cause chance infections of neighboring plants within a population mediated by slugs or rain splashes (Tadych et al., 2007;Hoffman and Rao, 2013). Even if such local dispersal occurred, we would expect it to have similar effects on E. typhina and E. clarkii and it is unlikely to affect the geographical structuring of genetic variation at the scale observed in this study.

The impact of host distribution and joint phylogeographic history
Genetic variation in specialized obligate pathogens such as E. typhina infecting D. glomerata and E. clarkii infecting H. lanatus is expected to be strongly influenced by the distribution and genetic structure of their host plants (Wilson et al., 2005;Greischar and Koskella, 2007). Here we provide some discussion about the hosts, first considering their current distribution and secondly considering the more distant past. Both D. glomerata and H. lanatus are native and widely distributed plants in grasslands across Europe and share similar long-distance pollen dispersal mediated by wind whereas seeds usually disperse more locally (Düll and Kutzelnigg, 2016). Our sampling was conducted in natural grassland ecosystems where the impact of anthropogenic use is likely minimal. However, sowing for pasture farming or forage production could have differentially affected the genetic make-up of the two host grass populations. H. lanatus is not commercially sown due to its low nutritional value as a forage grass and may even be considered a weed, whereas D. glomerata cultivars are frequently sown as part of commercial forage grass mixtures. Active sowing of D. glomerata may result in an overall large effective population size and more continuous distribution and this could explain why populations of its pathogen E. typhina also have more gene flow and higher diversity. Considering the more distant past, the observed genetic structure could reflect the joint phylogeographic history of pathogens and hosts dating back to the last ice age. In particular, the existence of three genetically distinct southern ice-age refugia (Iberia, Italy and the Balkans) and subsequent northward recolonization as ice-sheets retreated is a well-established hypothesis supported by phylogeographic studies in a growing number of temperate species including grasses (Hewitt, 2000;Fjellheim et al., 2006;McGrath et al., 2007;François et al., 2008;Provan and Bennett, 2008;von Cräutlein et al., 2019). It also supports population genetic patterns found in a host-pathogen system. In the anther smut pathogen Microbotryum lynchnidisdioicae and its white campion host (Silene latifolia) three genetically differentiated clusters were identified that reflected the concerted recolonization of the host plant and its fungal pathogen from distinct southern glacial refugia (Vercken et al., 2010;Feurtey et al., 2016;Badouin et al., 2017). Our results do not indicate obvious footprints of distinct glacial refugia, however, this may be because we did not sample the entire distribution. Studies spanning a larger geographic area, including Eastern and Southern European populations would be better suited to test for such patterns.

Additional host species confirmed for E. clarkii
In our sampling of these natural populations of E. clarkii, we found two locations in the UK and in Northern France where the pathogen occurred on the host grass Holcus mollis. This was surprising given that E. clarkii is considered to be highly specialized on H. lanatus and, moreover, H. mollis is the host of a distinct Epichloë species, E. mollis (Leuchtmann et al., 2014). However, a literature review revealed that this finding indeed backs up previous observations from the UK, where E. clarkii was identified on H. mollis based on its ascospore morphology (Spooner and Kemp, 2005). We now provide additional evidence with genetic data and, for the first time, report H. mollis as an E. clarkii host on European mainland. It needs to be noted that hybrids between H. lanatus and H. mollis have been described (Holcus x hybridus Wein), which usually show morphologically intermediate phenotypes but tend to resemble H. mollis more closely (Beddows, 1961;Carroll and Jones, 1962). In our samples, species identification of the putative H. mollis host grasses was based on the presence of orthotropic shoots/rhizomes indicating clonal propagation and the presence of longer hairs on the nodes compared to H. lanatus. Characters of the inflorescence could not be considered since they were encased in the fungal fruiting structure. Although these morphological characters match well with descriptions of H. mollis, we cannot entirely exclude that samples originate from hybrids, and this will need to be investigated further. In any case, this finding opens up new avenues to investigate the evolution of host specialization and divergence within the Epichloë study system.

Conclusion
The comparison of genetic structure within and between populations of E. typhina and E. clarkii indicate distinct evolutionary histories of the two species which could be due to differences in dispersal and/or the underlying genetic structure of host populations. Our data suggests that E. typhina infecting D. glomerata is a more ubiquitous pathogen with higher dispersal abilities and E. clarkii infecting H. lanatus and H. mollis forms more fragmented populations restricted by lower dispersal abilities. These differences could influence the efficacy of selection and the local adaptation in these two species. E. clarkii with its lower effective population size is more likely to experience metapopulation dynamics with frequent local extinction and recolonization events which may lead to loss of genetic diversity within populations and generate genetic differences between populations through bottlenecks and genetic drift. Future work should investigate signatures of adaptive variation in these two pathogens in order to disentangle effects of drift and selection. This study and dataset presented here are the foundations for understanding coevolution and adaptation in the Epichloë study system and provides novel insights into the population structure and patterns of gene flow. It demonstrates how the biology of these species and the interaction with their host plants may influence evolutionary trajectories and helps to increase our understanding of the population genetic processes that generate and maintain diversity of fungal pathogens in natural ecosystems.

Data availability statement
The datasets presented in this study can be found in online repositories. The name of the repository and accession numbers can be found below: ENA; PRJEB59262 and ERR10815001-ERR10815422.

Author contributions
AL and AT conceived the study design. AT and JS developed methodology and did the formal analyses. AT wrote the initial draft. AT, JS, and AL reviewed and edited the manuscript. AL acquired funding. All authors read and agreed to the submitted version of the manuscript.

Funding
This study was supported by the Swiss National Science Foundation (SNSF), Grant Number: 31003A_169269.