The Population Genetics of Alternaria tenuissima in Four Regions of China as Determined by Microsatellite Markers Obtained by Transcriptome Sequencing

A total of 32,284 unigenes were obtained from the transcriptome of Alternaria tenuissima, a pathogenic fungus causing foliar disease in tomato, using next-generation sequencing (NGS) technology. In total, 24,670 unigenes were annotated using five databases: NCBI non-redundant protein, Swiss-Prot, euKaryotic Orthologous Groups, Kyoto Encyclopedia of Genes and Genomes, and Gene Ontology. A total of 1,140 simple sequence repeats were also identified for use as molecular markers. Sixteen of the simple sequence repeat loci were selected to study the population structure of A. tenuissima. A population genetic analysis of 191 A. tenuissima isolates, sampled from four geographic regions in China, indicated that A. tenuissima had a high level of genetic diversity, and that the selected simple sequence repeat markers could reliably capture the genetic variation. The null hypothesis of random mating was rejected for all four geographic regions in China. Isolation by distance was observed for the entire data set, but not within clusters, which is indicative of barriers to gene flow among geographic regions. Bayesian clustering and principal coordinates analyses, however, did not separate the four geographic regions into four separate genetic clusters. The different levels of historical migration rates suggest that isolation by distance did not represent a major biological obstacle to the spread of A. tenuissima. The potential epidemic spread of A. tenuissima in China may occur through the transport of plant products or other factors. The presented results provide a basis for a comprehensive understanding of the population genetics of A. tenuissima in China.


INTRODUCTION
Alternaria tenuissima is an important global pathogen on a large variety of economically important crops, including broad bean, tomato, sunflower, potato, watermelon, and muskmelon (Rahman et al., 2002; Agamy et al., 2013; Wang et al., 2014; Zheng et al., 2015; Zhao et al., 2016a,b). The pathogen affects the above-ground parts of the crops, and is the causal agent of early blight, stem canker, and some fruit rots (Abdelfattah et al., 2016; Bessadat et al., 2017). Foliar disease outbreaks caused by A. tenuissima are primarily epidemic and are especially devastating on tomato leaves (Agamy et al., 2013; Bessadat et al., 2017). High humidity and fairly high temperatures can lead to severe epidemics in tomato-growing regions (Bessadat et al., 2017). Although environmental conditions vary significantly among crop production regions, the infection spreads rapidly once established. In fact, sporadic epidemic transmission is the main factor responsible for the high frequency of occurrence of this disease (Agamy et al., 2013; Meng et al., 2015). The increasing frequency of A. tenuissima outbreaks has affected the distribution of Alternaria species responsible for causing foliar diseases (Zheng et al., 2015; Zhao et al., 2016a,b). The extent of genetic variation and spatial distribution in A. tenuissima associated with tomato foliar diseases in China, however, remains largely unknown.
Genetic variation in a species results from evolutionary events, including drift, migration, type of mating system, selection, and mutation, all of which are influenced by human activity and natural events (Wright et al., 2004;Zhan and McDonald, 2013). For example, indiscriminate use of fungicides in agro-ecosystems can increase the rate of mutation and impact the virulence and aggressiveness of a pathogen (Piotrowska et al., 2016). The artificial dissemination of a pathogen also affects migration or selection in pathogen populations and causes a change in natural ecosystems. Emergence of a sexual stage also plays an important role in dispersion (Artero et al., 2016). Species are expected to exhibit a greater level of genetic variation in response to environmental changes that increase their ability to adapt (McDonald and Linde, 2002;Meng et al., 2015). The genetic structure of a species determines the evolutionary potential of pathogen as it reflects the level of allelic diversity available to selection pressures (McDonald, 1997;Parker and Gilbert, 2004). A comprehensive understanding of the genetic structure of a species of pathogen is essential for managing disease occurrence and developing sustainable management practices (Miller et al., 2003).
Molecular markers are a reliable tool for assessing genetic variation and inferring mating systems in fungal populations (Santha Lakshmi Prasad et al., 2009;Stewart et al., 2011). To date, however, relatively few molecular markers, such as random amplified polymorphic DNA (RAPD) and amplified fragment length polymorphism (AFLP), have been reported for Alternaria spp. (Morris et al., 2000;Gannibal et al., 2007). Additionally, most molecular markers do not capture genetic structure with the degree of resolution and reliability that is provided by simple sequence repeats (SSRs) or microsatellites. SSRs are tandem repeat motifs of 1-6 bases that are abundantly spread throughout eukaryotic genomes and reflect genetic diversity (Andeden et al., 2015).
Genetic SSRs occur in the coding and regulatory regions of genes (Zheng et al., 2013), while genomic SSRs are in noncoding regions of the genome. Genetic SSRs have the advantage of being more highly conserved and thus more transferable across species (Chen et al., 2015). Although the identification of genetic SSRs is less expensive and time-consuming than identifying genomic SSRs, little information is available on genetic SSR markers in A. tenuissima. Due to their utility, it would be useful to identify SSR loci in A. tenuissima from transcriptomic data and design primer pairs that could be used to identify these SSR loci in genetic analyses of A. tenuissima.
A high level of genetic variation has been reported in A. alternata, A. brassicicola, and A. solani suggesting that a cryptic sexual stage dominates in these Alternaria species (Morris et al., 2000;Bock et al., 2005;Meng et al., 2015). High levels of genetic variation and evidence of sexual reproduction has also been reported in A. tenuissima isolated from wheat in Russia (Gannibal et al., 2007). Little information is available, however, on the population structure of A. tenuissima in tomato. Therefore, in the present study, a transcriptome of A. tenuissima was sequenced using next-generation sequencing technology (NGS) and used to identify large numbers of SSRs. This was done to determine the level of genetic diversity in A. tenuissima populations in China and infer the main evolutionary factors influencing the epidemic outbreaks of A. tenuissima. The distribution of SSR motifs in the transcriptome of A. tenuissima was characterized and the assembled unigenes were functionally annotated.

MATERIALS AND METHODS
Sample Collection and Fungal Populations
A total of 191 A. tenuissima isolates were collected during 2015 and 2017 from 34 sampling locations in China (Supplementary Table S1). The isolates were obtained from tomato leaves exhibiting typical symptoms of foliar disease. The 34 sampling locations from twelve provinces, autonomous region, or municipality were organized into four tomato cropping regions based on geography, climate, and agricultural management (Bernardes-de-Assis et al., 2009). They were designated Northeastern China (Heilongjiang, Jilin, and Liaoning Provinces), Northern China (Hebei, Shanxi Provinces, and Beijing Municipality), Eastern China (Anhui, Fujian, Jiangxi, and Zhejiang Provinces), and Northwestern China (Ningxia Hui Autonomous Region and Gansu Province). These groupings represent the four major tomato-cropping regions in China (Figure 1). The four geographic regions are separated from each other by more than 500 km.
Isolates were identified using the standard procedures reported in our previous study (Zheng et al., 2015). The procedure includes both morphological characteristics and molecular analyses. The collected isolates were transferred to potato carrot agar (PCA) plates and grown for 7 days at 25 °C with an 8 h light/16 h dark photoperiod to characterize their growth and conidia morphology. Genomic DNA was extracted from the A. tenuissima isolates using a cetyltrimethylammonium bromide (CTAB) procedure and used for molecular identification and additional SSR assays (Lee and Taylor, 1990). For the molecular analysis, partial coding sequences of the histone 3 gene and the internal transcribed spacer (ITS) region of ribosomal DNA (rDNA) were amplified from the extracted genomic DNA using the primer sets H3-1a/H3-1b and ITS1/ITS4, respectively (Glass and Donaldson, 1995). The PCR amplification products were shipped to Beijing TSINGKE Biotechnology Co. Ltd. (Beijing, China) for sequencing. The obtained sequence data were used to conduct BLAST searches using BLASTn on the NCBI website (footnote 1) to identify the Alternaria species.

cDNA Library Construction and Illumina Sequencing
One isolate, "BJ319-1", was randomly selected from among the 191 collected A. tenuissima isolates. The mycelia from a culture of "BJ319-1" grown on potato dextrose agar (PDA) plates for 7 days were harvested for the isolation of total RNA and subsequent transcriptome analysis. Total RNA was extracted using TRIzol reagent (Ambion, Thermo Fisher Scientific, United States). Any traces of DNA were then removed from the RNA extracts using DNase I (TaKaRa, Japan). The purity of the RNA extract was determined using a Nano-Drop 2000 (Thermo Fisher Scientific, United States). Qubit 2.0 (Life Technologies, United States) and an Agilent 2100 Bioanalyzer (Agilent Technologies, United States) were used to estimate the concentration and integrity of the total RNA. The cDNA library of pooled RNA was constructed with a method described by Li et al. (2014), with minor modifications. The resulting A. tenuissima cDNA library was sequenced using an Illumina HiSeq 4000 sequencing platform at Beijing Biomarker Technologies Co., Ltd. (Beijing, China).
Footnote 1: https://blast.ncbi.nlm.nih.gov/Blast.cgi

De novo Assembly and Unigene Annotation
To obtain high-quality reads, raw sequences from the Illumina sequencing were filtered using Trimmomatic, a flexible read-trimming tool for Illumina NGS data (Bolger et al., 2014). After filtering out low-quality reads, the resulting clean reads were deposited into the Sequence Read Archive (SRA) database under the accession number SRP136412. Subsequently, clean reads were assembled using Trinity (Haas et al., 2013).
To annotate the A. tenuissima transcriptome, unigenes were searched against various databases, including NCBI non-redundant protein (Nr protein), Swiss-Prot, euKaryotic Orthologous Groups (KOG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) (Altschul et al., 1997;Cameron et al., 2004). Blast2GO software (Conesa et al., 2005) was used to assign the Gene Ontology (GO) terms to the unigenes. All unigene annotations were performed using the method of Wu et al. (2014).

Development of SSRs and Primer Design
SSR loci were identified from the A. tenuissima transcriptome sequence data using MISA (MIcroSAtellite identification tool) and SAMtools (Li et al., 2009). The minimum number of repeats was defined as ten for mono-nucleotide repeats, six for di-nucleotide repeats, five for tri-nucleotide repeats, and three for tetra-, penta-, and hexa-nucleotide repeats. Subsequently, SSR primers were designed using Primer Premier 5.0 software (PREMIER Biosoft International, Palo Alto, CA, United States). Based on the methodology described by Chen et al. (2015), the criteria used for designing the primers were: primer length of 16-22 bp, PCR product size of 100-300 bp, annealing temperature of 40-60 °C, and GC content of 40-60%.
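The repeat thresholds above can be illustrated with a short sketch. This is not MISA itself; it is a minimal regular-expression scan for perfect repeats, assuming uppercase A/C/G/T input, and the function names `find_ssrs` and `is_reducible` are hypothetical. Compound or interrupted SSRs are not handled.

```python
import re

# Minimum repeat counts per motif length, taken from the thresholds in the text.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 3, 5: 3, 6: 3}


def is_reducible(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'AGAG')."""
    n = len(motif)
    return any(
        n % d == 0 and motif[:d] * (n // d) == motif
        for d in range(1, n)
    )


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # One motif copy captured, followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_reducible(motif):
                continue  # already reported under the shorter motif, if long enough
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical test sequence: a 12-mer A run, (AG)6 and (ACG)5.
    example = "TTG" + "A" * 12 + "CGC" + "AG" * 6 + "TTC" + "ACG" * 5 + "TT"
    for start, motif, count in find_ssrs(example):
        print(start, motif, count)
```

Skipping reducible motifs keeps a mono-nucleotide run from being reported again as a di- or tetra-nucleotide repeat.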

SSR Assays of A. tenuissima Populations
To further characterize the population genetics of the A. tenuissima populations, primer pairs were synthesized for 16 SSRs. These SSRs were selected as suitable markers for subsequent analyses based on preliminary tests (Table 1). The forward primers were separately labeled with a fluorescent dye (Dye set: FAM, ROX, TRMA, HEX; Applied TSINGKE Biotechnology Co. Ltd.) at the 5′ end. PCR amplifications were performed in a 25 µL PCR mixture that included 1 µL DNA template (100 µg mL⁻¹), 9.5 µL ddH2O, 12.5 µL 2× T5 Super PCR Mix (TSINGKE Biotechnology Co. Ltd.), and 1 µL each of the two primers (10 µM). The amplification was conducted in an Eppendorf Mastercycler® using the following protocol: an initial denaturation step at 95 °C for 5 min, followed by 35 cycles of denaturation at 94 °C for 30 s, annealing at 57 °C for 30 s, and extension at 72 °C for 30 s, with a final extension for 5 min at 72 °C. The obtained amplicons were then sequenced using an ABI 3730 DNA sequencer (Applied Biosystems).

Population Genetics
Alleles were aligned using GeneMarker v.2.2.0 software (SoftGenetics, State College, PA, United States). Sequenced fragments with an identical size originating from the same primer pair were considered as an allele. Multilocus genotypes, defined as having the same alleles at each of the single SSR loci, were detected using GenClone v.2.0 (Arnaud-Haond and Belkhir, 2007). Isolates with the same multilocus haplotype were considered as the asexual progeny of a genotype.
Genotypic diversity, gene diversity (Nei, 1973), allelic richness, and clonal fraction (CF) were used to evaluate genetic variation for each of the assigned geographical groups and the pooled geographic regions (Zhan et al., 2003). The genetic variation data, except for CF, were calculated using POPGENE v. 1.32 according to the method of Meng et al. (2015). The Shannon index was computed to estimate genotypic diversity (Grünwald et al., 2003). CF, defined as the proportion of isolates resulting from asexual reproduction (Zhan et al., 2003), was calculated as 1 − (number of genotypes/number of isolates assayed). The Ewens-Watterson test was used to evaluate the selective neutrality of the SSR markers (Ewens, 1972; Watterson, 1978). The index of population differentiation (F_ST), the summary heterozygosity (H) for each locus (Nei, 1973; Nei and Chesser, 1983), and the test for selective neutrality were also computed with POPGENE v. 1.32 (Yeh et al., 1997). Finally, to infer the possibility of random mating in each of the geographic regions, MULTILOCUS v. 1.3 was used to test the null hypothesis of random mating using the index of association (I_A) and the multilocus linkage disequilibrium value (r_d), with 1,000 randomizations, according to Hemmati et al. (2009). If the values of I_A and r_d are not significantly different from the expected value of 0, the population is consistent with random mating (Brown et al., 1980). Because I_A depends on the number of loci included, the modified statistic r_d was used to supplement it. The proportion of compatible pairs of SSR loci (PrCompat) was also calculated using MULTILOCUS v. 1.3 (Agapow and Burt, 2001). Two SSR loci are compatible if all the observed genotypes can be explained by mutation rather than recombination (Estabrook and Landrum, 1975). STRUCTURE v. 2.3.4 (Pritchard et al., 2000) was used to analyze population structure and test for admixture.
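To make the relationship between the two association statistics concrete, the sketch below computes I_A and r_d for haploid multilocus genotypes and adds a simple randomization test. It is not the MULTILOCUS program; it follows the pairwise-distance definitions attributed to Agapow and Burt (2001) as understood here, and the function names and the toy genotypes are hypothetical.

```python
import math
import random
from itertools import combinations


def _variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def association_indices(genotypes):
    """Return (I_A, r_d) for a list of equal-length haploid genotype tuples."""
    n_loci = len(genotypes[0])
    pairs = list(combinations(genotypes, 2))
    # d[j][p] is 1 when the two isolates of pair p differ at locus j, else 0.
    d = [[int(a[j] != b[j]) for a, b in pairs] for j in range(n_loci)]
    var_per_locus = [_variance(dj) for dj in d]
    total_distance = [sum(column) for column in zip(*d)]
    v_obs = _variance(total_distance)   # observed variance of pairwise distances
    v_exp = sum(var_per_locus)          # expected variance with no association
    i_a = v_obs / v_exp - 1
    scale = 2 * sum(
        math.sqrt(var_per_locus[j] * var_per_locus[k])
        for j, k in combinations(range(n_loci), 2)
    )
    return i_a, (v_obs - v_exp) / scale


def randomization_p_value(genotypes, n_random=1000, seed=0):
    """Shuffle alleles within each locus and count r_d values >= the observed one."""
    rng = random.Random(seed)
    observed = association_indices(genotypes)[1]
    columns = [list(col) for col in zip(*genotypes)]
    hits = 0
    for _ in range(n_random):
        for col in columns:
            rng.shuffle(col)
        if association_indices(list(zip(*columns)))[1] >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_random + 1)


if __name__ == "__main__":
    # Hypothetical, fully clonal toy data: every locus carries the same signal.
    two_loci = [(1, 1), (1, 1), (2, 2), (2, 2)]
    four_loci = [(1,) * 4, (1,) * 4, (2,) * 4, (2,) * 4]
    print(association_indices(two_loci))   # I_A grows with the number of loci,
    print(association_indices(four_loci))  # while r_d stays on the same scale
```

In this toy case I_A rises as loci are added while r_d does not, which is the reason r_d is reported alongside I_A.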
A Bayesian Markov chain Monte Carlo (MCMC) approach was implemented in STRUCTURE v. 2.3.4 using the protocol described by Tsui et al. (2012). A burn-in period of 100,000 iterations followed by 1,000,000 MCMC iterations was run under an admixture model with correlated allele frequencies, for K-values between 1 and 10. For each value of K from 1 to 10, ten independent runs were performed to check consistency (Tsui et al., 2012). Structure Harvester was used to compute ΔK (Evanno et al., 2005) to estimate the optimal K-value. Replicate simulations of cluster membership (q-matrices) at K = 4 were used as input for CLUMPP_Windows v. 1.1.2 (Jakobsson and Rosenberg, 2007) using the Fullsearch algorithm, with weighted H and the G similarity statistic. Summarized cluster membership matrices (q-values) for both individuals and populations were then visualized using DISTRUCT v. 1.1 (Rosenberg, 2004).
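The ΔK criterion can be sketched as follows. This is a minimal illustration rather than the Structure Harvester code, assuming the commonly used form ΔK = |mean L(K+1) − 2 mean L(K) + mean L(K−1)| / sd L(K), where L(K) is the log-likelihood reported for each replicate run. The log-likelihood values below are invented for illustration only and are not results from this study.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Map each interior K to its Delta K, given {K: [ln P(D) of replicate runs]}."""
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in lnp_by_k or k + 1 not in lnp_by_k:
            continue  # Delta K is undefined for the smallest and largest K
        second_difference = abs(
            mean(lnp_by_k[k + 1]) - 2 * mean(lnp_by_k[k]) + mean(lnp_by_k[k - 1])
        )
        result[k] = second_difference / stdev(lnp_by_k[k])
    return result


if __name__ == "__main__":
    # Hypothetical log-likelihoods, three replicate runs per K.
    toy_runs = {
        1: [-5002, -5000, -4998],
        2: [-4802, -4800, -4798],
        3: [-4652, -4650, -4648],
        4: [-4302, -4300, -4298],
        5: [-4292, -4290, -4288],
        6: [-4287, -4285, -4283],
    }
    scores = delta_k(toy_runs)
    best_k = max(scores, key=scores.get)
    print(scores, best_k)
```

Because ΔK needs both neighbours of K, it cannot be evaluated at K = 1 or at the largest K tested.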
Isolation by distance (IBD) was evaluated by assessing the correlation between pairwise geographical distance and Nei's unbiased genetic distance (Nei, 1978) for all population pairs with the package GENEPOP in R v.3.5.1 (using Isolde) (Raymond and Rousset, 1995) using 1000 random permutations.
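The permutation logic behind an isolation-by-distance test can be sketched with a simple Mantel test. This is not the GENEPOP Isolde routine; it is a minimal pure-Python illustration that correlates two symmetric distance matrices and permutes the population labels of one of them. The function names and the toy coordinates are hypothetical.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(geo, gen, n_perm=999, seed=0):
    """Return (r, one-sided p) for the correlation between two distance matrices."""
    rng = random.Random(seed)
    n = len(geo)
    geo_flat = upper_triangle(geo)
    r_obs = pearson(geo_flat, upper_triangle(gen))
    order = list(range(n))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # relabel populations in the genetic matrix
        permuted = [[gen[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(geo_flat, upper_triangle(permuted)) >= r_obs - 1e-12:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical populations on a line (km); genetic distance grows with distance.
    positions = [0, 100, 300, 700, 1500]
    geo = [[abs(a - b) for b in positions] for a in positions]
    gen = [[0.001 * d for d in row] for row in geo]
    print(mantel(geo, gen))
```

Permuting whole rows and columns together, rather than individual cells, preserves the dependence among pairwise distances that share a population.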
The possibility and rate of migration among geographic regions were tested with MIGRATE v. 3.6.11 (Beerli and Felsenstein, 1999), which uses an expansion of coalescent theory to estimate migration rates between populations (N_e m) and the mutation-scaled population size (2N_e µ), where N_e is the effective population size, m is the constant migration rate between population pairs, and µ is the mutation rate per generation at the locus considered. Likelihood surfaces for each parameter were estimated by simulating genealogies using an MCMC approach. The computations were carried out under a Brownian motion approximation of the stepwise mutation model (SMM). The runs consisted of two replicates of 10 short chains (with 10,000 genealogies sampled) and three long chains (with 500,000 genealogies sampled), with the first 10,000 genealogies discarded. A likelihood ratio test was used to compare the likelihoods of all models (Beerli and Felsenstein, 1999).

RESULTS
Illumina Sequencing Data
After stringent quality assessment, a total of 32.25 million clean reads with a GC content of 54.18% and a Quality Score 30 (Q30) of 95.27% were obtained from the transcriptome sequence of A. tenuissima (Supplementary Table S2). Clean reads accounted for 99.72% of the total raw reads (Supplementary Figure S1). Sequencing was conducted on an Illumina HiSeq 4000 sequencing platform. Based on the clean reads, 50,992 individual transcripts were identified and 32,284 unigenes were assembled, with average lengths of 2,007.88 bp (transcript N50 = 4,172 bp) and 1,088.76 bp (unigene N50 = 2,451 bp), respectively. N50 is the length at which sequences of that length or longer contain at least 50% of the total assembled bases, and it is used to evaluate the quality of assembled sequences. Among the unigenes, 9,555 (29.60%) were 201 to 300 bp in length; 8,304 (25.72%) were 301 to 500 bp; 5,112 (15.83%) were 501 to 1,000 bp; 3,899 (12.08%) were 1,001 to 2,000 bp; and 5,413 (16.77%) were over 2,000 bp.
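Since N50 is used here as the main indicator of assembly quality, a minimal sketch of how it is computed may help. The function name `n50` and the example lengths are hypothetical; the definition used is the standard one, the length of the shortest sequence in the set of longest sequences that together cover at least half of the total assembled bases.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


if __name__ == "__main__":
    # Hypothetical contig lengths (bp): total 32, so N50 is reached at the second 8.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Because N50 weights sequences by length, it can be much larger than the mean length when a few long sequences dominate, as with the unigenes reported above.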

Sequence Annotation
Collectively, 24,670 unigenes were annotated utilizing five databases, NCBI non-redundant protein (Nr protein), Swiss-Prot, euKaryotic Orthologous Groups (KOG), Kyoto  Encyclopedia of Genes and Genomes (KEGG), and Gene Ontology (GO). Among the unigenes, 24,570 (99.6%) exhibited significant similarity to proteins in the Nr protein database, among which 10,562 (42.8%) were also found in the Swiss-Prot database.
A total of 15,503 (62.8%) of the unigenes were classified into 25 functional categories according to the KOG functional classification (Figure 2). The general function prediction was the most highly represented category (2,349 unigenes, 15.2%). Extracellular structures (31, 0.2%), followed by cell motility (3, <0.1%), were the least represented categories. KEGG was employed to identify the biological pathways present in the transcriptome sequences obtained from A. tenuissima. This analysis resulted in the clustering of 7,412 (30.0%) unigenes into 111 pathways. Highly represented pathways included: carbon metabolism (378 unigenes, 5.1%), biosynthesis of amino acids (352, 4.7%), and protein processing in the endoplasmic reticulum (321, 4.3%). The top 20 biological pathways of the enriched KEGG annotations are presented in Figure 3. Further analysis indicated that 15,589 (63.2%) of the unigenes could be assigned into three GO categories: cellular component, molecular function, and biological process (Figure 4). The highest represented subcategory in the cellular component category was cell part (6,475, 41.5%). Within the molecular function category, catalytic activity (8,860, 56.8%) and binding activity (7,810, 50.1%) were the most highly represented. A total of 1,096 unigenes (7.0%) were associated with transporter activity. Under the biological process category, metabolic process (11,068, 71.0%) was the most highly represented, followed by cellular process (8,984, 57.6%), and single-organism process (7,913, 50.8%).

Development of SSR Markers
A total of 1,140 SSRs were identified in the A. tenuissima transcriptome (Table 2). The SSRs were identified in 1,072 unigenes, among a total of 9,312 unigenes that were more than 1,000 bp in length. The number of repeat units in the SSRs varied from 5 to 24, with SSRs of more than 10 repeats being the most abundant. The percentage of motifs with 9 repeats was low (3.1%) (Table 2). A total of 111 of the unigenes contained more than one SSR. The distribution density of SSR loci in A. tenuissima unigenes was one per 22.9 kb, and the frequency of the different SSR repeat types varied. The 487 mono-nucleotide repeat motifs were the most abundant, with a frequency of 42.7%, followed by tri- (368 or 32.3%), di- (258 or 22.6%), tetra- (15 or 1.3%), penta- (7 or 0.6%), and hexa-nucleotide (5 or 0.4%) repeat motifs (Table 2). Sixteen polymorphic SSR markers were selected for population genetic structure analysis based on the presence of different motifs. The unigenes with SSR markers were annotated with 51 functions, assigned to three categories in GO terms, and grouped into 13 different classifications in the KOG database. The unigenes with SSR markers were mapped onto 15 pathways in the KEGG pathway database (Supplementary Table S3).

Genetic Variation and Linkage Disequilibrium
Sixteen SSR markers were used to analyze the genetic structure of A. tenuissima in four geographic regions in China (Table 1). The flanking primers designed for each SSR provided distinct amplicons of the expected size. In the test of selective neutrality, the observed values for the SSR loci fell within the 95% confidence interval, suggesting that each SSR conformed to selective neutrality (Table 3). The 191 isolates from the four populations of A. tenuissima were determined to represent 182 distinct genotypes. Genotypic diversity was 0.87 and the CF was 0.05 in the population pooled from the four geographic regions (Table 4). Across the geographic regions, 175 genotypes were detected only once, 6 genotypes were detected twice, and one genotype was detected four times. A total of 180 of the genotypes were detected in only one geographic region, while 2 genotypes were present in two geographic regions. No genotypes were found to be present in three or four geographic regions. The genotypic diversity of isolates collected from Eastern China was higher than in the other three regions, Northeastern China, Northern China, and Northwestern China (Table 4).
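The genotype counts reported above can be cross-checked against the stated CF with a few lines of arithmetic. The sketch below uses only the numbers given in the text (175 genotypes seen once, 6 seen twice, 1 seen four times) and the CF formula from the methods; the variable and function names are hypothetical.

```python
def clonal_fraction(n_genotypes, n_isolates):
    """CF = 1 - (number of genotypes / number of isolates assayed)."""
    return 1 - n_genotypes / n_isolates


# Number of genotypes observed at each frequency, as reported in the text:
# {times detected: number of genotypes}
frequency_classes = {1: 175, 2: 6, 4: 1}

n_genotypes = sum(frequency_classes.values())
n_isolates = sum(times * count for times, count in frequency_classes.items())
cf = clonal_fraction(n_genotypes, n_isolates)

if __name__ == "__main__":
    print(n_genotypes, n_isolates, round(cf, 2))
```

The frequency classes add up to 182 genotypes and 191 isolates, and the resulting CF rounds to the reported value of 0.05.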
The total number of alleles in the four geographic regions ranged from 2 to 13, and the number of private alleles ranged from 0.25 to 0.69 (Table 4). The proportion of total genetic diversity attributed to population differentiation (F_ST) ranged from 0.309 to 0.582 for the sixteen SSR loci, with an overall average of 0.461. The gene diversity per locus ranged from 0.102 to 0.854 (Table 5).
In an analysis of multilocus gametic disequilibrium, two measures of association, the index of association (I_A) and the multilocus linkage disequilibrium value (r_d), were found to be significant in each of the four geographic regions and in the total sample (all four geographic regions combined), indicating that the null hypothesis of complete panmixia was rejected (Table 6).
FIGURE 5 | Population structure of A. tenuissima based on 16 microsatellites. Different shadings represent different genetic groups; each column represents an individual isolate, and the height of the column segments shows the probability of assignment of this isolate to a particular genetic group. The height of each shaded region within an individual bar is the measure of proportional affiliation. When K = 4 (q1 red, q2 green, q3 yellow, q4 blue), individuals with membership coefficients of q_i ≥ 0.7 were assigned to a specific genetic cluster.

Population Structure and Differentiation
The Bayesian cluster analysis using STRUCTURE v. 2.3.4 indicated that the number of genetically distinct ancestral populations was best represented by K = 4 clusters, which had the highest value of ΔK (Figure 5 and Supplementary Figure S2). The isolates from Northeastern China were assigned to cluster q4 (17 isolates, 38%) and q3 (15 isolates, 34%). The fifteen populations from Northern China exhibited a high level of admixture, and the isolates were assigned to cluster q4 (21 isolates, 29%) and q3 (19 isolates, 26%), followed by q2 (16 isolates, 22%). Most of the isolates from Eastern China were assigned to cluster q1 (18 isolates, 47%), and only one isolate was assigned to cluster q2. Most of the isolates from Northwestern China were assigned to q2 (16 isolates, 43%) and to a lesser extent q3 (8 isolates, 22%). Only two Northwestern China isolates were assigned to cluster q4. Based on the principal coordinates analysis (PCoA), the eight populations from Northeastern China clustered within the two right quadrants of the first PCoA axis (Figure 6). The fifteen Northern China populations were spread across the first component space (explaining 45% of the variation), partially overlapping with Eastern China and two Northwestern China regional populations. Northwestern China and Northeastern China regional populations were slightly differentiated along the second PCoA axis (explaining 12.86% of the variation). The populations in Eastern China and Northwestern China tended to fall into different clusters along the first PCoA axis (Figure 6). The PCoA results were similar to the results obtained in the STRUCTURE analysis. The genetic clusters were not completely grouped according to geographic region in the PCoA, which may be explained by the populations with small sample sizes.
The analysis of molecular variance (AMOVA) performed on the 34 populations indicated that 13.25 and 86.75% of the genetic variation was attributed to variations among and within populations, respectively (P < 0.001) ( Table 7). AMOVA was used to further analyze the level of differentiation among the four geographic regions established in the analysis utilizing STRUCTURE and geography. AMOVA attributed 5.43, 8.88, and 85.69% of the total variation to variations among geographic regions, among sampling locations within geographic region, and among individual isolates within populations, respectively, all of which were highly significant (P < 0.001) ( Table 7).
In general, pairwise genetic differentiations (F_ST) between populations were not significant within the geographic regions Northern China, Eastern China, and Northwestern China, except for Baoding city (Supplementary Table S4). These results indicate that the levels of genetic differentiation within Northern China, Eastern China, and Northwestern China are similar. The correlation between genetic and geographical distance was reported as weak and non-significant within the geographic regions: Northeastern China (r² = 0.0151, P = 0.668), Northern China (r² = 0.0580, P = 0.999), Eastern China (r² = 0.0847, P = 0.907), and Northwestern China (r² = 0.3949, P = 0.007) (Supplementary Figure S3). A significant correlation, however, was observed between genetic distances F_ST/(1 − F_ST) and geographical distances (km) for the entire data set (r² = 0.0362, P < 0.001). These results suggest that isolation by distance exists among the geographic regions.
Considerable levels of gene flow were observed among the geographic regions, with an estimated number of migrants per generation M (2N_e m) ranging from 0.54 (Eastern China from Northwestern China) to 3.70 (Northwestern China from Northern China) (Table 8). The observed gene flow was asymmetric between Eastern China and Northwestern China (1.13 vs. 0.54), depending on the direction of the gene flow.

DISCUSSION
Whole genome sequences in the genus Alternaria have been obtained for A. consortialis (GenBank accession no. BCGG00000000), A. alternata (LMXP00000000), A. arborescens (AIIC00000000), and A. brassicicola (PHFN00000000) (Hu et al., 2012;Nguyen et al., 2016). A whole genome or transcriptome sequence of A. tenuissima, however, has not been reported or deposited in a public DNA database. In the current study, Illumina sequencing of a transcriptome of A. tenuissima generated 32.25 million reads with a 95.27% Q30 and 32,284 unigenes were predicted after assembly.
The N50 length of the unigenes was 2,451 bp, which was longer than the N50 obtained from the transcriptome sequencing of Alternaria sp. MG1 (N50 = 2,153 bp) using the Illumina HiSeq 2500 platform in a previous study (Che et al., 2016). Collectively, the results indicate that the quality and integrity of the obtained sequences are high.
Next-generation sequencing is a highly efficient and low-cost technology that can be used to develop large numbers of new SSR markers (Zhang et al., 2014). Detecting SSR markers in NGS data derived from a transcriptome is more efficient and rapid than previous, standard methodologies (Zheng et al., 2013). A total of 1,140 SSR loci were identified from 1,072 unigene sequences of A. tenuissima. Approximately 11.5% of the transcriptomic sequences contained SSR loci. The distribution density of SSRs in A. tenuissima is similar to that of many higher plant species, such as rice, wheat, and soybean, which generally express a larger number of genes in a transcriptome than fungi due to the overall size of their genomes. Our results clearly identified a large number of SSR loci in the genes expressed in the A. tenuissima transcriptome.
Genetic markers should be selectively neutral, moderately diverse, and not linked if they are to be reliably used to study population genetics (Brown, 1996;Cooke and Lees, 2004). In the current study, sixteen SSR loci were selected from among different unigenes to analyze genetic diversity in A. tenuissima populations from four geographic regions. Most of the unigenes with SSR loci had different annotations in GO, KOG, and KEGG databases. The average size of a genome within different Alternaria species is more than 30 Mb (Hu et al., 2012;Woudenberg et al., 2015). Therefore, the probability that the sixteen selected genetic SSR markers are linked is extremely low based on the size of the genome.
Sexual recombination is expected to produce high levels of genetic diversity and random association among different loci (Milgroom, 1996; Kreis et al., 2016). In recent studies, some Alternaria species, such as A. solani (Meng et al., 2015), A. brassicicola (Morris et al., 2000), and A. helianthi (Santha Lakshmi Prasad et al., 2009), have been reported to have high levels of genetic diversity and recombination. Stewart et al. (2011), based on the results of mating system tests, suggested that Alternaria has a sexual cycle. Linkage equilibrium was found in A. brassicicola among the microsatellite loci (Linde et al., 2010). The complete sexual cycle of the above Alternaria species, however, has not been observed anywhere in the world. In the present study, the analysis of genotypic disequilibrium of populations from four geographic regions revealed a significant degree of non-random association, although high levels of diversity were observed. These results are consistent with Meng et al. (2015), who found that populations of A. solani from Fujian Province (Eastern China) displayed high genetic variation and a lack of random mating. Bock et al. (2005) reported high levels of genetic diversity with a significant level of linkage disequilibrium in populations of A. brassicicola and suggested that recombination occurred only occasionally. Van Der Waals et al. (2004) indicated that the high genetic variation in A. solani could be accounted for by mutations rather than by sexual reproduction. These results suggest that random mating is not the main biotic factor that governs the high variation present in the four geographic regions.
High levels of genetic diversity were found in the four geographic regions and within each sampled population, except for populations with a small sample size (e.g., Beijing Municipality, Songyuan, Ganzhou, Shaoxing, and Zhangye cities) (Table 4 and Supplementary Table S1). There are reports of A. tenuissima causing foliar diseases in China on wheat (Bensassi et al., 2009), potato (Zheng et al., 2015), and watermelon (Zhao et al., 2016a). These crops are often grown in rotation with tomato in some regions of China. The high genetic variation within each geographic region and the low spatial differentiation among geographic regions are similar to the findings of a study of A. alternata in China (Meng et al., 2018). The genetic structure of the Northeastern China, Northern China, and Northwestern China geographic regions was highly admixed, and these regions could not be separated into three distinct clusters by admixture and principal coordinate analyses. These results are consistent with gene flow driven by anthropogenic activities among geographically closer populations, which exchange genetic information over time and tend to exhibit higher genetic similarity (Meng et al., 2018).
Samples in the current study were collected from four geographic regions, separated from each other by more than 500 km. It is difficult for pathogen spores to be disseminated over such a distance via the air. The isolation by distance observed for the entire data set, but not within geographic regions, is indicative of a barrier to gene flow. Seed-borne dispersal or transport of other goods contaminated with A. tenuissima, however, may account for the observed gene flow between the geographic regions of Northeastern China, Northern China, and Northwestern China (Malik et al., 1991;Bock et al., 2005;Meng et al., 2015). Long-distance dispersal via human-mediated gene flow was also reported in populations of A. alternata in potato-growing areas of China (Meng et al., 2018) and Rhynchosporium secalis in agricultural systems. Our present results suggest that human-mediated dispersal also plays an important role in the dynamics of the population genetic structure of A. tenuissima.
In contrast to Northeastern China, Northern China, and Northwestern China, the Eastern China region exhibited a relatively simple genetic structure (Figure 1). The sampled locations in the Eastern China region are geographically far from the other three tomato-cropping regions and are separated from them by the Yellow and Yangtze Rivers. We suggest that the populations located within Eastern China may be separated by weak natural barriers. In this scenario, the significant correlation between genetic differentiation and geographic distance would mainly be influenced by the population genetic structure in the Eastern China geographic region. This implies that genetic isolation exists between the Eastern China geographic region and the other three tomato-cropping regions.
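The correlation between genetic differentiation and geographic distance discussed above is commonly assessed with a Mantel permutation test. The sketch below is a minimal, standard-library version of that test. The function names and the matrix layout (square, symmetric lists of lists indexed by population) are hypothetical, and whether this matches the exact procedure or software used in this study is an assumption.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upper(matrix, order):
    """Upper-triangle entries after relabelling populations by `order`."""
    idx = range(len(order))
    return [matrix[order[i]][order[j]] for i, j in combinations(idx, 2)]


def mantel(genetic, geographic, permutations=999, seed=0):
    """One-sided Mantel test for a positive matrix correlation.

    Returns the observed correlation and a permutation p-value.
    """
    rng = random.Random(seed)
    identity = list(range(len(genetic)))
    geo = upper(geographic, identity)
    r_obs = pearson(upper(genetic, identity), geo)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute rows and columns of one matrix together
        if pearson(upper(genetic, order), geo) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)
```

Running such a test once on all populations and once within each region would mirror the comparison made in the text: a significant correlation for the entire data set that disappears within regions points to differentiation between regions rather than a continuous distance effect.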
In recent years, A. tenuissima has become an important pathogen, causing foliar disease in various crops throughout China (Zheng et al., 2015;Zhao et al., 2016a,b). A comprehensive understanding of the population genetics of A. tenuissima has been lacking. In the present study, high levels of genetic diversity were found in A. tenuissima, potentially brought about by gene flow among individuals within the populations. This may explain why A. tenuissima has developed the ability to infect different crops. The population genetics and biology of A. tenuissima in other tomato-growing regions in China (e.g., Central China and Southern China) have yet to be determined. Additional studies covering these regions are needed to determine the population genetic structure of Alternaria isolates over wider geographic areas of China.

DATA ARCHIVING STATEMENT
Data for this study will be available at the Dryad Digital Repository after the manuscript is accepted for publication.

AUTHOR CONTRIBUTIONS
NY and XW conceived and designed the study. NY, GM, and KC performed the experiments. NY and XW wrote the paper. XW reviewed and edited the manuscript.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmicb.2018.02904/full#supplementary-material
FIGURE S1 | The percentage of clean reads, adapter related reads, and low quality reads.
FIGURE S2 | The estimated Delta K (ΔK) for number of clusters ranging from 2 to 10 in STRUCTURE analysis.
FIGURE S3 | Plot of isolation by distance for the entire population, the geographic region Northern China, the geographic region Northeastern China, and the geographic region Eastern China.