Characterization and Application of EST-SSR Markers Developed From the Transcriptome of Amentotaxus argotaenia (Taxaceae), a Relict Vulnerable Conifer

Amentotaxus argotaenia (Taxaceae) is a vulnerable coniferous species with preference for shade and moist environment. Accurate estimation of genetic variation is crucial for its conservation, especially in the context of global warming. In this study, we acquired a transcriptome from A. argotaenia leaves using Illumina sequencing and de novo assembled 62,896 unigenes, of which 5510 EST-SSRs were detected. Twenty-two polymorphic EST-SSRs were successfully developed and further used to investigate genetic variation, linkage disequilibrium, and bottleneck signatures of A. argotaenia. The results showed that A. argotaenia had moderate genetic variation and high genetic differentiation, which may provide raw material to protect against climatic changes and accelerate local adaptation, respectively. No bottlenecks were found to occur in A. argotaenia. Our study not only showed that these EST markers are very effective in population genetic analysis but also lay a solid foundation for further investigating adaptive evolution and conservation strategies of A. argotaenia.


INTRODUCTION
The genus Amentotaxus Pilger (Taxaceae) has six species, and five of them have been listed as endangered, vulnerable, or near threatened (Fu et al., 1999a;Farjon and Filer, 2013;Hilton-Taylor et al., 2013). Among all Amentotaxus species, Amentotaxus argotaenia (Hance) Pilger has the widest distribution but with small isolated populations occurring in southern and central China, northern Vietnam, and Laos (Farjon and Filer, 2013). Its preferred environments are limestone mountains, forests, ravines, and shady and damp stream banks, at altitudes of 300-1100 m (Fu et al., 1999a;Lin et al., 2007). The natural regeneration of the plant is infrequent due to slow growth rate and poorly dispersed seeds. Moreover, forest clearing and habitat loss have also been severely reducing its population size. A. argotaenia is listed as vulnerable in China and as near threatened in the International Union for Conservation of Nature Red List of Threatened Species (Hilton-Taylor et al., 2013). Since knowledge of population genetics is essential for the conservation and sustainable use of wild resources (Wayne and Morin, 2004;Hoban et al., 2013), we aim to examine the population genetic variation of A. argotaenia using novel molecular markers.
In comparison to genomic SSRs, expressed sequence tag (EST)-SSRs represent functional markers that may linked with functional genes inducing phenotypic effects (Ranade et al., 2014;Zhou et al., 2018a). They hence provide opportunities to examine functional diversity in relation to adaptive variation. Moreover, EST-SSRs are reported to be more reliable because they present lower frequencies of null alleles than do genomic SSRs (Yu and Li, 2008). With the advances in next-generation sequencing technology, EST-SSRs are becoming amenable to be identified by sequencing the transcriptomes (Zhou et al., 2018b). Next-generation sequencing is faster and more cost-effective than traditional approach, e.g., cDNA library construction method (Huang et al., 2015). However, currently, there is a shortage of EST-SSRs developed from A. argotaenia, although EST-SSRs have been identified for its congeneric species, Amentotaxus formosana . Excavation and characterization of EST-SSRs for A. argotaenia may contribute to enhance our understanding of its population genetic diversity, structure, and the genetic basis of adaptive divergence. In addition, it will also provide resources to assess the association between transposable elements and SSR distribution as well as their roles in genome organization (La Rota et al., 2005;Wang et al., 2018a).
In this study, we constructed a leaf transcriptome of A. argotaenia using the Illumina sequencing platform. Based on the transcriptome sequencing data, we developed a set of EST-SSR markers and examined their polymorphisms. We then assessed the genetic variation of four populations of A. argotaenia using the novel EST-SSR markers. This work provides essential information for the conservation and management of A. argotaenia in the future.

Plant Materials and DNA Extraction
A total of 56 A. argotaenia individuals were sampled from four of its natural populations Jiuqushui (JQS; n = 15), Chuanping (CP; n = 13), Qiniangshan (QNS; n = 12), and Wugongshan (WGS; n = 16) located in China (Supplementary Table 1). Fresh leaves were collected and desiccated in sealed plastic bags with silica gel. Genomic DNA was isolated using the modified cetyltrimethylammonium bromide method (Su et al., 2005). DNA quality was evaluated using gel electrophoresis on 0.8% agarose gel.

RNA Extraction, cDNA library Construction, and Transcriptome Sequencing
Fresh young leaves of one A. argotaenia individual from population CP, which is planted in a greenhouse at Sun Yat-sen University, were used to extract total RNAs using the method described by Fu et al. (2004). RNA integrity was evaluated on agarose gels followed by quantification on an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, California, USA). The mRNAs were isolated from the total RNAs by using a Dynabeads mRNA DIRECT Kit (Invitrogen Life Technologies, Carlsbad, California, USA) and randomly fragmented. The fragmented mRNAs were converted into double-stranded cDNA by using random primers and reverse transcriptase. After endrepairing and tailing A, the cDNA fragments were ligated to Illumina paired-end adapters. The cDNA library was sequenced on an Illumina Hiseq2500 platform (Illumina, San Diego, California, USA) with insertion size of 400-500 bp.

Transcriptome Assembly, Functional Annotation, and Classification
We obtained a total of 25,257,542 paired-end reads from A. argotaenia. The reads were filtered by removing primer or adaptor sequences, and reads that contain unknown ("N") or poorquality bases (the mean quality per base < 15 with a 4-base wide sliding window) using the Trimmomatic software version 0.32 (Bolger et al., 2014). The resulting clean data were deposited in Sequence Read Archive of the National Center for Biotechnology Information (NCBI) (Bioproject no. PRJNA413732; Biosample no. SAMN07764634; https://www.ncbi.nlm.nih.gov/bioproject/ PRJNA413732). The clean reads longer than 90 nt were de novo assembled into contigs and transcripts using the TRINITY software (https://github.com/trinityrnaseq/trinityrnaseq/releases) with default settings. The transcripts that cannot be prolonged at either end were defined as unigenes.
All unigenes were searched against Nt (NCBI nucleotide sequences), Nr (NCBI non-redundant database), and Swiss-Prot (a manually annotated and reviewed protein sequence database) through blast 2.2.30+ (ftp://ftp.ncbi.nlm.nih.gov/blast/ executables/blast+/2.2.30/) with a cut-off E-value of 10 −5 . Protein domains of open reading frame within unigenes was identified by using HMMER hmmscan (hmmer-3.1b2-linux-intel-x86_64) and Pfam (the protein families database). Blast2GO version 3.0 was used to perform Gene Ontology (GO) annotations defined by molecular function, cellular component, and biological process ontologies (http://www.blast2go.com/b2ghome). We further used the Kyoto Encyclopedia of Genes and Genomes (KEGG) database to performed pathways annotation and euKaryotic Ortholog Groups (KOG) database to predict possible functions.

Evaluation of Polymorphic EST-SSRs and Population genetic Analysis
GenAlex software (Peakall and Smouse, 2006) was used to calculate the EST-SSR genetic parameters, including the number of observed alleles (Na), the effective number of alleles (Ne), observed heterozygosities (Ho), expected heterozygosities (He), and probability of the deviation from the Hardy-Weinberg equilibrium. The polymorphism information content (PIC) value and null alleles were evaluated using CERVUS 3.0 (Kalinowski et al., 2007) and Micro-Checker (Van Oosterhout et al., 2004), respectively. Linkage disequilibrium across loci was determined using TASSEL version 3.0 (Bradbury et al., 2007) with squared correlation coefficient (r 2 > 0.3) and the threshold of p values (< 0.001) based on Fisher's exact test.
Arlequin version 3.5 (Excoffier and Lischer, 2010) was used to perform a Mantel test with 10,000 permutations to examine the pattern of isolation by distance. Using the same software, analysis of molecular variance was conducted to determine the amount of genetic variation at different levels.
Using EST-SSR data, we applied a Bayesian model-based clustering algorithm implemented in STRUCTURE 2.2 to infer population structure. We used the admixture model, setting the parameters as follows: burn-in periods = 10,000, MCMC replicates = 10,000, K = 1 to 8, and iterations = 10. The optimum number of clusters (K) was determined by calculating ΔK (Evanno et al., 2005).
We conducted the Wilcoxon's sign-rank test and the modeshift test to detect signatures of genetic bottleneck by running BOTTLENECK version 1.2.02 (Piry et al., 1999). The two-phase mutation model was selected because it is more suitable for microsatellite data than the other two models (Piry et al., 1999;Zhang and Zhou, 2013). We performed 1000 simulations under the two-phase mutation model with 70% single-step mutations and 30% multi-step mutations.

Transcriptome Sequencing and De Novo Assembly
Approximately 23.5 million clean reads were obtained from the transcriptome of A. argotaenia with the length of 90-125 bp and GC content of 47%. The percentages of Q20 (base sequencing error probability < 1%) and Q30 (base sequencing error probability < 0.1%) bases were 100% and 97%, respectively. These clean reads were assembled into 80674 transcripts by using Trinity with an average length of 756 bp and an N50 of 1018 bp. The total length of transcripts reached 60,999,479 bp. After further assembly, a total of 62,896 unigenes were identified with an average length of 721 bp, a minimal length of 301 bp, and an N50 value of 947 bp. The sum of the length of the unigenes was 45,357,136 bp ( Table 1). The length of 37.473% (23569) of the unigenes ranged from 301 to 400 bp, 61.956% (38,968) varied from 401 to 3000 bp, while 0.571% (359) was longer than 3000 bp (Supplementary Figure 1).

Functional Annotation and Categorization
We conducted the annotation of 62,896 unigenes in the seven public databases (Nr, Nt, KOG, Swiss-Prot, Pfam, KEGG, and GO), of which 36,671 were successfully annotated ( Table 1) (Table 1).
First, 31,283 unigenes were classified into three main GO categories: biological process, cellular component, and molecular function, including 44 functional groups. In the biological process category, there were 7313 unigenes assigned to "cellular process, " 5973 to "single-organism process, " 5808 to "metabolic process, " and 1 to "biological regulation. " In the cellular component category, "cell part" and "organelle" component-related functions were predominant, with 4835 unigenes assigned to the former and 2060 to the latter. In the molecular function category, "binding" and "catalytic activity" were the most enriched, comprising 2520 and 2358 unigenes, respectively (Figure 1).

Polymorphic EST-SSRs Identification and Estimation of genetic Diversity
We randomly selected 60 EST-SSR primers to evaluate their application and the polymorphism across 12 A. argotaenia individuals from four populations. Twenty-two of the microsatellite loci exhibited allelic polymorphism, whereas 16 were identified as monomorphic ( Table 3). We then used the 22 polymorphic EST-SSR markers to perform population genetic analysis (Supplementary Figure 5).
The number of observed alleles and the effective number of alleles varied from 1 to 7 and from 1 to 4.694 per locus, respectively. The observed heterozygosity ranged from 0 to 1.000 (average = 0.250), while the expected heterozygosity ranged from 0.000 to 0.787 (average = 0.390). The mean value of PIC was 0.455, with the minimum of 0.084 and the maximum of 0.707. Fourteen loci were identified as null allele. Six, 9, 14, and 16 EST-SSRs showed significant deviations from the Hardy-Weinberg equilibrium in populations JQS, CP, QNS, and WGS, respectively ( Table 4).

Population genetic Structure and Differentiation
Genetic differentiation (Fst) based on EST-SSRs was 0.28198. Analysis of molecular variance revealed that 71.80% of the genetic variation occurred within populations, while 10.37% and 17.83% were attributed to among populations within groups and among groups, respectively (Supplementary Table 2). In addition, the result of Mantel test showed that there was no significant correlation between genetic and geographical distances (p = 0.3306).
ΔK demonstrated that the uppermost K equaled 3 (Supplementary Figure 4). Amentataxus argotaenia populations were assigned to three groups. Group I contained populations JQS and CP, group II contained population WGS, and group III contained population QNS (Figure 2).

Bottleneck Signature
At species level, both Wilcoxon's sign-rank test and mode-shift analysis indicated that A. argotaenia had not experienced a recent bottleneck (Supplementary Table 3).

Characterization of Transcriptome
In this study, we sequenced, assembled, and annotated the transcriptome of A. argotaenia using the next-generation sequencing approach. A total of 62,896 unigenes were de novo assembled with the unigene mean and N50 length of 721 and 947 bp, respectively (Table 1). More than half of the unigenes can be successfully annotated through seven databases (Nr, Nt, KOG, Swiss-Prot, Pfam, KO, and GO), of which 490 were simultaneously identified ( Table 1). Most of the annotated unigenes were unique in A. argotaenia compared to its closely related conifer Torreya grandis (Zeng et al., 2018). Moreover, the annotated results of NR database indicated that A. argotaenia exhibited only 27.5% unigene identity to another conifer Picea sitchensis. These unique unigenes may represent the species-specific genetic signature of A. argotaenia potentially underlying its speciation process or evolution (Frech and Chen, 2011). Similar results have been found in the case of Picea abies (Nystedt et al., 2013). In addition, 31,283, 32,953, and 15,072 unigenes of A. argotaenia were assigned into 44 functional groups in GO, 25 classifications in KOG, and 38 pathways in KEGG, respectively. These results indicate that the identified A. argotaenia unigenes have wide-ranging functions and will be valuable for analyzing the functional diversity of A. argotaenia.

Frequency and Distribution of EST-SSRs
The mean length of unigenes (721 bp) of A. argotaenia was considerably longer than that of other conifers, including Pinus pinaster (495 bp) (Canales et al., 2014), Platycladus orientalis (686 bp) (Hu et al., 2016), and P. abies (472 bp) (Chen et al., 2012). Zalapa et al. (2012) pointed out that longer sequences will increase the probability to properly design EST-SSR primers. In accordance with this, we developed 5510 EST-SSRs from the transcriptome of A. argotaenia, which was significantly greater than those of A. formosana (4955)  and Pinus densiflora (1953) . The most common motif for dinucleotides in A. argotaenia was found to be AT/TA. The same result was obtained in Picea spp. (Rungis et al., 2004), Pinus dabeshanensis (Xiang et al., 2015), and Pseudolarix amabilis (Geng et al., 2015). Ranade et al. (2014) emphasized that AT/TA was often ranked as the most abundant dimer motif in gymnosperms (especially in the 3′ untranslated regions). A factor related to this phenomenon may be increased A + T contents. In addition, the most common trinucleotide motif in A. argotaenia was AAG/CTT, which is similar to that in Cryptomeria japonica (Ueno et al., 2012), Pinus taeda (Wegrzyn et al., 2014), and Pinus halepensis (Pinosio et al., 2014), but unlike in P. dabeshanensis (AGC) (Xiang et al., 2015). It has been noted that AAG/CTT is the target for methylation in plants (Law and Jacobsen, 2010).

Validation and Polymorphism of EST-SSRs and Population genetic Variation
Thirty-eight of the 60 EST-SSR primers designed for A. argotaenia enabled to amplify expected products with a success rate of 63.33%, which is relatively high in comparison to previous studies (Dantas et al., 2015;Ueno et al., 2015). Then 22 polymorphic EST-SSRs were used to investigate standard genetic diversity and population genetic structure of four A. argotaenia populations. Only four pairs of them were tightly linked among each other. Moderate genetic variation was observed based on the EST-SSRs (PIC = 0.455) according to the evaluation criteria of polymorphism (moderate: 0.25 < PIC < 0.5) (Botstein et al., 1980). This judgment was also   lent support by such genetic parameters as the average of observed alleles, observed heterozygosity, expected heterozygosity, and the percentage of polymorphic band. In contrast, previous researches detected low levels of genetic diversity in A. argotaenia by using ISSRs and genomic SSRs (Ge et al., 2005;Ge et al., 2015). These inconsistencies highlighted the importance to investigate genetic variation of A. argotaenia using multiple markers. Accurate estimate of genetic diversity is very useful for conservation and management of genetic resources (Cardoso et al., 1998;Wang et al., 2018b). Compared to other conifers, we observed a moderate EST-SSR variation in A. argotaenia (Supplementary Table 4). Similar levels of the EST-SSR variation were found in its closely related species A. formosana as well . The moderate level of functional diversity, together with the finding that A. argotaenia did not experience a recent bottleneck, implies that the species still has essential evolutionary potential to adapt to the changing environment (Frankham, 2010).
We detected a marked genetic differentiation among A. argotaenia populations in comparison to other conifers (Supplementary Table 4). Similar findings were obtained by using chloroplast intergenic spacer, mitochondrial intron, and genomic microsatellite data (Ge et al., 2015). The dispersal distance of pollen and seed in conifers is generally less than 2 and 20 km, respectively (Fu et al., 1999b;Cain et al., 2000;Sima, 2004;Zhang et al., 2005;Lu, 2006). As for A. argotaenia, its pollen and seed exchanges may be further hindered because of preferring to grow under forest canopies (Ge et al., 2015). It is thus reasonable to speculate that the genetic differentiation pattern of A. argotaenia is highly linked to restricted between-population gene flow (genetic exchange via pollen and seed). Moreover, the establishment of climate-or habitat-linked genotypes should also be considered, since we used functional markers to perform studies (Jump and Peñuelas, 2005;Ortego et al., 2012).

CONClUSION
We generated the leaf transcriptome of A. argotaenia by using Illumina sequencing technology. A total of 62,896 unigenes were assembled, annotated, and classified. Based on the transcriptome data, 5510 EST-SSRs were identified from 4830 SSR-containing unigene sequences. Among them, 60 were randomly selected for the development of potential functional markers. Consequently, 22 polymorphic EST-SSR markers were developed and used to reveal a moderate level of functional diversity, along with marked genetic structure and the lack of genetic bottleneck, in A. argotaenia. This study has provided effective EST-SSR markers for measuring the evolutionary potential of A. argotaenia in response to environmental changes.

DATA AVAIlABIlITY STATEMENT
All Illumina clean data generated for this study was deposited at the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (https://www.ncbi.nlm.nih.gov/sra/ SRX3296043[accn]). The Bioproject number and Biosample number for clean data are PRJNA413732 and SAMN07764634, respectively. Thirty-eight EST-SSR sequences generated for this study were deposited at GenBank with accession numbers MG209531-MG209568.

AUThOR CONTRIBUTIONS
The author XR conducted the experiments and ZW completed the data analysis. The two authors contributed equally to this work. YS designed the experiments and wrote the manuscript, and TW corrected the manuscript. SUPPlEMENTARY FIgURE 5 | Amplification products generated using EST-SSR primer pair, SHS-26622, were separated by electrophoresis in 6% denaturing polyacrylamide gel. The expected allele size was 228 bp and the annealing temperature was 57°C. Lanes 1-15, 16-28, 29-40, and 41-56 were products of A. argotaenia individuals from populations JQS (Jiuqushui), CP (Chuanping), QNS (Qiniangshan), and WGS (Wugongshan), respectively; Lane M: 50 bp ladder.