Development and Characterization of EST-SSR Markers From RNA-Seq Data in Phyllostachys violascens

Bamboo are woody grass species containing important economic and ecological values. Lei bamboo (Phyllostachys violascens) is a kind of shoot-producing bamboo species with the highest economic yield per unit area. However, identifying different varieties of Lei bamboo based on morphological characteristics is difficult. Microsatellites play an important role in plant identification and genetic diversity analysis and are superior to other molecular markers. In this study, we identified 18,356 expressed sequence tag-simple sequence repeat (EST-SSR) loci in Lei bamboo transcriptome data. A total of 11,264 primer pairs were successfully designed from unigenes of all EST-SSR loci, and 96 primer pairs were randomly selected and synthesized. A total of 54 primer pairs were used for classifying 16 Lei bamboo varieties and 10 different Phyllostachys species. The number of polymorphism alleles among the 54 primer pairs ranged from 3 to 12 for P. violascens varieties and 3 to 20 for Phyllostachys. The phylogenetic tree based on polymorphism alleles successfully distinguished 16 P. violascens varieties and 10 Phyllostachys species. Our study provides abundant EST-SSR resources that are useful for genetic diversity analysis and molecular verification of bamboo and suggests that SSR markers developed from Lei bamboo are more efficient and reliable than ISSR, SRAP or AFLP markers.


INTRODUCTION
Bamboo is an economically important member of the woody grasses. It includes 88 genera and more than 1,400 species worldwide; 34 genera and 534 species are in China (Wu and Raven, 2006). Given the considerably fast growth, strong carbon fixation capability and edible shoots, bamboo has worldwide ecological and economic value.
The classification and nomenclature of plants are mainly based on morphological characteristics, such as roots, stems, leaves and flowers (Lichtenthaler, 1987). Flower morphology is the most important (PDCress and Staden, 1998). The identification and classification of bamboo is vital for germplasm collection and conservation (Soderstrom and Calderon, 1979). However, bamboo has a long juvenile phase, with the flowering interval of some species being up to 120 years (Janzen, 1976), which prevents the classification of bamboo when using only flower morphology. Moreover, morphological characteristics sometimes are not very reliable because they are affected by ecological factors, which leads to confused classification and reclassification of various accessions of bamboo (Das et al., 2007).
Lei bamboo (Phyllostachys violascens) has higher economic value per unit area than other bamboo species with edible shoots (Liu et al., 2001). Lei bamboo has many varieties that differ in shoot time, shoot yield per unit area and pathogen resistance. Lin et al. (2011) classified different P. violascens varieties by using inter-simple sequence repeat (ISSR), sequencerelated amplified polymorphism (SRAP), and amplified fragment length polymorphism (AFLP) markers. However, some varieties cannot be differentiated by just one molecular marker method (Lin et al., 2011). Therefore, the development of accurate and efficient molecular markers for classifying different varieties in Lei bamboo is important.
Previously, we sequenced the transcriptome of P. violascens (GenBank Accession No. SRX5137626) by using Illumina Truseq and assembled 132,678 unigenes. We identified EST-SSR loci as well as primer pairs based on these data. The major objective of this study was to develop efficient EST-SSR markers for classifying Lei bamboo varieties as well as Phyllostachys species. These markers may be applicable for bamboo taxonomic study, genetic diversity analysis and evolutionary research.

Marker Loci Detection and SSR Primer Pair Design
MicroSAtellite (MISA) 1 was used for SSR mining in the assembled contigs (Thiel et al., 2003). The minimum number 1 http://pgrc.ipk-gatersleben.de/misa/misa.html of repeats used to select the SSRs was 10 for mononucleotide repeats, 6 for dinucleotide repeats, and 5 for tri-, tetra-, penta-, and hexanucleotide repeats. The Perl program Primer3.0 2 (Rozen and Skaletsky, 2000) was used to design Primer pairs with the design principles of length 18 to 27 bp and about 20 bp, melting temperature (Tm) 57 • C to 63 • C, and PCR product size 100 to 280 bp.

PCR Product Sequencing
Silver nitrate stain gel was used for selecting suitable SSR primers (Panaud et al., 1996). The desired bands were accurately excised and subsequently purified by using the SanPrep Column DNA Gel Extraction Kit (Sangon, Shanghai) (Tang et al., 2010). The purified DNA bands were ligated with pMD-18-T vector (TaKaRa, Beijing) and sequenced by the Sanger method (Tang et al., 2010;Zhai et al., 2014). The PCR sequence similarity rate was more than 98% with transcriptome sequences applied.

Data Processing and Genetic Analysis
According to PAGE results, the absence or presence of bands was scored as zero or one in all SSR loci and two binary qualitative data matrices were generated (Rohlf, 2000;You et al., 2015). Dendrograms were constructed on the basis of these two binary qualitative matrices with the method unweight pair group method with arithmetic mean (UPGMA) selected for clustering and maximum number of tied trees set to 100. Before dendrogram construction, pairwise coefficients were calculated according to the Jaccard similarity coefficient method (Kosman and Leonard, 2005). The above-mentioned data processing, coupled with principal coordinates analysis (PCoA) involved using NTSYSpc v2.1 (Rohlf, 2000). The bootstrapping analysis repeat was 1,000 performed with the software FREETREE V.0.9.1.50 (Pavlícek et al., 1999). Bootstrap values more than 50 are listed on the dendrogram (Zhu et al., 2012;Zhai et al., 2013). In this study, we constructed three phylogenetic trees: 16 P. violascens varieties independently, 10 Phyllostachys species independently and combination with 16 P. violascens varieties and other 9 Phyllostachys species.
Within these SSRs, 95 motif types were detected (Appendix 1). The frequency distribution of the 20 most abundant SSR classical repeat motifs is in Table 1. Among those motif types, A/T was the most abundant (43.73%), followed by AG/CT (15.93%) and C/G (5.84%).
A total of 11,264 primer pairs were successfully designed by using Primer 3.0 and two Perl scripts of p3_in.pl and p3_out.pl 3 from 15,600 unigenes. However, the software primer modeling failed for 6,189 unigene sequences (Appendix 2). We randomly selected 96 SSR primer pairs from all SSR primer pairs. Finally, 54 optimal primer pairs (Appendix 3) were used to study a series of filters: (1) unsatisfactory amplification, (2) unspecific amplification, and (3) sequence less than 98% similarity in comparison with the transcriptome. The optimized SSRs were used to analyze the genetic diversity of 16 P. violascens varieties and 10 Phyllostachys species.

Genetic Diversity of P. violascens
We used 54 primer pairs for the 16 P. violascens varieties, generating 399 alleles, which were treated as informative loci for dendrogram construction (Appendix 4). The number of SSR alleles was 3 to 12 for one primer pair, and each primer pair had 6.88 alleles, on average.
To test the genetic similarity between varieties, we used PCoA based on SSR data and found three groups among the 16 P. violascens varieties ( Figure 1A). The phylogenetic tree was further constructed by using NTSYSpc (Figure 2A); the resulting Jaccard similarity coefficient ranged from 0.73 to 0.98. Four groups were uncovered in previous research according to genetic diversity of P. violascens varieties by combining ISSR, 3 http://pgrc.ipk-gatersleben.de/misa/primer3.html SRAP and AFLP marker analysis (Lin et al., 2011). For better comparison with previous study, the index 0.78 was selected and four main groups were detected in present study: groups I, II, III, and IV. Group I contained 10 varieties. Among the 10 varieties, P. violascens cv. Jianye and cv. Linanensis exhibited far genetic distances from other eight varieties; P. violascens cv. Hongke and cv. Xiye had very close genetic distance, with similarity index about 0.98. Among the four varieties in group II, the greater distant genetic distance was between P. violascens cv. Kuoyeqingtou and three other varieties. The groups III and IV contained only one variety each. These results were largely consistent with the PCoA results, with groups III and IV in the dendrogram clustered into one group on PCoA.

Genetic Diversity of Phyllostachys Bamboo
The above 54 primer pairs were used with the 10 Phyllostachys bamboo species to verify their cross-amplification potential in Phyllostachys. The SSR primer pairs of P. violascens were perfectly applied in other Phyllostachys species. The number of SSR alleles ranged from 3 to 20, with an average of 11.3 per primer pairs. A total of 656 loci were used to construct the dendrogram of Phyllostachys; no missing data were found in the study (Appendix 5). As shown in the phylogenetic tree (Figure 2B), the Jaccard similarity coefficient ranged from 0.67 to 0.84. In this study, the 10 Phyllostachys species were separated into three distinct groups: I, II, and III. Group I contained five species: P. violascens, P. glabrata, P. verrucosa, P. bambusoides, and P. aurea. The Jaccard similarity coefficient was up to 0.84 between P. violascens and P. glabrata. Group II included two bamboo species: P. edulis and P. virella. Group III contained three bamboo species: P. rivalis, P. parvifolia and P. nidularia. Similar grouping results were shown by PCoA ( Figure 1B).

Characterization of EST-SSRs in P. violascens
Simple sequence repeat markers have been widely used in fingerprint construction, genetic diversity analysis and molecular marker-assisted breeding because of their high polymorphism, repeatability and codominant inheritance (Ran et al., 2010;Feng et al., 2018;Shi et al., 2018;Wang et al., 2018). However, traditional methods for developing SSR markers are not very efficient (Tang et al., 2010). Next-generation sequencing technologies can generate massive data, which are good resources for developing SSR markers in many species including bamboo (Tang et al., 2010;Abhishek et al., 2016;Zhang et al., 2017). Nevertheless, SSRs in P. violascen have not been reported because of lack of transcriptome or genome sequences. In this study, we identified 18,356 EST-SSR loci from 15,600 unigenes, and a total of 11,264 primer pairs were designed based on the transcriptome of P. violascens. Overall, 54 primer pairs were successfully used for identifying 16 P. violascens varieties and 10 Phyllostachys  species, so the transcriptome sequences were good resources for developing SSR markers.
The frequency of SSRs in P. violascens was 1/4.55 kB when including the mononucleotide repeats, which was close to wheat (1/5.46 kB) (Peng and Lapitan, 2005) but significantly higher than in Arabidopsis (1/13.83 kB) (Cardle et al., 2000). Various factors may affect SSR frequency, including search criteria, software, as well as species properties (Sorrells, 2005). Tectonic activities and climate fluctuations during species evolutionary history contributed to the production and accumulation of genetic variations. Multi-locus plastid phylogenic analyses of bamboo revealed that the Bambusoideae began to diversify at about 43.26 million years ago (Mya), followed by the rapid radiation of the Arundinarieae at about 12-14 Mya, which was supposed to be induced by the important strengthening of the East Asian monsoon in the late Miocene . Therefore, the high frequency of SSRs detected in P. violascens may be due in part to the long and complex evolutionary process of bamboo.
The types of repeat motifs in this study were not uniformly distributed in the P. violascens transcriptome database ( Table 1). Di-and tri-repeats were the most predominant when excluding mononucleotide repeats ( Table 2). This result differed from Ma bamboo (Dendrocalamus latiflorus), in which tri-repeats are the most abundant (Abhishek et al., 2016). In addition, the proportion of tetra-, penta-and hexanucleotide repeats was significantly lower in P. violascens than D. latiflorus ( Table 2).  The differentiation between P. violascens and D. latiflorus was likely caused by distant genetic distance. Phyllostachys belong to the Arundinarieae, a kind of temperate woody bamboo, and Dendrocalamus is affiliated with tropical woody bamboo Bambuseae (Soreng et al., 2014). These two tribes diversified at the beginning of the bamboo evolutionary history and preserved distinct genetic elements (Zhang et al., 2012. As shown in Table 1, the AG/CT motif was the most predominant di-repeat (17.11%), which was similar to pigeonpea (16.7%) (Dutta et al., 2011) but higher than wheat (8.7%) (Peng and Lapitan, 2005) and lower than taro (52.86%) (You et al., 2015). Among tri-repeats, CCG/CGG was the most abundant (5.91%), which was consistent with previous findings in taro (You et al., 2015), rice and maize (Cardle et al., 2000). The AGC/GCT type (2.94%) was the second most abundant trirepeat, which was not common in taro (You et al., 2015), rice or maize (Cardle et al., 2000). Previous studies showed that the tri-repeat type of CCG/CGG was a rare motif in dicotyledonous plants but the most abundant repeat type among the tri-repeats in monocots (Varshney et al., 2002;You et al., 2015). Our results verified that many CCG/CGG repeats was a common feature of monocot plants and the high G/C content plays an important role in monocots (Morgante et al., 2002;Rota et al., 2005;Wang et al., 2010).

Genetic Diversity in 16 P. violascens Varieties
The genetic diversity of P. violascens germplasm has been unclear. The classification of 16 varieties of P. violascens previously involved using three distinct types of molecular markers -ISSR, SRAP and AFLP individually -but failed (Lin et al., 2011). When combining ISSR, SRAP and AFLP, these 16 varieties could be identified successfully (Lin et al., 2011). In the present study, the 16 varieties of P. violascens could also be distinguished by polymorphic EST-SSR markers, and significant genetic variations between different P. violascens varieties were uncovered. These results implied that SSR markers and the combination of ISSR, SRAP and AFLP markers were more sensitive than using the ISSR, SRAP and AFLP markers alone. Considering the convenience of experimental design, the SSR marker method is more applicable than ISSR, SRAP and AFLP combined.
In addition, both PCoA and phylogenetic analysis revealed similar grouping results for the 16 P. violascens varieties based on SSR markers (Figures 1A, 2A). The cluster results for SSR markers were similar to combining ISSR, SRAP and AFLP markers but differed little when using ISSR, SRAP, and AFLP markers alone (Lin et al., 2011). Hence, the SSR method and combining ISSR, SRAP and AFLP markers were more reliable than using ISSR, SRAP and AFLP markers alone. Meanwhile, the cluster result of SSR was more similar with AFLP than with ISSR and SRAP markers, which suggests that the AFLP method was more reliable than ISSR and SRAP methods in classifying P. violascens.

Transferability of EST-SSR Markers and Genetic Diversity Among 10 Phyllostachys Species
To test the versatility of EST-SSR markers derived from P. violascens, the 54 SSR primer pairs were further used to analyze the genetic diversity of 10 Phyllostachys species. Clear bands were generated and higher transfer rate (100%) was detected by using these SSR primer pairs as compared with Elymus sibiricus (22.40%) , Chrysanthemum nankingense (20%) (Wang et al., 2013), Juglans mandshurica (30.8%) (Hu et al., 2016), and Neolitsea sericea (16.3%) (Chen et al., 2015), which implies the availability of SSR markers in the genus of Phyllostachys. Furthermore, a phylogenetic tree encompassing all 16 P. violascens varieties and 9 other Phyllostachys species used in this study was also built based on the EST-SSRs markers (Appendix 6). The results showed that all varieties of P. violascens were clustered into one main group and kept separate with the other Phyllostachys species. The internal branch structure of the common dendrogram was consistent with the separate phylogenetic analysis based on the 16 P. violascens varieties (Figure 2A) and 10 Phyllostachys species (Figure 2B), confirming the versatility and robustness of the EST-SSR markers in distinguishing different varieties of P. violascens and different Phyllostachys species. Therefore, these EST-SSR markers might be used for classifying different species in Phyllostachys and even different genera of bamboo. Figure 2B shows that these 54 SSR primer pairs developed from P. violascens could successfully distinguish the 10 Phyllostachys species. Phyllostachys has been divided into Sect. Phyllostachys and Sect. Heterocladae according to the Flora of China (Wu and Raven, 2006). In this study, 10 species of Phyllostachys were divided into three groups by using SSR markers of P. violascens. Groups I and II belonged to Sect. Phyllostachys and group III belonged to Sect. Heterocladae. However, species from groups II and III were clustered together. One reasonable explanation was that Sect. Phyllostachys is not monophyletic. Alternatively, given the lack of informative characters, the existence of incomplete lineage sorting, and/or interspecies hybridization, it is not unexpected that bamboo species from different sections clustered together based on molecular markers and genome data . As well, the classification of Sect. Phyllostachys and Sect. Heterocladae in the Flora of China mainly depended on morphological characteristics, such as whether the blade was horizontal, reflexed or erect and whether culm sheaths were covered with spots. Further studies of the classification of Sect. Phyllostachys and Sect.
Heterocladae based on both morphological characteristics and molecular analysis are needed.

AUTHOR CONTRIBUTIONS
KC and XL organized the research and wrote the experimental subjects. LZ and KZ participated in the research. LL and ZZ collected the sample and material. XL and WZ provided technical guidance and paper modification.