Transcriptome Sequencing and Development of Novel Genic SSR Markers From Pistacia vera L.

In this study, we aimed to develop novel genic simple sequence repeat (eSSR) markers and to study phylogenetic relationship among Pistacia species. Transcriptome sequencing was performed in different tissues of Siirt and Atl cultivars of pistachio (Pistacia vera). A total of 37.5-Gb data were used in the assembly. The number of total contigs and unigenes was calculated as 98,831, and the length of N50 was 1,333 bp after assembly. A total of 14,308 dinucleotide, trinucleotide, tetranucleotide, pentanucleotide, and hexanucleotide SSR motifs (4–17) were detected, and the most abundant SSR repeat types were trinucleotide (29.54%), dinucleotide (24.06%), hexanucleotide (20.67%), pentanucleotide (18.88%), and tetranucleotide (6.85%), respectively. Overall 250 primer pairs were designed randomly and tested in eight Pistacia species for amplification. Of them, 233 were generated polymerase chain reaction products in at least one of the Pistacia species. A total of 55 primer pairs that had amplifications in all tested Pistacia species were used to characterize 11 P. vera cultivars and 78 wild Pistacia genotypes belonging to nine Pistacia species (P. khinjuk, P. eurycarpa, P. atlantica, P. mutica, P. integerrima, P. chinensis, P. terebinthus, P. palaestina, and P. lentiscus). A total of 434 alleles were generated from 55 polymorphic eSSR loci with an average of 7.89 alleles per locus. The mean number of effective allele was 3.40 per locus. Polymorphism information content was 0.61, whereas observed (Ho) and expected heterozygosity (He) values were 0.39 and 0.65, respectively. UPGMA (unweighted pair-group method with arithmetic averages) and STRUCTURE analysis divided 89 Pistacia genotypes into seven populations. The closest species to P. vera was P. khinjuk. P. eurycarpa was closer P. atlantica than P. khinjuk. P. atlantica–P. mutica and P. terebinthus–P. palaestina pairs of species were not clearly separated from each other, and they were suggested as the same species. The present study demonstrated that eSSR markers can be used in the characterization and phylogenetic analysis of Pistacia species and cultivars, as well as genetic linkage mapping and QTL (quantitative trait locus) analysis.


INTRODUCTION
Pistacia L. genus is a member of the Anacardiaceae family that also contains important species such as mango, pepper tree, and sumac (Kafkas, 2006a). The sex habit of Pistacia is dioecious with several exceptions (Kafkas et al., 2000). The genus of Pistacia consists of 13 or more species (Gundesli et al., 2019), and Pistacia vera is believed to be the most ancestral species, whereas the other species probably derived from Zohary (1952) and Kafkas (2019). Currently, the main pistachio producers in the world are Iran, the United States, Turkey, and Syria (FAO, 2019).
The first taxonomic study in the genus Pistacia was done morphologically by Zohary (1952). After discovery of different molecular markers, they have been used in Pistacia as well. The first detailed molecular study in Pistacia was performed based on chloroplast DNA profiles by Parfitt and Badenes (1997). Microsatellites or simple sequence repeats (SSRs) and repeats of 1-to 6-nucleotide-long DNA motifs have high reproducibility, multiallelic character, and extensive tandem repeats in the whole genome (Powell et al., 1996). SSRs have advantages over other marker systems because of their codominant inheritance, suitability for automation, and well-distribution throughout eukaryotic genomes. Recently, SSRs have been widely used in genetic map construction, DNA fingerprinting, genetic diversity, quantitative trait locus (QTL) mapping, and markerassisted selection (MAS) (Dong et al., 2018;Yang et al., 2018;Zhang et al., 2019).
Genic SSRs or eSSRs are obtained by expression sequence tags that are created by gene transcripts that have been converted into cDNA (Adams et al., 1991). Recently, eSSR markers have been used for identifying plant species because of its design from coding regions (Varshney et al., 2005;Ellis and Burke, 2007). The concern here is that because eSSRs are located within the genes, and more conserved, they may be used for identification of alleles related to some agronomically significant traits (Chen et al., 2017;Dong et al., 2018;Yang et al., 2018). Conventionally, SSR development needs to be labor-intensive, such as cloning DNA and constructing library, and generates particularly less SSRs compared with next-generation sequencing (NGS) technology (Zalapa et al., 2012;Zhang et al., 2012). The advantage of NGS technologies, especially next-generation transcriptome sequencing, provides a large amount of sequences with cost-effective and high-quality data in a short period (Wang et al., 2010;Taheri et al., 2019;Zhang et al., 2019).
The aims of the study were (i) to develop novel genic SSR markers from transcriptome sequences of pistachio (P. vera) and (ii) to determine phylogenetic relationship among Pistacia species using novel eSSRs.

Plants Materials
In this study, RNA isolation and transcriptome sequencing were performed in bud, flower, leaf, shoot, whole nut, pericarp, and kernel of Siirt (female) and Atlı (male) cultivars (Supplementary Table 1). The sampled tissues were immediately frozen in liquid nitrogen and stored at -80 • C until RNA isolation.

RNA Extraction, Sequencing, Transcriptome Assembly
Total RNA was extracted from different tissues of pistachio and was sequenced by BGI (Beijing Genomic Institute) using an Illumina (Hi-Seq 2500) NGS platform. The raw reads were first cleaned from adaptors and filtered for low-quality sequences with higher than 20% Q value < 20 bases. Those called as "clean reads" were assembled using Trinity software (v2.0.6) (Grabherr et al., 2011). After de novo assembly, Trinity sequences were named as transcripts obtained from contigs that were not extended on each ends The transcripts were clustered and obtained final unigenes with TGIGL (Pertea et al., 2003) software. Simple sequence repeats were searched using MIcroSAtelite (MISA) (Haas et al., 2013) search module in all unigenes. The search parameters were set for the detection of dinucleotide, trinucleotide, tetranucleotide, pentanucleotide, and hexanucleotide SSR motifs with a minimum of six, five, four, four, and four repeats, respectively. Primers pairs were designed using online software Batchprimer 3.0 with the standard parameters (You et al., 2008) Table 3). The sequence data have been deposited in the National Center for Biotechnology Information under BioProject accession number PRJNA648340.

DNA Extraction, Polymerase Chain Reaction Amplification, eSSR Validation
The genomic DNAs of 89 Pistacia samples were extracted from fresh leaves using the CTAB method of Doyle and Doyle (1987) with minor modifications (Kafkas et al., 2006). DNA concentrations were measured using a Qubit Fluorimeter (Invitrogen) or were estimated by comparing the band intensity with λ DNA of known concentrations following 0.8% agarose gel electrophoresis and ethidium bromide staining. DNA samples were subsequently diluted to a concentration of 10 ng/µL for eSSR-polymerase chain reaction (PCR). Initially, a total of 250 randomly selected primer pairs were screened in eight Pistacia individuals, and 55 primer pairs were selected for the characterization of 89 Pistacia genotypes belonging to 10 Pistacia species. eSSR-PCR was carried out using a three-primer strategy according to the method utilized by Schuelke (2000) with minor modifications (Topçu et al., 2016). PCR was performed in a total volume of 12.5 µL containing 20 ng DNA; 75 mM Tris-HCl (pH 8.8); 20 mM (NH 4 ) 2 SO 4 ; 2.0 mM MgCl 2 ; 0.01% Tween 20; 200 µM each dNTP; 10 nM M13 tailed forward primer at the 5 end; 200 nM reverse primer; 200 nM universal M13 tail primer (5 TGTAAAACGACGGCCAGT-3) labeled with one of FAM, VIC, NED, or PET dyes; and 0.6 U hot-start Taq DNA polymerase. Amplification was performed in two steps as follows: initial denaturation at 94 • C for 3 min, followed by 10 cycles at 94 • C for 30 s, 58 • C for 45 s, and 72 • C for 60 s. The second step included 30 cycles at 94 • C for 30 s, 58 • C for 45 s, and 72 • C for 60 s and a final extension at 72 • C for 10 min. After the PCR was completed, the reactions were subjected to denaturation for capillary electrophoresis in an ABI 3130xl genetic analyzer [Applied Biosystems Inc., Foster City, CA, United States (ABI)] using a 36-cm capillary array with POP7 as the matrix (ABI). Samples were fully denatured by mixing 0.5 µL of the amplified product with 0.2 µL of the size standard and 9.8 µL formamide. The fragments were resolved using ABI data collection software 3.0, and SSR fragment analysis was performed with GeneScan Analysis Software 4.0 (ABI).

Genetic Diversity
The 55 polymorphic eSSR loci were used for the genetic diversity of 11 P. vera and 78 wild Pistacia genotypes. The number of alleles (Na), the number of effective alleles (Ne), observed (Ho), and expected (He) heterozygosity were calculated using GenAlEx version 6.5 according to Peakall and Smouse (2012). The polymorphism information contents (PICs) of each marker were calculated using PowerMarker software version 3.25 (Liu and Muse, 2005). The SSR bands scored as present (1) or absent (0) consisted of a dendrogram using NTSYSpc v2.21c (Rohlf, 2009) software by unweighted pair-group method with arithmetic averages (UPGMA). A principal coordinate analysis (PCoA) was performed using NTSYSpc v2.21c (Rohlf, 2009). STRUCTURE 2.3.4 software (Pritchard et al., 2000) was also used to determine the possible number of populations and for the construction of the population structure. Structure analysis computes the proportion of the genome of an individual originating from each interfered population. Possible K's (where K is an assumed fixed number of subpopulations in the entire population) were from 1 to 10 with five replications to ensure consistency of results. Ln P(D)s mean possible estimated K's. There are Ln P(D) values for each K value. By using Ln P(D) values for every K calculate K that shows a possible number of populations. Each replication run was conducted with a burn-in period of 100,000 and 100,000 Markov chain Monte Carlo.

Identification and Distribution of SSR Motifs
The MISA search module was used to search for SSRs with the 98,831 unigenes. In total, 37,793 potential genic SSRs were identified from 98,831 unigene sequences, of which 23,485 were mononucleotide repeats (Supplementary Table 5).

Validation of SSRs
The 250 genic SSR primer pairs were designed from assembled short RNA sequences. The 250 eSSR primer pairs were screened in eight Pistacia species (P. vera, P. khinjuk, P. atlantica, P. mutica, P. integerrima, P. chinensis, P. terebinthus, and P. lentiscus) by agarose gel electrophoresis (1.5%). In total, 233 were amplified at least in one of the Pistacia species, and 82 were amplified in all tested eight Pistacia species (Supplementary Table 7). Of them, 55 eSSR primer pairs were used for characterization of 89 Pistacia accessions because of their polymorphism and having amplification in the tested Pistacia species.

Functional Annotation and Classification
A total of 98,831 assembled unigenes were aligned to different universal databases. Annotation demonstrated that 52,839 unigenes (53.46%) were found important in the Nr database, 49,727 (50.31%) in the Nt database, and 34,419 (34.82%) in the Swiss-Prot database. The annotation of 58,401 (59.09%) unigenes was successfully achieved in at least one of the six public databases (Supplementary Table 8).
KEGG Orthology (KO) database was used for metabolic pathway analysis. In the results constructed in a total 128 KEGG    The pathways involving the highest number of unique transcripts were "metabolic pathways" (6,857), followed by "biosynthesis of secondary metabolites" (3,498), "plant-pathogen interaction" (2,329), and "plant hormone signal transduction" (1,738) (Supplementary Table 11 and Figure 5).

Genetic Diversity in Pistacia
Diversity analyses were performed in 11 pistachio cultivars (P. vera) and in 78 wild Pistacia accessions. Allele ranges, number of alleles (Na), effective number of alleles (Ne), PICs, and expected and observed heterozygosities of 55 eSSR loci are given in Table 3.
A total of 434 alleles were amplified by 55 eSSR loci, ranging from 4 to 20 per locus. The highest number of allele (Na = 20) was amplified by the CUPVEST3927 locus. The effective number of allele ranged from 1.36 (CUPVEST9032) to 12.31 (CUPVEST3927). The highest observed heterozygosity (Ho = 0.92) was obtained from the CUPVEST1146 locus. The average expected heterozygosity was calculated as 0.65, ranging from 0.26 (CUPVEST9032) to 0.92 (CUPVEST3927). The PICs ranged from 0.25 (CUPVEST9032) to 0.92 (CUPVEST3927), with an average of 0.61 (Table 4).
In P. vera, a total of 150 alleles were amplified by 55 eSSR loci. The polymorphism rate was 92.7%. A total of 51 eSSR loci were polymorphic, whereas four loci were monomorphic. The average and the highest numbers of alleles were calculated as 2.73 and 6.00 (CUPVEST1313 and CUPVEST4033), respectively. The highest effective number of the allele (Ne = 4.94) was obtained from the CUPVEST4033 locus. The averages of expected heterozygosity, observed heterozygosity, and PIC values were calculated as 0.40, 0.38, and 0.34, respectively (Table 3 and Supplementary  Table 12).
In P. khinjuk, a total of 152 alleles were produced by 52 polymorphic and 3 monomorphic eSSR loci with 94.6% polymorphism rate. An average of 2.76 alleles were obtained from 52 polymorphic loci, and the highest number of alleles (Na = 5) was in the CUPVEST3826 and CUPVEST4033 loci. The effective number of alleles ranged from 1.00 to 3.85, with an average of 2.17. The highest He (0.74) value was calculated in the CUPVEST1313 and CUPVEST3826 loci. The averages of Ho, He, and PIC values were 0.61, 0.49, and 0.42, respectively (Table 3 and  Supplementary Table 13).
The lowest polymorphism (61.8%) was in P. eurycarpa with 34 polymorphic SSR loci. The number of alleles ranged from 1 to 4, with a total of 105 alleles. The highest values for Na, Ne, and He in P. eurycarpa were obtained from the CUPVEST1313 locus. The expected heterozygosity and PIC values were calculated as 0.33 and 0.27, respectively (Table 3 and Supplementary Table 14).  Pistacia atlantica had the highest number of alleles among Pistacia species in this study. A total of 191 alleles were obtained from 52 polymorphic and three monomorphic SSR loci. The CUPVEST3927 locus amplified the highest number of allele and the effective number of the allele. The averages of Na and Ne values were 3.47 and 2.12, respectively. The highest Ho value (Ho = 0.92) in P. atlantica was calculated in CUPVEST8600 locus, whereas the highest expected heterozygosity (He = 0.88) was calculated in the CUPVEST3927 locus. The expected heterozygosity value ranged from 0.00 to 0.88, with an average of 0.40. The average of PIC value was calculated as 0.36 (Table 3 and Supplementary Table 15).
In P. mutica, 117 alleles were obtained from 36 polymorphic SSR loci and 19 monomorphic SSR loci in five accessions. The mean Na and Ne values were determined as 2.13 and 1.71, respectively. The highest observed and expected heterozygosities  In P. integerrima, the number of alleles ranged from 1 to 4. A total of 104 alleles were produced by 55 eSSR loci with 69.1% polymorphism. The average number of alleles was calculated as 1.89. The CUPVEST6536 locus amplified the highest effective number of the allele with 3.20. The highest value for He (0.69) was obtained from the CUPVEST6536 locus. The average Ho and He values were 0.63 and 0.34, respectively. The average PIC value was calculated as 0.27 in P. integerrima (Table 3 and Supplementary  Table 18).
In P. chinensis, a total of 151 alleles were generated by 50 polymorphic and 5 monomorphic loci, ranging from 1 to 6 with an average 2.75 alleles per locus. The average Ne, Ho, He, and PIC were calculated as 1.82, 0.34, 0.35, and 0.31, respectively (Table 3  and Supplementary Table 19).
In P. terebinthus, 180 alleles were obtained from 46 polymorphic and 9 monomorphic loci. The average number of the allele was 3.27. The highest number of the alleles was obtained from the CUPVEST3927 and CUPVEST6009 loci. The average effective number of allele and the highest effective number of alleles were 2.02 and 6.54, respectively. The CUPVEST3927 SSR locus produced the highest expected heterozygosity value. The average Ho, He, and PIC values were calculated as 0.34, 0.38, and 0.33, respectively (Table 3 and Supplementary Table 20).
In P. palaestina, 197 alleles were generated from 34 polymorphic and 21 monomorphic loci. The average number of the allele was detected as 2.07. The average effective number of allele was 1.63. The average Ho, He, and PIC values were calculated as 0.28, 0.28, and 0.31, respectively (Table 3 and  Supplementary Table 21).
In P. lentiscus, 38 of 55 eSSR loci were polymorphic with 69.1%. A total of 131 alleles were amplified with an average of 2.38 alleles per locus. The average effective number of the allele was detected as 1.65. The highest number of allele (Na = 7), the effective number of the allele (Ne = 3.90), observed heterozygosity (Ho = 0.92), and expected heterozygosity (He = 0.74) values were obtained from the CUPVEST3826, CUPVEST5726, CUPVEST1146, and CUPVEST5726 loci, respectively. The average values for Ho and He were 0.24 and 0.29, respectively. The average PIC value was calculated as 0.27 (Table 3 and  Supplementary Table 22).
Genetic diversity values of unknown accessions were calculated as well. A total of 53 eSSR loci were polymorphic with 96.4%. The average numbers of Na, Ne, Ho, He, and PIC values were 2.96, 2.10, 0.45, 0.46, and 0.41, respectively (Table 3 and  Supplementary Table 23).

Clustering and Structure Analysis
The maximum Delta K value was at K = 7 (Figure 6). Pistacia accessions were grouped in seven main clusters: Cluster 1 included P. vera cultivars, whereas P. khinjuk was the closest species to P. vera. P mutica and P. atlantica were in Cluster 3 together with P. eurycarpa, which was clearly separated from P. mutica and P. atlantica. Cluster 4 included P. integerrima and UCB1 (P. atlantica × P. integerrima) accessions. Three UCB1 accessions were clearly separated from P. integerrima. P. chinensis accessions were in Cluster 5. P. terebinthus and P. palaestina accessions were grouped in the same cluster (Cluster 6) together with unknown Pistacia accessions. Cluster 7 consisted P. lentiscus accessions that were separated from rest of the accessions in the UPGMA analysis (Figure 7). A PCoA supported the structure and UPGMA analysis (Figure 8).

Identification of eSSRs
The density of eSSRs distribution was computed as one SSR per 17.75 kb in the present study and was higher than other species such as Argyranthemum brousonetii (2.3%, 27.00 kb) and Zingiber officinale (2.7%, 25.20 kb) (White et al., 2016;Awasthi et al., 2017), while it was lower than in some species such as Arachis hypogaea (17.7%, 3.30 kb), Curcuma longa (20.5%, 4.80 kb), and Curcuma alismatifolia (12.5%, 6.60 kb), respectively (Annadurai et al., 2013;Wang et al., 2018;Taheri et al., 2019). eSSR frequencies and their density in genomes may differ from species to species, because each species has different genetic construction. On the other hand, using different bioinformatics tools and criteria for detection of SSRs may also be a reason for the differences (Liu et al., 2018;Taheri et al., 2019). eSSR repeat motifs identified from dinucleotide to hexanucleotide and trinucleotide repeats (29.54%), dinucleotide repeats (24.06%), and hexanucleotide repeats (20.67%) were the most abundant repeat motifs, respectively. These results were similar to those found in previous studies in abundance of trinucleotide motifs within the transcriptome sequences (Awasthi et al., 2017;Park et al., 2019;Taheri et al., 2019). A previous study in P. vera by Jazi et al. (2017) also demonstrated that dinucleotides (44.7%) and trinucleotides (40.6%) were the most abundant types of repeats. The most frequent types in genic SSRs are trinucleotide repeat type, whereas the common types of SSRs in unigene sequences are dinucleotide and trinucleotide types (Varshney et al., 2002).

Transcriptome Assembly and Functional Annotation
De novo sequencing and assembly without aligning with the reference genome have been widely used to obtain first sequences for non-model organisms (Russell et al., 2014;Wei et al., 2014;Chen et al., 2017). The transcriptome sequences of P. vera were the first provided by Jazi et al. (2017) for discovery of markers about salinity-related genes. In the present study, different tissues of a female and a male P. vera cultivars were sequenced, and N50 of unigenes was computed as 1,333 bp, similar with the study performed by Jazi et al. (2017).
The GO database is one of the largest information sources about detection of the genes functions ranging from model organisms to minor organisms (Gene Ontology Consortium, 2004). For GO analysis, a total of 40,405 sequences were associated with 325,220 GO terms, which classified different 56 subcategories in three major categories in this study. Jazi et al. (2017) demonstrated that 68,539 sequences were annotated with 302,375 GO terms. The results indicated that assembled unigenes have different molecular functions involved in different metabolic pathways. The assembled sequences were aligned to COG database for prediction of the possible functions. KO provides recognition of the biological pathway of transcripts using KEGG database (Blanca et al., 2011;Torre et al., 2014). Therefore, KEGG pathway analysis explains the information about biological systems of organisms and the relationships between transcripts and their molecular, cellular, and organism levels (Kanehisa et al., 2008). A total of 30,746 unigenes were associated with 279 KEGG pathways. The results illustrated that the KEGG pathway classification of the P. vera will facilitate to understand related complex traits at transcriptome level in pistachio and in closely related species.
eSSR Polymorphism in Pistacia SSR markers have been widely preferred for genetic diversity studies, construction of consensus genetic linkage maps, QTL mapping, and MAS in breeding programs (Li et al., 2012;Dong et al., 2018). SSR markers from transcriptome sequences are especially valuable because of their preserved gene regions (Taheri et al., 2019). NGS provides more opportunities than detecting classic SSRs and generates enormous data for development of SSRs in many plant species including P. vera (Jazi et al., 2017). The detection of potential SSR markers in pistachio is very easy owing to its high heterozygosity rate (Motalebipour et al., 2016;Jazi et al., 2017). Although Jazi et al. (2017) detected 11,337 potential SSRs in P. vera, a total of 14,308 genic SSR loci were determined in this study.
In P. vera, several reports were published regarding the development of novel genomic SSR markers in pistachio. First, novel 14 SSR markers were developed by Ahmad et al. (2003) Zaloglu et al. (2015) constructed genomic libraries enriched with dinucleotides and trinucleotides repeats. They developed 47 polymorphic SSR loci from P. vera cv. Siirt. Topçu et al. (2016) sequenced 192 clones from enriched (GA) n and (AAG) n motifs. In total, 110 of 135 primers were produced in PCRs, and 64 of them were found polymorphic with 264 alleles in 12 pistachio cultivars. Currently, there are a few studies about the development of genomic SSR markers in Pistacia using NGS technology (Motalebipour et al., 2016;Khodaeiaminjan et al., 2018). Motalebipour et al. (2016) developed SSR markers from P. vera, and their transferability was tested in five wild Pistacia species. Khodaeiaminjan et al. (2018) developed 625 polymorphic in silico SSR markers. We developed here genic SSR markers using NGS and generated 33,341 SSR loci. Motalebipour et al. (2016) developed 204 SSR markers and tested them in five wild Pistacia species. The means of Ho, He, and PIC values in 24 P. vera accessions were calculated as 0.46, 0.55, and 0.50, respectively (Motalebipour et al., 2016). Khodaeiaminjan et al. (2018) used 613 in silico SSR loci with an average number of allele 4.2 in 18 P. vera genotypes. Average values for observed (Ho) and expected (He) heterozygosities and PIC were 0.53, 0.56, and 0.51, respectively, in 18 P. vera cultivars. In this study, we used 55 eSSR loci with an average number of 2.73 allele in only P. vera.
There are a limited number of studies about development of SSR markers in wild Pistacia species. Albaladejo et al. (2008) developed novel SSR markers from P. lentiscus using enriched library method and produced 59 alleles from eight SSR markers, ranging from 3 to 13 per locus, although a total of 131 alleles were generated by 55 eSSR loci in P. lentiscus genotypes in this study. Arabnezhad et al. (2011) designed 27 primer pairs enriched for dinucleotide (AG) n and trinucleotide (ATG) n motifs from P. khinjuk sequences. A total of 114 alleles were generated with an average number of allele Na = 2.8 per locus in all Pistacia accessions (Arabnezhad et al., 2011). In this study, 52 of 55 eSSRs were found polymorphic, and a total of 152 alleles were generated with an average number of allele Na = 2.8 in five P. khinjuk accessions.

Phylogeny of Pistacia Species
In this study, structure analysis separated Pistacia accessions in seven main clusters. P. khinjuk was the closest species to cultivated P. vera as in previous studies (Zohary, 1952;Perl-Treves, 2001, 2002;Golan-Goldhirsh et al., 2004;Al-Saghir and Porter, 2006;Kafkas, 2006a;Motalebipour et al., 2016). P. lentiscus was the most distant species to P. vera in accordance with previous studies (Zohary, 1952;Perl-Treves, 2001, 2002;Golan-Goldhirsh et al., 2004;Kafkas, 2006a,b;Motalebipour et al., 2016). Zohary (1952) defined that P. eurycarpa is a subspecies of P. atlantica (var. kurdica), whereas it was classified as a different species by Yaltırık (1967). In this study, we used two old P. eurycarpa accessions sampled from Göbek village of Gaziantep province to emerge relationship with other Pistacia species. The results clearly demonstrated that P. eurycarpa was closer to P. atlantica than P. khinjuk. Kafkas and Perl-Treves (2001) found that P. atlantica and P. eurycarpa were closely related species at molecular level. Therefore, the results in this study supported statements at morphological level by Yaltırık (1967) and at molecular level by Kafkas and Perl-Treves (2001).
Al Yafi (1978) detected different subspecies of P. atlantica using some leaf characters. He reported that P. mutica had been a subspecies in P. atlantica. Kafkas (2006a) described that P. atlantica and P. mutica were not clearly diverged. Therefore, P. mutica was grouped within P. atlantica. In this study, pairs of species were not prominently separated from each other. The previous findings supported that P. atlantica and P. mutica species were one of the closest pairs of species (Al-Saghir and Porter, 2006;Kafkas, 2006a).
P. integerrima and its hybrids have been used as rootstock for P. vera in California. Zohary (1952) defined P. integerrima as a subspecies of P. chinensis, whereas Parfitt and Badenes (1997) classified it as a distinct species. In this study, P. integerrima and P. chinensis were clearly separated from each other. Similar results were also obtained by Kafkas and Perl-Treves (2001), Kafkas (2006a), and Motalebipour et al. (2016).
There is still discussion about whether P. terebinthus and P. palaestina are same or different species. The first classification was performed by Engler (1883), who considered P. palaestina as a variety of P. terebinthus. However, Zohary (1952) described that P. palaestina was a different species from P. terebinthus. On the other hand, Yaltırık (1967) reported that P. palaestina was a subspecies of P. terebinthus. Kafkas and Perl-Treves (2001) and Kafkas (2006a) supported Yaltirik and Engler's classification studies. In this study, these species were not prominently diverged from each other. Therefore, our hypothesis is that they are same species, and P. palaestina can be a subspecies of P. terebinthus.
In this study, six unknown Pistacia genotypes were grouped with P. palaestina and P. terebinthus according to UPGMA analysis. Structure analysis demonstrated that they have close relationships with cultivated P. vera. This situation can be described that these genotypes may be hybrids between P. vera and P. palaestina or P. terebinthus.

CONCLUSION
Transcriptome sequencing provided opportunities for mining easy and cost-effective SSR markers using NGS platform. A total of 98,831 unigenes in this study can be useful for genome annotation in P. vera and in related species in the future. The SSR distribution frequency in pistachio transcriptome was one SSR per 17.75 kb. A total of 14,308 eSSRs were defined using transcriptome data of pistachio, and they can be used in studies such as germplasm characterization, population and evolutionary studies, marker-assisted breeding, and association and QTL mapping in Pistacia species. This was the first study characterizing 10 Pistacia species by genic SSRs and provided important findings on the taxonomy of Pistacia species.

DATA AVAILABILITY STATEMENT
The sequence data have been deposited in the National Center for Biotechnology Information (NCBI) under BioProject accession number PRJNA648340.

AUTHOR CONTRIBUTIONS
HK, EI, and SK prepared plant materials. HK, AP, HT, and SK performed the bioinformatic and SSR analysis. HK, AP, and SK wrote the manuscript. All the authors contributed to the article and approved the submitted version.