Validating DNA Polymorphisms Using KASP Assay in Prairie Cordgrass (Spartina pectinata Link) Populations in the U.S.

Single nucleotide polymorphisms (SNPs) are among the most abundant DNA variants in plant genomes and provide an efficient means of comparing genome and transcriptome sequences. SNP marker analysis can be used to analyze genetic diversity, construct genetic maps, and support marker-assisted selection breeding in many crop species. Before these technologies can be used, putative SNPs must first be identified and validated. In this study, 121 putative SNPs developed from a nuclear transcriptome of prairie cordgrass (Spartina pectinata Link) were tested with KASP technology to validate them. Fifty-nine SNPs were validated using a core collection of 38 natural populations, and the resulting phylogenetic tree had one main clade. Samples from the same population tended to cluster in the same location on the tree. Polymorphisms were identified within 52.6% of the populations, split evenly between the tetraploid and octoploid cytotypes. Twelve selected SNP markers were used to assess the fidelity of tetraploid crosses of prairie cordgrass and of the resulting F2 population. These markers distinguished true crosses from selfs. This study provides insight into the genomic structure of prairie cordgrass, but further analysis of other cytotypes is needed to fully understand the structure of this species. This study validates putative SNPs and confirms the potential usefulness of SNP marker technology in future breeding programs for this species.


INTRODUCTION
Prairie cordgrass (Spartina pectinata Link) is a grass species native to the North American prairie, with a geographic distribution ranging from the southern U.S. (Texas, Arkansas, and New Mexico) to northern Canada, and from the east coast through the Midwest to the west coast of the U.S. (Hitchcock, 1950; Voight and Mohlenbrock, 1979; Barkworth et al., 2007; Gedye et al., 2010). The species is adapted to a wide range of environmental conditions and tolerates abiotic stresses such as moderate salinity, waterlogged soils, drought, and cold (Montemayor et al., 2008; Boe et al., 2009; Gonzalez-Hernandez et al., 2009; Kim et al., 2011; Zilverberg et al., 2014; Anderson et al., 2015). Because of its wide adaptability, this warm-season, C4, perennial grass is highly valued for conservation practices, wetland revegetation, streambank stabilization, wildlife habitat, forage production, and, more recently, bioenergy feedstock production (Hitchcock, 1950; Barkworth et al., 2007; Montemayor et al., 2008; Gonzalez-Hernandez et al., 2009; Kim et al., 2011; Boe et al., 2013; Zilverberg et al., 2014; Guo et al., 2015). Its ability to adapt to such a wide diversity of conditions results in populations that are adapted to specific environments, ultimately leading to genetically diverse populations. Polyploidy adds further to the potential genetic diversity of prairie cordgrass.
SNPs provide an efficient way to compare genomic and transcriptome sequences. Because SNPs are among the most abundant DNA variants in plant genomes, they are more likely to be associated with specific biological functions and phenotypes (Rafalski, 2002; Bundock et al., 2006; Salem et al., 2012). SNP markers have been applied in genetic diversity analysis, genetic map construction, association mapping, and marker-assisted selection breeding in many crop species (Byers et al., 2012; Saxena et al., 2012; Semagn et al., 2014; Sindhu et al., 2014; Wei et al., 2014). SNP markers are also used in high-throughput genotyping, which speeds up the selection process by eliminating the need to grow plants to maturity for phenotypic selection (Paux et al., 2012). Using SNP markers for genetic improvement involves three steps: (1) SNP discovery, by aligning sequence reads generated with next-generation sequencing technologies from different genotypes of a given species; (2) SNP validation, to distinguish DNA polymorphisms of actual allelic variants from those arising from other biological phenomena such as gene duplication events; and (3) SNP genotyping of germplasm collections or genetic/breeding populations (Saxena et al., 2012).
Step one has been accomplished in prairie cordgrass using a transcriptome assembly derived from multiple genotypes and tissues (Gonzalez et al., personal communication); steps two and three have yet to be completed for polyploid prairie cordgrass. Several parameters, such as sample size, the number of SNPs to be analyzed, cost-effectiveness, and the SNP genotyping platform, must be considered in these analyses (Semagn et al., 2014). Many SNP genotyping technologies exist; one that combines adaptability, efficiency, and cost-effectiveness is Kompetitive allele-specific PCR (KASP), developed by LGC Genomics (Teddington, UK; www.lgcgenomics.com), a PCR-based homogeneous fluorescent SNP genotyping system that determines the alleles at a specific locus within genomic DNA (Semagn et al., 2014). KASP has been used in other polyploid plant species, including switchgrass (LGC Genomics, 2014), cotton (Byers et al., 2012), wheat (Paux et al., 2012), potato (Uitdewilligen et al., 2013), and various triploid citrus species (Cuenca et al., 2013).
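At the end of a KASP reaction, each sample well reports two fluorescence signals, one for each allele-specific primer (FAM and HEX reporter dyes), and samples form three clusters on a plot of the two signals: homozygous for either allele, or heterozygous. The Python sketch below turns one well's endpoint readings into a two-letter SNP code using fixed signal-ratio thresholds. The function name, the thresholds, and the allele-to-dye assignment are hypothetical; in practice genotypes are called by clustering all samples on a plate, and this is not LGC Genomics' calling procedure.

```python
def call_genotype(fam, hex_signal, allele_fam="A", allele_hex="G",
                  min_signal=0.2, hom_cut=0.8, het_band=(0.35, 0.65)):
    """Convert endpoint FAM/HEX fluorescence for one sample into a SNP code.

    All thresholds are illustrative placeholders, not validated cut-offs.
    Returns "??" when the sample cannot be assigned to a genotype cluster.
    """
    total = fam + hex_signal
    if total < min_signal:                # weak signal: treat as a failed reaction
        return "??"
    frac_fam = fam / total                # share of signal from the FAM-linked allele
    if frac_fam >= hom_cut:               # mostly FAM: homozygous for allele_fam
        return allele_fam * 2
    if frac_fam <= 1 - hom_cut:           # mostly HEX: homozygous for allele_hex
        return allele_hex * 2
    lo, hi = het_band
    if lo <= frac_fam <= hi:              # both signals: heterozygous
        return "".join(sorted(allele_fam + allele_hex))
    return "??"                           # between clusters: leave uncalled


print(call_genotype(1.0, 0.1))   # homozygous call
print(call_genotype(0.6, 0.6))   # heterozygous call
print(call_genotype(0.7, 0.3))   # ambiguous, left uncalled
```

Samples returning "??" correspond to reactions that cannot be placed in one of the three genotype classes; such calls have to be excluded from downstream analyses.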
In this study, SNPs identified in the nuclear transcriptome were converted to the KASP marker system to validate that they are true allelic variants. In addition, because prairie cordgrass is putatively self-compatible, KASP markers were used for quality control when making crosses. The main objectives of this study were (1) to validate, in natural populations of prairie cordgrass in the U.S., SNP polymorphisms identified in the nuclear transcriptome and (2) to assess the fidelity of specific tetraploid crosses and selfs and to elucidate inheritance patterns of SNP markers.

MATERIALS AND METHODS
Development and Validation of KASP Genotyping Assays
In a separate study by Gonzalez et al. (personal communication) at South Dakota State University, a prairie cordgrass transcriptome was assembled from ∼1.2 billion Illumina paired-end reads from various vegetative tissues (roots, leaves, and rhizomes) sampled under various conditions (salt stress, cold stress, and differing photoperiods), in order to maximize the number and diversity of transcripts represented. The assembly was built with CLC Genomics Workbench 7.0 (Aarhus, Denmark) and annotated against the sorghum gene models. A total of 146,549 contigs (transcript assemblies) of 230 bp or longer, with an N50 of 973 bp, were used to mine over 1 million SNPs, insertions, and deletions with the variant detection function of CLC Genomics Workbench. Putative SNPs were filtered based on coverage (minimum 100×), a window of 80–100 bp free from additional SNPs, and an allele frequency of 20–80%. Initially, nine bi-allelic SNPs associated with enzymes of the lignin biosynthesis pathway were selected for analysis; additional SNPs were then selected without regard to the putative function of the transcript assembly. In total, 121 bi-allelic SNPs were identified for use in this study (Table 1). These SNPs were sent for primer development for use in KASP genotyping assays. Genotyping with KASP was performed as follows.
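The three SNP filters described above can be expressed as a short Python sketch. The record layout (`VariantCall`) and function names are hypothetical and do not reflect CLC Genomics Workbench output. How the "80–100 bp window free from additional SNPs" was applied is not specified, so the sketch assumes no other variant within 80 bp on either side of the candidate SNP.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """One variant call on a transcript contig (hypothetical record layout)."""
    contig: str
    pos: int          # 1-based position on the contig
    ref: str          # reference allele
    alt: str          # alternative allele
    coverage: int     # read depth at the position
    alt_freq: float   # frequency of the alternative allele, 0.0-1.0


def is_kasp_candidate(call, same_contig_positions, min_cov=100,
                      flank=80, freq_lo=0.20, freq_hi=0.80):
    """Apply the coverage, clean-flank and allele-frequency filters."""
    if len(call.ref) != 1 or len(call.alt) != 1:    # one ref + one alt base: bi-allelic SNP, no indels
        return False
    if call.coverage < min_cov:                     # minimum 100x coverage
        return False
    if not freq_lo <= call.alt_freq <= freq_hi:     # allele frequency 20-80%
        return False
    # Assumed flank rule: no other variant within `flank` bp on either side.
    return all(p == call.pos or abs(p - call.pos) > flank
               for p in same_contig_positions)


def select_snps(calls, **kwargs):
    """Return the calls that pass all filters, comparing positions per contig."""
    positions = {}
    for c in calls:
        positions.setdefault(c.contig, []).append(c.pos)
    return [c for c in calls if is_kasp_candidate(c, positions[c.contig], **kwargs)]


calls = [
    VariantCall("contig_1", 500, "A", "G", 150, 0.45),
    VariantCall("contig_1", 540, "C", "T", 200, 0.30),   # within 80 bp of pos 500
    VariantCall("contig_2", 300, "G", "T", 120, 0.50),
    VariantCall("contig_3", 900, "A", "C", 60, 0.50),    # coverage too low
    VariantCall("contig_4", 100, "A", "AT", 300, 0.40),  # insertion, not a SNP
]
kept = select_snps(calls)
print([(c.contig, c.pos) for c in kept])
```

Both SNPs on `contig_1` are rejected because each lies in the other's flank, which is the situation the SNP-free window is meant to exclude: a nearby variant could fall under a KASP primer.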

Core Collection Analysis
To validate SNP polymorphisms of prairie cordgrass using KASP, seeds and rhizomes of natural populations were collected from across the continental U.S.A. (Kim et al., 2013) and grown at the Energy Biosciences Institute (EBI) Farm, Urbana, Illinois, USA. Individuals from 38 of these populations were selected as a core collection based on geographic distribution, and two plants from each population were sampled, for a total of 76 plants (Table 2). Leaf tissue samples were stored at −80 °C until DNA extraction. Total genomic DNA was extracted from frozen leaf tissue using the CTAB method (Mikkilineni, 1997) with slight modifications as described by Kim et al. (2013). Fifty-nine of the 121 KASP genotyping assays were selected and used to analyze the core collection and five additional Spartina samples: S. alterniflora, S. patens (Flageo vt.), S. patens (Sharp vt.), S. patens, and S. bakeri. All KASP genotyping results were recorded as a two-letter code, or SNP code (e.g., AA, AG, GG). A DNA fingerprint was generated for each sample by concatenating all of its SNP codes into a DNA-like sequence, which was imported into MEGA 6 (Tamura et al., 2013) to construct a phylogenetic tree. The maximum parsimony (MP) tree, inferred from 1000 replicates, was obtained using the Subtree-Pruning-Regrafting algorithm with search level 1, in which the initial trees were obtained by random addition of sequences (Felsenstein, 1985; Nei and Kumar, 2000). All positions with <95% site coverage were eliminated.
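Because every SNP code has two letters, a fingerprint built from 59 assays is 59 × 2 = 118 characters long, which matches the 118-character data set reported in the Results. The Python sketch below shows one way to build such fingerprints, drop alignment columns with less than 95% site coverage, and write FASTA text that phylogenetics software such as MEGA can import. Locus and sample names are hypothetical, missing calls are assumed to be coded as "??", and heterozygous codes are assumed to be written in a consistent order (AG, never GA). The coverage step only illustrates the cutoff described above; it is not MEGA's own implementation.

```python
def fingerprint(snp_codes, loci, missing="??"):
    """Concatenate two-letter SNP codes, in a fixed locus order, into one string."""
    return "".join(snp_codes.get(locus, missing) for locus in loci)


def drop_low_coverage_sites(seqs, min_coverage=0.95, missing_char="?"):
    """Drop columns that are called (not missing) in < min_coverage of samples."""
    names = list(seqs)
    length = len(seqs[names[0]])
    keep = [i for i in range(length)
            if sum(seqs[n][i] != missing_char for n in names) / len(names) >= min_coverage]
    return {n: "".join(seqs[n][i] for i in keep) for n in names}


def to_fasta(seqs):
    """Format name -> sequence pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in seqs.items())


loci = ["snp_a", "snp_b", "snp_c"]                     # hypothetical locus names
genotypes = {                                          # hypothetical samples and calls
    "sample_1": {"snp_a": "AA", "snp_b": "AG", "snp_c": "GG"},
    "sample_2": {"snp_a": "GG", "snp_c": "AG"},        # snp_b not called
    "sample_3": {"snp_a": "AG", "snp_b": "AA", "snp_c": "GG"},
}
seqs = {name: fingerprint(codes, loci) for name, codes in genotypes.items()}
trimmed = drop_low_coverage_sites(seqs)                # removes both snp_b columns
print(to_fasta(trimmed))
```

Splitting each genotype into two alignment columns lets a standard sequence-based parsimony search treat SNP codes like nucleotide sites, at the cost of scoring the two columns of one locus independently.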

F1 Cross
To assess the utility of the KASP marker system for confirming specific tetraploid crosses of prairie cordgrass, a reciprocal cross was made between two individuals (PC17-109 × PC20-102) from two populations that differ in morphological characteristics of potential agronomic importance. PC17-109 is a tetraploid population from Illinois with a phalanx rhizome type and low seed mass, whereas PC20-102 is a tetraploid population from Kansas with a guerilla rhizome type and high seed mass. In a greenhouse, the female inflorescence was bagged ∼1 day before stigma emergence, and pollen was collected from the male parent. Pollen was applied directly to the stigmas with a brush, and the inflorescence was rebagged until anthesis was complete. F1 seeds were planted in a greenhouse, and a total of 83 F1 individuals were sampled: 70 from PC17-109 (female) × PC20-102 (male) and 13 from PC20-102 (female) × PC17-109 (male). Leaf tissue samples from each seedling were collected and stored at −80 °C until DNA extraction, and total genomic DNA was extracted from frozen leaf tissue as described above. For the F1 individuals, 12 KASP genotyping assays were selected based on the parental SNP genotypes (Table 3), and all results were recorded as two-letter SNP codes. To determine whether the F1 progeny segregated as expected for a monohybrid cross, a χ² analysis was performed using P = 0.05, df = 2, and a critical χ² value of 5.991. The observed and expected genotype counts were recorded for each KASP genotyping assay.
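Under disomic inheritance, a marker at which both parents are heterozygous (AG × AG) is expected to segregate 1 AA : 2 AG : 1 GG among the progeny. A marker at which one parent is heterozygous and the other homozygous (AG × AA) is expected to segregate 1 AG : 1 AA. The Python sketch below runs a χ² goodness-of-fit test for either ratio. The degrees of freedom equal the number of genotype classes minus one, so df = 2 with a critical value of 5.991 applies to the three-class 1:2:1 test. A two-class 1:1 test has df = 1 and a critical value of 3.841 at P = 0.05. The counts in the example are hypothetical, not the values in Table 5.

```python
# Upper-tail critical chi-square values at P = 0.05 for 1 and 2 degrees of freedom.
CRITICAL_P05 = {1: 3.841, 2: 5.991}


def chi_square_gof(observed, ratio):
    """Goodness-of-fit test of observed genotype counts against a Mendelian ratio.

    Returns (chi-square statistic, degrees of freedom, reject_at_P05).
    """
    if len(observed) != len(ratio):
        raise ValueError("need one expected ratio term per genotype class")
    n = sum(observed)
    ratio_total = sum(ratio)
    expected = [n * r / ratio_total for r in ratio]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df, stat > CRITICAL_P05[df]


# Hypothetical counts: AG x AG marker (AA, AG, GG) and AG x AA marker (AG, AA).
print(chi_square_gof([18, 40, 20], [1, 2, 1]))
print(chi_square_gof([30, 35], [1, 1]))
```

Both example counts give small statistics (about 0.15 and 0.38), so neither ratio would be rejected.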

F2 Self
To assess the utility of the KASP marker system for identifying selfed individuals in the tetraploid background and to gauge the segregation pattern, F2 individuals were generated and genotyped. In a greenhouse, inflorescences of F1 plants were covered ∼1 day before stigma emergence with bags constructed so that the progression of inflorescence development could be observed. When anthesis was reached, the bags were shaken to promote self-pollination, and they remained in place until anthesis was complete. F2 seeds were collected and planted in a greenhouse. In total, eight F1 individuals were selfed (six F1 plants of PC17-109 × PC20-102 and two of PC20-102 × PC17-109), and 8–11 seedlings were sampled from the progeny of each selfed plant (76 in total). Leaf tissue samples were stored at −80 °C until DNA extraction. All 12 KASP genotyping assays used to score the F1 individuals were also tested on the F2 individuals, and the results were recorded as SNP codes as in the F1 analysis. SNP codes that could not be called reliably were removed from the analysis.

RESULTS
Development and Validation of KASP Assays
Twenty-six (21.5%) SNPs failed KASP marker development. Of the remaining 95 (78.5%), 11 SNPs were monomorphic when tested on the core collection DNA, leaving 84 SNPs that were true allelic variants. Three of the 11 monomorphic markers were retained to test whether additional plant samples would reveal the SNP polymorphisms previously identified in the transcriptome. From the 84 allelic variants, the 56 most highly polymorphic SNPs were selected for further use in this study; together with the three retained monomorphic markers, this gave a total of 59 KASP genotyping assays (Table 4).

Core Collection
The DNA fingerprint data set contained 118 characters (two for each of the 59 two-letter SNP codes). On average, 3.8 data points (SNP codes) were missing per population. After correction for missing data, the maximum parsimony tree showed one clade (Figure 1). In 47.4% of the populations, the two plants sampled from the same population formed a subclade; however, intrapopulation variation was also observed.
Of the 38 prairie cordgrass populations, 20 (52.6%) showed polymorphisms within populations, and these were split evenly between octoploid (50%) and tetraploid (50%) populations. Polymorphisms between the two sampled plants were observed in 66.7% of the 15 octoploid populations and in 43.5% of the 23 tetraploid populations. On average, 16 polymorphisms occurred within each population: 16.4 in octoploid and 15.5 in tetraploid populations.
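The within-population comparison above amounts to counting, for each population, the loci at which its two sampled plants received different SNP codes. As a consistency check on the reported figures, 10 of 15 octoploid populations (66.7%) plus 10 of 23 tetraploid populations (43.5%) gives 20 of 38 (52.6%), evenly split between the two cytotypes. The Python sketch below performs the count and summarizes it by ploidy. The text does not specify the following choices, which are therefore assumptions: any difference in SNP code (e.g., AA vs. AG) counts as one polymorphism, loci missing in either plant are skipped, and the mean is taken over polymorphic populations only. Names and data are hypothetical.

```python
from statistics import mean

MISSING = {"??", "", None}


def count_differences(plant_a, plant_b):
    """Loci called in both plants whose two-letter SNP codes differ."""
    return sum(
        1
        for locus, code_a in plant_a.items()
        if code_a not in MISSING
        and plant_b.get(locus) not in MISSING
        and code_a != plant_b[locus]
    )


def summarize(populations):
    """populations maps name -> (ploidy, plant_a_codes, plant_b_codes)."""
    diffs = {name: count_differences(a, b) for name, (_, a, b) in populations.items()}
    polymorphic = [name for name, d in diffs.items() if d > 0]
    by_ploidy = {}
    for name in polymorphic:
        by_ploidy.setdefault(populations[name][0], []).append(diffs[name])
    return {
        "fraction_polymorphic": len(polymorphic) / len(populations),
        "mean_by_ploidy": {p: mean(v) for p, v in by_ploidy.items()},
    }


pops = {  # hypothetical populations, two plants each
    "pop1": ("4x", {"s1": "AA", "s2": "AG"}, {"s1": "AA", "s2": "GG"}),
    "pop2": ("8x", {"s1": "AG", "s2": "??"}, {"s1": "GG", "s2": "AA"}),
    "pop3": ("4x", {"s1": "AA", "s2": "GG"}, {"s1": "AA", "s2": "GG"}),
    "pop4": ("8x", {"s1": "AA", "s2": "AG"}, {"s1": "GG", "s2": "GG"}),
}
print(summarize(pops))
```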

F1 Analysis
Only 6 of the 59 KASP genotyping assays showed the two parents to be homozygous for opposite alleles. In addition, three representative assays were selected in which one parent was heterozygous and the other homozygous, and three in which both parents were heterozygous (Table 3). SNP codes that could not be called, because they did not fall into any of the three genotype classes, were removed from the χ² analysis. Four individuals were not consistently heterozygous at the assays for which the parents were homozygous for opposite alleles (pcg_00050, pcg_00058, pcg_00059, pcg_000106, pcg_1186, and pcg_14142). After analysis across all 12 assays, these four individuals were identified as selfs and removed from the χ² analysis (Table 3). With the trimmed data, the χ² test did not reject the expected monohybrid Mendelian ratios (1:2:1 or 1:1) for any of the assays (Table 5).
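At an assay where the parents are homozygous for opposite alleles (e.g., AA × GG), every true hybrid must be heterozygous (AG), whereas a self of the female parent carries only the female allele. The Python sketch below applies this logic across all such informative assays to classify one F1 plant. The function names and the decision rule are hypothetical. The rule requires at least two consistent informative calls and no conflicting ones. The study itself reports only that selfs were identified by checking consistency across all 12 assays.

```python
MISSING = {"??", "", None}


def is_homozygous(code):
    """True for a called two-letter code with identical letters, e.g. 'AA'."""
    return code not in MISSING and len(code) == 2 and code[0] == code[1]


def classify_f1(progeny, female, male, min_calls=2):
    """Return 'hybrid', 'self', or 'unresolved' for one F1 plant."""
    hybrid = selfed = 0
    for locus, f_code in female.items():
        m_code = male.get(locus)
        # Informative only if both parents are homozygous for different alleles.
        if not (is_homozygous(f_code) and is_homozygous(m_code) and f_code != m_code):
            continue
        call = progeny.get(locus)
        if call in MISSING:
            continue
        if call == f_code:                                   # female allele only
            selfed += 1
        elif sorted(call) == sorted(f_code[0] + m_code[0]):  # one allele from each parent
            hybrid += 1
    if hybrid >= min_calls and selfed == 0:
        return "hybrid"
    if selfed >= min_calls and hybrid == 0:
        return "self"
    return "unresolved"


female = {"m1": "AA", "m2": "CC", "m3": "AG"}   # hypothetical parental genotypes
male = {"m1": "GG", "m2": "TT", "m3": "AA"}     # m3 is not informative
print(classify_f1({"m1": "AG", "m2": "CT", "m3": "AA"}, female, male))
print(classify_f1({"m1": "AA", "m2": "CC", "m3": "AG"}, female, male))
```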

F2 Analysis
The genotype of each selfed F1 parent was determined to identify the SNPs at which that parent was homozygous (Table 6). For three of the selfed F1 parents, some F2 progeny did not have the expected homozygous parental genotype (example in Table 7): two F2 progeny of 13-F1008, one of 14-F1014, and four of 14-F1071 consistently showed unexpected genotypes. Individuals that were consistently heterozygous (the unexpected genotype) across multiple assays at which the parent was homozygous were considered outcrosses rather than true selfs of the F1 (Table 7). Most F2 progeny had the SNP genotypes expected from the parental genotype.
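For a selfed F1 plant, every assay at which that plant is homozygous is informative. A true self must carry the same homozygous genotype, so a heterozygous or opposite-homozygous call points to pollen from another plant. Assays at which the F1 parent is heterozygous are uninformative here, because selfing can produce all three genotypes. The Python sketch below classifies one F2 plant on this basis. The function names and the threshold of two unexpected calls are assumptions; the threshold stands in for "consistently across multiple" assays.

```python
MISSING = {"??", "", None}


def is_homozygous(code):
    """True for a called two-letter code with identical letters, e.g. 'GG'."""
    return code not in MISSING and len(code) == 2 and code[0] == code[1]


def classify_f2(progeny, f1_parent, min_unexpected=2):
    """Return 'self', 'outcross', or 'unresolved' for one F2 plant."""
    expected = unexpected = 0
    for locus, parent_code in f1_parent.items():
        if not is_homozygous(parent_code):   # heterozygous loci cannot separate selfs
            continue
        call = progeny.get(locus)
        if call in MISSING:
            continue
        if call == parent_code:
            expected += 1
        else:
            unexpected += 1
    if unexpected >= min_unexpected:
        return "outcross"
    if unexpected == 0 and expected > 0:
        return "self"
    return "unresolved"


parent = {"m1": "AA", "m2": "GG", "m3": "AG"}   # hypothetical F1 genotype
print(classify_f2({"m1": "AA", "m2": "GG", "m3": "GG"}, parent))
print(classify_f2({"m1": "AG", "m2": "AG", "m3": "AA"}, parent))
```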

DISCUSSION
To validate SNP polymorphisms in prairie cordgrass, 121 SNPs identified from the nuclear transcriptome were submitted for KASP assay development. The assay success rate was 78.5%, with 26 of the 121 assays failing development. This is comparable to success rates reported in the literature of 83% (Cockram et al., 2012), 88.4% (Saxena et al., 2012), and 80.9% (Semagn et al., 2014). The assays failed mainly due to paralogs within the prairie cordgrass genome. Because not all of the populations used to develop the transcriptome were in the core collection used in this study, some assays appeared monomorphic; these SNPs may have been derived from octoploid populations not present in the core collection. Three monomorphic SNPs were retained to see whether they would prove polymorphic in future studies. With the failed and monomorphic assays removed, 84 putative SNPs were validated as true allelic variants, and 59 assays were selected for this study. The 56 highly polymorphic assays among them were chosen on the criterion that at least two of the three genotypes were present in a large portion of the samples analyzed. These assays were tested on the 38 natural populations, producing a phylogenetic tree with one clade containing all of the prairie cordgrass populations. Where subclades were observed, they contained both plants from a single population. Just over half of the populations showed within-population polymorphism, split equally between octoploid and tetraploid populations, and the average number of polymorphisms within each population was similar for octoploid (16.4) and tetraploid (15.5) populations. This differs from a chloroplast DNA study of prairie cordgrass, in which little, if any, polymorphism was observed in the tetraploid cytotype (Graves et al., 2015).
SNPs were successfully identified in the nuclear transcriptome of prairie cordgrass and validated as allelic variants. SNP markers detected polymorphisms in prairie cordgrass populations collected from distinct geographic regions of the U.S. These SNP polymorphisms appear to reflect genetic relationships in prairie cordgrass and can therefore be used to assess genetic diversity within and among populations in future studies.
The F1 population of 83 plants allowed the fidelity of a specific tetraploid cross to be assessed. Because of a lack of synchronization between pollen and ovary development, fewer seeds were obtained when PC20-102 was used as the female than when PC17-109 was the female. Progeny whose SNP genotypes matched only the female parent were classified as selfs. Of the F1 progeny, 95.2% (79 of 83) were identified as hybrids. Prairie cordgrass is a protogynous outcrossing species (Gedye et al., 2012), so later-maturing stigmas could have been exposed to pollen from the female parent itself, resulting in 4.8% of the F1 being selfs. Analysis of the 76 F2 progeny obtained by selfing eight F1 plants indicates that the chosen SNP markers can distinguish a true selfed plant from an outcrossed plant, because outcrossed individuals were consistently genotyped as heterozygous rather than as the expected homozygous (selfed) genotype. Ninety-one percent of the F2 progeny were identified as successful selfs. Because the species is protogynous, there is already a natural barrier to selfing, which could explain why outcrossed individuals were identified. It is also possible that some early-maturing stigmas were exposed to pollen in the greenhouse before bagging. This could explain why a larger proportion of F2 progeny had unexpected genotypes (outcrosses, 9%) than F1 progeny did (selfs, 4.8%).
There is evidence that the tetraploid cytotype is an allotetraploid that may follow a disomic inheritance pattern. Two divergent copies in the Waxy lineages of the genus Spartina support the allotetraploid origin of S. pectinata (Fortune et al., 2007). Bivalent pairing during meiosis (Church, 1940; Marchant, 1968a,b; Bishop, 2015) and the observation of disomic inheritance using genotyping-by-sequencing (Crawford, 2015) both suggest a disomic inheritance pattern in S. pectinata. This hypothesis was tested in a cross between two prairie cordgrass populations, exploiting the bi-allelic nature of the KASP technology to test for Mendelian segregation ratios in a monohybrid-type cross. The analyses of the F1 hybrids and F2 selfs indicate that disomic inheritance of SNPs in tetraploid prairie cordgrass is possible in this cross, in agreement with the chromosomal and genomic evidence (Marchant, 1968a,b; Fortune et al., 2007; Bishop, 2015; Crawford, 2015). The primary requirement of any breeding program is to ensure that accurate crosses are made (Glaszmann et al., 2010). The small flowers of prairie cordgrass and the large number of flowers per head make physical emasculation difficult. Because self-pollination is always possible, a molecular method to distinguish true crosses from selfs is warranted (Fang et al., 2004; Gedye et al., 2012). In prairie cordgrass, SSR markers have been developed that identified successful crosses in this protogynous species without the need for emasculation. This study likewise confirms that prairie cordgrass hybrids can be created and verified with molecular markers. However, SSRs can be time-consuming, limited in number, and more expensive than SNP markers, which makes the newly developed and validated KASP assays an attractive alternative.

CONCLUSION
This study is the first report of SNP marker development for prairie cordgrass. SNP markers developed from the nuclear transcriptome were tested on a core collection of DNA and found to be polymorphic among and within populations. The amount of variation differs from previous findings based on chloroplast DNA, which identified the octoploid cytotype as the