SNP Discovery and Genetic Variation of Candidate Genes Relevant to Heat Tolerance and Agronomic Traits in Natural Populations of Sand Rice (Agriophyllum squarrosum)

Zhao, Pengshan; Zhang, Jiwei; Qian, Chaoju; Zhou, Qin; Zhao, Xin; Chen, Guoxiong; Ma, Xiao-Fei

doi:10.3389/fpls.2017.00536

ORIGINAL RESEARCH article

Front. Plant Sci., 07 April 2017

Sec. Plant Genetics and Genomics

Volume 8 - 2017 | https://doi.org/10.3389/fpls.2017.00536

SNP Discovery and Genetic Variation of Candidate Genes Relevant to Heat Tolerance and Agronomic Traits in Natural Populations of Sand Rice (Agriophyllum squarrosum)

PZ
Pengshan Zhao ^1,2^*
JZ
Jiwei Zhang ¹
CQ
Chaoju Qian ¹
QZ
Qin Zhou ¹
XZ
Xin Zhao ^1,2
GC
Guoxiong Chen ^1,2
XM
Xiao-Fei Ma ¹

1. Key Laboratory of Stress Physiology and Ecology in Cold and Arid Regions, Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences Lanzhou, China
2. Shapotou Desert Research and Experiment Station, Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences Lanzhou, China

Abstract

The extreme stress tolerance and high nutritional value of sand rice (Agriophyllum squarrosum) make it attractive for use as an alternative crop in response to concerns about ongoing climate change and future food security. However, a lack of genetic information hinders understanding of the mechanisms underpinning the morphological and physiological adaptations of sand rice. In the present study, we sequenced and analyzed the transcriptomes of two individuals representing semi-arid [Naiman (NM)] and arid [Shapotou (SPT)] sand rice genotypes. A total of 105,868 pairwise single nucleotide polymorphisms (SNPs) distributed in 24,712 Unigenes were identified among SPT and NM samples; the average SNP frequency was 0.3% (one SNP per 333 base pair). Characterization of gene annotation demonstrated that variations in genes involved in DNA recombination were associated with the survival of the NM population in the semi-arid environment. A set of genes predicted to be relevant to heat stress response and agronomic traits was functionally annotated using the accumulated knowledge from Arabidopsis and several crop plants, including rice, barley, maize, and sorghum. Four candidate genes related to heat tolerance (heat-shock transcription factor, HsfA1d), seed size (DA1-Related, DAR1), and flowering (early flowering 3, ELF3 and late elongated hypocotyl, LHY) were subjected to analysis of the genetic diversity in 10 natural populations, representing the core germplasm resource across the area of sand rice distribution in China. Only one SNP was detected in each of HsfA1d and DAR1, among 60 genotypes, with two in ELF3 and four in LHY. Nucleotide diversity ranged from 0.00032 to 0.00118. Haplotype analysis indicated that the NM population carried a specific allele for all four genes, suggesting that divergence has occurred between NM and other populations. These four genes could be further analyzed to determine whether they are associated with phenotype variation and identify alleles favorable for sand rice breeding.

Introduction

Ongoing climate change and the increasing global population are continuous threats to global food security (Tester and Langridge, 2010; Lobell et al., 2011; Lobell and Gourdji, 2012; McCouch et al., 2013; Wheeler and von Braun, 2013). The negative influence of climate change on crop production has motivated scientists improve staple crops by exploitation of the genetic resource available in their wild relatives (Tester and Langridge, 2010; McCouch et al., 2013); however, the simultaneous development of new crops among the neglected and underutilized species will also crucial for sustainable and intensified food production (Mayes et al., 2012; Chen et al., 2014; Zhao et al., 2014). The Amaranthaceae species, sand rice (Agriophyllum squarrosum), has been among crops used for army provisions since the Tang Dynasty (AD 618–907) and is still an important component of local food for people inhabiting the Hexi Corridor along the ancient Silk Road in the northwest of China (Gao, 2002; Chen et al., 2014). Due to its high nutritional value and extreme stress tolerance, sand rice represents a suitable alternative food crop, resilient to climate change (Chen et al., 2014; Zhao et al., 2014).

Sand rice originates from the Gurbantunggut desert and is widely distributed on the mobile and semi-mobile sand dunes across the arid regions of northern China (Zhao et al., 2014; Qian et al., 2016). To thrive in a desert environment, sand rice has evolved many morphological traits and adaptation strategies to mitigate the risk of extinction due to climate variability, such as extreme temperatures, unpredictable precipitation, strong solar radiation, and other environmental stresses (Chen et al., 2014; Zhao et al., 2014). Comparative transcriptome analysis has been used to identify candidate genes related to abiotic stress tolerance and unique traits of this species (Zhao et al., 2014, 2016). Some core genetic elements involved in environmental responses are conserved among plant species; however, the genetic mechanisms underpinning the morphological and physiological adaptations of sand rice remain poorly understood.

Natural variation is the genetic basis for species adaptation to different environments and the identification of genomic polymorphisms, for example single nucleotide polymorphisms (SNPs), is essential for in-depth analysis of genes and alleles involved in plant evolution and environmental adaptation (Mitchell-Olds and Schmitt, 2006; Alonso-Blanco et al., 2009; Lasky et al., 2012). Extensive whole genome SNP analysis of several model and crop plants, such as Arabidopsis thaliana, rice, and maize, has been performed during past decades for a variety of purposes, including studies of genetic diversity, genome evolution, association mapping, and domestication (McNally et al., 2009; Huang et al., 2010; Lai et al., 2010; Hancock et al., 2011; Kump et al., 2011; van Heerwaarden et al., 2011; Horton et al., 2012).

The climate varies considerably across the geographic range of sand rice, and precipitation and mean temperature of the coldest quarter strongly influence its distribution (Qian et al., 2016). Furthermore, phenology variation has been observed among natural populations of sand rice (Zhao et al., 2014; Yin et al., 2016); for example, the seed size from semi-arid region (Naiman, NM) is larger than that from arid region [Shapotou (SPT); Figure 1]. Moreover, the flowering of NM sand rice occurs earlier than that of SPT at the common garden; SPT individuals are more tolerant to heat stress. Large scale analyses of genetic variation will be crucial for understanding the genetic mechanisms underlying adaptation of sand rice to climate features or local environmental conditions, and provide a foundation for the discovery and isolation of markers useful for its subsequent domestication.

FIGURE 1

Previous studies in Arabidopsis have elucidated complex pathways regulating heat stress response, seed size, and flowering time (Fornara et al., 2010; Bokszczanin et al., 2013; Verhage et al., 2014; Li and Li, 2016). For example, the heat-shock transcription factor HsfA1d is one of master regulators with a critical role in evoking the transcription cascade to confer heat stress response (Ohama et al., 2016). The ubiquitin receptors, DA1 and DA1-related protein (DAR1), are core genetic elements controlling cell proliferation in the integument via the ubiquitin-proteasome pathway (Li and Li, 2016). Moreover, late elongated hypocotyl (LHY) and early flowering 3 (ELF3) are morning- and evening-phased components involved in circadian rhythm regulation in Arabidopsis (Hicks et al., 1996; Nusinow et al., 2011; Hsu and Harmer, 2014).

With the advancement of sequencing technologies, genome scale analyses of sequence polymorphism has become feasible and cost effective for non-model plants (Silva-Junior et al., 2011; Zou et al., 2014; Pavy et al., 2016). Analysis of the genomes of populations from different environments could identify genes and alleles favored in specific local climates (Henry, 2014). In this study, two representative sand rice plants from arid and semi-arid regions (SPT and NM) were sequenced using the RNA-seq method to generate a genome wide dataset of SNP polymorphisms. Four candidate genes, HsfA1d, DAR1, LHY, and ELF3, were selected for further assessment of allelic diversity in 10 natural populations, which represent the core germplasm resources of sand rice, according to the results of phylogeographic analysis (Qian et al., 2016).

Materials and Methods

Ethics Statement

Shapotou and NM seeds were collected from SPT Desert Research and Experimental Station and NM Desertification Research Station, Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences. Specific permits were not required to sample the seeds for this study.

Seed Size

Seeds were sampled from five independent plants at each of the NM and SPT stations. For each plant, 100 seeds were used to calculate the average seed weight and 10 seeds were used to determine the average diameter.

Plant Materials, RNA Extraction, Transcriptome Sequencing, and Unigene Annotation

Naiman seeds were germinated in a growth chamber and then transferred into pots filled with nutritional soil in the base with an upper layer of sand in a green house. The shoot and root of a 3-month-old seedling (Supplementary Figure 1) were collected separately and total RNA was extracted using a Plant total RNA Kit (TIANGEN, Beijing, China). Equal amounts of RNA from shoot and root were mixed together for cDNA library construction. Library was prepared as described by Zhao et al. (2014) and transcriptome sequencing was performed on the Illumina HiSeqTM 2500 platform using 125 bp paired-end reads at Biomarker Technologies (Beijing, China). A total of 16.40 million reads were obtained with 93.01% achieving quality scores above Q30. Raw data were deposited in the NCBI Short Read Archive with the accession number SRR5271162.

Reads from a previous transcriptome analysis of plant from the SPT region were downloaded from the NCBI Sequence Read Archive (SRR1559276, Zhao et al., 2014). After filtering, high quality reads from the two samples were assembled together by Trinity program with default settings (Grabherr et al., 2011). Contigs were clustered based on sequence similarity and paired-end information and then assembled into transcripts. Finally, singletons and the longest transcripts in each cluster were selected as a total reference Unigene set for sand rice. Histograms of Unigene length and GC content data were generated using an R script (R Development Core Team, 2008). Whole Unigenes were blasted against public protein databases, including the NCBI non-redundant protein (Nr) database, the Swiss-Prot protein database, Clusters of Orthologous Groups of proteins (COG), Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Eukaryotic Orthologous Groups (KOG), using a threshold less than 1E-5. The predicted ORFs for each Unigene were aligned against the Pfam database using the program HMMER (Eddy, 1998) to increase the number of annotated Unigenes.

SNP Calling and Statistical Analyses

Clean reads from each sample were re-mapped to reference Unigenes using the STAR program (Dobin et al., 2013) and SNP calling was conducted using GATK with the standard filter method (McKenna et al., 2010). Two main parameters were used to filter SNPs: (1) more than three mismatches in the adjacent 35 bp and (2) SNP quality score less than 2.0 after normalization for sequencing depth. This study mainly focused on pairwise SNPs, amongst which three types were classified based on their homozygous and heterozygous status in each sample. Confidently called SNPs were extracted from each Unigene and SNP frequencies were calculated by dividing the Unigene length by its number of SNPs. Histograms of SNP numbers and frequencies per Unigene, in addition to boxplots and ANOVA analysis of SNP frequency for each category, were performed using an R script (R Development Core Team, 2008).

For the GO enrichment analysis, a hypergeometric test with Bonferroni adjustment was used to identify enriched GO terms for Unigenes containing the pairwise SNPs, where the reference Unigene set served as the background. Terms were defined as enriched when adjusted p-values were less than 0.01.

SNP Validation

Primers were designed to amplify fragments spanning one or two SNPs in 50 randomly selected Unigenes identified as containing pairwise SNPs by RNA-seq. Two DNA samples from individual SPT and NM plants served as templates for validation the predicted SNPs. ExTaq (TaKaRa, Dalian, China) was used to amplify the fragments with 1.5 μl template DNA in 20 μl reaction volume. The PCR reaction was performed in a C1000 TOUCH thermal cycler and the amplified products were sequenced using an ABI Prism 3730xl sequencer at Majorbio (Shanghai, China).

Population Samples, Candidate Gene Sequencing, and Genetic Diversity Analyses

A total of 60 individuals from 10 natural populations (FK, DH, QHH, YJ, M4, MQ, SPT, JB, DLSH, NM) as described in Qian et al. (2016) were used to determine the genetic diversity of the three candidate genes (HsfA1d, DAR1, LHY, and ELF3). Samples from two individual plants from the SPT and NM populations were first amplified and sequenced in both directions to confirm the suitability of primers for population sequencing. PCR and sequencing were performed as described above. Population sequencing results for each candidate gene were aligned using BioEdit version 7.2.5 (Hall, 1999). After removing poor quality nucleotide sequence at the 5′ and 3′ termini, alignment files were subjected to further analysis of diversity indices using DnaSP version 5.10.01 (Librado and Rozas, 2009).

Results

Transcriptome Assembly and Annotation

An NM sand rice transcriptome library was constructed by mixing shoot and root RNA in a 1:1 ratio and pair-ended sequencing yielded 16.4 million clean reads. These reads were assembled, together with those from an individual sand rice plant from the SPT region (SRR1559276, Zhao et al., 2014) and a total of 91,884 Unigene were finally obtained with an average length of 725.98 bp and an N50 length of 1220 bp (Figure 2A). The average GC content was 39.14, and 69.36% of the total Unigenes had GC contents more than 35% (Figure 2B). All Unigenes were then blasted against six public protein databases, including the Nr, Swissprot, COG, GO, KOG, and KEGG, with a threshold of less than 1E-5. To increase the number of annotated Unigenes, predicted proteins were also aligned with the Pfam database. As shown in Table 1, 34,939 Unigenes (38.02%) could be matched to protein sequences or domains available in these databases.

FIGURE 2

Table 1

Annotation database	Annotated number	Length ≥ 300 bp	Length ≥ 1000 bp
COG	11,985	9,939	6,009
GO	18,973	15,037	8,244
KEGG	7,148	5,851	3,431
KOG	18,988	15,620	8,815
Pfam	21,641	18,910	12,138
SWISS	20,343	17,670	10,851
Nr	34,526	27,639	14,381
Total	34,939	27,841	14,403

Summary of Unigenes annotation.

Pairwise SNP Detection, Statistical Analyses, and GO Annotation

The clean reads from NM and SPT samples were re-mapped to the assembled reference Unigene set and SNPs were detected using GATK with default settings (McKenna et al., 2010). After quality filtering, a total of 111,971 and 119,702 SNPs were identified in NM and SPT individuals, respectively (Supplementary Table 1). The pairwised SNPs between NM and SPT were then isolated and classified into three types (Figure 3): 81,980 were inter-individual, which were homozygous in both samples; 3,499 were specific to NM, 20,389 to SPT, these were heterozygous in one individual but homozygous in the other. Fifty Unigenes were randomly selected for validation the predicted SNPs, of which 30 were successfully amplified and sequenced. The failed amplication might be from amplicons spanned a large intron or primers located at the exon/intron boundaries. Finally, 38 out of 50 SNPs were validated in these 30 Unigenes and another six were confirmed as homozygous SNPs. The predicting accuracy of our SNP dataset reached 76% (Supplementary Figure 2).

FIGURE 3

Single nucleotide polymorphism distribution and frequency are important indices for genome wide SNP development. In this study, all pairwise SNPs were distributed in 24,712 of 91,884 reference Unigenes, with approximately 57% (14,130) containing fewer than three SNPs (Figure 4A). A histogram of SNP frequency revealed a peak at 0.2 and 61.24% Unigenes were under the average SNP frequency of 0.3% (Figure 4B). Among the 24,712 Unigenes, 22,538 contained inter-individual SNPs, and 1,534 and 9,278 contained NM- and SPT-specific SNPs, respectively (Figure 4C). There were 16,756 Unigenes containing only one type of SNP, while 682 had all three types. Categories were named according to the gene number in each set in the Venn diagram (Figure 5). The mean SNP frequency values for all categories were lower than 0.45%; however, the four categories (SR_682, SR_6614, SR_465, and SR_195) including two or three types of SNP, exhibited higher frequencies than the other categories (p < 0.01; Figure 5). Detailed histograms illustrating SNP numbers and frequencies for each category are presented in Supplementary Figure 3.

FIGURE 4

FIGURE 5

Genes in each category (Figure 4C) were separately subjected to GO enrichment analysis (Table 2 and Supplementary Table 2). For the whole Unigene set (SR_24712), the most enriched GO terms were related to basic biology processes, including ‘DNA integration,’ ‘RNA-dependent DNA replication,’ and ‘translation.’ The GO enrichment results for inter-individual category (SR_22538) were similar to those for SR_24712, except they also included ‘translation elongation’ (GO: 0006414, p = 0.04). The GO term, ‘DNA recombination’ (GO: 0006310) was enriched for NM-specific category (SR_1534, p = 0.02) and not for any of the other categories. Interestingly, the GO term, ‘thylakoid membrane organization’ (GO: 0010027), was found in the enrichment results of SPT-specific category (SR_9278) and SR_24712, suggesting genes in SR_6614 were largely responsible for this annotation (p = 0.03). SR_14777 included only genes containing inter-individual SNPs (Figure 4C) and was enriched for seven GO terms, including ‘response to misfolded protein’ (GO: 0051788) and ‘response to cold’ (GO: 0009409). Of note, the GO term ‘embryo development ending in seed dormancy’ (GO: 0009793) was also enriched in categories SR_24712, SR_22538, and SR_14777.

Table 2

GO ID	GO term	p-values	adj-p-values
SR_24712
GO:0015074	DNA integration	8.00E-17	1.61E-13
GO:0006278	RNA-dependent DNA replication	1.27E-11	2.54E-08
GO:0006412	Translation	2.28E-09	4.57E-06
GO:0055085	Transmembrane transport	2.60E-08	5.20E-05
GO:0009793	Embryo development ending in seed dormancy	3.17E-07	0.000635
GO:0019288	Isopentenyl diphosphate biosynthetic process, methylerythritol 4-phosphate pathway	7.49E-07	0.001498
GO:0043581	Mycelium development	2.17E-06	0.004338
GO:0006357	Regulation of transcription from RNA polymerase II promoter	3.71E-06	0.007416
GO:0006259	DNA metabolic process	5.39E-06	0.010762
GO:0010027	Thylakoid membrane organization	2.09E-05	0.041638
SR_14777
GO:0015074	DNA integration	4.30E-15	6.53E-12
GO:0006278	RNA-dependent DNA replication	3.26E-09	4.94E-06
GO:0051788	Response to misfolded protein	8.55E-07	0.001297
GO:0055085	Transmembrane transport	2.96E-06	0.00449
GO:0009793	Embryo development ending in seed dormancy	6.57E-06	0.00994
GO:0009409	Response to cold	1.00E-05	0.015188
GO:0006412	Translation	1.29E-05	0.019432

Gene ontology enrichment results of SR_24712 and SR_14777 categories.

Candidate Genes Relevant to Heat Tolerance and Agronomic Traits

To isolate candidate genes for heat tolerance and agronomic traits, an all-against-all blast approach was conducted (described in Zhao et al., 2014) between the assembled sand rice Unigenes and the Arabidopsis protein database, resulting in the identification of 10,512 pairs of reciprocal best hits (RBHs) (Figure 6 and Supplementary Table 3). A Literature survey identified 64 heat shock proteins (HSPs), 21 HSFs, 366 flowering-, 59 seed size-, and 21 architecture-related genes in Arabidopsis (Finka et al., 2011; Scharf et al., 2012; Kesavan et al., 2013; Xiao et al., 2013; Dong and Wang, 2015; Li and Li, 2015, 2016; Teichmann and Muhr, 2015) and 31 HSPs, 11 HSFs, 100 flowering-, 28 seed size-, and 11 architecture-related genes had RBHs in the sand rice transcriptome, with mean sequence identities ranging from 50.88 to 64.09% (Figure 6 and Supplementary Table 3). Notably, only two Hsf1A sequences were identified in sand rice. Coincidently, the DA1 was missing from sand rice, while DAR1 was present. These lines of evidence imply that the gene families encoding these proteins may not have undergone expansion during sand rice speciation (Zhao et al., 2014). Furthermore, 41 genes with clear functions in the domestication of crop plants, including teosinte branched 1 in maize (Doebley et al., 1997), ring-type E3 ubiquitin ligase in rice (GW2; Song et al., 2007), non-brittle rachis 1 and 2 in barley (Pourkheirandish et al., 2015), tannin1 in sorghum (Wu et al., 2012), were blasted against the sand rice Unigene set, resulting in the identification of 39 RBHs with an average sequence identity of 53.78% (Figure 5 and Supplementary Table 3). By comparison with SR_24712 category, 30 HSPs, 11 HSFs, 96 flowering-, 26 seed size-, 11 architecture-, and 32 domestication-related genes containing at least one type of SNPs, were identified (Supplementary Table 3).

FIGURE 6

Genetic Diversity of HsfA1d, DAR1, LHY, and ELF3 in Natural Populations

Phylogeographic analysis has revealed that sand rice originated from the Gurbantunggut desert and then dispersed into the central desert region and eastern sandy lands in China (Qian et al., 2016). In this study, 10 natural populations (Figure 7) from the Gurbantunggut desert (FK), Kumutage desert (DH), Qinghai-Tibetan Plateau (QHH and YJ), central deserts (M4, MQ, SPT, JB, DLSH), and eastern sandy regions (NM), were used to investigate the genetic diversity of four candidate genes. Six genotypes in each population were sequenced and the successful genotypes for each gene ranged from 51 to 54 (Table 3). The HsfA1d gene was conserved in nine populations, with a single transversion SNP (T/A) leading to an amino acid change (Glutamate/Aspartate) detected in NM population (Figure 8). Alignment based on the RBH results demonstrated that this SNP was located between the regions encoding the transactivation domain (AHA) and the nuclear export signal (NES) at the C terminus of Arabidopsis HsfA1d. For the seed size gene DAR1, an intron (125 bp) was included in the PCR products and population sequencing identified a single transversion SNP (G/C), specific to the NM population. This synonymous SNP encoded an amino acid located between the LIM and the LIM-associated C-terminal domains (Supplementary Figure 4). Similarly, an 86 bp intron was included in the LHY amplicons and four SNPs were obtained from across the 10 sand rice populations, of which three were transitions (T/C, C/T, and T/C) and one was a transversion (A/T) leading to an amino acid change (Glutamine/Histidine; Supplementary Figure 5). Two transition SNPs were detected in the ELF3 gene. One SNP (G/A) was NM-specific and synonymous, while the other (C/T) was observed in five of six genotypes in the JB population and encode an Alanine to Valine amino acid substitution (Supplementary Figure 6). The aligned sequences for each gene were further subjected to analysis of nucleotide and haplotype diversity using DnaSP. Consistent with the numbers of SNP, nucleotide diversity values for all three genes were very low, ranging from 0.00032 (HsfA1d) to 0.00118 (LHY), while haplotype diversity ranged from 0.208 (HsfA1d) to 0.352 (ELF3).

FIGURE 7

Table 3

Candidate gene	HsfA1d	DAR1	ELF3	LHY
Expected length (bp)	810	539	699	866
PCR length (bp)	810	664	699	952
Sequence length (bp)	654	459	626	849
No. of genotypes	52	51	54	53
No. of indels	0	0	0	0
Indel frequency	0	0	0	0
No. of SNPs	1	1	2	4
Transition	0	0	2	3
Transversion	1	1	0	1
SNP frequency	1/654	1/459	1/313	1/212
Nucleotide diversity (Pi)	0.00032	0.00046	0.00059	0.00118
Watterson’s parameter (𝜃_w)	0.00034	0.00048	0.0007	0.0009
No. of haplotypes	2	2	3	3
Haplotype diversity	0.208	0.212	0.352	0.312

Nucleotide and haplotype diversity of four genes in 10 natural populations.

FIGURE 8

Discussion

Dissection of the genetic variation of sand rice is essential for understanding its extreme stress tolerance and phenotypic variability and will facilitate its future domestication. In this study, a total of 105,868 pairwise SNPs, distributed in 24,712 Unigenes, were identified between SPT and NM samples, with an average SNP frequency of 0.3% (Figures 2, 3). There were a larger number of SNPs specific to SPT (81,980 in 9,278 Unigenes) than to NM (3,499 in 1,534 Unigenes), implying that sand rice in the SPT region may exhibit a higher degree of outcrossing, and, consequently, increased genomic heterozygosity. GO enrichment analysis identified only one term (GO: 0006310, ‘DNA recombination’) significantly enriched for SR_1534 (Table 2), suggesting that variations in specific genes were required for the survival of the NM population in the semi-arid environment.

Environmental stresses, such as drought, heat, and salinity, adversely affect plant photosynthesis (Nouri et al., 2015). The thylakoid membrane is the primary site of photosynthesis inside chloroplasts and its molecular organization is vital for the coordination and regulation of photosynthetic processes (Nouri et al., 2015). Given that SPT is an arid, desert environment, it is logical that the GO term ‘thylakoid membrane organization’ is enriched in the SPT specific category (SR_9278). Two terms associated with the response to misfolded protein (GO: 0051788) and cold (GO: 0009409) were identified as enriched in the inter-individual category (SR_14777). These results may provide evidence that common genetic modules are shared in natural populations, whereas different alleles are favored by direction selection pressure resulting from the prevailing environmental conditions in the NM and SPT regions. Another abundant term in SR_14777 was ‘embryo development ending in seed dormancy’ (GO: 0009793), which is consistent with a survival strategy to cope with sand bury and unpredictable precipitation typical in the geographic range of sand rice (Tobe et al., 2005; Zheng et al., 2005; Liu et al., 2006; Gao et al., 2014).

In Arabidopsis, numerous genes have been identified with important roles in stress responses and controlling agronomic traits, such as large seed size, plant architecture, and earlier flowering. In this study, 10,512 pairs of orthologous genes were identified between Arabidopsis and sand rice (Supplementary Table 3). Among these, c48914.graph_c0 was included in the top 50 Unigenes with the highest numbers of SNPs; the ortholog in Arabidopsis was maintenance of methylation 1 (MOM1), which is involved in silencing the stress-induced expression of transposons and thereby in preventing the transgenerational transmission of epigenetic memory (Iwasaki and Paszkowski, 2014). It is possible that different alleles of MOM1 may have evolved in the NM and SPT sand rice populations to control epigenetic stress memory in the face of reoccurring heat- and other types of stress-induced damages. A number of orthologous genes in the sand rice transcriptome dataset that are candidates for involvement in the heat stress response and the control of agronomic traits are also listed in Supplementary Table 3, and the majority of them contained sequence variation between NM and SPT. Dissecting the genetic diversity of these candidate genes and analyzing their association with phenotypic variability in natural populations is likely to result in the identification of important molecular markers and/or favorable alleles, and to facilitate the domestication process of sand rice.

The sequence identity value (54%) between Arabidopsis HsfA1d and sand rice c46537.graph_c1 suggests that these genes may encode functionally equivalent proteins involved in heat stress tolerance. Quantitative RT-PCR revealed that the expression pattern of c46537.graph_c1 was similar to that of Arabidopsis HsfA1d after heat stress treatment (Supplementary Figure 7). A fragment of c46537.graph_c1 was analyzed in 10 natural populations, resulting in the identification of a single unique allele in the NM population. Similarly, only one synonymous SNP was detected in sand rice DAR1 by population sequencing. These results demonstrate that the same HsfA1d and DAR1 haplotypes were shared by plants in the central deserts and that genetic divergence has mainly occurred between NM and the other populations. Both of the SNPs in HsfA1d and DAR1 located in sequences encoding inter-domain protein regions, hence it is difficult to predict the effect of these SNPs on sand rice adaptation based on knowledge of protein domain functions determined in Arabidopsis. The non-synonymous SNP in HsfA1d may compromise the trimer formation (HsfA1a, b, and d), resulting in ineffective activation of the transcription network, which would be consistent with the reduced heat tolerance phenotype associated with NM plants (Zhao et al., 2014).

LHY and ELF3 are important regulators of the circadian rhythm (Hicks et al., 1996; Nusinow et al., 2011; Hsu and Harmer, 2014). The eastern sandy population (NM) harbored specific LHY and ELF3 alleles which were separated from other populations. These results suggest that variants in these two genes could confer some fitness advantage, such as early flowering, in the semi-arid environment. Furthermore, two transition SNPs of LHY, were shared between JB and NM populations. This is congruent with the speculation that the sand rice colonization pathway originated from the Gurbantunggut desert and subsequently dispersed into other desert regions (Qian et al., 2016).

Statements

Ethics statement

Author contributions

PZ conceived and designed the project. JZ prepared the RNA for sequencing. CQ and X-FM supplied the population DNA samples. QZ and XZ helped to perform experiments. PZ conducted the experiments, analyzed the results, and wrote the manuscript and GC revised the manuscript.

Funding

This work was granted by one of National Basic Research Program of China (973 Program, 2013CB429904), by National Natural Science Foundation of China (No. 41201048) and by Natural Science Foundation of Gansu Province, China (No. 1208RJZA244).

Acknowledgments

We thank Dr. Lirong Wang for her help on the seed sampling and sand rice growth.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpls.2017.00536/full#supplementary-material

References

1
Alonso-BlancoC.AartsM. G.BentsinkL.KeurentjesJ. J.ReymondM.VreugdenhilD.et al (2009). What has natural variation taught us about plant development, physiology, and adaptation?Plant Cell211877–1896.10.1105/tpc.109.068114
- CrossRef
- Google Scholar
2
BandeltH. J.ForsterP.RöhlA. (1999). Median-joining networks for inferring intraspecific phylogenies.Mol. Biol. Evol.1637–48.
- Google Scholar
3
BokszczaninK. L.Solanaceae Pollen Thermotolerance Initial Training Network Consortium and FragkostefanakisS. (2013). Perspectives on deciphering mechanisms underlying plant heat stress response and thermotolerance.Front. Plant Sci.4:315. 10.3389/fpls.2013.00315
- CrossRef
- Google Scholar
4
ChenG.ZhaoJ.ZhaoX.ZhaoP.DuanR.NevoE.et al (2014). A psammophyte Agriophyllum squarrosum (L.) Moq.: a potential food crop.Genet. Resour. Crop Evol.61669–676. 10.1007/s10722-014-0083-8
- CrossRef
- Google Scholar
5
DobinA.DavisC. A.SchlesingerF.DrenkowJ.ZaleskiC.JhaS.et al (2013). STAR: ultrafast universal RNA-seq aligner.Bioinformatics2915–21. 10.1093/bioinformatics/bts635
- CrossRef
- Google Scholar
6
DoebleyJ.StecA.HubbardL. (1997). The evolution of apical dominance in maize.Nature386485–488.
- Google Scholar
7
DongY.WangY. Z. (2015). Seed shattering: from models to crops.Front. Plant Sci.6:476. 10.3389/fpls.2015.00476
- CrossRef
- Google Scholar
8
EddyS. R. (1998). Profile hidden Markov models.Bioinformatics14755–763. 10.1093/bioinformatics/14.9.755
- CrossRef
- Google Scholar
9
FinkaA.MattooR. U.GoloubinoffP. (2011). Meta-analysis of heat- and chemically upregulated chaperone genes in plant and human cells.Cell Stress Chaperones1615–31. 10.1007/s12192-010-0216-8
- CrossRef
- Google Scholar
10
FornaraF.de MontaiguA.CouplandG. (2010). SnapShot: control of flowering in Arabidopsis.Cell141550550.e1–2. 10.1016/j.cell.2010.04.024
- CrossRef
- Google Scholar
11
GaoQ. (2002). The “grass seed” is Agriophyllum squarrosum in Dunhuang manuscripts.J. Dunhuang Stud.4243–44.
- Google Scholar
12
GaoR.YangX.YangF.WeiL.HuangZ.WalckJ. L. (2014). Aerial and soil seed banks enable populations of an annual species to cope with an unpredictable dune ecosystem.Ann. Bot.114279–287. 10.1093/aob/mcu104
- CrossRef
- Google Scholar
13
GrabherrM. G.HaasB. J.YassourM.LevinJ. Z.ThompsonD. A.AmitI.et al (2011). Full-length transcriptome assembly from RNA-Seq data without a reference genome.Nat. Biotechnol.29644–U130. 10.1038/nbt.1883
- CrossRef
- Google Scholar
14
HallT. A. (1999). BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT.Nucleic Acids Symp. Ser.4195–98.
- Google Scholar
15
HancockA. M.BrachiB.FaureN.HortonM. W.JarymowyczL. B.SperoneF. G.et al (2011). Adaptation to Climate Across the Arabidopsis thaliana Genome.Science33483–86. 10.1126/science.1209244
- CrossRef
- Google Scholar
16
HenryR. J. (2014). Genomics strategies for germplasm characterization and the development of climate resilient crops.Front. Plant Sci.5:68. 10.3389/fpls.2014.00068
- CrossRef
- Google Scholar
17
HicksK. A.MillarA. J.CarréI. A.SomersD. E.StraumeM.Meeks-WagnerD. R.et al (1996). Conditional Circadian Dysfunction of the Arabidopsis early-flowering 3 Mutant.Science274790–792. 10.1126/science.274.5288.790
- CrossRef
- Google Scholar
18
HortonM. W.HancockA. M.HuangY. S.ToomajianC.AtwellS.AutonA.et al (2012). Genome-wide patterns of genetic variation in worldwide Arabidopsis thaliana accessions from the RegMap panel.Nat. Genet.44212–216. 10.1038/ng.1042
- CrossRef
- Google Scholar
19
HsuP. Y.HarmerS. L. (2014). Wheels within wheels: the plant circadian system.Trends Plant Sci.19240–249. 10.1016/j.tplants.2013.11.007
- CrossRef
- Google Scholar
20
HuangX.WeiX.SangT.ZhaoQ.FengQ.ZhaoY.et al (2010). Genome-wide association studies of 14 agronomic traits in rice landraces.Nat. Genet.42961–967. 10.1038/ng.695
- CrossRef
- Google Scholar
21
IwasakiM.PaszkowskiJ. (2014). Identification of genes preventing transgenerational transmission of stress-induced epigenetic states.Proc. Natl. Acad. Sci. U.S.A.1118547–8552. 10.1073/pnas.1402275111
- CrossRef
- Google Scholar
22
KesavanM.SongJ. T.SeoH. S. (2013). Seed size: a priority trait in cereal crops.Physiol. Plant.147113–120. 10.1111/j.1399-3054.2012.01664.x
- CrossRef
- Google Scholar
23
KumpK. L.BradburyP. J.WisserR. J.BucklerE. S.BelcherA. R.Oropeza-RosasM. A.et al (2011). Genome-wide association study of quantitative resistance to southern leaf blight in the maize nested association mapping population.Nat. Genet.43163–168. 10.1038/ng.747
- CrossRef
- Google Scholar
24
LaiJ.LiR.XuX.JinW.XuM.ZhaoH.et al (2010). Genome-wide patterns of genetic variation among elite maize inbred lines.Nat. Genet.421027–1030. 10.1038/ng.684
- CrossRef
- Google Scholar
25
LaskyJ. R.Des MaraisD. L.McKayJ. K.RichardsJ. H.JuengerT. E.KeittT. H. (2012). Characterizing genomic variation of Arabidopsis thaliana: the roles of geography and climate.Mol. Ecol.215512–5529. 10.1111/j.1365-294X.2012.05709.x
- CrossRef
- Google Scholar
26
LiN.LiY. (2015). Maternal control of seed size in plants.J. Exp. Bot.661087–1097. 10.1093/jxb/eru549
- CrossRef
- Google Scholar
27
LiN.LiY. (2016). Signaling pathways of seed size control in plants.Curr. Opin. Plant Biol.3323–32. 10.1016/j.pbi.2016.05.008
- CrossRef
- Google Scholar
28
LibradoP.RozasJ. (2009). DnaSP v5: a software for comprehensive analysis of DNA polymorphism data.Bioinformatics251451–1452. 10.1093/bioinformatics/btp187
- CrossRef
- Google Scholar
29
LiuZ.YanQ.BaskinC. C.MaJ. (2006). Burial of canopy-stored seeds in the annual psammophyte Agriophyllum squarrosum Moq. (Chenopodiaceae) and its ecological significance.Plant Soil28871–80. 10.1007/s11104-006-9090-7
- CrossRef
- Google Scholar
30
LobellD. B.GourdjiS. M. (2012). The influence of climate change on global crop productivity.Plant Physiol.1601686–1697. 10.1104/pp.112.208298
- CrossRef
- Google Scholar
31
LobellD. B.SchlenkerW.Costa-RobertsJ. (2011). Climate trends and global crop production since 1980.Science333616–620. 10.1126/science.1204531
- CrossRef
- Google Scholar
32
MayesS.MassaweF. J.AldersonP. G.RobertsJ. A.Azam-AliS. N.HermannM. (2012). The potential for underutilized crops to improve security of food production.J. Exp. Bot.631075–1079. 10.1093/jxb/err396
- CrossRef
- Google Scholar
33
McCouchS.BauteG. J.BradeenJ.BramelP.BrettingP. K.BucklerE.et al (2013). Feeding the future.Nature49923–24.
- Google Scholar
34
McKennaA.HannaM.BanksE.SivachenkoA.CibulskisK.KernytskyA.et al (2010). The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.Genome Res.201297–1303. 10.1101/gr.107524.110
- CrossRef
- Google Scholar
35
McNallyK. L.ChildsK. L.BohnertR.DavidsonR. M.ZhaoK.UlatV. J.et al (2009). Genomewide SNP variation reveals relationships among landraces and modern varieties of rice.Proc. Natl. Acad. Sci. U.S.A.10612273–12278. 10.1073/pnas.0900992106
- CrossRef
- Google Scholar
36
Mitchell-OldsT.SchmittJ. (2006). Genetic mechanisms and evolutionary significance of natural variation in Arabidopsis.Nature441947–952.10.1038/nature04878
- CrossRef
- Google Scholar
37
NouriM. Z.MoumeniA.KomatsuS. (2015). Abiotic stresses: insight into gene regulation and protein expression in photosynthetic pathways of plants.Int. J. Mol. Sci.1620392–20416. 10.3390/ijms160920392
- CrossRef
- Google Scholar
38
NusinowD. A.HelferA.HamiltonE. E.KingJ. J.ImaizumiT.SchultzT. F.et al (2011). The ELF4-ELF3-LUX complex links the circadian clock to diurnal control of hypocotyl growth.Nature475398–402. 10.1038/nature10182
- CrossRef
- Google Scholar
39
OhamaN.KusakabeK.MizoiJ.ZhaoH.KidokoroS.KoizumiS.et al (2015). The transcriptional cascade in the heat stress response of Arabidopsis is strictly regulated at the expression levels of transcription factors.Plant Cell28181–201. 10.1105/tpc.15.00435
- CrossRef
- Google Scholar
40
OhamaN.SatoH.ShinozakiK.Yamaguchi-ShinozakiK. (2016). Transcriptional regulatory network of plant heat stress response.Trends Plant Sci.2253–65. 10.1016/j.tplants.2016.08.015
- CrossRef
- Google Scholar
41
PavyN.GagnonF.DeschenesA.BoyleB.BeaulieuJ.BousquetJ. (2016). Development of highly reliable in silico SNP resource and genotyping assay from exome capture and sequencing: an example from black spruce (Picea mariana).Mol. Ecol. Resour.16588–598. 10.1111/1755-0998.12468
- CrossRef
- Google Scholar
42
PourkheirandishM.HenselG.KilianB.SenthilN.ChenG.SameriM.et al (2015). Evolution of the grain dispersal system in barley.Cell162527–539. 10.1016/j.cell.2015.07.002
- CrossRef
- Google Scholar
43
QianC.YinH.ShiY.ZhaoJ.YinC.LuoW.et al (2016). Population dynamics of Agriophyllum squarrosum, a pioneer annual plant endemic to mobile sand dunes, in response to global climate change.Sci. Rep.6:26613. 10.1038/srep26613
- CrossRef
- Google Scholar
44
R Development Core Team (2008). R: A Language and Environment for Statistical Computing.Vienna: R Foundation for Statistical Computing.
- Google Scholar
45
ScharfK. D.BerberichT.EbersbergerI.NoverL. (2012). The plant heat stress transcription factor (Hsf) family: structure, function and evolution.Biochim. Biophys. Acta1819104–119. 10.1016/j.bbagrm.2011.10.002
- CrossRef
- Google Scholar
46
Silva-JuniorO. B.RosadoT. B.LaviolaB. G.PappasM. R.PappasG. J.GrattapagliaD. (2011). Genome-wide SNP discovery from a pooled sample of accessions of the biofuel plant Jatropha curcas based on whole-transcriptome Illumina resequencing.BMC Proc.5(Suppl. 7):57. 10.1186/1753-6561-5-s7-p57
- CrossRef
- Google Scholar
47
SongX.-J.HuangW.ShiM.ZhuM.-Z.LinH.-X. (2007). A QTL for rice grain width and weight encodes a previously unknown RING-type E3 ubiquitin ligase.Nat. Genet.39623–630.
- Google Scholar
48
TeichmannT.MuhrM. (2015). Shaping plant architecture.Front. Plant Sci.6:233. 10.3389/fpls.2015.00233
- CrossRef
- Google Scholar
49
TesterM.LangridgeP. (2010). Breeding technologies to increase crop production in a changing world.Science327818–822. 10.1126/science.1183700
- CrossRef
- Google Scholar
50
TobeK.ZhangL.OmasaK. (2005). Seed germination and seedling emergence of three annuals growing on desert sand dunes in China.Ann. Bot.95649–659. 10.1093/aob/mci060
- CrossRef
- Google Scholar
51
van HeerwaardenJ.DoebleyJ.BriggsW. H.GlaubitzJ. C.GoodmanM. M.de Jesus Sanchez GonzalezJ.et al (2011). Genetic signals of origin, spread, and introgression in a large sample of maize landraces.Proc. Natl. Acad. Sci. U.S.A.1081088–1092. 10.1073/pnas.1013011108
- CrossRef
- Google Scholar
52
VerhageL.AngenentG. C.ImminkR. G. H. (2014). Research on floral timing by ambient temperature comes into blossom.Trends Plant Sci.19583–591. 10.1016/j.tplants.2014.03.009
- CrossRef
- Google Scholar
53
WheelerT.von BraunJ. (2013). Climate change impacts on global food security.Science341508–513. 10.1126/science.1239402
- CrossRef
- Google Scholar
54
WuY.LiX.XiangW.ZhuC.LinZ.WuY.et al (2012). Presence of tannins in sorghum grains is conditioned by different natural alleles of Tannin1.Proc. Natl. Acad. Sci. U.S.A.10910281–10286. 10.1073/pnas.1201700109
- CrossRef
- Google Scholar
55
XiaoD.ZhaoJ. J.HouX. L.BasnetR. K.CarpioD. P. D.ZhangN. W.et al (2013). The Brassica rapa FLC homologue FLC2 is a key regulator of flowering time, identified through transcriptional co-expression networks.J. Exp. Bot.644503–4516. 10.1093/jxb/ert264
- CrossRef
- Google Scholar
56
YinC.QianC.ChenG.YanX.MaX. F. (2016). The influence of selection of ecological differentiation to the phenotype polymorphism of Agriophyllum squarrosum.J. Desert Res.36364–373.
- Google Scholar
57
ZhaoP.Capella-GutierrezS.ShiY.ZhaoX.ChenG.GabaldonT.et al (2014). Transcriptomic analysis of a psammophyte food crop, sand rice (Agriophyllum squarrosum) and identification of candidate genes essential for sand dune adaptation.BMC Genomics15:872. 10.1186/1471-2164-15-872
- CrossRef
- Google Scholar
58
ZhaoP.ZhangJ.ZhaoX.ChenG.MaX. F. (2016). Different sets of post-embryonic development genes are conserved or lost in two Caryophyllales species (Reaumuria soongorica and Agriophyllum squarrosum).PLoS ONE11:e0148034. 10.1371/journal.pone.0148034.g001
- CrossRef
- Google Scholar
59
ZhengY.XieZ.YuY.JiangL.ShimizuH.RimmingtonG. M. (2005). Effects of burial in sand and water supply regime on seedling emergence of six species.Ann. Bot.951237–1245. 10.1093/aob/mci138
- CrossRef
- Google Scholar
60
ZouX.ShiC.AustinR. S.MericoD.MunhollandS.MarsolaisF.et al (2014). Genome-wide single nucleotide polymorphism and Insertion-Deletion discovery through next-generation sequencing of reduced representation libraries in common bean.Mol. Breed.33769–778. 10.1007/s11032-013-9997-7
- CrossRef
- Google Scholar

Summary

Keywords

sand rice, physiological adaptation, climate change, single nucleotide polymorphism, allele diversity, natural variation, candidate genes

Citation

Zhao P, Zhang J, Qian C, Zhou Q, Zhao X, Chen G and Ma X-F (2017) SNP Discovery and Genetic Variation of Candidate Genes Relevant to Heat Tolerance and Agronomic Traits in Natural Populations of Sand Rice (Agriophyllum squarrosum). Front. Plant Sci. 8:536. doi: 10.3389/fpls.2017.00536

Received

08 December 2016

Accepted

27 March 2017

Published

07 April 2017

Volume

8 - 2017

Edited by

Michael Deyholos, University of British Columbia, Canada

Reviewed by

Dongying Gao, University of Georgia, USA; Zhixi Tian, Institute of Genetics and Developmental Biology – Chinese Academy of Sciences, China

Updates

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Pengshan Zhao, zhaopengshan@lzb.ac.cn

This article was submitted to Plant Genetics and Genomics, a section of the journal Frontiers in Plant Science

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Plant Genetics and Genomics

ORIGINAL RESEARCH article

SNP Discovery and Genetic Variation of Candidate Genes Relevant to Heat Tolerance and Agronomic Traits in Natural Populations of Sand Rice (Agriophyllum squarrosum)

Abstract

Introduction