Identification and Analysis of NaHCO3 Stress Responsive Genes in Wild Soybean (Glycine soja) Roots by RNA-seq

Soil alkalinity is a major abiotic constraint to crop productivity and quality. Wild soybean (Glycine soja) is considered to be more stress-tolerant than cultivated soybean (G. max), and has considerable genetic variation for increasing alkalinity tolerance of soybean. In this study, we analyzed the transcriptome profile in the roots of an alkalinity tolerant wild soybean variety N24852 at 12 and 24 h after 90 mM NaHCO3 stress by RNA-sequencing. Compared with the controls, a total of 449 differentially expressed genes (DEGs) were identified, including 95 and 140 up-regulated genes, and 108 and 135 down-regulated genes at 12 and 24 h after NaHCO3 treatment, respectively. Quantitative RT-PCR analysis of 14 DEGs showed a high consistency with their expression profiles by RNA-sequencing. Gene Ontology (GO) terms related to transcription factors and transporters were significantly enriched in the up-regulated genes at 12 and 24 h after NaHCO3 stress, respectively. Nuclear factor Y subunit A transcription factors were enriched at 12 h after NaHCO3 stress, and high percentages of basic helix-loop-helix, ethylene-responsive factor, Trihelix, and zinc finger (C2H2, C3H) transcription factors were found at both 12 and 24 h after NaHCO3 stress. Genes related to ion transporters such as ABC transporter, aluminum activated malate transporter, glutamate receptor, nitrate transporter/proton dependent oligopeptide family, and S-type anion channel were enriched in up-regulated DEGs at 24 h after NaHCO3 treatment, implying their roles in maintaining ion homeostasis in soybean roots under alkalinity. Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis showed “phenylpropanoid biosynthesis” and “phenylalanine metabolism” pathways might participate in soybean response to alkalinity. This study provides a foundation to further investigate the functions of NaHCO3 stress-responsive genes and the molecular basis of soybean tolerance to alkalinity.

Soil alkalinity is a major abiotic constraint to crop productivity and quality. Wild soybean (Glycine soja) is considered to be more stress-tolerant than cultivated soybean (G. max), and has considerable genetic variation for increasing alkalinity tolerance of soybean. In this study, we analyzed the transcriptome profile in the roots of an alkalinity tolerant wild soybean variety N24852 at 12 and 24 h after 90 mM NaHCO 3 stress by RNA-sequencing. Compared with the controls, a total of 449 differentially expressed genes (DEGs) were identified, including 95 and 140 up-regulated genes, and 108 and 135 down-regulated genes at 12 and 24 h after NaHCO 3 treatment, respectively. Quantitative RT-PCR analysis of 14 DEGs showed a high consistency with their expression profiles by RNA-sequencing. Gene Ontology (GO) terms related to transcription factors and transporters were significantly enriched in the up-regulated genes at 12 and 24 h after NaHCO 3 stress, respectively. Nuclear factor Y subunit A transcription factors were enriched at 12 h after NaHCO 3 stress, and high percentages of basic helix-loop-helix, ethylene-responsive factor, Trihelix, and zinc finger (C2H2, C3H) transcription factors were found at both 12 and 24 h after NaHCO 3 stress. Genes related to ion transporters such as ABC transporter, aluminum activated malate transporter, glutamate receptor, nitrate transporter/proton dependent oligopeptide family, and S-type anion channel were enriched in up-regulated DEGs at 24 h after NaHCO 3 treatment, implying their roles in maintaining ion homeostasis in soybean roots under alkalinity. Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis showed "phenylpropanoid biosynthesis" and "phenylalanine metabolism" pathways might participate in soybean response to alkalinity. This study provides a foundation to further investigate the functions of NaHCO 3 stress-responsive genes and the molecular basis of soybean tolerance to alkalinity.
Keywords: alkalinity, differentially expressed gene (DEG), Gene Ontology (GO) enrichment analysis, RNA-seq, ion transporter, wild soybean INTRODUCTION Salt-affected soils are major abiotic constraints to crop yield and agricultural sustainability, and can be classified into two main categories: saline and alkaline (Tuyen et al., 2010;Xu and Tuyen do, 2012). Different from salinity, alkalinity is mainly caused by sodium bicarbonate (NaHCO 3 ) or sodium carbonate (Na 2 CO 3 ), which affects seed germination, plant growth and productivity by the presence of excess Na + , combined with HCO − 3 , CO 2− 3 , high pH (>8.5), and poor soil structure. The Food and Agriculture Organization/United Nations Educational, Scientific and Cultural Organization (FAO/UNESCO) showed that alkaline soils cover an area of 434 million hectares (ha) worldwide (FAO, 2000;Munns, 2005). North America and Latin America has 14.5 and 50.9 million ha alkalinity soils, respectively (FAO, 2000;Tuyen et al., 2010). The Song-Nen Plain, a major soybean cultivation region in northeastern China, is one of the top three major contiguous saline-alkali-affected areas in the world, which has an estimated area of 3.73 million ha alkaline soils, and the size of which is increasing by 1.4% annually (Ge et al., 2010;Zhang L.M. et al., 2013). Therefore increasing crop tolerance to alkalinity is essential for food security.
Extensive investigations have been devoted to determine how plants respond to salinity/alkalinity. High concentrations of toxic ions in soils, such as Na + or sometimes Cl − , HCO − 3 , or CO 2− 3 , cause an increase in external osmotic pressure and ion imbalance in plants (Serrano and Rodriguez-Navarro, 2001;Zhu, 2001). There is evidence that high soil pH (>8.5) imposes adverse effect on roots, affecting nutrient uptake, disturbing organic acids balance, distribution and accumulation of inorganic ions, especially disrupting cellular pH stability (Chen et al., 2009;Yang et al., 2009;Zhang L.M. et al., 2013;. It has been proposed that there are three major mechanisms for plant salinity tolerance: tolerance to osmotic stress, sodium (Na + ) exclusion from leaf blades, and tissue tolerance (Munns and Tester, 2008). Osmotic stress inhibits plant growth and causes stomatal closure. An osmotic stress tolerant plant shows less reduction in shoot growth and greater stomatal conductance (Munns and Tester, 2008). Many compatible solutes (such as glycine betaine, proline, sugars, and polyols) are accumulated in plants under salinity, and are essential to balance the osmotic pressure of the cytoplasm (Munns, 2005). Ion exclusion means plants do not accumulate toxic concentration of Na + (Serrano and Rodriguez-Navarro, 2001;Zhu, 2003). Several gene families such as SOS and HKT are involved in ion exclusion mechanisms (Zhu, 2001;Munns, 2005). Tissue tolerance represents a compartmentalization of ions at the cellular level by sequestration of Na + in vacuoles to enhance tolerance to high concentration of ions. Two key genes, AtNHX1 and AtAVP1, are involved in the movement of Na + into vacuoles, affecting tissue tolerance of Arabidopsis (Gaxiola et al., 1999(Gaxiola et al., , 2001Pardo et al., 2006). Under alkaline stress, alkalinity-tolerant plants can uptake nutrients such as iron more efficiently than sensitive plants (Munns, 2005;Peiffer et al., 2012;Xu and Tuyen do, 2012). However, the details about genes and mechanisms of plant tolerance to alkalinity are largely unknown.
RNA-seq has several advantages such as no requirement of prior genome sequence information, higher throughput, wider range of expression levels, and less noise. Therefore it can be used for both model and non-model plant species. RNA-seq has been widely used for transcriptome analysis of crop response to salinity stress, such as wheat , rice (Zhou et al., 2016), maize (Zhang L.M. et al., 2013), cotton (Yao et al., 2011), and oilseed rape (Yong et al., 2014). However, fewer studies on RNAseq analysis of crops response to alkalinity have been reported so far (Fan et al., 2013;Zhang L.M. et al., 2013;DuanMu et al., 2015).
Soybean is one of the most important oilseed crops worldwide, which is rich in vegetable oil, protein and nutraceutical compounds such as isoflavones and saponins (Lam et al., 2010;Schmutz et al., 2010;Korir et al., 2013;Krishnamurthy et al., 2015). It is widely used for human food, animal feed, and industrial products. In ancient China ∼6,000-9,000 years ago, farmers used wild soybean (Glycine soja) to select domesticated soybean (Carter et al., 2004;Zhou et al., 2015). Cultivated soybean (G. max) has much lower genetic diversity than their wild progenitor (Hyten et al., 2006;Lam et al., 2010;Qi et al., 2014). Wild soybean is generally more tolerant and adapted to biotic and abiotic stress conditions than cultivated soybean. Both species have chromosomes 2n = 40 and can be crossed to generate viable, fertile offspring (Xu and Gai, 2003;Ge et al., 2010;Kim et al., 2010;Tuyen et al., 2010). The genomes of both species have been sequenced (Kim et al., 2010;Lam et al., 2010;Schmutz et al., 2010;Li et al., 2014;Zhou et al., 2015), providing useful information for functional genomics studies. However, transcriptome analysis of soybean in response to alkalinity by RNA-seq was limited.
Previous studies on expression profiles of soybean in response to alkaline stress were carried out using hydroponic solution with 50 mM NaHCO 3 (Ge et al., 2010(Ge et al., , 2011, which provides a general knowledge of soybean response to alkalinity. Root is the primary tissue in soil encountering alkaline stress, therefore in this research, we investigated the transcriptome profiles in the roots of an alkalinity-tolerant wild soybean variety, N24852, using quartz sand culture medium subjected to higher concentration of NaHCO 3 (90 mM), which is more similar to the alkalinity in the field. The aim of this study was to find differentially expressed genes (DEGs), metabolic pathways, and overall transcriptional regulation of soybean response to early stage of NaHCO 3 stress, which would broaden our understanding of the molecular and regulatory mechanisms of plant response to alkaline stress, and to identify candidate genes that could be utilized to improve soybean tolerance to alkalinity in future breeding programs.

Plant Growth and NaHCO 3 Stress Treatment
We identified an alkalinity-tolerant wild soybean variety N24852 from a preliminary screening of 129 soybean varieties (our unpublished data), which showed more alkalinity-tolerant than 95% of the 129 varieties. N24852 is tolerant to biotic and abiotic stresses  and its alkalinity tolerance was confirmed in this study (Supplementary Figure S1). A widely used salinity-tolerant (but moderately sensitive to alkalinity) cultivated soybean variety Lee 68 (Abel and Mackenzie, 1964), was used as a genotype control to evaluate the alkalinity tolerance of N24852. These two varieties were grown in a growth chamber (E-41HO, Percival, USA) with 60% relative humidity, 15 h light (50000 lx)/9 h darkness photoperiod and corresponding temperature regime of 28 • C/24 • C. These two varieties showed obvious phenotypic difference when using 90 mM NaHCO 3 treatment, but only showed subtle difference at 80 mM NaHCO 3 and no difference at 100 mM NaHCO 3 treatment in the preliminary experiment (Supplementary Figure S1). Therefore, 90 mM NaHCO 3 was used for RNA-seq study.
Soybean seeds were surface-sterilized with 1% sodium hypochlorite for 30 s, and rinsed three times with deionized water. Twenty seeds were sown in plastic pots ( 10 × 8 cm) which filled with clean quartz sand. Seven days after germination, the seedlings were thinned to four plants per pot. Four pots were placed in a plastic container (34 cm × 24 cm × 8 cm) containing 1.5 L fresh 1/2 strength Hoagland nutrient solution (pH ≈ 6.5), which could be absorbed through small holes at the bottom, and the solution was changed every 2 days. When the second trifoliolate leaves appeared (V3, approximately 14 days after planting), a treatment solution containing 90 mM NaHCO 3 (with a final pH of 8.5, adjusted by addition of KOH) was applied to induce alkaline stress. The control contained no NaHCO 3 . The experiment included three independent biological replications.

Tissue Harvest
Before harvest, roots were dipped into an iso-osmotic solution of 10 mM Ca(NO 3 ) 2 for 10-20 s to avoid turgor loss and rapid efflux of ions from the apoplast and epidermal cells due to osmotic shock (Munns et al., 2010), and rinsed three times with deionized water. Root tips (3 cm) of NaHCO 3 -treated and control plants were harvested after 12 and 24 h treatment, then immediately frozen in liquid nitrogen and stored at −80 • C for RNA isolation. The remaining root tissues and leaves from the stressed and control plants were harvested at the same time points for measurement of ion concentration. For determination of ion concentrations, samples were pretreated at 105 • C for 30 min to deactivate enzymes and dried at 80 • C for 5 days. In order to minimize variation between individual plants, bulked roots and leaves were harvested from four uniform plants in a single pot for each biological replicate.

Measurement of Ion Concentrations
Methods for ion extraction and measurement followed Munns et al. (2010). Dried samples were ground into powder. Then 0.1 g of the powder was digested in 2 mL nitric acid (HNO 3 ) in a microwave digestion system (ETHOS T, Milestone, Italy) following the manufacturer's recommendations. Na + , K + , and Ca 2+ concentrations were estimated using an Inductively Coupled Plasma Optical Emission Spectrometer (ICP-OES) Optima 8000 (PerkinElmer, USA) according to the manufacturer's instructions. The mg/L values were converted to mg/g dry weight (DW) using the formula of (concentration in mg/L as measured × dilution factor × volume of dilute nitrate acid)/(DW of tissue used in extraction × 1000). Differences between two soybean genotypes were compared by t-tests using the SAS proc ttest.

RNA Isolation, cDNA Library Construction and RNA-seq
Roots of wild soybean N24852 under 90 mM NaHCO 3 and control at 12 and 24 h were harvested for RNA extraction and RNA-seq. A total of 12 samples (2 time points × 2 treatments × 3 biological replicates) were prepared for RNAseq. Total RNA was extracted using Trizol R reagent (Invitrogen, USA). Total concentration and quality of the RNA samples were determined by agarose gel electrophoresis and a NanoDrop 2000 Spectrophotometer (Thermo Fisher Scientific, USA). An Agilent 2100 Bioanalyzer RNA Nano chip (Agilent Technologies, USA) was used for accurate quantification.
Messenger RNA (mRNA) was isolated from the total RNA by magnetic beads coated with oligo (dT) using a Dynabeads mRNA DIRECT Kit (Invitrogen, USA). Approximately 5 µg of mRNA from each sample was prepared to construct a library. The mRNA was fragmented into small pieces with fragmentation buffer at 90 • C. Purified mRNA fragments were reverse transcribed (Invitrogen, USA) with random hexamer adaptors to synthesize the first strand cDNA. Second strand cDNA was synthesized with RNaseH (Invitrogen, USA) and DNA polymerase I (Invitrogen, USA). The cDNA was cleaned using Agencourt Ampure XP SPRI beads (Beckman Coulter, USA). The synthesized cDNAs were appended with an ' A' base at the 3 -end and ligated with pairedend adaptors. Paired-end cDNA libraries were amplified using PCR, and fragments of 400-500 bp insertions were selected from 2% agarose gel electrophoresis separation and quantified using qPCR. The twelve cDNA libraries were sequenced on an Illumina HiSeq 2500 platform (Illumina Inc. USA) according to the manufacturer's recommendations at Berry Genomics Company, Beijing, China.

Sequencing Data Analysis
The sequence data (in FastQ format) have been submitted to the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) databases with the study accession SRP093892. The raw reads were cleaned by removing adaptor reads and low-quality reads (ambiguous sequences with 'N' percentage values ≤3 and the percentage of low-quality bases less than 3 is ≥50%) using an in-house script written in C. The clean data were used for subsequent data analysis. High-quality clean reads were mapped to the latest version (G. max Wm82.a2.v1) of soybean reference genome (Goodstein et al., 2012) downloaded from phytozome 1 , using Tophat2 (v2.0.13) software (Kim et al., 2013). Mismatches of less than or equal to 2 bp were allowed in mapping to reference sequences. Subsequently, the sequencing saturation of the library, coverage analysis of clean reads on reference genes, as well as the genomic distributions in CDS (exons), introns, and intergenic regions were analyzed.

Identification of Differentially Expressed Genes (DEGs)
Gene expression levels were normalized to Fragments Per Kilobase of transcript per Million fragments mapped (FPKM) by Cufflinks v2.2.2 software (Trapnell et al., 2010). DEGs were identified by comparing the NaHCO 3 -treated and control samples at the same time point using the R package DESeq (Anders and Huber, 2010). The significant DEGs were identified using the false discovery rate (FDR) ≤ 0.01 and | log 2 FoldChange| ≥ 1.

Functional Annotation and Pathway Analysis
All mapped unigenes and new genes found in this study were compared against seven databases including National Center for Biotechnology Information (NCBI) non-redundant protein database (NR), Gene Ontology (GO), Clusters of Orthologous Groups of proteins (COG) (Tatusov et al., 2000), euKaryotic Ortholog Groups (KOG) (Koonin et al., 2004), reviewed protein sequence database (Swiss-Prot) (Apweiler et al., 2004), and Kyoto Encyclopedia of Genes and Genomes (KEGG) Ontology (KO) (Kanehisa et al., 2004) using BLASTn (v 2.2.26) software (Altschul et al., 1999) with an E-value cutoff at 10 −5 , and searched against the Protein family (Pfam) database (Finn et al., 2014) by hmmscan (v 3.0) software (Johnson et al., 2010). All genes were also BLAST against the Plant Transcription Factors Database v3.0  with an E-value cutoff at 10 −5 . The WEGO software package (Ye et al., 2006) was used for describing GO functional classification of cellular component, molecular function and biological process. GO enrichment analyses of DEGs were performed using Singular Enrichment Analysis (SEA) method with P < 0.01 and FDR < 0.05 by agriGO (Du et al., 2010), and the newest soybean genome Wm82.a2.v1 was set as background. The hypergeometric Fisher exact test (P < 0.01) and Benjamini and Hochberg method (FDR < 0.05) was performed to detect statistically significant enrichment of KEGG pathway and transcription factors, in comparison with the whole soybean transcriptome as the background. DEGs were also blasted against the KOG databases 2 for functional classification.

Heat Map with Clustering Analysis
Heat map of the DEGs overlapped between two time points (12 and 24 h), were analyzed using heatmap.2 function of the R/Bioconductor package gplots with default options (Warnes, 2016). The gene expression levels were transformed by log 2 (FPKM+1) using three biological replications.

Quantitative Real-Time PCR (qRT-PCR)
To validate the RNA-seq gene expression patterns, 14 DEGs were selected and investigated by qRT-PCR using the same RNA samples for the RNA-seq library construction. We used the housekeeping gene GmUKN1 (Glyma.12G020500) as an internal control to normalize the level of gene expression (Hu et al., 2009;Guan et al., 2014). Primer pairs were designed for qRT-PCR using the online website NCBI primer-BLAST 3 . Primer sequences and gene annotations are listed in Supplementary Table S1. Firststrand cDNA was synthesized from 500 ng/10 µl total RNA by using a PrimeScript RT Reagent Kit (TaKaRa, Japan). qRT-PCRs were performed in a LightCycler R 480 II (Roche, Germany) in a final volume of 25 µl containing 12.5 µl SYBR Premix Ex Taq II (Tli RNaseH Plus) (TaKaRa, Japan), 2 µl (200 ng) of cDNA, 1 µl (10 mM) of the forward and reverse primers, and 8.5 µl of ddH 2 O. PCR conditions were set at 95 • C pre-denaturation for 3 min and followed by 40 cycles of 95 • C denaturation for 15 s, 60 • C annealing for 15 s, and 72 • C extension for 15 s. Relative gene expression level was calculated using the 2 − CT method (Livak and Schmittgen, 2001). All qRT-PCR were performed in three technical replicates. The correlation of RNA-seq data with qRT-PCR analysis was calculated using SAS (v9.3) proc corr based on log 2 fold changes.

Phenotypic and Physiological Responses of Wild Soybean N24852 to NaHCO 3 Stress
Maintenance of a low Na + concentration, and low ratios of Na + /K + and Na + /Ca 2+ are widely used as indices of plant salinity tolerance (Puranik et al., 2011;Yong et al., 2014;Tuyen et al., 2016). To evaluate the alkalinity tolerance of wild soybean N24852, the salt tolerant soybean variety Lee 68 was used for comparison. The wild soybean N24852 showed more tolerance to 90 mM NaHCO 3 stress than Lee 68, as indicated by the later appearance of chlorosis and wilting (Supplementary Figure S1). The concentration of Na + , ratios of Na + /K + and Na + /Ca 2+ in the roots of both soybean varieties increased after NaHCO 3 treatment (Figure 1). But the wild soybean N24852 showed significantly (P < 0.05, student's t-tests) lower concentration of Na + , ratios of Na + /K + and Na + /Ca 2+ in the roots than Lee68 under NaHCO 3 stress (Figure 1). And the Na + concentration, ratios of Na + /K + and Na + /Ca 2+ in the leaves of N24852 were also lower than those of Lee68 (Figure 1).

RNA-seq Data Output, Quality Assessment, Mapping and Annotation
Due to the lower Na + concentration in the roots of N24852 than that of Lee 68 (Figure 1) was observed as early as 12 h after alkalinity, RNA-seq analyses of N24852 roots after 12 and 24 h of NaHCO 3 stress in comparison with control were performed on three independent biological replicates. A total of 12 cDNA libraries were constructed and sequenced on Illumina HiSeq 2500, and 283.6 million raw reads were generated (Supplementary Table S2). After removing adaptors and low quality sequences, a total of 270.5 million (95.38% of the raw reads) clean reads (approximately 67.63 Gb clean data) were obtained. On average, 23.6 million clean reads (5.64 Gb clean FIGURE 1 | Ion concentrations and ratios in N24852 and Lee 68 under control (CK) and alkaline stress for 12 and 24 h. (A) Na + concentration in roots, (B) Na + /K + ratio in roots, (C) Na + /Ca 2+ ratio in roots. (D) Na + concentration in leaves, (E) Na + /K + ratio in leaves, (F) Na + /Ca 2+ ratio in leaves. Alkaline stress was 90 mM NaHCO 3 (pH = 8.5) treatment, and control was 0 mM NaHCO 3 (pH = 6.5). * and * * indicate significant difference at 0.05 and 0.01 level by student's t-tests between two soybean genotypes, respectively. DW, dry weight. data) were obtained from each sample (Supplementary Table S2). The percentages of Phred-like quality scores at the Q30 level (an error probability of 1 ) ranged from 87.24 to 94.45% and the average GC content was estimated as 46.43% (Supplementary  Table S2). Among the 12 samples, 88.16-91.69% of the clean reads were mapped to the reference genome, and 89.85-96.60% of clean reads were uniquely mapped (Supplementary Table  S2). The saturation curves of 12 RNA-seq samples (genes with FPKM ≥ 0.01) estimated that the gene coverage started to show saturation when approximately more than 5 million clean reads were aligned (Supplementary Figure S2A). The average clean reads of our 12 samples were 22.54 million, which is more than the saturation threshold. Gene coverage analysis indicates that the sequencing reads were uniformly distributed from the 5 to 3 of genes (Supplementary Figure S2B). On average, more than 80% of the mapped reads were located at the exon region (Supplementary Figure S3). Detailed information of the RNA-seq data was listed in Supplementary Table S2, suggesting that the sequencing quality was high and sequencing depth was sufficient for transcriptome coverage.
A total of 54,844 unigenes were detected in our transcriptome by cufflinks program, including 54331 genes that were aligned to the soybean reference genome (96.94% of the total 56044 genes in Williams 82) and 513 new genes.
Among these, 54342 genes (including 53888 mapped unigenes and 454 new genes) were annotated by at least one of the seven databases (Supplementary Table S3). Taken together, 96.08% (54342) of the expressed genes were successfully annotated in at least one of databases, with 10.51% (5946) of genes annotated in all databases (Supplementary Table S3). The 54342 annotated genes are listed in Supplementary Table S4.

Identification of Differentially Expressed Genes (DEGs) in Soybean Roots under NaHCO 3 Stress
By using the criteria of FDR ≤ 0.01 and | log 2 FoldChange| ≥ 1, a total of 449 genes were differentially expressed in wild soybean roots under NaHCO 3 stress compared with control (Supplementary Table S5). As shown in the Venn diagram (Figure 2), there were 95 and 140 up-regulated genes and 108 and 135 down-regulated genes after 12 and 24 h of alkaline stress, respectively, with more DEGs detected after 24 h alkaline stress than 12 h, implying more transcriptional changes at 24 h. The numbers of up-and down-regulated genes were similar. Nine genes were up-regulated and 20 genes were down-regulated at both time points (Figures 2 and 3), suggesting their importance FIGURE 2 | Venn diagram of the differentially expressed genes (DEGs) between alkaline stress (90 mM NaHCO 3 , pH = 8.5) and control. A12h, 12 h of alkaline stress; A24h, 24 h of alkaline stress.
in soybean response to NaHCO 3 Stress. The overlapped upregulated genes (Figure 3) included two aluminum-activated malate transporter (ALMT) genes (Glyma.11G179100 and Glyma.12G094400) and one gene (Glyma.13G363300) encoding a late embryogenesis abundant protein (LEA). ALMT transporters have been reported to play roles in adaptation of plants to abiotic stress (Henderson et al., 2014), and LEA proteins accumulate in response to water deficit caused by drought, heat and salinity (Battaglia and Covarrubias, 2013;Kosova et al., 2014). No DEG showed opposite regulation patterns at two time points (Figure 2; Supplementary Table S5).

Functional Classification and Gene Ontology (GO) Enrichment Analysis of DEGs
By WEGO database, the DEGs were classified into 12, 9, 12 categories for cellular components, molecular functions, and biological processes, respectively, after 12 h of NaHCO 3 stress, and were classified into 12, 13, and 16 categories, respectively, at 24 h of NaHCO 3 treatment (Supplementary Figure S4). By KOG database, there were 97 and 138 DEGs with KOG functional classification information at 12 and 24 h of NaHCO 3 stress, respectively, and they were classified into 17 functional categories (Supplementary Figure S5).
We also performed GO enrichment analyses of DEGs using agriGO (P < 0.01, FDR < 0.05). At 12 h of alkaline stress, genes with transcription factor activity (GO:0003700) were significantly enriched in up-regulated genes (Figure 4A), while genes with peptidase activity (GO:0008233) and related to biological process of proteolysis (GO:0006508) were enriched in down-regulated genes (Supplementary Figure S6A). At 24 h of alkaline stress, genes with the molecular function of transporter activity (GO:0005215) and carbohydrate binding (GO:0030246), and involved in biological process of transport (GO:0006810) were significantly enriched in up-regulated genes (Figure 4B), while GO terms in molecular functions of "hydrolase activity (GO:0016787), " "transferase activity (GO:0016757), " and "oxidoreductase activity (GO:0016705), " biological process of "carbohydrate metabolic process (GO:0005975)" especially "cellular glucan metabolic process (GO:0006073)" were enriched in down-regulated genes (Supplementary Figure S6B). Consistent with the number of DEGs, the numbers of enriched GO terms at 24 h were greater than 12 h after alkaline stress.

KEGG Pathway Classification and Enrichment Analysis of DEGs
All DEGs were BLAST against the KEGG Ontology (KO) database. At 12 h of NaHCO 3 stress, 30 DEGs were classified into 29 biological pathways belonging to 13 KEGG categories (Supplementary Table S6). At 24 h after NaHCO 3 stress, 48 DEGs were classified into 37 biological pathways belonging to 12 KEGG categories (Supplementary Table S7).
We also used the Fisher exact test to analyze KEGG enrichment for DEGs (P < 0.01, FDR < 0.05). No enriched biological pathways were found at 12 h after NaHCO 3 stress, but enrichment of "phenylpropanoid biosynthesis" and "phenylalanine metabolism" was found at 24 h after 90 mM NaHCO 3 (pH = 8.5) treatment (Supplementary Figures S7 and S8). The genes involved in both pathways were peroxidase-encoding genes, including three up-regulated genes (Glyma.07G209900, Glyma.17G053000, and Glyma.20G169200) and one down-regulated gene (Glyma.02G171600). As one of the important antioxidant enzymes, peroxidases (PODs) could protect plants from oxidative damage by scavenging of reactive oxygen species (ROS), which is induced by various abiotic and biotic stresses (Miller et al., 2008;Gill and Tuteja, 2010).

Validation of RNA-seq Data by qRT-PCR Analysis
To validate the expression data of RNA-seq, we selected 14 DEGs for qRT-PCR analysis, including four POD-encoding genes, three up-regulated genes and two down-regulated genes at both 12 and 24 h, and five genes that were up-or down-regulated at only one time point (Figure 5; Supplementary Table S1). To compare the expression data between RNA-seq and qRT-PCR, the relative expression level was transformed to log 2 fold change. The qRT-PCR results showed a high consistency (linear regression equation y = 0.9142x + 0.0851, R 2 = 0.9763) with RNA-seq data (Figure 5), indicating the reliability of RNA-seq expression profile in this study.

DEGs Related to Transcription Factors
Transcription factors are essential for regulation of gene expression, by binding to the specific cis-acting elements in the genes that they regulate. A total of 130 and 173 transcription factors representing 36 and 38 different families were differentially expressed at 12 and 24 h, respectively, under NaHCO 3 stress comparing with control in wild soybean roots FIGURE 3 | Heatmap of the overlap DEGs between two time points (12 and 24 h). Heatmap was plotted using heatmap.2 function of the R/Bioconductor package gplots. Hierarchical clustering of the DEGs was done by complete method with Euclidean distance. The gene expression levels were transformed by log 2 (FPKM+1) and the values were centered and scaled in row direction. X-axis, samples; Y -axis, differentially expressed gene names. (Figure 6). Among the differentially expressed transcription factors, high percentages of basic helix-loop-helix (bHLH), ethylene-responsive factor (ERF), Trihelix, and zinc finger (C2H2, C3H) families were found at both time points. Nuclear Factor Y subunit A (NF-YA) family was enriched at 12 h after NaHCO 3 stress. These transcription factors have already been reported to play roles in plant response to abiotic stress (Ge et al., 2010;Kielbowicz-Matuk, 2012;Li et al., 2013;Yong et al., 2014;Dey and Vlot, 2015). No transcription factor family was enriched at 24 h after NaHCO 3 stress.

DEGs Related to Ion Transport
ABC transporter (Lee et al., 2004;Henderson et al., 2014), ALMT family (Barbier-Brygoo et al., 2011;Henderson et al., 2014), glutamate receptor (GLR) (Demidchik et al., 2002), nitrate transporter (NRT)/proton dependent oligopeptide (POT) family Barbier-Brygoo et al., 2011;Chen et al., 2012), and S-type anion channel (SLAH) (Negi et al., 2008;Barbier-Brygoo et al., 2011) have been reported to play roles in ion transport, which indicates their importance in maintaining ion homeostasis in plant response to NaHCO 3 stress. Nine transporter genes were differentially expressed at 12 h after NaHCO 3 stress ( Table 1). At 24 h after NaHCO 3 treatment, the up-regulated genes were enriched in transport related genes as shown by GO enrichment analysis ( Figure 4B). Twenty-four transporter genes were up-regulated and three transporter genes were down-regulated at 24 h after NaHCO 3 stress (Table 1), including the genes encoding ABC transporter, ALMT, ammonium transporter, aquaporin transporter, cation efflux family, GLR, NRT/POT family, SLAH, sulfate transporter, and zinc/iron transporter. Our results indicate the importance of these transporter genes in maintaining ion homeostasis in soybean response to NaHCO 3 stress.

DISCUSSION
The alkalinity-tolerant wild soybean variety N24852 showed lower Na + concentration, and lower ratios of Na + /K + and Na + /Ca 2+ in leaves and roots under NaHCO 3 stress in this study. This was similar to the previously reported salt tolerance phenotype of G. max (Tuyen et al., 2016), Brassica napus line N119 (Yong et al., 2014), Oryza sativa line FL478 (Walia et al., 2005), and Foxtail millet cv. Prasad (Puranik et al., 2011), all of which maintained lower Na + concentrations, and ratios of Na + /K + and Na + /Ca 2+ compared with sensitive lines. For RNA sequencing, we used clean quartz as the cultivation medium to mimic the alkalinity in fields, and set up controls at each time point to identify DEGs between NaHCO 3 stress and control, avoiding interference from the development-related DEGs.
Gene Ontology enrichment analyses of the DEGs showed that different GO terms were enriched for different time points. For example, genes with transcription factor activity (GO: 0003700) were significantly enriched in up-regulated genes at 12 h of alkaline stress (Figure 4A), while genes with transporter activity (GO: 0005215) were significantly enriched in up-regulated genes at 24 h of alkaline stress ( Figure 4B). Transcription factors play important roles in plant response to abiotic stresses. NF-YA transcription factors were enriched at 12 h after NaHCO 3 stress in this study (Figure 6). AtNF-YA1 plays a role in regulating post-germination growth arrest under salt stress in Arabidopsis (Li et al., 2013). In addition, high percentages of bHLH, ERF, Trihelix, and zinc finger (C2H2, C3H) transcription factors were found at both 12 and 24 h after alkaline stress in our result (Figure 6). A great percentage of bHLH transcription factor family has been found in the transcriptome profiles of wild soybean roots (Ge et al., 2010) and leaves (Ge et al., 2011) under NaHCO 3 treatment, and was enriched in the transcriptome of oilseed rape roots and leaves under NaCl treatment (Yong et al., 2014). AP2/ERF transcription factors have been shown to play roles in plant response to abiotic stress (Mizoi et al., 2012;Dey and Vlot, 2015) including salt stress (Wang L.Q. et al., 2015), and overexpression of ERF1 in Arabidopsis thaliana enhanced plant tolerance to salt stress (Cheng et al., 2013). C2H2 zinc finger proteins are important components in regulation of plant tolerance to biotic and abiotic stresses (Kielbowicz-Matuk, 2012). Over-expression of a zinc finger protein gene IbZFP1 from sweet potato increased salt and drought tolerance in transgenic Arabidopsis (Wang F. et al., 2015).
ABC transporters might play roles in Na + and K + homeostasis in Arabidopsis and Cl − transport in grapevine (Lee et al., 2004;Henderson et al., 2014). Three ABC transporter genes were up-regulated at 24 h after NaHCO 3 stress in this study ( Table 1), suggesting their possible roles in mediating ion homeostasis of wild soybean under alkaline stress conditions. Among the nine up-regulated DEGs overlapped between the two time points, there are two ALMT genes. The ALMT gene family not only chelates aluminum in the plant rhizosphere, but also has multiple roles in plant adaptation to abiotic stress such as transport of anions involved in mineral nutrition and ion homeostasis processes (Henderson et al., 2014). Three ALMT1 homolog genes changed expression levels in grapevine roots under 50 mM Cl − conditions (Henderson et al., 2014). A gene (Glyma.13G093300) encoding GLR was up-regulated at 24 h after NaHCO 3 stress (Table 1). GLR was predicted to have permeability to Na + and K + , and regulated by Ca 2+ (Demidchik et al., 2002). Four and five NRT/POT genes were differentially expressed at 12 and 24 h of NaHCO 3 stress, respectively ( Table 1). The NRT/POT family is involved in uptake of nitrate and various peptides (Tsay et al., 2007). AtNRT1.8 was up-regulated while AtNRT1.5 was down-regulated by salt and cadmium stress in Arabidopsis roots Chen et al., 2012). Two SLAH genes were up-regulated at 24 h of NaHCO 3 stress, which encode the SLAHs and might be involved in the transmembrane transfer of anions (Barbier-Brygoo et al., 2011). The guard-cell-specific expression of SLAC1/SLAH3 resulted in the dissipation of the overaccumulated osmoregulatory anions in the Arabidopsis slac1 mutant (Negi et al., 2008).
KEGG enrichment analysis of DEGs showed "phenylpropanoid biosynthesis" and "phenylalanine metabolism" pathways were enriched at 24 h after NaHCO 3 treatment (Supplementary Figures S7 and S8). A previous study using the Affymetrix R Soybean Genome Array (Ge et al., 2010) indicated that genes participating in the "phenylpropanoid biosynthesis" pathway were significantly up-regulated in wild soybean roots under NaHCO 3 stress. Transcriptomes of maize roots under Na 2 CO 3 treatment (Zhang L.M. et al., 2013), oilseed rape roots under NaCl treatment (Yong et al., 2014), and bermudagrass roots under NaCl treatment (Hu et al., 2015) also showed an enrichment in the "phenylpropanoid biosynthesis" pathway. These results suggest "phenylpropanoid biosynthesis" pathway might play an important role in plant response to alkalinity and salinity. The DEGs involved in both "phenylpropanoid biosynthesis" and "phenylalanine metabolism" pathways in this study were POD genes, including three up-regulated (Glyma.07G209900, Glyma.17G053000, Glyma.20G169200) and one down-regulated (Glyma.02G171600). DuanMu et al. (2015) showed that two POD genes were differentially expressed in wild soybean roots treated with NaHCO 3 . As one of the important antioxidant enzymes, PODs could protect plants from oxidative damage by scavenging of ROS. Activities of POD in salt-tolerant species Beta maritima were higher than salt-sensitive relative Beta vulgaris at 150 and 500 mM NaCl stress (Chen et al., 2009). "Phenylpropanoid biosynthesis" pathway showed that POD also have an important role in lignin synthesis. Lignin is a phenylpropanoid compound derived from phenylalanine. Lignin and suberin could deposit in the primary cell wall and form into the Casparian strip in vascular plant roots, which appears as a tight barrier that reduces non-selective apoplastic transport of toxic solutes into the stelar tissues under alkaline-salt conditions (Chen et al., 2011).
Recently, great progress has been made on soybean tolerance to salinity. Several independent studies consistently verified a major quantitative trait locus (QTL) on Chromosome 3 for soybean tolerance to NaCl stress (Chen et al., 2008;Hamwieh and Xu, 2008;Hamwieh et al., 2011;Ha et al., 2013). A cation/H + exchanger family gene, GmCHX1/GmSALT3/Ncl/ FIGURE 6 | Transcription factor families of the DEGs at 12 h (A) and 24 h (B) of NaHCO 3 treatment. A total of 130 and 173 transcription factors (TFs) were differentially expressed at 12 and 24 h after 90 mM NaHCO 3 (pH 8.5) stress, respectively, and the percentages of TF families in DEGs and all genes in soybean genome were shown as blue bars and red bars, respectively. * Significantly enriched TF family identified by hypergeometric Fisher exact test (P < 0.01) and Benjamini and Hochberg method (FDR < 0.05).
(Glyma03g32900), was identified in this genomic region, which regulated Na + , K + , and Cl − homeostasis in the shoot of soybean under NaCl stress, and the functional allele of this gene could improve soybean salt tolerance and yield under salinity stress (Guan et al., 2014;Qi et al., 2014;Tuyen et al., 2016). Guan et al. (2014) identified two salt-tolerant haplotypes and seven saltsensitive haplotypes in GmCHX1 by analyzing the SNP variation among 172 soybean accessions. Patil et al. (2016) identified three major structural variants and several SNPs in GmCHX1 based on the re-sequencing (15X) of 129 soybean accessions.
The transcript abundance of GmCHX1/GmSALT3/Ncl/ Glyma03g32900 increased after NaCl treatment in salt-tolerant soybean lines (Guan et al., 2014;Qi et al., 2014). In the new version of soybean genome reference sequence (Williams 82. a2.v1), Glyma03g32900 was assembled into two genes/transcript models, including Glyma.03G171600 and Glyma.03g171700 . In this study, these two genes/transcripts were not detected as a DEG at 12 or 24 h after 90 mM NaHCO 3 treatment. And the published alkalinity-tolerance QTL were mapped on Chromosome 17 (Tuyen et al., 2010(Tuyen et al., , 2013, which did not overlap with the major salt tolerance QTL on Chromosome 3 (Guan et al., 2014;Qi et al., 2014;Tuyen et al., 2016). These comparisons suggest that there might be different genes controlling soybean tolerance to alkalinity or salinity. The 68 candidate genes within the alkalinity-tolerance QTL on Chromosome 17 (Tuyen et al., 2013) were not detected as DEGs in this study, which might be due to different genes controlling alkalinity-tolerance in different soybean varieties (JWS156-1 for published QTL mapping study and N24852 for RNA-seq in this study). We did observe lower Na + concentration, lower ratios of Na + /K + and Na + /Ca 2+ in both roots and leaves of the wild soybean variety N24852 under NaHCO 3 stress (Figure 1), indicating that the alkalinity tolerance is primarily due to Na + exclusion from the roots, which leads to less Na + transport into the aerial part. In our analysis, genes related to ion transporters were enriched in NaHCO 3 stress-responsive genes, which might play a role to maintain ion homeostasis of wild soybean (N24852) under alkalinity. Further studies, such as co-localization of alkalinity tolerance QTL with alkalinityresponsive genes using the same soybean tolerant variety, and functional study of candidate alkalinity-tolerance genes, would help to elucidate the genes and mechanisms underlying alkalinity tolerance.
On the other hand, under alkaline stress or salt stresses, plants encounter both ionic and osmotic stresses. Therefore, there might be some overlapped responsive genes between alkaline and salt stress, as well as other stresses. We compared the expression patterns of the alkalinity-responsive genes encoding NF-YA transcription factors and transporters in this study (Supplementary Table S8) with the published RNA-seq data in soybean under salt (Belamkar et al., 2014), drought (Belamkar et al., 2014;Chen et al., 2016), flooding , potassium deficiency (Wang et al., 2012), and shade stresses (Gong et al., 2014). Among these 39 alkalinity-responsive genes, there are 20, 16, 8, 9, and 8 genes were also detected as DEGs in soybean response to salt, drought, flooding, potassium deficiency, and shade, respectively. Therefore, some NF-YA transcription factors and transporters are involved in soybean response to general abiotic stresses mentioned above. More comprehensive analyses such as genome-wide comparisons of DEGs and GO and KEGG pathway enrichment could provide the overall picture of common and specific genes/pathways/mechanisms in soybean responses to different stresses, which would be a great study to follow in the future.

CONCLUSION
In this study, an alkalinity-tolerant wild soybean variety N24852 was identified, which showed low Na + concentration in both leaves and roots under NaHCO 3 (pH = 8.5) stress. Nine genes (including two ALMT genes and one LEA gene) were up-regulated at both 12 and 24 h of NaHCO 3 (pH = 8.5) treatment. NF-YA transcription factors were enriched in the up-regulated genes at 12 h after NaHCO 3 stress, while genes related to ion transporters such as ABC transporter, ALMT, GLR, NRT/POT family, and SLAH were enriched in up-regulated genes at 24 h after NaHCO 3 stress. "phenylpropanoid biosynthesis" and "phenylalanine metabolism" pathways were enriched at 24 h after NaHCO 3 treatment. This study provides a list of NaHCO 3 stressresponsive genes to help us further understanding plant response to alkaline stress.

AUTHOR CONTRIBUTIONS
JZ and YL conceived and designed the experiments. JZ, JW, and WJ performed the experiments. JZ, JW, WJ, JL, SY, and YL analyzed the data. JG and YL contributed reagents/materials and interpretation of the results. JZ and YL wrote and revised the manuscript. All authors read, revised and approved the final manuscript.