Original Research ARTICLE
Large-Scale Public Transcriptomic Data Mining Reveals a Tight Connection between the Transport of Nitrogen and Other Transport Processes in Arabidopsis
- ¹Biological, Environmental and Climate Sciences Department, Brookhaven National Laboratory, Upton, NY, USA
- ²Purdue Research Foundation, West Lafayette, IN, USA
- ³Department of Bioengineering, Carl R. Woese Institute for Genomic Biology, National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- ⁴Arkansas Forest Resources Center, The University of Arkansas at Monticello, Monticello, AR, USA
Movement of nitrogen to the plant tissues where it is needed for growth is an important contribution to nitrogen use efficiency. However, we have very limited knowledge about the mechanisms of nitrogen transport. Loading of nitrogen into the xylem and/or phloem by transporter proteins is likely important, but there are several families of genes that encode transporters of nitrogenous molecules (collectively referred to as N transporters here), each comprised of many gene members. In this study, we leveraged publicly available microarray data of Arabidopsis to investigate the gene networks of N transporters to elucidate their possible biological roles. First, we showed that tissue-specificity of nitrogen (N) transporters was well reflected among the public microarray data. Then, we built coexpression networks of N transporters, which showed relationships between N transporters and particular aspects of plant metabolism, such as phenylpropanoid biosynthesis and carbohydrate metabolism. Furthermore, genes associated with several biological pathways were found to be tightly coexpressed with N transporters in different tissues. Our coexpression networks provide information at the systems-level that will serve as a resource for future investigation of nitrogen transport systems in plants, including candidate gene clusters that may work together in related biological roles.
Nitrogen (N) is often the most limiting nutrient for plant growth. The US produces more than 10 million tons of nitrogen fertilizer annually in order to increase the output of agriculture (Russell et al., 2009). The process of making those fertilizers is energy intensive, and excessive fertilization leads to environmental pollution due to leaching and run-off of N into rivers and oceans. Understanding the mechanisms of nitrogen utilization in plants will provide guidance to improve nitrogen use efficiency of crop plants, which will reduce fertilizer and energy costs of agriculture, and help protect our environment (Canfield et al., 2010).
N is usually taken up by roots from the soil as nitrate or ammonium, or sometimes organic forms, such as amino acids (Masclaux-Daubresse et al., 2010). Nitrate and ammonium may be assimilated into organic forms in the roots or leaves through the glutamine synthetase-GOGAT (GS-GOGAT) cycle, and may be utilized or stored where synthesized, or translocated to other tissues (Masclaux-Daubresse et al., 2010). For example, in many crop plants, when there is limited N availability, N is translocated from older leaves to younger leaves higher on the stem that typically receive more direct sunlight and are less likely to be shaded than older leaves (Diaz et al., 2008). Transport between different plant tissues may occur by loading N into the xylem or phloem, where it moves with the bulk flow of the xylem or phloem sap, respectively. Some of the genes that control N transport have been identified, but other components remain to be identified and our understanding of the system is incomplete.
Nitrogen in different chemical forms can be transported by different gene families, such as amino acid transporters (AAT), nitrate transporters/peptide transporters (NPF, formerly called NRT1 and PTR), NRT2 transporters (Tsay et al., 2007; Léran et al., 2014), ammonium transporters (AMT), amino acid-polyamine-choline transporters (APC), and amino acid/auxin permeases (AAAP) (Williams and Miller, 2001), which we will collectively call “N transporters” here for brevity. Uptake of nitrate from soil is perhaps the best understood aspect of N transport in plants. Plants have evolved two types of transporters, high and low affinity, for the uptake of nitrate from soil at low and high concentrations, respectively, and those transporters are induced or repressed accordingly (Orsel et al., 2002). After uptake, nitrate may be assimilated to organic forms in the roots, or loaded into the xylem by NPF7.3 (formerly NRT1.5) for transport from roots to leaves, where it may be assimilated or stored in the vacuole (Wang Y. -Y. et al., 2012). Although plants often recycle this valuable nutrient from old leaves to new organs (Wang Y. -Y. et al., 2012), the genes involved in recycling have not yet been fully determined (Tegeder, 2012). Also, the system responsible for translocation of N from leaves to reproductive organs has not been fully elucidated, although some components have been identified. For example, amino acid permeases, such as AAP2 and AAP6, mediate transfer of amino acids from the xylem to the phloem, impacting N and protein content of seeds, and other silique- and seed-localized transporters, such as AAP1 and NPF2.12 (formerly NRT1.6), mediate seed development and filling (Almagro et al., 2008; Tegeder, 2012).
Environmental conditions in the soil may vary drastically, including the level of nitrate availability, and factors such as nitrogenous metabolite concentrations in specific tissues, circadian rhythm, sucrose, and pH may play a role in regulating N utilization (Gojon et al., 2009; Krouk et al., 2010). Thus, we expect coordinated coregulation of genes that act together as a system in response to these varying conditions.
Microarray technology has provided the power to measure mRNA abundance efficiently and affordably, and has been used to study N utilization in plants (Wang et al., 2003; Bi et al., 2007; Krouk et al., 2009). Generally, the mRNA samples from plants with no or limited N and sufficient N supply are compared in order to find the differentially expressed genes (DEG), which are considered to be the candidates involved in N utilization. A series of computational studies have been performed based on the microarray measurements in order to investigate the gene networks underlying these processes (Gutiérrez et al., 2007a,b; Stokes et al., 2008; Nero et al., 2009). For instance, Nero et al. integrated 76 microarray samples from five labs and identified a gene network module which may be responsive to nitrate. Unlike traditional research focusing on one or a few genes, these studies provided a genome-wide view of nitrogen utilization, which may help us better understand the mechanisms at a higher level (Ruffel et al., 2010). The idea behind those studies generally is that genes with a similar expression pattern across many samples may be functionally related (Rhee and Mutwil, 2014). Plant transcriptomic data have accumulated in the past decade and more than 30,000 expression profiling samples for Arabidopsis are stored in NCBI GEO (Barrett et al., 2013). Despite the abundance of the data, making sense of those public data remains challenging (Rung and Brazma, 2013). In order to detect the stable coexpression relationships, microarray datasets from different labs have been combined to calculate the correlation coefficient between two expression profiles (Kim et al., 2001; Stuart et al., 2003; Atias et al., 2009; Mao et al., 2009; Wang S. et al., 2012). Often correlations between different genes may depend on the specific cellular context (De la Fuente, 2010), for example cancer vs. non-cancer cells (Anglani et al., 2014). 
The problem with combining microarray data from many different experiments is that context-specific relationships may be missed.
We applied context-specific coexpression analysis, first to a subset of 17 genes involved in nitrogen transport (15 NRTs and 2 genes from other families that encode channels that transport nitrate), and then on a larger scale to 170 genes from multiple gene families that are potentially involved in the nitrogen transport system in Arabidopsis (Table S1). Unlike previous computational studies, we processed each GEO dataset independently in order to capture context-specific regulation relating to nitrogen transport. We analyzed microarray datasets from 320 studies done by different labs, including not only microarray data generated for the study of nitrogen but also data from studies unrelated to nitrogen. This analysis identified candidate genes and pathways that might be involved in or associated with nitrogen transport, which will guide further experimental studies.
Differential Expression Across Experiments Indicates Context-Specific Gene Functionality within a Tissue
Although both ammonium and nitrate can be used by plants, nitrate is the major form of nitrogen in many soils (Chrispeels et al., 1999). We focused initially on 15 NRTs and two other genes that encode channels that transport nitrate (Figure 1), each of which has some experimental evidence of its function (Wang Y. -Y. et al., 2012). We first explored differential expression, since differentially expressed genes are believed to be regulated in response to certain signals and hence may play a role in the biological process under study (Tarca et al., 2006). Among the 371 published Arabidopsis expression series datasets we collected from GEO, 50 datasets are root-specific and 49 datasets are leaf-specific (see Table S2). For each leaf- and root-specific dataset, we identified the DEG and tallied the number of datasets in which each gene was differentially expressed (see Section Materials and Methods). Comparing the differential expression events between roots and leaves was suggestive of the function of some genes. For example, NPF2.7 (formerly NAXT1) was differentially expressed in more than 30% of root-specific datasets but in only about 12% of leaf-specific datasets (Figure 1), suggesting a context-specific function of NPF2.7 in roots. Previous studies have demonstrated that NPF2.7 is involved in the excretion of nitrate from roots of Arabidopsis (Segonzac et al., 2007). Also, our results show that NPF2.13 (formerly NRT1.7) was differentially expressed in only about 20% of root-specific datasets but in about 40% of leaf-specific datasets (Figure 1). It has been reported that NPF2.13 is involved in translocation of nitrate from old leaves to young leaves (Fan et al., 2009). Furthermore, NRT2.1 and NRT2.2 are differentially expressed in about twice as large a fraction of root-specific datasets as of leaf-specific datasets (Figure 1), which corresponds to their roles in the uptake of nitrate from soil (Wang Y. -Y. et al., 2012). We must caution that there are caveats to using this approach as an indicator of functionality.
For example, NRT2.4 is differentially expressed much more often in roots than in leaves, which is consistent with its function in root nitrate uptake (Kiba et al., 2012). However, NRT2.4 has a second role, relating to the loading of nitrate into the phloem in shoots, which would have been missed by the differential expression approach alone. These examples suggest that comparing the relative level of responsiveness of genes in particular tissues is one approach that could be combined with other approaches to focus functional genomics studies of gene networks, especially for large gene families such as NRT (other examples include cytochrome P450s, glycosyltransferases, and glycoside hydrolases). Additionally, the measure of plasticity in expression provided by this analysis is suggestive of the degree to which a gene's function within a tissue is dependent on context and conditions.
Figure 1. Comparison of differential expression between roots and leaves for N transporters. ANOVA followed by FDR correction was used to detect differential expression between replicate groups in each GEO dataset (p < 0.01). Among the 371 datasets collected for this study, 50 contain root samples only and 49 contain leaf samples only (see Table S2 for details).
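The tally behind Figure 1 can be illustrated with a minimal Python sketch. It assumes that the DEG of each dataset have already been called and that each dataset carries a curated tissue label; the function name `de_fraction_by_tissue` and the toy DEG calls are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def de_fraction_by_tissue(datasets):
    """Fraction of datasets of each tissue in which a gene is differentially expressed.

    datasets: iterable of (tissue, set_of_DEG_ids), one entry per dataset.
    Returns {tissue: {gene: fraction of that tissue's datasets with the gene as DEG}}.
    """
    n_datasets = defaultdict(int)
    de_counts = defaultdict(lambda: defaultdict(int))
    for tissue, de_genes in datasets:
        n_datasets[tissue] += 1
        for gene in de_genes:
            de_counts[tissue][gene] += 1
    return {
        tissue: {gene: count / n_datasets[tissue] for gene, count in genes.items()}
        for tissue, genes in de_counts.items()
    }


# Toy input: made-up DEG calls for four root and two leaf datasets.
toy_datasets = [
    ("root", {"NPF2.7", "NRT2.1"}),
    ("root", {"NPF2.7"}),
    ("root", {"NRT2.1"}),
    ("root", set()),
    ("leaf", {"NPF2.13"}),
    ("leaf", {"NPF2.13", "NPF2.7"}),
]
fractions = de_fraction_by_tissue(toy_datasets)
print(fractions["root"]["NPF2.7"], fractions["leaf"]["NPF2.7"])
```

Comparing the root and leaf fractions for the same gene gives the kind of contrast plotted in Figure 1; genes that are never differentially expressed in a tissue are simply absent from that tissue's dictionary.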
Coexpression Analysis Across 320 Datasets Identified Related Metabolic Processes and Possible Pathway Members Relating to Transport of Nitrogen
When building a coexpression network, the Pearson Correlation Coefficient (PCC) is often used to measure the weight of correlation between two expression profiles. The challenge in selecting a cutoff to define what elements to include is that the minimum value of PCC that is significantly different from zero (i.e., no correlation) depends heavily on the sample size. For example, at the significance level 0.05, the minimum PCC is approximately 0.6 when the sample size is 10, and approximately 0.2 when the sample size is 100. Generally speaking, there is no standardization of the cutoff amongst studies, and it may vary dramatically (Jordan et al., 2004; Van Noort et al., 2004; Wang S. et al., 2012). Although the p-value of the correlation can be used as a cutoff that is comparable between datasets of different sample size (Ponomarenko et al., 2013), p-values are difficult to average across datasets. Since coexpression networks are intended to make complex systems understandable to the human intellect, others have included only a small number of the most highly correlated genes (e.g., the top 20 genes or top 0.1%) (Kim et al., 2001; Bergmann et al., 2004). We utilized a similar strategy to focus attention on the most highly correlated network members (see Section Materials and Methods). A network was constructed using the top 20 coexpressed partners for our 17 focal genes (i.e., 15 NRT and 2 channels) (Figure 2). The weight (i.e., PCC) for each GEO dataset was calculated independently and the average value over all datasets was used to measure the strength of coexpression between a pair of genes. In a combined meta-dataset, transient or context-specific signals may be swamped, whereas those relationships are more likely to be captured by our method (Usadel et al., 2009).
Figure 2. The coexpression network of 17 N transporters. Only the top 20 coexpressed genes for each N transporter were included. The width of each edge corresponds to the average coexpression weight across 320 GEO datasets. A high resolution version of this figure and the coexpression weights can be found in Supplemental Materials (Figure S3 and Table S3).
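The dependence of the smallest significant PCC on sample size, mentioned above, can be checked with a short Python sketch. This is only an approximation based on the Fisher z-transformation, which is our assumption for illustration (an exact calculation would use the t distribution); the function name `approx_critical_pcc` is hypothetical.

```python
import math
from statistics import NormalDist


def approx_critical_pcc(n_samples, alpha=0.05):
    """Approximate smallest |PCC| that differs from zero at level alpha (two-sided).

    Uses the Fisher z-transformation, under which atanh(r) is roughly normal
    with standard deviation 1 / sqrt(n - 3) when the true correlation is zero.
    """
    if n_samples <= 3:
        raise ValueError("the approximation needs more than 3 samples")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(z / math.sqrt(n_samples - 3))


for n in (10, 100):
    print(n, round(approx_critical_pcc(n), 2))
```

With these assumptions the threshold is roughly 0.63 for 10 samples and roughly 0.20 for 100 samples, in line with the values quoted in the text, which is why a single PCC cutoff is hard to apply across datasets of different size.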
Many of the 17 N transporters were coexpressed with several other N transporters, showing the potential functional association among those genes. In order to test whether this coexpression network makes biological sense, GO enrichment analysis was performed to detect over-represented functional categories after removing all of the 17 genes from the network (Figure 2). Interestingly, “water channel activity” was over-represented (p = 6.4 × 10⁻⁸), which indicates that transport of water and transport of nitrate may be under coordinated transcriptional control. Other than those 17 genes, there are various transporters, metabolic enzymes, and transcriptional regulators in this network (Figure 2), some of which appear likely to have a relationship with the nitrogen transport system, based on their known functions. For example, NPF6.3 (formerly NRT1.1) is highly coexpressed with H+-ATPase 2, AT4G30190 (Table 1 and Table S3). NPF6.3 is a nitrate/proton symporter, requiring a proton gradient for the uptake of nitrate from the soil (Parker and Newstead, 2014). The H+-ATPases that maintain the proton gradient comprise a large superfamily (Axelsen and Palmgren, 2001; Palmgren, 2001), but our coexpression analysis suggests that H+-ATPase 2, specifically, may contribute to the proton gradient needed for the transport of nitrate from the soil into the root by NPF6.3. Furthermore, H+-ATPase 2 is strongly expressed in the root pericycle, cortex, epidermis, and root cap according to the Arabidopsis eFP browser, particularly after nitrate addition (Figure S1A; Winter et al., 2007).
We further included 171 genes potentially involved in nitrogen transport in a similar analysis as above. A network of the top 20 coexpressed partners for each nitrogen transporter can be visualized in Figure S2. In total, 2047 other genes are in this network, many of which are connected with more than one nitrogen transporter (Table S4). Interestingly, other transporter genes are enriched among those 2047 genes, such as genes from the GO categories “ion transport” and “carbohydrate transport” (Table S5), indicating those biological processes might be regulated similarly in Arabidopsis (Koprivova et al., 2000; Scheible et al., 2004). These other transporters could possibly be involved in transport of counter-ions to help maintain charge balance across membranes during sustained NO₃⁻ transport, or might be involved in the uptake or homeostasis of other essential nutrients that would be needed for growth and development at the same time as N uptake. Multiple other GO biological process categories were also significantly over-represented in the network (Table S5), such as those relating to phenolics. It is well documented that phenolic compound biosynthesis is upregulated when N is limited relative to C (Scheible et al., 2004; Cross et al., 2006). Our analysis suggests that there may be coordinated regulation of N transporter genes and phenylpropanoid biosynthetic genes. Additionally, there were various carbohydrate (C) metabolism and transport categories that were over-represented in the N transporter network (Table S5). This is consistent with previous evidence for extensive coordination to balance C and N metabolism (Palenchar et al., 2004) but may also indicate the need for increased carbohydrates in tissues where N uptake is strong to provide energy to maintain the proton gradient needed for N uptake, and to provide energy and organic building blocks for the lateral root proliferation that is common in high N regions of soil (Hodge, 2004).
Finally, the network also included responses to numerous stimuli, such as water deficit, abscisic acid, and wounding. These “response” categories represent a rich resource for hypothesis generation, as they may reflect the importance of coordinating N utilization with other aspects of plant physiology in response to different environmental conditions. For example, N uptake may need to be altered if water uptake declines during drought, since N delivery to the shoot requires transport with water through the xylem. In addition to these over-arching insights, similar networks that focus on particular aspects of N transport (e.g., N export during leaf senescence) may be useful to identify a more focused set of processes that are associated with particular aspects of N utilization.
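The GO over-representation results discussed in this section rest on an enrichment test. The text does not state which test was used, so the following Python sketch assumes the common one-sided hypergeometric test purely for illustration; the function name `hypergeom_enrichment_p` and the toy counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_category, n_selected, n_overlap):
    """One-sided hypergeometric p-value for over-representation.

    Probability of seeing at least n_overlap annotated genes when n_selected
    genes are drawn without replacement from n_universe genes, of which
    n_category carry the GO annotation.
    """
    upper = min(n_category, n_selected)
    total = comb(n_universe, n_selected)
    # math.comb returns 0 when k exceeds n, so impossible terms drop out.
    tail = sum(
        comb(n_category, k) * comb(n_universe - n_category, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy counts (made up): 6 of 200 network genes fall in a 30-gene category
# drawn from a 20,000-gene universe.
p_toy = hypergeom_enrichment_p(20000, 30, 200, 6)
print(p_toy)
```

In practice the resulting p-values would also be corrected for the number of GO categories tested, which is omitted here for brevity.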
Coexpression Network Indicates Tissue Specificity and Potential Pathways Associated with N-Transport
Increasing the number of coexpressed partners in a network beyond the top 20, as above, may be meaningful but there is a risk of increasing the false-positive rate. One strategy to detect those broader relationships when individual gene relationships are relatively weak is to compute the correlation between a gene and meaningful pathways, such as GO Biological Processes (Huang et al., 2006; Tegge et al., 2012; Bateman et al., 2014). Since a pathway is a pre-defined group of genes, taking all those genes into account may boost the power to detect the real signal (Lee et al., 2011). Furthermore, the presence or absence of specific genes or networks may be tissue- or cell-type dependent (Anglani et al., 2014). We calculated the correlation between expression of the 17 N transporter genes and GO Biological Process pathways within each GEO dataset for samples from the same tissue type, and used the top 10 correlations to construct a network of N transporter-pathways that displays the tissue-specificity of each edge (see Section Materials and Methods; Figure 3). Each connection between an N transporter and a GO pathway represents a statistically significant correlation in a tissue type, which is represented by the color of the edge. Some N transporters are connected by network edges of a single tissue type, such as NPF2.12 (NRT1.6) and NRT2.2, while others are connected by network edges of multiple tissue types, such as NPF6.2 (NRT1.4) and NPF4.6 (NRT1.2) (Figure 3).
Figure 3. A tissue-specific coexpression network between 17 N transporters and GeneOntology biological processes. Only the top 10 statistically significant pathways coexpressed with each N transporter were included. The numbers following the name of each GO biological process give the number of genes in the process/the number of those genes that are on the microarray. The width of each edge corresponds to the average coexpression weight, across GEO datasets of a specific tissue, between the N transporter and genes from the GO category. Only the edges supported by at least 5 datasets of a specific tissue are shown here. A high resolution version of this figure and all the weights between N transporters and GO biological processes in all available tissues can be found in Supplemental Materials (Figure S4 and Table S6).
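The edge selection described for Figure 3 (at least 5 supporting datasets of a tissue, then the top 10 pathways per transporter) can be sketched as follows. This is a minimal illustration that assumes the per-dataset transporter-pathway weights have already been computed and filtered for statistical significance, and that ranking is by mean weight; the function name `build_pathway_edges` and the toy weights are hypothetical.

```python
from statistics import mean


def build_pathway_edges(weights, min_datasets=5, top_n=10):
    """Select tissue-specific transporter-pathway edges.

    weights: {(transporter, pathway, tissue): [per-dataset weight, ...]}
    Returns {transporter: [(pathway, tissue, mean_weight), ...]}, keeping only
    edges supported by at least min_datasets datasets and, for each
    transporter, the top_n edges with the highest mean weight.
    """
    per_transporter = {}
    for (transporter, pathway, tissue), values in weights.items():
        if len(values) >= min_datasets:
            per_transporter.setdefault(transporter, []).append(
                (pathway, tissue, mean(values))
            )
    return {
        transporter: sorted(edges, key=lambda e: e[2], reverse=True)[:top_n]
        for transporter, edges in per_transporter.items()
    }


# Toy weights (made up): the leaf edge has too few supporting datasets.
toy_weights = {
    ("NRT2.2", "nitrate transport", "root"): [0.5, 0.5, 0.5, 0.5, 0.5],
    ("NRT2.2", "specification of organ identity", "root"): [0.25] * 6,
    ("NRT2.2", "photosynthesis, light harvesting", "leaf"): [0.9, 0.9],
}
edges = build_pathway_edges(toy_weights, min_datasets=5, top_n=1)
print(edges)
```

Keeping the tissue label on each edge is what allows the network to be colored by tissue type.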
These correlations may provide hints as to the function of uncharacterized N transporter genes or additional functions of previously characterized genes. NPF2.12 (formerly NRT1.6) is connected with several pathways in our N transporter-pathway network and all those relationships are based on seed-specific datasets (Figure 3). This is consistent with previous evidence, which suggests that NPF2.12 is involved in the delivery of nitrate from the maternal plant to the developing embryo, particularly the transfer of nitrate from the vascular tissue into the seed (Almagro et al., 2008). Knockout of NPF2.12 has profound impacts such as reduced nitrate content of seeds, and substantially increased incidence of seed abortion. Our network suggests that, in addition to carpel development, NPF2.12 may also be strongly linked with anther and pollen development, vacuolar protein localization, and phenolic metabolism. In recent years, intact phenolic metabolism has been linked with proper pollen development and pollen fertilization of embryos (Matsuno et al., 2009; Fellenberg et al., 2012; Fellenberg and Vogt, 2015).
All of the edges connected to NRT2.2 are based on root-specific datasets, which is consistent with its role in the uptake of nitrate from soil (Li et al., 2007). One of the pathways connected to NRT2.2 is “specification of organ identity,” which might reflect the tight relationship between nitrate uptake and cellular differentiation or between nitrate consumption and root growth (Walch-Liu et al., 2006). NRT2.2 and NRT2.1 shared a strong link with “imidazole-containing compound metabolic process,” which includes multiple genes associated with histidine biosynthesis. Although the relevance of this co-linkage to histidine biosynthesis is not immediately clear, it is interesting that NRT2.2 and NRT2.1 are linked in our network since the two genes reportedly have some overlap of function in inducible high affinity nitrate uptake by roots (Li et al., 2007).
Table 2 shows the top correlated pathway for each of the 17 N transporters. Detailed information about all those tissue-specific correlations can be found in Table S6. We believe that data such as these will provide potential candidate genes and interesting hypotheses for further studies. For example, leaf-specific “sulfate (S) assimilation” is the best correlated pathway with NPF6.3 (formerly NRT1.1) and NPF7.3 (formerly NRT1.5). It is not surprising that N uptake and S assimilation genes correlate, since plant processes that require a lot of N also tend to need S, for example for the biosynthesis of cysteine, methionine, and several important cofactors (Koprivova et al., 2000). Similarly, multiple genes were best associated with, “photosynthesis, light harvesting,” including NPF4.6, NPF6.2 (formerly NRT1.2 and NRT1.4, respectively), NRT2.7, and CLCA, which appear as a cluster in the network (Figure 3). This may reflect the importance of tight co-regulation of N and C metabolism and also the fact that the biosynthesis of light harvesting proteins and pigments is highly dependent on the availability of N (Scheible et al., 2004).
Another example is NPF2.13 (formerly NRT1.7) which is coexpressed with “negative regulation of cell death” pathway in leaf. As reported previously, NPF2.13 plays a role in recycling of nitrogen within the plant (Fan et al., 2009) and cell death is a prominent event during senescence (Lim et al., 2007). It is probably crucial that cell death is slowed or delayed until most of the N is exported from the leaves through the phloem. Several other N transporter genes are also strongly correlated with the “negative regulation of cell death” pathway (NPF4.6/NRT1.2, NPF7.2/NRT1.8, NRT2.6, and NRT2.5), which might suggest that multiple N transporter genes are involved in N remobilization during senescence. NRT2.5 has been linked previously with N remobilization (Lezhneva et al., 2014). Alternatively, the “negative regulation of cell death” hub might indicate a general role of abundant N in delaying senescence. Indeed nitrogen status and senescence are known to be closely linked (Cooke et al., 2005; Diaz et al., 2008).
Some of the other GO Biological Process pathways appear to represent informative hubs. For example, the “nitrate transport,” and “response to nitrate,” groups are coexpressed with multiple genes in root tissues, including NPF6.3, NPF4.6, NPF2.9, NPF2.7, and SLAH3 (Figure 3). Several of these genes are known to be involved in nitrate uptake and redistribution in roots (Segonzac et al., 2007; Wang and Tsay, 2011; Glass and Kotur, 2013) and this association suggests that the others might play other roles in these processes. For example, SLAH3 functions in nitrate release from guard cells (Geiger et al., 2011), but based on the fact that SLAH3 is strongly expressed in the pericycle of roots (Figure S1B) combined with this coexpression relationship, one could hypothesize that SLAH3 might facilitate nitrate loading into the xylem or phloem in the roots via an apoplastic route. The role of NPF2.9 (formerly NRT1.9) was described as mediating nitrate distribution between shoot and root (Wang and Tsay, 2011). Based on expression of NPF2.9 in companion cells in roots and the relationship of NPF2.9 with NPF6.3 and NPF4.6 (formerly NRT1.1 and 1.2, respectively) in our coexpression network, perhaps NPF2.9 might have a more direct link with nitrate uptake, such as delivery of nitrate from the maturation zone of the root, where much of the water and nitrate uptake occurs, toward the developing root tip via the phloem. By identifying clusters or hubs such as this, our analysis provides guidance for further experimentation. For example, in order to understand nitrate uptake, we need to understand the functions of these genes and how these functions are integrated as a system.
Coexpression network analysis is one of the most popular approaches to leverage expression data. Transcriptome data are probably the most abundant biological data for plants, with more than 30,000 microarray samples deposited in NCBI GEO for the model plant Arabidopsis alone (He et al., 2016). This massive dataset is a valuable resource for plant functional genomics. For example, genes involved in flavonoid biosynthesis (Katsumoto et al., 2007), starch metabolism (Mentzen et al., 2008), aliphatic glucosinolate biosynthesis (Gigolashvili et al., 2009), lignin biosynthesis (Vanholme et al., 2013), and photorespiration (Pick et al., 2013) have been identified with the assistance of coexpression networks. Compared with animal data, functional gene annotation is limited in plants. It is therefore critical to utilize the large amount of transcriptomics data to guide studies of gene function in plant science (Hwang et al., 2011). In fact, the standard gene annotations for Arabidopsis include data predicted based on coexpression networks (Heyndrickx and Vandepoele, 2012).
Generally speaking, a large sample size helps to infer a more robust correlation. If two genes show high coexpression in only one dataset but very low coexpression in other datasets, the relationship may be a false positive due to microarray noise or stochasticity (Lee et al., 2004). Using integrated datasets helps to avoid those false positives, but may lead us to ignore biologically meaningful but transient patterns. Experimental evidence supporting the existence of transient relationships has recently emerged (Ideker and Krogan, 2012). For example, a method called AP-SRM (Affinity Purification-Selected Reaction Monitoring) has been established to measure physical interactions that only exist in certain conditions (Bisson et al., 2011). More than 70% of yeast genetic interactions under chemical treatment cannot be detected in a normal cellular environment (Bandyopadhyay et al., 2010). Plant scientists are also aware that coexpression networks are context dependent (Usadel et al., 2009). For instance, the coexpressed partners of an Arabidopsis gene (i.e., RGL2) are highly dependent on the microarray samples used (Usadel et al., 2009). Instead of combining expression profiling samples from different labs, we calculated the strength of coexpression for each GEO dataset independently in order to capture those context-specific signals. As far as we know, our work is the first to perform context-specific coexpression analysis for genes involved in N transport. As the cost of RNAseq decreases, gene expression data will increase exponentially, and it will only become more crucial to have computational methods, such as those described here, to transform those vast amounts of data into refined hypotheses, and ultimately to expand our knowledge of plants as complex integrated systems.
In this study, publicly available microarray data were used to explore the coexpression network of nitrogen transporters in Arabidopsis. A tight association between transport of nitrogen and other transport and metabolic processes was revealed. The co-regulated partners of N transporter genes are provided as a resource for further studies. It is well known that carbon and nitrogen metabolism are tightly coordinated in plant tissues (Palenchar et al., 2004; Scheible et al., 2004; Cross et al., 2006). Our coexpression network supports the notion that there is coordination at the organismal level, and suggests that N transporters mediate at least some aspects of the coordination of C and N metabolism.
Materials and Methods
Data Collection and Normalization
Three hundred and seventy-one expression series datasets based on platform GPL198 were collected from GEO (Table S7). Each dataset contains at least 12 samples. The 320 datasets that contain CEL files were used in our analysis. Robust Multiarray Average (RMA) was used to normalize the microarray data for each dataset (Irizarry et al., 2003). Probeset IDs were converted into gene locus IDs based on the annotation file for GPL198. Replicate groups and tissue types were manually curated.
Identification of Differentially Expressed Genes
For each expression dataset, we applied ANOVA to identify genes whose expression varies more between replicate groups than within a replicate group. Only those genes with a false discovery rate (FDR) < 0.001 were considered DEG.
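The FDR step can be illustrated with a short Python sketch. The text does not name the FDR procedure, so the Benjamini-Hochberg adjustment is assumed here for illustration only; the per-gene ANOVA p-values are taken as given, and the function name `benjamini_hochberg` and the toy p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone in the rank of the raw p-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy ANOVA p-values (made up) for four genes in one dataset.
toy_p = {"geneA": 0.0001, "geneB": 0.0004, "geneC": 0.03, "geneD": 0.5}
q_values = dict(zip(toy_p, benjamini_hochberg(list(toy_p.values()))))
deg = [gene for gene, q in q_values.items() if q < 0.001]
print(deg)
```

Repeating this per dataset yields the DEG sets that are then tallied across root- and leaf-specific datasets.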
Construction of Coexpression Network
We used the following equation to measure the strength of coexpression between a nitrogen transporter and another gene on the array:
$$R_{x,i} = \frac{1}{n}\sum_{k=1}^{n} r_k \qquad (1)$$
where $r_k$ is the coexpression weight (i.e., the Pearson correlation coefficient, PCC) between gene $i$ and nitrogen transporter $x$ in the $k$-th GEO dataset, and $n$ = 320 is the number of datasets, so that $R_{x,i}$ is the average value of the 320 weights between gene $i$ and nitrogen transporter $x$. We used the following equation to measure the strength of tissue-specific coexpression between a nitrogen transporter and another gene on the array:
$$R^{T}_{x,i} = \frac{1}{|T|}\sum_{k \in T} r_k \qquad (2)$$
where $T$ is the subset of GEO datasets of a specific tissue type and $R^{T}_{x,i}$ is the strength of tissue-specific coexpression between nitrogen transporter $x$ and gene $i$. The $r_k$ in Equation (2) are the weights from the datasets of the specific tissue type $T$. We used the "biological process" category of the Gene Ontology system to define a pathway, and the following equation was used to measure the tissue-specific coexpression between a nitrogen transporter $x$ and a pathway $p$:
$$R^{T}_{x,p} = \frac{1}{m}\sum_{k=1}^{m} R^{T}_{x,k} \qquad (3)$$
where $m$ is the number of genes in pathway $p$, $k$ is a gene in pathway $p$, and $R^{T}_{x,k}$ is the tissue-specific coexpression between nitrogen transporter $x$ and gene $k$. When we constructed the coexpression network between nitrogen transporters and pathways, only the datasets in which a nitrogen transporter is differentially expressed were used. To determine the statistical significance of $R^{T}_{x,p}$, the genes of a pathway were replaced by randomly selected genes from the genome and $R^{T}_{x,p}$ was recalculated. We repeated this process 100 times for each pathway. An empirical p < 0.01 was assigned if none of the resulting random values was higher than the real $R^{T}_{x,p}$. For more detail, see Supplemental Note.
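The three averages and the randomization test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data layout (a list of datasets, each a dict with a `"tissue"` label and an `"expr"` mapping from gene ID to expression values) and all function names are hypothetical. The sketch also assumes that the dataset list has already been restricted to datasets in which the transporter is differentially expressed, and that no gene has constant expression within a dataset.

```python
import random
from statistics import mean


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length value lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def coexpression(datasets, x, i, tissue=None):
    """Average PCC between transporter x and gene i.

    With tissue=None all datasets are averaged (Equation 1); otherwise
    only datasets of that tissue type are averaged (Equation 2).
    """
    weights = [
        pearson(d["expr"][x], d["expr"][i])
        for d in datasets
        if tissue is None or d["tissue"] == tissue
    ]
    return mean(weights)


def pathway_coexpression(datasets, x, pathway, tissue):
    """Mean tissue-specific coexpression of x with the pathway's genes."""
    return mean([coexpression(datasets, x, g, tissue) for g in pathway])


def randomization_test(datasets, x, pathway, tissue, genome,
                       n_perm=100, seed=0):
    """Compare the real pathway score with scores of random gene sets.

    Returns the real score and how many random sets scored higher.
    Zero exceedances out of 100 corresponds to empirical p < 0.01.
    """
    rng = random.Random(seed)
    candidates = [g for g in genome if g != x]  # never pair x with itself
    real = pathway_coexpression(datasets, x, pathway, tissue)
    higher = 0
    for _ in range(n_perm):
        random_set = rng.sample(candidates, len(pathway))
        if pathway_coexpression(datasets, x, random_set, tissue) > real:
            higher += 1
    return real, higher
```

Returning the number of exceedances rather than a boolean keeps the sketch usable with a different number of randomizations; the threshold in the text corresponds to `higher == 0` with `n_perm=100`.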
Author Contributions
FH, AK, SM, and BB conceived the study. FH performed the analysis. FH, BB, and AK wrote the paper.
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Funding
This work was supported by a Laboratory Directed Research and Development grant from Brookhaven National Laboratory (to BB and SM) and the U.S. Department of Energy, in part by grant PM-031 (SM) from the Office of Biological Research of the U.S. Department of Energy, and by the USDA National Institute of Food and Agriculture, McIntire-Stennis project number 1009319 (BB). This article has been authored by Brookhaven Science Associates, LLC under contract number DE-AC02-98CH10886 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.
Supplementary Material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/article/10.3389/fpls.2016.01207
Supplemental Note. Detailed information for the method of calculating p-value between a gene and a pathway.
Figure S1. Additional evidence of expression pattern for H+-ATPase 2 (A) and SLAH3 (B).
Figure S2. The coexpression network of 171 N transporters.
Figure S3. The high-resolution version of Figure 2.
Figure S4. The high-resolution version of Figure 3.
Table S1. A list of N transporters used in our coexpression analysis.
Table S2. Differential expression of 17 N transporters in roots and leaves.
Table S3. The coexpressed partners for each N transporter.
Table S4. Degree information for nodes in the coexpression network formed between 171 N transporters and their top 20 coexpressed partners.
Table S5. Functional enrichment of coexpressed genes with 171 N transporters.
Table S6. Tissue-specific coexpressed biological processes with 17 N transporters.
Table S7. All GEO datasets used in this study.
References
Almagro, A., Lin, S. H., and Tsay, Y. F. (2008). Characterization of the Arabidopsis nitrate transporter NRT1.6 reveals a role of nitrate in early embryo development. Plant Cell 20, 3289–3299. doi: 10.1105/tpc.107.056788
Anglani, R., Creanza, T. M., Liuzzi, V. C., Piepoli, A., Panza, A., Andriulli, A., et al. (2014). Loss of connectivity in cancer co-expression networks. PLoS ONE 9:e87075. doi: 10.1371/journal.pone.0087075
Bandyopadhyay, S., Mehta, M., Kuo, D., Sung, M.-K., Chuang, R., Jaehnig, E. J., et al. (2010). Rewiring of genetic networks in response to DNA damage. Science 330, 1385–1389. doi: 10.1126/science.1195618
Barrett, T., Wilhite, S. E., Ledoux, P., Evangelista, C., Kim, I. F., Tomashevsky, M., et al. (2013). NCBI GEO: archive for functional genomics data sets - update. Nucleic Acids Res. 41, D991–D995. doi: 10.1093/nar/gks1193
Bateman, A. R., El-Hachem, N., Beck, A. H., Aerts, H. J. W. L., and Haibe-Kains, B. (2014). Importance of collection in gene set enrichment analysis of drug response in cancer cell lines. Sci. Rep. 4:4092. doi: 10.1038/srep04092
Bi, Y.-M., Wang, R.-L., Zhu, T., and Rothstein, S. J. (2007). Global transcription profiling reveals differential responses to chronic nitrogen stress and putative nitrogen regulatory components in Arabidopsis. BMC Genomics 8:281. doi: 10.1186/1471-2164-8-281
Bisson, N., James, D. A., Ivosev, G., Tate, S. A., Bonner, R., Taylor, L., et al. (2011). Selected reaction monitoring mass spectrometry reveals the dynamics of signaling through the GRB2 adaptor. Nat. Biotechnol. 29, 653–658. doi: 10.1038/nbt.1905
Chrispeels, M. J., Crawford, N. M., and Schroeder, J. I. (1999). Proteins for transport of water and mineral nutrients across the membranes of plant cells. Plant Cell 11, 661–676. doi: 10.1105/tpc.11.4.661
Cooke, J. E. K., Martin, T. A., and Davis, J. M. (2005). Short-term physiological and developmental responses to nitrogen availability in hybrid poplar. New Phytol. 167, 41–52. doi: 10.1111/j.1469-8137.2005.01435.x
Cross, J. M., von Korff, M., Altmann, T., Bartzetko, L., Sulpice, R., Gibon, Y., et al. (2006). Variation of enzyme activities and metabolite levels in 24 Arabidopsis accessions growing in carbon-limited conditions. Plant Physiol. 142, 1574–1588. doi: 10.1104/pp.106.086629
De la Fuente, A. (2010). From “differential expression” to “differential networking” - identification of dysfunctional regulatory networks in diseases. Trends Genet. 26, 326–333. doi: 10.1016/j.tig.2010.05.001
Diaz, C., Lemaître, T., Christ, A., Azzopardi, M., Kato, Y., Sato, F., et al. (2008). Nitrogen recycling and remobilization are differentially controlled by leaf senescence and development stage in Arabidopsis under low nitrogen nutrition. Plant Physiol. 147, 1437–1449. doi: 10.1104/pp.108.119040
Fan, S.-C., Lin, C.-S., Hsu, P.-K., Lin, S.-H., and Tsay, Y.-F. (2009). The Arabidopsis nitrate transporter NRT1.7, expressed in phloem, is responsible for source-to-sink remobilization of nitrate. Plant Cell 21, 2750–2761. doi: 10.1105/tpc.109.067603
Geiger, D., Maierhofer, T., Al-Rasheid, K. A., Scherzer, S., Mumm, P., Liese, A., et al. (2011). Stomatal closure by fast abscisic acid signaling is mediated by the guard cell anion channel SLAH3 and the receptor RCAR1. Sci. Signal. 4, ra32. doi: 10.1126/scisignal.2001346
Gigolashvili, T., Yatusevich, R., Rollwitz, I., Humphry, M., Gershenzon, J., and Flügge, U.-I. (2009). The plastidic bile acid transporter 5 is required for the biosynthesis of methionine-derived glucosinolates in Arabidopsis thaliana. Plant Cell 21, 1813–1829. doi: 10.1105/tpc.109.066399
Gutiérrez, R. A., Gifford, M. L., Poultney, C., Wang, R., Shasha, D. E., Coruzzi, G. M., et al. (2007a). Insights into the genomic nitrate response using genetics and the Sungear Software System. J. Exp. Bot. 58, 2359–2367. doi: 10.1093/jxb/erm079
Gutiérrez, R. A., Lejay, L. V., Dean, A., Chiaromonte, F., Shasha, D. E., and Coruzzi, G. M. (2007b). Qualitative network models and genome-wide expression data define carbon/nitrogen-responsive molecular machines in Arabidopsis. Genome Biol. 8:R7. doi: 10.1186/gb-2007-8-1-r7
He, F., Yoo, S., Wang, D., Kumari, S., Gerstein, M., Ware, D., et al. (2016). Large-scale atlas of microarray data reveals the distinct expression landscape of different tissues in Arabidopsis. Plant J. 86, 472–480. doi: 10.1111/tpj.13175
Heyndrickx, K. S., and Vandepoele, K. (2012). Systematic identification of functional plant modules through the integration of complementary data sources. Plant Physiol. 159, 884–901. doi: 10.1104/pp.112.196725
Huang, R., Wallqvist, A., and Covell, D. G. (2006). Comprehensive analysis of pathway or functionally related gene expression in the National Cancer Institute's anticancer screen. Genomics 87, 315–328. doi: 10.1016/j.ygeno.2005.11.011
Hwang, S., Rhee, S. Y., Marcotte, E. M., and Lee, I. (2011). Systematic prediction of gene function in Arabidopsis thaliana using a probabilistic functional gene network. Nat. Protoc. 6, 1429–1442. doi: 10.1038/nprot.2011.372
Irizarry, R. A., Hobbs, B., Beazer-Barclay, Y. D., Antonellis, K. J., Scherf, U., and Speed, T. P. (2003). Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4, 249–264. doi: 10.1093/biostatistics/4.2.249
Jordan, I. K., Mariño-Ramírez, L., Wolf, Y. I., and Koonin, E. V. (2004). Conservation and coevolution in the scale-free human gene coexpression network. Mol. Biol. Evol. 21, 2058–2070. doi: 10.1093/molbev/msh222
Katsumoto, Y., Fukuchi-Mizutani, M., Fukui, Y., Brugliera, F., Holton, T. A., Karan, M., et al. (2007). Engineering of the rose flavonoid biosynthetic pathway successfully generated blue-hued flowers accumulating delphinidin. Plant Cell Physiol. 48, 1589–1600. doi: 10.1093/pcp/pcm131
Kiba, T., Feria-Bourrellier, A.-B., Lafouge, F., Lezhneva, L., Boutet-Mercey, S., Orsel, M., et al. (2012). The Arabidopsis nitrate transporter NRT2.4 plays a double role in roots and shoots of nitrogen-starved plants. Plant Cell 24, 245–258. doi: 10.1105/tpc.111.092221
Krouk, G., Tranchina, D., Lejay, L., Cruikshank, A. A., Shasha, D., Coruzzi, G. M., et al. (2009). A systems approach uncovers restrictions for signal interactions regulating genome-wide responses to nutritional cues in Arabidopsis. PLoS Comput. Biol. 5:e1000326. doi: 10.1371/journal.pcbi.1000326
Lee, I., Blom, U. M., Wang, P. I., Shim, J. E., and Marcotte, E. M. (2011). Prioritizing candidate disease genes by network-based boosting of genome-wide association data. Genome Res. 21, 1109–1121. doi: 10.1101/gr.118992.110
Léran, S., Varala, K., Boyer, J. C., Chiurazzi, M., Crawford, N., Daniel-Vedele, F., et al. (2014). A unified nomenclature of nitrate transporter 1/peptide transporter family members in plants. Trends Plant Sci. 19, 5–9. doi: 10.1016/j.tplants.2013.08.008
Lezhneva, L., Kiba, T., Feria-Bourrellier, A. B., Lafouge, F., Boutet-Mercey, S., Zoufan, P., et al. (2014). The Arabidopsis nitrate transporter NRT2.5 plays a role in nitrate acquisition and remobilization in nitrogen-starved plants. Plant J. 80, 230–241. doi: 10.1111/tpj.12626
Li, W., Wang, Y., Okamoto, M., Crawford, N. M., Siddiqi, M. Y., and Glass, A. D. M. (2007). Dissection of the AtNRT2.1:AtNRT2.2 inducible high-affinity nitrate transporter gene cluster. Plant Physiol. 143, 425–433. doi: 10.1104/pp.106.091223
Masclaux-Daubresse, C., Daniel-Vedele, F., Dechorgnat, J., Chardon, F., Gaufichon, L., and Suzuki, A. (2010). Nitrogen uptake, assimilation and remobilization in plants: challenges for sustainable and productive agriculture. Ann. Bot. 105, 1141–1157. doi: 10.1093/aob/mcq028
Matsuno, M., Compagnon, V., Schoch, G. A., Schmitt, M., Debayle, D., Bassard, J.-E., et al. (2009). Evolution of a novel phenolic pathway for pollen development. Science 325, 1688–1692. doi: 10.1126/science.1174095
Mentzen, W. I., Peng, J., Ransom, N., Nikolau, B. J., and Wurtele, E. S. (2008). Articulation of three core metabolic processes in Arabidopsis: fatty acid biosynthesis, leucine catabolism and starch metabolism. BMC Plant Biol. 8:76. doi: 10.1186/1471-2229-8-76
Nero, D., Krouk, G., Tranchina, D., and Coruzzi, G. M. (2009). A system biology approach highlights a hormonal enhancer effect on regulation of genes in a nitrate responsive “biomodule”. BMC Syst. Biol. 3:59. doi: 10.1186/1752-0509-3-59
Palenchar, P. M., Kouranov, A., Lejay, L. V., and Coruzzi, G. M. (2004). Genome-wide patterns of carbon and nitrogen regulation of gene expression validate the combined carbon and nitrogen (CN)-signaling hypothesis in plants. Genome Biol. 5:R91. doi: 10.1186/gb-2004-5-11-r91
Pick, T. R., Bräutigam, A., Schulz, M. A., Obata, T., Fernie, A. R., and Weber, A. P. M. (2013). PLGG1, a plastidic glycolate glycerate transporter, is required for photorespiration and defines a unique class of metabolite transporters. Proc. Natl. Acad. Sci. U.S.A. 110, 3185–3190. doi: 10.1073/pnas.1215142110
Ponomarenko, M. P., Suslov, V. V., Ponomarenko, P. M., Gunbin, K. V., Stepanenko, I. L., Vishnevsky, O. V., et al. (2013). Abundances of microRNAs in human cells can be estimated as a function of the abundances of YRHB and RHHK tetranucleotides in these microRNAs as an ill-posed inverse problem solution. Front. Genet. 4:122. doi: 10.3389/fgene.2013.00122
Ruffel, S., Krouk, G., and Coruzzi, G. M. (2010). A systems view of responses to nutritional cues in Arabidopsis: toward a paradigm shift for predictive network modeling. Plant Physiol. 152, 445–452. doi: 10.1104/pp.109.148502
Russell, A. E., Cambardella, C. A., Laird, D. A., Jaynes, D. B., and Meek, D. W. (2009). Nitrogen fertilizer effects on soil carbon balances in Midwestern U.S. agricultural systems. Ecol. Appl. 19, 1102–1113. doi: 10.1890/07-1919.1
Scheible, W.-R., Morcuende, R., Czechowski, T., Fritz, C., Osuna, D., Palacios-Rojas, N., et al. (2004). Genome-wide reprogramming of primary and secondary metabolism, protein synthesis, cellular growth processes, and the regulatory infrastructure of Arabidopsis in response to nitrogen. Plant Physiol. 136, 2483–2499. doi: 10.1104/pp.104.047019
Segonzac, C., Boyer, J.-C., Ipotesi, E., Szponarski, W., Tillard, P., Touraine, B., et al. (2007). Nitrate efflux at the root plasma membrane: identification of an Arabidopsis excretion transporter. Plant Cell 19, 3760–3777. doi: 10.1105/tpc.106.048173
Stokes, T. L., Thum, K., Xu, X., Obertello, M., Katari, M. S., Gutiérrez, R. A., et al. (2008). Systems approach identifies an organic nitrogen-responsive gene network that is regulated by the master clock control gene CCA1. Proc. Natl. Acad. Sci. U.S.A. 105, 4939–4944. doi: 10.1073/pnas.0800211105
Usadel, B., Obayashi, T., Mutwil, M., Giorgi, F. M., Bassel, G. W., Tanimoto, M., et al. (2009). Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ. 32, 1633–1651. doi: 10.1111/j.1365-3040.2009.02040.x
Vanholme, R., Cesarino, I., Rataj, K., Xiao, Y., Sundin, L., Goeminne, G., et al. (2013). Caffeoyl shikimate esterase (CSE) is an enzyme in the lignin biosynthetic pathway in Arabidopsis. Science 341, 1103–1106. doi: 10.1126/science.1241602
Van Noort, V., Snel, B., and Huynen, M. A. (2004). The yeast coexpression network has a small-world, scale-free architecture and can be explained by a simple model. EMBO Rep. 5, 280–284. doi: 10.1038/sj.embor.7400090
Wang, R., Okamoto, M., Xing, X., and Crawford, N. M. (2003). Microarray analysis of the nitrate response in Arabidopsis roots and shoots reveals over 1,000 rapidly responding genes and new linkages to glucose, trehalose-6-phosphate, iron, and sulfate metabolism. Plant Physiol. 132, 556–567. doi: 10.1104/pp.103.021253
Wang, S., Yin, Y., Ma, Q., Tang, X., Hao, D., and Xu, Y. (2012). Genome-scale identification of cell-wall related genes in Arabidopsis based on co-expression network analysis. BMC Plant Biol. 12:138. doi: 10.1186/1471-2229-12-138
Williams, L., and Miller, A. (2001). Transporters responsible for the uptake and partitioning of nitrogenous solutes. Annu. Rev. Plant Physiol. Plant Mol. Biol. 52, 659–688. doi: 10.1146/annurev.arplant.52.1.659
Winter, D., Vinegar, B., Nahal, H., Ammar, R., Wilson, G. V., and Provart, N. J. (2007). An “electronic fluorescent pictograph” Browser for exploring and analyzing large-scale biological data sets. PLoS ONE 2:e718. doi: 10.1371/journal.pone.0000718
Keywords: coexpression network, NRT, nitrate transporter, big data, Arabidopsis, public expression data
Citation: He F, Karve AA, Maslov S and Babst BA (2016) Large-Scale Public Transcriptomic Data Mining Reveals a Tight Connection between the Transport of Nitrogen and Other Transport Processes in Arabidopsis. Front. Plant Sci. 7:1207. doi: 10.3389/fpls.2016.01207
Received: 20 June 2016; Accepted: 29 July 2016;
Published: 11 August 2016.
Edited by: Alessandro Laganà, Icahn School of Medicine at Mount Sinai, USA
Reviewed by: Mikhail P. Ponomarenko, Institute of Cytology and Genetics of Siberian Branch of Russian Academy of Sciences, Russia
Xiaoxiao Sun, University of Georgia, USA
Copyright © 2016 He, Karve, Maslov and Babst. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Fei He, firstname.lastname@example.org