Re-analysis of RNA-seq transcriptome data reveals new aspects of gene activity in Arabidopsis root hairs

Root hairs, tubular-shaped outgrowths from root epidermal cells, play important roles in the acquisition of nutrients and water, interaction with microbe, and in plant anchorage. As a specialized cell type, root hairs, especially in Arabidopsis, provide a pragmatic research system for various aspects of studies. Here, we re-analyzed the RNA-seq transcriptome profile of Arabidopsis root hair cells by Tophat software and used Cufflinks program to mine the differentially expressed genes. Results showed that ERD14, RIN4, AT5G64401 were among the most abundant genes in the root hair cells; while ATGSTU2, AT5G54940, AT4G30530 were highly expressed in non-root hair tissues. In total, 5409 genes, with a fold change greater than two-fold (FDR adjusted P < 0.05), showed differential expression between root hair cells and non-root hair tissues. Of which, 61 were expressed only in root hair cells. One hundred and thirty-six out of 5409 genes have been reported to be “core” root epidermal genes, which could be grouped into nine clusters according to expression patterns. Gene ontology (GO) analysis of the 5409 genes showed that processes of “response to salt stress,” “ribosome biogenesis,” “protein phosphorylation,” and “response to water deprivation” were enriched. Whereas only process of “intracellular signal transduction” was enriched in the subset of 61 genes expressed only in the root hair cells. One hundred and twenty-one unannotated transcripts were identified and 14 of which were shown to be differentially expressed between root hair cells and non-root hair tissues, with transcripts XLOC_000763, XLOC_031361, and XLOC_005665 being highly expressed in the root hair cells. The comprehensive transcriptomic analysis provides new information on root hair gene activity and sets the stage for follow-up experiments to certify the biological functions of the newly identified genes and novel transcripts in root hair cell morphogenesis.

During the past 10 years, with the emergence of microarray technology coupled with advanced computational methods, vast transcriptome analyses at genome-wide level have been performed in Arabidopsis either by comparing transcriptional profiles of root hair-defective mutants compared to those of wild type plants, or by direct exploration of root hairspecific transcriptional profiles from root hair protoplasts based on fluorescence-activated cell sorting (FACS) platform, with hundreds to thousands of genes being identified either involved in root hair morphogenesis or in root hair response to abiotic stresses (Jones et al., 2006;Brady et al., 2007;Dinneny et al., 2008;Gifford et al., 2008;Bruex et al., 2012;Hill et al., 2013;Kwasniewski et al., 2013;Simon et al., 2013;Becker et al., 2014;Niu et al., 2014;Tanaka et al., 2014;Wilson et al., 2015). From these omics datasets, supported by previous molecular genetics studies (Ishida et al., 2008;Grebe, 2012;Ryu et al., 2013;Grierson et al., 2014), a subset of 208 "core" epidermal genes has been identified and a gene regulatory network involved in root epidermis cell differentiation in Arabidopsis has been established (Bruex et al., 2012), which provides an advantage model to study the roles of both single and duplicate genes in a specific gene network (Simon et al., 2013). However, several technical limitations of microarrays, such as limited gene probes present in the chip, narrow dynamic range of gene expression changes, as well as incapability to distinguish homologous genes with high similarity, have failed to show the dynamicity and genomewide range of transcriptional profiling of root hairs. Fortunately, next-generation sequencing technology has overcome such weaknesses and enabled us to explore whole transcriptomes at single-base resolution in a cost-effective manner. It has also enabled us to accurately quantify gene expression and identify unannotated transcripts and splicing isoforms via advanced computational methods (Trapnell et al., 2012).
In our previous study, the paired-end reads were separately matched to Arabidopsis genome in each biological repeat using BLAT program (Kent, 2002) and the differentially expressed genes were then identified in each replication using custommade software RACKJ (Lan et al., 2013), which results in 1617 differentially expressed genes between root hairs (herein referred as RH) and non-root hair tissues (all root tissues except root hairs; herein referred as NRH). However, it must be noticed that although BLAT is a very effective tool for doing nucleotide alignments between mRNA and genomic DNA, it was slow and not very accurate for mapping RNAseq reads to the genome. In addition, BLAT is not designed for the alignment of paired-end reads. RACKJ was initiated for identification of splicing isoforms. It was employed to identify differentially genes in each biological repeat via Z-sore analysis. Therefore, additional information could be revealed by re-analysis of the RNA-seq data using advanced pipeline. Tophat-Cufflinks pipeline are free, open-source software tools for gene discovery and comprehensive expression analysis of RNA-seq data (Trapnell et al., 2012). Tophat was initiated specially for RNA-seq data analysis (Trapnell et al., 2009), which enables both single-end and paired-end reads to align to huge genomes using the ultra-high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons (Trapnell et al., 2009(Trapnell et al., , 2012. The Cufflinks pipeline contains four programs which enables us to perform not only accurate quantification of the known gene expression but also identification and quantification of any previously unannotated transcripts with or without biological repeats (Trapnell et al., 2012). In this study, we extended previous study by re-analyzing RNA-seq data using Tophat-Cufflinks pipeline aimed to provide additional information on root hair gene activity. We revealed more than five thousands of genes that were differentially expressed between RH and NRH, with more than 4000 genes only being reported in the present study. Moreover, a subset of 14 previously unannotated transcripts was identified as to be differentially expressed between RH and NRH. The comprehensive transcriptomic analysis expands our knowledge in root hair gene activity and sets the stage for followup experiments on the biological functions of the newly identified genes and novel transcripts in root hair morphogenesis.

Digital Information on Gene Expression in Root Hairs and Non-root Hair Tissues at Genome-wide Level
In previous study, using transgenic plants carrying Expansin7 (EXP7) promoter fused to GFP as materials (Cho and Cosgrove, 2002), coupled with FACS technique, Arabidopsis root hair protoplasts were harvested and the transcriptome profiling has been explored by RNA-seq from two biological repeats. The RNA-seq data was subsequently analyzed using BLAT program (Kent, 2002) and the differentially expressed genes were further mined by custom-made software RACKJ (Lan et al., 2013). In the present study, the RNA-seq data were re-analyzed by aligning the paired-end reads to Arabidopsis Genome released in TAIR10 via Tophat program (Trapnell et al., 2009). The differentially expressed genes between RH and NRH were subsequently identified using Cufflinks pipeline (Trapnell et al., 2010(Trapnell et al., , 2012. Results showed that a total of 19,743 and 19,660 genes were confidently identified (with status "OK") in RH and NRH, respectively (Table S1 in the Supplementary Material). Of which, an overlap of 19,600 genes were expressed both in RH and NRH. In the RH, ERD14, RIN4, and AT5G64401 were among the most abundant genes, with RPKM (Reads Per Kb per Million reads) value more than 1500 (Table 1). Among the 30 highest abundant genes, those encoding arabinogalactan proteins were the most enriched, and four of which were highly expressed in RH. Genes encoding glutathione S-transferases, dehydrins, thioredoxins, and proline-rich extension-like proteins were among the second enriched group in RH, with at least two members detected from each gene family (Table 1). Among the 30 highest abundant genes in NRH were arabinogalactan protein-encoding genes and genes encoding glutathione S-transferases and dehydrins ( Gene Ontology (GO) analysis of the top 30 most abundant genes in RH and NRH revealed that stress related processes involved in "cold acclimation, " "response to water deprivation, " "toxin catabolic process, " and "aluminum ion transport" were enriched both in RH and NRH (Table S4 in the Supplementary Material). Other processes of "responsive to oxidative stress, " "response to microbial phytotoxin, " "defense response to fungus, " and "aluminum ion transport" were more enriched in RH than in NRH. In contrast, genes involved in "response to cold" and "serine-isocitratelyase pathway" were more pronounced in NRH than in RH (

Differentially Expressed Genes Identified Between Root Hairs and Non-root Hair Tissues Using Cufflinks Pipeline
The differentially expressed genes between RH and NRH were identified using Cuffdiff algorithm in Cufflinks pipeline with following parameters: for a given gene, (1) the FDR (false discovery rate) adjusted p-value (that is q-value) must be less than 0.05; (2) a fold change between RH and NRH is greater than two-fold; (3) the RPKM value of each gene must be more than one in either of the samples. Subsequently, a total of 5409 genes were identified as differentially expressed genes between RH and NRH ( , the most up-regulated genes were those encoding proline-rich (extensin-like) proteins, extensins, expansins such as EXP7 and EXP18, arabinogalactan proteins, xyloglucan endotransglucosylase/hydrolase, peroxidase superfamily protein, and others. Some other genes including COBL9, COW1, LRX1, IRE, IRC1, and others, which were reported to be required for or associated with root hair development and growth, were also found highly induced in RH.
Comparison of the 1617 differentially expressed genes identified in previous study (Lan et al., 2013) to the 5409 genes identified in the present study led to an overlap of 1259 genes ( Table S8 in the Supplementary Material). Seventy-seven percent (1259 out of 1617) of the differentially expressed genes identified in previous study have been determined in the present analysis. By contrast, only 23% (1259 out of 5409) of the differentially expressed genes identified in the present study have been found in previous analysis, and 77% (4150 out of 5409) of the additional genes were only discovered in the present study using Tophat-Cufflinks pipeline. Several of these additional genes, such as COBL9, RHS15, and RHS have been reported to be associated with root hair formation; some of them were among the most up-regulated genes in RH. List of the top 100 most up-regulated genes in the present study can be found in Table 2 for detailed information.
In addition, among the 1617 differentially expressed genes identified in previous study, a subset of 635 genes was shown up-regulated in RH. Among them, 580 genes were found upregulated in RH in present study, i.e., 91% (580 out of 635) genes up-regulated in previous study were also identified by Tophat-Cufflinks pipeline.

Differential Go Analysis of Differentially Expressed Genes
Differential GO analysis of the 5409, 4150, and 1259 genes, which were differentially expressed genes identified in this study, newly identified in this study and identified by both present and previous study, respectively, were performed.
Results showed that processes of "response to salt stress, " "ribosome biogenesis, " "protein phosphorylation, " "embryo development ending in seed dormancy, " and "response to water deprivation" were most enriched (P ≦ 3.01E-10) in the total 5409 differentially expressed genes (Table S9 in the Supplementary Material). In the 4150 subset, processes of "protein phosphorylation, " "embryo development ending in seed dormancy, " "response to water deprivation, " "response to chitin, " "cytokinesis, " "intracellular signal transduction, " "DNA replication, " "transport, " and "microtubule-based movement" were enriched (P ≦ 5.21E-7). Whereas protein synthesis related processes of "ribosome biogenesis" and "translation, " and root hair related processes of root hair cell differentiation and development were underrepresented (P ≧ 0.55). By contrast, protein synthesis and root hair related processes as well as processes of "response to salt stress, " "response to cold, " and "response to cadmium ion" were dramatically overrepresented (P ≦ 2.97E-11) in the 1259 overlapping genes ( Table S9 in the Supplementary Material).
GO analysis of the top 100 most induced genes in RH revealed that processes of "plant-type cell wall organization, " "root hair cell differentiation, " "trichoblast differentiation, " "response to oxidative stress, " "unidimensional cell growth, " "protein phosphorylation, " and "root hair cell tip growth" were enriched; while only the process of "intracellular signal transduction" was shown significantly (P < 0.01) in the subset of 61 genes expressed only in RH (Table S10 in the Supplementary Material).

Co-Expression Network Construction and Module Identification in RH
MACCU program (Lin et al., 2011) was used to calculate the Pearson correlation coefficients of any two genes based on the 300 root-related arrays which were manually identified as previously described, and gene pairs with a threshold value of ≧0.83 were selected to build co-expression networks (Lin et al., 2011). The threshold value was selected for individual coexpression network mainly based on the GO enrichment analysis of genes involved in the network (Lin et al., 2011). Briefly, first a series of threshold values from 0.7 to 0.9 were employed to select gene pairs for co-expression networks. Then, we applied a series of GO enrichment analysis of the genes corresponding to individual co-expression network and looked for the threshold with the best enrichments of GO categories (P < 1E-03) among the input genes. Cytoscape (http://www.cytoscape.org) program was applied to visualize the co-expression relationships among genes and the tool of NetworkAnalyzer was employed to extract connected components (sub-network). In the present study, we mainly focused on the up-regulated genes in RH and on finding novel modules, when compared to the previous study. To this end, the co-expression analysis of the 635 upregulated genes previously identified was performed. Results showed that a network comprising of 122 nodes from 124 genes and 367 edges (correlations between genes) was constructed. This network can be divided into one large and 12 small components (sub-networks), with the large one consisting of 93 nodes from 94 genes and 349 edges ( Figure S1). Using MCODE program, two modules containing 20 and nine genes were extracted from  the large component, respectively (Figures 1A,B). GO analysis showed that processes of "plant-type cell wall organization, " "response to oxidative stress, " "oxidation-reduction process, " and "trichoblast differentiation" were enriched (P < 0.001) in module1; while only process of "trichoblast differentiation" was enriched in module 2 ( Figure 1C). To know whether this network and modules were presented in the present study, analysis of the 580 up-regulated overlapping genes showed that nearly the same co-expression network was found except that four genes were not included (nodes labeled in blue stars in Figure S1). Co-expression analysis was then performed on the 2172 up-regulated genes identified only in the present study. A subset of 264 out of 2172 genes (12%) was co-expressed at the cutoff of 0.83. This network contained 260 nodes from 264 genes and 589 edges, which can be divided into five large (>10 nodes), eight middle (3-10 nodes), and 20 small (2 nodes) sub-networks ( Figure S2). The largest sub-network contained 70 nodes from 74 genes, and the second largest sub-network contained 44 genes and the third one contained 29 genes, respectively ( Figure S2). The other two large sub-networks contained 20 and 16 genes, respectively ( Figure S2). GO analysis showed that processes of energy-related metabolism, such as "ATP synthesis coupled proton transport, " "photorespiration, " "response to salt stress, " "mitochondrial electron transport, ubiquinol to cytochrome c, " "response to cadmium ion, " and "proton transport" were enriched in the largest sub-network (Table S11 in the Supplementary Material). Other enriched processes were mainly related to stress responses such as cellular response to cold, cold acclimation, response to wounding, response to chitin, hyperosmotic salinity response, response to karrikin, and response to UV-B. These enriched processes were mainly distributed in the sub-networks of 3, 4, and 7 (Table S11 in the Supplementary Material). No (P = 1) and low (0.01 > P > 0.001) enriched processes were found in the sub-networks of 9, and 2, 5, 6, and 8, respectively (Table S11 in the Supplementary Material). Four functional modules were extracted from the network, which contains 12, 13, 7, and 6 nodes and various edges, respectively (Figure 2). GO analysis showed that processes of "glycogen biosynthetic process, " "photosynthetic electron transport in photosystem II, " "histone deacetylation, " "red, far-red light phototransduction, " "defense response signaling pathway, resistance gene-dependent, " and "ethylene biosynthetic process" were enriched in the module 1 (Figure 3), and processes of "ATP synthesis coupled proton transport, " "mitochondrial electron transport, ubiquinol to cytochrome c, " "purine nucleotide transport, " "oxidation-reduction process, " and "actin polymerization or depolymerization" were enriched in the module 2, respectively. In the module 3, besides the process of "ATP synthesis coupled proton transport" which was overrepresented, other processes of "glucose mediated signaling pathway, " "Golgi organization, " and "proton transport" were enriched. While signaling related processes of "small GTPase mediated signal transduction, " "photosynthesis, light reaction, " and "intracellular signal transduction" were enriched in the module 4 (Figure 3).

Analysis of Root Hair Regulatory Element in the Differentially Expressed Genes
Existence of Root Hair Regulatory Element (RHE) cis-element sequence "WHHDTGNNN(N)KCACGWH" (where W = A/T, H = A/T/C, D = G/T/A, K = G/T, and N = A/T/C/G) in the 5409 differentially expressed genes was investigated as previously described (Won et al., 2009). Screening within 3000 bp upstream of the start codon (Hereafter named as −3000 bp) resulted in 201 RHE hits from 194 genes, with few genes carrying two or more RHEs (Table S12 in the Supplementary Material). Among the 201 RHE hits, RHE patterns of "AAAGTGTAGAGCACGAT, " "ATCTTGGCTT TCACGTT, " and "TTCGTGAGTTTCAAATA" were relatively enriched. Subsequently, screening within introns identified 43 genes with one RHE in different intron positions (Table S13 in the Supplementary Material). Eighty nine genes were found to contain one RHE in the CDS (Encoding DNA Sequence) regions, with the sequences of "TCCATGGAAGTCACGAT, " and "TTTATGGCTGGCACGTA" being pronounced among the hits (Table S14 in the Supplementary Material). AT2G31350, encoding glyoxalase 2-5, was shown to contain two RHEs in −3000 bp region and the first intron, respectively ( Table 3). Four genes AT1G18460, AT1G18470, AT2G33320, and AT3G45530 were found to harbor one RHE in −3000 bp regions and another one in the CDS regions (Table 3). Another three genes AT3G19050, AT4G03500, and AT5G27680 were shown to carry RHEs in both introns and CDS regions but not in the -3000 bp region ( Table 3).

Identification of Conserved Root Epidermal Genes and Associated Co-expression Network
With the attempt to identify conserved root epidermal genes, the set of 208 "core" root epidermal genes was derived from previous report (Bruex et al., 2012), and was compared with the 5409 differentially expressed genes in this study. Comparison resulted in an overlap of 136 genes (Table S15 in the Supplementary Material), which could be grouped into nine clusters according to expression patterns ( Figure S3). One hundred and twenty three out of 136 genes were annotated as hair genes, but only 27 of the 123 genes carry RHEs in their promoters (Table S15 in the Supplementary Material). To obtain the conserved root epidermal gene-specific co-expression network, the 136 genes were loaded as baits with the rest of 5409 differentially expressed genes (preys) for subsequent co-expression analysis at correlation coefficient cutoff of 0.83. The final network composing of 122 nodes (genes) and 306 edges was generated after discarding edges only linked to two preys. Of the 122 genes, 50 and 72 were from baits and preys, respectively. This network can be further divided into one large and three small clusters ( Figure S4). GO analysis of the bait genes involved in the network showed that processes of "trichoblast differentiation, " "plant-type cell wall organization, " and "root hair elongation" were most enriched (Figure S4), while processes of "plant-type cell wall organization" and "oxidation-reduction process" were overrepresented in the prey genes ( Figure S5). One module was extracted from the network, which contains 15 nodes and 80 edges ( Figure 4A). GO analysis showed that processes of "plant-type cell wall  Frontiers in Plant Science | www.frontiersin.org FIGURE 4 | The module extracted from the "core" root epidermal gene associated co-expression newwork of the differentially expressed genes between root hairs (RH) and non-root hair tissues (NRH),with pearson correlation coefficient cutoff at 0.83 (A) and Gene Ontology (GO) enrichment analysis of the genes involved in modules (B). Genes in green color indicate bait genes from "core" root epidermal gene and genes in red color indicate prey genes identified in the present study. Red stars indicate root hair-related genes.
modification involved in multidimensional cell growth, " "planttype cell wall loosening, " and "trichoblast differentiation" were most enriched in this module ( Figure 4B).

Identification of Unannotated Transcripts
To identify previously unannotated transcripts which are differentially expressed between RH and NRH, we first assembled a new transcript on the basis of annotated transcript reference (TAIR10_GFF3_genes_gff) using Cuffmerge algorithm in the Cufflinks pipeline (Trapnell et al., 2012). Subsequently, the differentially expressed previously unannotated transcripts were analyzed using Cuffdiff program (Trapnell et al., 2012). Results showed that a total of 121 novel transcripts were identified, and 14 out of 121 unannotated transcripts were differentially expressed between RH and NRH, with transcripts XLOC_000763, XLOC_031361, and XLOC_005665 being the most expressed genes in RH (Table 4). XLOC_005665 is of particular interest, which was highly expressed in RH (Figure 5) and deduced a small peptide with 59 amino acids.

Discussion
Root hairs in Arabidopsis have been intensively studied in various respects and close to 100 genes involved in the cell fate determination and root hair formation have been identified, which provides numerous advantages for basic studies of development, cell biology, and physiology (Grierson et al., 2014). In the last decade, high-throughput transcriptome analysis, used as alternate approaches differing from traditional molecular genetic analysis, have been adopted extensively to explore genes potentially involved in root hair morphogenesis at genomewide in Arabidopsis (Birnbaum et al., 2003;Jones et al., 2006;Brady et al., 2007;Dinneny et al., 2008;Gifford et al., 2008;Bruex et al., 2012;Lan et al., 2013). In the current study, RNAseq data sets were re-analyzed by Tophat-Cufflinks pipeline, and several new aspects of root hair gene expression were presented. First, RNA-seq technique facilitated obtaining the global "digital" transcriptional information on root hair genes (Table S1 in the Supplementary Material). Of the 19,743 genes detected in RH, ERD14, RIN4, AT5G64401, and others were among the most abundant transcripts ( Table 1). ERD14 and its homologous ERD10 were previously isolated from a cDNA library of Arabidopsis plants hydrated for 1 h and induced by ABA treatment and dehydration (Kiyosue et al., 1994). In this study, both ERD14 and ERD10 were shown highly expressed in RH, and were up-regulated in RH compared to NRH ( Table 1 and Table S2 in the Supplementary Material). This suggests that ERD14 and ERD10 might be important in root hair morphogenesis or in response to abiotic stresses. RIN4 (RPM1-interacting protein 4) is first reported to interact with Pseudomonas syringae type III effector or molecules, and is required for RPM1-mediated resistance in Arabidopsis (Mackey et al., 2002). Further study showed that RIN4 can interact with AHA1 and AHA2 both in vitro and in vivo, thus regulating plasma membrane (PM) H(+)-ATPases activity. PM H(+)-ATPase activation/ inactivation can regulate the opening or closure of stomata, thereby controls bacterial entry into the leaf (Liu et al., 2009). AHA2 has been reported to be a major regulator controlling the rhizosphere acidification in response to Fe deficiency (Santi and Schmidt, 2009). Taken together, it is possible that RIN4 also plays important roles in root hair morphogenesis and response to Fe deficiency by regulating (PM) H(+)-ATPases activity mediated by AHA2. The third highest expressed gene in root hairs was AT5G64401 which encodes a small peptide with unknown function ( Table 1).
In previous study, a subset of 1617 genes showed differential expression between RH and NRH (Lan et al., 2013). In this study, the abundance of the 5409 genes was revealed to be changed significantly (Table S5) by Tophat-Cufflinks pipeline. Comparison of these two sets (5409 vs. 1617) resulted in an overlapped 1259 genes. We showed that additional 4150 genes were differentially expressed between RH and NRH. Genes like COBL9 (Jones et al., 2006) and RHS15 (Won et al., 2009;Bruex et al., 2012), which were reported to be required for or associated with root hair development and growth, were only determined in this study (Table 2). Moreover, 1/3 of cell-type patterning genes, such as ECTOPIC ROOT HAIR2 (ERH2/POM1), ECTOPIC ROOT HAIR3 (ERH3), GLABRA3 (GL3), ROOTHAIRLESS2 (RHL2), SCRAMBLED (SCM/SUB), TRANSPARENT TESTA GLABRA2 (TTG2), and WEREWOLF (WER), 63% (31 out of 49) of root hair morphogenesis-related genes (Grierson et al.,  2014), and 45% (five out of 11) of genes related to hormone action affecting root hair development (Grierson et al., 2014) have been identified as differentially expressed genes between RH and NRH (Table S5). This study well-complements and extends the previous study by adding new information on root hair genes' numbers and activity. Several highly up-regulated genes in RH, which were not reported previously, deserve further investigation.
Co-expression analysis, which is based on the concept that genes with coordinated expression pattern under diverse conditions are often functionally related (Eisen et al., 1998). This concept allows us to filter and select genes of unknown functions for experimental validation and functional predictions as their co-expression is related to genes of known functions (Aoki et al., 2007;Usadel et al., 2009). Not only did we identified modules from previous study (Figure 1 and Figure S1), but also revealed some new modules by the co-expression analysis of the subset of 2172 up-regulated genes in RH (Figure 2 and Figure S2). Results showed that only 12% (264 out of 2172) of the differentially expressed genes are involved in the network, and 589 relationships between genes were formed, suggesting that most of these genes are involved in diverse processes. GO analysis showed that genes associated with energy and stress related processes are enriched in the network (Table S11). This further indicates that root hair development and growth are sensitive to environmental stimuli and are energy-dependent. The conserved root epidermal genes, associated the co-expression analysis of 5409 genes, led to a network composed of 122 nodes (genes) and 306 edges ( Figure S3). Unexpectedly, in the module, only one gene was from preys (Figure 4A in red color) and another 14 genes, including EXP7, EXP18, RHS12, RHS13, and RHS19, were from core root epidermal genes. (Figure 4A). Since these core genes were verified to be required for root hair development and growth, therefore it can be suggested that this prey gene plays important roles in root morphogenesis ( Figure 4B). These results strongly encourage worth further investigation for those genes with unknown functions associated with the above mentioned networks.
The analysis of RHEs in the differentially expressed genes (5409) resulted in only 194 genes which carry one or two RHEs within the 3000 bp upstream of the start codon (Table S12). In an attempt to find whether such RHE localizes in other positions, we screened RHE in both introns and CDS regions. Subsets of 43 and 89 genes harboring one RHE have been hit, respectively (Tables S13, S14). Further analysis showed that only few genes carry RHE in introns and CDS, but none of them carry RHE within the three different types of positions (Table 3). Similarly, the previous study identified 154 out of 208 "core" epidermal genes in "H" position, namely root hair genes, but only 33 of them carry RHE (Bruex et al., 2012). These results suggest that regulatory elements, other than RHE, are probably involved in the transcriptional regulation of root hair gene expression.

Conclusions
In summary, using the currently popular RNA-seq analysis programs, we here provided genome-wide "digital" information on transcriptional expression of root hair genes. We detected additional 4150 genes that are differentially expressed between RH and NRH. We also identified 14 previously unannotated transcripts, which are also differentially expressed between RH and NRH. The findings in this study well-complement and extend the previous one. Some of the highly up-regulated genes in root hairs, which were not reported in the previous study, such as RIN4 (of known function) or AT5G64401 (of unknown function) are worth further study. Gene clustering and the root epidermal-specific co-expression analysis revealed some potentially important genes, such as AT5G04960, AT4G26010, and AT5G05500 probably function as putative novel players in root hair morphogenesis.

Data Collection and Processing
Transcriptomic data sets were downloaded from a public database (NCBI: SRA045009.1) and analyzed as previously described (Trapnell et al., 2009(Trapnell et al., , 2010. Microarray data of 2671 ATH1 arrays from the NASCarray database (http://affymetrix. arabidopsis.info/) were downloaded and normalized using the RMA function of the Affy package of the Bioconductor software. Three hundred root-related arrays were manually identified as previously described (Lin et al., 2011), and were used as a database for co-expression analysis.
To align the RNA-seq reads to the genome, we first generated a Bowtie2 index using TAIR10 genome and then run Tophat with the following options: -N 2 -read-gap-length 3 -read-editdist 3 -read-realign-edit-dist 0 -report-secondary-alignmentscoverage-search -microexon-search -library-type fr-unstranded -b2-sensitive. The resulting aligned reads were then used to create a RABT (Reference Annotation Based Transcript) assembly using Cufflinks. First, Cufflinks was run in the discovery mode aimed to identify previously unannotated transcripts. Assemblies both from RH and NRH were then merged into one file using Cuffmerge, using TAIR10_GFF3_genes_gff file as the reference annotation, resulting in a RABT assembly, used to quantify transcript abundance. Finally, transcript abundance (RPKM) and identification of differentially expressed genes was performed using Cuffdiff with default parameters (P < 0.05 and FDR cutoff of 0.05%) with the options: -N -u, corresponding to upper quartile normalization and multi-read-correct. Differential transcript abundance at all genes was calculated as the logarithm base-2 of the expression ratio (RPKM NRH /RPKM RH ).

Gene Ontology Analysis
GO enrichment analysis using the TopGo "elim" method (Alexa et al., 2006) was based on The Gene Ontology Browsing Utility (GOBU) as previously described (Lin et al., 2006). The elim algorithm iteratively removes the genes mapped to significant terms from higher level GO terms, and thus avoids the increase of unimportant functional categories.

Generation of Co-expression Networks Using the MACCU Toolbox
Gene co-expression networks were constructed on the basis of 300 publicly available root-related microarrays using the MACCU toolbox as previous report (Lin et al., 2011), with a Pearson correlation threshold of equal to or greater than 0.83 based on the GO enrichment analysis. The generated coexpression networks were visualized by Cytoscape (http://www. cytoscape.org), and the Cytoscape tool of NetworkAnalyzer was employed to extract connected components (sub-network).

Module Identification of Co-expression Networks
MCODE plugin in Cytoscape software was employed to extract functional modules as previous report (Rivera et al., 2010).
First, a vertex-weighting value was calculated based on the clustering coefficient, Ci [Ci = 2 * n/Ki * (Ki-1)], where Ki represents the node count of the neighborhood of node i; and n indicates the number of edges among the Ki nodes in the neighborhood. Next, the highest weighted vertex is set as a center point, seed of the region and search node j whose weight ratio (Wj/Wseed) was >0.1. Then, it filters the predicted complexes if the minimum degree of the graph is less than the given threshold and then constructs a module by deleting the searched node from the network. The top modules with a node count >5 were selected in the co-expression networks for GO enrichment analysis.
Figure S1 | Co-expression relationships of the 635 up-regulated genes in root hairs (RH) when compared to non-root hair tissues (NRH),with pearson correlation coefficient cutoff at 0.83. Bule stars indicate the genes requried for or associated with root hair development and growth. Figure S2 | Co-expression relationships of the 2172 up-regulated genes in root hairs (RH) when compared to non-root hair tissues (NRH),with pearson correlation coefficient cutoff at 0.83. Figure S3 | Hierarchical clustering analysis of changes in transcript abundance of 136 overlapping genes (Table S13 in the Supplementary Material) between 208 "core" root epidermal genes (Bruex et al., 2012) and 5409 differentially expressed genes in this study. Transcript abundance was defined as RPKM (Reads Per Kilobase per Millionmapped reads) in the root hairs (RH) and non-root hair tissues (NRH) with two biological repeats. Color key indicates the log2 transformed intensity, gray color which not in the color key indicates that the number is missing. Figure S4 | The "core" root epidermal gene associated co-expression newwork of the differentially expressed genes between root hairs (RH) and non-root hair tissues (NRH),with pearson correlation coefficient cutoff at 0.83. Genes in green color indicate bait genes from "core" root epidermal gene and genes in red color indicate prey genes identified in the present study. Figure S5 | Gene Ontology (GO) enrichment analysis of the bait and prey genes involved in the the "core" root epidermal gene associated co-expression newwork.