Co-expression Networks From Gene Expression Variability Between Genetically Identical Seedlings Can Reveal Novel Regulatory Relationships

Co-expression networks are a powerful tool to understand gene regulation. They have been used to identify new regulation and function of genes involved in plant development and their response to the environment. Up to now, co-expression networks have been inferred using transcriptomes generated on plants experiencing genetic or environmental perturbation, or from expression time series. We propose a new approach by showing that co-expression networks can be constructed in the absence of genetic and environmental perturbation, for plants at the same developmental stage. For this, we used transcriptomes that were generated from genetically identical individual plants that were grown under the same conditions and for the same amount of time. Twelve time points were used to cover the 24-h light/dark cycle. We used variability in gene expression between individual plants of the same time point to infer a co-expression network. We show that this network is biologically relevant and use it to suggest new gene functions and to identify new targets for the transcriptional regulators GI, PIF4, and PRR5. Moreover, we find different co-regulation in this network based on changes in expression between individual plants, compared to the usual approach requiring environmental perturbation. Our work shows that gene co-expression networks can be identified using variability in gene expression between individual plants, without the need for genetic or environmental perturbations. It will allow further exploration of gene regulation in contexts with subtle differences between plants, which could be closer to what individual plants in a population might face in the wild.


INTRODUCTION
Understanding how transcriptomes are regulated is key to shed light on how plants develop and also respond to environmental fluctuations. A powerful tool often used to reveal transcriptional regulation at a genome-wide level is gene co-expression networks (Stuart et al., 2003;Serin et al., 2016). In gene co-expression networks, genes that co-vary in expression in different conditions are detected and paired together (Usadel et al., 2009;Ruan et al., 2010;Contreras-López et al., 2018). By doing this for the entire transcriptome, a multitude of genes can be linked, indicating a similar gene regulation. Communities of genes, called modules, that are more closely linked can then be identified (Mao et al., 2009). The presence of genes in a given module indicates a close co-regulation and is usually the starting point to look for their implication in the same pathway, or their regulation by the same transcription factor(s) [TF(s)] (Aoki et al., 2007;Zheng et al., 2011;de Luis Balaguer et al., 2017;Liu et al., 2019). Most studies using co-expression networks can be separated into two categories: targeted analyses that use only a subset of genes (selected based on their function or transcriptional regulation) or specific genetic/environmental perturbations and global analyses that make use of hundreds or thousands of transcriptomes performed in various conditions, often publicly available ones, and do not select genes based on their function prior to the coexpression analysis. Co-expression networks are now commonly used in a variety of work in plant research and have allowed the identification or prediction of new genes and TFs involved in development (Xie et al., 2015;Silva et al., 2016;de Luis Balaguer et al., 2017), in metabolic pathways (Wisecaver et al., 2017), and in response to biotic and abiotic stresses (Prasch and Sonnewald, 2013;Shaik and Ramakrishna, 2013;Amrine et al., 2015;Sharma et al., 2018;Liu et al., 2019). It has also been proposed that the topology of the co-expression network and position of genes in the network can be of interest in itself to identify genes involved in natural diversity in development and in the response to environment (Ichihashi et al., 2014;Des Marais et al., 2017).
One limit of gene co-expression networks is that they only provide information about correlation in expression but do not indicate the direction and type of relationship between genes that are co-expressed. In order to define which genes are TFs that regulate the expression of other genes in the network, additional types of data should be used or integrated (Rao and Dixon, 2019). These additional data can be, for example, ChIP-seq (Chen et al., 2018;Kulkarni and Vandepoele, 2019) that provides the list of targets of a given TF, protein-protein interaction (He et al., 2010;Kulkarni and Vandepoele, 2019), as well as the presence of TF binding motifs in the promoter of genes (Vandepoele et al., 2009;Ma et al., 2013). Another limit is that genes should exhibit changes in expression between the different samples used for the analysis in order to detect co-expressed pairs of genes. This is usually achieved by using genetic and/or environmental perturbations in order to cause changes in the transcriptome regulation. However, these perturbations often have large effects, and it can be time-consuming and challenging to produce the large number of samples required. In order to analyze gene regulation in a biologically context that is more relevant, more subtle changes in expression might be preferred. This could be achieved by using milder genetic or environmental perturbations. Another option would be to analyze changes in expression that occur in the absence of any genetic or environmental perturbation (Bhosale et al., 2013). This can be possible in theory as widespread differences in gene expression levels have been observed between genetically identical plants, in the absence of any environmental perturbation (Hall et al., 2007;Jimenez-Gomez et al., 2011;Shen et al., 2012;Mönchgesang et al., 2016;Cortijo et al., 2019). The idea is to use this variability in gene expression to find potential co-regulation. In mammals, variability in gene expression between single cells of the same cell type has been used to identify co-expression patterns for genes that show a high level of gene expression variability between cells (Mantsoki et al., 2016). Moreover, gene co-expression networks have been inferred using transcriptomes of individual plant leaves, after removing in silico the genotype, environment, and genotype × environment effects on gene expression (Bhosale et al., 2013). The modules identified in this network were functionally relevant, and this study allowed the identification of a new regulator of the jasmonate pathway (Bhosale et al., 2013). It thus shows that the analysis of gene expression regulation can be as powerful in the absence of genetic and environmental fluctuation. However, the first step of the study of Bhosale et al. was to remove in silico the genotype, environment, and genotype × environment effects on gene expression, as the transcriptomes were performed on plants from different genotypes, as well as plants that were grown in different research laboratories. It is thus not clear if co-expression networks can be identified in plants from transcriptomes performed in the absence of genetic and environmental perturbation.
We thus decided to test if it is possible to infer gene co-expression networks using transcriptomes generated on single plants in the absence of any genetic and environmental perturbation. In particular, we wanted to define if such a network would provide different information compared to a network using environmental perturbation. Finally, we wished to determine if modules that would be detected in such a network would have functional relevance. In order to answer these questions, we took advantage of the existence of a set of published transcriptomes carried out on single seedlings of the same genotype that were grown in the same environmental conditions (Cortijo et al., 2019). In this dataset, multiple genetically identical seedlings had been harvested at several time points during a day/night cycle. Differences in expression between seedlings in each time point were previously observed for many genes in this dataset. In particular, 8.7% of the genes in this dataset have been identified as highly variable genes (HVGs), as their expression was statistically more variable between seedlings than the rest of the transcriptome. Using this dataset, we were able to infer co-expression networks in the absence of genetic and environmental perturbations. Based on enrichment in a module for genes involved in flavonoid metabolism, we speculated that AT4G22870, a 2-oxoglutarate (2OG) and Fe(II)-dependent oxygenase, could also have a role in flavonoid metabolism. Finally, we identified new targets for the transcriptional regulators PHYTOCHROME INTERACTING FACTOR 4 (PIF4), GIGANTEA (GI), and PSEUDO-RESPONSE REGULATOR 5 (PRR5).

Co-expression Networks Can Be Inferred Using Expression Variation Between Individual Seedlings
Co-expression networks in plants are normally inferred using transcriptomes obtained from pools of plants, using genetic or environmental perturbations in order to identify genes that co-vary in expression between these conditions. In order to define if co-expression networks can be inferred from expression measurements obtained from single seedlings in the absence of genetic and environmental perturbations, we used the previously published dataset of transcriptomes generated on single seedlings grown in the same environment. This dataset contained a total of 14 seedlings per time point, for 12 time points spanning a 24h day/night cycle (Cortijo et al., 2019). Widespread differences in expression levels have been observed between seedlings in this dataset, which is a prerequisite to be able to infer a co-expression network (Usadel et al., 2009;Ruan et al., 2010;Contreras-López et al., 2018) (Figure 1A and Supplementary Figure S1). We first detected co-expressed genes in each time point, by measuring Spearman correlation for each pair of genes in profiles of expression in the 14 seedlings of this time point. In order to keep robust correlations in the final network, we used a Benjamini-Hochberg correction with a false-discovery rate of 10% to keep FIGURE 1 | Inference of gene co-expression networks in absence of genetic and environmental perturbations. (A) Description of co-expression network inference using transcriptomes performed on single seedlings. Transcriptomes were generated for a total of 14 seedlings per time point, with 12 time points spanning a day/night cycle over 24 h. In each time point, genes with correlated expression profiles in the 14 seedlings were identified. The co-expression network was inferred based on pairs of genes significantly correlated in at least four consecutive time points. Finally, modules in the network, which consist of groups of genes that are densely connected, were detected. the most significant correlations and then selected edges of the network that are detected in at least four consecutive time points, with one gap allowed (section "Materials and Methods"). Using this approach, we find a total of 4715 edges, connecting 1729 genes in this network, from now on referred to as the variability network. The number of edges detected for each time point varies from 787 to 3221, with a higher number of edges being detected at the end of the day and the beginning of the night ( Figure 1B). We then used the Louvain community detection algorithm in order to identify modules of genes that are densely connected in the network (Blondel et al., 2008). In total, we identified 153 modules (Supplementary Table S1), containing between two and 334 genes, with most of the modules only composed of two genes ( Figure 1C). To test the robustness of the variability network, we also selected significantly correlated edges that are in at least three consecutive time points and compared the detected modules in both networks (Supplementary Figure S2). Similar modules with a similar overall connectivity between them are found in the two networks, which confirms the robustness of the modules in our original network. Modules in the network based on four consecutive time points are smaller. In some cases, several modules of the network based on four consecutive time points correspond to a single module in the network based on three consecutive time points and these smaller modules have differences in several features (Supplementary Figure S2). That is why we decided to focus our analysis on the network obtained when selecting edges present in four consecutive time points, and in particular for the 28 modules of this network containing five genes or more.
First, we analyzed the number of edges at each time point throughout the time course for each module ( Figure 1D and Supplementary Figure S3). In most modules, the edges are distributed non-uniformly across the 12 time points. Some exhibit a larger number of edges during the day or the night, while in other modules, a larger number of edges are observed at the transitions from night to day, or from day to night. It indicates that genes in these modules are co-regulated at some moments of the day/night cycle but not at others. While for some modules, this is linked to the genes being more expressed at these same times of the day (module 1 for example), this is not the case for other modules in which genes are expressed throughout the time course (module 12 for example; Figure 2). Most of the edges in module 1 are observed during the day (Figure 1D), and we were able to confirm co-expression of genes in this module by doing an RT-qPCR in a replicate experiment for a few genes in this module. In this replicate experiment, we find a very high correlation during the day (ZT6) and a lower correlation during the night (ZT14) (Supplementary Figure S4). On the other hand, most edges in module 21 are observed during the night ( Figure 1D). We also find in a replicate RT-qPCR experiment that genes of module 21 were more correlated during the night than during the day (Supplementary Figure S4). These results confirm that the co-expression of genes in modules of the variability network, and also the differences in co-expression between the day and night, can be reproduced in a replicate experiment. Moreover, we find that modules with a high percentage of edges during the night are more connected to one another than with modules for which most edges are observed during the day, and vice versa ( Figure 3A). We can measure this assortativity of the network (i.e., the tendency of similar nodes to be connected to each other) through the Pearson correlation of the daytime-specific edge percentages of connected modules (Pearson correlation = 0.4573, p-value = 0.043). This result shows that modules that are more connected to one another are more similar, at least for this feature, indicating that the community detection in the network worked well and provides modules that are relevant.
Since high gene expression variability between genetically identical plants was previously observed in the transcriptome dataset we used to infer the variability network (Cortijo et al., 2019), we tested if the network is enriched in HVGs. We find a total of 477 HVGs in the network, that is, 27.6% of all genes in the variability network. This is higher than the 8.7% of HVGs that were detected in the full transcriptome dataset (Cortijo et al., 2019). This result suggests that most of the genes in the variability network do not have to display a high level of gene expression variability to be able to detect co-expression between individual seedlings. We find that most modules are either strongly enriched in HVGs, or strongly depleted in HVGs, with only a few modules containing around 27% of HVGs (Table 1 and Supplementary Figure S5a). Modules 37, 43, and 66 for example are only composed of HVGs, while a total of eight modules do not have a single HVG. This result suggests that HVGs can co-vary in expression and are potentially coregulated. It also suggests that HVGs are not likely to co-vary in expression with non-variable genes. To test if this result could indicate a bias in the method used to construct the variability network and detect modules, we analyzed expression levels in single seedlings for genes in modules with high or low percentage of HVGs (Supplementary Figure S5b). We find that modules with high or low percentage of HVGs have different expression profiles in the seedlings, indicating an absence of bias. Moreover, modules with a high percentage of HVGs tend to be more connected to one another than with modules containing a low percentage of HVGs, and vice versa (Pearson correlation = 0.6896, p-value = 0.0007683, Figure 3B).
Our results show that gene co-expression networks can be inferred in the absence of genetic or environmental perturbation. Moreover, genes do not need to show a high level of gene expression variability between seedlings to be integrated in the network.

Additional Gene Co-expression Is Identified in the Variability Network Compared to a Network Inferred From Pools of Plants
Next, we decided to test whether the co-expression network based on the variability of expression between genetically identical plants grown in the same environment is different from what is the standard practice in the field. Usually, coexpression networks are inferred by comparing transcriptomes obtained from pools of plants experiencing an environmental or genetic perturbation. Given that our dataset contains data for several time points throughout a day/night cycle, this FIGURE 2 | Expression profiles throughout the time course for genes in each module with 5 genes or more, using the average expression of the fourteen seedlings for each time point. Each line represents the normalized expression (z-score) for one gene. Modules are ordered by the percentage of genes in the averaged time course network (high to low). Modules highlighted in blue contain 50% or more of genes that are also in the averaged time course network. Modules highlighted in red contain 15% or less of genes that are also in the averaged time course network. would correspond to comparing the average expression of the 14 seedlings for each time point and exploiting changes in expression happening during the time course. We used this strategy to infer a co-expression network, referred to as the averaged time course network, that allows the identification of co-expression throughout the time course and is the closest to standard practices using our dataset. Using this approach, we find a total of 9332 edges, connecting 3861 genes in the averaged time course network. A total of 524 genes of this averaged time course network are also present in the variability network, that is, 30% of the genes in the variability network ( Table 2). Only 35 edges are shared between the two networks. This result shows that the majority of the genes and edges present in the variability network are not detected in this dataset using a classical approach with pools of plants.
We find that between 0 and 87.5% of genes in modules of the variability network are also in the averaged time course network, with most of the modules having between 20 and 50% of genes also present in the averaged time course network ( Table 2). The modules with the highest percentage of genes also in the averaged time course network are modules 71 (87%: seven out of eight genes) and 13 (76%: 59 out of 77 genes). We find that genes in these modules have very similar and clear expression profiles throughout the time course (Figure 2).   This is also the case for all modules with at least 50% of genes in the averaged time course network (modules highlighted in blue in Figure 2). This result could suggest that the reason why these modules contain many genes also present in the averaged time course network is because their genes have very similar expression patterns throughout the day/night cycle. On the other hand, several modules that only have 15% or less of genes present in the averaged time course network are composed of genes without clear expression patterns during the time course. These results show that additional gene coexpression is identified in the variability network compared to the averaged time course network. Most importantly, using gene expression in single seedlings, co-expression between genes can be detected even in absence of expression patterns throughout the day/night cycle.

Modules Identified in the Variability Network Are Functionally Relevant
In order to define if the modules identified in the variability network are functionally relevant, we performed a Gene Ontology (GO) enrichment analysis. We find that some of the modules have strongly enriched GO (Supplementary Table S2). For example, module 8 is enriched in multiple GO related to photosynthesis. In particular, 33 genes out of the 41 in this module are members of photosystem I or II, or of the light harvesting complex (Figure 4A and Supplementary Table S3). Other genes in this module also have functions related to photosynthesis: CURT1A is required for a proper thylakoid morphology (Pribil et al., 2018), while RBCS1A, RBCS3B, and RCA are members of the Rubisco or necessary for Rubisco light activation (Izumi et al., 2012;Carmo-Silva and Salvucci, 2013). Most edges of module 8 are observed at the transition between night and day. This module contains 51% of genes that are also in the averaged time course network, which could be expected as most of the genes have very similar expression patterns throughout the day/night cycle. In particular, all genes of this module present in photosystem I, II, or the light harvesting system have the same expression profile with a peak of expression at dawn and the beginning of the day, while other genes have different expression profiles with a peak of expression at the beginning of the day and another one during the night Genes of the module are color coded depending on their role in the glucosinolate pathway: biosynthesis (turquoise), transport (green), or regulation (orange). Genes previously identified as co-expressed with glucosinolate biosynthesis genes are also indicated (gray). On the right side, the glucosinolate biosynthesis pathway is shown with an indication of the number of genes present in module 1 at each step of the pathway. Figure S6a). Also, we find that these other genes are at the periphery of module 8 (Supplementary Figure S6b), which highlights that these genes are less well correlated with the dense core of highly correlated photosystem genes in the center of the network. Another module enriched in GO related with photosynthesis is module 37 (Supplementary Table S2), in which nine out of the 12 genes are chloroplast genes, some being present in photosystem I or II, in the Cytochrome b6/f complex, or in the ATP synthase ( Figure 4A and Supplementary Table S3). Genes in module 37 are mainly expressed at the beginning of the day. These results suggest that the expression of genes involved in photosynthesis is co-regulated, not only over time but also between plants at a given time.

(Supplementary
Module 71 is enriched in GO related to DNA packaging (Supplementary Table S2) and is in fact only composed of histones, including two variants of H2A, two variants of H2B, and H3.1 (Supplementary Table S3). None of the genes in this module are HVGs, and seven out of eight genes are also present in the averaged time course network.
Module 1 is enriched in GO related to glucosinolate (Supplementary Table S2). We find that 16 out of 19 genes of the module are in the glucosinolate biosynthesis pathway, transporters of glucosinolate, or TFs regulating the pathway (Figure 4B and Supplementary Table S3). All genes in module 1, except one, were previously identified as co-expressed, in a previous study of the glucosinolate pathway (Wisecaver et al., 2017). Among the genes that are not known to be involved in glucosinolate biosynthesis, but are co-expressed with it, AKN2 is regulated by the MYB TF also regulating the glucosinolate pathway (Yatusevich et al., 2010). AKN2 is involved in sulfate assimilation, which is linked to glucosinolate metabolism (Yatusevich et al., 2010). It thus makes sense that AKN2 is coexpressed with genes of the glucosinolate biosynthesis pathway. Most edges of module 1 are observed during the day, which is when the genes in the module are more expressed. Also, 15 out of the 19 genes of the modules are HVGs.
Module 43 is enriched in GO related to flavonoid metabolism (Supplementary Table S2), with six out of eight genes shown to be involved in flavonoid biosynthesis (Saito et al., 2013). Among the other genes, AT4G22870 has not been shown to be involved in the flavonoid pathway, and our result suggests that it might have a role in this pathway. It is a protein of the 2OG and Fe(II)dependent oxygenase superfamily. We also find that all the genes in module 43 are HVGs. Most edges of the module are observed during the day, and the genes in the module have very similar expression patterns throughout the time course with a peak of expression at the beginning of the day and another one at the end of the day. However, none of the genes in module 43 are also present in the averaged time course network.
Overall, we find that several modules in the variability network are functionally relevant, with modules showing enrichment for functions such as photosynthesis, DNA packaging, and glucosinolate or flavonoid metabolism, even in the absence of genetic and environmental perturbations. Moreover, we could identify a potential role in the enriched pathways for some genes, based on their co-expression with other genes in the same module.

Identification of New Targets for GI, PIF4, and PRR5
To go further in the functional analysis of the modules, we looked for enrichment of targets of transcriptional regulators in the modules. We focused on transcriptional regulators for which ChIP-seq was performed under similar conditions (seedlings grown in day/night cycles) and for which a list of target genes have been previously published (Pfeiffer et al., 2014;Zhang et al., 2014;Liu et al., 2016;Nohales et al., 2019). We define, as targets, genes that are in proximity to regions where transcriptional regulators are binding to the DNA (ChIP-seq peaks), without considering if they are misregulated in the mutant for the transcriptional regulator. This way, we identified an enrichment in modules for targets of the TFs SPL7, PIF4, and PRR5, and of the transcriptional regulator GI (Supplementary Table S4). For example, all 41 genes in module 8 are SPL7 targets (Supplementary Table S4) . This is significantly more compared to the entire network in which 244 genes are SPL7 targets (14%). SPL7 targets have been previously shown to be enriched in multiple GO terms, including photosynthesis , in agreement with the predominant role in photosynthesis of genes in this module 8.
Targets for GI have been identified genome-wide, even if it does not bind directly to the DNA. We find that six out of seven genes in module 64 are targets of GI (Supplementary Table S4 and Figure 5A). This is more compared to the entire network in which 394 genes are GI targets (22%). To explore more in detail GI binding at the genes in module 64, we downloaded the ChIP-seq data, mapped it on the Arabidopsis thaliana genome, and looked at the ChIP-seq signal for GI at all the seven genes of module 64 (Nohales et al., 2019). We find a strong signal for the GI ChIP-seq at the promoter of the six genes that were already identified as GI targets ( Figure 5A). Interestingly, the signal for GI at the seventh gene, AT1G03630, not previously described as a GI target, is equally strong at the promoter of the gene (Figure 5A). This result indicates that AT1G03630 is also a target of GI, even if it has not been previously identified as such. AT1G03630, or PORC, encodes for a protein with protochlorophyllide oxidoreductase activity that is NADPH-and light-dependent (Gabruk et al., 2015).
Another TF with enriched targets in some modules is PIF4. We find that three out of the five genes (60%) in module 86 are PIF4 targets (Supplementary Table S3 and Figure 5B), while only 305 genes in the full network are PIF4 targets (17.6%). To explore more in detail PIF4 binding at the genes in module 86, we downloaded the ChIP-seq data, mapped it on the A. thaliana genome, and looked at the ChIP-seq signal for PIF4 at all the five genes of module 86 (Pfeiffer et al., 2014). We observe a strong signal for the PIF4 ChIP at the promoters of the three known targets in module 86 ( Figure 5B). We also see a clear signal for the PIF4 ChIP for the two other genes in module 86, AT4G26542 and AT5G55730, suggesting that they are also targets of PIF4 ( Figure 5B). AT4G26542 is an anti-sense transcript for AT4G26540. AT5G55730 (FLA1) encodes a fasciclin-like arabinogalactan-protein 1 (Johnson et al., 2011).
Finally, we find an enrichment for PRR5 targets in modules 8 and 21 with, respectively, 78 and 70% of genes in the module that are PRR5 targets (Liu et al., 2016). For comparison, 27% of all genes in the network are PRR5 targets. To explore more in detail PRR5 binding at the genes in module 8, we downloaded the ChIP-seq data, mapped it on the A. thaliana genome, and looked at the ChIP-seq signal for PRR5 at all the 41 genes of module 8 (Nakamichi et al., 2012). We find a strong ChIP-seq PRR5 signal at the 32 target genes in module 8, and a similarly strong signal for most of the other nine genes in the module that were not listed as a PRR5 target (Supplementary Figure S7a). In order to look for PRR5 targets, and to expand the analysis to other modules, we re-identified peaks for the PRR5 ChIPseq and looked for PRR5 targets in the modules with a high proportion of already described PRR5 targets (Supplementary Table S4). This way, we identified five additional PRR5 targets in module 8, and nine additional PRR5 targets in module 21. When combining the PRR5 targets from both analyses, the total percentage of PRR5 targets is 90% in module 8, and 83% in module 21 (Supplementary Figure S7b). These results suggest that most, if not all, genes in modules 8 and 21 are in fact PRR5 targets. PRR5 is a core component of the circadian clock. To test if genes in modules 8 and 21 might be regulated by the circadian clock, we analyzed other TFs involved in the circadian clock: PRR7 (Liu et al., 2016), PRR9 (Liu et al., 2016), TOC1 (Liu et al., 2016), LUX (Ezer et al., 2017), and CCA1 (Kamioka et al., 2016). We found the strongest enrichment in modules 8 and 21 for PRR7 targets (53 and 32%, when 3.9% of all genes are PRR7 targets). On the other hand, none of the modules were strongly enriched in targets for CCA1 or TOC1 (Supplementary Table S4). We did not analyze enrichments for targets of PPR9 and LUX as only nine targets of PRR9 and two targets of LUX are present in the network. This result suggests that modules 8 and 21 could be regulated by several PRR TFs, but probably not by other circadian clock TFs.
Overall, we find that some modules are enriched for targets of several transcriptional regulators and that this enrichment can be used to identify additional targets for the transcriptional regulator in the modules showing enrichment for its targets.

DISCUSSION
In this work, we have analyzed gene co-expression networks inferred using expression data generated in the absence of genetic and environmental perturbations. To do this, we made use of an already published dataset of transcriptomes performed on single seedlings that were grown in the same environment. We showed that genes do not need a high level of gene expression variability between seedlings to be able to integrate them in the network (Table 1). Moreover, we find that modules identified in this network are biologically relevant, as several are strongly enriched in GOs (Figure 4) and in targets of transcriptional regulators (Figure 5). Based on these enrichments, we speculated that AT4G22870 could also have a role in flavonoid metabolism and identified new targets for the transcriptional regulators GI, PIF4, and PRR5. We find that it is possible to infer gene co-expression networks using transcriptomes of genetically identical plants of the same age grown in the exact same environment. This is in agreement with previous work, where co-expression networks have been inferred on transcriptomes generated on individual plants and for which genetic and environmental effects have been removed in silico (Bhosale et al., 2013). We also find an interesting topology of the network with some modules more connected with one another and that connected modules share similar characteristics in terms of percentage of edges detected during the day or night, and percentage of HVGs (Figure 3 and Supplementary Figure S8). We observe that modules have either a high or low percentage of HVGs, but rarely a mix of HVG with non-HVGs. This suggests that some pathways are more variable than others. We find that, in general, modules with genes involved in the response to the environment are also composed of a high percentage of HVGs. This is the case, for example, for module 37, enriched in photosynthesis (100% of HVGs); module 43, enriched in flavonoid metabolism (100% of HVGs); or module 1, enriched in glucosinolate metabolism (78% of HVGs). Flavonoids are secondary metabolites and have been shown to be involved in many biotic and abiotic responses in plants (Tohge et al., 2017). Glucosinolates are involved in the response to pathogens (Burow et al., 2010). In agreement with our observation, previous work showed that HVGs are usually involved in the response to the environment (Newman et al., 2006;Yin et al., 2009;Hirao et al., 2015;Gasch et al., 2017;Cortijo et al., 2019). In particular, plant-to-plant variability has already been observed for glucosinolates (Mönchgesang et al., 2016), showing that the variability in expression we observe for genes involved in this pathway can lead to differences in glucosinolate content between plants.
Like for Bhosale et al. (2013), we find that the modules of the network identified in the absence of genetic and environmental perturbation are biologically relevant and can be used to speculate new gene function or regulation. We only explored the function for the most obvious GO enrichment in modules as GO can be sparse for some functions and many genes do not have a GO. For example, module 43 is enriched in genes involved in the flavonoid pathway. We speculate that AT4G22870, a member of this module, is also involved in the flavonoid pathway. To support our suggestion, AT4G22870 codes for a protein of the 2OG and Fe(II)-dependent oxygenase superfamily, and three 2OG-and ferrous iron-dependent oxygenases have been previously shown to be involved in flavonoid biosynthesis (Saito et al., 2013). Most importantly, this new potential candidate gene could not have been detected by analyzing the network inferred using day/night environmental fluctuations as none of the genes in module 43 are also present in the averaged time course network.
We find several modules with enrichment for genes involved in photosynthesis, particularly modules 8 and 37. The main distinction between these two modules is that module 8 is composed of genes from the nuclear genome, while module 37 is mainly composed of genes from the chloroplast genome. Our approach was not designed to specifically identify and separate genes from different organelles, suggesting that genes from the nuclear and chloroplast genomes involved in photosynthesis vary differently in expression between seedlings. Our result is in agreement with the fact that organelle functional modules can be detected in A. thaliana (Penga et al., 2016). However, genes that are not from the nuclear genome are usually ignored in network analysis, and it would be of interest to integrate them in the future.
Finally, we identified enrichment for targets of the transcriptional regulators GI, PIF4, and PRR5 in different modules and used this enrichment to highlight new targets. We find that in most cases, when a module is enriched in targets for a transcriptional regulator, the remaining genes of that module are also targets of this regulator. By reanalyzing the ChIP-seq data for PRR5, we could increase the percentage of targets in modules already showing a strong enrichment. This result shows the double interest of combining co-expression networks with ChIP-seq data (Chen et al., 2018;Kulkarni and Vandepoele, 2019). On the one hand, ChIP-seq data add information about the regulation of genes in the co-expression network. On the other hand, the co-expression network is a good way to focus on some of the targets of the TF to better understand their regulation and also to detect extra targets. In the case of PRR5, we find that 90% of the genes in module 8 are targets of this TF. Genes in module 8 are involved in photosynthesis. This is in agreement with the fact that the circadian clock, of which PRR5 is a core member, has been shown to regulate the photosynthesis (Harmer et al., 2000;Schaffer et al., 2001;Dodd et al., 2014).
The functional characterization of the network has been restricted to some modules with obvious GO enrichments and to transcriptional regulators for which ChIP-seq data and lists of targets were available and performed in similar conditions. However, this network, being the first to be performed in the absence of genetic and environmental fluctuation, could bring further information on other pathways we have not explored in this paper. Moreover, our approach could reveal co-regulations that might not be detected using environmental perturbations, as shown by the fact that the variability network provided additional co-expression relationships that were not detected in a network inferred on the same dataset using expression fluctuations caused by the day/night cycle. That is why we encourage readers to look at the modules for their genes or pathway of interest and have developed an interactive website where readers can do so 1 .
We show that most genes in the network are not HVGs ( Table 1), showing that high gene expression variability between seedlings is not needed to be able to detect co-expression. These results indicate that we are not in the presence of random fluctuation in expression, or noise, but that pathways are slightly differently regulated in individual seedlings even if the plants are in the same environment. Our approach uses these small differences between seedlings that might be caused by microenvironmental fluctuation or a different state of the plant caused by internal factors. It indicates that plants are very sensitive to minor changes in their environment and that we could harness this sensitivity to better understand gene expression regulation. Phenotypic differences have been observed between genetically identical plants grown in the same environment (Paxman, 1956;Sakai and Shimamoto, 1965;Hall et al., 2007;Forde, 2009;Jimenez-Gomez et al., 2011;Mönchgesang et al., 2016), indicating that the changes in expression of pathways we highlight here might be physiologically relevant . It shows that it is not necessary to perform experiments in very different environmental conditions to identify co-expression networks that could be relevant to the studied pathway. This is particularly true for crops growing in outdoor fields, like very recently shown in maize (Felipe Cruz et al., 2020). Strong fluctuations (mutants, over-expressors, and environmental fluctuations) could potentially affect a big part of the transcriptome that could mask some co-expressions of interest showing the usefulness of our approach in some contexts. Our work shows the interest in harnessing gene expression variability between genetically identical individual plants in order to better understand gene regulation in a context where differences between plants are not known and probably very subtle.

Transcriptome Data
The transcriptomes we used were already published [GSE115583 (Cortijo et al., 2019)] and performed on single seedlings, for a total of 14 seedlings per time point every 2 h over a 24-h cycle. Expression levels and corrected variability levels for all genes were downloaded from https://jlgroup.shinyapps.io/AraNoisy/, as these data had already been corrected as previously described (Cortijo et al., 2019).

Variability Network
For each of the 12 time points (0, 2, 4, . . . 22 h), we calculated the Spearman correlation between every pair of genes, using their expression profiles across the 14 seedlings ( Figure 1A). Using a Benjamini-Hochberg correction with a false discovery rate of 10%, the most significant correlations were selected and further filtered by only considering those for which a significant correlation appeared in four consecutive time points (with one gap allowed, e.g., 8, 10, 14, and 16 h). These correlations form the edges of the variability network. We also calculated a version of the network using a filter that only required three consecutive time points and calculated network modules using the same community detection algorithm. As can be seen in Supplementary Figure S1, similar modules with a similar overall connectivity between them are found, which confirms the robustness of the modules in our original network. All network analysis was carried out using the Python NetworkX and pythonlouvain libraries.

Averaged Time Course Network
For the averaged time course network, we calculated the mean expression across all seedlings for every time point, generating a time series of average expression for every gene. We again calculated the Spearman correlations for every pair of genes and generated a network by applying the Bonferroni correction as a (highly conservative) significance cutoff. This yielded a network that was similar in size to the variability network. All network analysis was carried out using the Python NetworkX and pythonlouvain libraries.

Community Detection
The Louvain community detection algorithm (Blondel et al., 2008) was used to identify modules in the networks. This algorithm attempts to maximize the modularity of the network by searching the space of network partitions. Due to the size of the search space, it is unable to find the global maximum. The composition of modules may therefore (as with most community detection algorithms) vary somewhat between runs of the algorithm.

RT-qPCR
Col-0 WT A. thaliana seeds were sterilized, stratified for 3 days at 4 • C in the dark, and transferred for germination on solid 1/2X Murashige and Skoog (MS) media at 22 • C in long days for 24 h. To reduce the level of phenotypic variation between plants, we selected the seeds that were at the same stage of germination with a binocular microscope and transferred them into a new plate containing solid 1/2X MS media. Seedlings were grown in a conviron reach-in cabinet at 22 • C and 65% humidity, with 12 h of light (170 µmoles) and 12 h of dark. After 7 days of growth, 16 individual seedlings were harvested at ZT6 and at ZT14 into a 96-well plate and flash-frozen in dry ice. All seedlings harvested in a given time point were grown in the same plate. Total RNA was isolated from one seedling. We assessed RNA concentration using Qubit RNA HS assay kit. cDNA synthesis was performed on 700 ng of DNase-treated RNA using the Transcriptor First-Strand cDNA Synthesis Kit. RT-qPCR analysis was performed in the LightCycler 480 instrument using LC480 SYBR green I master, on 0.4 µl of cDNA in a 10-µl reaction. Gene expression relative to two control genes (SandF and PP2A) was measured (see Supplementary Table S5 for the list of primers used for RT-qPCR).

GO Term Enrichment
We used the Ontologizer (Bauer et al., 2008) command line tool with Bonferroni multiple-hypothesis correction to perform GO term enrichment analysis of the network modules. Only the significantly enriched non-redundant GO are shown.
Reads were aligned to the TAIR10 genome using Bowtie2 (Langmead et al., 2009) and Picard tools were used to remove potential optical duplicates 2 . Peak calling was performed using MASC2 (Zhang et al., 2008), with the corresponding INPUT used as a reference. The Integrative Genomics Viewer (IGV; Robinson et al., 2011) was used to show snapshots of ChIP-seq signal around targets.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found on Gene Expression Omnibus GSE115583: https: //www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE115583.

AUTHOR CONTRIBUTIONS
SC conceived the project and interpreted the data. SC, JL, and SA designed the project and wrote the article, with SC writing the first draft. SA inferred the networks. SC and SA performed downstream analyses of the network. MB performed the RT-qPCR validation. All authors contributed to the article and approved the submitted version.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2020. 599464/full#supplementary-material Supplementary Figure 1 | Expression in seedlings of genes in module 1, from the RNA-seq data, with one line per gene. Expression is mean normalized for each gene. Supplementary Figure 4 | (A) Correlation in expression between seedlings for genes of the module 1 and module 21, for the RNA-seq experiment. AT5G07690 at ZT14 was removed as it is not expressed. (B) Correlation in expression between seedlings for genes of the module 1 and module 21, based on an RT-qPCR replicate of the RNA-seq experiment. Sixteen seedlings where harvested at ZT6 and at ZT14. (C) Normalized expression level in the fourteen seedlings for the genes of the module 21, from the RNA-seq data. Expression level for AT4G13250 is shown as the x axis while expression for the other genes of the module is shown on the y axis. Expression is mean normalized for each gene.

Supplementary Figure 5 | (A)
Inter-individual gene expression variability profiles throughout the time course for genes in each module with five genes or more. Each line represents the corrected variability level for one gene:: corrected CV 2 = [log 2 (CV 2 /trend)], with CV 2 = variance/(average 2 )] (see Cortijo et al., 2019). Modules are ordered by the percentage of HVG (high to low). Modules highlighted in blue contain 75% or more of HVGs. Modules highlighted in red contain 10% or less of HVGs. (B) Heatmap of normalized gene expression for genes in modules 2, 6 (less than 15% HVG), module 4 (55% of HVG), and modules 15 and 24 (more than 85% of HVG). Expression is shown in single seedlings from the time point ZT20. Expression is mean normalized: expression in a seedling/averaged expression in all seedlings. The left color coded bar indicates the module of each gene.