Using Gene Expression to Study Specialized Metabolism—A Practical Guide

Delli-Ponti, Riccardo; Shivhare, Devendra; Mutwil, Marek

doi:10.3389/fpls.2020.625035

REVIEW article

Front. Plant Sci., 12 January 2021

Sec. Plant Systems and Synthetic Biology

Volume 11 - 2020 | https://doi.org/10.3389/fpls.2020.625035

School of Biological Sciences, Nanyang Technological University, Singapore, Singapore

Abstract

Plants produce a vast array of chemical compounds that we use as medicines and flavors, but these compounds’ biosynthetic pathways are still poorly understood. This paucity of knowledge precludes us from modifying, improving, and mass-producing these specialized metabolites in suitable bioreactors. Many of the specialized metabolites are expressed in a narrow range of organs, tissues, and cell types, suggesting a tight regulation of the responsible biosynthetic pathways. Fortunately, with unprecedented ease of generating gene expression data and with >200,000 publicly available RNA sequencing samples, we are now able to study the expression of genes from hundreds of plant species. This review demonstrates how gene expression can elucidate the biosynthetic pathways by mining organ-specific genes, gene expression clusters, and applying various types of co-expression analyses. To empower biologists to perform these analyses, we showcase them using recently published, user-friendly tools. Finally, we analyze the performance of co-expression networks and show that they are a valuable addition to elucidating the multiple biosynthetic pathways of specialized metabolism.

Introduction

Despite the therapeutic and industrial potential of specialized plant metabolites (SM, also called secondary metabolites), their total chemical synthesis is often prohibitively expensive or even impossible due to their structural complexity (Chemler and Koffas, 2008). As a consequence, most of the SM are still extracted from their plant sources. The plant sources are often difficult to cultivate, resulting in the overharvesting of these species from the wild, as exemplified by firmoss (Huperzia serra), the pacific yew (Taxus brevifolia), and golden root (Rhodiola rosea; Busing et al., 1995; Lan et al., 2013). Furthermore, many valuable SM can be present at low concentrations in plants, precluding the production of these beneficial molecules in a cost-efficient manner. Consequently, large efforts are underway to understand the SM biosynthetic pathways, as these pathways can be engineered into more suitable microbial or plant hosts and further modified to produce novel, more potent compounds.

Despite the efforts to elucidate the plant SM biosynthetic pathways, very few pathways have been studied to completion, and even fewer have been transferred to heterologous hosts. A few examples include artemisinic acid (Paddon et al., 2013), the monoterpenoid indole alkaloids (Brown et al., 2015), and the benzylisoquinoline alkaloids (Thodey et al., 2014). This is a stark contrast to the >700 bacterial and fungal SM biosynthetic pathways that have been characterized and engineered (Cimermancic et al., 2014). There are two main reasons for this discrepancy between plants and microbes. Firstly, the enzymes biosynthesizing an SM in microbes are typically organized as biosynthetic gene clusters (BGCs), i.e., in a contiguous manner on chromosomes (Keller, 2019), which greatly simplifies the identification of the biosynthetic pathways. Conversely, in plants, the majority of SM pathways are not found in BGCs (Kliebenstein et al., 2012; Shi and Xie, 2014). However, nearly two dozen BGCs making defensive compounds have been functionally characterized, indicating that BGCs can be used to predict plant SM pathways in some cases (Nützmann et al., 2016; Kautsar et al., 2017; Tohge and Fernie, 2020). Secondly, in contrast to microbes, biosynthetic enzymes in plants comprise multiple, large gene families (e.g., the cytochrome P450 family can comprise up to 1% of all plant genes; Mizutani and Ohta, 2010), complicating the assignment of an enzyme to a correct pathway based on genomic approaches alone. Consequently, many plant SM pathways, such as artemisinin, salicin, and taxol, have been elucidated by time-consuming and complex experimental approaches such as activity-guided fractionation, where the relevant enzyme is purified over multiple rounds of fractionation and then identified by a proteomic approach, such as mass spectrometry.

Fortunately, the last decade has seen the emergence of novel methods in the area of genomics, transcriptomics, proteomics, metabolomics, synthetic biology, and gene function prediction, which have fueled the identification of SM biosynthetic pathways (Jacobowitz and Weng, 2020; Mutwil, 2020). These additional approaches provide multipronged sources of information to predict the identity of the enzymes making a given SM, allowing rapid de novo biosynthetic pathway prediction in nonmodel plants (Torrens-Spence et al., 2016). These predictions can then be rapidly tested by synthesizing codon-optimized cDNA of the putative enzyme and expressing it in a laboratory microbe or a more suitable plant, such as Nicotiana benthamiana (please see the excellent review on these approaches in Jacobowitz and Weng, 2020). The various computational approaches comprising sequence similarity, Quantitative Trait Loci/Genome-Wide Association Studies (QTL, GWAS), phylogenetic profiling, and machine learning have been extensively reviewed elsewhere (Jacobowitz and Weng, 2020; Mutwil, 2020).

This review focuses on gene expression and co-expression networks as tools to uncover SM biosynthetic pathways. To showcase some of the analyses, we dissect biosynthetic pathways of sporopollenin, lignin, cutin, and suberin. We also discuss another important but overlooked property of gene expression and co-expression analyses: the ability to identify transcription factors and transporters as additional genes involved in the metabolites’ regulation and biosynthesis. Finally, we discuss some of the caveats typical for these analyses.

Correlating Metabolite Presence and Gene Expression

Specialized metabolites often show a restricted presence in only a few organs, tissues, and cell types (Li et al., 2016), and can be extensively regulated by environmental factors (e.g., pathogen attack, UV-B light; Li et al., 2015; Tohge et al., 2016). For example, plant defense metabolites are frequently present in specialized tissues/cell types to minimize autotoxicity in the surrounding tissues and/or to maximize the effectiveness of these metabolites toward the spatially specific attacks of the aggressors (Schilmiller et al., 2010; Tissier, 2012). Of the 895 non-redundant metabolite spectra from different tissues of Nicotiana attenuata, 595 (63%) displayed tissue-specific expression, showing that SM often have organ- and tissue-specific gene expression (Li et al., 2016). Intuitively, the biosynthetic enzymes and their mRNAs should only be present in the cells where the metabolite is made. This assumption can be exploited to identify the biosynthetic genes by correlating gene expression and metabolite levels. This assumption fails for cases where the site of metabolite biosynthesis and accumulation differs, as exemplified by nicotine, which is biosynthesized in roots by root-specific enzymes and exported to leaves (Katoh et al., 2005; Tan et al., 2020). However, this simple yet powerful analysis has been successfully applied to unravel biosynthetic pathways of modified fatty acids in tomato (Jeon et al., 2020) and colchicine in Gloriosa superba (Nett et al., 2020).

To exemplify how gene expression specificity can uncover a biosynthetic pathway, we use the CoNekT online tool¹ (Proost and Mutwil, 2018) to analyze pollen exine biosynthesis. Pollen exine is the outermost protective layer of pollen grains, and consists of the insoluble sporopollenin biosynthesized in anthers (Hsieh and Huang, 2007). Thus, by identifying other genes with anther-specific gene expression, we should find the exine biosynthetic genes. To perform this analysis, we navigated to the “Tools/Find Specific Profiles,” selected Arabidopsis and “Flowers (anthers)” as the target species and tissue, which revealed 162 genes with anther-specific expression (Figure 1A and Supplementary Table 1). As expected, these genes show exclusively anther-specific expression profiles (Figure 1B). Among these genes, we found numerous genes with unknown function, transcription factors, lipid transfer proteins, and several genes implicated in sporopollenin biosynthesis (Table 1). Notably, the analysis can reveal non-enzymatic genes essential for the functioning of the pathways, such as transporters needed for shuttling of the metabolite precursors (ABCG26) and transcription factors controlling the expression of the pathway (MYB103).
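The logic behind such a specificity search can be sketched in a few lines of Python. This is a minimal illustration and not CoNekT's implementation: it assumes a hypothetical dictionary of average expression values per tissue, scores specificity as the fraction of a gene's summed expression found in the target tissue, and uses an arbitrary cutoff of 0.85. Gene names and values are invented.

```python
def specificity(profile, tissue):
    """Fraction of a gene's summed expression that falls in `tissue`."""
    total = sum(profile.values())
    if total == 0:
        return 0.0  # gene not expressed anywhere
    return profile.get(tissue, 0.0) / total


def tissue_specific_genes(expression, tissue, threshold=0.85):
    """Genes whose specificity for `tissue` meets the cutoff, best first."""
    scored = [(gene, specificity(profile, tissue))
              for gene, profile in expression.items()]
    hits = [(gene, score) for gene, score in scored if score >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Toy expression table (hypothetical genes and values).
expression = {
    "geneA": {"anther": 95.0, "leaf": 2.0, "root": 3.0},
    "geneB": {"anther": 30.0, "leaf": 35.0, "root": 35.0},
    "geneC": {"anther": 0.0, "leaf": 0.0, "root": 0.0},
}

hits = tissue_specific_genes(expression, "anther")
print(hits)
```

Only geneA passes the cutoff here. Any real analysis would need replicate-aware averaging and a cutoff chosen for the dataset at hand.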

FIGURE 1

TABLE 1

Gene ID	Symbol	Annotation	Function
AT3G13220	ABCG26, WBC27	ABC-2 type transporter family protein	Polyketide export Quilichini et al., 2014
AT1G62940	ACOS5	Acyl-CoA synthetase 5	Sporopollenin monomer biosynthesis de Azevedo Souza et al., 2009
AT4G34850	LAP5	Chalcone and stilbene synthase family protein	Biosynthesis of pollen fatty acids and phenolics found in exine Dobritsa et al., 2010
AT1G02050	LAP6	Chalcone and stilbene synthase family protein	Biosynthesis of pollen fatty acids and phenolics found in exine Dobritsa et al., 2010
AT1G01280	CYP703A2, CYP703	Cytochrome P450, family 703, subfamily A, polypeptide 2	Biosynthesis of medium-chain hydroxy fatty acids Morant et al., 2007
AT1G69500	CYP704B1	Cytochrome P450, family 704, subfamily B, polypeptide 1	Biosynthesis of long-chain fatty acids Dobritsa et al., 2009
AT5G56110	MYB103, AtMYB103, ATMYB80, MS188	Myb domain protein 103	Tapetum and exine development Zhang et al., 2007

Annotation of anther-specific genes involved in sporopollenin biosynthesis.

Expression profiles can also identify functionally equivalent genes across species. For example, gene AT1G69500 (CYP704B1) is a cytochrome P450 long-chain fatty acid ω-hydroxylase essential for pollen exine formation (Dobritsa et al., 2009). Cytochrome P450 genes comprise one of the largest gene families that catalyze various metabolic reactions (Xu et al., 2015). Due to numerous duplications, it can be challenging to identify P450 genes involved in sporopollenin biosynthesis in other plants. However, since all sporopollenin-specific P450s are also likely expressed in anthers in other species, we can use gene expression to identify the relevant genes. We used CoNekT to compare expression profiles of the orthogroup containing AT1G69500 and 78 other land plant-specific genes (https://evorepro.sbs.ntu.edu.sg/family/view/131885, click on “row-normalized” to view expression). As expected, AT1G69500 is expressed specifically in flowers (CoNekT groups components of an organ into one category), while for Amborella trichopoda, only AMTR_s00010p00266280 shows a similar expression pattern, suggesting that AT1G69500 and AMTR_s00010p00266280 are functionally equivalent (Figure 1C).

Using Guide Genes to Identify Biosynthetic Pathways

To uncover the other biosynthetic pathway components, it is possible to identify other genes with a similar expression profile if at least one of the biosynthetic enzymes is known (Usadel et al., 2009; Serin et al., 2016). This assumption is based on the observation that genes with similar expression patterns across organs, developmental stages, and biotic and abiotic perturbations tend to be involved in related biological processes. Identification of genes with similar profiles can be made by calculating all possible pairwise comparisons of gene expression profiles using different similarity metrics (e.g., Pearson Correlation Coefficient, Mutual Rank, and Highest Reciprocal Rank), across tens to thousands of gene expression measurements captured by microarrays or RNA sequencing (RNA-seq; Usadel et al., 2009; Mutwil et al., 2010; Aoki et al., 2016).
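The three similarity metrics named above can be illustrated with a small, self-contained sketch. It assumes the commonly used definitions: Highest Reciprocal Rank as the larger of the two reciprocal co-expression ranks, and Mutual Rank as their geometric mean. It also assumes that rank 1 denotes a gene's most correlated partner, with the gene itself excluded; published tools may use slightly different conventions. The profiles are invented.

```python
import math


def pearson(x, y):
    """Pearson Correlation Coefficient of two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: correlation undefined, treated as 0
    return cov / (sx * sy)


def rank_table(profiles):
    """For each gene, rank every other gene by decreasing PCC (1 = best)."""
    ranks = {}
    for gene, profile in profiles.items():
        others = sorted((h for h in profiles if h != gene),
                        key=lambda h: pearson(profile, profiles[h]),
                        reverse=True)
        ranks[gene] = {h: i + 1 for i, h in enumerate(others)}
    return ranks


def hrr(ranks, a, b):
    """Highest Reciprocal Rank: the worse (larger) of the two ranks."""
    return max(ranks[a][b], ranks[b][a])


def mutual_rank(ranks, a, b):
    """Mutual Rank: geometric mean of the two reciprocal ranks."""
    return math.sqrt(ranks[a][b] * ranks[b][a])


# Toy profiles across five samples (hypothetical values).
profiles = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 6, 8, 11],
    "g3": [5, 4, 3, 2, 1],
    "g4": [1, 3, 2, 5, 4],
}
ranks = rank_table(profiles)
print(pearson(profiles["g1"], profiles["g4"]))
print(hrr(ranks, "g1", "g2"), mutual_rank(ranks, "g1", "g4"))
```

Rank-based scores such as HRR and MR depend on the whole gene set, so they change when genes are added or removed, whereas the PCC of a pair does not.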

The identification of these transcriptionally co-regulated (co-expressed) genes has been successfully used to further complete various metabolic pathways, such as protolimonoids from Azadirachta indica (Hodgson et al., 2019), vinblastine from Madagascar periwinkle (Caputi et al., 2018), etoposide aglycone from Podophyllum hexandrum (Lau and Sattely, 2015), and the seco-iridoid pathway from Catharanthus roseus (Miettinen et al., 2014), to name a few recent examples. The identification of the co-expressed genes can be performed in three ways: by (i) co-expression list analysis, (ii) hierarchical clustering of expression profiles, or (iii) co-expression networks. To exemplify how these analyses can be performed and interpreted, we use the classical example of lignin biosynthesis, which requires multiple steps to convert phenylalanine to various lignin precursors (Figure 2A; Sibout et al., 2017).

FIGURE 2

Uncovering Functionally Related Genes by the Co-Expression List Analysis

The co-expression list analysis is typically a “one versus all” analysis, where the expression profile of one gene is compared to the expression profiles of all genes, and the resulting list is sorted according to a similarity metric, such as the Pearson Correlation Coefficient (PCC; Usadel et al., 2009). Typically, this analysis is used to uncover unknown components of a biological process (Brown et al., 2005; Persson et al., 2005). Since the list is sorted according to expression profile similarity, the most relevant genes are found on top of the list, and typically the top 50 genes are investigated (Aoki et al., 2016; Proost and Mutwil, 2018). The analysis of phenylalanine ammonia-lyase 1 (PAL1), which is the first enzyme in the phenylpropanoid pathway needed for lignin biosynthesis (Figure 2A), revealed several known players, such as C4H, PAL2, CYP98A3, CCR1, CCR2, 4CL, and HCT (Table 2 and Supplementary Table 2). It is important to note that the list does not contain all of the lignin biosynthetic enzymes, showing that co-expression is not always guaranteed to retrieve all relevant genes. To uncover the pathway’s missing members, we recommend using other known members of the pathway as queries and collating the results.
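A “one versus all” list, and the recommended collation over several query genes, could be sketched as follows. The function names `coexpression_list` and `collate`, and the toy profiles, are hypothetical and only illustrate the idea; they are not taken from any of the tools discussed here.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson Correlation Coefficient of two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def coexpression_list(query, profiles, top_n=50):
    """Compare `query` to all other genes and return the top_n by PCC."""
    scores = [(gene, pearson(profiles[query], profile))
              for gene, profile in profiles.items() if gene != query]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_n]


def collate(queries, profiles, top_n=50):
    """Count in how many query lists each gene appears."""
    counts = Counter()
    for query in queries:
        for gene, _ in coexpression_list(query, profiles, top_n):
            counts[gene] += 1
    return counts


# Toy profiles (hypothetical); q1 and q2 stand in for known pathway genes.
profiles = {
    "q1": [1, 2, 3, 4, 5],
    "q2": [2, 3, 5, 7, 9],
    "x": [1, 3, 2, 5, 4],
    "y": [5, 4, 3, 2, 1],
    "z": [3, 3, 3, 3, 3],
}
top = coexpression_list("q1", profiles, top_n=2)
counts = collate(["q1", "q2"], profiles, top_n=2)
print(top)
print(counts.most_common())
```

Genes retrieved by several independent queries (here, x) are arguably stronger candidates than genes found with a single query.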

TABLE 2

Sequence	Annotation	PCC	Function
AT2G37040	Phenylalanine ammonia-lyase 1 ATPAL1, PAL1	1.0	Phenylpropanoid pathway entry Cochrane et al., 2004
AT2G30490	Cinnamate-4-hydroxylase ATC4H, CYP73A5, REF3, C4H	0.836494	Trans-4-coumarate biosynthesis Schilmiller et al., 2009
AT3G53260	Phenylalanine ammonia-lyase 2 ATPAL2, PAL2	0.806119	Phenylpropanoid pathway entry Cochrane et al., 2004
AT2G40890	Cytochrome P450, family 98, subfamily A, polypeptide 3 CYP98A3	0.647512	3’-hydroxylation of p-coumaric esters Schoch et al., 2001
AT1G80820	Cinnamoyl-Coa reductase CCR2, ATCCR2	0.624933	Cinnamaldehyde biosynthesis Lacombe et al., 1997
AT1G51680	4-coumarate:CoA ligase 1 4CL1, AT4CL1, 4CL.1	0.609514	CoA thiol ester biosynthesis Ehlting et al., 1999
AT5G48930	Hydroxycinnamoyl-CoA shikimate/quinate hydroxycinnamoyl transferase HCT	0.589046	Hoffmann et al., 2004
AT1G15950	Cinnamoyl-Coa reductase 1 CCR1, IRX4, ATCCR1	0.525528	Cinnamaldehyde biosynthesis Lacombe et al., 1997

Co-expression list of PAL1.

For brevity, only the known participants of the lignin biosynthesis pathway are shown.

Hierarchical Clustering Analysis

Hierarchical clustering of expression profiles is a “many versus many” analysis, where the selected genes are grouped into clusters defined by expression profile similarity. These clusters are then visually analyzed to identify clusters containing the known components of a pathway and to exclude genes that are not part of these clusters. Typically, this analysis is used when the list of candidate genes is extensive and needs to be reduced. This approach has been used in identifying P450 enzymes important for protolimonoid synthesis (Hodgson et al., 2019) and components of etoposide aglycone biosynthesis (Lau and Sattely, 2015). To exemplify a clustering analysis, we selected four PAL gene family members, ATC4H, and nine members of the CAD family. We entered the 14 genes (AT2G37040, AT3G10340, AT3G53260, AT5G04230, AT2G30490, AT1G72680, AT2G21730, AT2G21890, AT3G19450, AT4G34230, AT4G37970, AT4G37980, AT4G37990, and AT4G39330) into the “Tools/Heatmap/Comparative”², which revealed the expression profiles of these genes in organs of Arabidopsis. The resulting heatmap was pasted into the ClustVis web-tool³ (Metsalu and Vilo, 2015) and used to perform hierarchical clustering. The heatmap revealed that PAL1, 2, and 4 cluster with C4H and CAD, but, e.g., not with PAL3, which has not been implicated in lignin biosynthesis (Figure 2B). The heatmap can also indicate where a given cluster is expressed, showing that the lignin cluster has the highest expression in roots. In contrast, the other major cluster containing CAD2, 3, 6, and ELI3 is expressed in male organs (comprising pollen and sperm, Figure 2B). Thus, the clustering analysis can reveal functionally related genes and indicate the organs and tissues where these genes are likely active.
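For readers who prefer to script this step, the sketch below shows a bare-bones agglomerative clustering with average linkage and a distance of 1 minus PCC. It is an assumption-laden toy, not what ClustVis runs internally, and the gene names and organ values are invented; for real data, an established library implementation is preferable.

```python
import math


def pearson(x, y):
    """Pearson Correlation Coefficient of two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def cluster_profiles(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[gene] for gene in profiles]

    def distance(a, b):
        # Average linkage: mean (1 - PCC) over all cross-cluster gene pairs.
        pairs = [(g, h) for g in a for h in b]
        return sum(1 - pearson(profiles[g], profiles[h])
                   for g, h in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Toy profiles over four organs: root, leaf, stem, pollen (hypothetical).
profiles = {
    "a": [10, 2, 3, 1],
    "b": [9, 1, 4, 1],
    "c": [8, 2, 2, 0],
    "d": [1, 1, 0, 12],
    "e": [0, 2, 1, 9],
}
clusters = cluster_profiles(profiles, n_clusters=2)
print(clusters)
```

The root-biased genes (a, b, c) and the pollen-biased genes (d, e) end up in separate clusters, mirroring how the heatmap separates organ-specific groups.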

Co-Expression Network Analysis—Searching With a Query Gene

Co-expression networks can be used in “many versus many” (when used with one query gene) or “all versus all” (when used with co-expression clusters) types of analyses. In co-expression networks, nodes (or vertices) represent genes, and edges (or links) connect genes that display similar expression profiles (Lee et al., 2004; Usadel et al., 2009; Serin et al., 2016). While the networks are different from co-expression lists (lists are ordered while networks are not) and hierarchical clustering (networks are unordered and typically do not indicate the expression patterns of genes), when used with one query gene, the networks provide the same information: the identity of functionally related genes. To exemplify a typical network analysis, we used PAL1⁴, which, similarly to the co-expression list (Table 2), retrieved several, but not all, known participants of lignin biosynthesis (Figure 2C).
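Querying a network with one gene amounts to collecting the nodes within a given number of edges from the query. The sketch below assumes the network is already available as a list of gene pairs (for example, pairs passing a similarity cutoff); the edge list and the `depth` parameter are illustrative assumptions.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Turn an undirected edge list into a gene -> neighbors mapping."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def neighborhood(adjacency, query, depth=1):
    """Genes reachable from `query` within `depth` edges (query excluded)."""
    seen = {query}
    frontier = {query}
    for _ in range(depth):
        # Expand one step outward, ignoring genes already visited.
        frontier = {n for gene in frontier
                    for n in adjacency.get(gene, ())} - seen
        seen |= frontier
    return seen - {query}


# Toy edge list (hypothetical genes).
edges = [("query", "g1"), ("query", "g2"), ("g1", "g3"), ("g4", "g5")]
adjacency = build_adjacency(edges)
print(sorted(neighborhood(adjacency, "query", depth=1)))
print(sorted(neighborhood(adjacency, "query", depth=2)))
```

Increasing `depth` retrieves more candidates at the cost of more unrelated genes, which is one source of the false positives discussed later in this review.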

In contrast to lists and hierarchical clustering approaches, networks can convey additional information with node and edge colors. For example, CoNekT uses different node colors and shapes to indicate gene families (see text footnote 4; Proost and Mutwil, 2018), while ATTED-II⁵ (Aoki et al., 2016), and GeneMANIA⁶ (Warde-Farley et al., 2010) use edge styles to indicate different types of functional relationships between genes (e.g., co-expression, protein-protein interactions). Modern tools provide interactive networks, where the nodes can be moved, colored by different criteria (e.g., by organ-specific expression or gene family membership), allowing adjusting the networks to convey the desired information better.

Identifying Functionally Related Genes by Custom Network Analysis

While a genome-wide co-expression network typically contains tens of thousands of nodes (genes) and millions of edges (connections), most users are only interested in the particular part of the network representing a biological process of interest. Since functionally related genes tend to be connected, the network can be used to uncover functional clusters of genes. Conceptually, the analysis is similar to hierarchical clustering (Figure 2B), but instead of clades, the functionally related genes are connected by edges.

While most current studies focus on uncovering the enzymes constituting a biosynthetic pathway, non-enzymatic genes are also crucial for the efficient biosynthesis of SM. For example, gliotoxin biosynthesis in the fungus Aspergillus requires a gliotoxin efflux pump that removes the harmful metabolite from the cellular environment. At the same time, another enzyme modifies it to a less toxic form (Dolan et al., 2015). Furthermore, up to 50% of BGCs in fungi also contain transcription factors that positively regulate the corresponding pathway (Brown et al., 2015). In plants, we observed that relevant transcription factors and transporters can be co-expressed with the pathways they regulate and participate in, respectively. For example, we observed ABCG26, a polyketide transporter needed for exine biosynthesis in Arabidopsis (Table 1), as well as various other transporters and transcription factors important for cellulose biosynthesis in Brachypodium distachyon (Sibout et al., 2017), artemisinin biosynthesis in Artemisia annua (Tan and Mutwil, 2019), and nicotine biosynthesis in Nicotiana tabacum (Tan et al., 2020). Thus, co-expression analysis is uniquely positioned to reveal non-enzymatic components essential for the efficient functioning of metabolic pathways.

To demonstrate how this analysis can be performed, we tested which MYB transcription factors are co-expressed with lignin biosynthesis-related laccases (LAC) in Arabidopsis (Figure 3). To this end, we entered the 11 LAC genes⁷, together with 122 MYB transcription factors⁸, into the “Tools/Create custom network” tool⁹. We observed the association of laccases necessary for lignin biosynthesis in the secondary cell wall (LAC2, LAC4, and LAC17; Berthet et al., 2011; Khandal et al., 2020) with MYBs controlling lignin biosynthesis (MYB103, MYB85, MYB63, and MYB52; Zhou et al., 2009; Cassan-Wang et al., 2013; Öhman et al., 2013; Geng et al., 2020). Interestingly, we also observed the association of MYB5, which controls seed coat development (Li et al., 2009), with TT10, which is essential for flavonoid biosynthesis in the seed coat (Pourcel et al., 2005). Since CoNekT allows quick retrieval of gene families representing different gene functions, we envision that this functionality can be used to rapidly highlight transcription factors, transporters, and other genes necessary for the biosynthetic pathways.
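A custom network of this kind can be approximated by restricting a genome-wide edge list to the genes of interest and then highlighting the edges that link the two gene families. The sketch below is a hedged illustration of that idea with placeholder gene names; it does not reproduce CoNekT's code, and the function names are hypothetical.

```python
def induced_subnetwork(edges, genes):
    """Keep only edges whose two endpoints are both in `genes`."""
    genes = set(genes)
    return [(a, b) for a, b in edges if a in genes and b in genes]


def cross_family_edges(edges, family_a, family_b):
    """Edges linking family_a to family_b, reported as (a_member, b_member)."""
    family_a, family_b = set(family_a), set(family_b)
    pairs = []
    for a, b in edges:
        if a in family_a and b in family_b:
            pairs.append((a, b))
        elif b in family_a and a in family_b:
            pairs.append((b, a))
    return pairs


# Toy genome-wide edge list (hypothetical enzymes and transcription factors).
edges = [
    ("enz1", "tf1"),
    ("tf2", "enz2"),
    ("enz1", "enz2"),
    ("tf1", "other1"),
    ("other1", "other2"),
]
enzymes = ["enz1", "enz2", "enz3"]
tfs = ["tf1", "tf2", "tf3"]

sub = induced_subnetwork(edges, enzymes + tfs)
pairs = cross_family_edges(sub, enzymes, tfs)
print(sub)
print(pairs)
```

Genes without any retained edge (enz3 and tf3 here) simply drop out, which corresponds to family members that are not co-expressed with the other family.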

FIGURE 3

Searching Co-Expression Clusters for Enriched Biosynthetic Pathways

One of the significant advantages of co-expression networks is the availability of graph-theoretical methods to define co-expression clusters, i.e., groups of genes with similar expression profiles (Ronan et al., 2016). This simplifies gene expression data analysis, as clustering typically assigns tens of thousands of genes into hundreds of co-expression clusters. The clusters can then be compared to identify groups with similar functions across species (Heyndrickx and Vandepoele, 2012) or duplicated modules within species (Ruprecht et al., 2016). Furthermore, the clusters’ biological function can be elucidated by identifying enriched Gene Ontology or MapMan terms (Sibout et al., 2017; Ferrari et al., 2020).
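Functional enrichment of a cluster is commonly assessed with a hypergeometric test, which asks how likely it is to see at least the observed number of annotated genes in a cluster by chance. The sketch below assumes this standard test; whether a given tool uses exactly this statistic, and which multiple-testing correction it applies, should be checked in its documentation. The numbers are invented.

```python
from math import comb


def enrichment_p_value(n_genome, n_annotated, n_cluster, n_overlap):
    """P(X >= n_overlap) for a hypergeometric draw.

    n_genome:    genes in the genome (population size)
    n_annotated: genes in the genome carrying the term
    n_cluster:   genes in the cluster (sample size)
    n_overlap:   cluster genes carrying the term
    """
    upper = min(n_annotated, n_cluster)
    favorable = sum(
        comb(n_annotated, i) * comb(n_genome - n_annotated, n_cluster - i)
        for i in range(n_overlap, upper + 1)
    )
    return favorable / comb(n_genome, n_cluster)


# Toy example: 10 genes, 3 annotated, cluster of 4 containing 2 annotated.
p = enrichment_p_value(10, 3, 4, 2)
print(p)
```

When many clusters and terms are tested, the resulting p-values need to be corrected for multiple testing before a cutoff such as 0.05 is applied.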

To demonstrate how searching for functionally enriched clusters can be used to generate novel insights, we selected cutin and suberin as examples. Cutin and suberin are lipid biopolyester components of the cell walls important for desiccation tolerance (Philippe et al., 2020). To identify a module biosynthesizing cutin in Arabidopsis, we navigated to the “Tools/Find enriched clusters,” entered “cutin biosynthesis” in the GO search box, and clicked “Show clusters.” This revealed three clusters significantly (p-value < 0.05) enriched for genes known to be involved in cutin biosynthesis in Arabidopsis, and we clicked on cluster 26. The page dedicated to the cluster provides information about the average expression profile of the genes in the cluster, the identity of the genes, and functional enrichment analysis¹⁰. The “Similar clusters” table found on the cluster page also contains the identity of similar clusters across and within species (similarity is defined by the Jaccard index between cluster gene families; Proost and Mutwil, 2018), allowing an easy way to identify conservation and duplication of biosynthetic pathways (Ruprecht et al., 2016). Interestingly, we observed that cluster 206 from Arabidopsis is most similar to cutin cluster 26, indicating that the cutin cluster has been duplicated to biosynthesize a cutin-like polymer in another organ or tissue.
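The Jaccard index used to compare clusters is the size of the intersection of two sets divided by the size of their union. A minimal sketch, with invented cluster and gene family names, shows how clusters could be ranked by similarity to a query cluster.

```python
def jaccard(a, b):
    """Jaccard index of two collections of gene family identifiers."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(query, clusters):
    """Rank all other clusters by Jaccard similarity to `query`."""
    scores = [(name, jaccard(clusters[query], families))
              for name, families in clusters.items() if name != query]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy clusters described by the gene families they contain (hypothetical).
clusters = {
    "cluster_1": {"famA", "famB", "famC", "famD"},
    "cluster_2": {"famA", "famB", "famC", "famE"},
    "cluster_3": {"famA", "famX"},
}
ranking = most_similar("cluster_1", clusters)
print(ranking)
```

Because the comparison is made at the level of gene families rather than individual genes, it works both within a species (duplicated modules) and across species (conserved modules).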

By clicking on the “Compare” button next to the duplicated cluster 206, the two clusters are visualized (Figure 4A). The two clusters contain numerous gene families that have been implicated in the biosynthesis of cutin and suberin, comprising CYP450s, lipid transfer proteins, acyl-transferases, and glycerol-3-phosphate acyltransferase (GPAT; Philippe et al., 2020). Cutin is predominantly present in aerial organs, while suberin is mostly present in roots and seed coats (Philippe et al., 2020). In line with this, comparative expression profile analysis of two representative CYP450s revealed the expected expression of cluster 26 in flowers and cluster 206 in roots (Figure 4B). Interestingly, MYB107 has been shown to regulate suberin biosynthesis (Gou et al., 2017), but is also found in the cutin cluster, suggesting that it might also have a role in cutin biosynthesis. We also observed numerous other gene families (e.g., cupredoxin, cysteine/histidine-rich, carboxypeptidases, and RING/U-box), which are not implicated in the biosynthesis of the polymers. However, since these gene families are present in both clusters, they are likely involved in some aspect of their biosynthesis.

FIGURE 4

To conclude, enriched cluster analysis can reveal the clusters comprising various biosynthetic pathways. The conserved or duplicated modules can identify the conserved (i.e., likely relevant) genes found in the pathways.

Performing Your Own Analysis With Existing Tools or Your Own Data

While the above analyses exemplified how CoNekT can be used to study SM, multiple online tools are available, such as ATTED-II (Aoki et al., 2016), CoNekT (Proost and Mutwil, 2018), PlaNet (Mutwil et al., 2011), ePlant (Waese et al., 2017), and PlantGenIE (Sundell et al., 2015), reviewed in Rao and Dixon (2019). These tools are preloaded with expression data from tens of plants of agricultural and evolutionary interest (Table 3). Still, there are >200,000 RNA-seq experiments publicly available for >100 species from the plant kingdom¹¹, providing an excellent opportunity to study the biosynthetic pathways of SM. Furthermore, as RNA-seq analysis is becoming more affordable and accessible, numerous studies nowadays generate and analyze their own RNA-seq data to prioritize genes for functional analysis. To perform such an analysis, we need (i) a coding sequence (CDS) file, (ii) gene expression data, and (iii) a gene expression similarity analysis.

TABLE 3

Species	ATTED (https://atted.jp/)	ePlant (http://bar.utoronto.ca/)	PlantGenIE (https://plantgenie.org/)	PlaNet (www.gene2function.de)	CoNekT (www.evorepro.plant.tools)
Amborella trichopoda	N	N	N	N	Y
Arabidopsis thaliana	Y	Y	Y	Y	Y
Brassica rapa	Y	N	N	N	N
Chlamydomonas reinhardtii	N	N	N	Y	Y
Cyanophora paradoxa	N	N	N	N	Y
Eucalyptus grandis	N	N	Y	N	N
Ginkgo biloba	N	N	N	N	Y
Glycine max	Y	N	N	Y	N
Marchantia polymorpha	N	N	N	N	Y
Medicago truncatula	Y	N	N	Y	N
Oryza sativa	Y	Y	N	Y	Y
Physcomitrella patens	N	N	N	Y	Y
Picea abies	N	N	Y	N	Y
Populus trichocarpa	Y	N	Y	Y	N
Selaginella moellendorffii	N	N	N	N	Y
Solanum lycopersicum	Y	N	N	N	Y
Vitis vinifera	Y	N	N	N	Y
Zea mays	Y	Y	N	N	Y

Online tools allowing expression profiles and co-expression network analysis.

Only tools that are preloaded with co-expression networks for more than two plants are shown.

The CDS file contains the transcript sequences the RNA-seq data should be mapped to. A CDS file can typically be retrieved from a public database, such as EnsemblGenomes¹² or Phytozome¹³, or the genome release paper, if available. If no genome is available, RNA sequencing data can be used for de novo assembly. Best-performing transcriptome assemblers are typically able to retrieve >70% of the expected gene space (Hölzer and Marz, 2019). Indeed, elucidation of biosynthetic pathways without a reference genome successfully revealed steps in colchicine alkaloid (Nett et al., 2020) and protolimonoid biosynthesis (Hodgson et al., 2019), showing that the RNA-seq data can be used as an acceptable source for CDS. Comparison of 10 transcriptome de novo assembly tools across nine RNA-seq datasets spanning different kingdoms of life showed that Trinity, SPAdes, and Trans-ABySS consistently show the highest performance in reconstructing the coding sequences (Hölzer and Marz, 2019), with SPAdes being the easiest to set up and use and having the lowest memory usage and runtime.

The gene expression data is used to reveal the functional associations between the genes. While as few as eight samples can be sufficient to identify relevant members of a metabolic pathway (Nett et al., 2020), the expression data should ideally capture organs/tissues which show contrasting levels of the metabolite of interest. For example, among the four organs of G. superba (leaf, stem, rhizome, and root), colchicine alkaloids showed the highest accumulation in the rhizome, which allowed the authors to elucidate most of the pathway by identifying rhizome-specific genes by clustering analysis. In another study, the authors took advantage of highly specific induction of falcarindiol biosynthesis by pathogen elicitors and identified six acetyltransferases that were upregulated upon treatment (Jeon et al., 2020). Conversely, the lignin (Figures 2, 3), suberin, and cutin (Figure 4) examples from Arabidopsis use one dataset containing hundreds of publicly available RNA-seq experiments that captures different organs, developmental stages, and growth conditions. This comprehensive dataset can thus be potentially used to identify all Arabidopsis biosynthetic pathways, as long as the dataset captures the organs where a given pathway is expressed. We have developed a user-friendly, cloud computing pipeline, LSTRaP-Cloud¹⁴, that provides tools to download and quality-control publicly available gene expression data and to perform co-expression list and co-expression network guide gene analyses (Tan et al., 2020). Alternatively, Curse can perform these analyses on the user’s computer and allow the semi-automated annotation of the RNA-seq experiments¹⁵ (Vaneechoutte and Vandepoele, 2019).

The gene expression similarity analysis is used to identify genes with similar expression patterns, which is the basis for identifying functionally-related genes. If one or multiple guide genes are known, we recommend the co-expression list approach (Table 2), which can be performed by the LSTRaP-Cloud or Curse. To identify gene clusters containing known participants of the pathway of interest, clustering-based analyses of the expression matrix (Table 1 and Figure 2B) can be done with the ClustVis web-tool¹⁶ (Metsalu and Vilo, 2015). Alternatively, CoExpNetViz allows the upload and co-expression analysis of the user’s gene expression data¹⁷ (Tzfadia et al., 2016), and CoNekT provides source code and instructions to set up a stand-alone database¹⁸ (Proost and Mutwil, 2018).

Is Co-Expression a Silver Bullet in Biosynthetic Pathway Discovery? Not Quite

The above examples demonstrate that gene expression and co-expression analyses are valuable additions to the SM pathway discovery toolbox. However, as with many guilt-by-association methods, we often observe many missing enzymes (false negatives) and irrelevant genes (false positives). This is exemplified by Figure 2C, where, e.g., COMT enzyme is not detected (false negative) and where a large number of seemingly irrelevant genes are found in the lignin biosynthesis network (false positive).

To gauge the co-expression networks’ performance in identifying SM genes, we tested three network construction methods (PCC, HRR, and MR) in four different species (Zea mays, Solanum lycopersicum, Oryza sativa, and Arabidopsis thaliana). The networks used are based on gene expression data representing all major plant organs at different developmental stages (Julca et al., 2020). We analyzed 15 different secondary metabolic pathways associated with alkaloids, betaines, glucosinolates, phenolics, and terpenoids (Figure 5). We then predicted the genes involved in each of the 15 pathways by using a network neighborhood approach (Hew et al., 2020), and used the F1 score to see how well the known members of each pathway could be classified by each of the networks. We observed a complex interplay between the different metabolic pathways and species. For example, the performance of the networks was higher in Arabidopsis than tomato for nearly all pathways, while, e.g., the terpene pathway could be more readily predicted in maize than Arabidopsis (higher scores in the former plant), for all three types of networks (HRR, MR, and PCC). Conversely, the methylerythritol 4-phosphate (MEP) pathway could not be predicted at all in Arabidopsis (F1 score 0 for all networks). These results indicate that co-expression networks can show unpredictable performance when predicting SM pathways, and more research is needed to understand which conditions would result in the best performance (quantity and quality of the expression data, the network construction methods).
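For reference, the F1 score is the harmonic mean of precision (the fraction of predicted genes that are true pathway members) and recall (the fraction of true pathway members that were predicted). The sketch below computes it for one pathway from two gene sets; the gene names are placeholders, and the prediction step itself (the network neighborhood approach) is not reproduced here.

```python
def f1_score(predicted, known):
    """Harmonic mean of precision and recall for one pathway."""
    predicted, known = set(predicted), set(known)
    true_positives = len(predicted & known)
    if true_positives == 0:
        return 0.0  # nothing recovered, as in an unpredictable pathway
    precision = true_positives / len(predicted)
    recall = true_positives / len(known)
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical genes): 2 of 3 known members are recovered,
# together with 2 false positives.
predicted = {"geneA", "geneB", "geneC", "geneD"}
known = {"geneA", "geneB", "geneE"}
score = f1_score(predicted, known)
print(round(score, 3))
```

An F1 score of 0 means that none of the known pathway members were recovered, while a score of 1 requires that all known members, and only those, are predicted.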

FIGURE 5

Conclusion and Future Perspectives

Gene expression and co-expression network analyses are valuable, unique tools to unravel the biosynthetic pathways of specialized metabolism. The expression-based analyses’ versatility allows shortlisting of gene candidates with even a few RNA sequencing samples (Nett et al., 2020) or elucidation of multiple pathways with one large expression dataset (Figures 1–4). We find ourselves in the log phase of metabolic pathway discovery as open-source online tools are publicly available (e.g., https://github.com/tqiaowen/LSTrAP-Cloud) and repositories are brimming with gene expression data for hundreds of plant species.

In addition to uncovering the enzymes underpinning the various metabolic pathways, the co-expression networks present two exciting, novel opportunities. Firstly, these analyses can reveal non-enzymatic components of the pathways, such as transporters and transcription factors (Table 1 and Figure 3). The transcription factors are especially exciting, as changing their expression can alter the whole pathway’s activity and cause dramatic changes in metabolite levels (Zhao and Dixon, 2011). Secondly, the networks can serve as top-down tools to uncover new pathways by identifying novel clusters of connected genes. For example, the analysis investigating the functional association between MYB transcription factors and laccases (Figure 3) can be repurposed to study associations between all enzymes in an organism. The analyses discussed in this review can and should be supplemented with other omics-based inference methods to pave the way for more nutritious, resilient crops, and the development of novel medicines.

Statements

Author contributions

RD-P contributed to the co-expression performance analysis. DS helped with the literature summary. MM designed the review. All authors helped with the manuscript.

Funding

DS was supported by Singaporean Ministry of Education grant MOE2018-T2-2-053.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2020.625035/full#supplementary-material

Supplementary Table 1

Anther-specific genes retrieved from CoNekT. The genes shown in Table 1 are indicated by bold letters.

Supplementary Table 2

PAL1-coexpressed genes retrieved from CoNekT. The genes shown in Table 2 are indicated by bold letters.

Footnotes

1. https://evorepro.sbs.ntu.edu.sg/search/specific/profiles
2. https://evorepro.sbs.ntu.edu.sg/heatmap/
3. https://biit.cs.ut.ee/clustvis/
4. https://evorepro.sbs.ntu.edu.sg/network/graph/3767
5. https://atted.jp/locus/?gene_id=818280
6. https://genemania.org/search/arabidopsis-thaliana/pal1
7. https://evorepro.sbs.ntu.edu.sg/family/view/115
8. https://evorepro.sbs.ntu.edu.sg/family/view/3
9. https://evorepro.sbs.ntu.edu.sg/custom_network/
10. https://evorepro.sbs.ntu.edu.sg/cluster/view/212
11. https://www.ncbi.nlm.nih.gov/sra/
12. https://plants.ensembl.org/index.html
13. https://phytozome.jgi.doe.gov/pz/portal.html
14. https://github.com/tqiaowen/LSTrAP-Cloud
15. http://bioinformatics.psb.ugent.be/webtools/Curse/
16. https://biit.cs.ut.ee/clustvis/
17. http://bioinformatics.psb.ugent.be/webtools/coexpr/index.php
18. https://github.com/sepro/CoNekT

References

1. Aoki, Y., Okamura, Y., Tadaka, S., Kinoshita, K., and Obayashi, T. (2016). ATTED-II in 2016: A plant coexpression database towards lineage-specific coexpression. Plant Cell Physiol. 57, e5. doi: 10.1093/pcp/pcv165
2. Berthet, S., Demont-Caulet, N., Pollet, B., Bidzinski, P., Cézard, L., le Bris, P., et al. (2011). Disruption of LACCASE4 and 17 results in tissue-specific alterations to lignification of Arabidopsis thaliana stems. Plant Cell 23, 1124–1137. doi: 10.1105/tpc.110.082792
3. Brown, D. M., Zeef, L. A. H., Ellis, J., Goodacre, R., and Turner, S. R. (2005). Identification of novel genes in Arabidopsis involved in secondary cell wall formation using expression profiling and reverse genetics. Plant Cell 17, 2281–2295. doi: 10.1105/tpc.105.031542
4. Brown, S., Clastre, M., Courdavault, V., and O’Connor, S. E. (2015). De novo production of the plant-derived alkaloid strictosidine in yeast. Proc. Natl. Acad. Sci. U.S.A. 112, 3205–3210. doi: 10.1073/pnas.1423555112
5. Busing, R. T., Halpern, C. B., and Spies, T. A. (1995). Ecology of Pacific Yew (Taxus brevifolia) in Western Oregon and Washington. Conserv. Biol. 9, 1199–1207. doi: 10.1046/j.1523-1739.1995.9051189.x-i1
6. Caputi, L., Franke, J., Farrow, S. C., Chung, K., Payne, R. M. E., Nguyen, T.-D., et al. (2018). Missing enzymes in the biosynthesis of the anticancer drug vinblastine in Madagascar periwinkle. Science 360, 1235–1239. doi: 10.1126/science.aat4100
7. Cassan-Wang, H., Goué, N., Saidi, M. N., Legay, S., Sivadon, P., Goffner, D., et al. (2013). Identification of novel transcription factors regulating secondary cell wall formation in Arabidopsis. Front. Plant Sci. 4:189. doi: 10.3389/fpls.2013.00189
8. Chemler, J. A., and Koffas, M. A. (2008). Metabolic engineering for plant natural product biosynthesis in microbes. Curr. Opin. Biotechnol. 19, 597–605. doi: 10.1016/j.copbio.2008.10.011
9. Cimermancic, P., Medema, M. H., Claesen, J., Kurita, K., Wieland Brown, L. C., Mavrommatis, K., et al. (2014). Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters. Cell 158, 412–421. doi: 10.1016/j.cell.2014.06.034
10. Cochrane, F. C., Davin, L. B., and Lewis, N. G. (2004). The Arabidopsis phenylalanine ammonia lyase gene family: kinetic characterization of the four PAL isoforms. Phytochemistry 65, 1557–1564. doi: 10.1016/j.phytochem.2004.05.006
11. de Azevedo Souza, C., Kim, S. S., Koch, S., Kienow, L., Schneider, K., McKim, S. M., et al. (2009). A novel fatty Acyl-CoA Synthetase is required for pollen development and sporopollenin biosynthesis in Arabidopsis. Plant Cell 21, 507–525. doi: 10.1105/tpc.108.062513
12. Dobritsa, A. A., Lei, Z., Nishikawa, S.-I., Urbanczyk-Wochniak, E., Huhman, D. V., Preuss, D., et al. (2010). LAP5 and LAP6 encode anther-specific proteins with similarity to chalcone synthase essential for pollen exine development in Arabidopsis. Plant Physiol. 153, 937–955. doi: 10.1104/pp.110.157446
13. Dobritsa, A. A., Shrestha, J., Morant, M., Pinot, F., Matsuno, M., Swanson, R., et al. (2009). CYP704B1 is a long-chain fatty acid ω-hydroxylase essential for Sporopollenin synthesis in pollen of Arabidopsis. Plant Physiol. 151, 574–589. doi: 10.1104/pp.109.144469
14. Dolan, S. K., O’Keeffe, G., Jones, G. W., and Doyle, S. (2015). Resistance is not futile: gliotoxin biosynthesis, functionality and utility. Trends Microbiol. 23, 419–428. doi: 10.1016/j.tim.2015.02.005
15. Ehlting, J., Büttner, D., Wang, Q., Douglas, C. J., Somssich, I. E., and Kombrink, E. (1999). Three 4-coumarate:coenzyme A ligases in Arabidopsis thaliana represent two evolutionarily divergent classes in angiosperms. Plant J. 19, 9–20. doi: 10.1046/j.1365-313X.1999.00491.x
16. Ferrari, C., Shivhare, D., Hansen, B. O., Pasha, A., Esteban, E., Provart, N. J., et al. (2020). Expression atlas of Selaginella moellendorffii provides insights into the evolution of vasculature, secondary metabolism, and roots. Plant Cell 32, 853–870. doi: 10.1105/tpc.19.00780
17. Geng, P., Zhang, S., Liu, J., Zhao, C., Wu, J., Cao, Y., et al. (2020). MYB20, MYB42, MYB43, and MYB85 regulate phenylalanine and lignin biosynthesis during secondary cell wall formation. Plant Physiol. 182, 1272–1283. doi: 10.1104/pp.19.01070
18. Gou, M., Hou, G., Yang, H., Zhang, X., Cai, Y., Kai, G., et al. (2017). The MYB107 transcription factor positively regulates suberin biosynthesis. Plant Physiol. 173, 1045–1058. doi: 10.1104/pp.16.01614
19. Hew, B., Tan, Q. W., Goh, W., Ng, J. W. X., and Mutwil, M. (2020). LSTrAP-Crowd: prediction of novel components of bacterial ribosomes with crowd-sourced analysis of RNA sequencing data. BMC Biol. 18:114. doi: 10.1186/s12915-020-00846-9
20. Heyndrickx, K. S., and Vandepoele, K. (2012). Systematic identification of functional plant modules through the integration of complementary data sources. Plant Physiol. 159, 884–901. doi: 10.1104/pp.112.196725
21. Hodgson, H., Peña, R. D. L., Stephenson, M. J., Thimmappa, R., Vincent, J. L., Sattely, E. S., et al. (2019). Identification of key enzymes responsible for protolimonoid biosynthesis in plants: opening the door to azadirachtin production. Proc. Natl. Acad. Sci. U.S.A. 116, 17096–17104. doi: 10.1073/pnas.1906083116
22. Hoffmann, L., Besseau, S., Geoffroy, P., Ritzenthaler, C., Meyer, D., Lapierre, C., et al. (2004). Silencing of hydroxycinnamoyl-coenzyme A shikimate/quinate hydroxycinnamoyltransferase affects phenylpropanoid biosynthesis. Plant Cell 16, 1446–1465. doi: 10.1105/tpc.020297
23. Hölzer, M., and Marz, M. (2019). De novo transcriptome assembly: a comprehensive cross-species comparison of short-read RNA-Seq assemblers. GigaScience 8, giz039. doi: 10.1093/gigascience/giz039
24. Hsieh, K., and Huang, A. H. C. (2007). Tapetosomes in Brassica tapetum accumulate endoplasmic reticulum–derived flavonoids and alkanes for delivery to the pollen surface. Plant Cell 19, 582–596. doi: 10.1105/tpc.106.049049
25. Jacobowitz, J. R., and Weng, J.-K. (2020). Exploring uncharted territories of plant specialized metabolism in the postgenomic Era. Annu. Rev. Plant Biol. 71, 631–658. doi: 10.1146/annurev-arplant-081519-035634
26. Jeon, J. E., Kim, J.-G., Fischer, C. R., Mehta, N., Dufour-Schroif, C., Wemmer, K., et al. (2020). A pathogen-responsive gene cluster for highly modified fatty acids in tomato. Cell 180, 176–187.e19. doi: 10.1016/j.cell.2019.11.037
27. Julca, I., Flores, M., Proost, S., Lindner, A.-C., Hackenberg, D., Steinbachova, L., et al. (2020). Comparative transcriptomic analysis reveals conserved transcriptional programs underpinning organogenesis and reproduction in land plants. bioRxiv [Preprint]. doi: 10.1101/2020.10.29.361501
28. Katoh, A., Ohki, H., Inai, K., and Hashimoto, T. (2005). Molecular regulation of nicotine biosynthesis. Plant Biotechnol. 22, 389–392. doi: 10.5511/plantbiotechnology.22.389
29. Kautsar, S. A., Suarez Duran, H. G., Blin, K., Osbourn, A., and Medema, M. H. (2017). PlantiSMASH: automated identification, annotation and expression analysis of plant biosynthetic gene clusters. Nucleic Acids Res. 45, W55–W63. doi: 10.1093/nar/gkx305
30. Keller, N. P. (2019). Fungal secondary metabolism: regulation, function and drug discovery. Nat. Rev. Microbiol. 17, 167–180. doi: 10.1038/s41579-018-0121-1
31. Khandal, H., Singh, A. P., and Chattopadhyay, D. (2020). The microRNA397b-LACCASE2 module regulates root lignification under water and phosphate deficiency. Plant Physiol. 182, 1387–1403. doi: 10.1104/pp.19.00921
32. Kliebenstein, D. J., Osbourn, A., Pamela Ronald, E. C., and Shirasu, K. (2012). Making new molecules – evolution of pathways for novel metabolites in plants. Curr. Opin. Plant Biol. 15, 415–423. doi: 10.1016/j.pbi.2012.05.005
33. Lacombe, E., Hawkins, S., Van Doorsselaere, J., Piquemal, J., Goffner, D., Poeydomenge, O., et al. (1997). Cinnamoyl CoA reductase, the first committed enzyme of the lignin branch biosynthetic pathway: cloning, expression and phylogenetic relationships. Plant J. 11, 429–441. doi: 10.1046/j.1365-313x.1997.11030429.x
34. Lan, X., Chang, K., Zeng, L., Liu, X., Qiu, F., Zheng, W., et al. (2013). Engineering salidroside biosynthetic pathway in hairy root cultures of Rhodiola crenulata based on metabolic characterization of tyrosine decarboxylase. PLoS One 8:e75459. doi: 10.1371/journal.pone.0075459
35. Lau, W., and Sattely, E. S. (2015). Six enzymes from mayapple that complete the biosynthetic pathway to the etoposide aglycone. Science 349, 1224–1228. doi: 10.1126/science.aac7202
36. Lee, H. K., Hsu, A. K., Sajdak, J., Qin, J., and Pavlidis, P. (2004). Coexpression analysis of human genes across many microarray data sets. Genome Res. 14, 1085–1094. doi: 10.1101/gr.1910904
37. Li, D., Baldwin, I. T., and Gaquerel, E. (2015). Navigating natural variation in herbivory-induced secondary metabolism in coyote tobacco populations using MS/MS structural analysis. Proc. Natl. Acad. Sci. U.S.A. 112, E4147–E4155. doi: 10.1073/pnas.1503106112
38. Li, D., Heiling, S., Baldwin, I. T., and Gaquerel, E. (2016). Illuminating a plant’s tissue-specific metabolic diversity using computational metabolomics and information theory. Proc. Natl. Acad. Sci. U.S.A. 113, E7610–E7618. doi: 10.1073/pnas.1610218113
39. Li, S. F., Milliken, O. N., Pham, H., Seyit, R., Napoli, R., Preston, J., et al. (2009). The Arabidopsis MYB5 transcription factor regulates mucilage synthesis, seed coat development, and trichome morphogenesis. Plant Cell 21, 72–89. doi: 10.1105/tpc.108.063503
40. Metsalu, T., and Vilo, J. (2015). ClustVis: a web tool for visualizing clustering of multivariate data using Principal component analysis and heatmap. Nucleic Acids Res. 43, W566–W570. doi: 10.1093/nar/gkv468
41. Miettinen, K., Dong, L., Navrot, N., Schneider, T., Burlat, V., Pollier, J., et al. (2014). The seco-iridoid pathway from Catharanthus roseus. Nat. Commun. 5:3606. doi: 10.1038/ncomms4606
42. Mizutani, M., and Ohta, D. (2010). Diversification of P450 genes during land plant evolution. Annu. Rev. Plant Biol. 61, 291–315. doi: 10.1146/annurev-arplant-042809-112305
43. Morant, M., Jørgensen, K., Schaller, H., Pinot, F., Møller, B. L., Werck-Reichhart, D., et al. (2007). CYP703 is an ancient cytochrome P450 in land plants catalyzing in-chain hydroxylation of lauric acid to provide building blocks for sporopollenin synthesis in pollen. Plant Cell 19, 1473–1487. doi: 10.1105/tpc.106.045948
44. Mutwil, M. (2020). Computational approaches to unravel the pathways and evolution of specialized metabolism. Curr. Opin. Plant Biol. 55, 38–46. doi: 10.1016/j.pbi.2020.01.007
45. Mutwil, M., Klie, S., Tohge, T., Giorgi, F. M., Wilkins, O., Campbell, M. M., et al. (2011). PlaNet: combined sequence and expression comparisons across plant networks derived from seven species. Plant Cell 23, 895–910. doi: 10.1105/tpc.111.083667
46. Mutwil, M., Usadel, B., Schütte, M., Loraine, A., Ebenhöh, O., and Persson, S. (2010). Assembly of an interactive correlation network for the Arabidopsis genome using a novel Heuristic Clustering Algorithm. Plant Physiol. 152, 29–43. doi: 10.1104/pp.109.145318
47. Nett, R. S., Lau, W., and Sattely, E. S. (2020). Discovery and engineering of colchicine alkaloid biosynthesis. Nature 584, 148–153. doi: 10.1038/s41586-020-2546-8
48. Nützmann, H. W., Huang, A., and Osbourn, A. (2016). Plant metabolic clusters – from genetics to genomics. New Phytol. 211, 771–789. doi: 10.1111/nph.13981
49. Öhman, D., Demedts, B., Kumar, M., Gerber, L., Gorzsás, A., Goeminne, G., et al. (2013). MYB103 is required for FERULATE-5-HYDROXYLASE expression and syringyl lignin biosynthesis in Arabidopsis stems. Plant J. 73, 63–76. doi: 10.1111/tpj.12018
50. Paddon, C. J., Westfall, P. J., Pitera, D. J., Benjamin, K., Fisher, K., McPhee, D., et al. (2013). High-level semi-synthetic production of the potent antimalarial artemisinin. Nature 496, 528–532. doi: 10.1038/nature12051
51. Persson, S., Wei, H., Milne, J., Page, G. P., and Somerville, C. R. (2005). Identification of genes required for cellulose synthesis by regression analysis of public microarray data sets. Proc. Natl. Acad. Sci. U.S.A. 102, 8633–8638. doi: 10.1073/pnas.0503392102
52. Philippe, G., Sørensen, I., Jiao, C., Sun, X., Fei, Z., Domozych, D. S., et al. (2020). Cutin and suberin: assembly and origins of specialized lipidic cell wall scaffolds. Curr. Opin. Plant Biol. 55, 11–20. doi: 10.1016/j.pbi.2020.01.008
53. Pourcel, L., Routaboul, J.-M., Kerhoas, L., Caboche, M., Lepiniec, L., and Debeaujon, I. (2005). TRANSPARENT TESTA10 encodes a laccase-like enzyme involved in oxidative polymerization of flavonoids in Arabidopsis seed coat. Plant Cell 17, 2966–2980. doi: 10.1105/tpc.105.035154
54. Proost, S., and Mutwil, M. (2018). CoNekT: an open-source framework for comparative genomic and transcriptomic network analyses. Nucleic Acids Res. 46, W133–W140. doi: 10.1093/nar/gky336
55. Quilichini, T. D., Samuels, A. L., and Douglas, C. J. (2014). ABCG26-mediated polyketide trafficking and hydroxycinnamoyl spermidines contribute to pollen wall exine formation in Arabidopsis. Plant Cell 26, 4483–4498. doi: 10.1105/tpc.114.130484
56. Rao, X., and Dixon, R. A. (2019). Co-expression networks for plant biology: why and how. Acta Biochim. Biophys. Sin. (Shanghai) 51, 981–988. doi: 10.1093/abbs/gmz080
57. Ronan, T., Qi, Z., and Naegle, K. M. (2016). Avoiding common pitfalls when clustering biological data. Sci. Signal. 9, re6. doi: 10.1126/scisignal.aad1932
58. Ruprecht, C., Mendrinna, A., Tohge, T., Sampathkumar, A., Klie, S., Fernie, A. R., et al. (2016). Famnet: a framework to identify multiplied modules driving pathway expansion in plants. Plant Physiol. 170, 1878–1894. doi: 10.1104/pp.15.01281
59. Schilmiller, A. L., Miner, D. P., Larson, M., McDowell, E., Gang, D. R., Wilkerson, C., et al. (2010). Studies of a biochemical factory: tomato trichome deep expressed sequence tag sequencing and proteomics. Plant Physiol. 153, 1212–1223. doi: 10.1104/pp.110.157214
60. Schilmiller, A. L., Stout, J., Weng, J.-K., Humphreys, J., Ruegger, M. O., and Chapple, C. (2009). Mutations in the cinnamate 4-hydroxylase gene impact metabolism, growth and development in Arabidopsis. Plant J. 60, 771–782. doi: 10.1111/j.1365-313X.2009.03996.x
61. Schoch, G., Goepfert, S., Morant, M., Hehn, A., Meyer, D., Ullmann, P., et al. (2001). CYP98A3 from Arabidopsis thaliana is a 3’-hydroxylase of phenolic esters, a missing link in the phenylpropanoid pathway. J. Biol. Chem. 276, 36566–36574. doi: 10.1074/jbc.M104047200
62. Serin, E. A. R., Nijveen, H., Hilhorst, H. W. M., and Ligterink, W. (2016). Learning from co-expression networks: possibilities and challenges. Front. Plant Sci. 7:444. doi: 10.3389/fpls.2016.00444
63. Shi, M.-Z., and Xie, D.-Y. (2014). Biosynthesis and metabolic engineering of anthocyanins in Arabidopsis thaliana. Recent Patents Biotechnol. 8, 47–60. doi: 10.2174/1872208307666131218123538
64. Sibout, R., Proost, S., Hansen, B. O., Vaid, N., Giorgi, F. M., Ho-Yue-Kuang, S., et al. (2017). Expression atlas and comparative coexpression network analyses reveal important genes involved in the formation of lignified cell wall in Brachypodium distachyon. New Phytol. 215, 1009–1025. doi: 10.1111/nph.14635
65. Sundell, D., Mannapperuma, C., Netotea, S., Delhomme, N., Lin, Y. C., Sjödin, A., et al. (2015). The plant genome integrative explorer resource: PlantGenIE.org. New Phytol. 208, 1149–1156. doi: 10.1111/nph.13557
66. Tan, Q. W., Goh, W., and Mutwil, M. (2020). LSTrAP-cloud: a user-friendly cloud computing pipeline to infer coexpression networks. Genes 11:428. doi: 10.3390/genes11040428
67. Tan, Q. W., and Mutwil, M. (2019). Inferring biosynthetic and gene regulatory networks from Artemisia annua RNA sequencing data on a credit card-sized ARM computer. Biochim. Biophys. Acta Gene Regul. Mech. 1863:194429. doi: 10.1016/j.bbagrm.2019.194429
68. Thodey, K., Galanie, S., and Smolke, C. D. (2014). A microbial biomanufacturing platform for natural and semi-synthetic opiates. Nat. Chem. Biol. 10, 837. doi: 10.1038/nchembio.1613
69. Tissier, A. (2012). Glandular trichomes: what comes after expressed sequence tags? Plant J. 70, 51–68. doi: 10.1111/j.1365-313X.2012.04913.x
70. Tohge, T., and Fernie, A. R. (2020). Co-regulation of clustered and neo-functionalized genes in plant-specialized metabolism. Plants (Basel) 9:622. doi: 10.3390/plants9050622
71. Tohge, T., Wendenburg, R., Ishihara, H., Nakabayashi, R., Watanabe, M., Sulpice, R., et al. (2016). Characterization of a recently evolved flavonol-phenylacyltransferase gene provides signatures of natural light selection in Brassicaceae. Nat. Commun. 7:12399. doi: 10.1038/ncomms12399
72. Torrens-Spence, M. P., Fallon, T. R., and Weng, J. K. (2016). “Chapter four – a workflow for studying specialized metabolism in nonmodel eukaryotic organisms,” in Methods in Enzymology Synthetic Biology and Metabolic Engineering in Plants and Microbes Part B: Metabolism in Plants, ed. S. E. O’Connor (Cambridge, MA: Academic Press), 69–97. doi: 10.1016/bs.mie.2016.03.015
73. Tzfadia, O., Diels, T., De Meyer, S., Vandepoele, K., Aharoni, A., and Van De Peer, Y. (2016). CoExpNetViz: comparative co-expression networks construction and visualization tool. Front. Plant Sci. 6:1194. doi: 10.3389/fpls.2015.01194
74. Usadel, B., Obayashi, T., Mutwil, M., Giorgi, F. M., Bassel, G. W., Tanimoto, M., et al. (2009). Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ. 32, 1633–1651. doi: 10.1111/j.1365-3040.2009.02040.x
75. Vaneechoutte, D., and Vandepoele, K. (2019). Curse: building expression atlases and co-expression networks from public RNA-Seq data. Bioinformatics 35, 2880–2881. doi: 10.1093/bioinformatics/bty1052
76. Waese, J., Fan, J., Pasha, A., Yu, H., Fucile, G., Shi, R., et al. (2017). ePlant: visualizing and exploring multiple levels of data for hypothesis generation in plant biology. Plant Cell 29, 1806–1821. doi: 10.1105/tpc.17.00073
77. Warde-Farley, D., Donaldson, S. L., Comes, O., Zuberi, K., Badrawi, R., Chao, P., et al. (2010). The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function. Nucleic Acids Res. 38, W214–W220. doi: 10.1093/nar/gkq537
78. Xu, J., Wang, X., and Guo, W. (2015). The cytochrome P450 superfamily: key players in plant development and defense. J. Integr. Agric. 14, 1673–1686. doi: 10.1016/S2095-3119(14)60980-1
79. Zhang, Z.-B., Zhu, J., Gao, J.-F., Wang, C., Li, H., Li, H., et al. (2007). Transcription factor AtMYB103 is required for anther development by regulating tapetum development, callose dissolution and exine formation in Arabidopsis. Plant J. 52, 528–538. doi: 10.1111/j.1365-313X.2007.03254.x
80. Zhao, Q., and Dixon, R. A. (2011). Transcriptional networks for lignin biosynthesis: more complex than we thought? Trends Plant Sci. 16, 227–233. doi: 10.1016/j.tplants.2010.12.005
81. Zhou, J., Lee, C., Zhong, R., and Ye, Z.-H. (2009). MYB58 and MYB63 are transcriptional activators of the lignin biosynthetic pathway during secondary cell wall formation in Arabidopsis. Plant Cell 21, 248–266. doi: 10.1105/tpc.108.063321

Summary

Keywords

transcriptomics, co-expression, clustering, enrichment, online, metabolism

Citation

Delli-Ponti R, Shivhare D and Mutwil M (2021) Using Gene Expression to Study Specialized Metabolism—A Practical Guide. Front. Plant Sci. 11:625035. doi: 10.3389/fpls.2020.625035

Received

02 November 2020

Accepted

30 November 2020

Published

12 January 2021

Volume

11 - 2020

Edited by

Meng Xie, Brookhaven National Laboratory, United States

Reviewed by

Xu Lu, China Pharmaceutical University, China; Tao Yao, Oak Ridge National Laboratory (DOE), United States

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Marek Mutwil, mutwil@ntu.edu.sg; mr_mutwil@hotmail.com

This article was submitted to Plant Systems and Synthetic Biology, a section of the journal Frontiers in Plant Science

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
