Shikimate and Phenylalanine Biosynthesis in the Green Lineage

The shikimate pathway provides carbon skeletons for the aromatic amino acids l-tryptophan, l-phenylalanine, and l-tyrosine. It is a high flux bearing pathway and it has been estimated that greater than 30% of all fixed carbon is directed through this pathway. These combined pathways have been subjected to considerable research attention due to the fact that mammals are unable to synthesize these amino acids and the fact that one of the enzymes of the shikimate pathway is a very effective herbicide target. However, in addition to these characteristics these pathways additionally provide important precursors for a wide range of important secondary metabolites including chlorogenic acid, alkaloids, glucosinolates, auxin, tannins, suberin, lignin and lignan, tocopherols, and betalains. Here we review the shikimate pathway of the green lineage and compare and contrast its evolution and ubiquity with that of the more specialized phenylpropanoid metabolism which this essential pathway fuels.


INTRODUCTION
The shikimate pathway is closely interlinked with those of the aromatic amino acids (L-tryptophan, l-phenylalanine, and Ltyrosine) and in land plants bears very high fluxes with estimates of the amount of fixed carbon passing through the pathway varying between 20 and 50% (Weiss, 1986;Corea et al., 2012;Maeda and Dudareva, 2012). Considerable research focus has been placed on this pathway since the aromatic amino acids are not produced by humans and monogastric livestock and are therefore an important dietary component (Tzin and Galili, 2010). Furthermore, one of the enzymes of the pathway -5-enolpyruvalshikimate-3phosphate synthase (EPSP) -is one of the most widely employed herbicide target sites (see, Duke and Powles, 2008). Moreover, as we have recently described, plant phenolic secondary metabolites and their precursors are synthesized via the pathway of shikimate biosynthesis and its numerous branchpoints (Tohge et al., 2013). The shikimate pathway is highly conserved being found in fungi, bacteria, and plant species wherein it operates in the biosynthesis of not just the three aromatic amino acids described above but also of innumerable aromatic secondary metabolites such as alkaloids, flavonoids, lignins, and aromatic antibiotics. Many of these compounds are bioactive as well as playing important roles in plant defense against biotic and abiotic stresses and environmental interactions (Hamberger et al., 2006;Maeda and Dudareva, 2012), and as such are highly physiologically important. It is estimated that under normal conditions as much as 20% of the total fixed carbon flows through to shikimate pathway (Ni et al., 1996), with greater carbon flow through the pathway under times of plant stress or rapid growth (Corea et al., 2012). Given its importance it is perhaps not surprising that all members of biosynthetic genes and corresponding enzymes involved in shikimate pathway have been characterized in model plants such as Arabidopsis. Cross-species comparison of the shikimate biosynthetic enzymes has revealed that they share sequence similarity, divergent evolution, and commonality in reaction mechanisms (Dosselaere and Vanderleyden, 2001). However, all other species vary considerably from fungi which has evolved a complex system with a single pentafunctional polypeptide known as the AroM complex which performs five consecutive reactions (Lumsden and Coggins, 1977;Duncan et al., 1987). In this review we will summarize current knowledge concerning the genetic nature of this pathway focusing on cross-species comparisons bridging a wide range of species including algae (Chlamydomonas reinhardtii, Volvox carteri, Micromonas sp., Ostreococcus tauri, Ostreococcus lucimarinus), moss (Selaginella moellendorffii, Physcomitrella patens), monocots (Sorghum bicolor, Zea mays, Brachypodium distachyon, Oryza sativa ssp. japonica and Oryza sativa ssp. indica), and dicots (Vitis vinifera, Theobroma cacao, Carica papaya, Arabidopsis thaliana, Arabidopsis lyrata, Populus trichocarpa, Ricinus communis, Manihot esculenta, Malus domestica, Fragaria vesca, Glycine max, Lotus japonicus, Medicago truncatula) species (Table 1). Finally, we compare and contrast the evolution of this pathway with that of the more specialized pathways of phenylpropanoid biosynthesis.

SHIKIMATE BIOSYNTHESIS AND PHENYLALANINE DERIVED SECONDARY METABOLISM IN PLANTS
Given that phenolic secondary metabolites which are derived from phenylalanine via shikimate biosynthesis are widely distributed in plants and other eukaryotes, genes encoding shikimate biosynthetic enzymes are generally highly conserved in nature. Eight and two reactions are involved in shikimate and phenylalanine biosynthesis, respectively. Both members of all gene families and the corresponding biosynthetic enzymes involved in these www.frontiersin.org pathways have been characterized in model plants such as Arabidopsis ( Figure 1A). In contrast, phenolic secondary metabolites derived from phenylalanine display considerable species-specific distribution with the phenolic secondary metabolites have been found in plant kingdom such as coumarin derivatives, monolignal, lignin, spermidin derivatives, flavonoid, tannin being present in specific families within the green lineage ( Figure 1B). This diversity has arisen by the action of diverse evolutionary strategies for example gene duplication and cis-regulatory evolution in order to adapt to prevailing environmental conditions. Given their species-specific distribution, the genes involved in plant phenolic secondary metabolism such as phenylammonia-lyase (PAL), polyketide synthase (PKS), 2-oxoglutarate-dependent deoxygenases (2ODDs), and UDP-glycosyltransferases (UGTs) are frequently used as case studies of plant evolution (Tohge et al., 2013). Despite the fact that shikimate-phenylalanine biosynthetic genes are well conserved in all species including algae species, phenolic secondary metabolism related orthologous genes were not detected in all algae species ( Table 2, Tohge et al., 2013). This result suggests a considerably more ancient origin of the shikimatephenylalanine pathways. In the next sections, we will discuss the evolution of shikimate-phenylalanine pathways focusing on crossspecies comparisons for each gene encoding on of the constituent enzymes of either pathway.

3-DEOXY-D-ARABINO-HEPTULOSONATE 7-PHOSPHATE SYNTHASE
The first enzymatic step of the shikimate pathway, 3-deoxy-D-arabino-heptulosonate 7-phosphate synthase (DAHPS), catalyzes an aldol condensation of phosphoenolpyruvate (PEP), and D-erythrose 4-phosphate (E4P) to produce 3-deoxy-D-arabinoheptulosonate 7-phosphate (DAHP) (Figure 1). According to their protein structure, DAHPSs can be clustered into two distinct homology classes. The microbe derived class I DAHPS contain a bifunctional chorismate mutase (CM)-DAHPS domains, for that reason microbial DAHPSs, for example, E. coli (AroF, G, and H) and S. cerevisiae (Aro3 and 4), are classified as class I DAHPSs. By contrast, class II DAHPS were previously thought to be present only in plant species, but have subsequently been reported in certain microbes such as Streptomyces coelicolor, Streptomyces rimosus, and Neurospora crassa (Bentley, 1990;Maeda and Dudareva, 2012  the evolutionary link between the two classes which are evolved from primitive unregulated member of class II DAHPS (Wu and Woodard, 2006). Class II plant DAHPSs have been reported from carrot roots (Suzich et al., 1985) and potato cell culture (Pinto et al., 1986;Herrmann and Weaver, 1999). DAHPS is encoded by three genes in the Arabidopsis genome (AtDAHPS1, AT4G39980; AtDAHPS2, At4g33510; AtDAHPS3, At1g22410). Orthologous gene search queries using the Arabidopsis DAHPSs, revealed a single gene in algae species (Chlamydomonas reinhardtii, Volvox carteri, Micromonas sp., and Ostreococcus tauri) and Lotus japonica but two to eight isoforms in other higher plant species ( Table 2). AtDAHPS1-type and AtDAHPS2 type genes display differential expression in Arabidopsis thaliana, Solanum lycopersicum, and Solanum tuberosum (Maeda and Dudareva, 2012). AtDAHPS1-type genes, which are additionally subject to redox regulation by the ferredoxin-thioredoxin system, exhibit significant induction by wounding and pathogen infection (Keith et al., 1991;Gorlach et al., 1995;Maeda and Dudareva, 2012), whereas AtDAHPS2 type genes display constitutive expression (Gorlach et al., 1995). A phylogenetic analysis of DAHPS genes reveals four major clades, (i) a microphyte clade, (ii) a bryophyte duplication clade, (iii) monocot and dicot woody species clade, (iv) a AtDAHPSs clade (Figure 2Aa). Furthermore, major clade iv has four sub-groups, (iv-a) AtDAHPS2 group, (iv-b) monocot, (iv-c) AtDAHPS1 group and (iv-d) AtDAHP3 group. This result indicates that the constitutively expressed AtDAHPS1 and the stress responsive AtDAHPS 3 type genes display well conserved sequence between species (clade iv-c and iv-d), whereas the second constitutively expressed AtDAHPS2 type genes are clearly separated between monocot and dicot species (clade iv-a).

3-DEHYDROQUINATE SYNTHASE
The second step of the shikimate pathway is catalyzed by 3dehydroquinate synthase (DHQS), an enzyme which promotes www.frontiersin.org    the intramolecular exchange of the DAHP ring oxygen with carbon 7 to convert DAHP into 3-dehydroquinate. Unlike the fungal situation detailed above, the plant DHQS gene is monofunctional and only found as a single copy in all species with the exception Glycine max which harbors two genes in its genome (Figure 2Ab). Phylogenetic analysis of DHQS genes reveals three major clades consisting of (i) microphyte (ii) bryophyte, (iii) monocot, (iv) Brassicaceae, and (v) dicot species. Intriguingly, by contrast to other shikimate biosynthetic genes, gene expression of DHQS gene is not well correlated to phenylpropanoid production in Arabidopsis (Hamberger et al., 2006).

3-DEHYDROQUINATE DEHYDRATASE/SHIKIMATE DEHYDROGENASE
3-Deoxy-d-arabino-heptulosonate 7-phosphate is converted to 3-dehydroquinate by the bifunctional enzyme 3-dehydroquinate dehydratase/shikimate dehydrogenase (DHQD/SD), which catalyzes firstly the dehydration of DAHP to 3-dehydroshikimate and consequently the reversible reduction of this intermediate to shikimate using NADPH as co-factor. DHQD/SD exists in three forms; bacterial specific class I shikimate dehydrogenases (AroE type), class II shikimate/quinate dehydrogenases (YdiB type), and class III of shikimate dehydrogenase-like (SHD-l type) (Michel et al., 2003;Singh et al., 2005). In plants class IV, enzymatic activity of DHQD is 10 times higher than SD activity indicating that the amount of 3-dehydroshikimate will be more than sufficient to support flux through the shikimate pathway (Fiedler and Schultz, 1985). This bifunctional enzyme plays an important role in regulating metabolism of several phenolic secondary metabolic pathways (Bentley, 1990;Ding et al., 2007). In general, seed plants contain a single DHQD/SD gene which contains a sequence encoding a plastic transit peptide in their genome (Maeda et al., 2011, Table 2). However, an exception to this statement is Nicotiana tabacum which contains two genes in its genome. Intriguingly, silencing of NtDHD/SHD-1 results strong growth inhibition and reduction of the level of aromatic amino acids, chlorogenic acid, and lignin contents (Ding et al., 2007), however, a second cytosolic isoform can compensate for the production of shikimate but not at the phenotypic level. On a more general basis phylogenetic analysis reveals that microphytes also contain a low number of DHQD/SD genes (between one and two), whilst clear separation between (i) the microphyte clade, (ii) bryophyte clade, (iii) monocot clade, (iv) woody species-specific tandem gene duplication clade, and (v) dicot clades could be observed (Figure 2Ac; Table 2). Interestingly, the observation of the woody species-specific tandem gene duplication clade suggests that these species evolved after DHQD/SD gene duplication. The cytosolic localization of NtDHD/SHD-2 is intriguing since the presence of DAHP synthase, ESPS synthase and CM isoforms lacking N-terminal plastid targeting sequences has been reported (d 'Amato, 1984;Mousdale and Coggins, 1985;Ganson et al., 1986). Furthermore, the findings that both ESPS synthase and shikimate kinase (SK) are active even when they retain their target sequences (Dellacioppa et al., 1986;Schmid et al., 1992) suggests that they could also potentially be constituents of a cytosolic pathway. Finally, experiments in which isolated and highly pure mitochondria were supplied with 13 C labeled glucose to investigate the binding of the cytosolic isoforms of glycolysis (Giege et al., 2003) also revealed 13 C enrichment in shikimate (Sweetlove and Fernie, 2013), indicating that a full cytosolic pathway is likely also in this species.

SHIKIMATE KINASE
The fifth reaction of the shikimate pathway is catalyzed by SK which catalyzes the ATP-dependent phosphorylation of shikimate to shikimate 3-phospate (S3P). E. coli has two SKs, one of class I (AroL type) and one of II (AroK type) which share only 30% sequence identity (Griffin and Gasson, 1995;Whipp and Pittard, 1995;Herrmann and Weaver, 1999). In plants, different numbers of SK isoforms are found in several species; only one in green algae, lycophytes, and bryophytes but between one and three in monocot and dicot plants ( Table 2). A phylogenetic analysis of SK genes presents five major clades consisting of (i) microphyte, (ii) bryophyte, (iii) dicot woody species-specific clade, (iv) monocot clade, and (v) dicot species clade (Figure 2Ad). Anaylsis of the SK protein of Spinacia olerancea revealed that it was modulated by energy status and is therefore similar to bacterial SK protein and other ATP-utilizing enzymes (Pacold and Anderson, 1973;Huang et al., 1975;Schmidt et al., 1990). For this reason it has recently been postulated that SK may link to energy requiring shikimate pathway to the cellular energy balance (Maeda and Dudareva, 2012), however, direct experimental support for this hypothesis is currently lacking. In Arabidopsis, homologous genes named SKL1 and SKL2, which are functionally required for chloroplast biogenesis have been demonstrated to have arisen from SK gene duplication (Fucile et al., 2008). SKL1 and SKL2 orthologs have been found in several seed plant species, but not in green algae ( Table 2).

5-ENOLY PYRUVYLSHIKIMATE 3-PHOSPHATE SYNTHASE
The 5-enolypyruvylshikimate 3-phosphate synthase (EPSPS, 3phosphoshikimate 1-carboxyvintltransferase) is the sixth step and here a second PEP is condensed with S3P to form 5enolpyruvylshiukimate 3-phosphate (EPSP). Since EPSPS is the only known target for the herbicide glyphosate (Steinrucken and Amrhein, 1980), isoforms of this enzyme are often classified according to their sensitivity of glyphosate, glyphosate sensitive EPSPS class I is present in bacteria and plant species, whilst glyphosate insensitive EPSPS class II which has been reported in certain bacteria such as Agrobacterium (Fucile et al., 2011). In plants, different number of EPSPS isoforms is found in several species; only a single isoform in green algae, lycophytes, and bryophytes, but either one or two are found in monocot and dicot species ( Table 2). Phylogenetic analysis of EPSPS genes revealed, atypically for genes associated with shikimate metabolism, that five major groups could be observed; (i) microphyte, (ii) bryophyte, (iii) Brassicaceae specific clade, (iv) monocot species, and (v) dicot species clade (Figure 2Ae). There are clear indications that duplicated EPSPS genes in Arabidopsis, apple, grapevine, soybean, and poplar are the result of independent duplication events within their lineages with both copies being maintained in Arabidopsis (Hamberger et al., 2006), however, the reason for the unique divergence in this gene of the pathway is currently unclear.

CHORISMATE SYNTHASE
Chorismate, the final product of the shikimate pathway, is subsequently formed by chorismate synthase (CS) which catalyzes the trans-1,4 elimination of phosphate from EPSP. CSs are categorized within one of two functional groups (i) fungal type bifunctional CS which are associated with NADPH-dependent flavin reductase or (ii) bacterial and plant type monofunctional CSs (Schaller et al., 1991;Maeda and Dudareva, 2012). The reaction catalyzed by CS requires flavin mononucleotide (FMN) and its overall reaction is redox neutral (Ramjee et al., 1991;Macheroux et al., 1999;Maclean and Ali, 2003). The FMN represents supplies an electron donor for EPSP which facilitates the cleavage of phosphate. The first cloned plant CS gene was that from C. sempervirens (Schaller et al., 1991) which contains a sole CS in its genome. Given that this gene has a 5 plastid import signal sequence, these results indicate that there may be no CS outside of the plastid this species. Surveying other species revealed that one to two CS genes were present in green algae, lycophytes, and bryophytes as well as dicot specie but that one to three are present in the genomes of apple and leguminous species ( Table 2). A phylogenetic analysis of CS genes reveals three major clades constituted by (i) microphyte, (ii) monocot, (iii) dicot species (Figure 2Af).

CHORISMATE MUTASE
Chorismate mutase catalyzes the first step of phenylalanine and tyrosine biosynthesis and additionally represents a key step of toward the branch split of tryptophan biosynthesis. CM catalyzes the transformation of chorismate to prephenate via a Claisen rearrangement. The bacterial minor CM proteins (AroQ type, class I CM) display monofunctional enzymatic activity whilst several bifunctional CMs such as CM-PDT, CM-PDH, and CM-DAHP have been additionally been found in fungi and bacteria (class II CM, Euverink et al., 1995;Romero et al., 1995;Chen et al., 2003;Baez-Viveros et al., 2004). In spite of the fact of only one CM gene is present in algae and lycophyte genomes, more a single gene copy (two to five) are found in bryophytes as well as monocot and dicot species ( Table 2). In seed plants, the CM1 bears a putative plastid transit peptide, but CM2 does not and is additionally usually insensitive to allosteric regulation by aromatic amino acids (Benesova and Bode, 1992;Eberhard et al., 1996;Maeda and Dudareva, 2012). Several plant species, especially dicot plants, have an additional CM3 family gene which displays high sequence similarity to CM2 yet bears a putative plastid transit peptide. For example, Arabidopsis has three isozymes named AtCM1 (At3g29200), AtCM2 (At5g10870), and AtCM3 (At1g69370) (Mobley et al., 1999;Tzin and Galili, 2010). Phylogenetic analysis of the CS genes reveals three major clades constituting of (i) AtCM2 clade, (ii) microphyte and bryophyte clade, and (iii) AtCM2 clade (Figure 2Ba). Additionally, clade iii shows two sub-groups, (iii-a) AtCM3 sub-groups and (iii-b) AtCM1 sub-group (Figure 2Ba) (Eberhard et al., 1996). In spite of that the CM2 sub-group contains all species of seed plants, monocot species are not contained into AtCM3 sub-group.
Recently the importance of CM has been extended beyond intracellular metabolism, In Zea mays, the chorismate mutase Cmu1 secreted by Ustilago maydis, a widespread pathogen characterized by the development of large plant tumors and commonly known as smut, is a virulence factor. The uptake of the Ustilago CMu1 protein by plant cells allows rerouting of plant metabolism and changes the metabolic status of these cells via metabolic priming (Djamei et al., 2011). It now appears that secreted CMs are found in many plant-related microbes and this form of host www.frontiersin.org manipulation would appear to be a general weapon in the arsenal of plant pathogens.

PREPHENATE AMINOTRANSFERASE AND AROGENATE DEHYDRATASE
Prephenate aminotransferase (PAT) and arogenate dehydratase (ADT) catalyze the final steps for production of phenylalanine. Whilst ADT was first cloned in 2007 (Cho et al., 2007;Huang et al., 2010), it is only more recently that PAT was cloned. Papers published in 2011 identified PAT in Petunia hybrid, Arabidopsis thaliana, and Solanum lycopersicum (Dal Cin et al., 2011;Maeda et al., 2011) and established that it directs carbon flux from prephenate to arogenate but also that it is strongly and coordinately upregulated with genes of primary metabolism and phenylalanine derived flavor volatiles. In plant species, a different number of PAT isoforms have been found. Although green algae only contain single PAT and ADT genes, monocot species have between one and two PATs and between two and four ADTs whilst dicot plants genomes contain the same number of PATs but two to eight ADTs ( Table 2). Phylogenetic analysis of PAT genes shows three major clades of (i) microphyte, (ii) monocot, and (iii) dicot species (Figure 2Bb).

GENES INVOLVED IN PLANT PHENOLIC SECONDARY METABOLISMS
Phenolic secondary metabolism displays an immense chemical diversity due to the evolution of enzymatic genes which are involved in the various biosynthetic and decorative pathways. Such variation is caused by diversity and redundancy of several key genes of phenolic secondary metabolism such as PKSs, cytochrome P450s (CYPs), Fe 2+ /2-oxoglutarate-dependent dioxygenases (2ODDs), and UDP-glycosyltransferases (UGTs). On the other hand, there are other general phenylpropanoid related biosynthetic genes, phenylalanine ammonia-lyase (PAL), cinnamate 4-hydroxylase (C4H), and 4-coumarate:coenzyme A ligase (4CL), which are required in order to differentiate various classes of phenolic secondary metabolism. All of these core genes encode important enzymes which activate a number of hydroxycinnamic acids to provide precursors for the biosynthesis of lignins, monolignals, and indeed all other major phenolic secondary metabolites in higher plants (Lozoya et al., 1988;Allina et al., 1998;Hu et al., 1998;Ehlting et al., 1999;Lindermayr et al., 2002;Hamberger and Hahlbrock, 2004). Since phenolic secondary metabolism display considerable species-specificity, investigation of the genes encoding the responsible biosynthetic enzymes are frequently used as an example of chemotaxonomy for understanding plant evolution. However, considering the evolution of these genes in isolation is rather restrictive a deeper understanding is provided by combining this with investigation of the evolution of the shikimate-phenylalanine biosynthetic genes in the green lineage.

CONCLUSION
During the long evolutionary period covered from aquatic algae to land plants, plants have adapted to the environmental niches with the evolutionary strategies such as gene duplication and convergent evolution by the filtration of natural selection. Genes of plant shikimate biosynthesis have evolved accordingly (Figure 3).
In this review, we demonstrated that biosynthetic genes of aromatic amino acid primary metabolism are well conserved between algae and all land plants. However, in contrast to algae species which have neither isoforms nor duplicated genes in their genomes, all land plants harbor gene duplications including tandem gene duplications which are particularly prominent in the cases of DAHPS, DHQD/SD, CS, CM, and ADT ( Figure 3A; Table 2). Our phylogenetic analysis revealed clear separation between algae, monocots, dicots, woody species, and leguminous plants. Analysis of the presence and copy number of key genes across these species gives several hints as to how to improve our understanding of the scaffold from which these genes have evolved. However, the exact evolutionary pressures on genes of shikimate biosynthesis including the unique occurrence of the Arom complex will require considerable further studies. That said it is intriguing to compare and contrast biosynthetic genes of those downstream of them in the production of plant phenolics ( Figure 3B). Interestingly, shikimate pathway genes are ubiquitous across the green lineage whilst this cannot be said for all downstream genes of phenylpropanoid biosynthesis. Furthermore, there is a much greater gene duplication within phenylpropanoid than shikimate biosynthesis ( Figure 3A; Table 2). This fact also reflected in the level of chemical diversity of the respective pathways with the essentiality of the shikimate pathway preventing much diversity, but phenylpropanoid species often being redundant in function to one another. It would seem likely that the phenylpropanoid pathway initially arose via mutations accumulating in the shikimate pathway genes. However, whilst these were potentially beneficial in land plants for reasons we discuss in our recent review of these compounds (Tohge et al., 2013) they do not appear to share the essentiality of shikimate across the entire green lineage.