Vascular gene expression: a hypothesis

The phloem is the conduit through which photoassimilates are distributed from autotrophic to heterotrophic tissues and is involved in the distribution of signaling molecules that coordinate plant growth and responses to the environment. Phloem function depends on the coordinate expression of a large array of genes. We have previously identified conserved motifs in upstream regions of the Arabidopsis genes, encoding the homologs of pumpkin phloem sap mRNAs, displaying expression in vascular tissues. This tissue-specific expression in Arabidopsis is predicted by the overrepresentation of GA/CT-rich motifs in gene promoters. In this work we have searched for common motifs in upstream regions of the homologous genes from plants considered to possess a “primitive” vascular tissue (a lycophyte), as well as from others that lack a true vascular tissue (a bryophyte), and finally from chlorophytes. Both lycophyte and bryophyte display motifs similar to those found in Arabidopsis with a significantly low E-value, while the chlorophytes showed either a different conserved motif or no conserved motif at all. These results suggest that these same genes are expressed coordinately in non-vascular plants; this coordinate expression may have been one of the prerequisites for the development of conducting tissues in plants. We have also analyzed the phylogeny of conserved proteins that may be involved in phloem function and development. The presence of CmPP16, APL, FT, and YDA in chlorophytes suggests the recruitment of ancient regulatory networks for the development of the vascular tissue during evolution while OPS is a novel protein specific to vascular plants.


INTRODUCTION
The vascular tissue played an essential role in the adaptation of plants to land, allowing the great diversification of tracheophytes, plants harboring conducting tissues. While non-vascular land plants are also quite diverse and include species that have successfully colonized a variety of niches, the sheer number of tracheophyte species as well as its diversity in geographical distribution, shape and size is overwhelming. This in no small part has resulted from this evolutionary innovation.
The evolution of specialized cell types that gave rise to conducting tissues allowed the distribution of nutrients to all plant organs, which evidently also led to cell and tissue specialization, or, as is also termed, a division of labor. The vascular tissue gave rise more recently to two well-defined transport systems within plants, the xylem and the phloem. While the first one has a direct role in the delivery of water and mineral nutrients from roots to aerial tissues, the phloem is involved in the transport of fixed carbon as well as other nutrients from photosynthetic to heterotrophic tissues. In addition, a wealth of evidence indicates that the vasculature functions in signaling between distant tissues (Lucas et al., 2013). Indeed, on one hand, several responses to environmental cues, as well as to a genetic program, imply such long-distance signaling. For example, response to drought stress is regulated by abscisic acid (ABA) transport from roots to shoots via the xylem (Sauter et al., 2001); the control of nodule number in legumes results from the transport of a signaling molecule from leaves to roots, presumably through the phloem (Krusell et al., 2002). Flower induction, the paradigm of phloem interorgan signaling, which has been extensively reviewed (Corbesier et al., 2007;Lin et al., 2007;Tamaki et al., 2007), involves the transport from photosynthetic leaves to the shoot apex of a protein, FLOWERING LOCUS T (FT; and possibly also RNA). Thus, it is clear that the vascular tissue, and in particular the phloem, the focus of this work, plays an essential role in plant adaptation.
Plants include the tracheophytes, chlorophytes, (single-celled and colonial taxa) and bryophytes. Although obviously the extant representatives of chlorophytes and non-vascular bryophytes cannot by any means be considered primitive or less evolved than vascular plants, it can be argued that they share some primitive features common to all plants ancestors, such as unicellularity or less modified cell types. These can also illustrate the likely steps (not necessarily in chronological order, and most probably occurring simultaneously and evolving independently in different plant lineages) that gave rise to modern vascular plants, evidently starting with the evolution of multicellularity. Next, specialization occurred in such manner that originated heterotrophic cells, including those that were able to absorb mineral nutrients and water from soil as well as those that gave rise to reproductive tissue (Lucas et al., 2013). It can be envisaged that other events in the early evolution of land plants involved the establishment of novel developmental programs that resulted in sieve cells (as in gymnosperms) and eventually in sieve elements on one hand, and in vessel elements on the other. The genetic networks underlying such processes have been intensely studied, more so in the case of xylem differentiation, in which case a study using comparative genomics revealed that xylem transcriptomes were more conserved during evolution than other tissues transcriptomes in vascular plants, being the functional domains of genes specially conserved pointing to the presence of an ancestral xylem transcriptome; also, a phylogenetic analysis showed that evolution of xylem transcriptome follows the branch divergence patterning than plant species (Li et al., 2010). Less is known on the pathways leading to Companion Cell-Sieve Element (CC-SE) differentiation, although recent work has helped identify key proteins involved in such process.
Efforts have been made to identify genes involved in the transition from unicellular algae to multicellular land plants as well as the transition from non-vascular to vascular plants, uncovering that most of the genes involved in vasculature formation and differentiation were already present in non-vascular plants; so the evolution of vasculature involved, additionally to some innovations, the co-option and integration of ancestral developmental pathways (Banks et al., 2011).
It is likely that reprogramming the expression of certain genes must have been important during the evolution of the vasculature. In particular, it is conceivable that the evolution and development of new structures or tissues in any organism necessarily involves the coordinated expression of a set of genes, both temporally and spatially.
In the present work, we addressed the evolution of vascular gene expression by examining regulatory upstream regions of genes from various species, the homologs of which in pumpkin encode transcripts found in pumpkin phloem sap exudates, ranging from chlorophytes to monocots and dicots, but also including a plant lacking true vascular tissue, and another one that displays "primitive" conducting tissues. We selected these genes that have the highest number of ESTs found in phloem sap exudates, although it is clear that these may be expressed in other tissues in addition to phloem. Furthermore, the upstream regions of these genes from Arabidopsis (termed SETP, for Sieve Element Transcript Promoters set) harbor an overrepresented common motif, directing gene expression in the vasculature in mature leaves (Ruiz-Medrano et al., 2011). It must be mentioned that several of the analyzed genes from Arabidopsis have not been characterized in detail and thus the transcriptional start site is not known. The present work is based on some assumptions: (1) Not all phloem sap transcripts are necessarily transported through the phloem, since some may originate from neighboring damaged cells; however, it is likely that most are synthesized in phloem cambium, CC or phloem parenchyma and thus reflect gene activity in the phloem, or at least in vascular tissues, as evidenced by in situ hybridization of pumpkin phloem sap transcripts and histochemical analysis of plants expressing GUS fusions to upstream regions of the Arabidopsis gene homologs for these transcripts (Ruiz-Medrano et al., 1999a,b;Xoconostle-Cázares et al., 1999;Ruiz-Medrano et al., 2007. (2) Gene promoters harboring common motifs are expressed in a coordinated fashion; the more common motifs they share the coordination is tighter (spatially and/or temporally; see for example Harmer et al., 2000), although this should not be always necessarily the case. (3) The closest homolog to the corresponding pumpkin gene should be expressed in a similar fashion, i.e., will correspond to a gene also expressed in vascular tissue. This assumption is especially crucial when dealing with members of large gene families, not necessarily sharing a common expression pattern. (4) While the upstream regions of the analyzed genes are termed promoters, several have been poorly characterized and therefore may include 5 untranslated regions (UTR). Therefore, it is possible that several motifs found in Arabidopsis lie within the 5 UTR of these genes (Ruiz-Medrano et al., 2011).
There is an implicit limitation in this approach, and it is that the analyzed sequences belong to taxa whose genomes have been deciphered. These taxa correspond to plants with basic and/or economic importance; thus, some of them may not be representative from an evolutionary viewpoint.
The predicted phylogeny of some key proteins involved in phloem function and differentiation was also analyzed. The results suggest that most dicot and some monocot genes expressed in vascular tissue share GA/CT motifs, as found in Arabidopsis (Ruiz-Medrano et al., 2011). Interestingly, no such motifs were found in chlorophytes, and at least in one of those analyzed, no common motif could be discerned in this gene promoter set. 51 SETPHs were found in chlorophytes but not in all analyzed algal species, suggesting an ancient function of these genes not related with vascular expression (Figure 5, Table 2) Finally, the dendrograms suggest that phloem evolution occurred by the recruitment of preexisting genes involved in certain regulatory networks, as well as the appearance of novel genes altogether.

MATERIALS AND METHODS
The sequences analyzed are described in a previous work (Ruiz-Medrano et al., 2011). These correspond to genes for transcripts that have been found in pumpkin phloem sap exudates. It is reasonable to assume that the homologs from extant taxa are expressed in a similar manner, and that phloem sap transcripts result from the activity of gene promoters in the CC or neighboring cell types.

SEQUENCES
Upstream sequences from extant plant taxa were retrieved from phytozome (http://phytozome.net/). First, the protein sequences from Arabidopsis were obtained through the Arabidopsis Information Resource (TAIR; http://www.arabidopsis.org/). Afterwards, a BLAST search was carried out using the corresponding Arabidopsis protein sequence as query into the proteome database of interest; afterwards the upstream 1 kb sequences were obtained for each protein homolog from the data set (see Tables 1, 2, S1). 5 UTRs were included in case the transcriptional start site was not known for the gene in question.

GENE LIST
The Functional categorization was obtained from TAIR. The tissue specific gene expression was obtained form GENEVESTIGATOR (www.genevestigator.com/).

PROMOTER ANALYSIS
The upstream sequences were then compared with a probabilistic approach, using the Multiple EM for Motif Elicitation method (MEME; http://meme.nbcr.net/meme/; Bailey and Elkan, 1994) with the following settings: a maximum of 3 motifs per promoter, and 6-10 bases motif length. Also, the analyzed sequences were shuffled to provide for control sequences. The common motifs found for each promoter set were graphically represented using the Weblogo program (http://weblogo.threeplusone.com/create.cgi; Crooks et al., 2004).

PHYLOGENETIC ANALYSIS
Protein sequences for homologs of pumpkin CmPP16, and Arabidopsis FT, APL, Octopus and YDA were obtained from Phytozome. Protein sequences were retrieved from The Arabidopsis Information Resource database (www.arabidopsis.com). BLAST analysis was done in the Phytozome database (www.phytozome.net) and the most representative proteins in each taxa were selected. Aminoacid sequences were aligned using CLUSTAL X2 (Larkin et al., 2007) and edited manually with SEAVIEW4 program (Gouy et al., 2010). Maximum-likelihood and Neighbor-Joining phylogenies were generated form the alignments in MEGA5 (Tamura et al., 2011). The substitution matrix P-distance (Nei and Kumar, 2000) was used for Neighbor-joining and Dayhoff model (Schwarz and Dayhoff, 1979) for Maximum-likelihood reconstruction.
Pairwise deletion Gaps/missing data treatment was chosen for Neighbor-Joining method and Partial deletion with a site coverage cutoff of 95% for Maximum-likelihood method. An heuristic search was used for Maximum-likelihood initial tree as follows: maximum parsimony method was used when the number of common sites was 100 or less than one fourth of the total number of sites; otherwise, the BIONJ method with MCL distance matrix was used. Replicates consisting in 1000 bootstraps were used for the statistical support of all phylogenetic trees.

MOST EMBRYOPHYTES SHARE AN OVERREPRESENTED GA/CT MOTIF IN UPSTREAM REGIONS OF SETP HOMOLOGS (SETPHs)
It was assumed that the closest homologs to genes expressed in vascular tissue in Arabidopsis would have a similar expression pattern in other species. Therefore, a BLASTP search was carried out for each of the plant species for which its genome has been sequenced. Then, the corresponding 1 kb upstream region was retrieved. The list of 57 Arabidopsis genes and their promoters, used to search for their homologs in other plant species is shownt These genes showed overlapping functions, according to their GO, of which 26 (45%) encoded kinases, 20 (35%) for nucleotide binding proteins and 14 (25%) for transcription factors; also, in most of them phloem from different organs is the tissue in which tgese genes are generally expressed at higher levels (data not shown), according to the Genevestigator database (Table 1). These were termed SETPH, based on the nomenclature described before, for Sieve Element Transcript gene Promoter Homolog (Ruiz-Medrano et al., 2011). There is information on the vascular expression of some of these genes, such as At1g34260, At1g63700, At1g14205, At5g65210, At1g19220, At3g03770, At3g55470, and At5g66080 (Ruiz-Medrano et al., 2011). Interestingly, in chlorophytes several genes were not found, or the similarity was too weak to consider these as orthologs of the Arabidopsis genes ( Table 2). A probabilistic method, Multiple EM for Motif Elicitation (Bailey and Elkan, 1994) has been used previously in our group in order to detect common motifs in upstream regions of Arabidopsis homologs of pumpkin genes for transcripts isolated from phloem sap exudates (Ruiz-Medrano et al., 2011). Indeed, this method predicted the most abundant motif in this promoter set, GA/CT repeats, which coincided with those found using enumerative methods. More importantly, promoters harboring such repeats were found to be active in vascular tissues, as well as in other tissues. Also recently, the CT rich motifs were reported in CRF (cytokinin response factors) family genes which have strongest expression in phloem tissue, supporting the notion that these motifs are conserved and overrepresented in promoters of genes expressed in this tissue (Zwack et al., 2012). Thus, this method was used for detection of common motifs, if any, in the same gene promoter set in other species. These are shown in Table S1, which includes most members of Viridiplantae, the genomes of which have been deciphered. These are mostly embryophytes, including the moss Physcomitrella patens as the sole representative of bryophytes; a tracheophyte with a less developed conducting tissue, Selaginella moellendorffii; and several dicot and monocot species. Also included are five species of chlorophytes showing colonial organization (Volvox carteri) as well as unicellular algae. The results of this analysis suggest that most dicots harbor a GA/CT rich motif similar to the Arabidopsis SETPs (Figures 1A,B). The three motifs with the lowest E-value are shown for each SETPH. In addition, common motifs were searched for in the same, shuffled sequences as a control. A cutoff value of E = 10 −6 for Motif 1 was set; no common motifs were found in shuffled sequences whatsoever below this value. In the "native" sequences (i.e., not subjected to shuffling) those that did show such high values were present in few promoters and were quite degenerate. For example, in Coccomyxa subellipsoidea no motif with an E-value smaller than 1 was found, indeed, it was the only SETPH set in which no common motifs were found (Table S1).
In monocots, the grasses Panicum virgatum and Brachypodium distachyon SETPHs also harbored GA/CT rich motifs with a low E-value; a similar situation was found with Setaria italica (Table S1), suggesting that indeed the presence of the aforementioned motifs may correlate to vascular-specific expression. However, no such motif was found overrepresented in neither Sorghum bicolor nor Zea mays SETPHs; in this last instance relatively high E-values were found for the most overrepresented motifs, A/T-rich in both cases (Figure 1; Table S1).
Interestingly, a similar overrepresented motif was found in S. moellendorffi (Figure 2). This species is of particular interest because it has specialized, water-conducting tissues, albeit less complex than those observed in angiosperms and gymnosperms; in fact, the vessels of this plant have been considered as a "primitive" feature but involving the same character transformation as angiosperms and gymnosperms. Unlike angiosperms and gymnosperms, Selaginella phloem cells also present "primitive" features such as clustered pores as sieve areas and the presence of degenerated nuclei (Burr and Evert, 1973). Regardless, the presence of an overrepresented GA/CT rich motif in this SETPH with an extremely low E-value (10 −32 ) supports the notion that these genes are expressed in a coordinated manner, and may be expressed also in its conducting tissues. On the other hand, C/G rich motifs with a low A/T content were detected in the P. patens SETPHs, which may be related to the GA/CT rich motif observed in tracheophyte SETPHs (Figure 2). It is worth mentioning that another overrepresented signature, a GT/AC-rich motif, was also present in some dicots; and, interestingly, in at least two chlorophyte SETPHs.

CHLOROPHYTES SETPHs SHARE DIFFERENT COMMON MOTIFS
The same analysis was applied to the chlorophytes SETPHs in order to search for common motifs and determine whether these SETPHs may be expressed coordinately. Quite different motifs were found in this case; and, as mentioned before, one SETPH did not harbor common motifs, except for three shared by few promoters and of a degenerate nature (Figure 2; Table S1). It must be considered that no counterparts of 10-13 genes were found in chlorophytes ( Table 2). While extremely low E-values were observed for some of these motifs (in particular a CGT repeat), these were present in a smaller subset of SETPHs, but at a high frequency in such promoters. Thus a more precise measurement of specific motif abundance was required in order to compare the frequency of such motifs in chlorophyte SETPHs as well as the GA/CT motif in embryophyte SETPHs.

CHLOROPHYTES AND EMBRYOPHYTES SETPHs HARBOR DISTINCT TYPES OF OVERREPRESENTED MOTIFS AT DIFFERENT FREQUENCIES
A more quantitative analysis was carried out with the aforementioned SETPHs. First, the average number of each of the three motifs with lowest E-values per promoter was determined. Of mention, aside from the GA/CT and A/T rich motifs found in embryophytes, as well as the CGT and A/T rich motifs in chlorophytes, additional ones were found in other SETPHs. For example, a G-Box was found overrepresented in the Citrus clementina SETPH (Figure 1B). This cis-element is a transcription factor binding site involved in different types of responses, such as in light and drought (Liu et al., 2008;Lata and Prasad, 2011). On the other hand, a Conserved Late Element-like motif (CLE) was present in several Medicago truncatula SETPHs. This motif was first found in the Begomovirus common region, and is involved in late gene activation; interestingly, a small region of the Coat Protein gene promoter harboring these motifs is capable of directing vascular-specific expression (Sunter and Bisaro, 1997;Ruiz-Medrano et al., 1999b). Thus, vascular expression may require different motifs specific for each taxon. Additionally, the GA/CT rich motifs are also present in promoters active in tissues other than phloem, such as pollen, or in genes expressed in diverse tissues, such as ribosomal proteins and cyclins (Ruiz-Medrano et al., 2011).
In the embryophytes SETPH, the motif with the highest average number per promoter was the A/T sequence, while in chlorophytes it was the CGT-repeat motif (Figure 3). However, this can be misleading, given that few promoters could harbor several of such motifs, thus biasing the abundance of such motifs in those particular SETPHs. Therefore, the number of each motif present per number of taxa was determined in order to estimate their relative abundance (Figure 4). Evidently, the number of GA/CT motifs was higher than all the other ones in embryophytes, while the CGT motif was more abundant in chlorophytes. Thus, the SETPHs of these two lineages could be expressed coordinately, except for C. subellipsoidea. However, the number of common motifs in chlorophytes is smaller than in embryophytes (Table S1), so a smaller subset of genes is expressed in a similar way. Nevertheless, Ostreococcus SETPHs showed the highest number of different common motifs in chlorophytes, suggesting that their expression is tightly coordinated ( Table S1).
The results of this analysis suggest that the activity of the same set of SETPHs in different taxa from embryophytes is coordinated in a similar way, given the frequency of the shared GA/CT rich motifs. Interestingly, the same motif is overrepresented in S. moellendorffii and in P. patens, this last species lacking a true vascular tissue. Furthermore, a similar set of SETPHs in chlorophytes may also show some degree of coordination in its expression, suggesting that such potential coordination of gene activity was already present in ancient lineages leading to both chlorophytes and embryophytes. Furthermore, the absence of shared motifs in the www.frontiersin.org July 2013 | Volume 4 | Article 261 | 9

FIGURE 2 | MEME analysis of SETPHs from S. moellendorffii and P. patens (upper panel) and from chlorophytes (lower panel). Shown are the three overrepresented motifs in each promoter set identified by MEME analysis with the lowest Expectation (E) values below each motif.
SETPHs of C. subellipsoidea may suggest that this species is more distantly related from vascular plants than other chlorophytes.

PHYLOGENETIC ANALYSIS OF PROTEINS INVOLVED IN PHLOEM DIFFERENTIATION AND/OR FUNCTION
From the previous section it can be inferred that coordinated expression of genes, aside from obvious functions in response to environmental stimuli or to a developmental program, may lead to novel structures, particularly in colonial or multicellular organisms. Evidently, proteins or other regulatory elements (such as non-coding RNAs) with novel features that may arose via gene or genome duplication, may had an important role in the evolution of multicellularity and in the development of new adaptation mechanisms (i.e., new developmental or signal transduction pathways, structures, etc.). The vascular tissue of plants is such an adaptation, while not necessarily a prerequisite for colonization of terrestrial ecosystems. Considering the diversity of non-vascular plants, it is evident that the vascular system represented an advantage that allowed the diversification and occupation of a large array of niches by tracheophytes, as well as a large increase in size. In particular, although the information regarding the factors involved in phloem differentiation

FIGURE 3 | Graphic representation of average number of motifs per promoter in embryophytes (left; blue), and in chlorophytes (right; green).
and function is still fragmentary, recent work has discovered the role of some genes in phloem development (Lucas et al., 2013). The phylogeny of such genes, as well as those for factors that are presumably required for phloem function, should be helpful to determine the ontogeny of the vascular tissue during evolution.
The following proteins were selected for a phylogenetic analysis: (1) ALTERED PHLOEM DEVELOPMENT (APL). This protein is member of the MYB coiled-coil transcription factor family required for the proper formation of sieve elements and companion cells, and additionally inhibits xylem differentiation as demonstrated by analysis of apl mutants and ectopic expression of the gene (Bonke et al., 2003). (2) OCTOPUS (OPS) is a membrane-associated polar protein, which is expressed in provascular cells and later constrained to the phloem cell lineage, giving rise to the phloem cell continuity (Truernit et al., 2012). This gene is thought to act in a XYLOGEN-like manner although it does not possess any known domains required for its function (Truernit et al., 2012). (3). YDA, which is a MAPKK Kinase involved in asymmetric cell division during embryonary and stomatal development (Bergmann et al., 2004;Lukowitz et al., 2004). This protein positively regulates embryogenesis, while in stomata it inhibits stomatal precursor cell division. YDA is expressed also in vascular tissue, and more precisely in phloem (Ruiz-Medrano et al., 2011), so it is reasonable to assume that it has a role in phloem development. (4) Floral Locus T (FT). This protein, which binds phosphatidylethanolamine and is an inhibitor of RAF kinase, has a crucial role in flower induction; it is transported from mature photosynthetic leaves to the vegetative apical meristem where it induces flower initiation and thus is a major component of florigen (Kardailsky et al., 1999;Corbesier et al., 2007;Lin et al., 2007;Tamaki et al., 2007;Taoka et al., 2011). Thus, FT is involved in long-distance signaling, and would be expected to have acquired this novel function only in angiosperms. (5) CmPP16; this protein and its transcript were found originally in pumpkin phloem sap. CmPP16 has been shown to bind RNA in a non-selective manner and is able to assist its cell-to cell transport (Xoconostle-Cázares et al., 1999). Its immunolocalization, as well as the in situ localization of its RNA have demonstrated that both coincide in CC and in the phloem translocation stream, at least in cucurbits, and also discards the notion that these originate from damaged cells (Ruiz-Medrano et al., 1999aXoconostle-Cázares et al., 1999). Its role in phloem function is unknown, but its pervasive presence in all embryophytes to date suggests an important role in phloem function, likely in long-distance signaling.
All trees were constructed using sequences retrieved from the Phytozome database (www.phytozome.net). Figure S2 shows the resulting dendrogram for APL, using Maximum Likelihood. Three large clades are discernible: the tracheophytes, the lycophytes and bryophytes closely grouped together, and the chlorophytes. These do indeed harbor an APL homolog with a relatively low E-value ranging from 10 −14 to 10 −16 , but showing in all cases similarity restricted to the N-terminus, which corresponds to the MYB DNA-binding domain (Prouse and Campbell, 2012). Thus a preexisting gene in the plant lineage may have been recruited for the phloem-specification program. This is a situation different from that of OCTOPUS. Indeed, no homologs for this protein were found in chlorophytes and P. patens. A sequence showing weak similarity, limited to 25 amino acids in its N-terminus, turned out in S. moellendorffii Thus, this sequence was utilized as outgroup. The results show that monocots and dicots form two separate clades ( Figure S2). Interestingly, Eucalyptus camaldulensis OCTOPUS forms a single clade within tracheophytes equally distant to monocots and dicots. It is likely that this gene arose de novo during plant evolution, which reflects the relatively recent specialization of phloem development.
FT is a good example of a phloem-long distance signaling protein; it has distant homologs in bryophytes and lycophytes but not in chlorophytes. The maximum likelihood dendrogram shown in Figure S3 indicates that this gene is also a recent innovation, definitely present only in embryophytes. The C. subellipsoidea sequence was used as outgroup given its divergence. S. moellendorffii and P. patens form a single clade, and tracheophytes form another. However, the groupings observed within this clade do not correspond to the known phylogenetic relationships among vascular plants; for example, the Vitis vinifera FT homolog is the most divergent sequence within tracheophytes. Furthermore, there are four well-defined groups, one including monocots. Thus, the dendrogram may reflect functional specialization of FT rather than phyletic relationships between these plant taxa.
A Maximum Likelihood dendrogram of CmPP16 shows a topology not unlike FT ( Figure S4). However, there are important differences. The first one, there is a clear potential homolog in a chlorophyte, O. lucimarinus, in contrast to other taxa from this group. This groups with the P. patens homolog. C. subellipsoidea "CmPP16" was again used as an outgroup; in general chlorophytes harbor sequences of very limited homology, if at all, and such similarity is restricted to the putative Cdomain found in the C-terminus of these proteins (E-value > 10 −3 ). Additionally, the chlorophyte sequences are much larger (300-350 amino acids in length in chlorophytes vs. 130-150 amino acids in tracheophytes), and it is possible that the tracheophyte CmPP16 was derived from the aforementioned segment of an ancestral chlorophyte C2-domain containing protein. It must be mentioned that the Neighbor-Joining method failed to produce a meaningful dendrogram; the topology of this tree is essentially a large polytomy (Figure S1). This could be explained by high rates of substitution among these proteins.
The phylogenetic analysis of YDA suggests an ancient origin of this gene ( Figure S5). C. subellipsoidea was also used as outgroup, given its weak similarity to embryophyte MAPKKK. In bryophytes and tracheophytes YDA homologs are numerous possibly by duplication events. Indeed, the pathways in which these proteins function allowed for the appearance of structures essential for the colonization of terrestrial (or more precisely, aerial) ecosystems, such as embryo, stomata, and vascular tissue (Hamel et al., 2006). The YDA homologs have a central conserved region that includes the serine/threonine kinase domain. Arabidopsis YDA possesses two unique domains, N-terminal and C-terminal, which greatly increases its size, possibly suggesting a functional specialization. The Carica papaya YDA homolog appears closely related to chlorophyte sequences perhaps because of its much smaller size than its angiopserm counterparts rather than for similarity.

THE ORIGIN OF THE VASCULAR TISSUE MAY HAVE INVOLVED GENE COORDINATION OF ANCESTRAL SETPHs
The knowledge of the genetic networks that determine the identity and differentiation of vascular tissue is still rudimentary, although xylem ontogeny and evolution is better understood than that of phloem. Recent work has identified important genes directly related to phloem development; on the other hand, high throughoutput analysis of whole plant genomes, transcriptomes and proteomes provide tools for a more detailed evolutionary analysis of the vascular tissue. The present study aims to add some evolutionary perspective on the regulatory networks in which these genes function.
In a previous work we have identified a shared sequence signature in promoters of Arabidopsis genes expressed in vascular tissue (SETPs), a GA/CT-rich motif overrepresented in these upstream regions (Ruiz-Medrano et al., 2011). Thus, the same gene set was searched for common motifs in most plant genomes whose genome sequence is available (SETPHs). Similar motifs are found in all tracheophytes, and, interestingly, in the lycophyte S. moellendorffii and the bryophyte P. patens. The fact that most of these SETPHs harbor such motifs suggests that they are regulated coordinately in embryophytes, i.e., are expressed in vascular tissue. This is especially important considering, as has been mentioned before, that lycophytes and bryophytes lack a true vascular tissue, although S. moellendorffii as well as other lycopods do display water-conducting cells and thus could be considered an intermediate between land non-vascular and vascular plants (Niklas, 1997). Furthermore, given the diversity in development and morphology of these conducting cells, it has been suggested that the vascular system of plants arose more than once in the tracheophyte lineage (Niklas, 1997).
The most frequent motifs found in the embryophytes SETPHs, using the MEME algorithm, conform to the GA/CTn arrangement, or the closely related GAA/CTTn arrangement. In the first case it has been demonstrated that a GA octadinucleotide found in the promoter of a homeobox gene from barley regulates its transcription (Santi et al., 2003). A 126 bp fragment derived from the Arabidopsis SUC2 gene promoter, harboring a GA/CT-rich sequence, was found to direct expression in the vasculature, and a short promoter fragment derived from a vascular expressed gene (and also harboring several GA/CT repeats), At1g34260 from Arabidopsis, was likewise capable of directing vascular expression when adjacent to the CaMV minimal promoter, although only consistently in T0 plants (Schneidereit et al., 2008;Ruiz-Medrano et al., 2011). In both cases the motifs present are similar, but not identical, to the auxin-response elements described previously (Li et al., 1994). This is in agreement with the fact that, while pervasive in embryophytes, the GA/CT motif was not identical between different SETPHs, and, in some cases, these were either not found or present at a high E-value. More importantly, auxin is a key regulator in vascular differentiation (Lucas et al., 2013). This could be explained considering that for various SETPHs, there is more than one homolog, so it is possible that the actual ortholog was not selected, or at least one that was not expressed in vascular tissue. It is noteworthy that a similar motif is found in non-vascular plants, with a very significant (low) E-value, and present in roughly half of the analyzed SETPHs (Table S1). It is then likely that the same set of genes that is expressed in vascular tissue in tracheophytes are also expressed coordinately in bryophytes. A different matter is what type of coordination do the expression of these genes display, temporal, spatial or both, but it is tempting to speculate that this coordination was important during the evolution of vascular tissue. It will be interesting to determine the expression patterns of these genes, now that the genetic transformation of these species is technically possible, which will surely shed light on the evolution of vascular gene expression.
As for chlorophytes, no GA/CT motifs were found in their SETPHs, but a CGT-repeat was found as the most frequent and with an extremely low E-value (Figure 4). However, these were concentrated in O. lucimarinus and M. pusilla, biasing its representation in these SETPHs. Nevertheless, it is worth mentioning that these genes, or most of them, may be expressed coordinately. Indeed, recent evidence indicates that in the case of O. lucimarinus, a marine picoalgae, there are several genes expressed in a coordinated fashion relative to genes from other microorganisms occupying the same niche, mostly in response to light (Ottesen et al., 2013). At least two of these genes are also present in the SETPH analyzed in this work, a calcium-dependent protein kinase and a MYB transcription factor family member. The frequency of common motifs with a significantly low E-value in V. carteri and C. reinhardtii is much lower, which could be interpreted as both having less genes from the SETPH set which are regulated coordinately. In any case, gene coordination in chlorophyte SETPHs may be not as extended as in embryophytes, and altogether absent in at least one case, C. pseudoellipsoidea. Evidently, the analysis of additional chlorophyte SETPHs as more genome sequences become available will allow determining the validity of these conclusions. It must be mentioned that vascular plants likely evolved from a streptophyte ancestor (Becker and Marin, 2009). Extant members of this group include multicellular green alga; unfortunately, to date no genomes from these taxa have been reported, but once available its analysis will afford valuable information regarding the evolution of the plant vasculature.

PHYLOGENETIC ANALYSIS OF PROTEINS INVOLVED IN PHLOEM DIFFERENTIATION AND/OR FUNCTION SUPPORT THE NOTION THAT SOME AROSE de novo DURING EVOLUTION AND OTHERS WERE RECRUITED FROM PREEXISTING PATHWAYS
Besides a strict coordination of the activity of SETPHs, it is likely that the recruitment of factors already present in the tracheophytes ancestor, as well as the appearance of novel ones (for example via editing, alternative splicing, genome duplication and/or gene rearrangements) was important for the evolution of the plant vascular system. Our results involving the phylogenetic analysis of selected proteins required for phloem differentiation and/or function (such as long-distance signaling) support such notion (Figure 5).
Many important genes involved in land plant development regulation are highly conserved such as transcription factors gene families. Most are shared among angiosperms, and mosses and only a few are present in chlorophytes (Riaño-Pachón et al., 2008). APL was one of the first proteins identified involved in phloem differentiation; as such, no close homologs were expected in chlorophytes. However, APL is a member of the MYB transcription factor family, which regulate cell proliferation and differentiation in distantly related taxa from animals to fungi (Prouse and Campbell, 2012), proteins containing a MYB domain were actually found in the SETPHs of chlorophytes, but the similarity to tracheophyte APL is limited precisely to the MYB domain. The ancient tracheophyte S. moellendorfii forms a monophyletic group with P. patens as a sister taxon ( Figure S1). A possible explanation could be that APL function in S. moellendorfii is more related to its role in P. patens predating its recruitment in sieve element and companion cell differentiation in more recent lineages. The existence of this protein in algae and non-vascular plants suggests a different role in ancestral lineages and its later recruitment as a part of the phloem development machinery.
APL thus originated from a preexisting protein, involved in the transcriptional regulation of a set of genes. Since the DNAbinding domain of the MYB proteins is similar, it is reasonable to assume that the target genes may be similar in chlorophytes and embryophytes. Additionally, these proteins show a continuum in sequence similarity, when S. moellendorffii and P. patens are considered, throughout all plant taxa. Also, MYB factors regulate a number of processes, including secondary metabolism, cell proliferation and differentiation and response to environmental stress (Dubos et al., 2010) some of which are likely absent in chlorophytes, so this recruitment of preexisting factors must have occurred several times during plant evolution.
OCTOPUS is an example of a protein factor involved in phloem differentiation that arose de novo. As described in the previous section, no homologs for this protein were found in chlorophytes. The groupings do not reflect the established phylogenetic relationships between tracheophytes, and unequal length of branches is evident, suggesting a rapid evolution of this gene, as well as adaptation of given taxa to particular niches. Its functional similarity with XYLOGEN raises the question about how two different proteins not related by ancestry could display similar functions in vascular tissue ontogeny (Truernit et al., 2012). This question cannot be answered satisfactorily based solely on bioinformatic analysis; experimental studies involving overexpression and deletion of this gene will be required to elucidate this functional similarity.
Flower induction has been known for decades to involve long distance communication through the phloem (Jaeger et al., 2006;Corbesier et al., 2007;Lin et al., 2007;Tamaki et al., 2007). Flowering locus T (FT) is a strong candidate for being the florigenic signal, although a role for its RNA cannot be discarded.
Evolutionary analysis of streptophytes, which include the basal land plants and charophyte algae, indicate that features such as plasmodesmata and apical growth possibly appeared before land colonization by plants (Becker and Marin, 2009). Furthermore, the presence of mobile proteins such as members of the KNOX protein family in chlorophyte algae indicates that plasmodesmal function is evolutionarily conserved (Pires and Dolan, 2012). It is therefore possible that the intercellular transport of the FT ancestor occurs in streptophytes, but evidently serving a different function.
The phylogenetic tree for FT ( Figure S3) shows that this protein is present in P. patens and S. moellendorffii; it is probable that FT had rudimentary functions that resemble its role in angiosperms, involving cell to cell movement conceivably required for controlling apical growth. No FT homologs exist in chlorophytes, with the exception of C. subellipsoidea (two potential homologs present in the Phytozome genome database), which was used as outgroup. Two bursts of evolutionary change occurred in the lineages giving rise to P. patens and S. moellendorffii. One important burst is also observed within the angiosperms in the monocot clade, which is in agreement with the number of potential protein homologs clustered within this subgroup; no other important changes in the evolutionary rate are evident within angiosperms. One explanation for this could be that no important duplication events occurred during the evolution of FT genes in flowering plants outside the monocots. Three major clades are observed in the unrooted tree for flowering plants ( Figure S6); two different clades in dicots, one of them being the most conserved, and monocots showing the most substitutions from the ancestral FT. V. vinifera was grouped alone and therefore it was not possible to infer its real relationship with other taxa.
Studies of viral movement proteins in plant systems have highlighted important mechanisms used for plant virus to utilize the host machinery operating in long distance communication system in plants and hence establish viral infection (Gilbertson and Lucas, 1996;Simón-Buela and García-Arenal, 1999;Lough and Lucas, 2006;Requena et al., 2006). CmPP16 is a plant paralog of viral movement proteins and is likewise capable to bind RNA in a non-selective manner and to interact and modify plasmodesmatal size exclusion limit (SEL) (Xoconostle-Cázares et al., 1999). Although much effort has been done to understand plant virus evolution, its rapid change rate difficults a better understanding of the functional constraints operating in the elements, responsible for the selective transport of macromolecules through the phloem long-distance communication system. A phylogenetic analysis of this protein was carried out using the Maximum Likelihood method using O. lucimarinus as the outgroup, a dendrogram is shown with considerably variation rates among all the taxa under study with no easily distinguishable natural clades beyond Brassicaceae (resolved with values of bootstrap >70%).
In plants the MAPK protein family is quite extensive, compared to mammals and other organisms. Only in Arabidopsis there are 20 genes encoding MAPKs, 10 MAPKKs and 60 MAPKKKs, whereas in yeast there are six for each [MAPK Group (Mitogen-activated protein kinases), 2002]. One of these proteins is the Arabidopsis protein YDA (a MAPKKK), which controls the asymmetric cell division in stomata, embryo development and inflorescence architecture (Bergmann et al., 2004;Lukowitz et al., 2004;Meng et al., 2012). Experimental analysis of the YDA gene promoter showed that it is not only active in stomata but also in companion cells (Ruiz-Medrano et al., 2011). Since phloem differentiation involves the asymmetric division of a phloem mother cell, phloem expression of this gene suggests a role for YDA in the development of this tissue.
In addition to YDA, the stomatal and embryo developmental programs share other elements of the MAPK pathway. These tissues predate the origin of the vascular tissue; therefore it can be hypothesized that YDA as well as other factors could have been recruited during the appearance of the vascular tissue ( Figure 5).

CONCLUDING REMARKS
The plant vascular system allowed the great diversification of extant plants, in geographical distribution, structural diversity and size. The fossil record has shed light on the morphological steps likely to have occurred during the evolution of this tissue. The analysis of the genes involved in vascular tissue cell differentiation in different species with varying degree of vascular specialization, or lack thereof may help reconstruct which molecular factors were initially required for the appearance of the vasculature. On the other hand, the analysis of the expression pattern of this set of genes in both vascular and non-vascular plants will be important to reconstruct the genetic networks involved in its evolution. Given the importance of phloem and xylem in plant productivity (and that most, if not all, crops are tracheophytes), this knowledge will have important applications. A more complete knowledge of such networks will benefit from the deciphering of gymnosperm genomes, as well as from streptophytes, the closest algal relatives of land plants. In all, our results suggest that the coordinate expression of a set of genes, as well as the appearance of novel genes, and the recruitment of existing ones with a role in other signaling and developmental networks, were important for the evolution of the plant vascular system.