The Amborella vacuolar processing enzyme family

Most vacuolar proteins are synthesized on rough endoplasmic reticulum as proprotein precursors and then transported to the vacuoles, where they are converted into their respective mature forms by vacuolar processing enzymes (VPEs). In the case of the seed storage proteins, this process is of major importance, as it conditions the establishment of vigorous seedlings. Toward the goal of identifying proteome signatures that could be associated with the origin and early diversification of angiosperms, we previously characterized the 11S-legumin-type seed storage proteins from Amborella trichopoda, a rainforest shrub endemic to New Caledonia that is also the probable sister to all other angiosperms (Amborella Genome Project, 2013). In the present study, proteomic and genomic approaches were used to characterize the VPE family in this species. Three genes were found to encode VPEs in the Amborella's genome. Phylogenetic analyses showed that the Amborella sequences grouped within two major clades of angiosperm VPEs, indicating that the duplication that generated the ancestors of these clades occurred before the most recent common ancestor of living angiosperms. A further important duplication within the VPE family appears to have occurred in common ancestor of the core eudicots, while many more recent duplications have also occurred in specific taxa, including both Arabidopsis thaliana and Amborella. An analysis of natural genetic variation for each of the three Amborella VPE genes revealed the absence of selective forces acting on intronic and exonic single-nucleotide polymorphisms among several natural Amborella populations in New Caledonia.


Introduction
Evolutionary genetics is considered as a central part of biology (Charlesworth and Charlesworth, 2009). In plants, Amborella trichopoda (Amborella), an understory shrub endemic to New Caledonia, has been proposed to correspond to the single living representative of the sister lineage to all other extant flowering plants (Bremer et al., 2009;Jiao et al., 2011;Lee et al., 2011;Wickett et al., 2014). Hence the recent release of its genome sequence provides a pivotal reference for understanding genome and gene family evolution throughout angiosperm history (Amborella Genome Project, 2013).
In previous work (Amborella Genome Project, 2013), we characterized the Amborella seed storage proteins with the goal of identifying proteome signatures that could be associated with the origin and early diversification of angiosperms. In particular, we focused our attention on the abundant 11S globulins that have been characterized and compared across seed plants in evolutionary analyses (Häger et al., 1995;Adachi et al., 2003;Li et al., 2012). We found that the Amborella genome contains three distinct 11S globulin genes (Amborella Genome Project, 2013). In all plant species, 11S globulins are synthesized in the form of high molecular weight precursors that are processed by vacuolar processing enzymes (VPEs) during seed maturation. This limited proteolysis, which is regularly directed to an Asn-Gly (N-G) junction, yields the A (acidic)-and B (basic)-subunits of mature 11S globulins that is accompanied by further assembly of the trimer precursor-protein complexes into mature hexamers within the protein storage vacuoles (PSVs) (Chrispeels et al., 1982;Müntz, 1998;Shutov et al., 2003).
Although two of the three Amborella 11S globulins do contain a canonical N-G cleavage site, we observed that a third one deviates notably from the two others as it exhibits, in place of an N-G junction, an N-V-I sequence (Amborella Genome Project, 2013). Similar deviations from the N-G cleavage motif were observed for 11S globulins from Ginkgo biloba (Amborella Genome Project, 2013) and Metasequoia glyptostroboides (Häger and Wind, 1997), thus highlighting the possibly ancestral nature of this atypical Amborella 11S globulin.
The co-existence in Amborella seeds of the angiosperm-and gymnosperm-type 11S globulins prompted us to characterize the VPE system in seeds of this plant. Here, we refine our understanding of this gene family with the characterization of several Amborella VPE homologs.
Phylogenetic analyses of plant VPEs and legumains have been previously reported. However these previous studies only considered selected sequences from monocots and eudicots and did not include sequences from gymnosperms or basal eudicots (Kato et al., 2003;Nakaune et al., 2005;Julián et al., 2013;Kang et al., 2013;Christoff et al., 2014;Pierre et al., 2014). To gain further insight in plant VPEs and benefiting from the present Amborella sequences, we reconstructed a phylogeny of VPE proteins based on the amino acid sequences of VPEs from a wide range of embryophytes (land plants). By using a comparative approach, combined with the principle of parsimony, data from this uniquely-placed angiosperm can help defining the condition of any character in the most recent common ancestor (MRCA) of the living angiosperms, and we have applied this method to the structural and functional evolution of the VPE family.
Another way to evaluate the functional relevance of genes is to examine the levels of naturally occurring genomic variations therein, i.e., polymorphism within populations (Koornneef et al., 2004). For this purpose we used next-generation sequencing data from the recently completed Amborella genome (Amborella Genome Project, 2013) to characterize singlenucleotide polymorphisms (SNPs) in VPE sequences and their distribution over the natural range distribution of Amborella in New Caledonia (Poncet et al., 2013).

Plant Material
Mature drupes of Amborella were collected from 10 individual trees located at "plateau de Dogny-Sarraméa" (New Caledonia; 21 • 37 ′ 0 ′′ N, 165 • 52 ′ 59 ′′ E). The fleshy part of the fruits was removed and pits (containing the seeds) were briefly dried on paper before being removed for seed isolation stricto sensu. Surface-sterilized seeds were cut longitudinally in two with a razor blade. A drop of sterile Milli-Q water was placed on the endospermic face of each half. Embryos were quickly extracted (in <1 min), with clean extra-thin needles and immediately frozen in liquid nitrogen.

Preparation of Soluble Protein Extracts
For the preparation of soluble protein extracts, 100 isolated Amborella embryos were ground in liquid nitrogen using a mortar and pestle. Total soluble proteins were extracted at room temperature in 400 µl thiourea/urea lysis buffer composed of 7 M urea, 2 M thiourea, 6 mM Tris-HCl, 4.2 mM Trizma R base (Sigma-Aldrich, Lyon, France), 4% (w/v) 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate (CHAPS, Sigma-Aldrich) supplemented with 50 µl of the protease inhibitor cocktail Complete Mini (Roche Diagnostics France, Meylan, France). Then, 15 µl of 1 M dithiothreitol (DTT, Sigma-Aldrich), 2 µl of DNase I (Roche Diagnostics), and 5 µl of RNase A (Sigma-Aldrich) were added to the sample. Following stirring for 2 h at 4 • C, protein extracts were centrifuged at 20,000 g at 4 • C for 15 min. The resulting supernatant was submitted to a second clarifying centrifugation, as above (Rajjou et al., 2011). The final supernatant was kept and protein concentrations in the various extracts were measured according to Bradford (1976) using Bovine Serum Albumin as a standard.

Shotgun Proteomic Analysis
The Amborella seed proteome exploration was performed by LC-MS/MS analysis following preparation of soluble protein extracts (30 µg protein; n = 3 biological replicates) that had been subjected to 1D-SDS-PAGE (http://pappso.inra.fr). Protein extracts were loaded in 1X Laemmli buffer (Laemmli, 1970) with DTT (50 mM) in a stacking gel [acrylamide 8% (w/v); Tris-HCl 0.56 M, pH 8.8; SDS 0.1% (w/v)]. After 15 min of migration at 10 mA, the gel was stained with colloidal blue (GelCode Blue Stain Reagent; Thermo Fisher Scientific Inc, Rockford, IL) and destained in Milli-Q water. The whole band corresponding to total proteins was excised and submitted to in-gel digestion with the Progest system (Genomic Solution, Huntingdon, UK) according to a standard trypsin protocol. Briefly, gel pieces were washed for 1 h at 37 • C in a solution containing 25% (v/v) acetonitrile (ACN) and 50 mM ammonium bicarbonate (pH 7.8), followed by dehydration in 100% ACN for 15 min. Gel pieces were rehydrated overnight at 37 • C with 1/50 (w/w) trypsin (Promega, Madison, WI, USA) in 20 mM ammonium bicarbonate, pH 7.8. Digestion was stopped by adding 0.4% (v/v) of trifluoroacetic acid (TFA).

Phylogenetic Analyses
VPE protein sequences were obtained from the available databases by BLAST searching (Altschul et al., 1997). Multiple alignments were performed using MUSCLE (Edgar, 2004) and well-aligned sites were chosen using G-Blocks (Castresana, 2000) with settings to minimize the stringency of selection. Maximum likelihood (ML) phylogenetic reconstructions incorporating 500 bootstrap replicates were performed in PhyML (Guindon et al., 2010) using the LG substitution model (Le and Gascuel, 2008).

Gene Sequence Polymorphism
Next-generation resequencing data from the Amborella genome (Amborella Genome Project, 2013) were used to characterize SNP polymorphisms corresponding to the VPE sequences characterized in the present work. The genomic information has been generated for 12 individuals (named according to their location: Tonine, Ponandou, Pwicate, Tchamba, Ba, Aoupinié, Boregaou, Amieu, Dogny, Mé Ori, Mé Fomechawa, and Nakada) covering a wide range of Amborella's extant geographical distribution and natural genetic diversity (Poncet et al., 2013).
All the SNPs identified were defined as informative with a minor allele frequency (MAF) >0.08 and a missingness rate <0.42 by SNP. Biallelic exonic SNPs were annotated as synonymous or non-synonymous according to their reference/alternate allele nucleotides and to their position in the updated annotated coding sequences.
For each of the 12 individuals, the average MAF values were first computed across all SNPs for each of the three genes. Then, for each of the four genetic clusters inferred by a previous microsatellite (simple sequence repeats, SSR) analysis (Poncet et al., 2013), namely North (Tonine, Ponandou, Pwicate, Tchamba), Center (Ba, Aoupinié, Boregaou, Amieu, Dogny), Me (Mé Ori, Mé Fomechawa), and Nak (Nakada), we calculated for each gene (i) the mean proportion of polymorphic SNPs and (ii) the mean proportion of private alleles. Computations were performed on intronic and exonic SNP datasets independently by discarding missing data.
To assess any impact of natural selective pressures on VPE sequences, we examined the SNP patterns in comparison with 10 neutrally-behaving SSRs. Polymorphisms under neutral evolution co-vary with divergence between populations regardless of the mutation rate (Hartl and Clark, 2007). Marks of selection on SNP markers would be detected as polymorphism deviating from the expected neutral background polymorphism (influenced by forces such as divergence as well as other demographic events).

Shotgun Proteomic Analyses
A shotgun proteomic analysis revealed 415 proteins from the isolated Amborella embryos (Villegente et al., unpublished results). In particular, this analysis confirmed the presence of three 11S globulin forms in the Amborella embryos, of which two contained canonical N-G VPE cleavage sites, while the third contained a variant cleavage site (N-V-I) (data not shown). This shotgun proteomic analysis also revealed the presence of two specific peptides (GIIINHPQGEDVYAGVPK and HQADVCHAYQLLLK) (Supplemental Table S1) matching with the amino acid sequences of Amborella VPE proteins encoded by sequences on the scaffold AmTr_v1.0_scaffold 00002, labeled 27.model.AmTr_v1.0_scaffold00002.262 and evm_27.model.AmTr_v1.0_scaffold00002.263 (Figure 1).
We conclude that the Amborella VPE family is composed of three genes: AmTr_2.262-1, Am_Tr2.262-2, and AmTr_36.100 (Figure 1; Supplemental Figure S1), of which the former two are very closely related. AmTr_2.262-2 and AmTr_36.100 are both transcribed, while the transcription of AmTr_2.262-1 remains to be demonstrated. It should be noted among others that no seed tissues were used to obtain transcribed sequences in the available Amborella transcriptome databases (see http://ancangio.uga. edu/content/amborella-trichopoda), and so it is possible that AmTr_2.262-1 might be expressed, but shows an entirely seedspecific expression profile. Both of the peptide signatures identified using the proteomics approach described in the present work (GIIINHPQGEDVYAGVPK and HQADVCHAYQLLLK) show 100% identity to fragments of the predicted genes AmTr_2.262-1 and AmTr_2.262-2 (of which the latter is certainly transcribed and the former could be transcribed in other organs and tissues that were not surveyed in the available data, including fruits and seeds). No peptide signatures from AmTr_36.100, which is most presumably transcribed in non-seed tissues, were detected in seeds in the proteomics approach described in the present work.
Most of our current knowledge of the genes encoding VPEs and their biological functions comes from molecular and genetic studies of the model plant Arabidopsis. In this species, four VPE homologs have been described: α-VPE and γ-VPE, which are specific to vegetative organs, β-VPE, which is specific to seeds, and δ-VPE, which is involved in seed coat formation (Nakaune et al., 2005;Hatsugai et al., 2015) and in the processing and degradation of various proteins in senescent Arabidopsis tissues (Rojo et al., 2003).
To reveal potential biological functions of the Amborella VPEs, the amino acid sequences of the three Amborella VPEs were blasted at TAIR against the Arabidopsis genome. An analysis of the scores obtained from this comparison disclosed that the AmTr_36.100 gene would encode a γ-type VPE (Supplemental Figure S2), while the AmTr_2.262-1 and AmTr_2.262-2 genes would encode β-type VPEs (Supplemental Figure S2).

Phylogenetic Analyses
To gain further insight in plant VPEs and benefiting from the present Amborella sequences, the amino acid sequences of VPEs identified by BLAST from a wide range of embryophytes (land plants) taxa were used for phylogenetic reconstruction. These sequences were first blasted at TAIR against the Arabidopsis genome. For all VPEs presently considered, the best hits corresponded to Arabidopsis VPEs, testifying of high amino acid sequence conservation among plant VPEs (data not shown).
The above topology clearly indicates that the duplication event that generated the respective ancestors of the angiosperm α/γ/δand β-VPE clades occurred before the MRCA of the extant angiosperms. The absence of clearly distinguished α/γ/δ-and β-VPEs in gymnosperms, and the position of the gymnosperm VPE clade as sister to a clade containing all angiosperm VPEs (albeit with modest bootstrap support), suggests that the duplication that separated the angiosperm α/γ/δ-and β-VPE lineages occurred along the angiosperm stem lineage, after its separation from that of the living gymnosperms.
Arabidopsis δ-VPE occurs in a well-supported sub-clade of the angiosperm α/γ/δ-VPE clade, together with genes from widely diverged core eudicots including Medicago, Vitis, and Populus. By contrast, all α/γ/δ-clade VPEs from Amborella, monocots and basal eudicots (such as Papaver) group externally to the point of divergence of the δ-VPE sub-clade (Figure 2; Supplemental Figure S3). It thus appears that the δ-VPE lineage arose in a gene duplication event in a common ancestor of all, or a major part of, the living core eudicots. Branch lengths within the δ-VPE subclade are very long compared to those in the remainder of the angiosperm α/γ/δ-VPE clade, suggesting either strong positive or relaxed selection pressure to have operated on the δ-VPE lineage since its separation from that of the remaining α/γ/δ-VPE lineage (Guindon et al., 2004).
The Arabidopsis α-and γ-VPEs (Ath_gi15225226 and Ath_gi15233996) group with 100% bootstrap support in a small clade containing sequences only from closely related Brassicaceae, including Arabidopsis lyrata and Eutrema salsugineum (Figure 2; Supplemental Figure S3). This topology strongly suggests that the duplication that generated the α-and γ-VPE lineages occurred recently, within Brassicaceae. This conclusion accords well with the common expression pattern of Arabidopsis α-and γ-VPEs in vegetative organs.
The phylogeny in Figure 2 and Supplemental Figure S3 shows that multiple, closely related VPE proteins are present in numerous taxa, including monocots, Amborella, eudicots, and gymnosperms. Thus, relatively recent gene duplications, such as that which generated the α-and γ-VPE lineages in Arabidopsis, appear to have occurred frequently within the VPE family in diverse plant groups.

Gene Sequence Polymorphism and Amborella Population Analyses
Nucleotide sequence polymorphism was characterized at both the intron and exon levels for the Amborella VPE genes described in the present work, from 12 individuals that were considered to be representative of Amborella's extant geographical distribution and genetic diversity (Poncet et al., 2013) (Figure 3; Supplemental Figure S4).

Impact of Exonic SNPs on Protein Structure
SNPs recorded across the studied individuals revealed a similar number of intronic and exonic polymorphisms for the three genes, with one to eight exonic SNPs and 22 to 34 intronic SNPs per gene. Among the 12 SNPs located in the proteincoding regions of the three genes, seven SNPs were synonymous and five were non-synonymous, though none was found located in the codons encoding catalytic residues (Supplemental Figure S4).

Minor Allele Frequency (MAF) and Genetic Diversity within Amborella Distribution
We examined the MAF statistics of Amborella VPE genes within the previously described genetic groups North (Tonine, Ponandou, Pwicate, Tchamba), Center (Ba, Aoupinié, Boregaou, Amieu, Dogny), Me (Mé Ori, Mé Fomechawa), and Nak (Nakada). For each of the individuals and genetic groups studied, average MAFs were computed for each Amborella VPE gene (Figures 3, 4; Supplemental Figure S4). For exonic SNPs, higher frequencies were observed for the northern (Ponandou, Pwicate, Tonine, and Tchamba) individuals for genes AmTr_36.100 and AmTr_2.262-2, and for Nakada individual for genes AmTr_2.262-1 and AmTr_2.262-2 (Figure 3). This trend was comparable to the diversity distribution observed with SSR microsatellites (Poncet et al., 2013) with individuals from the north exhibiting higher diversity and clustering in a same group. A similar distribution was also observed for intronic SNPs, with the northern group displaying a high percentage of polymorphic SNPs (71, 68, and 95% for genes AmTr_36.100, AmTr_2.262-1, and AmTr_2.262-2, respectively), the highest percentage of private SNPs (18, 6, and 18% for genes AmTr_36.100, AmTr_2.262-1, and AmTr_2.262-2, respectively) and a mean MAF of 0.43 across all three genes, the highest among all groups (Supplemental Figure S4). In particular, changes in the mean proportion of exonic and intronic polymorphic SNPs across the four genetic groups follow a parallel progression with the mean SSR allelic richness (Figure 4). Levels of naturally occurring genomic variations in the VPE sequences thus appeared to co-vary with divergence and demographic history between populations and in particular with neutrally behaving polymorphisms (SSRs).

Discussion
The Amborella VPE Family The present work shows that three VPE genes are present in the sister to all other living angiosperms, Amborella trichopoda. Two of these, AmTr_2.262-1 and AmTr_2.262-2, encode closely related β-VPEs, which are orthologs of the single β-VPE found in Arabidopsis. Our proteomics work shows that at least one FIGURE 3 | Frequencies distribution of Amborella VPE exonic SNPs in New Caledonia. For each genotype, the exonic mean minor allele frequencies (MAF) are represented by colored bar proportional to their mean values in each of the three VPE genes: AmTr_36.100 ("100", red), AmTr_2.262-1 ("262-1", blue) and AmTr_2.262-2 ("262-2", yellow). The corresponding population names are given together with their assignation to the four genetic clusters as inferred by SSR analysis (Poncet et al., 2013): "North" (Tonine, Ponandou, Pwicate, Tchamba) in green, "Center" (Ba, Aoupinié, Boregaou, Amieu, Dogny) in blue, "Me" (Mé Ori, Mé Fomechawa) in yellow, and "Nak" (Nakada) in red. of these two genes is expressed in seeds, thus generating the peptide fragments GIIINHPQGEDVYAGVPK and HQADVCHAYQLLLK, which we detected. Transcriptomics data available through both the Amborella Genome Database (http:// www.amborella.org) and Ancestral Angiosperm Genome Project website (http://ancangio.uga.edu) indicate that AmTr_2.262-2 is expressed in non-seed tissues, though this gene may additionally be expressed in seeds, as no seed tissues were used to obtain these transcriptomics data. We conclude that AmTr_2.262-1 is either not expressed, or is specifically expressed in seeds. The third VPE gene present in Amborella, AmTr_36.100, appears from phylogenetic studies to be a pro-ortholog of the α-, γ-, and δ-VPEs in Arabidopsis. From transcriptomics analyses, AmTr_36.100 is transcribed in non-seed tissues, while the proteomics analysis presented here failed to show any peptide signatures derived from this gene in a seed-protein extract. We therefore conclude that AmTr_36.100 is the Amborella pro-ortholog of the α-γ-and δ-VPEs from Arabidopsis and shows an entirely non-seed expression profile. The analysis of the Amborella VPE family performed in the present work has allowed us to draw solid conclusions (see following section) on the state of the VPE family in the MRCA of the extant angiosperms which no previous study has been able to make.

A Partial Reconstruction of the Evolution of the VPE Family in Angiosperms
The Arabidopsis VPE family, whose expression patterns and functions are the best characterized of any plant species, consists of four genes encoding α-, β-, γ-, and δ-VPEs. Previous phylogenetic studies have succeeded in identifying several major clades of plant VPEs (Nakaune et al., 2005;Christoff et al., 2014). However, these studies did not clearly elucidate the deep evolutionary relationships between the major clades of VPEs identified, or the origins through gene duplication of the VPE lineages present in Arabidopsis or other established plant models.
The phylogenetic reconstructions presented here (Figure 2; Supplemental Figure S3), incorporating novel sequences from Amborella and a wide range of land plants, indicate the four VPE genes in Arabidopsis to have been generated through three duplication events that occurred at quite distinct evolutionary stages. The conclusions of this analysis are summarized in Figure 5. In non-seed plants, including bryophytes such as FIGURE 4 | Mean proportion of exonic and intronic polymorphic SNPs in the four genetic groups (North, Center, Me, and Nak; see Figure 3). The mean minor allele frequencies (MAF) are compared to the mean SSR allelic richness obtained on the same groups by Poncet et al. (2013).
Physcomitrella and non-seed, vascular plants such as Selaginella, several VPEs are found, though these group together in a clade that is positioned externally to all seed plant VPEs. Thus, the α-, β-, γ-, and δ-VPEs of Arabidopsis share no direct one-toone relationships of orthology with VPEs in non-seed plants. Similarly, multiple gymnosperm VPEs occur together in a clade that groups, though with modest bootstrap support, in a sister position to a clade containing all angiosperm VPEs. Accordingly, the α-, β-, γ-, and δ-VPEs of Arabidopsis appear to share no direct one-to-one relationships of orthology with VPEs from gymnosperms.
By contrast, in Amborella, the most basally diverging angiosperm, two VPE lineages are present with distinct relationships of orthology to the VPEs from Arabidopsis. Accordingly, proteins encoded by AmTr_2.262-1 and AmTr_2.262-2 are both orthologous to Arabidopsis β-VPE and, from our proteomics data, appear to have conserved a similar expression profile in seeds to that of Arabidopsis β-VPE. This result implies that angiosperm β-VPEs have conserved their seed-specific expression pattern since the MRCA of the living flowering plants, at least in the lineages leading to Amborella and to Arabidopsis. The protein encoded by AmTr_36.100 is the putative Amborella pro-ortholog of all three other Arabidopsis VPEs (the α-, γ-, and δ-VPEs) and appears to have conserved a non-seed specific expression profile since the MRCA of living angiosperms.
The role of Arabidopsis β-VPE is to process the major protein reserves so these are correctly assembled within PSVs during seed maturation on the mother plant (Shimada et al., 2003). The need for such a role logically arises only in seed plants, and in this regard, the absence of a direct ortholog of β-VPEs in non-seed plants, such as bryophytes and lycophytes, is not incongruent. However, the duplication event that generated the β-VPE lineage appears to be specific to angiosperms, such that the remaining seed plants, the gymnosperms, also lack direct orthologs of the angiosperm β-VPE lineage. This apparent incongruity might be explained in at least two distinct ways, which relate to the functions and expression of VPE orthologs. Firstly, gymnosperms, including Pinus and Picea, appear like angiosperms to possess multiple VPE isoforms, and these are orthologous as a group to all angiosperm VPEs, including the β-VPEs. Therefore, a careful analysis of VPE expression patterns and/or protein accumulation in gymnosperms might show that some gymnosperm VPEs, like angiosperm β-VPEs, play a specific role in the mobilization of seed protein reserves. A second explanation for the differences between the VPE family in angiosperms and gymnosperms might relate to a possibly lesser functional specificity of VPEs in gymnosperms compared to angiosperms, such that the same VPEs in gymnosperms might be responsible both for the mobilization of seed protein reserves and for the processing of proteins in vegetative tissues.
The separate origins of the α-, γ-, and δ-VPE lineages, as these occur in Arabidopsis, have been considerably elucidated in our phylogenetic analyses (Figure 2; Supplemental Figure S3). Accordingly, the δ-VPE lineage appears to have arisen by a gene duplication in a common ancestor of widely diverged core eudicot taxa including Arabidopsis, Medicago, Populus, and Vitis. The δ-VPE lineage is thus not present as a separate lineage in basal eudicots, monocots or Amborella, but is represented in these taxa by single or multiple pro-orthologs of all core-eudicot α-, γ-, and δ-VPEs. Interestingly, the very long branches within the core eudicot δ-VPE clade indicate this gene lineage to have evolved very rapidly, which may have been due to intense positive selection pressure, or to a very relaxed selection, leading to rapid, neutral evolution. Finally, the α-and γ-VPEs of Arabidopsis show direct one-to-one relationships of orthology only to VPEs from species of closely related Brassicaceae. These two genes therefore must have arisen through a gene duplication event within Brassicaceae.
The rapidity of evolution of the δ-VPE lineage compared to the remaining α/γ-lineage within the core eudicots suggests that the α-and γ-VPEs of Arabidopsis may have largely conserved the ancestral functions of the α/γ/δ lineage, while the δ-VPE lineage may have either acquired a novel function, or become specialized through sub-functionalization. Further experiments (e.g., RNASeq, RT-PCR) are needed to carefully analyze the expression of the three Amborella genes in seeds and vegetative tissues and different developmental stages. For example, it would be interesting to characterize the expression patterns of seed VPEs at the stage at which unprocessed pro-globulins forms accumulate in maturating seeds. Accordingly, it would be worth performing a thorough expression analysis of all three Amborella VPE genes in both vegetative and seed tissues. In particular, this would allow us to determine whether one of these genes is expressed in the seed coat, as is δ-VPE in Arabidopsis.

Gene Sequence Polymorphism along VPE Genes in Natural Populations
The intra-specific genetic diversity of Amborella for the three VPE genes was explored at both the exon and intron levels using 12 individuals that are considered representative of Amborella's extant geographical distribution and genetic diversity in New Caledonia (Poncet et al., 2013). While SNPs in introns were most frequent among all SNPs located in the protein-coding regions of the three genes, most of them were synonymous or led to changes for amino acid with similar polarities, and never in the enzyme active sites, suggesting low impact on the protein function. Moreover, the polymorphism pattern across individuals was totally congruent with the genetic structure previously observed with neutrally behaving markers, SSRs (Poncet et al., 2012(Poncet et al., , 2013 or with SNPs distributed throughout the genome (Amborella Genome Project, 2013) on the same populations (Figures 3, 4; Supplemental Figure S4).
If there were selective pressure acting on SNP polymorphisms, these three genes would not have co-varied with neutrallybehaving markers (SSRs and SNPs), and population divergence. Because no obvious signature of natural selection was detected either in exons or introns, the observed patterns of polymorphism were probably shaped by neutral demographic forces only, i.e., genetic drift and migration.
The present geographical distribution of Amborella in New Caledonia has been suggested to be a signature of the impact of ancient climatic changes, and notably of the severe restriction of favorable environments during the last glacial maximum (LGM) (ca. 22,000 years BP) and Holocene (ca. 12,000 years BP) when Amborella experienced a dramatic reduction (about 96%) in suitable area (Poncet et al., 2013). The survival and expansion of at least two lineages from putative refugia might have generated major diversity groups. The present data on VPE polymorphisms across the species provides an additional signature of the biogeographical history of Amborella.