Understanding Plant Cellulose Synthases through a Comprehensive Investigation of the Cellulose Synthase Family Sequences

The development of cellulose as an organizing structure in the plant cell wall was a key event in both the initial colonization and the subsequent domination of the terrestrial ecosystem by vascular plants. A wealth of experimental data has demonstrated the complicated genetic interactions required to form the large synthetic complex that synthesizes cellulose. However, these results are lacking an extensive analysis of the evolution, specialization, and regulation of the proteins that compose this complex. Here we perform an in-depth analysis of the sequences in the cellulose synthase (CesA) family. We investigate the phylogeny of the CesA family, with emphasis on evolutionary specialization. We define specialized clades and identify the class-specific regions within the CesA sequence that may explain this specialization. We investigate changes in regulation of CesAs by looking at the conservation of proposed phosphorylation sites. We investigate the conservation of sites where mutations have been documented that impair CesA function, and compare these sites to those observed in the closest cellulose synthase-like (Csl) families to better understand what regions may separate the CesAs from other Csls. Finally we identify two positions with strong conservation of the aromatic trait, but lacking conservation of amino acid identity, which may represent residues important for positioning the sugar substrate for catalysis. These analyses provide useful tools for understanding characterized mutations and post-translational modifications, and for informing further experiments to probe CesA assembly, regulation, and function through site-directed mutagenesis or domain swapping experiments.

effects on root and hypocotyl elongation or seed coat development (Beeckman et al., 2002;Desprez et al., 2002;Ellis et al., 2002;Stork et al., 2010). However, the quadruple knockout to CesAs 2, 5, 6, and 9 is lethal at the pollen stage, indicating that these gene products are partially redundant with each other and that the presence of at least one is essential (Desprez et al., 2007;Persson et al., 2007). This demonstrates that, like the secondary CesA complex, the primary complex is composed of three essential classes of CesAs.
The genes of the CesA family lie within a larger gene superfamily called the cellulose synthase-like (Csl) superfamily, whose members are found throughout the plant kingdom and synthesize many of the non-cellulosic polysaccharides of the plant cell wall. Genes in this superfamily share a common origin and show distant similarity to CesAs identified in bacteria, which form their own clade sister to the plant Csl superfamily. All genes in the plant Csl superfamily fall into the category of type 2 processive glycosyltransferases, and each contains the active residues D, D, D, QxxRW. The amino acid sequences around these residues differ between families. Although all proteins in this superfamily have several transmembrane domains, the length and topology of the proteins also differ between families. Most Csl proteins localize to the Golgi and are presumably active there, while the CesA proteins localize to both the Golgi and plasma membrane and are presumably active only at the plasma membrane.
In addition to the CesA family, the Csl superfamily has been divided into families CslA/B/C/D/E/F/G/H and J (Yin et al., 2009). Of these, CslD and CslF are most similar in sequence to the CesA genes and share the same membrane topology as the CesAs (Yin et al., 2009). CslFs are responsible for the production of the mixed β1 → 3, β1 → 4 glucan linkages found in grasses (Burton et al., 2006). The product of the CslDs remains unknown.
Despite the importance of cellulose synthesis to the cell wall, a number of key questions about the CesAs remain unanswered. While it is clear from genetic evidence that there is a requirement for three distinct genetic CesA positions in the Arabidopsis primary and secondary complexes, it is not clear how broadly this division is shared across land plants. Since members of one CesA class cannot rescue mutants of a separate class, it would be expected that each CesA makes class-specific contributions to the architecture of the complex, to its catalytic activity, or to the recruitment of interacting accessory and regulatory partners. However, at this time it is not understood what the class-specific contributions of each genetic position might be.
At the same time, it is not clear how the activity or assembly of the complex is regulated. Since mechanical or osmotic stress causes internalization of CesA proteins within minutes (Crowell et al., 2009;Gutierrez et al., 2009), there must be very rapid and coordinated changes to CesAs in the complex and recruitment of endocytosis factors in the presence of certain stimuli. It is also likely that some mechanism prevents cellulose synthesis from occurring in the Golgi, where the presence of crystalline cellulose could be disruptive to normal Golgi architecture and trafficking. This aspect sets CesAs apart from several other Csl family proteins, including CslFs, which are active in the Golgi (Carpita and McCann, 2010). Since CslFs are closely related to CesAs, there should be some region or modification that explains how CslFs can be active in the Golgi, while CesA activity is restricted there.
Here we use an in-depth analysis of the sequences of the CesA family to identify regions, amino acids, and modifications of potential importance to the assembly, function, and regulation of CesA proteins. Using CesA sequences from eight complete and three incomplete plant genomes representing the evolutionary distance from moss to eudicots (Table 1), we determine that CesA sequences diverged into six clades representing each essential component of the primary or secondary complex sometime after the divergence of lycophytes but prior to evolution of seed plants.
Our results identify a number of regions and residues with the potential to explain differences in activity, interaction, and regulation between the CesAs. They will prove useful for the interpretation of future mutants and experimental results. They can also be used to inform further investigation of the CesA family through site-directed mutagenesis or domain swapping.

Materials and Methods

Acquiring and aligning sequences
Sequences were obtained with the BLASTP resource from NCBI (Altschul et al., 1997), using each of the 10 Arabidopsis CesAs (Table 2) as a query against the available plant genomes (Table 1) with the genome resources available from JGI (http://www.phytozome.net). All hits were recorded, totaling 88 CesA sequences. The same approach was used to obtain 62 CslD sequences from Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera, Oryza sativa, Selaginella moellendorffii, and Physcomitrella patens and 8 CslF sequences from Oryza sativa. A general BLAST using each Arabidopsis CesA sequence against bacterial genomes was performed to acquire 12 bacterial CesA sequences that could be used as an outgroup to root the phylogenetic analysis for all included plant Csl sequences. Specifically, the bacterial sequences were used to test whether the aligned plant sequences grouped appropriately into their corresponding CesA, CslD, and CslF families, and to ensure that no sequences falling outside these three families had been included in the analyses.
An alignment was generated of the plant CesAs using the alignment program MUSCLE (Edgar, 2004). In addition, an alignment of the CesA, CslD, and CslF sequences was generated by MUSCLE for comparisons among the clades. These alignments are available in Supplementary Material.
The sequence of each alignment was parsed by individual residue into a MySQL database relating the amino acid in each protein to its position in the protein and in relation to the clade and global alignments. This database allowed for rapid, automated comparisons between CesA sequences and served as the basis for subsequent analysis.
Further annotations were added based on transmembrane prediction, phosphorylation, and mutation data, as well as on all subsequent calculations. This database was queried for all subsequent analyses that compared and calculated conservation at residues. It was also used in the calculation of class-specificity as shown in Figure 2. The annotations were also used to determine the position of all features shown in the schematics of the CesA protein.
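As an illustration of the kind of per-residue table described above, the following sketch uses Python's built-in sqlite3 module in place of MySQL. The table layout, column names, and both helper functions are assumptions made for the example; the original schema is not given in the text.

```python
import sqlite3


def build_residue_table(alignment, clades):
    """Load a gapped alignment into an in-memory residue table.

    alignment: dict mapping protein name -> gapped sequence (all same length)
    clades:    dict mapping protein name -> clade label
    Both the arguments and the schema are hypothetical.
    """
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE residue ("
        "protein TEXT, clade TEXT, aa TEXT, "
        "protein_pos INTEGER, align_pos INTEGER)"
    )
    for protein, gapped in alignment.items():
        protein_pos = 0
        for align_pos, aa in enumerate(gapped, start=1):
            if aa == "-":
                continue  # a gap fills an alignment column but is not a residue
            protein_pos += 1
            db.execute(
                "INSERT INTO residue VALUES (?, ?, ?, ?, ?)",
                (protein, clades[protein], aa, protein_pos, align_pos),
            )
    db.commit()
    return db


def column_for(db, protein, protein_pos):
    """Return the alignment column holding a given residue, or None."""
    row = db.execute(
        "SELECT align_pos FROM residue WHERE protein = ? AND protein_pos = ?",
        (protein, protein_pos),
    ).fetchone()
    return row[0] if row else None
```

A lookup such as `column_for(db, "AtCesA3", 211)` would then translate a residue number reported in the literature into its column in the global alignment, which is the mapping that the phosphorylation and mutation annotations rely on.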

Construction of phylogeny
A phylogeny of the CesA sequences was constructed using maximum likelihood performed with the program PhyML (Guindon and Gascuel, 2003). The final phylogenetic tree was generated using PhyML from Phylogeny.fr with 100 bootstraps on the alignments of the CesA family.
We calculated the conservation score for each position in the alignment for each CesA clade, as well as the conservation at each alignment position across the entire CesA family.
To calculate a score for class-specificity, we compare the conservation within a clade to the conservation in the overall family. The conservation of the clade is subtracted by the conservation across all clades and multiplied by the conservation across all clades (negative values are fixed at 0). When there is strong conservation within a clade at an alignment position, but not strong conservation across all clades the class specificity score will be close to 1. When there is perfect conservation across all clades or no conservation within a clade, the score will be 0. Figure 2A provides an example of these calculations for a position in a set of two clades with six total sequences.
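The class-specificity score can be sketched as below. The function name and the `weight` argument are illustrative and do not come from the original analysis. One caveat: the paragraph above states that the difference is multiplied by the conservation across all clades, but that product is 0, not close to 1, when a clade is perfectly conserved and the family is not. The sketch therefore assumes that the multiplier is the within-clade conservation, which reproduces both limiting cases stated above, and keeps the literal reading available for comparison.

```python
def class_specificity(clade_cons, family_cons, weight="clade"):
    """Class-specificity at one alignment position (sketch).

    clade_cons:  conservation within the clade, in [0, 1]
    family_cons: conservation across all clades, in [0, 1]
    weight:      "clade" (assumed reading) multiplies by clade_cons;
                 "family" (literal reading of the text) by family_cons.
    """
    if weight not in ("clade", "family"):
        raise ValueError("weight must be 'clade' or 'family'")
    difference = clade_cons - family_cons
    if difference <= 0:
        return 0.0  # negative values are fixed at 0
    factor = clade_cons if weight == "clade" else family_cons
    return difference * factor
```

Under the assumed reading, a position that is invariant within one clade and unconserved across the family scores 1, while a position that is invariant across the whole family scores 0.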

Representing sequence conservation in a region
To represent the conservation of sequences in each family, we created frequency plots of the amino acids found at each position in the global alignment. The height of an amino acid character at a given position is proportional to the frequency at which that amino acid occurs in the alignment. We used WebLogo to generate these plots (Crooks et al., 2004).
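The quantity behind such a frequency plot is simply the per-column amino acid frequency. The following sketch computes it from a list of aligned sequences; it is not WebLogo itself, and the function name and the choice to ignore gaps are assumptions.

```python
from collections import Counter


def column_frequencies(sequences, ignore_gaps=True):
    """Return one {amino acid: frequency} dict per alignment column.

    sequences: list of equal-length gapped strings (hypothetical input format).
    """
    n_columns = len(sequences[0])
    frequencies = []
    for i in range(n_columns):
        column = [seq[i] for seq in sequences]
        if ignore_gaps:
            column = [aa for aa in column if aa != "-"]
        counts = Counter(column)
        total = sum(counts.values())
        # An all-gap column has no residues and yields an empty dict.
        frequencies.append(
            {aa: n / total for aa, n in counts.items()} if total else {}
        )
    return frequencies
```

Each character in the plot would then be drawn with a height proportional to its value in the dict for that column.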

Determining phosphorylation sites
The Arabidopsis Protein Phosphorylation Site Database (PhosPhAt) compiles phosphorylation sites in Arabidopsis that have been specifically observed through mass spectrometry (Heazlewood et al., 2008). Each of the positions observed to be phosphorylated in any of the Arabidopsis proteins was manually annotated in the database as an observed phosphorylation site. These annotations also indicated whether the phosphorylation had been specifically observed at a site, or whether it was an ambiguous event due to the presence of a phosphate in a peptide with multiple serines or threonines. Phosphorylation sites observed specifically in CesA7, but not present in the PhosPhAt database were also included (Taylor, 2007).

Assigning transmembrane regions
To predict the location of the transmembrane domains of the protein, we used the module TMAP available from the European Molecular Biology Open Software Suite (Persson and Argos, 1994; Rice et al., 2000). TMAP uses an alignment of the protein family as the input for its predictions. The alignment of the CesAs was used. The resulting prediction contained eight transmembrane domains at positions agreeing with the established literature (Somerville, 2006).

Table 1 | Cellulose synthase genes were identified from the following genomes, listed here with the abbreviation used in this study and the classification of the organism.
Selaginella moellendorffii SM Lycophyte Yes

Table 2 | The following CesA and CslD sequences of Arabidopsis were used as the starting point of a BLAST search to identify homologs in other organisms.

Identifying class-specific regions
To develop a measure for class-specificity, we used the global alignment of all plant CesAs and the BLOSUM62 substitution matrix (Henikoff and Henikoff, 1992). First we developed a metric to score conservation at an alignment position ranging from 0 to 1. We calculated conservation by comparing the sum-of-pairs (SP) score from the BLOSUM62 matrix, a measure of amino acid similarity commonly used to score multiple alignments (Gupta et al., 1995), against the "theoretical maximum" score (SP_Max) that could be observed at that position. SP_Max is determined by calculating SP as though each amino acid shared perfect conservation with all the others in the column. We calculated conservation at each position by dividing SP by SP_Max. Effectively, this is a normalized value of conservation at each position, with perfect conservation at an aligned position giving a value of 1 (Figure 2A).
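The conservation metric defined under class-specific regions (SP divided by SP_Max) can be sketched as follows. Several details are assumptions, because the text does not spell them out: SP_Max is computed by scoring each pair as the mean of the two residues' self-scores, gaps are skipped, and negative ratios are clamped to 0 to keep the result in the stated 0 to 1 range. The matrix below is only a small excerpt of BLOSUM62 for the example; a real analysis would need the full matrix.

```python
from itertools import combinations

# Excerpt of BLOSUM62 (symmetric), enough for the toy columns below.
BLOSUM62_EXCERPT = {
    ("L", "L"): 4, ("I", "I"): 4, ("V", "V"): 4,
    ("L", "I"): 2, ("I", "V"): 3, ("L", "V"): 1,
    ("D", "D"): 6, ("E", "E"): 5, ("D", "E"): 2,
    ("L", "D"): -4, ("L", "E"): -3, ("I", "D"): -3,
    ("I", "E"): -3, ("V", "D"): -3, ("V", "E"): -2,
}


def pair_score(matrix, a, b):
    """Look up a substitution score regardless of key order."""
    return matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]


def conservation(column, matrix):
    """Normalized sum-of-pairs conservation for one alignment column."""
    residues = [aa for aa in column if aa != "-"]  # assumption: skip gaps
    if len(residues) < 2:
        return 0.0
    pairs = list(combinations(residues, 2))
    sp = sum(pair_score(matrix, a, b) for a, b in pairs)
    # Assumed SP_Max: every pair scored as if the two residues were identical.
    sp_max = sum(
        (pair_score(matrix, a, a) + pair_score(matrix, b, b)) / 2
        for a, b in pairs
    )
    return max(0.0, sp / sp_max)  # assumption: clamp negatives to 0
```

With this definition a fully conserved column gives exactly 1, a column of similar residues such as a mix of L and I gives an intermediate value, and a column of dissimilar residues gives 0.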

Identifying known mutations
To map the known mutations, the literature of published mutations was screened. The position of each mutation was annotated in the database and the conservation of that position within CesA clades, within the full CesA family, and across the CesA, CslD, and CslF clades was manually investigated.

Identifying unique residue requirements
To identify positions with strong conservation of amino acid property but lacking conservation of specific amino acid identity, we divided the amino acids into categories based on the properties in Table 8. We screened for all sites with greater than 80% of the amino acids in the alignment column of the CesA family having the same property, but with no dominant amino acid in the alignment column (i.e., the most frequently observed amino acid in the column occurred less than 50% of the time). Each of the candidate columns was manually evaluated to judge whether any potentially interesting information could be determined.
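The screen described above can be sketched as a filter over alignment columns. The property groups below are hypothetical stand-ins, since the categories of Table 8 are not reproduced here, and counting gaps in the denominator is likewise an assumption. The two thresholds are those given in the text.

```python
from collections import Counter

# Hypothetical property groups; the actual categories are those of Table 8.
PROPERTY_GROUPS = {
    "aromatic": set("FWY"),
    "acidic": set("DE"),
    "basic": set("KRH"),
}


def property_only_columns(sequences, groups=PROPERTY_GROUPS,
                          property_min=0.8, identity_max=0.5):
    """Find columns conserving a property but not a single amino acid.

    Returns (1-based column, property name) tuples where more than
    property_min of the column shares the property and the most common
    amino acid occurs in less than identity_max of the column.
    """
    hits = []
    for i in range(len(sequences[0])):
        column = [seq[i] for seq in sequences]
        counts = Counter(aa for aa in column if aa != "-")
        if not counts:
            continue  # all-gap column
        if max(counts.values()) / len(column) >= identity_max:
            continue  # a dominant amino acid is present
        for name, members in groups.items():
            share = sum(n for aa, n in counts.items() if aa in members)
            if share / len(column) > property_min:
                hits.append((i + 1, name))
    return hits
```

Columns returned by such a filter would still need the manual evaluation described above, since the filter says nothing about whether the shared property is functionally meaningful.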

Results

Phylogeny of the CesA family
To better understand the evolution of the CesA family, we aligned the CesA sequences of the 11 genomes used for the study with MUSCLE and built a maximum likelihood phylogeny using PhyML (Guindon and Gascuel, 2003;Edgar, 2004). This phylogeny demonstrates that the CesA sequences of P. patens and S. moellendorffii form their own clades, while in the seed plants the phylogeny is separated into six major clades, reflecting ancient gene duplication events (Figure 1). Each of the clades represents one of the six required CesA classes identified in A. thaliana and O. sativa mutant studies. Each completed genome of the seed plants contains at least one CesA in each of the six major clades, suggesting that the requirement of a representative at each of the six genetic positions is shared across the seed plants. In addition, the CesA6 clade is divided into two subclades: one subclade (6A) is found only in the sampled eudicot species, while the other (6B) is found in both monocots and eudicots. Arabidopsis appears to have lost its CesA6B sequence. For convenience, we will refer to the CesA clades here based on their similarities to the A. thaliana (At) CesAs, e.g., "CesA1 clade" is the set of CesA genes most closely related to At CesA1.
The phylogenetic subdivision of the family is also reinforced by similarities in the expression patterns of CesA genes between Arabidopsis and rice. The secondary CesA genes of Arabidopsis (AtCesA4, AtCesA7, and AtCesA8) and rice (OsCesA4, OsCesA7, and OsCesA9) are expressed highly and specifically in vasculature. There is a class of highly and broadly expressed primary CesAs (AtCesAs 1, 3, and 6 and OsCesAs 1, 3, and 8). In both rice and Arabidopsis, the members of the CesA6 clade (AtCesAs 2, 5, and 9 and OsCesAs 5 and 6) are expressed differentially and in specialized tissues (Wang et al., 2010).

Analysis of sequence conservation identifies five class-specific regions
Because each clade reflects its own required and non-redundant isoform in the CesA complex, we decided to identify class-specific regions within the CesA sequence that could explain this specialization. We developed an algorithm to score the degree of class-specificity in a clade or across the CesA family at a given alignment position. This algorithm gives a score of 0 when there is either perfect conservation across all CesAs or no conservation at all at an alignment position. When a position shows strong conservation within a clade and this differs from the residues found across all CesAs, the class specificity score will be close to 1. For an example calculation see Figure 2A.
To present these data, we graphed the overall conservation and class-specificity scores as an average value of a sliding window over the sequence (Figure 2B). The overall conservation across all CesAs is similar to those published from the initially identified sequences (Pear et al., 1996).
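The smoothing step can be sketched as a plain moving average over per-position scores. The default window of 10 follows the Figure 2 legend; the handling of the sequence ends (no padding, so the output is shorter than the input) is an assumption.

```python
def sliding_window_mean(scores, window=10):
    """Average per-position scores over a sliding window.

    Returns one mean per full window, so the result has
    len(scores) - window + 1 values (empty if the input is too short).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(scores) < window:
        return []
    running_total = sum(scores[:window])
    means = [running_total / window]
    for i in range(window, len(scores)):
        # Slide by one position: add the new value, drop the oldest.
        running_total += scores[i] - scores[i - window]
        means.append(running_total / window)
    return means
```

The same function would apply to both the overall conservation and the class-specificity scores, which makes the two curves directly comparable along the alignment.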
Considering class-specificity across all CesA clades, five particular regions stand out: the far N-terminus and C-terminus are highly class-specific, there are two peaks of class-specificity within the second cytoplasmic loop, and the region at the end of the first cytoplasmic loop has a local peak in both conservation and class-specificity (Figure 2B). Plotting the class-specificity score for each family individually revealed interesting differences between CesAs. For the far N-terminus, only the motifs for CesA clades 1, 6, and 7 were significantly class-specific. For the far C-terminus, only the motifs for CesA clades 1, 3, 4, and 7 were class-specific (Figures 2C,D).
To investigate these class-specific regions more closely, we generated frequency plots of the conserved sequences found in each CesA clade. The far N-terminus of the CesAs shows a strongly conserved pattern L(V/I)AGSHNRNE(F/L)V in the clades containing CesA1, 6, and 7. Interestingly, moss and lycophyte CesA sequences have strong similarity to the sequences seen in CesA clades 1, 6, and 7, while CesA clades 3, 4, and 8 appear to have lost this region (Figure 3A).
The far C-terminal end of the CesA proteins is a ∼23 amino acid long putatively cytoplasmic region following the eighth transmembrane domain. CesA clades 1, 3, 4, and 7 show strong conservation, while the sequences in CesA clades 1 and 8 are more divergent (Figure 3B). The second strongly conserved proline in this region is particularly noteworthy, because it is mutated to serine in the Arabidopsis CesA3 mutant allele rsw5. This mutant suffers radial swelling and reduced elongation when grown at the restrictive temperature of 31°C (Baskin et al., 1992) and the rsw5 mutation was shown to impair the ability of CesA3 to competitively incorporate into the CesA complex (Wang et al., 2006).
The first class-specific region in the second cytoplasmic loop occurs at the beginning of the hypervariable region. Though there are strongly class-specific residues, there is a similar overall pattern. In each case a cysteine-rich region is flanked on either side by a highly charged (usually positive) region (Figure 3C). We will describe and discuss the results from the remaining class-specific regions in the section on CesA phosphorylation.

Sequence conservation and putative phosphorylation sites
PhosPhAt compiles phosphorylation sites in Arabidopsis that have been specifically observed through mass spectrometry (Heazlewood et al., 2008). A number of phosphorylation events have been observed in several Arabidopsis CesAs. One challenge in interpreting these phosphorylation data is that it is often difficult to determine whether an observed phosphorylation event is physiologically relevant. By mapping reported phosphorylation events onto our alignment, we show that 14 of the 23 sites show strong conservation (>70%) of serines and threonines (Table 3).
The phosphorylation sites occur in two distinct regions. There is a broad region in the hypervariable region in the first cytoplasmic loop. This region contains several sites with strong conservation in CesA1 and CesA3 clades. There is also a single site in this region corresponding to the Arabidopsis CesA3 S211 site that is generally a serine/threonine in CesA1, CesA3, and the moss/lycophyte clade and may represent an important ancestral site of phosphorylation (Figure 4B). This region corresponds to the second peak in class-specificity found in the CesAs (Figure 2B). With the exception of the site corresponding to CesA3 S211, the gymnosperm sequences are quite different from the angiosperm sequences in this region, suggesting that the evolution of phosphorylation at these sites may be a more recent adaptation (Figure 4C).
The second phosphorylated region occurs in the hypervariable region in the second cytoplasmic domain near the active sites (Figure 4A). Primary CesA clades 1, 3, and 6A, and also moss/lycophyte sequences show strong conservation of serine/threonines in this region, suggesting that phosphorylation at this position is an ancient regulatory event. These sites and their flanking residues represent the fourth class-specific region identified previously (Figure 2B), following the cysteine-rich region (Figure 3C). The secondary CesAs 4, 7, and 8 have lost the region corresponding to this site. Site-directed mutagenesis at this position has been shown to impair normal cellulose synthesis in CesA1 (Chen et al., 2010). The conservation of this position across the other primary CesAs suggests they may be similarly regulated.

Figure 2 | Conservation and class-specificity in the cellulose synthase family. An example calculation of conservation (Cons) and class specificity using the sum-of-pairs (SP) and max-sum-of-pairs (SP_Max). SP and SP_Max are determined from the BLOSUM62 matrix (A). The conservation and class specificity averaged for all CesA genes with a sliding window of 10 amino acids is plotted. The x-axis represents the position in the global alignment. The cartoon above the graph plots the major CesA features against alignment positions. Transmembrane domains are represented as green boxes, the zinc finger as the blue oval, and both hypervariable regions are listed (HVR) (B). The class specificity for each CesA family is plotted for the primary CesAs (C) and secondary CesAs (D).

Figure 3 | Sequences found in selected class-specific regions. Frequency plots of the sequences found in CesA clades represent the diversity of amino acids found at an alignment position, with more frequent amino acids drawn larger. The cartoon above each plot represents the location of that sequence in the global alignment. The numbers to the left of each sequence indicate which CesA clade the sequence was derived from; M represents moss plus lycophyte. A region at the far N-terminus is conserved in CesAs 1, 6, 7, and mosses and lycophytes, but lost from CesAs 3, 4, and 8 (A). The C-terminus shows strong conservation within CesAs 1, 3, 4, and 6, but not CesAs 1 and 8. The site of the rsw5 mutation (P → S) is marked by an asterisk and is also not conserved in CesAs 1 and 8 (B). A region with conserved cysteines is flanked by charges in the second hypervariable region of the CesAs (C).

Figure 4 | Conserved phosphorylation sites of interest. The conservation of observed phosphorylation sites is detailed by frequency plots. Red asterisks are placed above positions in which phosphorylated peptides were observed, but the specific residue phosphorylated was ambiguous. Black asterisks are placed above unambiguous phosphorylation. A region near the catalytic sites has strongly conserved serines/threonines in the primary CesAs and the moss and lycophyte sequences (A). Serines/threonines observed to be phosphorylated in CesA3 are strongly conserved and aligned to a strongly conserved serine/threonine in CesA1 and the moss and lycophyte sequences (B). A phosphorylated region that shows strong conservation within the CesA1 family is also found to have several phosphorylated serines/threonines in the CesA3 family, but only in eudicots (labeled 3D), possibly indicating phosphorylation here is a recent adaptation in eudicot CesA3s (C).

Identification of a point mutation site which distinguishes CesA classes
A growing number of CesA mutants have been characterized in Arabidopsis. Characterized mutations provide an excellent opportunity to look for class-specific residues within the CesA family and the CesA superfamily as a whole. If a mutation at a particular residue in one of the CesA clades causes a phenotype, but the residue is not well conserved in other CesA clades, it is a good indication that the residue is involved in some class-specific function.
Mutations are also of relevance with respect to the CslD and CslF families, which share a common architecture of transmembrane, catalytic, and hypervariable regions. Since the similarity between CesA, CslD, and CslF is so high, any residues whose mutation is known to impair function in CesAs, but which are not conserved in the CslD or CslF families, may provide valuable clues about the mechanisms for synthesis in these groups. To compare the CesAs to the CslDs and CslFs we acquired 62 CslD and 8 CslF sequences and aligned these with the 88 CesA sequences to create a superfamily alignment.
There are currently 18 characterized missense mutants in CesAs (Table 4). All characterized mutations except rsw5 occur in residues that are strongly conserved amongst CesAs. However, a number of the characterized mutants in the CesA clades occur at residues that are quite different in the CslD and CslF families. In some cases, particularly mutations in the C-terminal region, the positions of CesA mutations are absent from the alignment of CslD and CslF sequences. In other cases, such as in lew2-2, irx1-1, or bc11, the observed residue in CslD or CslF is often different from the conserved Arabidopsis residue and occasionally identical to the mutation that causes a phenotype in the CesAs. Understanding the nature of these mutations may prove useful to determining the mechanisms behind the function of CesAs, CslDs, and CslFs. In addition, a number of premature stop codons have been observed in mutant screens of CesAs. We have listed the ones we are aware of at the time of this writing (Table 5).

Identification of aromatic positions potentially important for catalysis
When an amino acid is absolutely conserved at a particular position it is often difficult to determine which of its chemical characteristics are essential to its function. Understanding the nature of an amino acid's contribution to protein function can be important for understanding the mechanism of the protein's action and being able to engineer the protein in the future. We screened for positions with strong conservation of an amino acid property, but without strong conservation of amino acid identity (Table 8). For most amino acid properties, it was impossible to suggest a function for the residues. However, we identified two positions with the unusual characteristic of conserving an aromatic property without conserving amino acid identity (Tables 6 and 7). Each of these aromatic residues is found in close proximity to one of the aspartate residues identified as important to catalysis in GT2 family glycosyltransferases (Figure 5). The corresponding position in the CslD and CslF clades also invariably contained an amino acid with an aromatic ring.

Figure 5 | Residues with conserved aromatic property but not amino acid identity. The location of positions in the global alignment with strong aromatic property but not strong conservation of identity are marked by an orange hexagon on the schematic of the CesA protein. Shown in red are the conserved aspartate residues identified as important to catalysis in family 2 processive glycosyltransferases. The black Q marks the location of the QxxRW motif. Shown below this are frequency plots derived from the alignment of all CesA, CslD, and CslF sequences in this region. The amino acid with conserved aromatic property is labeled with an orange hexagon above it. The aspartate identified as important in the D,D,D,QxxRW motif is identified with a red asterisk.

Discussion
In this paper we have examined the sequences of the CesA family, identifying that the evolution of the family involved early duplication events giving rise to the six essential clades of CesAs found in seed plants. We have identified class-specific regions that may be responsible for the differences between these clades. We have determined which observed phosphorylation sites are strongly conserved and mapped these onto the class-specific regions identified. We have determined that only one of the observed CesA mutations may indicate functional differences between CesA clades, but that a number of mutant sites indicate possible differences between the CesAs and their closest related families: the CslDs and CslFs.
Our phylogeny indicates that the specialization of the CesA family occurred at some time between the evolution of the lycophytes and the evolution of the gymnosperms. Unfortunately, no fern CesA sequences are available to further narrow the exact time of the major duplications. Fern sequences would be particularly useful since there are some regions in which the available gymnosperm CesAs have characteristics which more closely resemble the ancestral moss sequences than the derived monocot and eudicot sequences. For example, the gymnosperm CesA3 lacks conservation around a number of phosphorylation sites strongly conserved in monocot and eudicot sequences, and the gymnosperm CesA3 contains an N-terminal region which closely resembles the region conserved across mosses, lycophytes, and CesAs 1, 6, and 7.
In every species in our analysis, the genes in each of the primary CesA clades (CesAs 1, 3, and 6) had more copies than the genes in the secondary CesA clades (CesAs 4, 7, and 8). Of the 78 CesA sequences in the seed plants, 53 were in the primary CesA clades while only 25 were in secondary ones. Among the primary CesAs, the CesA6 clade has undergone much more diversification and divergence than the other positions, and it is also the only clade to be further subdivided due to a duplication event likely occurring after the evolution of the eudicots. Unfortunately, Arabidopsis has lost its CesAs that correspond to the angiosperm CesA6B branch, retaining only eudicot-specific CesA6A members. One potential explanation is that the recent duplication of the Arabidopsis CesA6A family into CesAs 2, 5, 6, and 9 was either a cause or a result of the loss of its CesA6B gene. The greater diversification of the CesA6 position may suggest that this position is regulated to achieve subtler control of the CesA complex.
We identified class-specific regions potentially responsible for the differences between the clades. Some of these represent regions found in the earliest CesA sequences that were subsequently lost in certain clades. The loss of the conserved N-terminal region in CesA clades 3, 4, and 8 may indicate that these CesAs rely on the presence of this region in the other CesA clades to provide whatever essential function it provides the CesA complex.
The phosphorylation sites in the catalytic domain represent the fourth class-specific region. This region is found in the primary CesAs and mosses, suggesting its role in regulating cellulose synthesis is ancient. However, the region is absent from all secondary CesAs. This could suggest that this region serves a purpose that is unnecessary in the secondary CesAs. For example, since cellulose synthesis in the secondary cell wall is a terminal event, CesAs might not need to be recycled from the membrane.
The class-specific region in the C-terminus is particularly interesting because there is a mutant which suggests a possible function for the region. CesA3 carrying the proline to serine mutation of rsw5 was less able to competitively access the complex (Wang et al., 2006). This proline is absolutely conserved in all sequences of CesAs 1, 3, 4, and 7, but is not conserved in CesAs 1 and 8. In addition, certain residues near this proline show greater similarity between the clades CesA3 and CesA7 and between the clades of CesA6 and CesA4. This is the only region observed which shows a stronger mapping between primary and secondary CesAs than within primary or secondary CesAs. This could suggest that this region is responsible for some aspect of assembly found in both primary and secondary complexes and could indicate a common stoichiometry or assembly between the primary and secondary CesAs that is at least partially mediated by the C-terminus. This could also explain why no complementing C-terminal fluorescent protein fusions have been reported in the literature.
Many of the phosphorylation events in the N-terminal region of the CesA1 and 3 are found in a similar pattern. With the exception of the S211 site, many CesA3 phosphorylation events are not found in the gymnosperm representatives, while others are only found in eudicots. This could suggest that phosphorylation in this region developed over time in the CesA3 family. Starting from the S211 site, additional sites emerged in early angiosperms and later in evolution further sites emerged in the early eudicots. This suggests that phosphorylation in this region may have emerged convergently in the CesA1 and CesA3 clades, indicating that this region is in an ideal position to control some aspect of CesA behavior. Although the sequences surrounding the phosphorylation sites differ between CesA1 and CesA3, the distribution of the phosphorylated serines and threonines follows a similar overall pattern, possibly suggesting that phosphorylation in these residues may trigger a similar conformational change.
None of the phosphorylation events observed in CesA6-like isoforms is conserved, even within the closely related CesA6-like isoforms of Arabidopsis. If these sites are functionally relevant, this would indicate that regulation at these positions is a very recent innovation, and a potentially important way for plant cells to differentially control the CesA6-like isoforms.
The fact that, with only one exception, every documented mutation site is strongly conserved across CesAs should be considered in the light of how these mutations were found. These mutations were characterized from mutagenic screens that looked for hypocotyl and root elongation defects. These screens may miss more subtle phenotypes. This could suggest that mutations to residues that would be class-specific have less severe effects than most mutations documented so far, perhaps reflecting partial redundancy of function in other isoforms over certain class-specific regions. A similar observation could be made about the fact that no missense mutations have been observed in CesA6-like genes, while multiple nonsense mutations have been reported. This may suggest that the partial redundancy at the CesA6 position protects the complex from disrupting point mutations as long as enough protein is present to maintain stoichiometric balance between CesA1, CesA3, and CesA6-like isoforms.
The observation of two positions with conserved aromatic property is interesting in the context of the activity of CesAs as glycosyltransferases. Many protein-carbohydrate interactions are mediated by the stacking effects of the electrons in an aromatic amino acid with those of the sugar ring. This has been documented in a variety of systems such as certain glycohydrolases and glycosyltransferases (Haga et al., 2003).
In conclusion, we have identified a number of CesA sequence properties that can aid in the understanding of how this essential gene family has evolved and specialized during the course of the evolution of the plant kingdom. In addition, this analysis provides a wealth of targets that can be probed in more detail through site-directed mutagenesis, domain swapping, or inter-species rescue experiments.

Table 5 | A list of nonsense mutations observed in the CesA family. The classification of region is the same as in Table 4.

Acknowledgments
Andrew Carroll is supported by the U.S. Department of Energy (grant no. DE-FG02-09ER16008). Chelsea Specht is supported by NSF IOS 085641 and the Hellman Family Faculty Fund.