Discovery of diversity in xylan biosynthetic genes by transcriptional profiling of a heteroxylan-containing mucilaginous tissue

The exact biochemical steps of xylan backbone synthesis remain elusive. In Arabidopsis, three non-redundant genes from two glycosyltransferase (GT) families, IRX9 and IRX14 from GT43 and IRX10 from GT47, are candidates for forming the xylan backbone. In other plants, there is evidence that different tissues express these three genes at widely different levels, which suggests diversity in the makeup of the xylan synthase complex. We recently profiled the transcripts present in the developing mucilaginous tissue of psyllium (Plantago ovata Forsk). This tissue was found to have high expression levels of an IRX10 homolog but very low levels of the two GT43 family members, in contrast to a recent profiling study of wheat endosperm that found a relatively high abundance of the GT43 family members. Here we have used RNA-Seq to perform an in-depth analysis of all GT genes expressed in four developmental stages of the psyllium mucilaginous layer and in a single stage of the psyllium stem. This analysis revealed several IRX10 homologs, an expansion in GT61 (homologs of At3g18170/At3g18180), and several GTs from other GT families that are highly abundant and specifically expressed in the mucilaginous tissue. Our current hypothesis is that the four IRX10 genes present in the mucilaginous tissue have evolved to function without the GT43 genes. These four genes represent some of the most divergent IRX10 genes identified to date, whereas those expressed in the psyllium stem are very similar to those of other eudicots. This suggests that the mucilage IRX10 genes are under selective pressure, likely because they synthesize the various xylan structures of mucilage, which has a different function from xylan in secondary walls. The numerous GT61 family members also show wide sequence diversity and may be responsible for the larger number of side chain structures in psyllium mucilage.


INTRODUCTION
A number of plants produce seed mucilage that aids in hydration, dispersal and germination. The composition of mucilage varies considerably across species: Arabidopsis thaliana, for example, uses primarily pectin (Goto, 1985; Western et al., 2000), while flax uses a mixture of pectin and arabinoxylan (Naran et al., 2008). Psyllium (Plantago ovata Forsk) mucilage is composed predominantly of complex heteroxylan (Edwards et al., 2003; Fischer et al., 2004; Guo et al., 2008) and therefore presents an opportunity to discover genes involved in xylan production. Psyllium mucilage is produced in a tissue layer one cell thick that is relatively easy to dissect from the developing seed. The mucilage forms a large part of this tissue's dry mass, and its ratio of xylan to cellulose is much higher than that of secondary cell walls; the tissue therefore offers an opportunity to distinguish genes involved in xylan formation from those involved in secondary cell wall biosynthesis. We have investigated this tissue by transcriptional profiling to determine which genes are highly expressed during mucilage formation. Using this approach, we identified a previously uncharacterized component of the xylan synthase, IRX15 (Jensen et al., 2011).

Abbreviations: GT, glycosyltransferase; DPA, days post anthesis; ML, mucilaginous layer.
A number of genes that affect xylan biosynthesis have now been identified. In a few cases, their biochemical activities have been demonstrated, specifically the addition of glucuronic acid side chains (GUX1, GUX2, GUX4; Lee et al., 2012a; Rennie et al., 2012) and the O-methylation of glucuronic acid (GXMT1; Lee et al., 2012b; Urbanowicz et al., 2012). Three complementation groups of putative glycosyltransferase (GT) genes have been implicated in the synthesis of the β-(1,4)-linked xylose backbone of xylan. Each of these groups consists of two genes: one with a secondary cell wall expression pattern, named IRREGULAR XYLEM (IRX) 9, IRX10 and IRX14, respectively, and one with a much lower expression level and a more general expression pattern, named after its redundant homolog with the suffix "LIKE," abbreviated L (e.g., IRX9-L). The four genes IRX9(-L) and IRX14(-L) are members of GT family 43 (GT43), while the IRX10(-L) genes are members of the GT47 family (Brown et al., 2005, 2009; Persson et al., 2005; Peña et al., 2007; Wu et al., 2009, 2010; Lee et al., 2010). Our finding that IRX15 and its redundant homolog IRX15-L also affect xylan chain length indicates further complexity of the xylan synthase (Brown et al., 2011; Jensen et al., 2011). A recent study of wheat endosperm has shown that, in contrast to Arabidopsis and psyllium, IRX15 is not expressed at high levels in this tissue, whereas homologs of IRX9, IRX14 and IRX10 are highly expressed. This result indicates that the makeup of the xylan synthase can vary, and it appears that xylan synthesis in wheat endosperm does not require IRX15. Our previous results suggest that the xylan synthase responsible for complex heteroxylan biosynthesis in psyllium does not require IRX9 or IRX14, as these were found to be expressed at very low levels in this tissue.
A homolog of IRX10, on the other hand, was found to be abundantly expressed (Jensen et al., 2011). These indications of diversity in the xylan synthase suggest that the one constant in xylan synthesis is IRX10. If IRX10 is primarily responsible for synthesis of the xylan backbone, the psyllium mucilaginous layer (ML) would be expected to express an IRX10 gene encoding a protein with properties different from those in tissues that express both GT47 and GT43 family members. One would also expect to find GTs responsible for the larger variety of xylan side chains in the psyllium mucilage. In this study we examine the IRX10 genes present in the ML and in stem tissue, as well as other highly abundant ML transcripts encoding proteins likely involved in xylan biosynthesis.
Toluidine blue staining of psyllium inflorescence and of the upper and lower halves of the stem was performed on free-hand sections of fresh material. Sequential extraction of cell wall material from leaves, inflorescence, and upper and lower stem halves, and subsequent neutral monosaccharide analysis of the 1 M KOH fraction, were performed as described in Jensen et al. (2011).

ASSEMBLY OF 454 ESTs AND DATABASE CONSTRUCTION
The five datasets of 454 ESTs were assembled together using CLC Genomics Workbench version 4.7.2 (CLC bio, Cambridge, MA, USA) and its de novo assembly algorithm (parameters: Similarity 0.8; Length fraction 0.5; Insertion cost 3; Deletion cost 3; Mismatch cost 2). Unique counts were generated by aligning the ESTs to the assembled contigs with the RNA-Seq Analysis algorithm for non-annotated sequences (parameters: Similarity 0.8; Length fraction 0.9). The assembled contigs were annotated by TBLASTN (Altschul et al., 1997) searches against the TAIR 9 annotation of the Arabidopsis genome. These annotations were then extended with the following labels: Arabidopsis gene family assignments from the Carbohydrate Active enZyme (CAZy) database (Cantarel et al., 2009; http://www.cazy.org; update 2012-05-31), e.g., "Glycosyltransferase Family 47" or "Glycoside Hydrolase Family 19"; for Arabidopsis proteins not included in CAZy but recently proposed to also encode GTs (Nikolovski et al., 2012), the label GT plus the respective family name, e.g., "Glycosyltransferase Family GT14R"; for members of the Arabidopsis nucleotide sugar transporter/triose phosphate translocator family (Ward, 2001), the label "NST/TPT family"; for transcription factors in the Database of Arabidopsis Transcription Factors (DATF; Guo et al., 2005; http://datf.cbi.pku.edu.cn/), the label "Transcription Factor"; and for genes co-expressed with IRX10 (r > 0.5; 184 genes) or with the secondary cell wall CESA4, CESA7 and CESA8 (r > 0.5; 227 genes) (GeneCAT database; http://genecat.mpg.de/cgi-bin/Ainitiator.py; Mutwil et al., 2008), the labels "AtIRX10 Co-expression" and "At SCW CESA Co-expression," respectively. Contig names, DNA sequences, annotations and expression information were stored in an Oracle relational database located at http://glbrc.bch.msu.edu/psyllium.
The database can be queried with keywords that search the contig annotations, including the added labels described above, and the contig sequences can be searched with BLAST (Altschul et al., 1997) using user-supplied DNA or protein query sequences. Information about each contig, such as DNA sequence, EST coverage and BLAST report against TAIR9, can be retrieved by clicking on the contig ID and its associated "T" icon. Access to individual contig data facilitates manual checks for assembly artifacts, such as ESTs from different genes grouped into the same contig, or multiple contigs originating from the same transcript. Finally, a microarray viewer based on a gene expression map of Arabidopsis development (Schmid et al., 2005) is provided for each contig by clicking on the associated AGI.
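The RNA-Seq mapping thresholds used for the unique counts (Similarity 0.8; Length fraction 0.9) can be read as a two-part acceptance test for each EST. The sketch below illustrates that reading only; it assumes that the length fraction is the share of the EST covered by the alignment and that similarity is the identity within the aligned region. It is not CLC Genomics Workbench code, and `Alignment` and `accept_mapping` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Summary of one EST-to-contig alignment (hypothetical structure)."""
    read_length: int     # total length of the EST
    aligned_length: int  # EST bases covered by the aligned region
    matches: int         # identical positions within the aligned region


def accept_mapping(aln: Alignment,
                   similarity: float = 0.8,
                   length_fraction: float = 0.9) -> bool:
    """Return True if the alignment passes both thresholds.

    Simplified reading of the criteria (an assumption, not CLC code):
    length fraction = aligned_length / read_length and
    similarity = matches / aligned_length.
    """
    if aln.read_length <= 0 or aln.aligned_length <= 0:
        return False
    long_enough = aln.aligned_length / aln.read_length >= length_fraction
    similar_enough = aln.matches / aln.aligned_length >= similarity
    return long_enough and similar_enough


if __name__ == "__main__":
    # 400-base EST, 380 bases aligned (95%), 350 identical (~92%)
    print(accept_mapping(Alignment(400, 380, 350)))  # True
    # Only 300 of 400 bases aligned (75%): fails the 0.9 length fraction
    print(accept_mapping(Alignment(400, 300, 290)))  # False
    # The same EST passes with the looser 0.5 used in the assembly step
    print(accept_mapping(Alignment(400, 300, 290), length_fraction=0.5))  # True
```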

IDENTIFYING GENES OF INTEREST
Because of sequencing errors, ESTs from a single gene were in some cases assembled into two or more separate contigs. For PoIRX10_1 to _4 and PoGT61_1 to _7, the cDNA sequences were therefore determined by cDNA cloning and Sanger sequencing of four independent clones per gene; the PoIRX10_2 sequence is not full length. The verified cDNA sequences were deposited in NCBI GenBank (http://www.ncbi.nlm.nih.gov/genbank) under accessions KC832826 to KC832829 (PoIRX10_1 to _4) and KC894060 to KC894066 (PoGT61_1 to _7).

PHYLOGENETIC ANALYSIS
Phylogenetic trees were constructed with MEGA 5.05 (Tamura et al., 2011) using the built-in ClustalW (Larkin et al., 2007) sequence alignment program, the Maximum Likelihood method (Nei and Kumar, 2000) with the Poisson substitution model, and 500 bootstrap replicates (Felsenstein, 1985). The phylogenetic analysis of GT61 members was based on protein sequences only, whereas that of GT47 members was based on cDNA sequences: the cDNA sequences were loaded into MEGA, translated into protein sequences and aligned with the built-in ClustalW function (File S2; Larkin et al., 2007), and the resulting codon-based cDNA alignment was used for the phylogenetic analysis. Codon positions included were first, second, third, and non-coding.
Protein sequences were obtained from the Phytozome v8.0 database (Goodstein et al., 2012; http://www.phytozome.net/). For poplar (Populus trichocarpa, annotation v3.0) the genes Potri015G107200 and Potri015G116700 were not included in the analysis as these represent partial sequences. GT family 61 proteins from Arabidopsis and rice (Oryza sativa Japonica Group) were obtained from the CAZy database. In Brachypodium distachyon, all proteins annotated as GT family 61 proteins based on the recent genome annotation (International Brachypodium Initiative, 2010) were included.
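The codon-based GT47 alignment (translate the cDNAs, align the proteins, then carry the protein alignment back onto the cDNAs) amounts to threading each coding sequence onto its aligned protein. MEGA does this internally; the function below is only a minimal sketch of the idea, assuming in-frame coding sequences with one codon per aligned residue. `codon_align` is a hypothetical name and the sequences are toy examples.

```python
def codon_align(protein_aln: str, cds: str, gap: str = "-") -> str:
    """Thread an in-frame coding sequence onto its aligned protein.

    Each residue in protein_aln consumes the next codon of cds and
    each gap becomes a triplet gap, so the result is exactly three
    times as long as protein_aln. Bases left over after the last
    residue (e.g., a stop codon) are ignored.
    """
    n_residues = sum(1 for aa in protein_aln if aa != gap)
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence too short for the aligned protein")
    pieces = []
    pos = 0
    for aa in protein_aln:
        if aa == gap:
            pieces.append(gap * 3)
        else:
            pieces.append(cds[pos:pos + 3])
            pos += 3
    return "".join(pieces)


if __name__ == "__main__":
    # Two toy sequences already aligned at the protein level
    aligned = {"seq1": ("M-KV", "ATGAAAGTTTAA"),
               "seq2": ("MRKV", "ATGCGTAAAGTT")}
    for name, (prot, cds) in aligned.items():
        print(name, codon_align(prot, cds))
    # seq1 ATG---AAAGTT
    # seq2 ATGCGTAAAGTT
```

Because gaps are inserted as whole codons, the resulting nucleotide columns stay in frame, so first, second and third codon positions remain comparable across sequences.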

DETERMINING DEGREE OF CELL WALL ACETYLATION
Ground plant material from Arabidopsis lower stem, dissected mucilaginous layers (8-10 DPA), psyllium husk (Now Foods, www.nowfoods.com), and whole psyllium seeds was washed three times with 70% ethanol, three times with 1:1 methanol:chloroform, and twice with acetone to obtain alcohol insoluble residue (AIR). Acetyl groups were released from the AIR by alkaline hydrolysis with 1 M KOH at room temperature for 5 min, and the hydrolysate was then neutralized with an equal amount of HCl. The released acetic acid was quantified using the K-ACETRM acetic acid kit from Megazyme (www.megazyme.com).
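The results (Figure 6) are expressed as μg acetic acid per mg AIR, while the assay measures a concentration in the neutralized hydrolysate. A minimal sketch of the conversion, assuming the concentration is read in μg/mL and that neutralization adds a volume of HCl equal to the KOH volume (so the hydrolysate volume doubles); `acetic_acid_per_mg_air` is a hypothetical helper and the input values are made up, not data from this study.

```python
def acetic_acid_per_mg_air(conc_ug_per_ml: float,
                           koh_volume_ml: float,
                           air_mass_mg: float) -> float:
    """Micrograms of released acetic acid per milligram of AIR.

    conc_ug_per_ml: acetic acid concentration measured in the
        neutralized hydrolysate.
    koh_volume_ml: volume of 1 M KOH used for hydrolysis; an equal
        volume of HCl is assumed to be added for neutralization.
    air_mass_mg: mass of alcohol insoluble residue hydrolyzed.
    """
    if air_mass_mg <= 0:
        raise ValueError("AIR mass must be positive")
    total_volume_ml = 2 * koh_volume_ml  # KOH plus an equal volume of HCl
    total_acetic_acid_ug = conc_ug_per_ml * total_volume_ml
    return total_acetic_acid_ug / air_mass_mg


if __name__ == "__main__":
    # Hypothetical: 5 mg AIR, 0.5 mL KOH, 60 ug/mL acetic acid measured
    print(acetic_acid_per_mg_air(60.0, 0.5, 5.0))  # 12.0
```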

TRANSCRIPT PROFILING OF PSYLLIUM STEM TISSUE, ASSEMBLY OF ESTs AND ASSIGNMENT OF FUNCTIONAL ANNOTATION
To compare xylan biosynthesis in the ML with xylan formation in other psyllium tissues, we first determined the neutral monosaccharide composition of different aerial parts of the plant (Figure 1A). The psyllium stem and inflorescence yielded the highest levels of xylose, comparable to those in Arabidopsis stem. Because glucose levels are low in these tissues, the high xylose levels likely derive from xylan rather than from xyloglucan, which has a glucan backbone. Hand sectioning and toluidine blue staining confirmed secondary cell wall formation in both inflorescence and stem (Figures 1B-D). Cell wall material was then sequentially extracted with CDTA, Na2CO3 and KOH, and the xylan-enriched 1 M KOH fraction was subjected to neutral monosaccharide analysis (Figure 1E). Only minor differences were found between the monosaccharide profiles of Arabidopsis lower stem, psyllium inflorescence and psyllium stem. Based on these analyses, we chose to profile the transcriptome of psyllium stem.
The sequence data from the psyllium stem RNA-Seq experiment were combined with four previous RNA-Seq datasets from psyllium ML (Jensen et al., 2011). The combined dataset of approximately 1 million ESTs was assembled into transcript models (contigs; Table S1 in Supplementary Material), annotated, and stored in an Oracle relational database located at http://glbrc.bch.msu.edu/psyllium.

OVERVIEW OF GLYCOSYLTRANSFERASES HIGHLY EXPRESSED IN PSYLLIUM MUCILAGINOUS LAYERS
Assembly and annotation of the five RNA-Seq datasets from psyllium identified 634 contigs encoding putative GTs. The top 50 of these, ranked by expression in the ML at the 10 days post anthesis (DPA) stage, are listed in Table 1. The most abundant transcripts encoding putative GTs (1000 ppm or higher in at least one of the four ML stages) are homologs of IRX10(-L) (GT47), GUX5 (GT8; Mortimer et al., 2010), RGP1/UAM (GT75; Konishi et al., 2007), and AT3G18170/AT3G18180 (GT61), and are likely involved in complex heteroxylan biosynthesis. Most of these highly abundant ML transcripts are not detected in the stem transcriptome (Table 1). Psyllium contains multiple genes related to AT3G18170/AT3G18180 and to IRX10(-L), and these two gene families were investigated in further detail.
Substantial primary cell wall biosynthesis is evident in the ML. Homologs of CESA1 and CESA3 (Arioli et al., 1998; Desprez et al., 2007; Persson et al., 2007) are expressed at 200 to 1000 ppm, while putative xyloglucan GTs, e.g., homologs of CSLC4 (Cocuron et al., 2007), XLT2 (Jensen et al., 2012) and XXT3 (Vuttipongchaikij et al., 2012), are expressed at 50 to 350 ppm (Table 1). A homolog of GAUT1 (Sterling et al., 2006) is expressed at 79 ppm at 10 DPA, providing evidence for homogalacturonan synthesis. A homolog of the callose synthase GSL12 is most abundant in the ML at 8 to 10 DPA (148 ppm), indicating that cell division is taking place (Chen et al., 2009). Some secondary cell wall biosynthesis also appears to take place: transcripts with homology to the secondary cell wall CESA8 (IRX1) and CESA4 (IRX5) (Turner and Somerville, 1997; Persson et al., 2005) are found at an abundance similar to that of the GTs involved in xyloglucan biosynthesis. Transcripts with homology to CESA2, CESA5 and CESA9 are also present in the ML transcriptome, with CESA9 homologs being especially abundant. In Arabidopsis, these three CESA proteins play important roles in seed coat development, namely in mucilage attachment (CESA5) and in formation of a secondary cell wall that reinforces the columella and radial wall (Mendu et al., 2011).
Mannan biosynthesis is indicated by the presence of CSLA2 (Dhugga et al., 2004; Goubet et al., 2009), MSR2 (Wang et al., 2012) and galactomannan galactosyltransferase (GMGT; Edwards et al., 1999) homologs with expression levels as high as 630 ppm (CSLA2 homolog, 10 DPA; Table 1). This likely results from endosperm contamination of the dissected ML: the endosperm stores large amounts of mannan (Jensen et al., 2011), and because it is attached to the ML, it is difficult to obtain ML tissue completely free of endosperm.
Of the 50 most abundant GT transcripts listed in Table 1, 14 cannot readily be assigned a function or a pathway. Notably, many of these abundant transcripts are not expressed in the stem, as is also the case for transcripts likely involved in heteroxylan biosynthesis (GT8, GT47, GT61, and GT75). In contrast, GTs involved in primary and secondary cell wall biosynthesis reach expression levels of approximately 50 ppm or higher in the stem. The ML-specific GTs without an assigned function are therefore candidates for roles in complex heteroxylan synthesis in the psyllium ML, although involvement in pathways unrelated to xylan synthesis is also possible.
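Expression values throughout this section are given in ppm. A minimal sketch of how such values and the Table 1 selection could be computed, assuming ppm means unique EST counts per million ESTs in each dataset; the 1000 ppm cutoff and ranking by the 10 DPA stage follow the text, while the contig names, dataset sizes and counts are hypothetical (only three of the four ML stages are shown).

```python
from typing import Dict, List

Counts = Dict[str, Dict[str, int]]  # contig -> dataset -> unique EST count
PPM = Dict[str, Dict[str, float]]   # contig -> dataset -> ppm


def to_ppm(counts: Counts, totals: Dict[str, int]) -> PPM:
    """Scale unique EST counts to parts per million of each dataset."""
    return {contig: {ds: 1e6 * n / totals[ds] for ds, n in per_ds.items()}
            for contig, per_ds in counts.items()}


def abundant_in_ml(ppm: PPM, ml_datasets: List[str],
                   cutoff: float = 1000.0,
                   rank_by: str = "ML_10DPA") -> List[str]:
    """Contigs at or above `cutoff` ppm in at least one ML dataset,
    sorted by decreasing ppm in the `rank_by` dataset."""
    hits = [c for c, v in ppm.items()
            if any(v.get(ds, 0.0) >= cutoff for ds in ml_datasets)]
    return sorted(hits, key=lambda c: ppm[c].get(rank_by, 0.0), reverse=True)


if __name__ == "__main__":
    # Hypothetical dataset sizes and counts
    totals = {"ML_8DPA": 200_000, "ML_10DPA": 250_000,
              "ML_12DPA": 200_000, "stem": 150_000}
    counts = {
        "contig_A": {"ML_8DPA": 150, "ML_10DPA": 400, "ML_12DPA": 500, "stem": 0},
        "contig_B": {"ML_8DPA": 300, "ML_10DPA": 200, "ML_12DPA": 100, "stem": 2},
        "contig_C": {"ML_8DPA": 10, "ML_10DPA": 20, "ML_12DPA": 15, "stem": 40},
    }
    ppm = to_ppm(counts, totals)
    print(abundant_in_ml(ppm, ["ML_8DPA", "ML_10DPA", "ML_12DPA"]))
    # ['contig_A', 'contig_B']
```

Normalizing each dataset separately lets contigs be compared across stages even when the datasets differ in size, which is why the threshold is applied per stage rather than to raw counts.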

PSYLLIUM STEM XYLAN BIOSYNTHESIS IS SIMILAR TO ARABIDOPSIS
All identified transcripts encoding proteins homologous to IRX9(-L), IRX10(-L), IRX14(-L) and IRX15(-L) are listed in Table 2. With the exception of some IRX10(-L) and IRX15(-L) transcripts, these were expressed at low levels or not detected in the ML. In the stem, the expression of these xylan-specific genes was unexpectedly low (100 ppm or lower). However, this tissue appears to be engaged principally in primary rather than secondary cell wall biosynthesis: in the stem, the primary cell wall CESAs reach levels as high as 1217 ppm (CESA3; Table 1), while the secondary cell wall CESAs are found at 10-fold lower levels. The expression of IRX9(-L), IRX10(-L), IRX14(-L), and IRX15(-L) in the stem therefore matches the modest level of secondary cell wall formation in this tissue, and psyllium appears to have a complement of xylan-synthesizing GTs similar to that of Arabidopsis.

FOUR HOMOLOGS OF ARABIDOPSIS IRX10 ARE HIGHLY EXPRESSED IN PSYLLIUM MUCILAGINOUS LAYERS
Transcripts encoding IRX10(-L) homologs show tissue-specific distributions (Table 2): transcripts present at high levels in the ML show little or no expression in the stem, and vice versa. These two categories of transcripts suggested that at least two different IRX10(-L) genes are present in psyllium. Manual examination of a total of 12 IRX10(-L) contigs provided evidence for six unique IRX10(-L) genes in psyllium, named Plantago ovata IRX10 1 to 6 (PoIRX10_1 to _6). The four showing abundant expression in the ML (PoIRX10_1 to _4) were cloned from cDNA and sequenced. Transmembrane domain prediction for the deduced amino acid sequences of PoIRX10_1, PoIRX10_3, and PoIRX10_4 with the TMHMM Server v. 2.0 (Krogh et al., 2001; http://www.cbs.dtu.dk/services/TMHMM/) gave a high score for a single N-terminal transmembrane domain in PoIRX10_1, an intermediate score for PoIRX10_4, and a very low score for PoIRX10_3 (File S1). The PoIRX10_2 cDNA sequence lacks the 5' end and was not analyzed.
The expression of PoIRX10_1 to _6 is shown in Figure 2. The expression profiles for PoIRX10_1 to _4 were generated by mapping the RNA-Seq data to the sequences obtained from the cDNA clones. PoIRX10_1 is strongly induced in the ML and reaches its maximum at 12 DPA, while PoIRX10_2 to _4 show flat or decreasing expression over the four ML stages. PoIRX10_6 is not detected in the ML but is present in the stem together with PoIRX10_5. PoIRX10_5 is also detected in the ML, but at 10-fold lower levels than PoIRX10_1 to _4.
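TMHMM uses a hidden Markov model and cannot be reproduced in a few lines. As a much cruder and different technique, shown here only to illustrate what an N-terminal membrane-anchor signal looks like, a Kyte-Doolittle hydropathy scan flags 19-residue windows whose mean hydropathy exceeds about 1.6. This sketch will not reproduce the TMHMM scores in File S1; the 60-residue N-terminal region and the test sequences are arbitrary assumptions.

```python
# Kyte-Doolittle hydropathy index
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_means(seq: str, window: int = 19) -> list:
    """Mean hydropathy of every full-length window (index = start position)."""
    scores = [KD.get(aa, 0.0) for aa in seq.upper()]  # unknown residues score 0
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def hydrophobic_windows(seq: str, window: int = 19,
                        threshold: float = 1.6) -> list:
    """0-based start positions of windows whose mean exceeds threshold."""
    return [i for i, m in enumerate(window_means(seq, window)) if m > threshold]


def has_n_terminal_candidate(seq: str, region: int = 60) -> bool:
    """True if a hydrophobic window starts within the first `region` residues."""
    return any(i < region for i in hydrophobic_windows(seq))


if __name__ == "__main__":
    anchored = "MSKRE" + "L" * 20 + "KDESRNQKDE" * 5  # toy N-terminal anchor
    soluble = "DEKRSNQ" * 10                          # toy hydrophilic protein
    print(has_n_terminal_candidate(anchored))  # True
    print(has_n_terminal_candidate(soluble))   # False
```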

POIRX10_1, _2 AND _4 REPRESENT SOME OF THE MOST DIVERGENT IRX10 PROTEINS YET IDENTIFIED
An examination of IRX10 homologs from various higher plants showed a high degree of sequence conservation among these proteins. To obtain a broader view, we collected all IRX10 homologs from six phylogenetically diverse plant species with fully sequenced and annotated genomes. This yielded 18 IRX10 homologs from Physcomitrella patens (1), Selaginella moellendorffii (2), Arabidopsis thaliana (3), Populus trichocarpa (4), Brachypodium distachyon (5) and Oryza sativa (6). Table 3 shows the pairwise maximum amino acid identity scores, obtained with the BLAST algorithm (Altschul et al., 1997; http://blast.ncbi.nlm.nih.gov/Blast.cgi), for these 18 IRX10 proteins compared against Arabidopsis IRX10 (AtIRX10) and the six PoIRX10 proteins. Arabidopsis FRA8 and XGD1 were included for comparison as more distantly related genes: FRA8 is the closest homolog of the IRX10(-L) genes in Arabidopsis (Zhong et al., 2005), and XGD1 is a xylosyltransferase from GT47 subgroup D (Jensen et al., 2008). The remainder of the pairwise matrix is shown in Table S2 in Supplementary Material. A phylogenetic tree of the 24 IRX10 proteins, FRA8 and XGD1 is shown in Figure 3A. The phylogenetic analysis was performed on a codon-based cDNA sequence alignment, an approach that is beneficial for conserved proteins whose coding sequences carry many synonymous mutations. The tree has two major clades, rooted by PpIRX10: one contains the eudicot IRX10 sequences, while the other contains the monocot IRX10 sequences and SmIRX10. Of the six psyllium proteins, PoIRX10_6 groups with AtIRX10 and two of the three poplar IRX10 proteins, while PoIRX10_1 to _5 form a separate group. The phylogenetic analysis therefore suggests that the expansion of the PoIRX10 proteins took place after the separation of monocots and eudicots.
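Table 3 reports pairwise BLAST identities. In BLAST output, identity is the number of identical positions divided by the length of the alignment, with gap columns counted in that length. The sketch below computes that ratio for two sequences that are already aligned; it does not perform the alignment, and `percent_identity` and the example sequences are hypothetical.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Identical positions divided by alignment length, as a percentage.

    Both strings must have the same aligned length. Columns where both
    sequences have a gap are dropped; any other gap column counts
    toward the length but never as an identity.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == gap and y == gap)]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # 5 identities (M, K, L, L, V) over 7 alignment columns
    print(round(percent_identity("MKT-LLV", "MKSALLV"), 1))  # 71.4
```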
Evolutionarily conserved protein domains are a powerful basis for predicting protein function and are collected in a number of searchable databases, e.g., Pfam (Punta et al., 2012) and InterPro (Hunter et al., 2009). The algorithm behind the SALAD database uses patterns of evolutionarily conserved motifs to determine relatedness (Mihara et al., 2010; http://salad.dna.affrc.go.jp/salad/en/). As with other protein domain prediction methods, this approach emphasizes conserved protein function rather than phylogenetic relationships. Figure 3B shows the 26 proteins from Figure 3A in a SALAD dendrogram. IRX10 proteins ranging in phylogenetic distance from P. patens to Arabidopsis are tightly clustered, while PoIRX10_1, _2 and _4 form a distinct group. Notably, this psyllium-specific clade consists of PoIRX10 proteins expressed exclusively in the ML. The SALAD motif structure used to construct the dendrogram (Figure 3C) is conserved across the majority of IRX10 proteins. A few exceptions exist: motif 5 is absent in the poplar gene Potri012G109200, motif 10 is absent in PoIRX10_2, and there is some motif variation in the N-terminus involving motifs 11, 12, 14, and 15. In FRA8, motifs 5, 6, and 10 are absent, while in XGD1 most of the motifs found in the IRX10 proteins are absent. This indicates that PoIRX10_1, _2 and _4 have conserved the IRX10 motif structure despite their more divergent protein sequences, and suggests that they share protein function with the IRX10 proteins of the other plant species.

www.frontiersin.org
June 2013 | Volume 4 | Article 183 | 7

SIMILARITIES IN XYLAN SIDE CHAIN DECORATIONS BETWEEN PSYLLIUM AND GRASSES ARE LIKELY THE RESULT OF CONVERGENT EVOLUTION
The psyllium database contains 18 contigs encoding proteins with close homology to AT3G18170 and AT3G18180. Many of these contigs represented partial transcripts and were assembled into full transcripts by manual inspection. These efforts yielded evidence for the presence of nine unique GT61 genes in psyllium, seven of which were cloned from cDNA and named Plantago ovata GT61 1 to 7 (PoGT61_1 to _7).
The expression profiles of PoGT61_1 to _7 in psyllium stem and ML are shown in Figure 4. Their expression levels in the ML were as high as those of PoIRX10_1 to _4, and they show either induction or flat to decreasing expression during ML development. These proteins are therefore likely candidates for the GT activities that form the side chain decorations of the ML complex heteroxylan. Figure 5 presents a phylogenetic tree of PoGT61_1 to _7 and all GT61 proteins identified in Arabidopsis, rice and B. distachyon (ClustalW alignment in File S4). The tree shows that the large diversification of this family in grasses is unrelated to the diversification found in psyllium. The similar modifications of the xylan backbone in psyllium ML and in grasses are therefore likely the result of convergent evolution.

POSSIBLE FUNCTION OF THE NUMEROUS PUTATIVE GLYCOSYLTRANSFERASES HIGHLY EXPRESSED IN PSYLLIUM MUCILAGINOUS LAYER
The structure of the xylan-based mucilage from the Plantago genus (ovata F., major L., asiatica L.) is highly complex (Samuelsen et al., 1999; Fischer et al., 2004; Yin et al., 2012). Guo et al. (2008) fractionated psyllium husk using hot water followed by successive extractions with increasing concentrations of NaOH. This yielded three fractions that together accounted for 90% of the husk mass and consisted predominantly of Ara (15-25%) and Xyl (65-70%). Two of these fractions also contained approximately 15% uronic acid. Each fraction showed a related but distinct glycosyl-linkage composition, providing evidence for extensively branched xylans in all three. In all fractions, the branching appears to consist of single xylose residues, single arabinose residues, and side chains of two to three sugars containing different combinations of xylose, arabinose, galactose, and mannose (Guo et al., 2008). An abundant side chain, α-Araf-(1→3)-β-Xylp-(1→3)-Araf, present in the non-acidic fraction has been isolated and structurally characterized by NMR (Fischer et al., 2004). The mucilage of P. ovata F. therefore appears to consist of several species of complex heteroxylans with different structural compositions and physical characteristics.

FIGURE 5 | Continued
…full-length protein sequences deduced. A few transcripts encoding protein sequences homologous to some of the other six GT61 proteins in Arabidopsis were identified in the mucilaginous layers, but these were expressed at negligible levels (<10 ppm) and were not included in this analysis. The seven GT61 proteins highly expressed in psyllium mucilaginous layers (light blue) were aligned with all glycosyltransferase family 61 proteins from Arabidopsis (dark blue), rice (red) and Brachypodium distachyon (pink).
The multiple side chains found in the psyllium mucilage are consistent with the numerous GTs highly expressed in the ML. The identification of four different, abundantly expressed PoIRX10 genes is noteworthy. This may indicate that several heteroxylan subspecies are produced in the tissue and that each PoIRX10 protein is involved in making a separate xylan by interacting with different decorating GTs. Alternatively, the four PoIRX10 proteins could form one or more complexes necessary to form the β-(1,4)-xylan backbone. Third, some of the PoIRX10 proteins may be backbone-decorating GTs not involved in backbone synthesis (Table S3 in Supplementary Material). It seems likely, however, that at least one of the ML-specific PoIRX10 proteins is a component of the xylan synthase in this tissue, giving rise to a xylan synthase activity different from that found in Arabidopsis and other eudicots.
Small amounts of rhamnose, glucose, glucuronic acid, galactose, and mannose have been identified in psyllium husk and proposed to be side chain decorations (Fischer et al., 2004; Guo et al., 2008; Yin et al., 2012). GTs from families other than GT61 may therefore also be involved in forming side chains in psyllium heteroxylan. Two such candidates are the transcripts homologous to AT4G32290 and AT2G32750 (Table S3 in Supplementary Material). AT4G32290 is a member of the GT14R family (Nikolovski et al., 2012), none of whose members have been characterized beyond their Golgi localization; its psyllium homolog is highly abundant in the ML. AT2G32750 is homologous to Arabidopsis MUR3 (Madson et al., 2003) and XLT2 (Jensen et al., 2012), both of which transfer galactose onto xylose in xyloglucan biosynthesis. Its psyllium homolog is highly expressed in the ML and could transfer galactose onto xylose in psyllium heteroxylan.

PUTATIVE NUCLEOTIDE SUGAR TRANSPORTERS ARE HIGHLY EXPRESSED IN PSYLLIUM MUCILAGINOUS LAYERS
Golgi transport proteins for UDP-galactose, UDP-galactose/glucose and GDP-mannose have been identified in Arabidopsis (Reyes and Orellana, 2008; Handford et al., 2012) and rice (Seino et al., 2010) and are members of the NST/TPT superfamily (Ward, 2001). Transporters of other UDP-sugars have been proposed to belong to this superfamily as well (Ward, 2001; Reyes and Orellana, 2008). Several UDP-sugar transporters are likely to be expressed in the ML to supply UDP-xylose and UDP-arabinofuranose to Golgi-localized enzymes for complex heteroxylan biosynthesis. UDP-arabinopyranose mutase (UAM) interconverts UDP-arabinofuranose and UDP-arabinopyranose (Konishi et al., 2007) and is located in the cytosol (Bar-Peled and O'Neill, 2011). Arabinoxylan synthesis occurs in the Golgi and requires UDP-arabinofuranose, which appears to be produced exclusively by this mutase (Rautengarten et al., 2011). UDP-arabinofuranose must therefore be transported across the Golgi membrane before it can be incorporated into cell wall carbohydrates such as heteroxylan. Because approximately 40% of the neutral sugar content of the ML cell wall is arabinose, UDP-arabinofuranose import into the ML Golgi is likely elevated, and transcript levels for the UDP-arabinofuranose transporter are therefore expected to be high in the ML.
In Arabidopsis, UDP-xylose epimerase 1 (UXE1/MUR4), which interconverts UDP-xylose and UDP-arabinopyranose, is Golgi localized (Burget et al., 2003). Two contigs from the psyllium ML (M01000013775 and M01000025234) match the N-terminal and C-terminal sequences of UXE1 and together may represent the full-length transcript of a psyllium UXE1 homolog. This homolog shares 81% amino acid sequence identity with UXE1 in the N-terminal region, where both proteins have a transmembrane domain predicted by the TMHMM Server v. 2.0; the psyllium UXE1 homolog is therefore likely Golgi localized. To provide UDP-xylose for UXE1, psyllium appears to express two isoforms of UDP-xylose synthase (UXS) at comparable levels in the ML (1000-6000 ppm), only one of which has a predicted transmembrane domain. Psyllium therefore appears capable of producing UDP-xylose both in the cytosol and in the Golgi. Finally, the UXS substrate, UDP-glucuronic acid, is usually synthesized from UDP-glucose by UDP-glucose 6-dehydrogenase (UGD). The subcellular localization of this enzyme in psyllium ML could not be inferred, because the contig lacks the N-terminal sequence that would contain a transmembrane domain. The putative subcellular localizations of these UDP-sugar interconverting activities in psyllium ML suggest several possible routes for supplying the UDP-xylose needed for xylan biosynthesis: the UDP-sugar imported into the Golgi may be UDP-glucose, UDP-glucuronic acid or UDP-xylose. Furthermore, because UXE activity in psyllium ML appears to be exclusively Golgi localized, UDP-arabinopyranose needs to be exported from the Golgi to the cytosol to be converted to UDP-arabinofuranose by UAM. Psyllium transporters of these UDP-sugars, as well as of UDP-arabinofuranose as discussed above, are therefore likely to be expressed at elevated levels in the ML.
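The reasoning about supply routes can be made explicit by treating each UDP-sugar in each compartment as a node, enzymes and transport steps as edges, and enumerating the paths. The sketch below encodes the localizations inferred above (UXE in the Golgi only, UAM in the cytosol only, UXS in both; UGD, whose localization is unknown, in both). The transport steps are hypothetical, standing in for the candidate transporters discussed in the text rather than identified proteins, and the abbreviations (UDP-Glc, UDP-GlcA, UDP-Xyl, UDP-Arap, UDP-Araf) are introduced here for brevity.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Node = Tuple[str, str]  # (UDP-sugar, compartment)

# Interconversions with the localizations inferred in the text.
# UGD localization is unknown, so it is placed in both compartments.
REACTIONS = [
    ("UGD", "UDP-Glc", "UDP-GlcA", ("cytosol", "Golgi")),
    ("UXS", "UDP-GlcA", "UDP-Xyl", ("cytosol", "Golgi")),
    ("UXE", "UDP-Xyl", "UDP-Arap", ("Golgi",)),
    ("UXE", "UDP-Arap", "UDP-Xyl", ("Golgi",)),
    ("UAM", "UDP-Arap", "UDP-Araf", ("cytosol",)),
    ("UAM", "UDP-Araf", "UDP-Arap", ("cytosol",)),
]
# Hypothetical transport steps: (UDP-sugar, from, to)
TRANSPORTS = [
    ("UDP-Glc", "cytosol", "Golgi"),
    ("UDP-GlcA", "cytosol", "Golgi"),
    ("UDP-Xyl", "cytosol", "Golgi"),
    ("UDP-Arap", "Golgi", "cytosol"),
    ("UDP-Araf", "cytosol", "Golgi"),
]


def build_graph() -> Dict[Node, List[Tuple[Node, str]]]:
    """Adjacency list: node -> [(next node, step label)]."""
    graph: Dict[Node, List[Tuple[Node, str]]] = defaultdict(list)
    for enzyme, src, dst, compartments in REACTIONS:
        for comp in compartments:
            graph[(src, comp)].append(((dst, comp), f"{enzyme} ({comp})"))
    for sugar, src_comp, dst_comp in TRANSPORTS:
        graph[(sugar, src_comp)].append(
            ((sugar, dst_comp), f"transport {sugar} to {dst_comp}"))
    return graph


def routes(graph, start: Node, goal: Node) -> List[List[str]]:
    """All simple paths from start to goal, as lists of step labels."""
    found: List[List[str]] = []

    def dfs(node: Node, visited: set, steps: List[str]) -> None:
        if node == goal:
            found.append(steps)
            return
        for nxt, label in graph.get(node, []):
            if nxt not in visited:
                dfs(nxt, visited | {nxt}, steps + [label])

    dfs(start, {start}, [])
    return found


if __name__ == "__main__":
    g = build_graph()
    for target in [("UDP-Xyl", "Golgi"), ("UDP-Araf", "Golgi")]:
        print(target)
        for r in routes(g, ("UDP-Glc", "cytosol"), target):
            print("  " + " -> ".join(r))
```

Under these assumptions, three routes lead from cytosolic UDP-glucose to Golgi UDP-xylose, each crossing the Golgi membrane with a different UDP-sugar (UDP-glucose, UDP-glucuronic acid or UDP-xylose), and every route to Golgi UDP-arabinofuranose requires both UDP-arabinopyranose export and UDP-arabinofuranose import.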
The psyllium database holds a total of 50 contigs encoding proteins with close homology to the Arabidopsis NST/TPT family. Homologs of characterized proteins such as ATUTR3 (Reyes et al., 2010) and GONST1 (Baldwin et al., 2001) are found at nearly undetectable levels in the ML, while the most abundant transcripts reach expression levels as high as 2000 ppm. When the transcripts are ranked by abundance in the ML at the 10 DPA stage, three of the top four show homology to the Arabidopsis NST proteins AT5G25400, AT1G21070 and AT4G32390, all of which belong to an uncharacterized branch of the NST/TPT superfamily. Several members of this branch, including AT4G32390, have been localized to the Golgi (Nikolovski et al., 2012). The second most abundant transcript has closest homology to AT1G06890, which has also been found in the Golgi (Nikolovski et al., 2012) and is related to GONST4 and GONST5 (Handford et al., 2004).
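Expression values in this section are given in ppm. Assuming that ppm here means reads assigned to a contig per million reads in the library (an assumption, since the normalization is not restated in this section), the conversion and the abundance ranking described above can be sketched as follows. The contig names, read counts and library size are hypothetical and serve only to illustrate the calculation.

```python
def to_ppm(read_counts, total_reads):
    """Convert per-contig read counts to parts per million of all library reads."""
    return {contig: 1e6 * n / total_reads for contig, n in read_counts.items()}


def rank_by_abundance(ppm, top=4):
    """Return (contig, ppm) pairs, most abundant first, truncated to `top`."""
    return sorted(ppm.items(), key=lambda item: item[1], reverse=True)[:top]


# Hypothetical read counts for five transporter-like contigs in a
# hypothetical library of 100,000 reads.
counts = {"nst_a": 80, "nst_b": 200, "nst_c": 30, "nst_d": 5, "nst_e": 120}
ppm = to_ppm(counts, total_reads=100_000)
print(rank_by_abundance(ppm))
# [('nst_b', 2000.0), ('nst_e', 1200.0), ('nst_a', 800.0), ('nst_c', 300.0)]
```

Normalizing by total library reads makes abundances comparable across the developmental stages before ranking.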

ADDITIONAL GENES POSSIBLY INVOLVED IN XYLAN BIOSYNTHESIS IN PSYLLIUM MUCILAGINOUS LAYERS
Finding an Arabidopsis gene that is expressed during secondary cell wall formation and has a close homolog highly expressed in the psyllium ML may indicate that the gene is involved in xylan biosynthesis in both Arabidopsis and the psyllium ML, as proved to be the case for the Arabidopsis IRX15(-L) proteins (Jensen et al., 2011). Table 4 lists the 12 most abundant psyllium transcripts that fit this pattern. These genes are likely involved in complex heteroxylan biosynthesis or in secondary cell wall formation associated with the psyllium ML.
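The selection criterion described above can be sketched as a simple filter. In this hypothetical Python example, each record carries a psyllium contig ID, its closest Arabidopsis gene, a flag for whether that gene shows secondary cell wall expression, and the contig's ML abundance in ppm. The 100 ppm cut-off and all example records are placeholders, not values from this study.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    contig: str              # hypothetical psyllium contig ID
    at_homolog: str          # closest Arabidopsis gene (placeholder names below)
    at_secondary_wall: bool  # does the Arabidopsis gene show secondary-wall expression?
    ml_ppm: float            # abundance of the psyllium contig in the ML


def shortlist(candidates, min_ml_ppm=100.0, top=12):
    """Keep candidates whose Arabidopsis homolog is expressed during secondary
    wall formation and whose psyllium homolog is abundant in the ML, then
    return the `top` most abundant. The 100 ppm cut-off is a placeholder."""
    hits = [c for c in candidates
            if c.at_secondary_wall and c.ml_ppm >= min_ml_ppm]
    return sorted(hits, key=lambda c: c.ml_ppm, reverse=True)[:top]


# Hypothetical records for illustration only.
records = [
    Candidate("contig_1", "AtGeneA", True, 900.0),
    Candidate("contig_2", "AtGeneB", False, 5000.0),
    Candidate("contig_3", "AtGeneC", True, 50.0),
    Candidate("contig_4", "AtGeneD", True, 1500.0),
]
print([c.contig for c in shortlist(records)])  # ['contig_4', 'contig_1']
```

A highly abundant contig is excluded when its Arabidopsis homolog lacks the secondary wall expression pattern, which is why contig_2 does not appear in the shortlist.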
The top member is homologous to Arabidopsis TBR38 and contains a domain of unknown function (DUF) 231. The DUF231 proteins constitute a 46-member protein family in Arabidopsis (Bischoff et al., 2010), in which AXY4 (TBL27) and AXY4-Like (TBL22) have been shown to be involved in acetylation of xyloglucan (Gille et al., 2011) and ESK1 (TBL29) has been shown to be involved in acetylation of secondary cell wall xylan (Xiong et al., 2013). The other members of this family have been proposed to be acetyltransferases specific for xyloglucan or other cell wall polymers, e.g., pectins and xylan (Oikawa et al., 2010; Gille and Pauly, 2012). TBR38 is part of an uncharacterized subclade of the TBR protein family. Given that the psyllium homolog is expressed at much higher levels in the ML than the secondary cell wall CESA proteins, it is likely involved in complex heteroxylan biosynthesis rather than secondary cell wall formation in this tissue. The level of cell wall acetylation in dissected psyllium ML is 12 μg acetic acid per milligram alcohol-insoluble residue, approximately four-fold lower than that found in the alcohol-insoluble residue of the Arabidopsis lower stem (Figure 6). Assuming that the ML cell wall material consists entirely of pentose sugars, this acetic acid content corresponds to one acetic acid group for every 25 pentose sugars. By comparison, glucuronoxylan from aspen wood has been found to have an average degree of xylose backbone acetylation of approximately 60% (Teleman et al., 2000), while a degree of acetylation of approximately 50% has been found for arabinoxylan from corncobs and corn stover (Dongen et al., 2011). These findings may indicate that TBR38 and its psyllium homolog could function as xylan-specific acetyltransferases.
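The comparison of acetylation levels above reduces to simple arithmetic. The sketch below converts one acetyl group per 25 pentose residues into a degree of acetylation and compares it with the literature values cited for aspen glucuronoxylan and corn arabinoxylan. This is only a rough comparison: the psyllium figure is expressed per pentose residue under the 100% pentose assumption, whereas the aspen value refers to acetylation of the xylose backbone. The helper names are hypothetical.

```python
def degree_of_acetylation(acetyl_groups, sugar_residues):
    """Acetyl groups per sugar residue, expressed as a percentage."""
    return 100.0 * acetyl_groups / sugar_residues


def fold_difference(reference_pct, sample_pct):
    """How many times higher the reference degree of acetylation is."""
    return reference_pct / sample_pct


# Psyllium ML: one acetic acid group per 25 pentose residues (from the text).
psyllium_ml = degree_of_acetylation(1, 25)

# Approximate literature values cited in the text.
reference = {
    "aspen glucuronoxylan": 60.0,
    "corncob/corn stover arabinoxylan": 50.0,
}

for name, pct in reference.items():
    print(f"{name}: {pct:.0f}% vs psyllium ML {psyllium_ml:.0f}% "
          f"({fold_difference(pct, psyllium_ml):.1f}-fold)")
```

This prints a degree of acetylation of 4% for the psyllium ML, that is, 15.0-fold below the aspen value and 12.5-fold below the corn value.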
Another candidate gene possibly involved in xylan formation in the psyllium ML is a homolog of At5g47635, which encodes a Pollen Ole e 1 allergen and extensin family protein. Tan et al. (2013) identified and characterized two isoforms of a highly glycosylated AGP, named ARABINOXYLAN PECTIN ARABINOGALACTAN PROTEIN1 (APAP1). They identified two individual xylan oligomers attached as separate side chains of the APAP1 carbohydrate branch structure, thereby providing a link between AGPs and xylan. Although other possibilities exist, the high expression of an extensin protein in the psyllium ML and the secondary cell wall expression pattern of its closest Arabidopsis homolog may suggest that this extensin homolog functions by cross-linking mucilaginous heteroxylan into a larger covalent network in the mucilage wall.

HOMOLOGS OF SEVERAL SECONDARY CELL WALL TRANSCRIPTION FACTORS ARE HIGHLY EXPRESSED IN PSYLLIUM MUCILAGINOUS LAYERS
The most abundant psyllium transcripts encoding putative transcription factors reach levels of around 1000 ppm in the ML (Table 5). Many of these transcripts show closest homology to Arabidopsis genes that are highly expressed throughout the Arabidopsis plant, including during seed development, while another set has close homology to Arabidopsis genes that are specifically expressed during seed development, such as MYB61, NARS1, AT3G51880 and AT5G67480. Both MYB61 and NARS1 have been shown to play roles in seed coat development in Arabidopsis. Knockout mutants of MYB61 show reduced mucilage deposition and extrusion (Penfield et al., 2001), while NARS1 is expressed in the outer integument of the Arabidopsis seed, where it regulates the degeneration of this tissue (Kunieda et al., 2008).
A third category consists of transcripts with closest homology to Arabidopsis transcription factors involved in secondary cell wall formation, namely NST1, SND2, and KNAT7. All three are potent regulators of secondary cell wall formation in Arabidopsis. NST1 was identified as a regulator of secondary wall thickening in the anther endothecium (Mitsuda et al., 2005) and was later found to act redundantly with SND1 as a master regulator of secondary wall synthesis in fiber cells of the Arabidopsis stem. Furthermore, in protoplast transactivation assays, NST1 directly activates MYB46, SND3, MYB103, and KNAT7 (Zhong et al., 2008). Overexpression of SND2 also leads to increased secondary wall thickening in Arabidopsis stem fiber cells (Zhong et al., 2008) and to upregulation of, among other genes, MYB103 and SND1 (Hussey et al., 2011). KNAT7 loss-of-function mutants display IRX phenotypes (Brown et al., 2005), and Arabidopsis plants transformed with dominant repression constructs of KNAT7 show a moderate decrease in secondary cell wall thickening in stems (Zhong et al., 2008). The KNAT7 protein has been shown to interact with OFP4, and both act as repressors in protoplast transactivation assays and in planta (Li et al., 2011, 2012). The homologs of NST1, SND2, and KNAT7 found in the psyllium ML may or may not be true orthologs of the Arabidopsis genes. However, it is striking to find several abundantly expressed homologs of transcription factors implicated in secondary cell wall formation, a process involving extensive xylan biosynthesis. The highly elevated expression of the NST1, SND2 and KNAT7 homologs in the psyllium ML may therefore suggest that they are involved in regulating xylan biosynthesis in this tissue. Such regulatory circuits in the psyllium ML may have evolved from the secondary cell wall regulatory cascade.
It should be noted that only part of the transcriptional regulatory network controlling secondary cell wall formation in Arabidopsis (Demura and Ye, 2010) can be detected in the psyllium ML. Of the proven downstream targets of NST1 and SND2, only a KNAT7 homolog is highly expressed in the psyllium ML. Homologs of proven NST1 targets in Arabidopsis, such as SND1, MYB46, and MYB103, are not detected, while a homolog of SND3 is detected but only at low levels of approximately 50 ppm. When overexpressed, NST1 induces abundant secondary cell wall formation in Arabidopsis leaf mesophyll cells. If the psyllium NST1 homolog found in the ML is functionally orthologous to Arabidopsis NST1, it appears that the branches of the NST1 transcriptional cascade leading to cellulose and lignin deposition, rather than xylan formation, have been specifically suppressed in the psyllium ML.
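The comparison between Arabidopsis downstream targets and their detection in the psyllium ML can be summarized programmatically. The sketch below encodes only the target lists and detection categories stated in this section (NST1 activates MYB46, SND3, MYB103 and KNAT7; SND2 upregulates MYB103 and SND1; in the ML, the KNAT7 homolog is highly expressed, the SND3 homolog is low at about 50 ppm, and SND1, MYB46 and MYB103 homologs are not detected) and groups the union of targets by detection status. The category labels and function name are illustrative.

```python
# Arabidopsis targets as cited in the text.
TARGETS = {
    "NST1": {"MYB46", "SND3", "MYB103", "KNAT7"},
    "SND2": {"MYB103", "SND1"},
}

# Detection of the psyllium homologs in the ML, as described in the text.
ML_DETECTION = {
    "KNAT7": "high",
    "SND3": "low",            # approximately 50 ppm
    "SND1": "not detected",
    "MYB46": "not detected",
    "MYB103": "not detected",
}


def targets_by_status(targets, detection):
    """Group the union of downstream targets by their ML detection status."""
    grouped = {}
    for gene in sorted(set().union(*targets.values())):
        status = detection.get(gene, "unknown")
        grouped.setdefault(status, []).append(gene)
    return grouped


print(targets_by_status(TARGETS, ML_DETECTION))
```

Only KNAT7 falls into the "high" group, consistent with the observation that just one branch of the cascade appears to be strongly active in the ML.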

LESSONS LEARNED FROM XYLAN BIOSYNTHESIS IN PSYLLIUM MUCILAGINOUS LAYERS MAY PROVE VALUABLE FOR BIOFUELS RESEARCH AND BIOTECHNOLOGY
The study of tissues whose cell walls have unusual compositions may provide valuable insights into manipulating plant cell walls to improve their characteristics as biofuel feedstocks, such as improved digestibility, higher biomass, and altered composition of lignin, cellulose, and hemicellulose. It seems plausible that the diverse cell walls found in many highly specialized tissues, for instance in many seeds, are derived from existing cell wall biosynthetic pathways and so provide examples of cell wall alterations that confer new characteristics. This study identifies biosynthetic enzymes, sugar-nucleotide transporters and transcription factors as likely candidates for involvement in xylan biosynthesis. These new targets may serve as novel entry points for manipulating xylan deposition and structure. To date, it has not been possible to reconstitute xylan synthase activity from known components, which has limited our ability to assign roles to the genes shown by genetic methods to be components of the synthase. The four PoIRX10 genes cloned from the ML may constitute a simpler xylan synthase with a reduced set of components, which may make it more tractable than xylan synthases from systems such as Arabidopsis. If so, the psyllium IRX10 genes would offer a tool for future research into understanding and manipulating xylan formation.
The seven cloned PoGT61 sequences may prove useful for altering xylan branch structures in the cell walls of both monocot and eudicot crops to improve biofuel traits such as digestibility. Finally, identifying direct transcriptional regulators of xylan biosynthetic genes, such as IRX10, is likely to reveal additional genes involved in xylan biosynthesis, which could constitute key control points for xylan biosynthesis.
Full access to the psyllium RNA-Seq data has been provided through a user-friendly web interface. The database features several custom-made tools that facilitate further analysis and may provide a valuable resource for the research community in areas other than xylan biosynthesis, such as mucilage development.

No. (DE-AC02-05CH11231). We also thank Nick Thrower for providing bioinformatic expertise in clustering the cDNA RNA-Seq libraries.