<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Plant Sci.</journal-id>
<journal-title>Frontiers in Plant Science</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Plant Sci.</abbrev-journal-title>
<issn pub-type="epub">1664-462X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fpls.2017.00066</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Plant Science</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Bioinformatics Prediction and Evolution Analysis of Arabinogalactan Proteins in the Plant Kingdom</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Ma</surname> <given-names>Yuling</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/407235/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Yan</surname> <given-names>Chenchao</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Li</surname> <given-names>Huimin</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Wu</surname> <given-names>Wentao</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Liu</surname> <given-names>Yaxue</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Wang</surname> <given-names>Yuqian</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Chen</surname> <given-names>Qin</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/320056/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Ma</surname> <given-names>Haoli</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/317996/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>State Key Laboratory of Crop Stress Biology for Arid Areas, College of Agronomy, Northwest A&#x00026;F University</institution> <country>Yangling, China</country></aff>
<aff id="aff2"><sup>2</sup><institution>National Base for the Talents on Life-Science and Technology, Innovation Experimental College, Northwest A&#x00026;F University</institution> <country>Yangling, China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Diwakar Shukla, University of Illinois at Urbana-Champaign, USA</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Li Tan, University of Georgia, USA; Elisabeth Jamet, Paul Sabatier University, France</p></fn>
<fn fn-type="corresp" id="fn001"><p>&#x0002A;Correspondence: Qin Chen <email>chenpeter2289&#x00040;nwsuaf.edu.cn</email></p></fn>
<fn fn-type="corresp" id="fn002"><p>Haoli Ma <email>mahaoli&#x00040;nwsuaf.edu.cn</email></p></fn>
<fn fn-type="other" id="fn003"><p>This article was submitted to Plant Physiology, a section of the journal Frontiers in Plant Science</p></fn></author-notes>
<pub-date pub-type="epub">
<day>26</day>
<month>01</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="collection">
<year>2017</year>
</pub-date>
<volume>8</volume>
<elocation-id>66</elocation-id>
<history>
<date date-type="received">
<day>21</day>
<month>11</month>
<year>2016</year>
</date>
<date date-type="accepted">
<day>12</day>
<month>01</month>
<year>2017</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2017 Ma, Yan, Li, Wu, Liu, Wang, Chen and Ma.</copyright-statement>
<copyright-year>2017</copyright-year>
<copyright-holder>Ma, Yan, Li, Wu, Liu, Wang, Chen and Ma</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>Arabinogalactan proteins (AGPs) are a family of extracellular glycoproteins implicated in plant growth and development. With the rapid growth in the number of sequenced plant genomes, the members of the AGP family can now be predicted to facilitate functional investigation. Building upon previous advances in identifying <italic>Arabidopsis</italic> AGPs, an integrated strategy of systematic AGP screening for &#x0201C;classical&#x0201D; and &#x0201C;chimeric&#x0201D; family members is proposed in this study. A Python script named Finding-AGP was written to find AGP-like sequences and filter AGP candidates under the given thresholds. The primary screening of classical AGPs, Lys-rich classical AGPs, AGP-extensin hybrids, and non-classical AGPs was performed using the existence of signal peptides as a necessary requirement, and BLAST searches were conducted mainly for fasciclin-like, phytocyanin-like and xylogen-like AGPs. Then, the glycomodule index and partial PAST (Pro, Ala, Ser, and Thr) percentage were adopted to identify AGP candidates. The integrated strategy successfully discovered AGP gene families in 47 plant species and the main results are summarized as follows: (i) AGPs are abundant in angiosperms and many &#x0201C;ancient&#x0201D; AGPs with Ser-Pro repeats are found in <italic>Chlamydomonas reinhardtii</italic>; (ii) Classical AGPs, AG-peptides, and Lys-rich classical AGPs first emerged in <italic>Physcomitrella patens, Selaginella moellendorffii</italic>, and <italic>Picea abies</italic>, respectively; (iii) Nine subfamilies of chimeric AGPs are introduced as newly identified chimeric subfamilies similar to fasciclin-like, phytocyanin-like, and xylogen-like AGPs; (iv) The length and amino acid composition of Lys-rich domains are largely variable, indicating an insertion/deletion model during evolution. 
Our findings provide not only a powerful means to identify AGP gene families but also probable explanations of the roles of AGPs in maintaining the plant cell wall and transducing extracellular signals into the cytoplasm.</p>
</abstract>
<kwd-group>
<kwd>arabinogalactan proteins</kwd>
<kwd>bioinformatics</kwd>
<kwd>chimeric AGP</kwd>
<kwd>evolution</kwd>
<kwd>Finding-AGP program</kwd>
</kwd-group>
<contract-num rid="cn001">31500159</contract-num>
<contract-num rid="cn002">2016JQ3029</contract-num>
<contract-sponsor id="cn001">National Natural Science Foundation of China<named-content content-type="fundref-id">10.13039/501100001809</named-content></contract-sponsor>
<contract-sponsor id="cn002">Natural Science Foundation of Shaanxi Province<named-content content-type="fundref-id">10.13039/501100007128</named-content></contract-sponsor>
<counts>
<fig-count count="6"/>
<table-count count="6"/>
<equation-count count="0"/>
<ref-count count="70"/>
<page-count count="17"/>
<word-count count="12108"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<p>Arabinogalactan proteins (AGPs) are a subfamily of hydroxyproline-rich glycoproteins (HRGPs) and are implicated in many processes of plant growth and development (Seifert and Roberts, <xref ref-type="bibr" rid="B52">2007</xref>; Kavi Kishor et al., <xref ref-type="bibr" rid="B24">2015</xref>). AGPs consist of a protein backbone and carbohydrate side chains rich in arabinose and galactose (Ellis et al., <xref ref-type="bibr" rid="B8">2010</xref>; Showalter and Basu, <xref ref-type="bibr" rid="B58">2016</xref>). In most cases, glycosylphosphatidylinositol (GPI) anchor signals are present at the C-terminus of AGPs (Borner et al., <xref ref-type="bibr" rid="B5">2003</xref>). AGPs are highly glycosylated: carbohydrate often accounts for more than 90% of the molecule, and the molecular mass ranges from 60 to 300 kD (Seifert and Roberts, <xref ref-type="bibr" rid="B52">2007</xref>; Ellis et al., <xref ref-type="bibr" rid="B8">2010</xref>; Hijazi et al., <xref ref-type="bibr" rid="B19">2014a</xref>; Nguema-Ona et al., <xref ref-type="bibr" rid="B44">2014</xref>). AGPs are highly heterogeneous glycoproteins because of the variable arrangement and content of their monosaccharides (Gaspar et al., <xref ref-type="bibr" rid="B10">2001</xref>).</p>
<p>Based on their variable core protein backbones, AGPs are generally classified as classical AGPs and non-classical AGPs (Showalter, <xref ref-type="bibr" rid="B56">1993</xref>, <xref ref-type="bibr" rid="B57">2001</xref>). The protein backbones of classical AGPs usually consist of three parts: an N-terminal signal peptide; a single central domain of varying length that is rich in Pro, Ala, Ser, and Thr (PAST) residues; and a C-terminal GPI anchor signal (Schultz et al., <xref ref-type="bibr" rid="B50">2000</xref>). In one type of classical AGP, the PAST-rich domains are usually separated by Lys-rich regions, and these proteins are termed Lys-rich classical AGPs (Li and Showalter, <xref ref-type="bibr" rid="B33">1996</xref>; Gilson et al., <xref ref-type="bibr" rid="B13">2001</xref>; Sun et al., <xref ref-type="bibr" rid="B64">2005</xref>). Another kind of classical AGP, termed AG peptides, has a mature protein backbone of only 10&#x02013;15 amino acids. There are also many chimeric AGPs with different conserved domains that can be classified into three main subfamilies: fasciclin-like AGPs (FLA; Johnson et al., <xref ref-type="bibr" rid="B23">2003</xref>; Ma and Zhao, <xref ref-type="bibr" rid="B35">2010</xref>; MacMillan et al., <xref ref-type="bibr" rid="B38">2015</xref>), phytocyanin-like AGPs (PAG; Mashiguchi et al., <xref ref-type="bibr" rid="B39">2009</xref>; Ma et al., <xref ref-type="bibr" rid="B36">2011</xref>), and xylogen-like AGPs (XYLP; Motose et al., <xref ref-type="bibr" rid="B43">2004</xref>; Kobayashi et al., <xref ref-type="bibr" rid="B26">2011</xref>). In addition, AGPs with sequence characteristics of both AGPs and extensins (EXT) are termed AGP-extensin hybrids (HAE; Showalter et al., <xref ref-type="bibr" rid="B60">2010</xref>).</p>
<p>Typical AGPs are rich in PAST, and these amino acids are regularly arranged as Ala-Pro, Ser-Pro, and Thr-Pro, which were introduced as arabinogalactan (AG) glycomodules (Shpak et al., <xref ref-type="bibr" rid="B62">1999</xref>; Ellis et al., <xref ref-type="bibr" rid="B8">2010</xref>). Previous studies have used synthetic peptides to examine how glycosylation patterns depend on particular residue arrangements. Pro residues that occur non-contiguously in repeated sequences of Ala-Pro and Ser-Pro are fully hydroxylated for glycosylation, and the glycans are rich in arabinose and galactose. Meanwhile, Pro residues contiguously arranged in Ser-Pro<sub>2&#x02212;4</sub> are also hydroxylated, and the main component of the glycan is arabinose, except that both arabinosides and arabinogalactan polysaccharides were found in the carbohydrate of Ser-Pro<sub>3</sub> repeats (Shpak et al., <xref ref-type="bibr" rid="B62">1999</xref>, <xref ref-type="bibr" rid="B61">2001</xref>). These experiments led to the Hyp-contiguity hypothesis, which states that contiguous Hyp (e.g., Ser-Hyp<sub>2&#x02212;4</sub>) are mainly glycosylated by arabinoses, while non-contiguous Hyp are glycosylated by AG. Several studies have supported this hypothesis by using a specific reagent called &#x003B2;-glucosyl Yariv (&#x003B2;-GlcY), which binds the carbohydrate moieties of AGPs and was used to purify AGPs and further examine their glycosyl composition and the distribution patterns of Hyp. On the basis of this method, studies have shown that at least 19 proteins in <italic>Arabidopsis</italic> are glycosylated by AG, including classical AGPs, FLA, PAG, and AG-peptides (Schultz et al., <xref ref-type="bibr" rid="B50">2000</xref>; Johnson et al., <xref ref-type="bibr" rid="B23">2003</xref>; Hijazi et al., <xref ref-type="bibr" rid="B17">2012</xref>). 
There are also several &#x003B2;-GlcY reactive AGPs in <italic>Oryza sativa</italic>, namely OsAGP1, OsAGPEP1, OsAGPEP2, OsAGPEP3, OsENDOL1, and OsLTPL1 (Mashiguchi et al., <xref ref-type="bibr" rid="B41">2004</xref>). Although X-Pro repeats (where X represents Ala, Ser, or Thr) are present in many known AGPs, there are also some exceptions without non-contiguous X-Pro repeats. For example, the AG-modified SOS5/FLA4 (Salt Overly Sensitive 5/Fasciclin-like AGP 4) contains only TPPPT and SPPPA motifs, and three PPAKAPIKLP repeats are found in AtAGP30 (Shi et al., <xref ref-type="bibr" rid="B54">2003</xref>; van Hengel and Roberts, <xref ref-type="bibr" rid="B66">2003</xref>; Griffiths et al., <xref ref-type="bibr" rid="B16">2016</xref>). By analyzing mutated sequences of sporamin, it was found that Pro residues located in amino acid sequences such as [not basic]-[not T]-[AVSG]-Pro-[AVST]-[GAVPSTC]-[APS] are efficiently AG glycosylated (Shimizu et al., <xref ref-type="bibr" rid="B55">2005</xref>).</p>
<p>On the basis of biased amino acid compositions and special sequence arrangements, recent approaches have used bioinformatics to identify AGPs from <italic>Arabidopsis</italic> and rice (Schultz et al., <xref ref-type="bibr" rid="B51">2002</xref>; Ma and Zhao, <xref ref-type="bibr" rid="B35">2010</xref>; Showalter et al., <xref ref-type="bibr" rid="B60">2010</xref>). A Perl script called &#x0201C;amino acid bias&#x0201D; can effectively distinguish PAST-rich proteins from others with certain thresholds (e.g., &#x0003E;50%, Schultz et al., <xref ref-type="bibr" rid="B51">2002</xref>). However, chimeric AGPs with a relatively low PAST proportion are not easily discovered by using amino acid bias. A series of studies identified chimeric AGPs by homology searching of FLA, XYLP, and PAG across genome databases of <italic>Arabidopsis</italic>, rice, wheat, cabbage, eucalyptus, and poplar (Johnson et al., <xref ref-type="bibr" rid="B23">2003</xref>; Faik et al., <xref ref-type="bibr" rid="B9">2006</xref>; Mashiguchi et al., <xref ref-type="bibr" rid="B39">2009</xref>; Ma and Zhao, <xref ref-type="bibr" rid="B35">2010</xref>; Kobayashi et al., <xref ref-type="bibr" rid="B26">2011</xref>; Ma et al., <xref ref-type="bibr" rid="B36">2011</xref>, <xref ref-type="bibr" rid="B37">2014</xref>; Li et al., <xref ref-type="bibr" rid="B31">2013</xref>; MacMillan et al., <xref ref-type="bibr" rid="B38">2015</xref>; Zang et al., <xref ref-type="bibr" rid="B69">2015</xref>). Furthermore, the BIO OHIO software was developed to identify and classify AGPs, EXTs, proline-rich proteins (PRPs), and hybrid HRGPs in <italic>Arabidopsis</italic>. Typically, Ala-Pro, Pro-Ala, Ser-Pro, and Thr-Pro counts are used to evaluate AGPs in addition to calculating the proportion of PAST (Showalter et al., <xref ref-type="bibr" rid="B60">2010</xref>). 
Recently, the newly released version 2.0 of BIO OHIO was used to identify the HRGPs of <italic>Populus trichocarpa</italic>, including 162 AGPs, 60 EXTs, and 49 PRPs (Showalter et al., <xref ref-type="bibr" rid="B59">2016</xref>). Although whole genome sequences of many plant species have been released, to date only the entire gene families of <italic>Arabidopsis</italic> and rice have been systematically analyzed. Building upon previous studies that used amino acid bias and BIO OHIO, we developed a program named &#x0201C;Finding-AGP&#x0201D; to identify entire AGP gene families from large datasets. Compared with previous advances in identifying AGPs, the Finding-AGP program can not only identify AGPs with a high PAST percentage (&#x0003E;50%) but also cover most chimeric AGPs with a low PAST percentage. Because the main post-translational modifications, including Pro hydroxylation and AG glycosylation, occur in the endomembrane system, including the endoplasmic reticulum and Golgi apparatus (Gaspar et al., <xref ref-type="bibr" rid="B10">2001</xref>; Nguema-Ona et al., <xref ref-type="bibr" rid="B44">2014</xref>), and because most predicted AGPs and all AGPs confirmed by monosaccharide composition analysis were predicted to be secreted (Schultz et al., <xref ref-type="bibr" rid="B50">2000</xref>; Johnson et al., <xref ref-type="bibr" rid="B23">2003</xref>; Mashiguchi et al., <xref ref-type="bibr" rid="B41">2004</xref>; Hijazi et al., <xref ref-type="bibr" rid="B17">2012</xref>), the presence of an N-terminal signal peptide was used as a dichotomous variable to reduce the number of false positives. The AG glycomodules were determined by statistical analyses of the amino acid compositions of 87 representative AGP-like sequences. The motif of successful AG glycosylation was defined as at least three glycomodules interspaced by no more than 10 amino acid residues. 
Based on the above descriptions, seven variables were incorporated into the Finding-AGP program to find AGP-like sequences, including total length, total PAST percentage, total glycomodule number, partial length, partial PAST percentage, partial glycomodule number, and glycomodule index. Moreover, we used the Finding-AGP program to identify the entire AGP gene families of 47 selected plant species. The most important contribution of this study is a more accurate and effective way to identify AGPs.</p>
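<p>The motif rule stated above (at least three glycomodules interspaced by no more than 10 residues) can be illustrated with a minimal Python sketch. It is not the Finding-AGP source code. The function names are hypothetical, the six dipeptides are those listed in the Materials and Methods, and two points are assumptions of the sketch: overlapping dipeptides are counted separately, and the spacing is measured as the number of residues between the end of one glycomodule and the start of the next.</p>
<preformat>
```python
# Illustrative sketch only; names and counting conventions are assumptions.
GLYCOMODULES = {"AP", "PA", "SP", "PS", "TP", "PT"}
MAX_GAP = 10      # residues allowed between neighbouring glycomodules
MIN_MODULES = 3   # glycomodules required in one AGP-like region


def glycomodule_starts(seq):
    """Return the start index of every (possibly overlapping) glycomodule."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] in GLYCOMODULES]


def agp_like_regions(seq, max_gap=MAX_GAP, min_modules=MIN_MODULES):
    """Return (start, end) slices of runs that satisfy the motif rule."""
    regions = []
    run = []  # start positions of the glycomodules in the current run
    for pos in glycomodule_starts(seq):
        # Close the current run when the spacer exceeds max_gap residues.
        if run and pos - (run[-1] + 2) > max_gap:
            if len(run) >= min_modules:
                regions.append((run[0], run[-1] + 2))
            run = []
        run.append(pos)
    if len(run) >= min_modules:
        regions.append((run[0], run[-1] + 2))
    return regions


if __name__ == "__main__":
    example = "M" + "APAPSP" + "G" * 20 + "TP"
    for start, end in agp_like_regions(example):
        print(start, end, example[start:end])
```
</preformat>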
</sec>
<sec sec-type="materials and methods" id="s2">
<title>Materials and methods</title>
<sec>
<title>Development and basic operations of the finding-AGP script</title>
<p>A Python script named Finding-AGP was written in PyCharm Edition 5.0.3 to find AGP-like sequences and to calculate the sequence characteristics of both whole protein sequences and AGP-like sequences (parts of whole protein sequences). The script can be used on Microsoft Windows and Linux CentOS systems. In this study, the glycomodules were defined as Ala-Pro, Pro-Ala, Ser-Pro, Pro-Ser, Thr-Pro, and Pro-Thr, and each AGP-like sequence was required to contain at least three glycomodules. The Finding-AGP script screens for AGP candidates using seven variables under user-defined parameters: the length of the whole protein sequence (Length<sub>T</sub>) and of the AGP-like sequence (Length<sub>P</sub>), the PAST percentage in the whole protein sequence (PAST<sub>T</sub>%) and in the AGP-like sequence (PAST<sub>P</sub>%), the glycomodule number of the whole protein sequence (GlycoNo<sub>T</sub>) and of the AGP-like sequence (GlycoNo<sub>P</sub>), and the glycomodule index of the AGP-like sequence (GlycoIndex). The input files may be in multiple formats, such as pep, fasta, and txt. The output consists of two txt files. One includes the protein identifiers meeting the criteria, the values of the seven variables, and the AGP-like sequences of the corresponding identifiers. The other contains the sequences of the AGP candidates in fasta format.</p>
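<p>To make the screening variables concrete, the following Python sketch computes six of the seven variables for a whole sequence and for a given AGP-like region, and applies user-defined minimum thresholds. It is an illustration under stated assumptions, not the Finding-AGP implementation: all function names and dictionary keys are hypothetical, overlapping glycomodules are counted separately, and GlycoIndex is left out because its formula is not given in this section.</p>
<preformat>
```python
# Illustrative sketch only; names and counting conventions are assumptions.
PAST = set("PAST")
GLYCOMODULES = {"AP", "PA", "SP", "PS", "TP", "PT"}


def read_fasta(lines):
    """Yield (identifier, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name = (line[1:].split() or [""])[0]
            chunks = []
        elif line:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def past_percent(seq):
    """Percentage of Pro, Ala, Ser, and Thr residues in seq."""
    return 100.0 * sum(r in PAST for r in seq) / len(seq) if seq else 0.0


def glyco_count(seq):
    """Number of glycomodule dipeptides in seq (overlaps counted)."""
    return sum(seq[i:i + 2] in GLYCOMODULES for i in range(len(seq) - 1))


def describe(seq, start, end):
    """Whole-sequence (T) and partial (P) variables for region seq[start:end]."""
    part = seq[start:end]
    return {
        "Length_T": len(seq), "Length_P": len(part),
        "PAST_T%": past_percent(seq), "PAST_P%": past_percent(part),
        "GlycoNo_T": glyco_count(seq), "GlycoNo_P": glyco_count(part),
    }


def passes(values, thresholds):
    """True when every thresholded variable reaches its user-defined minimum."""
    return all(values[key] >= minimum for key, minimum in thresholds.items())


if __name__ == "__main__":
    values = describe("MKAPAPSPGG", 2, 8)
    print(values, passes(values, {"PAST_P%": 50, "GlycoNo_P": 3}))
```
</preformat>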
</sec>
<sec>
<title>Publicly available data collection</title>
<p>A wide range of sequenced plant species were used in the present study, including 33 species of eudicot plants and 10 species of monocot plants. There was also one species each from the gymnosperms, pteridophytes, bryophytes, and chlorophytes. The annotated protein sequences of most species were downloaded from Phytozome V11 (<ext-link ext-link-type="uri" xlink:href="https://phytozome.jgi.doe.gov/pz/portal.html">https://phytozome.jgi.doe.gov/pz/portal.html</ext-link>) and the others were obtained from genome sequencing databases of the species (database websites and data versions are listed in Supplementary Table <xref ref-type="supplementary-material" rid="SM5">5</xref>).</p>
</sec>
<sec>
<title>Signal peptide predictions</title>
<p>The portable version of SignalP 4.1 for Linux was requested from the SignalP website (<ext-link ext-link-type="uri" xlink:href="http://www.cbs.dtu.dk/services/SignalP/">http://www.cbs.dtu.dk/services/SignalP/</ext-link>) (Petersen et al., <xref ref-type="bibr" rid="B46">2011</xref>) and installed on a computer running Linux CentOS. Perl (version 5.6 or higher) and GNUPLOT (version 4.0 or higher) must already be installed for SignalP 4.1 to run. The number of input sequences allowed per run (MAX_ALLOWED_ENTRIES), defined at the top of the file &#x0201C;signalp&#x0201D;, was set to 100,000, which exceeded the largest number of proteins in any of the selected species. The sensitive mode was used to judge whether a protein had an N-terminal signal peptide, namely with a <italic>D</italic>-value cutoff of 0.34.</p>
</sec>
<sec>
<title>BLASTP searches</title>
<p>Local BLAST analyses were performed using the stand-alone BLAST application (version ncbi-blast-2.2.28&#x0002B;). Text-based commands were input to run utilities through a command window. The protein sequences of 47 species in fasta format were reformatted into database files suitable for the BLAST application using the command of formatting database. Then, the protein sequences of known AGPs in fasta format were used as seed sequences to obtain homologous proteins using BLASTP utility with a cutoff <italic>e</italic>-value (<italic>e</italic><sup>&#x02212;3</sup>).</p>
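<p>For orientation, the two command-line steps described above can be assembled in Python as argument lists. This is a sketch under assumptions rather than a record of the commands actually run: the database formatting step is assumed to be the BLAST+ <italic>makeblastdb</italic> utility, the cutoff written as <italic>e</italic><sup>&#x02212;3</sup> is interpreted as an E-value of 1e-3, tabular output (outfmt 6) is chosen for convenience, and all file names are hypothetical.</p>
<preformat>
```python
# Illustrative sketch only; file names and the choice of output format are assumptions.

def makeblastdb_command(proteome_fasta, db_name):
    """Argument list that formats a protein FASTA file as a BLAST database."""
    return ["makeblastdb", "-in", proteome_fasta, "-dbtype", "prot", "-out", db_name]


def blastp_command(seed_fasta, db_name, out_path, evalue=1e-3):
    """Argument list for a BLASTP search of seed AGPs against one proteome."""
    return [
        "blastp", "-query", seed_fasta, "-db", db_name,
        "-evalue", str(evalue), "-outfmt", "6", "-out", out_path,
    ]


if __name__ == "__main__":
    # Printing only; pass each list to subprocess.run(cmd, check=True) to execute.
    print(" ".join(makeblastdb_command("species_proteins.fasta", "species_db")))
    print(" ".join(blastp_command("known_agps.fasta", "species_db", "hits.tsv")))
```
</preformat>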
</sec>
<sec>
<title>Key bioinformatics websites and settings</title>
<p>A series of bioinformatics websites were used in this study. (i) The phylogenetic relationship of the 47 selected plant species was determined by the common tree taxonomy tool at NCBI (<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/Taxonomy/CommonTree/wwwcmt.cgi">http://www.ncbi.nlm.nih.gov/Taxonomy/CommonTree/wwwcmt.cgi</ext-link>) with reference to the phylogenetic tree of the species in Phytozome (<ext-link ext-link-type="uri" xlink:href="https://phytozome.jgi.doe.gov/pz/portal.html">https://phytozome.jgi.doe.gov/pz/portal.html</ext-link>). (ii) In addition to the local runs, the signal peptides were also predicted on the SignalP 4.1 Server (<ext-link ext-link-type="uri" xlink:href="http://www.cbs.dtu.dk/services/SignalP/">http://www.cbs.dtu.dk/services/SignalP/</ext-link>); the input files were in fasta format and the <italic>D</italic>-cutoff value (0.34) was set as in the sensitive mode. (iii) The GPI-anchor signals were determined on big-PI Predictor-GPI Modification Site Prediction (<ext-link ext-link-type="uri" xlink:href="http://mendel.imp.ac.at/sat/gpi/gpi_server.html">http://mendel.imp.ac.at/sat/gpi/gpi_server.html</ext-link>). (iv) The transmembrane domains of putative AGPs were predicted on TMHMM Server v. 2.0. The first 20 and last 20 amino acids were excluded before prediction, because the N-terminal signal peptides and C-terminal GPI anchor signals were usually predicted to be transmembrane domains. (v) The conserved domains of chimeric AGPs were determined on the NCBI Batch Web CD-Search Tool (<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi">http://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi</ext-link>).</p>
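<p>The trimming described in step (iv) amounts to a simple slice. The Python sketch below shows one way to prepare sequences before transmembrane prediction; the function name is hypothetical, and returning an empty string for sequences too short to trim is an assumption of the sketch, since the handling of such sequences is not described here.</p>
<preformat>
```python
# Illustrative sketch only; handling of very short sequences is an assumption.

def trim_terminal_signals(seq, n_term=20, c_term=20):
    """Drop the first n_term and last c_term residues of a protein sequence."""
    if len(seq) > n_term + c_term:
        return seq[n_term:len(seq) - c_term]
    return ""  # nothing remains once both termini are removed


if __name__ == "__main__":
    protein = "M" * 20 + "APAPSPTPAP" + "L" * 20
    print(trim_terminal_signals(protein))
```
</preformat>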
</sec>
<sec>
<title>Searching criteria for AGPs</title>
<p>Based on the definition of AG glycomodules and the previous description of AGPs, AGPs were defined as proteins that contained predominantly glycomodules (at least three) throughout all or part of the protein backbone (excluding the N-terminal secretion signal, the C-terminal GPI-anchor signal, and other conserved regions) and that lacked repeated sequences characteristic of EXTs or PRPs (e.g., Ser-Pro<sub>2&#x02212;4</sub> or PVKCYT; Schultz et al., <xref ref-type="bibr" rid="B51">2002</xref>; Showalter et al., <xref ref-type="bibr" rid="B60">2010</xref>). The AGP subfamilies were classified as follows: (i) classical AGPs consisted of an N-terminal signal peptide, a PAST-rich region of variable length, and a GPI-anchor signal; (ii) AG-peptides were a subclass of classical AGPs of short length (&#x0003C;90 amino acid residues); (iii) Lys-rich classical AGPs were another subclass of classical AGPs, in which the PAST-rich regions were separated by a short Lys-rich region; (iv) AGPs were classified as chimeric if they contained atypical regions in addition to the PAST-rich region, apart from the N- and C-terminal signals; (v) AGP-extensin hybrids were a further subclass with characteristics of both AGPs and extensins.</p>
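<p>The classification rules (i) to (v) can be summarized as a small decision function. The Python sketch below is a simplification for illustration: the boolean inputs are hypothetical flags that would come from the external predictions (signal peptide, GPI anchor, conserved domain, and repeat searches), the order in which the rules are tested is an assumption, and proteins that match none of the listed rules are simply returned as unclassified.</p>
<preformat>
```python
# Illustrative sketch only; feature flags and rule precedence are assumptions.

def classify_agp(length, has_signal_peptide, has_gpi_signal,
                 has_lys_rich_spacer=False, has_other_domain=False,
                 has_extensin_repeats=False):
    """Assign a candidate AGP to one of the subfamilies described in the text."""
    if has_extensin_repeats:
        return "AGP-extensin hybrid"
    if has_other_domain:
        return "chimeric AGP"
    if has_signal_peptide and has_gpi_signal:
        if has_lys_rich_spacer:
            return "Lys-rich classical AGP"
        if 90 > length:  # AG-peptides are shorter than 90 residues
            return "AG-peptide"
        return "classical AGP"
    return "unclassified"


if __name__ == "__main__":
    print(classify_agp(60, True, True))
    print(classify_agp(400, True, False, has_other_domain=True))
```
</preformat>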
</sec>
<sec>
<title>Multiple sequence alignment and phylogenetic analysis</title>
<p>The full-length amino acid sequences of Lys-rich classical AGPs were used in multiple sequence alignments, which were performed using ClustalX (version 1.83) with default settings. An unrooted phylogenetic tree was generated using MEGA 6.0 with the neighbor-joining method and 1000 bootstrap replicates.</p>
</sec>
</sec>
<sec sec-type="results" id="s3">
<title>Results</title>
<sec>
<title>Statistical analyses on amino acid compositions of AGP-like sequences</title>
<p>According to the biased amino acid composition (high PAST percentage, usually &#x0003E;50%) and specific arrangement of Ala-Pro, Pro-Ala, Ser-Pro, and Thr-Pro, AGPs were distinguishable from other kinds of proteins and other subfamilies of HRGP such as EXT and PRP (Schultz et al., <xref ref-type="bibr" rid="B51">2002</xref>; Showalter et al., <xref ref-type="bibr" rid="B60">2010</xref>). In order to reveal the amino acid code of arabinogalactan (AG) glycosylation, we conducted statistical analyses on amino acid compositions of 325 known AGPs from 22 plant species. The 325 AGPs included 42 classical AGPs, 9 Lys-rich AGPs, 40 AG-peptides, 5 HAE, 98 FLA, 74 PAG, 35 XYLP, and 22 non-classical AGPs (Supplementary Tables <xref ref-type="supplementary-material" rid="SM1">1</xref>, <xref ref-type="supplementary-material" rid="SM2">2</xref>). First, we calculated the total number of each of the 20 amino acids and the frequency of each amino acid (the number of sequences containing it); then the percentage of each amino acid among all amino acid residues and the average number per sequence were determined (Supplementary Table <xref ref-type="supplementary-material" rid="SM3">3</xref>). Seven types of amino acids including Ala (13.09%), Ser (10.76%), Pro (10.47%), Leu (8.55%), Val (7.51%), Thr (7.10%), and Gly (6.75%) accounted for almost half of all amino acids (50.38%) and were present in all 325 sequences except for Thr which was absent in PpAGP5. The amino acid compositions of signal peptides, GPI-anchor signals, and conserved domains (e.g., fasciclin-like domains) clearly interfered with a correct estimation of the composition of AGP-like sequences. Thus, a total of 87 classical AGPs, Lys-rich classical AGPs, and AG-peptides (short classical) were selected to represent the characteristics of AGP-like sequences. Meanwhile, the signal peptides and GPI-anchor signals were removed at the most probable cleavage sites to avoid interference caused by these domains. 
The statistical analyses presented in Supplementary Table <xref ref-type="supplementary-material" rid="SM3">3</xref> were also conducted on AGP-like sequences of the 87 AGPs mentioned above (Table <xref ref-type="table" rid="T1">1</xref>). As a result, we found that 1772 residues of Pro accounted for 27.58% of all amino acids and were the most abundant of the 20 amino acids, indicating that these sequences were family members of HRGPs. The numbers of Ala, Ser, and Thr residues were 1313, 899, and 556 in total, respectively, and accounted for 20.44, 13.99, and 8.66% of all amino acids, respectively. It was noteworthy that Pro and Ala were present in all 87 selected sequences. Compared with amino acid compositions of 325 known AGPs in full-length, the order of the seven most abundant amino acids changed from &#x0201C;Ala, Ser, Pro, Leu, Val, Thr, Gly&#x0201D; to &#x0201C;Pro, Ala, Ser, Thr, Val, Gly, Glu&#x0201D; in the analyses of 87 AGP-like sequences, indicating that the enrichment of Leu was not a leading feature of AGP-like sequences (Table <xref ref-type="table" rid="T1">1</xref> and Supplementary Table <xref ref-type="supplementary-material" rid="SM3">3</xref>). We therefore reasoned that the high PAST percentage in the 87 processed sequences (70.67% of all amino acids) could be used to identify AGP-like sequences.</p>
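<p>The four quantities reported per amino acid (total number, frequency, average number per sequence, and percentage of all amino acids) can be computed as in the following Python sketch. The function name and row keys are hypothetical. The average is taken over the sequences that contain the residue, which is the reading consistent with the Ser row of Table 1 (899/82 = 10.96); this interpretation is an inference from the table rather than a definition stated in the text.</p>
<preformat>
```python
# Illustrative sketch only; the definition of the average is inferred from Table 1.
from collections import Counter


def composition_table(sequences):
    """Per-residue totals, frequencies, averages, and percentages."""
    total = Counter()      # occurrences of each residue over all sequences
    frequency = Counter()  # number of sequences containing each residue
    for seq in sequences:
        total.update(seq)
        frequency.update(set(seq))
    all_residues = sum(total.values())
    rows = []
    for residue, count in total.most_common():
        rows.append({
            "residue": residue,
            "total": count,
            "frequency": frequency[residue],
            "average": round(count / frequency[residue], 2),
            "percent": round(100.0 * count / all_residues, 2),
        })
    return rows


if __name__ == "__main__":
    for row in composition_table(["PPA", "PS"]):
        print(row)
```
</preformat>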
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p><bold>Compositions of 20 amino acids in 87 AGP-like sequences</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Amino acid<xref ref-type="table-fn" rid="TN1"><sup>a</sup></xref>(three letter)</bold></th>
<th valign="top" align="left"><bold>Amino acid (one letter)</bold></th>
<th valign="top" align="center"><bold>Total number<xref ref-type="table-fn" rid="TN2"><sup>b</sup></xref></bold></th>
<th valign="top" align="center"><bold>Frequency of given amino acid in 87 sequences</bold></th>
<th valign="top" align="center"><bold>Average number of given amino acid per sequence<xref ref-type="table-fn" rid="TN3"><sup>c</sup></xref></bold></th>
<th valign="top" align="center"><bold>Percentage of given amino acid in all amino acids (%)<xref ref-type="table-fn" rid="TN4"><sup>d</sup></xref></bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Pro</td>
<td valign="top" align="left">P</td>
<td valign="top" align="center">1772</td>
<td valign="top" align="center">87</td>
<td valign="top" align="char" char=".">20.37</td>
<td valign="top" align="char" char=".">27.58</td>
</tr>
<tr>
<td valign="top" align="left">Ala</td>
<td valign="top" align="left">A</td>
<td valign="top" align="center">1313</td>
<td valign="top" align="center">87</td>
<td valign="top" align="char" char=".">15.09</td>
<td valign="top" align="char" char=".">20.44</td>
</tr>
<tr>
<td valign="top" align="left">Ser</td>
<td valign="top" align="left">S</td>
<td valign="top" align="center">899</td>
<td valign="top" align="center">82</td>
<td valign="top" align="char" char=".">10.96</td>
<td valign="top" align="char" char=".">13.99</td>
</tr>
<tr>
<td valign="top" align="left">Thr</td>
<td valign="top" align="left">T</td>
<td valign="top" align="center">556</td>
<td valign="top" align="center">74</td>
<td valign="top" align="char" char=".">7.51</td>
<td valign="top" align="char" char=".">8.66</td>
</tr>
<tr>
<td valign="top" align="left">Val</td>
<td valign="top" align="left">V</td>
<td valign="top" align="center">323</td>
<td valign="top" align="center">66</td>
<td valign="top" align="char" char=".">4.89</td>
<td valign="top" align="char" char=".">5.03</td>
</tr>
<tr>
<td valign="top" align="left">Gly</td>
<td valign="top" align="left">G</td>
<td valign="top" align="center">261</td>
<td valign="top" align="center">69</td>
<td valign="top" align="char" char=".">3.78</td>
<td valign="top" align="char" char=".">4.06</td>
</tr>
<tr>
<td valign="top" align="left">Glu</td>
<td valign="top" align="left">E</td>
<td valign="top" align="center">214</td>
<td valign="top" align="center">62</td>
<td valign="top" align="char" char=".">3.45</td>
<td valign="top" align="char" char=".">3.33</td>
</tr>
<tr>
<td valign="top" align="left">Lys</td>
<td valign="top" align="left">K</td>
<td valign="top" align="center">204</td>
<td valign="top" align="center">42</td>
<td valign="top" align="char" char=".">4.86</td>
<td valign="top" align="char" char=".">3.18</td>
</tr>
<tr>
<td valign="top" align="left">Asp</td>
<td valign="top" align="left">D</td>
<td valign="top" align="center">189</td>
<td valign="top" align="center">64</td>
<td valign="top" align="char" char=".">2.95</td>
<td valign="top" align="char" char=".">2.94</td>
</tr>
<tr>
<td valign="top" align="left">Leu</td>
<td valign="top" align="left">L</td>
<td valign="top" align="center">146</td>
<td valign="top" align="center">45</td>
<td valign="top" align="char" char=".">3.24</td>
<td valign="top" align="char" char=".">2.27</td>
</tr>
<tr>
<td valign="top" align="left">Gln</td>
<td valign="top" align="left">Q</td>
<td valign="top" align="center">114</td>
<td valign="top" align="center">61</td>
<td valign="top" align="char" char=".">1.87</td>
<td valign="top" align="char" char=".">1.77</td>
</tr>
<tr>
<td valign="top" align="left">Met</td>
<td valign="top" align="left">M</td>
<td valign="top" align="center">85</td>
<td valign="top" align="center">24</td>
<td valign="top" align="char" char=".">3.54</td>
<td valign="top" align="char" char=".">1.32</td>
</tr>
<tr>
<td valign="top" align="left">His</td>
<td valign="top" align="left">H</td>
<td valign="top" align="center">84</td>
<td valign="top" align="center">24</td>
<td valign="top" align="char" char=".">3.50</td>
<td valign="top" align="char" char=".">1.31</td>
</tr>
<tr>
<td valign="top" align="left">Ile</td>
<td valign="top" align="left">I</td>
<td valign="top" align="center">65</td>
<td valign="top" align="center">33</td>
<td valign="top" align="char" char=".">1.97</td>
<td valign="top" align="char" char=".">1.01</td>
</tr>
<tr>
<td valign="top" align="left">Asn</td>
<td valign="top" align="left">N</td>
<td valign="top" align="center">62</td>
<td valign="top" align="center">35</td>
<td valign="top" align="char" char=".">1.77</td>
<td valign="top" align="char" char=".">0.97</td>
</tr>
<tr>
<td valign="top" align="left">Arg</td>
<td valign="top" align="left">R</td>
<td valign="top" align="center">48</td>
<td valign="top" align="center">22</td>
<td valign="top" align="char" char=".">2.18</td>
<td valign="top" align="char" char=".">0.75</td>
</tr>
<tr>
<td valign="top" align="left">Tyr</td>
<td valign="top" align="left">Y</td>
<td valign="top" align="center">39</td>
<td valign="top" align="center">18</td>
<td valign="top" align="char" char=".">2.17</td>
<td valign="top" align="char" char=".">0.61</td>
</tr>
<tr>
<td valign="top" align="left">Phe</td>
<td valign="top" align="left">F</td>
<td valign="top" align="center">31</td>
<td valign="top" align="center">21</td>
<td valign="top" align="char" char=".">1.48</td>
<td valign="top" align="char" char=".">0.48</td>
</tr>
<tr>
<td valign="top" align="left">Cys</td>
<td valign="top" align="left">C</td>
<td valign="top" align="center">12</td>
<td valign="top" align="center">4</td>
<td valign="top" align="char" char=".">3.00</td>
<td valign="top" align="char" char=".">0.19</td>
</tr>
<tr>
<td valign="top" align="left">Trp</td>
<td valign="top" align="left">W</td>
<td valign="top" align="center">7</td>
<td valign="top" align="center">4</td>
<td valign="top" align="char" char=".">1.75</td>
<td valign="top" align="char" char=".">0.11</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="TN1">
<label>a</label>
<p><italic>The 20 amino acids are listed in descending order of their total number</italic>.</p></fn>
<fn id="TN2">
<label>b</label>
<p><italic>The total count of the given amino acid in the 87 processed sequences</italic>.</p></fn>
<fn id="TN3">
<label>c</label>
<p><italic>The value was calculated by dividing the total number of the given amino acid by the number of sequences containing that amino acid</italic>.</p></fn>
<fn id="TN4">
<label>d</label>
<p><italic>The value was calculated by dividing the total number of the given amino acid by the total number of amino acids in the 87 processed sequences</italic>.</p></fn>
</table-wrap-foot>
</table-wrap>
<p>Moreover, the notion of glycomodules was used to characterize AGP-like sequences, because dipeptides such as Ala-Pro, Ser-Pro, and Thr-Pro are present in many AGPs. We therefore analyzed the distribution patterns of Pro and its neighboring residues in the 87 AGP-like sequences by counting the X-Pro dipeptides (X representing any amino acid other than Pro). The glycomodules Ala-Pro, Ser-Pro, and Thr-Pro were much more abundant than the other 16 composition modes (X-Pro, with X representing any amino acid other than Pro, Ala, Ser, and Thr) and accounted for 81.06% of non-contiguous Pro residues (Table <xref ref-type="table" rid="T2">2</xref>). For the other 16 composition modes, we further counted the Ala, Ser, and Thr residues that followed Pro (e.g., Gly-Pro-Ala, Gly-Pro-Ser, Gly-Pro-Thr; Supplementary Table <xref ref-type="supplementary-material" rid="SM4">4</xref>). Pro-Ala, Pro-Ser, and Pro-Thr occurred 126 times in total and accounted for 9.14% of non-contiguous Pro residues. These specific arrangements of Pro, Ala, Ser, and Thr characterized AGP-like sequences well: the glycomodules Ala-Pro, Pro-Ala, Ser-Pro, Pro-Ser, Thr-Pro, and Pro-Thr together represented 90.20% of non-contiguous Pro residues in the 87 AGP-like sequences. Thus, glycomodule counts could be another important indicator for identifying AGP-like sequences.</p>
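<p>The X-Pro counting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions and not the script used in this study: the function name count_x_pro and the toy sequence are hypothetical, whereas the list of totals is copied from the total number column of Table <xref ref-type="table" rid="T2">2</xref>.</p>
<preformat>
```python
from collections import Counter


def count_x_pro(seq: str) -> Counter:
    """Count X-Pro dipeptides in seq, where X is any residue other than Pro."""
    counts = Counter()
    for prev, cur in zip(seq, seq[1:]):
        if cur == "P" and prev != "P":
            counts[prev + "P"] += 1
    return counts


# Hypothetical toy sequence: the final Pro follows another Pro and is skipped.
toy_counts = count_x_pro("SPAPGPTPP")

# Total number column of Table 2, from AP down to HP.
table2_totals = [547, 383, 187, 73, 43, 30, 20, 19, 15, 12, 12, 9, 7, 6, 5, 4, 3, 3, 0]
top3_share = 100 * sum(table2_totals[:3]) / sum(table2_totals)

print(toy_counts)
print(round(top3_share, 2))  # 81.06, the combined share of AP, SP, and TP
```
</preformat>
<p>Applying the same denominator to the 126 Pro-Ala, Pro-Ser, and Pro-Thr occurrences gives the 9.14% quoted in the text.</p>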
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p><bold>Glycomodule counts in 87 AGP-like sequences</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Glycomodule<xref ref-type="table-fn" rid="TN5"><sup>a</sup></xref></bold></th>
<th valign="top" align="center"><bold>Total number<xref ref-type="table-fn" rid="TN6"><sup>b</sup></xref></bold></th>
<th valign="top" align="center"><bold>Frequency of given glycomodule in 87 sequences</bold></th>
<th valign="top" align="center"><bold>Average number of given glycomodule per sequence<xref ref-type="table-fn" rid="TN7"><sup>c</sup></xref></bold></th>
<th valign="top" align="center"><bold>Percentage of given glycomodule in all putative glycomodules (%)<xref ref-type="table-fn" rid="TN8"><sup>d</sup></xref></bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">AP</td>
<td valign="top" align="center">547</td>
<td valign="top" align="center">87</td>
<td valign="top" align="char" char=".">6.29</td>
<td valign="top" align="char" char=".">39.70</td>
</tr>
<tr>
<td valign="top" align="left">SP</td>
<td valign="top" align="center">383</td>
<td valign="top" align="center">61</td>
<td valign="top" align="char" char=".">6.28</td>
<td valign="top" align="char" char=".">27.79</td>
</tr>
<tr>
<td valign="top" align="left">TP</td>
<td valign="top" align="center">187</td>
<td valign="top" align="center">44</td>
<td valign="top" align="char" char=".">4.25</td>
<td valign="top" align="char" char=".">13.57</td>
</tr>
<tr>
<td valign="top" align="left">GP</td>
<td valign="top" align="center">73</td>
<td valign="top" align="center">40</td>
<td valign="top" align="char" char=".">1.83</td>
<td valign="top" align="char" char=".">5.30</td>
</tr>
<tr>
<td valign="top" align="left">VP</td>
<td valign="top" align="center">43</td>
<td valign="top" align="center">27</td>
<td valign="top" align="char" char=".">1.59</td>
<td valign="top" align="char" char=".">3.12</td>
</tr>
<tr>
<td valign="top" align="left">LP</td>
<td valign="top" align="center">30</td>
<td valign="top" align="center">18</td>
<td valign="top" align="char" char=".">1.67</td>
<td valign="top" align="char" char=".">2.18</td>
</tr>
<tr>
<td valign="top" align="left">EP</td>
<td valign="top" align="center">20</td>
<td valign="top" align="center">17</td>
<td valign="top" align="char" char=".">1.18</td>
<td valign="top" align="char" char=".">1.45</td>
</tr>
<tr>
<td valign="top" align="left">KP</td>
<td valign="top" align="center">19</td>
<td valign="top" align="center">12</td>
<td valign="top" align="char" char=".">1.58</td>
<td valign="top" align="char" char=".">1.38</td>
</tr>
<tr>
<td valign="top" align="left">MP</td>
<td valign="top" align="center">15</td>
<td valign="top" align="center">4</td>
<td valign="top" align="char" char=".">3.75</td>
<td valign="top" align="char" char=".">1.09</td>
</tr>
<tr>
<td valign="top" align="left">IP</td>
<td valign="top" align="center">12</td>
<td valign="top" align="center">10</td>
<td valign="top" align="char" char=".">1.20</td>
<td valign="top" align="char" char=".">0.87</td>
</tr>
<tr>
<td valign="top" align="left">QP</td>
<td valign="top" align="center">12</td>
<td valign="top" align="center">9</td>
<td valign="top" align="char" char=".">1.33</td>
<td valign="top" align="char" char=".">0.87</td>
</tr>
<tr>
<td valign="top" align="left">NP</td>
<td valign="top" align="center">9</td>
<td valign="top" align="center">9</td>
<td valign="top" align="char" char=".">1.00</td>
<td valign="top" align="char" char=".">0.65</td>
</tr>
<tr>
<td valign="top" align="left">DP</td>
<td valign="top" align="center">7</td>
<td valign="top" align="center">5</td>
<td valign="top" align="char" char=".">1.40</td>
<td valign="top" align="char" char=".">0.51</td>
</tr>
<tr>
<td valign="top" align="left">YP</td>
<td valign="top" align="center">6</td>
<td valign="top" align="center">5</td>
<td valign="top" align="char" char=".">1.20</td>
<td valign="top" align="char" char=".">0.44</td>
</tr>
<tr>
<td valign="top" align="left">RP</td>
<td valign="top" align="center">5</td>
<td valign="top" align="center">3</td>
<td valign="top" align="char" char=".">1.67</td>
<td valign="top" align="char" char=".">0.36</td>
</tr>
<tr>
<td valign="top" align="left">FP</td>
<td valign="top" align="center">4</td>
<td valign="top" align="center">4</td>
<td valign="top" align="char" char=".">1.00</td>
<td valign="top" align="char" char=".">0.29</td>
</tr>
<tr>
<td valign="top" align="left">WP</td>
<td valign="top" align="center">3</td>
<td valign="top" align="center">1</td>
<td valign="top" align="char" char=".">3.00</td>
<td valign="top" align="char" char=".">0.22</td>
</tr>
<tr>
<td valign="top" align="left">CP</td>
<td valign="top" align="center">3</td>
<td valign="top" align="center">1</td>
<td valign="top" align="char" char=".">3.00</td>
<td valign="top" align="char" char=".">0.22</td>
</tr>
<tr>
<td valign="top" align="left">HP</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
<td valign="top" align="char" char=".">0.00</td>
<td valign="top" align="char" char=".">0.00</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="TN5">
<label>a</label>
<p><italic>The 19 putative glycomodules are listed in descending order of their total number</italic>.</p></fn>
<fn id="TN6">
<label>b</label>
<p><italic>The total count of the given glycomodule in the 87 processed sequences</italic>.</p></fn>
<fn id="TN7">
<label>c</label>
<p><italic>The value was calculated by dividing the total number of the given glycomodule by the number of sequences containing that glycomodule</italic>.</p></fn>
<fn id="TN8">
<label>d</label>
<p><italic>The value was calculated by dividing the total number of the given glycomodule by the total number of all putative glycomodules in the 87 processed sequences</italic>.</p></fn>
</table-wrap-foot>
</table-wrap>
</sec>
<sec>
<title>The variables of glycomodule index and partial PAST percentage in identifying AGPs</title>
<p>AGPs are defined as proteins in which glycomodules such as Ala-Pro, Ser-Pro, and Thr-Pro are distributed throughout the sequence and non-contiguous Pro residues are separated by no more than 11 amino acid residues (Schultz et al., <xref ref-type="bibr" rid="B51">2002</xref>). The major differences between classical AGPs, AG-peptides, and chimeric AGPs are the length of the AGP-like sequences and the number of glycomodules. Following the definition of AGPs in previous studies, we assumed that a sequence with glycomodules spaced by no more than 10 amino acid residues might be effectively glycosylated with AG. In our opinion, a criterion of only two glycomodules is unlikely to discriminate AGPs from other proteins. For instance, more than half of <italic>Arabidopsis</italic> proteins (i.e., 15,851 of 27,416) and about two-thirds of rice proteins (i.e., 26,618 of 39,049) have at least two glycomodules. In addition, because the shortest AGP-like sequences found in AG-peptides usually have three glycomodules, we required AGP-like sequences to contain at least three glycomodules. A measure designated the glycomodule index (GlycoIndex) is proposed in this study to represent the enrichment of glycomodules in AGP-like sequences of variable length. The GlycoIndex is calculated as the ratio of the number of glycomodules to the length of the AGP-like sequence, where the AGP-like sequence both begins and ends with a glycomodule. For example, the AGP-like sequence of AtAGP1 is &#x0201C;SPAPAPSNVGGRRISPAPSPKKMTAPAPAPEVSPSPSPAAALTPESSASPPSPPLADSPTADSPALSPSAISDSPTEAPGPA,&#x0201D; and its GlycoIndex is therefore 0.26 (21/82, glycomodule number/sequence length). If a protein contained two or more AGP-like sequences, they were joined into one sequence, which was then regarded as the representative AGP-like sequence of that protein.</p>
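<p>The AtAGP1 example can be reproduced with the short Python sketch below. The counting rule is our assumption rather than a documented part of the Finding-AGP script: each Pro that does not directly follow another Pro is counted once if it is preceded or followed by Ala, Ser, or Thr. The names count_glycomodules and past_percent are hypothetical helpers introduced only for this illustration.</p>
<preformat>
```python
AST = set("AST")    # residues that form a glycomodule with Pro
PAST = set("PAST")  # residues counted in the PAST percentage


def count_glycomodules(seq: str) -> int:
    """Count non-contiguous Pro residues flanked by Ala, Ser, or Thr (assumed rule)."""
    total = 0
    for i, res in enumerate(seq):
        prev = seq[i - 1] if i else ""
        nxt = seq[i + 1:i + 2]  # empty string at the last position
        if res == "P" and prev != "P" and (prev in AST or nxt in AST):
            total += 1
    return total


def past_percent(seq: str) -> float:
    """Percentage of Pro, Ala, Ser, and Thr residues in seq."""
    return 100 * sum(res in PAST for res in seq) / len(seq)


# AGP-like sequence of AtAGP1 as quoted in the text.
ATAGP1_REGION = "SPAPAPSNVGGRRISPAPSPKKMTAPAPAPEVSPSPSPAAALTPESSASPPSPPLADSPTADSPALSPSAISDSPTEAPGPA"

glyco_no_p = count_glycomodules(ATAGP1_REGION)
glyco_index = glyco_no_p / len(ATAGP1_REGION)
print(glyco_no_p, len(ATAGP1_REGION), round(glyco_index, 2))  # 21 82 0.26
print(round(past_percent(ATAGP1_REGION), 2))
```
</preformat>
<p>Under this rule the second Pro of each Pro-Pro pair is ignored, which is why the 23 Pro residues of the AtAGP1 region yield 21 glycomodules.</p>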
<p>Moreover, a total of 325 known AGPs were collected to illustrate the method of identifying AGPs. A Python script named &#x0201C;Finding-AGP&#x0201D; was written to extract AGP-like sequences (i.e., at least three glycomodules interspaced by no more than ten amino acid residues). We found that 15 of the 325 known AGPs did not meet this definition, including one non-classical AGP, two AG-peptides, three FLAs, four PAGs, and five XYLPs. Consequently, statistical analyses of the remaining 310 known AGPs were conducted to find an effective strategy and search criteria for identifying AGPs (Supplementary Table <xref ref-type="supplementary-material" rid="SM5">5</xref>). Seven variables were incorporated in the Finding-AGP program: total length (Length<sub>T</sub>), total PAST percentage (PAST<sub>T</sub>%), and total glycomodule number (GlycoNo<sub>T</sub>) in whole protein sequences, and partial length (Length<sub>P</sub>), partial PAST percentage (PAST<sub>P</sub>%), partial glycomodule number (GlycoNo<sub>P</sub>), and GlycoIndex in AGP-like sequences (Table <xref ref-type="table" rid="T3">3</xref>, see Section Materials and Methods for details). A correlation analysis was performed to reveal the relationships among these seven variables, and pairwise correlation coefficients were calculated to quantify the degree of correlation (Table <xref ref-type="table" rid="T4">4</xref>). The correlation analysis showed very strong correlations (|<italic>r</italic>| &#x0003E; 0.8) between GlycoNo<sub>T</sub> and GlycoNo<sub>P</sub> (<italic>r</italic> &#x0003D; 0.962), between GlycoNo<sub>P</sub> and Length<sub>P</sub> (<italic>r</italic> &#x0003D; 0.961), and between GlycoNo<sub>T</sub> and Length<sub>P</sub> (<italic>r</italic> &#x0003D; 0.935), indicating that most glycomodules were located in AGP-like sequences and that the number of glycomodules increased with the length of AGP-like sequences. Furthermore, the efficiencies of the variables Length<sub>P</sub> and GlycoNo<sub>P</sub> in identifying AGPs were largely the same as that of GlycoNo<sub>T</sub>, which was formerly proposed as a glycomodule count by Showalter et al. (<xref ref-type="bibr" rid="B60">2010</xref>). Interestingly, the degrees of correlation were low between GlycoIndex and all other variables (|<italic>r</italic>| &#x0003C; 0.5) except PAST<sub>P</sub>% (<italic>r</italic> &#x0003D; 0.724), indicating that the variables GlycoIndex and PAST<sub>P</sub>% could identify AGPs that were not covered by the variables Length<sub>T</sub> and PAST<sub>T</sub>%. In other words, a high GlycoIndex in the AGP-like sequences could be effective in identifying the large number of AGPs (especially chimeric AGPs) with low PAST<sub>T</sub>%. Meanwhile, the PAST<sub>P</sub>% of AGP-like sequences was higher than PAST<sub>T</sub>% because the enrichment of glycomodules in AGP-like sequences consequently led to an increase in PAST percentage.</p>
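<p>The extraction criterion (at least three glycomodules interspaced by no more than ten amino acid residues) can be illustrated with the following Python sketch. It is not the Finding-AGP script itself: the function names, the Pro-centred glycomodule rule, and the way the gap is measured (residues between the end of one glycomodule and the start of the next) are assumptions made for illustration, and the input sequences are hypothetical.</p>
<preformat>
```python
AST = set("AST")
MAX_GAP = 10      # maximum number of residues between two glycomodules
MIN_MODULES = 3   # minimum number of glycomodules in an AGP-like sequence


def glycomodule_spans(seq):
    """Return (start, end) slices of glycomodules, one per non-contiguous Pro."""
    spans = []
    for i, res in enumerate(seq):
        prev = seq[i - 1] if i else ""
        nxt = seq[i + 1:i + 2]
        if res != "P" or prev == "P":
            continue
        if prev in AST:
            spans.append((i - 1, i + 1))   # Ala-Pro, Ser-Pro, or Thr-Pro
        elif nxt in AST:
            spans.append((i, i + 2))       # Pro-Ala, Pro-Ser, or Pro-Thr
    return spans


def agp_like_regions(seq):
    """Chain glycomodules separated by at most MAX_GAP residues into regions."""
    regions, chain = [], []
    for span in glycomodule_spans(seq):
        if chain and span[0] - chain[-1][1] > MAX_GAP:
            if len(chain) >= MIN_MODULES:
                regions.append(seq[chain[0][0]:chain[-1][1]])
            chain = []
        chain.append(span)
    if len(chain) >= MIN_MODULES:
        regions.append(seq[chain[0][0]:chain[-1][1]])
    return regions


# Hypothetical protein: three glycomodules, an 11-residue gap, then only two more.
print(agp_like_regions("MKVL" + "SPAPTP" + "G" * 11 + "SPAP"))  # ['SPAPTP']
```
</preformat>
<p>Each returned region begins and ends with a glycomodule, and a protein with several regions could then be represented by joining them, for example with "".join(regions).</p>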
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p><bold>Seven variables used in identifying AGPs</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Variable name</bold></th>
<th valign="top" align="left"><bold>Abbreviations</bold></th>
<th valign="top" align="left"><bold>Method of calculation</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Total length</td>
<td valign="top" align="left">Length<sub>T</sub></td>
<td valign="top" align="left">The length of whole protein sequence</td>
</tr>
<tr>
<td valign="top" align="left">Total PAST percentage</td>
<td valign="top" align="left">PAST<sub>T</sub>%</td>
<td valign="top" align="left">The percentage of Pro, Ala, Ser, and Thr in whole protein sequence</td>
</tr>
<tr>
<td valign="top" align="left">Total glycomodule number</td>
<td valign="top" align="left">GlycoNo<sub>T</sub></td>
<td valign="top" align="left">The number of Ala-Pro, Pro-Ala, Ser-Pro, Pro-Ser, Thr-Pro, and Pro-Thr in whole protein sequence</td>
</tr>
<tr>
<td valign="top" align="left">Partial Length</td>
<td valign="top" align="left">Length<sub>P</sub></td>
<td valign="top" align="left">The length of AGP-like sequence</td>
</tr>
<tr>
<td valign="top" align="left">Partial PAST percentage</td>
<td valign="top" align="left">PAST<sub>P</sub>%</td>
<td valign="top" align="left">The percentage of Pro, Ala, Ser, and Thr in AGP-like sequence</td>
</tr>
<tr>
<td valign="top" align="left">Partial glycomodule number</td>
<td valign="top" align="left">GlycoNo<sub>P</sub></td>
<td valign="top" align="left">The number of Ala-Pro, Pro-Ala, Ser-Pro, Pro-Ser, Thr-Pro, and Pro-Thr in AGP-like sequence</td>
</tr>
<tr>
<td valign="top" align="left">Glycomodule index</td>
<td valign="top" align="left">GlycoIndex</td>
<td valign="top" align="left">The ratio of GlycoNo<sub>P</sub> and Length<sub>P</sub></td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p><bold>Correlation analysis of seven variables in 310 known AGPs with at least three glycomodules</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="center"><bold>Length<sub>T</sub></bold></th>
<th valign="top" align="center"><bold>PAST<sub>T</sub>%</bold></th>
<th valign="top" align="center"><bold>GlycoNo<sub>T</sub></bold></th>
<th valign="top" align="center"><bold>Length<sub>P</sub></bold></th>
<th valign="top" align="center"><bold>PAST<sub>P</sub>%</bold></th>
<th valign="top" align="center"><bold>GlycoNo<sub>P</sub></bold></th>
<th valign="top" align="center"><bold>GlycoIndex</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Length<sub>T</sub></td>
<td valign="top" align="center">1</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">PAST<sub>T</sub>%</td>
<td valign="top" align="char" char=".">&#x02212;0.33<xref ref-type="table-fn" rid="TN9"><sup>&#x0002A;&#x0002A;</sup></xref></td>
<td valign="top" align="center">1</td>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">GlycoNo<sub>T</sub></td>
<td valign="top" align="char" char=".">0.47<xref ref-type="table-fn" rid="TN9"><sup>&#x0002A;&#x0002A;</sup></xref></td>
<td valign="top" align="char" char=".">0.49<xref ref-type="table-fn" rid="TN9"><sup>&#x0002A;&#x0002A;</sup></xref></td>
<td valign="top" align="center">1</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">Length<sub>P</sub></td>
<td valign="top" align="char" char=".">0.29<xref ref-type="table-fn" rid="TN9"><sup>&#x0002A;&#x0002A;</sup></xref></td>
<td valign="top" align="char" char=".">0.58<xref ref-type="table-fn" rid="TN9"><sup>&#x0002A;&#x0002A;</sup></xref></td>
<td valign="top" align="char" char=".">0.93<xref ref-type="table-fn" rid="TN9"><sup>&#x0002A;&#x0002A;</sup></xref></td>
<td valign="top" align="center">1</td>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">PAST<sub>P</sub>%</td>
<td valign="top" align="char" char=".">&#x02212;0.29<xref ref-type="table-fn" rid="TN9"><sup>&#x0002A;&#x0002A;</sup></xref></td>
<td valign="top" align="char" char=".">0.16<xref ref-type="table-fn" rid="TN9"><sup>&#x0002A;&#x0002A;</sup></xref></td>
<td valign="top" align="char" char=".">&#x02212;0.13</td>
<td valign="top" align="char" char=".">&#x02212;0.20<xref ref-type="table-fn" rid="TN9"><sup>&#x0002A;&#x0002A;</sup></xref></td>
<td valign="top" align="center">1</td>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">GlycoNo<sub>P</sub></td>
<td valign="top" align="char" char=".">0.27<xref ref-type="table-fn" rid="TN9"><sup>&#x0002A;&#x0002A;</sup></xref></td>
<td valign="top" align="char" char=".">0.59<xref ref-type="table-fn" rid="TN9"><sup>&#x0002A;&#x0002A;</sup></xref></td>
<td valign="top" align="char" char=".">0.96<xref ref-type="table-fn" rid="TN9"><sup>&#x0002A;&#x0002A;</sup></xref></td>
<td valign="top" align="char" char=".">0.96<xref ref-type="table-fn" rid="TN9"><sup>&#x0002A;&#x0002A;</sup></xref></td>
<td valign="top" align="char" char=".">&#x02212;0.07</td>
<td valign="top" align="center">1</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">GlycoIndex</td>
<td valign="top" align="char" char=".">&#x02212;0.15</td>
<td valign="top" align="char" char=".">&#x02212;0.16<xref ref-type="table-fn" rid="TN9"><sup>&#x0002A;&#x0002A;</sup></xref></td>
<td valign="top" align="char" char=".">&#x02212;0.21<xref ref-type="table-fn" rid="TN9"><sup>&#x0002A;&#x0002A;</sup></xref></td>
<td valign="top" align="char" char=".">&#x02212;0.36<xref ref-type="table-fn" rid="TN9"><sup>&#x0002A;&#x0002A;</sup></xref></td>
<td valign="top" align="char" char=".">0.72<xref ref-type="table-fn" rid="TN9"><sup>&#x0002A;&#x0002A;</sup></xref></td>
<td valign="top" align="char" char=".">&#x02212;0.20<xref ref-type="table-fn" rid="TN9"><sup>&#x0002A;&#x0002A;</sup></xref></td>
<td valign="top" align="center">1</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="TN9">
<label>&#x0002A;&#x0002A;</label>
<p><italic>Correlation is significant at the 0.01 level (2-tailed)</italic>.</p></fn>
</table-wrap-foot>
</table-wrap>
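<p>For readers who wish to reproduce a pairwise coefficient of the kind shown in Table <xref ref-type="table" rid="T4">4</xref>, a minimal Python sketch is given below. We assume here that the coefficient is Pearson's <italic>r</italic>, which the text does not state explicitly, and the input values are hypothetical rather than data from this study.</p>
<preformat>
```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical values for four proteins, not data from this study.
length_p = [20, 40, 60, 80]
glyco_no_p = [5, 10, 15, 20]  # exactly proportional to length_p
print(round(pearson_r(length_p, glyco_no_p), 3))  # 1.0
```
</preformat>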
<p>As typical examples, four AGPs from <italic>Arabidopsis</italic> were selected to demonstrate the applicability of the GlycoIndex feature (Figure <xref ref-type="fig" rid="F1">1</xref>): AtAGP1 (classical), AtAGP57 (classical), AtAGP12 (AG-peptide), and AtFLA3 (chimeric FLA). The AGP-like sequences of these four AGPs were first obtained, and the PAST<sub>T</sub>%, GlycoIndex, and PAST<sub>P</sub>% were then calculated. The frequently used threshold for identifying AGPs, a PAST<sub>T</sub>% above 50%, identified only AtAGP1 (59.54%). If the PAST<sub>T</sub>% threshold was reduced to 35% and the Length<sub>T</sub> was simultaneously restricted to below 90, AG-peptides like AtAGP12 were easily identified. However, for AGPs with relatively low PAST<sub>T</sub>% and long Length<sub>T</sub> (e.g., AtAGP57 and AtFLA3), the PAST<sub>T</sub>% calculation method was no longer useful. The GlycoIndex seemed to be a universal feature of AGP-like sequences because all four selected AGPs had relatively high values, from 0.22 to 0.26, and differed only in the lengths of their AGP-like sequences. Meanwhile, the PAST<sub>P</sub>% of the AGP-like sequences was uniformly high, from 67.95 to 75%. High GlycoIndex and PAST<sub>P</sub>% values could thus effectively identify AGPs with low PAST<sub>T</sub>% (from 38.75 to 43.33%) even when these proteins varied in Length<sub>T</sub> and belonged to different subfamilies. Therefore, the variables GlycoIndex and PAST<sub>P</sub>% were used to screen for AGP candidates in this study.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p><bold>Illustration of the variables glycomodule index and partial PAST percentage of representative <italic><bold>Arabidopsis</bold></italic> AGPs</bold>. Red colored sequences indicate putative AG glycomodules (Ala-Pro, Pro-Ala, Ser-Pro, Pro-Ser, Thr-Pro, and Pro-Thr) and blue colored sequences indicate putative EXT glycomodules (Ser-Pro<sub>2&#x02212;4</sub>). Total PAST percentage (PAST<sub>T</sub>%) and partial PAST percentage (PAST<sub>P</sub>%) represent the PAST percentage in whole protein sequences and AGP-like sequences, respectively. GlycoIndex, glycomodule index.</p></caption>
<graphic xlink:href="fpls-08-00066-g0001.tif"/>
</fig>
</sec>
<sec>
<title>Strategy and criteria of identifying AGPs</title>
<p>Based on the variable length of AGPs and the presence of signal peptides and conserved domains, an integrated strategy was proposed to identify whole gene families (Figure <xref ref-type="fig" rid="F2">2</xref>). First, because the overwhelming majority of AGPs (295 of 310, 95.16%) were predicted by SignalP 4.1, with the cutoff set to sensitive mode, to be secreted to the endoplasmic reticulum for post-translational modification, a dichotomous variable describing the presence or absence of a signal peptide was proposed as a necessary requirement of AGP prediction in Strategy 1. To identify as many AGP candidates as possible, especially AGPs without signal peptides, homologous proteins of known AGPs were obtained (Strategy 2) by using the protein utility of the Basic Local Alignment Search Tool (BLASTP; cutoff value &#x0003D; <italic>e</italic><sup>&#x02212;3</sup>). Then, we removed the BLAST search hits from the results of signal peptide prediction, so that only sequences not homologous to the known AGPs were retained in Strategy 1.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p><bold>Schematic workflow of the integrated strategy for identifying AGPs</bold>. Strategy 1 (signal peptide filtration), which includes three groups, mainly identifies AGPs with relatively high PAST<sub>T</sub>% in given length ranges. Group 1 identifies AG-peptides. Groups 2 and 3 identify classical AGPs, Lys-rich classical AGPs, AGP-extensin hybrids, and non-classical AGPs. Strategy 2 (BLAST searches) mainly identifies chimeric AGPs (FLAs, PAGs, and XYLPs) and any other AGPs with high homology to known AGPs. The PAST<sub>P</sub>% and GlycoIndex are used to obtain PAST-rich AGP-like sequences with glycomodules throughout. A minimum GlycoNo<sub>P</sub> of five is also required for both Groups 2 and 3. The other types of HRGPs (mainly EXTs and PRPs) are removed and the remaining AGP candidates are classified into classical AGPs, Lys-rich classical AGPs, AGP-extensin hybrids, AG-peptides, chimeric AGPs (FLA, PAG, and XYLP), and other chimeric AGPs (non-classical AGPs are deemed members of the other types of chimeric AGPs).</p></caption>
<graphic xlink:href="fpls-08-00066-g0002.tif"/>
</fig>
<p>Whereas the BLASTP search could identify all subfamily members belonging to FLA, PAG, and XYLP, together with other AGP candidates with high homology to known AGPs, the signal peptide filtration mainly identified AGPs with low homology, including classical AGPs (C), Lys-rich classical AGPs (KC), AGP-extensin hybrids (HAE), AG-peptides (Pep), and non-classical AGPs (NC). The subfamilies C, KC, and HAE had similar statistical distributions and could be treated as a whole, especially as their average values of PAST<sub>T</sub>%, PAST<sub>P</sub>%, GlycoNo<sub>T</sub>, GlycoNo<sub>P</sub>, and Length<sub>P</sub> were higher than those of any other subfamily (Supplementary Table <xref ref-type="supplementary-material" rid="SM5">5</xref>). Members of the Pep subfamily differed from all other subfamilies because of their short length (Supplementary Figure <xref ref-type="supplementary-material" rid="SM11">1</xref>). Therefore, the threshold (55 &#x02264; Length<sub>T</sub> &#x0003C; 90) was used to obtain Pep, and for AGPs longer than 90 amino acids, the threshold (PAST<sub>T</sub>% &#x02265; 42%) was used to discriminate C, KC, HAE, and NC with high PAST<sub>T</sub>% (Supplementary Figure <xref ref-type="supplementary-material" rid="SM11">1</xref>). As a result, the remaining AGPs with PAST<sub>T</sub>% &#x0003C; 42% in Strategy 1 all belonged to the NC subfamily. To sum up, the results of signal peptide filtration were divided into three groups: the Pep subfamily was Group 1; the subfamilies C, KC, HAE, and NC with high PAST<sub>T</sub>% were Group 2; and NC with low PAST<sub>T</sub>% was Group 3. Moreover, the thresholds of the two variables GlycoIndex and PAST<sub>P</sub>% were determined by comparing the efficiencies of these variables in identifying known AGPs (Table <xref ref-type="table" rid="T5">5</xref>). The thresholds of GlycoIndex and PAST<sub>P</sub>% in identifying Pep were 0.15 and 60%, respectively. To obtain a strict screening threshold that could filter out negative results while retaining positive results, the thresholds of GlycoIndex and PAST<sub>P</sub>% in identifying C, KC, HAE, and NC (PAST<sub>T</sub>% &#x02265; 42%) were set as 0.15 and 55%, respectively. Under these thresholds, AtAGP52 and ZmHRA1 were not detected. For the rest of the NC (PAST<sub>T</sub>% &#x0003C; 42%), the thresholds of GlycoIndex and PAST<sub>P</sub>% were 0.20 and 60%, respectively. Generally, glycomodules were distributed throughout the AGP-like sequences of C, KC, HAE, and NC; thus, GlycoNo<sub>P</sub> was required to be greater than or equal to five (all AGPs in Groups 2 and 3 except AtAGP28 had at least five glycomodules). The minimum values of GlycoIndex (0.13) and PAST<sub>P</sub>% (45%) were then used to identify subfamily members of FLA, PAG, XYLP, and homologs of known AGPs. Finally, the resultant AGP candidates were uploaded to the NCBI Batch CD-Search Tool for annotating conserved domains. EXTs and PRPs were removed according to the descriptions in Section Materials and Methods. Subfamily classifications were performed according to the protein length, distribution patterns of AG glycomodules, and annotations of conserved domains.</p>
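<p>The thresholds above can be summarized as a small decision function. The Python sketch below is an illustration only: the function name classify_candidate and its parameters are hypothetical, the GlycoIndex and PAST<sub>P</sub>% thresholds are assumed to be inclusive, Groups 2 and 3 are assumed to start at a Length<sub>T</sub> of 90, and the inputs in the usage line are invented values rather than results from this study.</p>
<preformat>
```python
from typing import Optional


def classify_candidate(length_t: int, past_t: float, past_p: float,
                       glyco_index: float, glyco_no_p: int,
                       has_signal_peptide: bool, is_blast_hit: bool) -> Optional[str]:
    """Return the group a candidate passes, or None if it fails the criteria."""
    if is_blast_hit:
        # Strategy 2: homologs of known AGPs, screened with the minimum thresholds.
        ok = glyco_index >= 0.13 and past_p >= 45
        return "Strategy 2" if ok else None
    if not has_signal_peptide:
        return None  # Strategy 1 requires a predicted signal peptide
    if 90 > length_t >= 55:
        # Group 1: AG-peptides.
        ok = glyco_index >= 0.15 and past_p >= 60
        return "Group 1" if ok else None
    if length_t >= 90 and glyco_no_p >= 5:
        if past_t >= 42:
            # Group 2: C, KC, HAE, and NC with high total PAST percentage.
            ok = glyco_index >= 0.15 and past_p >= 55
            return "Group 2" if ok else None
        # Group 3: NC with low total PAST percentage.
        ok = glyco_index >= 0.20 and past_p >= 60
        return "Group 3" if ok else None
    return None


# Hypothetical candidate with a signal peptide and no BLAST hit.
print(classify_candidate(130, 59.5, 73.0, 0.26, 21, True, False))  # Group 2
```
</preformat>
<p>BLAST hits are tested first because they are removed from the signal peptide results before Strategy 1 is applied.</p>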
<table-wrap position="float" id="T5">
<label>Table 5</label>
<caption><p><bold>The efficiencies of glycomodule index and partial PAST percentage in identifying known AGPs</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Group</bold></th>
<th valign="top" align="left"><bold>Subfamily</bold></th>
<th valign="top" align="center" colspan="4" style="border-bottom: thin solid #000000;"><bold>GlycoIndex</bold></th>
<th valign="top" align="center" colspan="4" style="border-bottom: thin solid #000000;"><bold>PAST</bold><sub><bold>P</bold></sub><bold>%</bold></th>
</tr>
<tr>
<th/>
<th/>
<th valign="top" align="center"><bold>0.10</bold></th>
<th valign="top" align="center"><bold>0.15</bold></th>
<th valign="top" align="center"><bold>0.20</bold></th>
<th valign="top" align="center"><bold>0.25</bold></th>
<th valign="top" align="center"><bold>45%</bold></th>
<th valign="top" align="center"><bold>50%</bold></th>
<th valign="top" align="center"><bold>55%</bold></th>
<th valign="top" align="center"><bold>60%</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">1</td>
<td valign="top" align="left">Pep</td>
<td valign="top" align="center">38</td>
<td valign="top" align="center"><bold>38</bold><xref ref-type="table-fn" rid="TN10"><sup>a</sup></xref></td>
<td valign="top" align="center">37</td>
<td valign="top" align="center">29</td>
<td valign="top" align="center">38</td>
<td valign="top" align="center">38</td>
<td valign="top" align="center">38</td>
<td valign="top" align="center"><bold>38</bold></td>
</tr>
<tr>
<td valign="top" align="left">2</td>
<td valign="top" align="left">C, KC, HAE, and NC<xref ref-type="table-fn" rid="TN11"><sup>b</sup></xref></td>
<td valign="top" align="center">67</td>
<td valign="top" align="center"><bold>66</bold></td>
<td valign="top" align="center">53</td>
<td valign="top" align="center">23</td>
<td valign="top" align="center">67</td>
<td valign="top" align="center">66</td>
<td valign="top" align="center"><bold>66</bold></td>
<td valign="top" align="center">63</td>
</tr>
<tr>
<td valign="top" align="left">3</td>
<td valign="top" align="left">NC<xref ref-type="table-fn" rid="TN12"><sup>c</sup></xref></td>
<td valign="top" align="center">10</td>
<td valign="top" align="center">10</td>
<td valign="top" align="center"><bold>10</bold></td>
<td valign="top" align="center">7</td>
<td valign="top" align="center">10</td>
<td valign="top" align="center">10</td>
<td valign="top" align="center">10</td>
<td valign="top" align="center"><bold>10</bold></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="TN10">
<label>a</label>
<p><italic>Boldface indicates that the corresponding threshold is used in data screening of given group</italic>.</p></fn>
<fn id="TN11">
<label>b</label>
<p><italic>Non-classical AGPs (NC) with &#x02265; 42% total PAST percentage</italic>.</p></fn>
<fn id="TN12">
<label>c</label>
<p><italic>NC with &#x0003C; 42% total PAST percentage</italic>.</p></fn>
</table-wrap-foot>
</table-wrap>
</sec>
<sec>
<title>Identification of AGPs across plant genomes</title>
<p>To date, the whole genome sequences of many plant species have been released and annotated, which enabled us to conduct bioinformatics identification of AGPs across plant genomes in a single effort. After performing the consecutive operations mentioned above (Figure <xref ref-type="fig" rid="F2">2</xref>), AGP candidates were set apart from other proteins that did not meet the screening criteria (Supplementary Table <xref ref-type="supplementary-material" rid="SM6">6</xref>). In tomato (<italic>Solanum lycopersicum</italic>), for example, signal peptide filtration and BLAST search resulted in 4317 and 258 sequences, respectively. The 216 sequences shared by these two steps were then removed from the signal peptide filtration results, and the remaining 4101 sequences were retained for further use. These sequences were divided into three groups based on their Length<sub>T</sub> and PAST<sub>T</sub>%, and the GlycoIndex and PAST<sub>P</sub>% parameters were used to obtain AGP candidates. Specifically, the GlycoNo<sub>P</sub> threshold was set to five for identifying C, KC, HAE, and NC. Consequently, there were 12, 45, and 62 AGP candidates in Groups 1, 2, and 3, respectively, and 130 AGP candidates from the BLAST search. Finally, a total of 249 AGP candidates were submitted to NCBI Batch CD-Search for identification of annotated conserved domains. Chimeric AGP candidates without fasciclin-like, plastocyanin-like, and xylogen-like domains were termed other types of chimeric AGPs (including NC). After removing proteins belonging to EXTs and PRPs, the remaining sequences were termed putative AGPs and mainly classified into five subfamilies: Pep, C, KC, HAE, and chimeric AGPs.</p>
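<p>The bookkeeping in the tomato example can be checked with a short Python sketch. The function name and the toy identifiers are hypothetical and serve only to show how sequences found by both searches could be set aside; the counts in the second part are those reported in the text.</p>
<preformat><![CDATA[
```python
def split_candidates(signal_peptide_ids, blast_ids):
    """Return (signal-peptide-only IDs, IDs found by both searches)."""
    sp = set(signal_peptide_ids)
    blast = set(blast_ids)
    shared = sp & blast
    return sp - shared, shared


# Toy identifiers (hypothetical) to show the behavior.
sp_only, shared = split_candidates(["g1", "g2", "g3", "g4"], ["g3", "g4", "g5"])
print(sorted(sp_only), sorted(shared))  # ['g1', 'g2'] ['g3', 'g4']

# Counts reported for tomato in the text.
retained = 4317 - 216            # kept from signal peptide filtration
submitted = 12 + 45 + 62 + 130   # Groups 1 to 3 plus BLAST-derived candidates
print(retained, submitted)       # 4101 249
```
]]></preformat>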
<p>As a result, a total of 7216 putative AGPs were identified from 47 selected plant species, including 734 C, 111 KC, 597 HAE, 148 Pep, 1506 PAG, 1047 FLA, 954 XYLP, and 2092 other types of chimeric AGPs (Figure <xref ref-type="fig" rid="F3">3</xref> and Supplementary Table <xref ref-type="supplementary-material" rid="SM7">7</xref>). The number of AGPs in most plant species ranged from 100 to 200, and nine species contained between 200 and 300 AGPs. Only two species contained more than 300 AGPs (313 in <italic>Glycine max</italic> and 306 in <italic>Zea mays</italic>). The number of AGPs in seven species was &#x0003C;100. Notably, the number of AGP family members varied widely among monocots (e.g., the number of AGPs in <italic>Sorghum bicolor</italic>, <italic>Z. mays</italic>, and <italic>O. sativa</italic> was about three times more than in <italic>Hordeum vulgare</italic>). Classical AGPs, FLA, PAG, and XYLP were found in all selected species, except that classical AGPs and XYLP were absent in <italic>Chlamydomonas reinhardtii</italic>. Additionally, Lys-rich classical AGPs were found in all angiosperms except for several monocots (Figure <xref ref-type="fig" rid="F3">3</xref>). In particular, most AGPs (153 of 159) identified in <italic>C. reinhardtii</italic> were HAE and other types of chimeric AGPs. The subfamilies of classical AGPs, AG-peptides, and Lys-rich classical AGPs first emerged in <italic>Physcomitrella patens, S. moellendorffii</italic>, and <italic>Picea abies</italic>, respectively.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p><bold>Distribution of AGP genes across different plant lineages</bold>. The copy number of each AGP subfamily is indicated at the top. The species tree was constructed based on the three representations of the species in Phytozome (<ext-link ext-link-type="uri" xlink:href="https://phytozome.jgi.doe.gov/pz/portal.html">https://phytozome.jgi.doe.gov/pz/portal.html</ext-link>) and then referred to the taxonomy tree in NCBI (<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/guide/taxonomy/">http://www.ncbi.nlm.nih.gov/guide/taxonomy/</ext-link>). The horizontal bar indicates the total number of AGP genes in each plant species.</p></caption>
<graphic xlink:href="fpls-08-00066-g0003.tif"/>
</fig>
</sec>
<sec>
<title>Exploration of new subfamilies belonging to chimeric AGPs</title>
<p>A large number of chimeric AGPs classified as other types were obtained by the integrated screening method of AGP prediction (Supplementary Tables <xref ref-type="supplementary-material" rid="SM7">7</xref>, <xref ref-type="supplementary-material" rid="SM8">8</xref>). Statistical analysis of conserved domains indicated that several new subfamilies could be analogous to the already classified chimeric subfamilies FLA, PAG, and XYLP (Table <xref ref-type="table" rid="T6">6</xref>). Typically, there were 454 chimeric AGPs with protein kinase (PK) domains in 46 species and 198 chimeric AGPs with formin homology 2 (FH2) domains in 43 species. Although the other seven subfamilies contained fewer proteins than FLA, PAG, XYLP, and chimeric AGPs with FH2-like and PK-like domains, they exhibited a relatively higher occurrence rate than the remaining types. In general, nine kinds of chimeric AGPs were found in more than half of the 47 selected species (Table <xref ref-type="table" rid="T6">6</xref>). Therefore, these nine kinds of chimeric AGPs were regarded as new subfamilies of chimeric AGPs in this study, including (i) chimeric AGPs with protein kinase-like domains (PK-like), (ii) chimeric AGPs with formin homology 2-like domains (FH2-like), (iii) chimeric AGPs with glycosyl hydrolase-like domains (GH-like), (iv) chimeric AGPs with pollen Ole e I-like domains (POeI-like), (v) chimeric AGPs with leucine-rich repeats-like domains (LRR-like), (vi) chimeric AGPs with X8-like domains (X8-like), (vii) chimeric AGPs with pectin methylesterase inhibitor-like domains (PMEI-like), (viii) chimeric AGPs with pectate lyase-like domains (PCL-like), and (ix) chimeric AGPs with SGNH hydrolase-like domains (SGNH-like). To further investigate the possible roles of chimeric AGPs with PK-like and FH2-like domains, their transmembrane domains were predicted using the TMHMM Server.
It was found that 93.99% (422 of 449) of the PK-like and 86.36% (171 of 198) of the FH2-like chimeric AGPs were predicted to have at least one transmembrane motif in the middle of the sequence (Supplementary Table <xref ref-type="supplementary-material" rid="SM7">7</xref>), which is compatible with their main functions, such as sensing extracellular signals and transducing them into the cytoplasm.</p>
<table-wrap position="float" id="T6">
<label>Table 6</label>
<caption><p><bold>Summary of chimeric AGPs present in more than half of the selected species</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Chimeric AGPs<xref ref-type="table-fn" rid="TN13"><sup>a</sup></xref></bold></th>
<th valign="top" align="left"><bold>Abbreviations</bold></th>
<th valign="top" align="center"><bold>Total<xref ref-type="table-fn" rid="TN14"><sup>b</sup></xref></bold></th>
<th valign="top" align="center"><bold>Frequency<xref ref-type="table-fn" rid="TN15"><sup>c</sup></xref></bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Fasciclin-like</td>
<td valign="top" align="left">FLA</td>
<td valign="top" align="center">1076</td>
<td valign="top" align="center">47/47</td>
</tr>
<tr>
<td valign="top" align="left">Phytocyanin-like</td>
<td valign="top" align="left">PAG</td>
<td valign="top" align="center">1506</td>
<td valign="top" align="center">47/47</td>
</tr>
<tr>
<td valign="top" align="left">Xylogen-like</td>
<td valign="top" align="left">XYLP</td>
<td valign="top" align="center">957</td>
<td valign="top" align="center">46/47</td>
</tr>
<tr>
<td valign="top" align="left">Protein kinase-like<xref ref-type="table-fn" rid="TN16"><sup>d</sup></xref></td>
<td valign="top" align="left">PK</td>
<td valign="top" align="center">449</td>
<td valign="top" align="center">46/47</td>
</tr>
<tr>
<td valign="top" align="left">Formin homology 2-like</td>
<td valign="top" align="left">FH2</td>
<td valign="top" align="center">198</td>
<td valign="top" align="center">43/47</td>
</tr>
<tr>
<td valign="top" align="left">Glycosyl hydrolase-like<xref ref-type="table-fn" rid="TN17"><sup>e</sup></xref></td>
<td valign="top" align="left">GH</td>
<td valign="top" align="center">83</td>
<td valign="top" align="center">41/47</td>
</tr>
<tr>
<td valign="top" align="left">Pollen Ole e I-like<xref ref-type="table-fn" rid="TN18"><sup>f</sup></xref></td>
<td valign="top" align="left">POeI</td>
<td valign="top" align="center">98</td>
<td valign="top" align="center">40/47</td>
</tr>
<tr>
<td valign="top" align="left">Leucine-rich repeats-like<xref ref-type="table-fn" rid="TN19"><sup>g</sup></xref></td>
<td valign="top" align="left">LRR</td>
<td valign="top" align="center">62</td>
<td valign="top" align="center">35/47</td>
</tr>
<tr>
<td valign="top" align="left">X8-like</td>
<td valign="top" align="left">X8</td>
<td valign="top" align="center">75</td>
<td valign="top" align="center">32/47</td>
</tr>
<tr>
<td valign="top" align="left">Pectin methylesterase inhibitor-like</td>
<td valign="top" align="left">PMEI</td>
<td valign="top" align="center">51</td>
<td valign="top" align="center">30/47</td>
</tr>
<tr>
<td valign="top" align="left">Pectate lyase-like</td>
<td valign="top" align="left">PCL</td>
<td valign="top" align="center">61</td>
<td valign="top" align="center">28/47</td>
</tr>
<tr>
<td valign="top" align="left">SGNH hydrolase-like</td>
<td valign="top" align="left">SGNH</td>
<td valign="top" align="center">49</td>
<td valign="top" align="center">26/47</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="TN13">
<label>a</label>
<p><italic>Boldface indicates chimeric subfamilies that were previously identified in Arabidopsis and rice</italic>.</p></fn>
<fn id="TN14">
<label>b</label>
<p><italic>Total number of each subfamily from 47 plant species</italic>.</p></fn>
<fn id="TN15">
<label>c</label>
<p><italic>The frequency represents species number with more than one subfamily member</italic>.</p></fn>
<fn id="TN16">
<label>d</label>
<p><italic>Chimeric AGPs with protein kinase-like domains include three types of protein kinases with conserved domains as STK_BAK1_like, STKc_IRAK, and PKc_like</italic>.</p></fn>
<fn id="TN17">
<label>e</label>
<p><italic>Chimeric AGPs with glycosyl hydrolase-like domains include eight types of glycosyl hydrolases</italic>.</p></fn>
<fn id="TN18">
<label>f</label>
<p><italic>Several members of chimeric AGPs with Pollen Ole e I-like domains were previously identified (e.g., AtAGP31 and TTS1)</italic>.</p></fn>
<fn id="TN19">
<label>g</label>
<p><italic>Chimeric AGPs with leucine-rich repeats-like domains are proteins that possess only LRRs and are therefore distinct from leucine-rich repeat receptor-like kinases (several of which belong to the chimeric AGPs with protein kinase-like domains)</italic>.</p></fn>
</table-wrap-foot>
</table-wrap>
<p>To confirm that our definitions of these new subfamilies were reliable, representative protein sequences of four chimeric subfamilies in rice (<italic>O. sativa</italic>) were selected to show the sequence characteristics (Supplementary Figure <xref ref-type="supplementary-material" rid="SM12">2</xref>), including PK-like (LOC_Os07g49240.1), FH2-like (LOC_Os02g50570.1), PCL-like (LOC_Os01g44970.1), and PMEI-like (LOC_Os07g14340.1). Among the chimeric subfamilies in <italic>O. sativa</italic>, members of the FH2-like and PK-like subfamilies were longer than those of FLA, PAG, and XYLP. Most proteins with relatively long conserved domains had a low PAST<sub>T</sub>% (&#x0003C;30%) but a high PAST<sub>P</sub>% in their AGP-like sequences (&#x0003E;60%). The arrangement patterns of AG glycomodules in these AGPs were similar to those of known classical AGPs and chimeric AGPs (i.e., FLA, PAG, and XYLP). For example, consecutive X-Pro repeats such as &#x0201C;TPAPSPAPSPSP&#x0201D; were found in a chimeric AGP with a PK-like domain (LOC_Os07g49240.1), and there were also other discontinuous glycomodules separated by &#x0003C;10 amino acids. Notably, the GlycoIndex of these four AGPs was uniformly high, ranging from 0.22 to 0.25. Similar glycomodule distribution patterns were also found in other chimeric AGPs not listed. All these sequences had signal peptides, which indicated that they could be efficiently AG glycosylated like known AGP family members.</p>
</sec>
<sec>
<title>Amino acid composition and phylogeny of Lys-rich classical AGPs</title>
<p>In some classical AGPs, short Lys-rich regions (approximately 10&#x02013;13 amino acids) interrupt the PAST-rich regions; these proteins were termed Lys-rich classical AGPs (Sun et al., <xref ref-type="bibr" rid="B64">2005</xref>). For example, LeAGP1, the first Lys-rich classical AGP identified from tomato (<italic>S. lycopersicum</italic>, formerly <italic>Lycopersicon esculentum</italic>), contained a short Lys-rich sequence &#x0201C;KGKVKGKKGKKHN&#x0201D; (Li and Showalter, <xref ref-type="bibr" rid="B33">1996</xref>). However, the Lys-rich region of AtAGP19 &#x0201C;KHKRKHKHKRHHH&#x0201D; was not only rich in Lys but also contained a high percentage of His (i.e., five Lys, six His, and two Arg; Supplementary Table <xref ref-type="supplementary-material" rid="SM9">9</xref>). Notably, the other eight residues were also basic amino acids, which are polar and positively charged. Identification of Lys-rich AGPs in 47 plants made it possible to determine whether the occurrence of His and Arg residues was a special case found only in AtAGP19. Therefore, we statistically analyzed the length of Lys-rich regions and the number of Lys, His, Arg, and other amino acid residues. As shown in Figure <xref ref-type="fig" rid="F3">3</xref> and Supplementary Table <xref ref-type="supplementary-material" rid="SM9">9</xref>, Lys-rich AGPs were present in 39 species but absent in eight species. The lengths of the short Lys-rich regions varied widely, from 5 to 22 amino acids. Moreover, the amino acid composition of the short Lys-rich regions differed considerably. Regions were designated Lys-rich, His-rich, or Arg-rich according to which of the three basic amino acids contributed the most to the short region. Typically, 72 Lys-rich regions were found in 111 proteins, followed by 24 His-rich regions and four Arg-rich regions.
We also found eight Lys/His-rich regions (i.e., Lys and His were equal in number) and three His/Arg-rich regions (i.e., His and Arg were equal in number), but did not find any Lys/Arg-rich region in this study. Lys residues were absent in the basic amino acid-rich region of Glyma.02G130000.1.p, and only one Lys residue was found in the short region (14 amino acids) of Glyma.01G092300.1.p. It would therefore be inappropriate to refer to AGPs with few or no Lys residues as &#x0201C;Lys-rich AGPs.&#x0201D;</p>
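<p>The naming of basic amino acid-rich regions can be illustrated with a minimal Python sketch. It assumes that a region is named after whichever of Lys, His, and Arg is most frequent and that ties are joined with a slash; the function names are hypothetical and the sketch is not taken from the analysis scripts used in this study.</p>
<preformat><![CDATA[
```python
from collections import Counter

BASIC = ("K", "H", "R")
NAMES = {"K": "Lys", "H": "His", "R": "Arg"}


def basic_counts(region):
    """Count Lys (K), His (H), and Arg (R) residues in a short region."""
    counts = Counter(region.upper())
    return {aa: counts.get(aa, 0) for aa in BASIC}


def classify_basic_region(region):
    """Name a region after its most frequent basic residue(s)."""
    counts = basic_counts(region)
    top = max(counts.values())
    if top == 0:
        return "none"
    leaders = [NAMES[aa] for aa in BASIC if counts[aa] == top]
    return "/".join(leaders) + "-rich"


print(basic_counts("KHKRKHKHKRHHH"))           # {'K': 5, 'H': 6, 'R': 2}
print(classify_basic_region("KGKVKGKKGKKHN"))  # Lys-rich (LeAGP1)
print(classify_basic_region("KHKRKHKHKRHHH"))  # His-rich (AtAGP19)
```
]]></preformat>
<p>Under this rule, the AtAGP19 region is labeled His-rich rather than Lys-rich because it contains six His and five Lys residues.</p>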
<p>To understand the sequence variation of Lys-rich AGPs, multiple sequence alignments were performed using representative Lys-rich AGPs from Rosales, Brassicales, and Fabales species. The length of Lys-rich regions varied among closely related Rosales Lys-rich AGPs. For instance, two short Lys-rich regions were found in both DMP0000290620 and DMP0000173174, namely &#x0201C;KKPKH&#x0201D; and &#x0201C;KSKSKKPKHK,&#x0201D; separated by &#x0201C;ESPAAAPTPS.&#x0201D; However, in Pm002430 and ppa011163, &#x0201C;KKPKH&#x0201D; was missing and &#x0201C;KSKSKKPKHK&#x0201D; was shortened to &#x0201C;KKKPKHK&#x0201D; and &#x0201C;KKKSKHK,&#x0201D; respectively. Moreover, the Lys-rich region was only a five-amino acid sequence, &#x0201C;KSKHK,&#x0201D; in Pbr025221.2 (Figure <xref ref-type="fig" rid="F4">4</xref>). Similar patterns were found in representatives of Brassicales, in which the Lys-rich regions comprised seven amino acids (&#x0201C;KHKKKHK&#x0201D;), ten amino acids (&#x0201C;KHKKKTKKHK&#x0201D;), or twelve amino acids (&#x0201C;KHKKTKKTKKHK&#x0201D;). Interestingly, in Fabales, we found not only changes in the length of Lys-rich regions between homologs but also that the Lys-rich region was absent in Glyma.01G092200.1, meaning that the Lys-rich AGPs had a homologous classical AGP.</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p><bold>Multiple sequence alignments of representatives from classical and Lys-rich classical AGPs</bold>. Amino acid sequences were aligned using the Clustal X 1.83 program with default parameters to show the deletion/insertion model amongst classical and Lys-rich classical AGPs. <bold>(A)</bold> Lys-rich classical AGPs from Rosales; <bold>(B)</bold> Lys-rich classical AGPs from Brassicales; <bold>(C)</bold> classical and Lys-rich classical AGPs from Fabales. The asterisk (<sup>&#x0002A;</sup>), colon (:), and dot (.) represent decreasing levels of conservation, from high to low.</p></caption>
<graphic xlink:href="fpls-08-00066-g0004.tif"/>
</fig>
<p>To investigate the evolutionary events of Lys-rich AGPs, we also identified the Lys-rich AGPs of 16 angiosperms (especially monocot species) and Chlorophyta representatives in addition to the 47 plant species. In total, 138 Lys-rich AGPs were identified (Supplementary Table <xref ref-type="supplementary-material" rid="SM10">10</xref>). It was clear that Lys-rich AGPs were present only in seed plants (Spermatophyta), including gymnosperms and angiosperms, but absent in spikemoss (<italic>S. moellendorffii</italic>), moss (<italic>P. patens</italic>), and green algae (Chlamydomonadales). In addition, several monocot species also lacked Lys-rich AGPs. The full-length amino acid sequences of the 138 Lys-rich AGPs were used to perform multiple sequence alignment and generate phylogenetic trees. The phylogenetic analysis suggested that the Lys-rich AGPs experienced at least two ancient duplications, which gave rise to three subgroups: the monocot group, eudicot group I, and eudicot group II (Figure <xref ref-type="fig" rid="F5">5</xref>). The Lys-rich AGPs from the distinctive angiosperm <italic>Amborella trichopoda</italic> and the gymnosperm <italic>P. abies</italic> were closer to eudicot groups I and II, respectively. There were three exceptions: the Lys-rich AGP from <italic>Beta vulgaris</italic> (Bv6_124300_ryze.t1) was independent of eudicot group I, and two from <italic>Actinidia chinensis</italic> (Achn194341 and Achn194361) were in the monocot group (Figure <xref ref-type="fig" rid="F5">5</xref>).</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p><bold>Phylogenetic relationship of Lys-rich AGPs</bold>. The phylogenetic tree was constructed based on the multiple sequence alignments of Lys-rich AGPs using MEGA 6.0 and the neighbor-joining method with 1000 bootstrap replicates. Scale bar represents 0.1 amino acid substitutions per site.</p></caption>
<graphic xlink:href="fpls-08-00066-g0005.tif"/>
</fig>
</sec>
</sec>
<sec sec-type="discussion" id="s4">
<title>Discussion</title>
<sec>
<title>The development of AGP prediction: from amino acid bias and BIO OHIO to finding-AGP</title>
<p>Following the complete genome sequencing of many plant species, protein sequences were annotated according to their sequence homology with other proteins. However, most AGP family members were incorrectly annotated because of their relatively low homology. Therefore, to annotate and identify the entire AGP family, an accurate and efficient method is needed to separate AGPs from other proteins. The amino acid bias method (i.e., sequences are biased for Pro, Ala, Ser, and Thr) was first proposed to identify AGPs (Schultz et al., <xref ref-type="bibr" rid="B51">2002</xref>) and efficiently identified 62 candidate genes with a high PAST<sub>T</sub>% (i.e., &#x0003E;50%) from 25,617 protein sequences in <italic>Arabidopsis</italic>. However, most members of several subfamilies were missed, including Pep, chimeric AGPs (FLA, PAG, and XYLP), and NC. When the PAST<sub>T</sub>% threshold was reduced to 30% to cover most of these missed AGPs, many false positives were obtained. Finally, Schultz et al. (<xref ref-type="bibr" rid="B51">2002</xref>) discussed a &#x0201C;windows&#x0201D; approach to identify AGP-like sequences by calculating the PAST percentage in &#x0201C;windows&#x0201D; of 15&#x02013;25 amino acid residues, which could reduce false positives when the PAST<sub>T</sub>% was relatively low. More recently, a sliding window feature was developed in the BIO OHIO program (Showalter et al., <xref ref-type="bibr" rid="B60">2010</xref>). However, when the PAST percentage was required to be &#x0003E;60% in a window size of 10 amino acids, a large number of false positives were generated, which limited the use of the sliding window feature.</p>
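<p>The window-based idea discussed above can be sketched in a few lines of Python. This is an illustrative reimplementation, not the BIO OHIO code or the method of Schultz et al.; the function names are hypothetical, and the default window size (10) and cutoff (&#x0003E;60%) simply follow the values quoted in the text.</p>
<preformat><![CDATA[
```python
PAST = frozenset("PAST")


def past_percent(seq):
    """Percentage of Pro, Ala, Ser, and Thr residues in a sequence."""
    if not seq:
        return 0.0
    return 100.0 * sum(aa in PAST for aa in seq) / len(seq)


def past_rich_windows(seq, window=10, threshold=60.0):
    """Return (start, PAST%) for every window whose PAST% exceeds threshold."""
    hits = []
    for start in range(len(seq) - window + 1):
        pct = past_percent(seq[start:start + window])
        if pct > threshold:
            hits.append((start, pct))
    return hits


print(past_rich_windows("TPAPSPAPSPSP"))  # [(0, 100.0), (1, 100.0), (2, 100.0)]
print(past_rich_windows("GGGGGGGGGGGG"))  # []
```
]]></preformat>
<p>Because every window is scored independently of glycomodule content, any short PAST-rich stretch is reported, which is consistent with the false positives noted for small window sizes.</p>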
<p>In this study, we wrote a Python script called Finding-AGP to find and analyze AGP-like sequences. Compared with the sliding window feature of BIO OHIO, the most distinctive feature of Finding-AGP is that it extracts AGP-like sequences based on the definition of effective AG glycosylation instead of a user-defined window length. Based on the statistical analysis of known AGPs, the glycomodule patterns were designated as Ala-Pro, Pro-Ala, Ser-Pro, Pro-Ser, Thr-Pro, and Pro-Thr. Seven continuous variables (Length<sub>T</sub>, PAST<sub>T</sub>%, GlycoNo<sub>T</sub>, Length<sub>P</sub>, PAST<sub>P</sub>%, GlycoNo<sub>P</sub>, and GlycoIndex) and a dichotomous variable (whether there was a predicted signal peptide) were proposed and incorporated into the Finding-AGP program (Table <xref ref-type="table" rid="T3">3</xref>). The integrated strategy proposed in this study could separately identify different subfamilies with reduced false positives. The Finding-AGP program had obvious advantages in speed and accuracy, especially in the identification of chimeric AGPs with low PAST<sub>T</sub>% (&#x0003C;42%), for which the number of sequences from most species was &#x0003C;200 (44 of 47 species, Supplementary Table <xref ref-type="supplementary-material" rid="SM6">6</xref>). Although we believe that we identified the vast majority of AGPs, some &#x0201C;fish may have escaped from the net.&#x0201D; Specifically, because of our search criteria, proteins without a signal peptide and with low homology (<italic>e</italic>-value &#x0003E; <italic>e</italic><sup>&#x02212;3</sup>) could not be obtained. Also, the relatively high threshold settings in screening NC with more than 42% PAST<sub>T</sub> may have led to missing some putative AGPs.</p>
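<p>To make the six dipeptide glycomodule patterns concrete, the following Python sketch locates them in a sequence. It is not the Finding-AGP source code: the function name is hypothetical, and whether overlapping dipeptides are counted separately is an assumption exposed here as a parameter.</p>
<preformat><![CDATA[
```python
# The six dipeptide patterns named in the text (AP, PA, SP, PS, TP, PT).
GLYCOMODULES = frozenset({"AP", "PA", "SP", "PS", "TP", "PT"})


def glycomodule_starts(seq, overlapping=False):
    """Return start positions of glycomodule dipeptides, scanning left to right.

    With overlapping=False a residue belongs to at most one glycomodule;
    this counting rule is an assumption, not taken from Finding-AGP.
    """
    starts = []
    i = 0
    while i < len(seq) - 1:
        if seq[i:i + 2] in GLYCOMODULES:
            starts.append(i)
            i += 1 if overlapping else 2
        else:
            i += 1
    return starts


repeat = "TPAPSPAPSPSP"  # X-Pro repeat from LOC_Os07g49240.1
print(glycomodule_starts(repeat))                         # [0, 2, 4, 6, 8, 10]
print(len(glycomodule_starts(repeat, overlapping=True)))  # 11
```
]]></preformat>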
</sec>
<sec>
<title>Evolutionary history of the AGP gene family</title>
<p>Many previous studies of AGPs start with phrases like &#x0201C;AGPs were widely distributed in higher plants [plants, plant kingdom, etc.]&#x0201D; (van Hengel and Roberts, <xref ref-type="bibr" rid="B66">2003</xref>; Gaspar et al., <xref ref-type="bibr" rid="B11">2004</xref>; Qin and Zhao, <xref ref-type="bibr" rid="B47">2006</xref>). In fact, the known distribution patterns of AGPs were mostly based on 22 species with known AGP genes and on the detection of AGP carbohydrate epitopes in other species. In this study, the identification of AGP genes in 47 plant species enabled us to investigate the evolutionary history of this gene family. It is noteworthy that five FLA, one PAG, and several proteins with AGP-like and EXT-like glycomodules (e.g., continuous SP or SP<sub>2&#x02212;4</sub> repeats) were found in green alga, indicating that other subfamilies (i.e., C, KC, Pep, and XYLP) might have emerged after the divergence of green algae and land plants. Previously, it was reported that Xylogen (a member of XYLP) mediated the transformation of undifferentiated suspension cells in liquid culture medium into tracheary elements, which are a basal component of the xylem in vascular tissue (Motose et al., <xref ref-type="bibr" rid="B43">2004</xref>). We speculated that the occurrence of XYLP might be a key incident in the evolution of vascular plants from lower plants, namely that the formation of the main components of vascular bundles (tracheary elements) could take place in unicellular or multicellular green alga. Consistent with this view, XYLP members first emerged in moss and were present in all vascular plants (Figure <xref ref-type="fig" rid="F3">3</xref>). Moreover, among the three closely related subfamilies (C, KC, and Pep), C first appeared in <italic>P. patens</italic>, followed by Pep in <italic>S. moellendorffii</italic> and finally KC in <italic>P. abies</italic>.</p>
<p>The origin of KC needs to be further elucidated, but the most likely scenario is that it emerged after the divergence of <italic>S. moellendorffii</italic> and the common ancestor of seed plants. Moreover, based on the order of occurrence of C and KC, KC was most likely produced by insertions of Lys codons into the coding sequences of C. There were two phylogenetic branches in eudicots but only one in monocots, and KC was absent in several monocots, implying substantially different evolutionary fates in monocots and eudicots. The difference probably arose during the ancient whole genome duplication (WGD) event that occurred when the ancestors of angiosperms generated duplicates of each gene (Jiao et al., <xref ref-type="bibr" rid="B22">2011</xref>, <xref ref-type="bibr" rid="B21">2014</xref>). However, these speculations need to be tested by adding more species of Gymnosperm, Lycopodiophyta, and Bryophyta to improve the resolution of the phylogenetic trees.</p>
</sec>
<sec>
<title>The diversity of AGP gene family</title>
<p>In this study, a large number of other types of chimeric AGPs were identified along with known subfamilies, which contributed to almost half of the total number. However, whether the presence of glycomodules in newly identified chimeric AGPs leads to successful AG glycosylation should be verified in the future. To date, experimental evidence of AG glycosylation has been obtained only for several members of FLA, PAG, XYLP, and chimeric AGPs with POeI-like domains (Johnson et al., <xref ref-type="bibr" rid="B23">2003</xref>; Mashiguchi et al., <xref ref-type="bibr" rid="B41">2004</xref>; Hijazi et al., <xref ref-type="bibr" rid="B17">2012</xref>). The major challenge will be to develop a high-throughput method to verify the post-translational modifications of putative chimeric AGPs obtained by bioinformatics predictions.</p>
<p>The existence of many HAE indicated that there was no insurmountable gap between AGPs and EXTs in the course of plant evolution. The pollen extensin-like 1 (PEX1) and LRX members related to pollen tube and root growth had already been reported in several studies (Rubinstein et al., <xref ref-type="bibr" rid="B48">1995</xref>; Baumberger et al., <xref ref-type="bibr" rid="B3">2001</xref>; Draeger et al., <xref ref-type="bibr" rid="B7">2015</xref>), and the LRX members in rice and <italic>Arabidopsis</italic> were also identified (Baumberger et al., <xref ref-type="bibr" rid="B2">2003</xref>). Previous studies of EXTs summarized the existence of formin-like EXTs, proline-rich extensin-like receptor kinases (PERKs), and leucine-rich-repetitive EXTs (LRXs; Borassi et al., <xref ref-type="bibr" rid="B4">2016</xref>). AG glycomodules were also found in many formins, receptor-like kinases, and proteins with leucine-rich-repetitive (LRR) motifs. This evidence led us to believe that three subfamilies of chimeric AGPs (PK-like, LRR-like, and FH2-like) most likely exist. Moreover, at least eight chimeric AGPs with Pollen_Ole_e_I domains were previously identified, four of which were shown to be AG glycosylated, including TTS1, TTS2, DcAGP1, and AtAGP31 (Cheung et al., <xref ref-type="bibr" rid="B6">1995</xref>; Wu et al., <xref ref-type="bibr" rid="B68">1995</xref>; Baldwin et al., <xref ref-type="bibr" rid="B1">2001</xref>; Liu and Mehdy, <xref ref-type="bibr" rid="B34">2007</xref>; Hijazi et al., <xref ref-type="bibr" rid="B18">2014b</xref>). We found a total of 98 chimeric AGPs with POeI-like domains across 40 plant species (Table <xref ref-type="table" rid="T6">6</xref>). In addition, the PMEI-like (HAE1), LAM (AtAGP31), and alpha_CA_prokaryotic_like (AtAGP33) were also identified (Liu and Mehdy, <xref ref-type="bibr" rid="B34">2007</xref>; Showalter et al., <xref ref-type="bibr" rid="B60">2010</xref>).
According to incomplete statistics, more than 100 conserved domains from the NCBI CDD website were found in the putative AGPs identified in this study (Supplementary Table <xref ref-type="supplementary-material" rid="SM7">7</xref>); only a dozen of them are introduced here, and the rest need to be further investigated.</p>
<p>The diversity of the AGP gene family resulted not only from the various kinds of subfamilies but also from the variable gene numbers in each subfamily. In green algae, although a total of 159 AGP-like sequences were found, the subfamilies of C, KC, and Pep were absent, and simple repeats were typical characteristics of these sequences (e.g., [SP]n and [SP<sub>2&#x02212;4</sub>]n). C and Pep appeared in Bryophyta (<italic>P. patens</italic>) and Pteridophyta (<italic>S. moellendorffii</italic>), respectively (Figure <xref ref-type="fig" rid="F3">3</xref>). AGPs flourished in both the number of subfamilies and the number of genes in seed plants (gymnosperms and angiosperms), except for decreased numbers in several species. The events of AGP molecular evolution need to be elucidated in the future.</p>
<p>Thirdly, AGP sequences were diverse even within the same subfamily. Phylogenetic tree construction failed for classical AGPs because of their low sequence homology, but we could generate a phylogenetic tree of Lys-rich AGPs, even though it was largely divergent (Figure <xref ref-type="fig" rid="F5">5</xref>). We observed high sequence homology only among closely related species (e.g., Brassicales). Moreover, the Lys-rich domains also differed in length and amino acid composition (Supplementary Table <xref ref-type="supplementary-material" rid="SM9">9</xref>). More specifically, Lys was indeed abundant in most Lys-rich AGPs, but His-rich and Arg-rich AGPs were also found. These short regions were usually rich in the three basic amino acids; thus, it would be more appropriate to name this subfamily basic amino acid-rich AGPs.</p>
</sec>
<sec>
<title>Roles of chimeric AGPs in modulating cell wall mechanics and mediating signaling between the cell wall and cytoplasm</title>
<p>The fact that AGPs bind specifically to the &#x003B2;-Yariv reagent suggested that they might interact with &#x003B2;-linked polysaccharides in the cell wall matrix (Kitazawa et al., <xref ref-type="bibr" rid="B25">2013</xref>; Hijazi et al., <xref ref-type="bibr" rid="B18">2014b</xref>). The co-purification of Yariv-reactive glycoproteins and cellulose also indicated possible roles of AGPs in cell wall mechanics (Girault et al., <xref ref-type="bibr" rid="B14">2000</xref>). A classical AGP named ARABINOXYLAN PECTIN ARABINOGALACTAN PROTEIN1 (APAP1) was shown to be covalently attached to wall matrix hemicellulosic and pectic polysaccharides through the rhamnosyl residue of its arabinogalactan (AG) in <italic>Arabidopsis thaliana</italic> (Tan et al., <xref ref-type="bibr" rid="B65">2013</xref>). In addition, AGP31 was found to interact with rhamnogalacturonan I (RGI) through its PRP-AGP containing Cys (PAC) domain and to bind methylesterified polygalacturonic acid through its His-stretch (Hijazi et al., <xref ref-type="bibr" rid="B18">2014b</xref>). In this study, a large number of genes encoding X8 proteins and glycosyl hydrolases with AGP-like glycomodules were found, indicating possible roles of AGPs in binding to carbohydrates and catalyzing cell wall polymer biosynthesis. AGPs might act as a &#x0201C;pectin plasticizer,&#x0201D; based on the porosity observed when AGPs are incorporated into pectin gel (Lamport and Kieliszewski, <xref ref-type="bibr" rid="B28">2005</xref>). In addition, a tight association of type II AG with pectin has been observed and was reviewed by Nothnagel (<xref ref-type="bibr" rid="B45">1997</xref>). Our study indicates that obtaining experimental evidence for interactions between AGPs and pectin would be a useful direction. 
Three kinds of pectin-related chimeric AGPs were discovered, containing pectate lyase, pectin methylesterase inhibitor, and pectin esterase domains, which might help explain AGP function in rapid tip growth and pathogen infection (Mollet et al., <xref ref-type="bibr" rid="B42">2002</xref>; Vorwerk et al., <xref ref-type="bibr" rid="B67">2004</xref>; Figure <xref ref-type="fig" rid="F6">6</xref>).</p>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p><bold>Putative working model of several types of chimeric AGPs</bold>. PCL, chimeric AGPs with pectin lyase-like domain; PME, pectin methylesterase; PMEI, chimeric AGPs with pectin methylesterase inhibitor-like domain; LRR, chimeric AGPs with leucine-rich repeats-like domain; Formin, chimeric AGPs with formin homology 2-like domain; LRR-RLK, chimeric AGPs with leucine-rich repeat receptor kinase-like domain; AG, arabinogalactan, which is represented by pink ovals; P, phosphate (yellow circle); KAPP, kinase-associated protein phosphatase.</p></caption>
<graphic xlink:href="fpls-08-00066-g0006.tif"/>
</fig>
<p>It was previously found that the addition of &#x003B2;-Yariv reagent inhibited cell division in a rose cell suspension line (Serpe and Nothnagel, <xref ref-type="bibr" rid="B53">1994</xref>) and also interrupted the first asymmetric zygotic division in tobacco (Qin and Zhao, <xref ref-type="bibr" rid="B47">2006</xref>). Moreover, &#x003B2;-Yariv treatment rapidly disrupted the normal organization of the microtubule and actin cytoskeleton in tobacco BY2 cells (Sardar et al., <xref ref-type="bibr" rid="B49">2006</xref>). Together, these reports indicated that the irregular divisions were accompanied by cytoskeletal changes. Cytokinesis and cell polarization are controlled by the rearrangement of the actin and microtubule cytoskeleton, which might be related to the functions of chimeric AGPs with formin homology 2-like domains (Figure <xref ref-type="fig" rid="F6">6</xref>).</p>
<p>The developmental roles of AGPs led us to believe that they might interact with receptor-like kinases (RLKs) and act as co-receptors (Zhang et al., <xref ref-type="bibr" rid="B70">2011</xref>). This line of thought held that AGPs might be involved in phytohormone and stress signaling as co-receptors because they lacked receptor-like sequences. For example, &#x003B2;-Yariv treatment of barley suppressed the induction of &#x003B1;-amylase, suggesting that AGPs possibly act in gibberellic acid signal transduction to regulate starch degradation (Mashiguchi et al., <xref ref-type="bibr" rid="B40">2008</xref>). Immunofluorescence of AG epitopes in tobacco protoplasts revealed their co-localization with wall-associated kinase (WAK; Gens et al., <xref ref-type="bibr" rid="B12">2000</xref>). Hypotheses about this phenomenon focus on the interaction between AGPs and WAKs. However, Seifert and Roberts (<xref ref-type="bibr" rid="B52">2007</xref>) found that &#x0007E;60 out of 620 RLK proteins had putative AG glycomodules. If RLKs could themselves be part of the AGP family, then searching for receptors that interact with AGPs (regarded as co-receptors) would be heading in the wrong direction. In this study, a total of 454 chimeric AGPs with protein kinase-like domains were identified across 46 plant species (all except <italic>A. trichopoda</italic>). The subgroups STK_BAK1_like and STKc_IRAK were possible receptor-like candidates among AGPs. <italic>A. thaliana</italic> BRASSINOSTEROID (BR) INSENSITIVE 1 (BRI1), the receptor for BRs, belongs to the STKc_IRAK group (Li et al., <xref ref-type="bibr" rid="B32">2002</xref>). A total of 229 STK_BAK1_like and STKc_IRAK chimeric AGPs were identified across 44 plant species (all except <italic>A. trichopoda</italic>, <italic>C. reinhardtii</italic>, and <italic>Elaeis guineensis</italic>). RLK proteins consist of a Pro-rich extracellular domain, a transmembrane region, and an intracellular kinase domain (Figure <xref ref-type="fig" rid="F6">6</xref>). 
They might be connected with the cell wall and function in relaying signals from the cell wall to the cytoplasm by activating their kinase domains (Silva and Goring, <xref ref-type="bibr" rid="B63">2002</xref>; Goring, <xref ref-type="bibr" rid="B15">2015</xref>; Humphrey et al., <xref ref-type="bibr" rid="B20">2015</xref>). LRR motifs with the consensus sequence LxxLxLxxN/CxL were found not only in receptor-like kinases (RLKs; e.g., BRI1) but also in many chimeric AGPs (called LRR-like; Kobe and Kajava, <xref ref-type="bibr" rid="B27">2001</xref>). The LRR domain is known for its role in protein&#x02013;protein interactions, through which it might bind extracellular ligands or form dimers. On the other hand, the molecular functions of AGPs in signaling were assumed to depend on the binding of Ca<sup>2&#x0002B;</sup> by glucuronic acid residues of AG (Lamport and V&#x000E1;rnai, <xref ref-type="bibr" rid="B29">2013</xref>). The recycling of Ca<sup>2&#x0002B;</sup> was proposed to be carried out by an AGP-Ca<sup>2&#x0002B;</sup> oscillator, which provides a plausible explanation for the involvement of AGPs in Ca<sup>2&#x0002B;</sup>- and auxin-related plant morphogenesis (Lamport et al., <xref ref-type="bibr" rid="B30">2014</xref>).</p>
</sec>
</sec>
<sec id="s5">
<title>Author contributions</title>
<p>HM and QC conceived and designed the research plans; YM and CY performed most of the experiments and analyzed the data; HL, WW, YL, and YW provided technical assistance to YM and CY; YM wrote the article with contributions from all the authors; HM and QC supervised and complemented the writing.</p>
<sec>
<title>Conflict of interest statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</sec>
</body>
<back>
<ack><p>This research was supported by the National Natural Science Foundation of China (31500159) and the Natural Science Foundation of Shaanxi Province (2016JQ3029).</p>
</ack>
<sec sec-type="supplementary-material" id="s6">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="http://journal.frontiersin.org/article/10.3389/fpls.2017.00066/full#supplementary-material">http://journal.frontiersin.org/article/10.3389/fpls.2017.00066/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Table1.DOC" id="SM1" mimetype="application/msword" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table2.XLS" id="SM2" mimetype="application/vnd.ms-excel" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table3.DOC" id="SM3" mimetype="application/msword" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table4.DOC" id="SM4" mimetype="application/msword" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table5.XLS" id="SM5" mimetype="application/vnd.ms-excel" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table6.XLS" id="SM6" mimetype="application/vnd.ms-excel" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table7.XLS" id="SM7" mimetype="application/vnd.ms-excel" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table8.DOC" id="SM8" mimetype="application/msword" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table9.XLS" id="SM9" mimetype="application/vnd.ms-excel" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table10.XLS" id="SM10" mimetype="application/vnd.ms-excel" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Image1.PDF" id="SM11" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Image2.PDF" id="SM12" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baldwin</surname> <given-names>T. C.</given-names></name> <name><surname>Domingo</surname> <given-names>C.</given-names></name> <name><surname>Schindler</surname> <given-names>T.</given-names></name> <name><surname>Seetharaman</surname> <given-names>G.</given-names></name> <name><surname>Stacey</surname> <given-names>N.</given-names></name> <name><surname>Roberts</surname> <given-names>K.</given-names></name></person-group> (<year>2001</year>). <article-title>DcAGP1, a secreted arabinogalactan protein, is related to a family of basic proline-rich proteins</article-title>. <source>Plant Mol. Biol.</source> <volume>45</volume>, <fpage>421</fpage>&#x02013;<lpage>435</lpage>. <pub-id pub-id-type="doi">10.1023/A:1010637426934</pub-id><pub-id pub-id-type="pmid">11352461</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baumberger</surname> <given-names>N.</given-names></name> <name><surname>Doesseger</surname> <given-names>B.</given-names></name> <name><surname>Guyot</surname> <given-names>R.</given-names></name> <name><surname>Diet</surname> <given-names>A.</given-names></name> <name><surname>Parsons</surname> <given-names>R. L.</given-names></name> <name><surname>Clark</surname> <given-names>M. A.</given-names></name> <etal/></person-group>. (<year>2003</year>). <article-title>Whole-genome comparison of leucine-rich repeat extensins in Arabidopsis and rice. A conserved family of cell wall proteins form a vegetative and a reproductive clade</article-title>. <source>Plant Physiol.</source> <volume>131</volume>, <fpage>1313</fpage>&#x02013;<lpage>1326</lpage>. <pub-id pub-id-type="doi">10.1104/pp.102.014928</pub-id><pub-id pub-id-type="pmid">12644681</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baumberger</surname> <given-names>N.</given-names></name> <name><surname>Ringli</surname> <given-names>C.</given-names></name> <name><surname>Keller</surname> <given-names>B.</given-names></name></person-group> (<year>2001</year>). <article-title>The chimeric leucine-rich repeat/extensin cell wall protein LRX1 is required for root hair morphogenesis in <italic>Arabidopsis thaliana</italic></article-title>. <source>Genes Dev.</source> <volume>15</volume>, <fpage>1128</fpage>&#x02013;<lpage>1139</lpage>. <pub-id pub-id-type="doi">10.1101/gad.200201</pub-id><pub-id pub-id-type="pmid">11331608</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Borassi</surname> <given-names>C.</given-names></name> <name><surname>Sede</surname> <given-names>A. R.</given-names></name> <name><surname>Mecchia</surname> <given-names>M. A.</given-names></name> <name><surname>Salgado Salter</surname> <given-names>J. D.</given-names></name> <name><surname>Marzol</surname> <given-names>E.</given-names></name> <name><surname>Muschietti</surname> <given-names>J. P.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>An update on cell surface proteins containing extensin-motifs</article-title>. <source>J. Exp. Bot.</source> <volume>67</volume>, <fpage>477</fpage>&#x02013;<lpage>487</lpage>. <pub-id pub-id-type="doi">10.1093/jxb/erv455</pub-id><pub-id pub-id-type="pmid">26475923</pub-id></citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Borner</surname> <given-names>G. H.</given-names></name> <name><surname>Lilley</surname> <given-names>K. S.</given-names></name> <name><surname>Stevens</surname> <given-names>T. J.</given-names></name> <name><surname>Dupree</surname> <given-names>P.</given-names></name></person-group> (<year>2003</year>). <article-title>Identification of glycosylphosphatidylinositol-anchored proteins in Arabidopsis. A proteomic and genomic analysis</article-title>. <source>Plant Physiol.</source> <volume>132</volume>, <fpage>568</fpage>&#x02013;<lpage>577</lpage>. <pub-id pub-id-type="doi">10.1104/pp.103.021170</pub-id><pub-id pub-id-type="pmid">12805588</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cheung</surname> <given-names>A. Y.</given-names></name> <name><surname>Wang</surname> <given-names>H.</given-names></name> <name><surname>Wu</surname> <given-names>H. M.</given-names></name></person-group> (<year>1995</year>). <article-title>A floral transmitting tissue-specific glycoprotein attracts pollen tubes and stimulates their growth</article-title>. <source>Cell</source> <volume>82</volume>, <fpage>383</fpage>&#x02013;<lpage>393</lpage>. <pub-id pub-id-type="doi">10.1016/0092-8674(95)90427-1</pub-id><pub-id pub-id-type="pmid">7634328</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Draeger</surname> <given-names>C.</given-names></name> <name><surname>Ndinyanka Fabrice</surname> <given-names>T.</given-names></name> <name><surname>Gineau</surname> <given-names>E.</given-names></name> <name><surname>Mouille</surname> <given-names>G.</given-names></name> <name><surname>Kuhn</surname> <given-names>B. M.</given-names></name> <name><surname>Moller</surname> <given-names>I.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>Arabidopsis leucine-rich repeat extensin (LRX) proteins modify cell wall composition and influence plant growth</article-title>. <source>BMC Plant Biol.</source> <volume>15</volume>:<fpage>155</fpage>. <pub-id pub-id-type="doi">10.1186/s12870-015-0548-8</pub-id><pub-id pub-id-type="pmid">26099801</pub-id></citation>
</ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ellis</surname> <given-names>M.</given-names></name> <name><surname>Egelund</surname> <given-names>J.</given-names></name> <name><surname>Schultz</surname> <given-names>C. J.</given-names></name> <name><surname>Bacic</surname> <given-names>A.</given-names></name></person-group> (<year>2010</year>). <article-title>Arabinogalactan-proteins: key regulators at the cell surface?</article-title> <source>Plant Physiol.</source> <volume>153</volume>, <fpage>403</fpage>&#x02013;<lpage>419</lpage>. <pub-id pub-id-type="doi">10.1104/pp.110.156000</pub-id><pub-id pub-id-type="pmid">20388666</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Faik</surname> <given-names>A.</given-names></name> <name><surname>Abouzouhair</surname> <given-names>J.</given-names></name> <name><surname>Sarhan</surname> <given-names>F.</given-names></name></person-group> (<year>2006</year>). <article-title>Putative fasciclin-like arabinogalactan-proteins (FLA) in wheat (<italic>Triticum aestivum</italic>) and rice (<italic>Oryza sativa</italic>): identification and bioinformatic analysis</article-title>. <source>Mol. Genet. Genomics</source> <volume>276</volume>, <fpage>478</fpage>&#x02013;<lpage>494</lpage>. <pub-id pub-id-type="doi">10.1007/s00438-006-0159-z</pub-id><pub-id pub-id-type="pmid">16944204</pub-id></citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gaspar</surname> <given-names>Y.</given-names></name> <name><surname>Johnson</surname> <given-names>K. L.</given-names></name> <name><surname>McKenna</surname> <given-names>J. A.</given-names></name> <name><surname>Bacic</surname> <given-names>A.</given-names></name> <name><surname>Schultz</surname> <given-names>C. J.</given-names></name></person-group> (<year>2001</year>). <article-title>The complex structures of arabinogalactan-proteins and the journey towards understanding function</article-title>. <source>Plant Mol. Biol.</source> <volume>47</volume>, <fpage>161</fpage>&#x02013;<lpage>176</lpage>. <pub-id pub-id-type="doi">10.1023/A:1010683432529</pub-id><pub-id pub-id-type="pmid">11554470</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gaspar</surname> <given-names>Y.</given-names></name> <name><surname>Nam</surname> <given-names>J.</given-names></name> <name><surname>Schultz</surname> <given-names>C. J.</given-names></name> <name><surname>Lee</surname> <given-names>L. Y.</given-names></name> <name><surname>Gilson</surname> <given-names>P. R.</given-names></name></person-group> (<year>2004</year>). <article-title>Characterization of the Arabidopsis lysine-rich arabinogalactan-protein AtAGP17 mutant (rat1) that results in a decreased efficiency of Agrobacterium transformation</article-title>. <source>Plant Physiol.</source> <volume>135</volume>, <fpage>2162</fpage>&#x02013;<lpage>2171</lpage>. <pub-id pub-id-type="doi">10.1104/pp.104.045542</pub-id><pub-id pub-id-type="pmid">15286287</pub-id></citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gens</surname> <given-names>J. S.</given-names></name> <name><surname>Fujiki</surname> <given-names>M.</given-names></name> <name><surname>Pickard</surname> <given-names>B. G.</given-names></name></person-group> (<year>2000</year>). <article-title>Arabinogalactan protein and wall-associated kinase in a plasmalemmal reticulum with specialized vertices</article-title>. <source>Protoplasma</source> <volume>212</volume>, <fpage>115</fpage>&#x02013;<lpage>134</lpage>. <pub-id pub-id-type="doi">10.1007/BF01279353</pub-id><pub-id pub-id-type="pmid">11543565</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gilson</surname> <given-names>P.</given-names></name> <name><surname>Gaspar</surname> <given-names>Y. M.</given-names></name> <name><surname>Oxley</surname> <given-names>D.</given-names></name> <name><surname>Youl</surname> <given-names>J. J.</given-names></name> <name><surname>Bacic</surname> <given-names>A.</given-names></name></person-group> (<year>2001</year>). <article-title>NaAGP4 is an arabinogalactan-protein whose expression is suppressed by wounding and fungal infection in <italic>Nicotiana alata</italic></article-title>. <source>Protoplasma</source> <volume>215</volume>, <fpage>128</fpage>&#x02013;<lpage>139</lpage>. <pub-id pub-id-type="doi">10.1007/BF01280309</pub-id><pub-id pub-id-type="pmid">11732052</pub-id></citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Girault</surname> <given-names>R.</given-names></name> <name><surname>His</surname> <given-names>I.</given-names></name> <name><surname>Andeme-Onzighi</surname> <given-names>C.</given-names></name> <name><surname>Driouich</surname> <given-names>A.</given-names></name> <name><surname>Morvan</surname> <given-names>C.</given-names></name></person-group> (<year>2000</year>). <article-title>Identification and partial characterization of proteins and proteoglycans encrusting the secondary cell walls of flax fibres</article-title>. <source>Planta</source> <volume>211</volume>, <fpage>256</fpage>&#x02013;<lpage>264</lpage>. <pub-id pub-id-type="doi">10.1007/s004250000281</pub-id><pub-id pub-id-type="pmid">10945220</pub-id></citation>
</ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goring</surname> <given-names>D. R.</given-names></name></person-group> (<year>2015</year>). <article-title>PERK-KIPK-KCBP signalling negatively regulates root growth in <italic>Arabidopsis thaliana</italic></article-title>. <source>J. Exp. Bot.</source> <volume>66</volume>, <fpage>71</fpage>&#x02013;<lpage>83</lpage>. <pub-id pub-id-type="doi">10.1093/jxb/eru390</pub-id><pub-id pub-id-type="pmid">25262228</pub-id></citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Griffiths</surname> <given-names>J. S.</given-names></name> <name><surname>Crepeau</surname> <given-names>M. J.</given-names></name> <name><surname>Ralet</surname> <given-names>M. C.</given-names></name> <name><surname>Seifert</surname> <given-names>G. J.</given-names></name> <name><surname>North</surname> <given-names>H. M.</given-names></name></person-group> (<year>2016</year>). <article-title>Dissecting seed mucilage adherence mediated by FEI2 and SOS5</article-title>. <source>Front. Plant Sci.</source> <volume>7</volume>:<fpage>1073</fpage>. <pub-id pub-id-type="doi">10.3389/fpls.2016.01073</pub-id><pub-id pub-id-type="pmid">27524986</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hijazi</surname> <given-names>M.</given-names></name> <name><surname>Durand</surname> <given-names>J.</given-names></name> <name><surname>Pichereaux</surname> <given-names>C.</given-names></name> <name><surname>Pont</surname> <given-names>F.</given-names></name> <name><surname>Jamet</surname> <given-names>E.</given-names></name> <name><surname>Albenne</surname> <given-names>C.</given-names></name></person-group> (<year>2012</year>). <article-title>Characterization of the arabinogalactan protein 31 (AGP31) of <italic>Arabidopsis thaliana</italic>: new advances on the Hyp-O-glycosylation of the Pro-rich domain</article-title>. <source>J. Biol. Chem.</source> <volume>287</volume>, <fpage>9623</fpage>&#x02013;<lpage>9632</lpage>. <pub-id pub-id-type="doi">10.1074/jbc.M111.247874</pub-id><pub-id pub-id-type="pmid">22270363</pub-id></citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hijazi</surname> <given-names>M.</given-names></name> <name><surname>Roujol</surname> <given-names>D.</given-names></name> <name><surname>Nguyen-Kim</surname> <given-names>H.</given-names></name> <name><surname>Del Rocio Cisneros Castillo</surname> <given-names>L.</given-names></name> <name><surname>Saland</surname> <given-names>E.</given-names></name> <name><surname>Jamet</surname> <given-names>E.</given-names></name> <etal/></person-group>. (<year>2014b</year>). <article-title>Arabinogalactan protein 31 (AGP31), a putative network-forming protein in <italic>Arabidopsis thaliana</italic> cell walls?</article-title> <source>Ann. Bot.</source> <volume>114</volume>, <fpage>1087</fpage>&#x02013;<lpage>1097</lpage>. <pub-id pub-id-type="doi">10.1093/aob/mcu038</pub-id><pub-id pub-id-type="pmid">24685714</pub-id></citation>
</ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hijazi</surname> <given-names>M.</given-names></name> <name><surname>Velasquez</surname> <given-names>S. M.</given-names></name> <name><surname>Jamet</surname> <given-names>E.</given-names></name> <name><surname>Estevez</surname> <given-names>J. M.</given-names></name> <name><surname>Albenne</surname> <given-names>C.</given-names></name></person-group> (<year>2014a</year>). <article-title>An update on post-translational modifications of hydroxyproline-rich glycoproteins: toward a model highlighting their contribution to plant cell wall architecture</article-title>. <source>Front. Plant Sci.</source> <volume>5</volume>:<fpage>395</fpage>. <pub-id pub-id-type="doi">10.3389/fpls.2014.00395</pub-id><pub-id pub-id-type="pmid">25177325</pub-id></citation>
</ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Humphrey</surname> <given-names>T. V.</given-names></name> <name><surname>Haasen</surname> <given-names>K. E.</given-names></name> <name><surname>Aldea-Brydges</surname> <given-names>M. G.</given-names></name> <name><surname>Sun</surname> <given-names>H.</given-names></name> <name><surname>Zayed</surname> <given-names>Y.</given-names></name> <name><surname>Indriolo</surname> <given-names>E.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>PERK-KIPK-KCBP signalling negatively regulates root growth in <italic>Arabidopsis thaliana</italic></article-title>. <source>J. Exp. Bot.</source> <volume>66</volume>, <fpage>71</fpage>&#x02013;<lpage>83</lpage>. <pub-id pub-id-type="doi">10.1093/jxb/eru390</pub-id><pub-id pub-id-type="pmid">25262228</pub-id></citation>
</ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jiao</surname> <given-names>Y.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Tang</surname> <given-names>H.</given-names></name> <name><surname>Paterson</surname> <given-names>A. H.</given-names></name></person-group> (<year>2014</year>). <article-title>Integrated syntenic and phylogenomic analyses reveal an ancient genome duplication in monocots</article-title>. <source>Plant Cell</source> <volume>26</volume>, <fpage>2792</fpage>&#x02013;<lpage>2802</lpage>. <pub-id pub-id-type="doi">10.1105/tpc.114.127597</pub-id><pub-id pub-id-type="pmid">25082857</pub-id></citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jiao</surname> <given-names>Y.</given-names></name> <name><surname>Wickett</surname> <given-names>N. J.</given-names></name> <name><surname>Ayyampalayam</surname> <given-names>S.</given-names></name> <name><surname>Chanderbali</surname> <given-names>A. S.</given-names></name> <name><surname>Landherr</surname> <given-names>L.</given-names></name> <name><surname>Ralph</surname> <given-names>P. E.</given-names></name> <etal/></person-group>. (<year>2011</year>). <article-title>Ancestral polyploidy in seed plants and angiosperms</article-title>. <source>Nature</source> <volume>473</volume>, <fpage>97</fpage>&#x02013;<lpage>100</lpage>. <pub-id pub-id-type="doi">10.1038/nature09916</pub-id><pub-id pub-id-type="pmid">21478875</pub-id></citation>
</ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Johnson</surname> <given-names>K. L.</given-names></name> <name><surname>Jones</surname> <given-names>B. J.</given-names></name> <name><surname>Bacic</surname> <given-names>A.</given-names></name> <name><surname>Schultz</surname> <given-names>C. J.</given-names></name></person-group> (<year>2003</year>). <article-title>The fasciclin-like arabinogalactan proteins of Arabidopsis. A multigene family of putative cell adhesion molecules</article-title>. <source>Plant Physiol.</source> <volume>133</volume>, <fpage>1911</fpage>&#x02013;<lpage>1925</lpage>. <pub-id pub-id-type="doi">10.1104/pp.103.031237</pub-id><pub-id pub-id-type="pmid">14645732</pub-id></citation>
</ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kavi Kishor</surname> <given-names>P. B.</given-names></name> <name><surname>Hima Kumari</surname> <given-names>P.</given-names></name> <name><surname>Sunita</surname> <given-names>M. S.</given-names></name> <name><surname>Sreenivasulu</surname> <given-names>N.</given-names></name></person-group> (<year>2015</year>). <article-title>Role of proline in cell wall synthesis and plant development and its implications in plant ontogeny</article-title>. <source>Front. Plant Sci.</source> <volume>6</volume>:<fpage>544</fpage>. <pub-id pub-id-type="doi">10.3389/fpls.2015.00544</pub-id><pub-id pub-id-type="pmid">26257754</pub-id></citation>
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kitazawa</surname> <given-names>K.</given-names></name> <name><surname>Tryfona</surname> <given-names>T.</given-names></name> <name><surname>Yoshimi</surname> <given-names>Y.</given-names></name> <name><surname>Hayashi</surname> <given-names>Y.</given-names></name> <name><surname>Kawauchi</surname> <given-names>S.</given-names></name> <name><surname>Antonov</surname> <given-names>L.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>&#x003B2;-galactosyl Yariv reagent binds to the &#x003B2;-1,3-galactan of arabinogalactan proteins</article-title>. <source>Plant Physiol.</source> <volume>161</volume>, <fpage>1117</fpage>&#x02013;<lpage>1126</lpage>. <pub-id pub-id-type="doi">10.1104/pp.112.211722</pub-id><pub-id pub-id-type="pmid">23296690</pub-id></citation>
</ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kobayashi</surname> <given-names>Y.</given-names></name> <name><surname>Motose</surname> <given-names>H.</given-names></name> <name><surname>Iwamoto</surname> <given-names>K.</given-names></name> <name><surname>Fukuda</surname> <given-names>H.</given-names></name></person-group> (<year>2011</year>). <article-title>Expression and genome-wide analysis of the xylogen-type gene family</article-title>. <source>Plant Cell Physiol.</source> <volume>52</volume>, <fpage>1095</fpage>&#x02013;<lpage>1106</lpage>. <pub-id pub-id-type="doi">10.1093/pcp/pcr060</pub-id><pub-id pub-id-type="pmid">21558309</pub-id></citation>
</ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kobe</surname> <given-names>B.</given-names></name> <name><surname>Kajava</surname> <given-names>A. V.</given-names></name></person-group> (<year>2001</year>). <article-title>The leucine-rich repeat as a protein recognition motif</article-title>. <source>Curr. Opin. Struct. Biol.</source> <volume>11</volume>, <fpage>725</fpage>&#x02013;<lpage>732</lpage>. <pub-id pub-id-type="doi">10.1016/S0959-440X(01)00266-4</pub-id><pub-id pub-id-type="pmid">11751054</pub-id></citation>
</ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lamport</surname> <given-names>D. T.</given-names></name> <name><surname>Kieliszewski</surname> <given-names>M. J.</given-names></name></person-group> (<year>2005</year>). <article-title>Stress upregulates periplasmic arabinogalactan proteins</article-title>. <source>Plant Biosyst.</source> <volume>139</volume>, <fpage>60</fpage>&#x02013;<lpage>64</lpage>. <pub-id pub-id-type="doi">10.1080/11263500500055106</pub-id></citation>
</ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lamport</surname> <given-names>D. T. A.</given-names></name> <name><surname>V&#x000E1;rnai</surname> <given-names>P.</given-names></name></person-group> (<year>2013</year>). <article-title>Periplasmic arabinogalactan glycoproteins act as a calcium capacitor that regulates plant growth and development</article-title>. <source>New Phytol.</source> <volume>197</volume>, <fpage>58</fpage>&#x02013;<lpage>64</lpage>. <pub-id pub-id-type="doi">10.1111/nph.12005</pub-id><pub-id pub-id-type="pmid">23106282</pub-id></citation>
</ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lamport</surname> <given-names>D. T.</given-names></name> <name><surname>V&#x000E1;rnai</surname> <given-names>P.</given-names></name> <name><surname>Seal</surname> <given-names>C. E.</given-names></name></person-group> (<year>2014</year>). <article-title>Back to the future with the AGP-Ca<sup>2&#x0002B;</sup> flux capacitor</article-title>. <source>Ann. Bot.</source> <volume>114</volume>, <fpage>1069</fpage>&#x02013;<lpage>1085</lpage>. <pub-id pub-id-type="doi">10.1093/aob/mcu161</pub-id><pub-id pub-id-type="pmid">25139429</pub-id></citation>
</ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Gao</surname> <given-names>G.</given-names></name> <name><surname>Zhang</surname> <given-names>T.</given-names></name> <name><surname>Wu</surname> <given-names>X.</given-names></name></person-group> (<year>2013</year>). <article-title>The putative phytocyanin genes in Chinese cabbage (<italic>Brassica rapa</italic> L.): genome-wide identification, classification and expression analysis</article-title>. <source>Mol. Genet. Genomics</source> <volume>288</volume>, <fpage>1</fpage>&#x02013;<lpage>20</lpage>. <pub-id pub-id-type="doi">10.1007/s00438-012-0726-4</pub-id><pub-id pub-id-type="pmid">23212439</pub-id></citation>
</ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Wen</surname> <given-names>J.</given-names></name> <name><surname>Lease</surname> <given-names>K. A.</given-names></name> <name><surname>Doke</surname> <given-names>J. T.</given-names></name> <name><surname>Tax</surname> <given-names>F. E.</given-names></name> <name><surname>Walker</surname> <given-names>J. C.</given-names></name></person-group> (<year>2002</year>). <article-title>BAK1, an Arabidopsis LRR receptor-like protein kinase, interacts with BRI1 and modulates brassinosteroid signaling</article-title>. <source>Cell</source> <volume>110</volume>, <fpage>213</fpage>&#x02013;<lpage>222</lpage>. <pub-id pub-id-type="doi">10.1016/S0092-8674(02)00812-7</pub-id><pub-id pub-id-type="pmid">12150929</pub-id></citation>
</ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>S. X.</given-names></name> <name><surname>Showalter</surname> <given-names>A. M.</given-names></name></person-group> (<year>1996</year>). <article-title>Cloning and developmental/stress-regulated expression of a gene encoding a tomato arabinogalactan protein</article-title>. <source>Plant Mol. Biol.</source> <volume>32</volume>, <fpage>641</fpage>&#x02013;<lpage>652</lpage>. <pub-id pub-id-type="doi">10.1007/BF00020205</pub-id><pub-id pub-id-type="pmid">8980516</pub-id></citation>
</ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>C.</given-names></name> <name><surname>Mehdy</surname> <given-names>M. C.</given-names></name></person-group> (<year>2007</year>). <article-title>A nonclassical arabinogalactan protein gene highly expressed in vascular tissues, AGP31, is transcriptionally repressed by methyl jasmonic acid in Arabidopsis</article-title>. <source>Plant Physiol.</source> <volume>145</volume>, <fpage>863</fpage>&#x02013;<lpage>874</lpage>. <pub-id pub-id-type="doi">10.1104/pp.107.102657</pub-id><pub-id pub-id-type="pmid">17885091</pub-id></citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ma</surname> <given-names>H.</given-names></name> <name><surname>Zhao</surname> <given-names>J.</given-names></name></person-group> (<year>2010</year>). <article-title>Genome-wide identification, classification, and expression analysis of the arabinogalactan protein gene family in rice (<italic>Oryza sativa</italic> L.)</article-title>. <source>J. Exp. Bot.</source> <volume>61</volume>, <fpage>2647</fpage>&#x02013;<lpage>2668</lpage>. <pub-id pub-id-type="doi">10.1093/jxb/erq104</pub-id><pub-id pub-id-type="pmid">20423940</pub-id></citation>
</ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ma</surname> <given-names>H.</given-names></name> <name><surname>Zhao</surname> <given-names>H.</given-names></name> <name><surname>Liu</surname> <given-names>Z.</given-names></name> <name><surname>Zhao</surname> <given-names>J.</given-names></name></person-group> (<year>2011</year>). <article-title>The phytocyanin gene family in rice (<italic>Oryza sativa</italic> L.): genome-wide identification, classification and transcriptional analysis</article-title>. <source>PLoS ONE</source> <volume>6</volume>:<fpage>e25184</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0025184</pub-id><pub-id pub-id-type="pmid">21984902</pub-id></citation>
</ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ma</surname> <given-names>T.</given-names></name> <name><surname>Ma</surname> <given-names>H.</given-names></name> <name><surname>Zhao</surname> <given-names>H.</given-names></name> <name><surname>Qi</surname> <given-names>H.</given-names></name> <name><surname>Zhao</surname> <given-names>J.</given-names></name></person-group> (<year>2014</year>). <article-title>Identification, characterization, and transcription analysis of xylogen-like arabinogalactan proteins in rice (<italic>Oryza sativa</italic> L.)</article-title>. <source>BMC Plant Biol.</source> <volume>14</volume>:<fpage>299</fpage>. <pub-id pub-id-type="doi">10.1186/s12870-014-0299-y</pub-id><pub-id pub-id-type="pmid">25407280</pub-id></citation>
</ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>MacMillan</surname> <given-names>C. P.</given-names></name> <name><surname>Taylor</surname> <given-names>L.</given-names></name> <name><surname>Bi</surname> <given-names>Y.</given-names></name> <name><surname>Southerton</surname> <given-names>S. G.</given-names></name> <name><surname>Evans</surname> <given-names>R.</given-names></name> <name><surname>Spokevicius</surname> <given-names>A.</given-names></name></person-group> (<year>2015</year>). <article-title>The fasciclin-like arabinogalactan protein family of <italic>Eucalyptus grandis</italic> contains members that impact wood biology and biomechanics</article-title>. <source>New Phytol.</source> <volume>206</volume>, <fpage>1314</fpage>&#x02013;<lpage>1327</lpage>. <pub-id pub-id-type="doi">10.1111/nph.13320</pub-id><pub-id pub-id-type="pmid">25676073</pub-id></citation>
</ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mashiguchi</surname> <given-names>K.</given-names></name> <name><surname>Asami</surname> <given-names>T.</given-names></name> <name><surname>Suzuki</surname> <given-names>Y.</given-names></name></person-group> (<year>2009</year>). <article-title>Genome-wide identification, structure and expression studies, and mutant collection of 22 early nodulin-like protein genes in Arabidopsis</article-title>. <source>Biosci. Biotechnol. Biochem.</source> <volume>73</volume>, <fpage>2452</fpage>&#x02013;<lpage>2459</lpage>. <pub-id pub-id-type="doi">10.1271/bbb.90407</pub-id><pub-id pub-id-type="pmid">19897921</pub-id></citation>
</ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mashiguchi</surname> <given-names>K.</given-names></name> <name><surname>Urakami</surname> <given-names>E.</given-names></name> <name><surname>Hasegawa</surname> <given-names>M.</given-names></name> <name><surname>Sanmiya</surname> <given-names>K.</given-names></name> <name><surname>Matsumoto</surname> <given-names>I.</given-names></name> <name><surname>Yamaguchi</surname> <given-names>I.</given-names></name> <etal/></person-group>. (<year>2008</year>). <article-title>Defense-related signaling by interaction of arabinogalactan proteins and &#x003B2;-glucosyl Yariv reagent inhibits gibberellin signaling in barley aleurone cells</article-title>. <source>Plant Cell Physiol.</source> <volume>49</volume>, <fpage>178</fpage>&#x02013;<lpage>190</lpage>. <pub-id pub-id-type="doi">10.1093/pcp/pcm175</pub-id><pub-id pub-id-type="pmid">18156132</pub-id></citation>
</ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mashiguchi</surname> <given-names>K.</given-names></name> <name><surname>Yamaguchi</surname> <given-names>I.</given-names></name> <name><surname>Suzuki</surname> <given-names>Y.</given-names></name></person-group> (<year>2004</year>). <article-title>Isolation and identification of glycosylphosphatidylinositol-anchored arabinogalactan proteins and novel &#x003B2;-glucosyl Yariv-reactive proteins from seeds of rice (<italic>Oryza sativa</italic>)</article-title>. <source>Plant Cell Physiol.</source> <volume>45</volume>, <fpage>1817</fpage>&#x02013;<lpage>1829</lpage>. <pub-id pub-id-type="doi">10.1093/pcp/pch208</pub-id><pub-id pub-id-type="pmid">15653800</pub-id></citation>
</ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mollet</surname> <given-names>J. C.</given-names></name> <name><surname>Kim</surname> <given-names>S.</given-names></name> <name><surname>Jauh</surname> <given-names>G. Y.</given-names></name> <name><surname>Lord</surname> <given-names>E. M.</given-names></name></person-group> (<year>2002</year>). <article-title>Arabinogalactan proteins, pollen tube growth, and the reversible effects of Yariv phenylglycoside</article-title>. <source>Protoplasma</source> <volume>219</volume>, <fpage>89</fpage>&#x02013;<lpage>98</lpage>. <pub-id pub-id-type="doi">10.1007/s007090200009</pub-id><pub-id pub-id-type="pmid">11926071</pub-id></citation>
</ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Motose</surname> <given-names>H.</given-names></name> <name><surname>Sugiyama</surname> <given-names>M.</given-names></name> <name><surname>Fukuda</surname> <given-names>H.</given-names></name></person-group> (<year>2004</year>). <article-title>A proteoglycan mediates inductive interaction during plant vascular development</article-title>. <source>Nature</source> <volume>429</volume>, <fpage>873</fpage>&#x02013;<lpage>878</lpage>. <pub-id pub-id-type="doi">10.1038/nature02613</pub-id><pub-id pub-id-type="pmid">15215864</pub-id></citation>
</ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nguema-Ona</surname> <given-names>E.</given-names></name> <name><surname>Vicr&#x000E9;-Gibouin</surname> <given-names>M.</given-names></name> <name><surname>Gott&#x000E9;</surname> <given-names>M.</given-names></name> <name><surname>Plancot</surname> <given-names>B.</given-names></name> <name><surname>Lerouge</surname> <given-names>P.</given-names></name> <name><surname>Bardor</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>Cell wall O-glycoproteins and N-glycoproteins: aspects of biosynthesis and function</article-title>. <source>Front. Plant Sci.</source> <volume>5</volume>:<fpage>499</fpage>. <pub-id pub-id-type="doi">10.3389/fpls.2014.00499</pub-id><pub-id pub-id-type="pmid">25324850</pub-id></citation>
</ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nothnagel</surname> <given-names>E. A.</given-names></name></person-group> (<year>1997</year>). <article-title>Proteoglycans and related components in plant cells</article-title>. <source>Int. Rev. Cytol.</source> <volume>174</volume>, <fpage>195</fpage>&#x02013;<lpage>291</lpage>. <pub-id pub-id-type="doi">10.1016/S0074-7696(08)62118-X</pub-id><pub-id pub-id-type="pmid">9161008</pub-id></citation>
</ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Petersen</surname> <given-names>T. N.</given-names></name> <name><surname>Brunak</surname> <given-names>S.</given-names></name> <name><surname>von Heijne</surname> <given-names>G.</given-names></name> <name><surname>Nielsen</surname> <given-names>H.</given-names></name></person-group> (<year>2011</year>). <article-title>SignalP 4.0: discriminating signal peptides from transmembrane regions</article-title>. <source>Nat. Methods</source> <volume>8</volume>, <fpage>785</fpage>&#x02013;<lpage>786</lpage>. <pub-id pub-id-type="doi">10.1038/nmeth.1701</pub-id><pub-id pub-id-type="pmid">21959131</pub-id></citation>
</ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Qin</surname> <given-names>Y.</given-names></name> <name><surname>Zhao</surname> <given-names>J.</given-names></name></person-group> (<year>2006</year>). <article-title>Localization of arabinogalactan proteins in egg cells, zygotes, and two-celled proembryos and effects of &#x003B2;-D-glucosyl Yariv reagent on egg cell fertilization and zygote division in <italic>Nicotiana tabacum</italic> L</article-title>. <source>J. Exp. Bot.</source> <volume>57</volume>, <fpage>2061</fpage>&#x02013;<lpage>2074</lpage>. <pub-id pub-id-type="doi">10.1093/jxb/erj159</pub-id><pub-id pub-id-type="pmid">16720612</pub-id></citation>
</ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rubinstein</surname> <given-names>A. L.</given-names></name> <name><surname>Marquez</surname> <given-names>J.</given-names></name> <name><surname>Suarez-Cervera</surname> <given-names>M.</given-names></name> <name><surname>Bedinger</surname> <given-names>P. A.</given-names></name></person-group> (<year>1995</year>). <article-title>Extensin-like glycoproteins in the maize pollen tube wall</article-title>. <source>Plant Cell</source> <volume>7</volume>, <fpage>2211</fpage>&#x02013;<lpage>2225</lpage>. <pub-id pub-id-type="doi">10.1105/tpc.7.12.2211</pub-id><pub-id pub-id-type="pmid">12242372</pub-id></citation>
</ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sardar</surname> <given-names>H. S.</given-names></name> <name><surname>Yang</surname> <given-names>J.</given-names></name> <name><surname>Showalter</surname> <given-names>A. M.</given-names></name></person-group> (<year>2006</year>). <article-title>Molecular interactions of arabinogalactan-proteins (AGPs) with cortical microtubules and F-actin in Bright Yellow-2 (BY-2) tobacco cultured cells</article-title>. <source>Plant Physiol.</source> <volume>142</volume>, <fpage>1469</fpage>&#x02013;<lpage>1479</lpage>. <pub-id pub-id-type="doi">10.1104/pp.106.088716</pub-id><pub-id pub-id-type="pmid">17056757</pub-id></citation>
</ref>
<ref id="B50">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schultz</surname> <given-names>C. J.</given-names></name> <name><surname>Johnson</surname> <given-names>K. L.</given-names></name> <name><surname>Currie</surname> <given-names>G.</given-names></name> <name><surname>Bacic</surname> <given-names>A.</given-names></name></person-group> (<year>2000</year>). <article-title>The classical arabinogalactan protein gene family of Arabidopsis</article-title>. <source>Plant Cell</source> <volume>12</volume>, <fpage>1751</fpage>&#x02013;<lpage>1768</lpage>. <pub-id pub-id-type="doi">10.1105/tpc.12.9.1751</pub-id><pub-id pub-id-type="pmid">11006345</pub-id></citation>
</ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schultz</surname> <given-names>C. J.</given-names></name> <name><surname>Rumsewicz</surname> <given-names>M. P.</given-names></name> <name><surname>Johnson</surname> <given-names>K. L.</given-names></name> <name><surname>Jones</surname> <given-names>B. J.</given-names></name> <name><surname>Gaspar</surname> <given-names>Y. M.</given-names></name> <name><surname>Bacic</surname> <given-names>A.</given-names></name></person-group> (<year>2002</year>). <article-title>Using genomic resources to guide research directions. The arabinogalactan protein gene family as a test case</article-title>. <source>Plant Physiol.</source> <volume>129</volume>, <fpage>1448</fpage>&#x02013;<lpage>1463</lpage>. <pub-id pub-id-type="doi">10.1104/pp.003459</pub-id><pub-id pub-id-type="pmid">12177459</pub-id></citation>
</ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Seifert</surname> <given-names>G. J.</given-names></name> <name><surname>Roberts</surname> <given-names>K.</given-names></name></person-group> (<year>2007</year>). <article-title>The biology of arabinogalactan proteins</article-title>. <source>Annu. Rev. Plant Biol.</source> <volume>58</volume>, <fpage>137</fpage>&#x02013;<lpage>161</lpage>. <pub-id pub-id-type="doi">10.1146/annurev.arplant.58.032806.103801</pub-id><pub-id pub-id-type="pmid">17201686</pub-id></citation>
</ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Serpe</surname> <given-names>M. D.</given-names></name> <name><surname>Nothnagel</surname> <given-names>E. A.</given-names></name></person-group> (<year>1994</year>). <article-title>Effects of Yariv phenylglycosides on <italic>Rosa</italic> cell suspensions: evidence for the involvement of arabinogalactan-proteins in cell proliferation</article-title>. <source>Planta</source> <volume>193</volume>, <fpage>542</fpage>&#x02013;<lpage>550</lpage>. <pub-id pub-id-type="doi">10.1007/BF02411560</pub-id></citation>
</ref>
<ref id="B54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shi</surname> <given-names>H.</given-names></name> <name><surname>Kim</surname> <given-names>Y.</given-names></name> <name><surname>Guo</surname> <given-names>Y.</given-names></name> <name><surname>Stevenson</surname> <given-names>B.</given-names></name> <name><surname>Zhu</surname> <given-names>J. K.</given-names></name></person-group> (<year>2003</year>). <article-title>The Arabidopsis SOS5 locus encodes a putative cell surface adhesion protein and is required for normal cell expansion</article-title>. <source>Plant Cell</source> <volume>15</volume>, <fpage>19</fpage>&#x02013;<lpage>32</lpage>. <pub-id pub-id-type="doi">10.1105/tpc.007872</pub-id><pub-id pub-id-type="pmid">12509519</pub-id></citation>
</ref>
<ref id="B55">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shimizu</surname> <given-names>M.</given-names></name> <name><surname>Igasaki</surname> <given-names>T.</given-names></name> <name><surname>Yamada</surname> <given-names>M.</given-names></name> <name><surname>Yuasa</surname> <given-names>K.</given-names></name> <name><surname>Hasegawa</surname> <given-names>J.</given-names></name> <name><surname>Kato</surname> <given-names>T.</given-names></name> <etal/></person-group>. (<year>2005</year>). <article-title>Experimental determination of proline hydroxylation and hydroxyproline arabinogalactosylation motifs in secretory proteins</article-title>. <source>Plant J.</source> <volume>42</volume>, <fpage>877</fpage>&#x02013;<lpage>889</lpage>. <pub-id pub-id-type="doi">10.1111/j.1365-313X.2005.02419.x</pub-id><pub-id pub-id-type="pmid">15941400</pub-id></citation>
</ref>
<ref id="B56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Showalter</surname> <given-names>A. M.</given-names></name></person-group> (<year>1993</year>). <article-title>Structure and function of plant cell wall proteins</article-title>. <source>Plant Cell</source> <volume>5</volume>, <fpage>9</fpage>&#x02013;<lpage>23</lpage>. <pub-id pub-id-type="doi">10.1105/tpc.5.1.9</pub-id><pub-id pub-id-type="pmid">8439747</pub-id></citation>
</ref>
<ref id="B57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Showalter</surname> <given-names>A. M.</given-names></name></person-group> (<year>2001</year>). <article-title>Arabinogalactan-proteins: structure, expression and function</article-title>. <source>Cell. Mol. Life Sci.</source> <volume>58</volume>, <fpage>1399</fpage>&#x02013;<lpage>1417</lpage>. <pub-id pub-id-type="doi">10.1007/PL00000784</pub-id><pub-id pub-id-type="pmid">11693522</pub-id></citation>
</ref>
<ref id="B58">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Showalter</surname> <given-names>A. M.</given-names></name> <name><surname>Basu</surname> <given-names>D.</given-names></name></person-group> (<year>2016</year>). <article-title>Extensin and arabinogalactan-protein biosynthesis: glycosyltransferases, research challenges, and biosensors</article-title>. <source>Front. Plant Sci.</source> <volume>7</volume>:<fpage>814</fpage>. <pub-id pub-id-type="doi">10.3389/fpls.2016.00814</pub-id><pub-id pub-id-type="pmid">27379116</pub-id></citation>
</ref>
<ref id="B59">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Showalter</surname> <given-names>A. M.</given-names></name> <name><surname>Keppler</surname> <given-names>B. D.</given-names></name> <name><surname>Liu</surname> <given-names>X.</given-names></name> <name><surname>Lichtenberg</surname> <given-names>J.</given-names></name> <name><surname>Welch</surname> <given-names>L. R.</given-names></name></person-group> (<year>2016</year>). <article-title>Bioinformatic identification and analysis of hydroxyproline-rich glycoproteins in <italic>Populus trichocarpa</italic></article-title>. <source>BMC Plant Biol.</source> <volume>16</volume>:<fpage>229</fpage>. <pub-id pub-id-type="doi">10.1186/s12870-016-0912-3</pub-id><pub-id pub-id-type="pmid">27769192</pub-id></citation>
</ref>
<ref id="B60">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Showalter</surname> <given-names>A. M.</given-names></name> <name><surname>Keppler</surname> <given-names>B.</given-names></name> <name><surname>Lichtenberg</surname> <given-names>J.</given-names></name> <name><surname>Gu</surname> <given-names>D.</given-names></name> <name><surname>Welch</surname> <given-names>L. R.</given-names></name></person-group> (<year>2010</year>). <article-title>A bioinformatics approach to the identification, classification, and analysis of hydroxyproline-rich glycoproteins</article-title>. <source>Plant Physiol.</source> <volume>153</volume>, <fpage>485</fpage>&#x02013;<lpage>513</lpage>. <pub-id pub-id-type="doi">10.1104/pp.110.156554</pub-id><pub-id pub-id-type="pmid">20395450</pub-id></citation>
</ref>
<ref id="B61">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shpak</surname> <given-names>E.</given-names></name> <name><surname>Barbar</surname> <given-names>E.</given-names></name> <name><surname>Leykam</surname> <given-names>J. F.</given-names></name> <name><surname>Kieliszewski</surname> <given-names>M. J.</given-names></name></person-group> (<year>2001</year>). <article-title>Contiguous hydroxyproline residues direct hydroxyproline arabinosylation in <italic>Nicotiana tabacum</italic></article-title>. <source>J. Biol. Chem.</source> <volume>276</volume>, <fpage>11272</fpage>&#x02013;<lpage>11278</lpage>. <pub-id pub-id-type="doi">10.1074/jbc.M011323200</pub-id><pub-id pub-id-type="pmid">11154705</pub-id></citation>
</ref>
<ref id="B62">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shpak</surname> <given-names>E.</given-names></name> <name><surname>Leykam</surname> <given-names>J. F.</given-names></name> <name><surname>Kieliszewski</surname> <given-names>M. J.</given-names></name></person-group> (<year>1999</year>). <article-title>Synthetic genes for glycoprotein design and the elucidation of hydroxyproline-O-glycosylation codes</article-title>. <source>Proc. Natl. Acad. Sci. U.S.A.</source> <volume>96</volume>, <fpage>14736</fpage>&#x02013;<lpage>14741</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.96.26.14736</pub-id><pub-id pub-id-type="pmid">10611282</pub-id></citation>
</ref>
<ref id="B63">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Silva</surname> <given-names>N. F.</given-names></name> <name><surname>Goring</surname> <given-names>D. R.</given-names></name></person-group> (<year>2002</year>). <article-title>The proline-rich, extensin-like receptor kinase-1 (PERK1) gene is rapidly induced by wounding</article-title>. <source>Plant Mol. Biol.</source> <volume>50</volume>, <fpage>667</fpage>&#x02013;<lpage>685</lpage>. <pub-id pub-id-type="doi">10.1023/A:1019951120788</pub-id><pub-id pub-id-type="pmid">12374299</pub-id></citation>
</ref>
<ref id="B64">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sun</surname> <given-names>W.</given-names></name> <name><surname>Xu</surname> <given-names>J.</given-names></name> <name><surname>Yang</surname> <given-names>J.</given-names></name> <name><surname>Kieliszewski</surname> <given-names>M. J.</given-names></name> <name><surname>Showalter</surname> <given-names>A. M.</given-names></name></person-group> (<year>2005</year>). <article-title>The lysine-rich arabinogalactan-protein subfamily in Arabidopsis: gene expression, glycoprotein purification and biochemical characterization</article-title>. <source>Plant Cell Physiol.</source> <volume>46</volume>, <fpage>975</fpage>&#x02013;<lpage>984</lpage>. <pub-id pub-id-type="doi">10.1093/pcp/pci106</pub-id><pub-id pub-id-type="pmid">15840645</pub-id></citation>
</ref>
<ref id="B65">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tan</surname> <given-names>L.</given-names></name> <name><surname>Eberhard</surname> <given-names>S.</given-names></name> <name><surname>Pattathil</surname> <given-names>S.</given-names></name> <name><surname>Warder</surname> <given-names>C.</given-names></name> <name><surname>Glushka</surname> <given-names>J.</given-names></name> <name><surname>Yuan</surname> <given-names>C.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>An Arabidopsis cell wall proteoglycan consists of pectin and arabinoxylan covalently linked to an arabinogalactan protein</article-title>. <source>Plant Cell</source> <volume>25</volume>, <fpage>270</fpage>&#x02013;<lpage>287</lpage>. <pub-id pub-id-type="doi">10.1105/tpc.112.107334</pub-id><pub-id pub-id-type="pmid">23371948</pub-id></citation>
</ref>
<ref id="B66">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>van Hengel</surname> <given-names>A. J.</given-names></name> <name><surname>Roberts</surname> <given-names>K.</given-names></name></person-group> (<year>2003</year>). <article-title>AtAGP30, an arabinogalactan-protein in the cell walls of the primary root, plays a role in root regeneration and seed germination</article-title>. <source>Plant J.</source> <volume>36</volume>, <fpage>256</fpage>&#x02013;<lpage>270</lpage>. <pub-id pub-id-type="doi">10.1046/j.1365-313X.2003.01874.x</pub-id><pub-id pub-id-type="pmid">14535889</pub-id></citation>
</ref>
<ref id="B67">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vorwerk</surname> <given-names>S.</given-names></name> <name><surname>Somerville</surname> <given-names>S.</given-names></name> <name><surname>Somerville</surname> <given-names>C.</given-names></name></person-group> (<year>2004</year>). <article-title>The role of plant cell wall polysaccharide composition in disease resistance</article-title>. <source>Trends Plant Sci.</source> <volume>9</volume>, <fpage>203</fpage>&#x02013;<lpage>209</lpage>. <pub-id pub-id-type="doi">10.1016/j.tplants.2004.02.005</pub-id><pub-id pub-id-type="pmid">15063871</pub-id></citation>
</ref>
<ref id="B68">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>H. M.</given-names></name> <name><surname>Wang</surname> <given-names>H.</given-names></name> <name><surname>Cheung</surname> <given-names>A. Y.</given-names></name></person-group> (<year>1995</year>). <article-title>A pollen tube growth stimulatory glycoprotein is deglycosylated by pollen tubes and displays a glycosylation gradient in the flower</article-title>. <source>Cell</source> <volume>82</volume>, <fpage>395</fpage>&#x02013;<lpage>403</lpage>. <pub-id pub-id-type="doi">10.1016/0092-8674(95)90428-X</pub-id><pub-id pub-id-type="pmid">7634329</pub-id></citation>
</ref>
<ref id="B69">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zang</surname> <given-names>L.</given-names></name> <name><surname>Zheng</surname> <given-names>T.</given-names></name> <name><surname>Chu</surname> <given-names>Y.</given-names></name> <name><surname>Ding</surname> <given-names>C.</given-names></name> <name><surname>Zhang</surname> <given-names>W.</given-names></name> <name><surname>Huang</surname> <given-names>Q.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>Genome-wide analysis of the fasciclin-like arabinogalactan protein gene family reveals differential expression patterns, localization, and salt stress response in Populus</article-title>. <source>Front. Plant Sci.</source> <volume>6</volume>:<fpage>1140</fpage>. <pub-id pub-id-type="doi">10.3389/fpls.2015.01140</pub-id><pub-id pub-id-type="pmid">26779187</pub-id></citation>
</ref>
<ref id="B70">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Yang</surname> <given-names>J.</given-names></name> <name><surname>Showalter</surname> <given-names>A. M.</given-names></name></person-group> (<year>2011</year>). <article-title>AtAGP18, a lysine-rich arabinogalactan protein in <italic>Arabidopsis thaliana</italic>, functions in plant growth and development as a putative co-receptor for signal transduction</article-title>. <source>Plant Signal. Behav.</source> <volume>6</volume>, <fpage>855</fpage>&#x02013;<lpage>857</lpage>. <pub-id pub-id-type="doi">10.4161/psb.6.6.15204</pub-id><pub-id pub-id-type="pmid">21849816</pub-id></citation>
</ref>
</ref-list>
</back>
</article>