Genomic, Molecular Evolution, and Expression Analysis of Genes Encoding Putative Classical AGPs, Lysine-Rich AGPs, and AG Peptides in Brassica rapa

Arabinogalactan proteins (AGPs) belong to a class of Pro/Hyp-rich glycoproteins and are some of the most complex types of macromolecules found in plants. In the economically important plant species, Brassica rapa, only chimeric AGPs have been identified to date. This has significantly limited our understanding of the functional roles of AGPs in this plant. In this study, 64 AGPs were identified in the genome of B. rapa, including 33 classical AGPs, 28 AG peptides and three lys-rich AGPs. Syntenic gene analysis between B. rapa and A. thaliana suggested that the whole genome triplication event dominated the expansion of the AGP gene family in B. rapa. This resulted in a high retained proportion of the AGP family in the B. rapa genome, especially in the least fractionated subgenome. Phylogenetic and motif analysis classified the classical AGPs into six clades and three orphan genes, and the AG peptides into three clades and five orphan genes. Classical AGPs has a faster rate of molecular evolution than AG peptides revealed by estimation of molecular evolution rates. However, no significant differences were observed between classical AGPs and lys-rich AGPs. Under control conditions and in response to phytohormones treatment, a complete expression profiling experiment has identified five anther-specific AGPs and quite a number of AGPs responding to abscisic acid, methyl jasmonate and/or gibberellin. In this study, we presented a bioinformatics approach to identify important types of AGPs. Moreover, the association between their function and their protein structure, as well as the evolution and the expression of AGP genes were investigated, which might provide fundamental information for revealing the roles of AGPs in B. rapa.

The AGP family consists of different members that can be variable in specific plant species. Therefore, genome-wide screens are commonly used for the identification of AGPs. Knowledge of distinct characteristics shared across known members of the protein family enables their detection within the complete set of proteins in an organism. AGPs are rich in proline or hydroxyproline (Pro/Hyp), serine (Ser), threonine (Thr), and alanine (Ala), which comprise up to 99% of the molecular mass of AGP proteins (Ellis, 2010). According to differences in the composition of their protein backbone, AGPs are further classified into classical AGPs, arabinogalactan (AG) peptides, lysine (Lys)-rich AGPs, and chimeric AGPs (Schultz et al., 2002). Classical AGPs are defined by the core protein containing Hyp, Ala, Ser, Thr, and Glycine (Gly) as the major amino acid constituents, and their C terminus is glycosylphosphatidylinositol (GPI) anchored (Showalter et al., 2010). Lys-rich AGPs have a Lys-rich domain of approximately 16 amino acid residues that is flanked on both sides by AGP glycol modules. AG peptides are composed of only 10-13 amino acid residues and the putative cell adhesion molecules (Schultz et al., 2002). Most AGPs are characterized by the entire protein containing only P/HRGP modules, while chimeric AGPs are consisted of one or two known P/HRGP motifs and additional unrelated motifs such as, fascilin-like domain for fasciclin-like AGPs (FLAs), early nodulin-like domain for eNodlike AGPs (ENODL) and non-specific lipid transfer protein-like domain for nsLTP-like AGPs (Schultz et al., 2002). Based on specific characteristics of the protein backbone and length, the presence of Ala-Pro, Pro-Ala, Ser-Pro, or Thr-Pro repeats, signal peptide and GPI anchor addition sequence, 22 classical AGPs, 16 AG peptides and three lys-rich AGPs were identified in the A. thaliana genome (Showalter et al., 2010) and 11 classical AGPs, 15 AG peptides and two lys-rich AGPs were found in the rice (Oryza sativa) genome (Ma and Zhao, 2010). In vegetable crops, chimeric AGPs like FLAs and ENODLs have been identified in Brassica rapa (Jun and Xiaoming, 2012;Li et al., 2013), which is the most important economic vegetable crop in East Asia . B. rapa shares a common ancestor with A. thaliana; their chromosomes were derived from the rearrangement of 24 conserved collinear blocks in ancestral karyotype of crucifer species (Cheng et al., 2013). A whole genome triplication (WGT) event, occurred 13-17 million years ago, has differentiated the genome of B. rapa from that of A. thaliana. This makes B. rapa an excellent system for studying the expansion of gene families (Wang et al., 2011). Chromosomal localization and gene duplication analysis illustrated that the expansion of FLAs and ENODLs in B. rapa depends on the WGT event, and several FLAs in B. rapa display similar expression patterns as their orthologs in A. thaliana. In our previous studies, a screening for differentially expressed genes between the male fertile and sterile plants of the B. rapa genic male sterility (GMS) A/B line has identified two classical AGPs encoding genes exclusively expressed in the fertile flower buds (Huang et al., 2008a). Further functional characterization revealed that these two AGPs might play important roles in pollen wall formation (Huang et al., 2008b;Lin et al., 2014). However, it is still unknown whether there are more classical AGPs function in pollen development in B. rapa. Also, there is limited information about the presence of other types of AGPs in B. rapa, which restricts a comprehensive understanding of the roles of AGP gene family in this crop.
In this study, in order to obtain more detailed information about the AGP family in B. rapa, Perl scripts were specifically written to screen for classical AGPs, AG peptides and lys-rich AGPs in the genome of B. rapa. Subsequently, the candidate B. rapa AGPs (BrAGPs) were annotated and their characteristics, including retained proportion, syntenic relationship, protein motif structure and molecular evolutionary rates, were analyzed. To understand the functions of BrAGPs and identify the candidate AGPs involved in pollen development, the expression of AGP genes in different tissues and different developmental stages of pollen were investigated. Furthermore, the response of BrAGPs to GA, MeJA and ABA treatments were also evaluated. This study may provide valuable insights and reveal some underlying mechanisms of the AGP gene family in B. rapa.

Plant Materials
The "Bcajh97-01A/B" is a GMS A/B line (sister line) of B. rapa, with the progenies segregating into sterile and fertile types in a 1:1 ratio. The sterile "Bcajh97-01A" has been demonstrated as a mutant of male meiosis cytokinesis without mature pollen, which is the only difference detected between "Bcajh97-01A" and the fertile "Bcajh97-01B" (Huang et al., 2008a(Huang et al., , 2009. This characteristic of "Bcajh97-01A/B" GMS line is ideal to study gene expression during pollen development. "Bcajh97-01A/B" was planted in experimental green-house of Zhejiang University. At the flowering stage, siliques at 48 h after pollination (HAP), roots, stems, leaves, and inflorescences were collected from 15 individual plants of the fertile line for reverse transcription PCR. At the same time, according to the longitudinal diameters, the flower buds were divided into five stages corresponding to five distinct pollen developmental stages (Huang et al., 2008a). The pollen development in flower buds was confirmed by microscopic examination.
Pistils in the fertile plants were collected at 1, 3, and 10 HAP. One HAP pistils correspond to the period of pollen germination on the stigma, 3 HAP pistils represent pollen tube growth phase, and 10 HAP pistils correspond to the period when the pollen tube reaches the ovule and fertilization occurs (Jiang et al., 2013). The materials were frozen in liquid nitrogen immediately after harvested and stored at −75 • C for subsequent analysis.

Phytohormone Treatments
For phytohormone treatments, one of the following solutions of 100 µM GA, 100 µM MeJA, and 100 µM ABA was sprayed on all leaves (Liu et al., 2014;Duan et al, 2015). Control plants were sprayed with distilled water. The concentration of phytohormone treatments chosen was based on the studies by Suzuki et al. (2002), Liu and Mehdy (2007), and van Hengel et al. (2004). Seedlings in three containers were used in each treatment, and each container had 12 seedlings. The treatments were repeated three times. Control and treated leaves were harvested at 0, 4, and 12 h after treatment (HAT). The samples were frozen in liquid nitrogen and stored at −80 • C until use.

RNA Isolation and Reverse Transcription PCR
Total RNA was extracted from tissues mentioned above using Trizol reagent (Invitrogen, Carlsbad, CA). The RNA integrity was analyzed on agarose gel. SuperScript IV (Invitrogen) and oligo (dT) primer was used to synthesize the cDNA. The expression of cDNA was detected using BiometraT Professional Thermo cycler and Taq (Dingguo, Shanghai). Reverse transcription polymerase chain reaction (RT-PCR) was employed to detect the expression of BrAGPs in different tissues of B. rapa, while quantitative real-time PCR (qRT-PCR) was carried out to test the expression variation of BrAGPs in treatment with endogenous hormones. RT-PCR and qRT-PCR were both carried out in triplicates using gene specific primers (Supplementary data 1: Tables S1, S2). BrUBC10 was used as the reference gene. The results of qRT-PCR were calculated using the 2 − Ct method (Livak and Schmittgen, 2001) and normalized to the corresponding distilled water treatment. They were further genewise normalized, mean-centered and clustered hierarchically using the centroid linkage clustering method in Cluster 3.0 (http://bonsai.hgc.jp/∼mdehoon/software/cluster/index.html).

Identification of AGPs, AG-Peptides and Lys-Rich AGPs by Calculating the Biased Amino Acid Composition and Length
The searching criteria for AGPs referred to the guidelines that are widely used (Schultz et al., 2002(Schultz et al., , 2004Tan et al., 2003;Ma and Zhao, 2010). A Perl script (amino acid bias) was written to calculate the PAST (Pro, Ala, Ser, Thr) percentage for the candidate proteins (Supplementary data 1: Supplement Methods). All annotated proteins were downloaded from the Brassica database (BRAD, http://brassicadb.org, V1.2). Perl is a high-level, general-purpose, interpreted, dynamic programming language for UNIX, Windows, and OSX operating systems (http://www.activeperl.com/). For this study, the Perl language compiler ActivePerl-5.16.2.3010812913 was used.
The amino acid bias program was compiled with two different output files of the lists of proteins. The first output file named "AGPs" includes that all the proteins above 50% PAST threshold and >90 amino acid residues in length. Another output file named "AG-peptide" includes all proteins above 35% PAST and between 50 and 90 amino acid residues in length. The N-terminal and C-terminal signals of AG-peptide account for more than 50% of the deduced proteins and the length of AG-peptide is shorter than classical AGPs, so we reduced the PAST level and shortened the searching length of the sequences. Classical AGPs and lys-rich AGPs were extracted from file "AGPs, " in which lys-rich AGPs had a lys-rich domain of approximately 16 amino acid residues that was flanked on both sides by AGP glycomodules, and classical AGPs covered the other genes without the domains like fascilin-like domain and early nodulin-like domain. Pfam (http:// pfam.xfam.org/) and SMART (http://smart.embl-heidelberg.de/) were used for analyzing the conserved domains. Moreover, orthologous genes of all the annotated classical AGPs, AG peptides and lys-rich AGPs in A. thaliana were screened in the proteome of B. rapa.

Signal Peptide and GPI Anchor Prediction
The signal peptide of proteins was predicted by the SignalP 4.1 Server (http://www.cbs.dtu.dk/services/SignalP/) (Petersen et al., 2011). In addition, proteins with predicted signal peptide cleavage site in the first 50 amino acids were considered as the presence of N-terminal signal sequences (Nielsen et al., 1997). To determine the presence of a C-terminal GPI anchor additional signal, all the proteins in the "AGPs" and "AG-peptide" files were subjected to the big-PI Plant Predictor (http://mendel.imp. ac.at/gpi/plant_server.html) and PSORT Prediction (http://psort. hgc.jp/form.html) (Nakai and Horton, 1999;Eisenhaber and Eisenhaber, 2007). Proteins having both signal peptide and GPI anchor were treated as "high confidence" AGPs.

The Analysis of Synteny and Retained Proportion
The analysis of synteny using information of 24 conserved collinear blocks of ancestral karyotype (AK) in B. rapa and A. thaliana was carried out according to the previous study (Cheng et al., 2012(Cheng et al., , 2013. Chromosomal locations of the syntenic genes of A. thaliana AGPs (AtAGPs) and BrAGPs were gathered from The Arabidopsis Information Resource (TAIR, http://arabidopsis.org) and BRAD, respectively. The syntenic relationships between or inside the genomes were illustrated by Circos 5.05 (Krzywinski et al., 2009).

Phylogenetic and Protein Motifs Analysis
Multiple sequence alignment was performed by ClustalW (http:// www.ch.embnet.org/software/ClustalW.html) in standard settings with some modification. Phylogenetic trees were constructed with both neighbor-joining (NJ) and maximum likelihood (ML) method with 1,000 bootstraps using MEGA6 software (Tamura et al., 2013). Poisson model were used in Model/Method and Uniform rates were chosen in Rates among sites. Treatment of Gaps/Missing Data was selected as Pairwise deletion. The clans consisting only one orthologous gene were treated as orphan genes, while other clans were named as clades in sequence. Motif analysis was performed in Multiple Em for Motif Elicitation (MEME) (http://meme-suite.org/tools/meme) (Bailey et al., 2009) for the identification of the short conserved motifs in the subfamilies of classical AGPs and AG peptides.

Molecular Evolutionary Rate Analysis
The molecular evolutionary rates between orthologous genes were estimated by calculating the ratio of nonsynonymous substitution rate (Ka) and synonymous substitution rate (Ks) between orthologous gene pairs. The protein sequences were aligned by ClustalW and the resulting alignment was used as a reference to align the nucleotide sequences. The Ka/Ks value was then determined using the YN86 module in PAML 4.7 (Yang, 2007). Significant differences were calculated by one-way ANOVA tests in SPSS 19 (IBM, USA).

Identification of AGPs in the Brassica rapa Genome
Initially, 76 proteins longer than 90 amino acids were retrieved in the B. rapa genome, with biased amino acid compositions of at least 50% PAST. Similarly, 115 potential AG peptides were identified by searching for proteins between 50 and 90 amino acids in length with biased amino acid compositions of at least 35% PAST (Table 1).
Signal peptides are important for glycosylation of AGPs in the endoplasmic reticulum, while GPI anchor is required for anchoring AGPs to the plasma membrane. Therefore, these 191 proteins were further screened for the presence of a signal peptide and a GPI anchor. Fifty eight proteins longer than 90 amino acids and 24 proteins shorter than 90 amino acids were identified as they contain both the signal peptide and the GPI anchor ( Table 1). By eliminating the proteins contained other domains such as fascilin-like domain and early nodulin-like domain, 22 classical AGPs, 24 AG peptides, and three lys-rich AGPs were identified as "high confidence" AGPs (Tables 1, 2). Additionally, the orthologous genes of all the annotated classical AGPs, AG peptides and lys-rich AGPs in A. thaliana were screened in the proteome of B. rapa. With this approach, 15 additional AGPs have been revealed as AGP candidates, including 11 classical AGPs, and four AG peptides ( Table 2).
All AGPs in B. rapa were named after their orthologous proteins in A. thaliana. Bra038390 (81 amino acids in length) was identified as the ortholog of a classical AGP-AtAGP10. Hence, for convenience, Bra038390 was regarded as a classical AGP but not as an AG peptide. Moreover, sequence homology analysis did not result in any orthologous proteins in the annotated AtAGPs for three BrAGPs, Bra028633, Bra035584, and Bra036401, so they were designated as BrAGP59, BrAGP60, and BrAGP61, respectively. Overall, apart from the chimeric proteins, 64 AGPs were detected in B. rapa, including 33 classical AGPs, 28 AG peptides and three lys-rich AGPs ( Table 2).

Retained Proportion Analysis
Most of the BrAGPs were distributed on chromosomes, with the exception of BrAGP9.2, BrAGP11.2, and BrAGP12.2, which were located on scaffolds. On average, the chromosomes contained four to nine BrAGPs, with chromosome A06 harboring only three and chromosome A03 having up to 11. The 24 conserved collinear blocks in ancestral karyotype were used to identify the syntenic relationship between BrAGPs and their corresponding orthologous genes in A. thaliana. The result illustrated that all BrAGPs are located in the same blocks as their A. thaliana orthologous genes (Figure 1), suggesting that the expansion of AGP gene family in B. rapa was depended on the WGT event.
No orthologous genes of AtAGP5, AtAGP7, AtAGP19, AtAGP41, AtAGP51, and AtAGP56 were identified in B. rapa. To investigate whether these AtAGPs were newly acquired in   A. thaliana or were lost in B. rapa, their orthologous genes in other sequenced Brassica species were analyzed. Brassica species can be classified into three lineages by spectrum analysis, and most sequenced species were concentrated on Lineage I and Lineage II. Specifically, A. thaliana, A. lyrata, C. rubella, and C. sativa belonged to Lineage I, while B. rapa, B. napus, B. oleracea, T. halophile, T. salsuginea, S. irio, and S. parvula were in Lineage II (Koch and German, 2013). Analysis of these orthologous genes illustrated that AtAGP5, AtAGP7, AtAGP19, AtAGP41, and AtAGP56 exist not only in the species belonging to Lineage I, but also in the species belonging to Lineage II as well as in the primitive Brassica species, A. arabicum. This finding indicated that these genes may be lost in B. rapa (Supplementary data 1: Figure S1). The orthologs of AtAGP51 could only be detected in species in Lineage I, indicating that this gene arose after the separation of A. thaliana and B. rapa. Based on this result, the retained proportion of AGPs in B. rapa was calculated. The AGP gene family had a retained proportion of 50%, which was much higher than that of randomly selected genes (45%) and similar to that of core eukaryotic genes (52%) (Figure 2A). Particularly, classical AGPs, AG peptides, and lys-rich AGPs had a proportion of 48, 58, and 33%, respectively ( Figure 2B). As illustrated in the whole genome level, BrAGPs also displayed differentially-retained proportions among the three subgenomes, namely least fractionated subgenome (LF), medium fractionated subgenome (MF1) and most fractionated subgenome (MF2) of B. rapa ( Figure 2C). The AGP gene family contained more genes of LF subgenome (63%) than those of random selected genes (59%) and core eukaryotic genes (59%), while it had similar proportion of core eukaryotic genes in the MF1 and MF2 subgenomes for AGP genes and core eukaryotic genes, respectively. This indicated that the highly retained proportion of AGPs is mainly attributed to the gene reservation in LF.

Phylogenetic and Protein Structure Analysis
Full length protein sequences of all identified BrAGPs and their orthologs in A. thaliana were used to construct phylogenetic trees. The topologies of NJ and ML trees were mainly consistent, thus only the NJ trees are presented here (Figure 3). The 55 classical AGPs in B. rapa and A. thaliana were classified into six clades and three orphan genes with high bootstrap support (Figure 3). Motif analysis showed that the AGPs in Clade I mainly have two types of structural features. AGP2, AGP3, AGP4, and AGP7 contained motif 1, motif 4 and motif 3, while AGP1, AGP5, and AGP10 included motif 1 and motif 4. Clade II was constituted of AGP54 and AGP57 with motif 1 and motif 2. Clade III had three members, AGP6, AGP11, and AGP50, in which AGP6 and AGP11 had motif 1 and motif 4, similar to some of the members in Clade I, while AGP50 only had motif 1. AGP9, AGP58, and AGP62 belonged to Clade IV with AGP58 having only motif 4, while AGP9 and AGP62 having motif 1 and motif 4. Clade V was composed of AGP25, AGP26, AGP27, which only hold motif 1. Tandem gene pair, AGP52 and AGP53 contained motif 4. Phylogenetic analysis indicated that the tandem repeat event occurred after the separation of B. rapa and A. thaliana, as the AGPs belonging to B. rapa had a closer resemblance to each other than to the orthologs in A. thaliana.
Phylogenetic analysis of AG peptides led to the division of three clades and five orphan genes, and almost all AG peptides had motif 6 and motif 7 (Figure 4). Four members of Clade I, AGP12, AGP13, AGP14, and AGP21, had motif 10 in addition to motif 6 and motif 7. AGP16, AGP20, AGP22, and AGP41 in Clade II all contained each motif 6, motif 7 and motif 8. Clade III included AGP23 and AGP43, which all had motif 8.

Estimation of the Molecular Evolutionary Rates of AGPs
Estimation of molecular evolution rates forms an important basis for our understanding of the evolution processes in B. rapa. YN86 module was used for estimating the molecular evolutionary rates of the orthologous gene pairs among AGPs (Supplementary data 1: Table S3). All three types of AGPs encoding genes were evolved at a Ka/Ks value lower than 1, indicating that they all have evolved through purifying selection ( Figure 5A). Classical AGPs showed an average Ka/Ks value of 0.30, AG peptides exhibited and Lys-rich AGPs were evolved at an average Ka/Ks value of 0.20 and 0.38, respectively. Statistical analysis illustrated that classical AGPs have evolved faster than AG peptides, but there is no significant differences existed between classical AGPs and lys-rich AGPs. Furthermore, the average molecular evolutionary rates of each clade were compared. In classical AGPs, Clade VI had a significantly higher Ka/Ks value than the other five clades ( Figure 5B). In AG peptides, Clade I, Clade II, and Clade III shared a similar molecular evolutionary rate, while the orphan genes evolved at a faster rate of evolution ( Figure 5C).

Differential Expression of BrAGPs
Half of the classical BrAGPs showed expression in all of the tissues detected, including root, stem, leaf, inflorescence and silique. Seven AGPs were expressed in more than one but not all tissues. BrAGP6, BrAGP11.1, BrAGP11.2, and BrAGP54.1 expression was restricted in inflorescence ( Figure 6A). Eleven of the 24 AG peptides encoding genes were expressed in all tissues with BrAGP12.2, BrAGP13.2, BrAGP15.1, BrAGP15.2, BrAGP22.1, and BrAGP22.2 displaying similar expression levels among different tissues. Half of the genes encoding AG peptides were expressed in different tissues, of which BrAGP15.3 was inflorescence-specific and BrAGP14 was expressed exclusively in the stem. The expression of BrAGP44 was not detected in any of the tissues (Figure 6A).
Among the three lys-rich AGPs encoding genes, BrAGP18.1 and BrAGP18.2 were expressed in all tissues, while BrAGP17 was only detectable in stem, inflorescence and silique ( Figure 6A). Interestingly, most paralogous gene pairs   For the five inflorescence-specific genes, including four classical AGPs and one AG peptide encoding genes, their expression were further analyzed among different developmental stages of pollen in subdivided flower buds from the "Bcajh97-01A/B"GMS line (Figure 6B). These five BrAGPs were expressed in the fertile but not in sterile flower buds. BrAGP6 and BrAGP11.2 were only expressed at Stage I which corresponds to the mature pollen stage, while BrAGP11.1, BrAGP15.3, and BrAGP54.1 were detected at Stage III to V, which means that they are expressed throughout the uninucleate pollen stage to the mature pollen stage. Since some genes expressed in mature pollen usually continue to be expressed in germinating pollen and pollen tubes, the expression levels of the five BrAGPs were investigated in pistils at three different time points after pollination in a fertile line. BrAGP11.1 and BrAGP54.1 were found to be expressed in pistils during the whole fertilization processes; however the other three genes were not detected in the pistils ( Figure 6C).

Effects of Exogenous GA, MeJA, and ABA on the Expression of BrAGPs in the Leaf
Phytohormones serve as key signals for the plants to respond to environmental stimuli (Wania et al., 2016). In order to investigate whether BrAGPs were involved in phytohormone responses, we chose 15 classical AGPs, 13 AG peptides and two lys-rich AGPs, which according to our qRT-PCR displayed expression in the leaf. Three vital phytohormones, ABA, GA, and MeJA, involved in abiotic and biotic stress resistance, were used in this study to investigate the leaf BrAGPs expression in response to phytohormone treatments.
Following ABA treatment, 22 BrAGPs were up-regulated, including 13 classical AGPs, eight AG peptides and one lys-rich AGPs encoding genes ( Figure 7A). Expression of 11 classical AGPs, four AG peptides, and a lys-rich AGPs encoding genes were markedly increased at 4 HAT and then recovered at 12 HAT; whereas, the expression levels of two classical AGPs and three AG peptides encoding genes were increased at 12 HAT. Eight BrAGPs, including two classical AGPs, five AG peptides, and one lys-rich AGPs encoding genes were down-regulated upon ABA treatment. Their expression levels decreased mainly at 4 HAT and didn't recover at 12 HAT, while BrAGP22.3 decreased at 12 HAT. Except BrAGP2.1 and BrAGP2.2 that shared similar responses, most paralogous gene pairs displayed different expression patterns after ABA treatment.
Up-regulation of 12 classical AGPs, seven AG peptides, and one lys-rich AGPs genes was observed after GA treatment ( Figure 7B). The expression levels of three classical AGPs encoding genes were increased at 4 HAT and subsequently at 12 HAT. Three classical AGPs and three AG peptides encoding genes were also up-regulated at 4 HAT but their transcript levels dropped back at 12 HAT. There was no effect on the expression of genes encoding for six classical AGPs, four AG peptides, and one lys-rich AGP, until 12 HAT. In contrast, the expression of encoding genes for three classical AGPs, six AG peptides, and one lys-rich AGP was down-regulated. The expression level of FIGURE 3 | Phylogenetic and motif analysis of classical AGPs in Brassica rapa and Arabidopsis thaliana. Phylogenetic tree of classical AGPs was constructed in whole protein sequences using neighbor-joining with 1,000 bootstrap in MEGA 6.0. Motif analysis was performed by MEME software. Red points are for genes in clades and green points are for orphan genes.
BrAGP2.2 decreased at 4 HAT and recovered at 12 HAT, while BrAGP12.1 was down-regulated at 12 HAT. The remaining genes were down-regulated at 4 HAT and sustained low expression levels at 12 HAT. Although the expression of most paralogous gene pairs was different, the response pattern between BrAGP40.1 and BrAGP40.2, and between BrAGP22.2 and BrAGP22.3, was similar ( Figure 7B).
Nineteen BrAGPs were up-regulated in MeJA treatment. Two classical AGPs and two AG peptides encoding genes were induced at 12 HAT, while 10 classical AGPs, four AG peptides and one lys-rich AGPs encoding genes, were up-regulated at 4 HAT and recovered at 12 HAT ( Figure 7C). In addition, 11 AGPs were down-regulated, following MeJA treatment. One classical AGP and two AG peptides were down-regulated at 4 HAT and recovered at 12 HAT, while the expression levels of the encoding genes for two classical AGPs, five AG peptides, and one lys-rich AGP decreased at 4 HAT and didn't recover at 12 HAT. BrAGP1.1 and BrAGP1.2 showed the same response to ABA, but the rest of the paralogous gene pairs displayed different expression patterns.
Interestingly, the ABA-, GA-, and MeJA-induced expression of BrAGPs were largely overlapped. Seventeen BrAGPs were upregulated by all three phytohormones, including genes encoding FIGURE 4 | Phylogenetic and motif analysis of AG peptides in Brassica rapa and Arabidopsis thaliana. Phylogenetic tree of AG peptides was constructed in whole protein sequences using neighbor-joining with 1,000 bootstrap in MEGA 6.0. Motif analysis was performed by MEME software. Red points are for genes in clades and green points are for orphan genes.
10 classical AGPs, six AG peptides, and one lys-rich AGPs. In addition, BrAGP2.2 and BrAGP26 were up-regulated by both ABA and MeJA, and BrAGP13.1 was up-regulated both by ABA and GA (Figure 7D). Five AG peptides encoding genes and one lys-rich AGP encoding gene were down-regulated by all three phytohormones. However, BrAGP3.2 and BrAGP60 were solely down-regulated by ABA and MeJA, the expression levels of BrAGP10.1 and BrAGP12.2 decreased only in response to GA and MeJA, respectively ( Figure 7E).
Comparison of the gene expression induction kinetics of different hormones revealed that the overlap between the response to ABA and MeJA is much bigger than that between ABA and GA, or MeJA and GA (Figure 7). In general, most of the transcriptional response of BrAGPs to ABA and MeJA treatments was rapid and transient with expression changes detected at 4 HAT, while these of GA were relatively slow (expression changes were not detectable until 12 HAT).

The High Retained Proportions of AGPs in B. rapa May Be Attributed to Subfunctionalization
AGPs are characterized by the repetitive nature of their protein backbones with less than 40% similarity between them (Schultz et al., 2002). The low similarity makes it difficult to be identified using the basic local alignment search tool (BLAST) or profile hidden Markov model (HMM) methods. "Hyp contiguity hypothesis" has provided a guidance to identify AGPs by Perl scripts (Kieliszewski and Lamport, 1994). However, it may lead to variable results if different programs and standards are employed (Schultz et al., 2002;Showalter et al., 2010). Here, based on the threshold values set in Showalter et al. (2010), a whole genome wide screening of classical AGPs, AG peptides, and lys-rich AGPs was performed in B. rapa, resulting in the identification of 64 BrAGPs including 49 with high confidence.
B. rapa contains more AGPs than A. thaliana, especially classical AGPs and AG peptides that is mainly attributed to the WGT event. However, we could not identify orthologs for three BrAGPs in A. thaliana, and orthologs for six AtAGPs in B. rapa. Ortholog comparison with closely related Brassica species revealed that five AtAGPs are lost in B. rapa, while one AtAGP and three BrAGPs were independently evolved after the separation of A. thaliana and B. rapa. These results indicate that besides the WGT event, other processes, for instance, dosage effect, might have also contributed to the evolution of the AGP family. In this study we have shown that classical AGPs, AG peptides and lys-rich AGPs are highly represented in the genome of B. rapa. Similar results were also reported by two previous studies where 33 FLAs and 52 ENODLs were identified in B. rapa. The corresponding retained proportions for these two types of chimeric AGPs were described to be 52 and 67%, respectively (Jun and Xiaoming, 2012;Li et al., 2013).
According to previous study, the over retained genes or gene families were always dosage sensitive or usually undergone subfunctionalization and/or neofunctionalization . In our investigation of the expression of BrAGPs, we revealed that 4/5 "high confidence" classical AGP and 6/7 AG peptide paralogous gene pairs had distinct expression patterns in different tissues. Such results indicated that the high retained proportions of classical AGPs and AG peptides are probably due to subfunctionalization and/or neofunctionalization, but not due to the dosage effect, which may yield increased expression of a gene (Edger and Pires, 2009). However, the result of molecular evolutionary rate analysis indicated that these AGPs were all under purifying selection, illustrating that neofunctionalization may have hardly occurred in these proteins. We thus suggest that subfunctionalization is the most likely mechanism that facilitated the retention of duplicated genes in classical AGPs and AG peptides. Unfortunately, in the previous studies on FLAs and ENODLs, the paralogous gene pairs in B. rapa and A. thaliana were treated as one gene for their gene expression pattern (Jun and Xiaoming, 2012;Li et al., 2013). Therefore, we were unable to verify whether the high retained proportion of the chimeric AGPs are also depended on subfunctionalization. Relevant research work should be focused on this issue in the future.

Implication of a Subset of AGPs in Sexual Reproduction
Comprehensive studies on gene expression can provide useful information for predicting gene function. In this study, we first investigated the expression of BrAGPs in five different tissues and found tissue-specific and non-tissue specific genes.
To identify candidate AGPs that may be involved in pollen were further investigated for their in male fertile/sterile flower buds at five different developmental stages. All these genes were specifically expressed in fertile flower buds at the late FIGURE 7 | Responses of AGPs in Brassica rapa to exogenous phytohormones treatments. Responses of leaf-expressed AGPs to exogenous ABA (A), GA (B), and MeJA (C) are detected by qRT-PCR at 0 h after treatment (HAT), 4 HAT, and 12 HAT. The gene expression under treatments were normalized to the corresponding distilled water treatment and calculated through 2 − Ct method, displayed in heatmap and clustered hierarchically by Cluster 3.0. Red points are for classical AGPs encoding genes, green points are for AG peptides encoding genes, and yellow points are for lys-rich AGPs encoding genes. Venn diagrams of up-regulated AGPs (D) and down-regulated AGPs (E) after the treatment of exogenous ABA, GA, and MeJA. developmental stages, indicating a role in pollen formation or pollen function. Some genes expressed in mature pollen usually continued to be expressed in germinating pollen and pollen tubes, showing that they are specifically involved in the regulation of the subsequent pollination and fertilization processes (Pina et al., 2005;Becker and Feijo, 2007;Wang et al., 2008). We also detected the expression of these genes in pistils at three different time points after pollination. BrAGP11.1 and BrAGP54.1 were shown to be expressed in pistils during the whole fertilization processes, which indicated that they might also function in subsequent sexual reproductive events. A previous study has demonstrated that AtAGP6 and AtAGP11 are the only two pollen-specific classical AGPs and are essential for pollen and pollen tube function in A. thaliana (Pereira et al., 2006;Levitin et al., 2008). Recent studies using qRT-PCR and in situ hybridization in our laboratory also revealed that the homologous gene BrAGP6 (formerly named as BcMF18) was specifically expressed in pollen (unpublished data), and BrAGP11.1 (or BcMF8) was expressed in the developing pollen and the pollen tube (Huang et al., 2008b;Lin et al., 2014). The functional disruption of BrAGP6 by antisense RNA technology caused collapse of the pollen due to the absence of cellular content and nucleus, and failure of intine layer formation due to lack of cellulose deposition (unpublished data). The inhibition of BrAGP11.1 resulted in slipper-shaped and bilaterally sunken pollen with abnormal intine development and aperture formation, an arrest of pollen germination and unstable pollen tube formation (Lin et al., 2014). Although the functions of BrAGP15.3 and BrAGP54.1 are not fully understood, their expression patterns are similar to the known pollen-related AGP genes. Therefore, we suggest that these two genes may play important roles in pollen development and/or fertilization. Our results also revealed that only a small number of BrAGPs was inflorescence-specific. This is in agreement with a previous expression analysis of the FLAs and ENODLs in B. rapa, which also showed that only two BrENODL genes and five FLA genes were specifically expressed in inflorescence (Jun and Xiaoming, 2012;Li et al., 2013). Similarly, in A. thaliana, only a subset of AGP genes is expressed in pollen grain and pollen tubes (Pereira et al., 2006). Therefore, we speculate that, despite of the diversity of AGPs, only a small number of AGPs are endowed with a specific and defined role in male sexual reproduction in B. rapa as those in A. thaliana.

Novel Roles of BrAGPs in Phytohormone Signaling
AGPs are also implicated in signal recognition and transduction (Suzuki et al., 2002;van Hengel and Roberts, 2003), although the exact mechanisms are still elusive. The mRNA level of AtAGP31 decreases at 8 h after ABA, MeJA, and wounding treatment (Liu and Mehdy, 2007), while AtAGP30 specifically responds to ABA treatment (van Hengel and Roberts, 2003). Moreover, a mutation of this gene showed suppression of the ABA-induced delay in germination and alteration of the expression of some ABA-regulated genes (van Hengel and Roberts, 2003). In rice, two classical AGPs, OsAGP1, and OsAGP15 were significantly up-regulated under the ABA treatment, indicating that they may response to ABA (Ma and Zhao, 2010). Moreover, AGPs are also involved in response to GA, for example, AGPs are suggested to play important roles in GA-induced α-amylase production in barley aleurone cells (Suzuki et al., 2002). The ATH1 microarray analysis results (GEO accession number: GSE39384) also revealed that many AGP-encoding genes are regulated by ABA, GA, and MeJA in A. thaliana (Supplementary data 2: Table S4). A similar trend was observed for some homologous AGPs between B. rapa and A. thaliana. For example, AGP1 and AGP2 were up-regulated in A. thaliana after 3 h treated by ABA, while their orthologs in B. rapa were also up-regulated at 4 HAT by ABA (Figure 7). The conserved response to the phytohormone treatments in different plant species indicates that AGPs may be important functional genes for phytohormone signaling pathways. However, many homologous BrAGP genes show diverse change trends under phytohormone treatments with their A. thaliana homologs. This result demonstrated that the differentiation of the AGP genes might be intended to acclimatize the plant to specific environmental changes. In our study, 17 BrAGPs were up-regulated and six were down-regulated by all three types of phytohormones, ABA, GA, and MeJA treatments. These genes comprise the majority (approximately 76%) of AGPs which are expressed in the leaf of B. rapa. These findings indicate that quite a few AGPs might be directly associated or indirectly interact with the hormone signaling network.
The molecular mechanism by which AGPs are involved in phytohormone signaling remains to be elucidated. The relative rapid expression of AGPs under exogenous phytohormone treatments suggested that they may function in maintaining the organism's homeostasis rather than solely being structural components. In plants, sugar levels reflect the plant's physiological conditions and sugar signaling is closely associated with multiple phytohormone signaling cascades and abiotic stress signaling (Smeekens, 2000;Gibson, 2005;Rolland et al., 2006). Interestingly, sugar can induce ABA in the sugar-ABA signaling cascade (Laby et al., 2000). Changes in the expression of each BrAGP gene by exogenous ABA treatment indicates that the AGPs may participate in the signaling cascade by the dynamic release and uptake of glycosyl radicals. It was clear that there is complex cross-talk among plant hormones, which are important in plant growth regulation and defense responses (Kohli et al., 2013;Wania et al., 2016). In this study, we have shown that there is a large overlap of the BrAGPs induced or suppressed by ABA, GA, and MeJA treatments. It indicates that these BrAGPs might participate in different signaling pathways or in overlapping processes that are controlled by phytohormonal interactions.

CONCLUSION
We identified, for the first time, 33 classical AGPs, 28 AG peptides, and three lys-rich AGPs in the genome of B. rapa. We also elucidated their genomic characteristics, protein structures, duplication status, molecular evolutionary rates and expression characteristics in different tissues as well as under phytohormone treatments. High retained proportions of AGPs were observed in B. rapa, which may be resulted from subfunctionalization. Similar expression patterns to the known pollen-related BrAGPs, were found in the remaining two genes with unknown function, BrAGP15.3 and BrAGP54.1, indicating they may also be involved in pollen development and/or fertilization. Large numbers of BrAGPs displayed responses under the treatment of ABA, MeJA, and/or GA, suggesting that they may participate in the response to phytohormone signaling independently and/or cooperatively. In summary, this study has provided fundamental information for revealing the roles of the classical AGPs, AG peptides, and lys-rich AGPs, three important types of AGPs in B. rapa.

AUTHOR CONTRIBUTIONS
TH performed the identification of AGPs, expression analysis, and participated in drafting the manuscript. HD performed the protein structure and evolution analysis, and participated in manuscript preparation. JCu, ML, SL, and JCa participated in phytohormone treatments, expression analysis, identification of AGPs, research design, respectively. LH conceived the project, designed research, and revised the manuscript. All authors read and approved the final manuscript.