Quantitative Proteomic and Transcriptomic Study on Autotetraploid Paulownia and Its Diploid Parent Reveal Key Metabolic Processes Associated with Paulownia Autotetraploidization

Polyploidy plays a very important role in speciation and plant evolution by way of genomic merging and doubling. In the process of polyploidy, rapid genomic, and transcriptomic changes have been observed and researched. However, proteomic divergence caused by the effects of polyploidization is still poorly understood. In the present study, we used iTRAQ coupled with mass spectrometry to quantitatively analyze proteomic changes in the leaves of autotetraploid Paulownia and its diploid parent. A total of 2963 proteins were identified and quantified. Among them, 463 differentially abundant proteins were detected between autotetraploid Paulownia and its diploid parent, and 198 proteins were found to be non-additively abundant in autotetraploid Paulownia, suggesting the presence of non-additive protein regulation during genomic merger and doubling. We also detected 1808 protein-encoding genes in previously published RNA sequencing data. We found that 59 of the genes that showed remarkable changes at mRNA level encoded proteins with consistant changes in their abundance levels, while a further 48 genes that showed noteworthy changes in their expression levels encoded proteins with opposite changes in their abundance levels. Proteins involved in posttranslational modification, protein turnover, and response to stimulus, were significantly enriched among the non-additive proteins, which may provide some of the driving power for variation and adaptation in autopolyploids. Quantitative real-time PCR analysis verified the expression patterns of related protein-coding genes. In addition, we found that the percentage of differentially abundant proteins that matched previously reported differentially expressed genes was relatively low.


INTRODUCTION
Polyploidy is a widespread and prominent process that plays an important role in the evolution of all angiosperm plants through genomic merging and doubling. There are three main types of polyploids: autopolyploids, allopolyploids, and segmental allopolyploids (Stebbins, 1947). Autopolyploids such as Solanum tuberosum, Saccharum officinarum, and Medicago sativa, combine more than two sets of identical genomes in their nucleus (Chen, 2010). They often thrive more vigorously and can survive in severe environments because, through genome duplication (Soltis and Soltis, 2009), they have traits that are superior to their parents. Many hypotheses have been proposed to explain the function of a doubled genome. One such hypothesis is that genome duplication endows distinct superiorities that enable polyploids to survive and thrive in circumstances that are too severe for their parents (Otto, 2007;Weiss-Schneeweiss et al., 2013). Despite the widespread occurrence and survival superiorities of polyploids, and some progress in this area, there is still not enough information to explain the cause and direct effect of the assigned biological functions of the increased content of the doubled genome.
High-throughput genomic and transcriptomic sequencing techniques have contributed greatly to gaining a comprehensive perspective on the genomic information of autopolyploids. However, because changes at the transcriptional level do not always correlate with changes at the translational level, it is also necessary to investigate the differences between autotetraploids and their corresponding progenitors at the proteome level. The correlation between the expression of RNA transcripts and protein abundance is usually not direct because processes such as posttranscriptional regulation and posttranslational modifications can make it difficult to predict patterns of protein abundance. Thus, applying proteomic approaches to investigate autopolyploids will greatly increase the understanding of their evolution and adaption. Until recently, proteomic changes in autopolyploids and their diploid parents have been rarely reported, and only a few studies on Arabidopsis (Ng et al., 2012) and Manihot esculenta Crantz  were found. In most of these researches, the method of two-dimensional electrophoresis (2-DE) combined with mass spectrometry was used. In general, there are three kinds of methods involving in proteome: 2-DE, label, and shotgun. They are complementary to each other. Coupled with mass spectrometry analysis, iTRAQ has proven to be a powerful method to investigate proteomic changes because many proteins can be identified simultaneously and proteomic changes can be measured with high sensitivity in related species (Ross et al., 2004;O'Brien et al., 2010). Therefore, an iTRAQ-based proteomic analysis in autopolyploids should provide valuable information and new insights into the effects of autopolyploidization at the translational level (Koh et al., 2012).
Paulownia is an economically important genus of trees that are native to China. Paulownia "Yuza 1" is an improved species that was bred from P. tomentosa × P. fortunei. The autopolyploid "Yuza 1" does not exist in nature; however, it has valuable traits, including drought and witches' broom diseases resistance, that were acquired as a result of its autopolyploidy (Liu et al., 2013;Li et al., 2014;Xu et al., 2014). In this study, iTRAQ technology combined with liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) was employed to investigate the effects of polyploidization on the "Yuza 1" proteome. Our results provide a survey of protein changes as a result of genome duplication and provide a better understanding of the mechanisms of autoploidization in Paulownia. We also cataloged differences in the abundance of proteins and expression levels of mRNAs using previously reported RNA sequencing data . Finally, potential target genes identified in this study could be engineered for acclimation in plants, especially stress.

Plant Material
All biological materials used in this study were obtained from the Institute of Paulownia, Henan Agricultural University, Zhengzhou, Henan Province, China.

Protein Extraction, Digestion, and iTRAQ Labeling
The Y2 and Y4 samples were ground into powder in liquid nitrogen, extracted with lysis buffer containing 1 mM PMSF and 2 mM EDTA. After that, protein was extracted according to the method of Tang et al. (2016). Total protein was digested and iTRAQ-labeled according to the method of Meng et al. (2014). The samples were labeled with iTRAQ tags as follows (sample, tag): Y2, 113; Y4, 116; Y2-2, 117; Y4-2, 118. Four other samples were labeled with the same tags as repeats. The peptides were labeled with the isobaric tags and incubated at room temperature for 2 h. The labeled peptide mixtures were then pooled for strong cation exchange and dried by vacuum centrifugation.

Strong Cation Exchange
Strong cation exchange chromatography was performed with a LC-20AB HPLC pump system. The iTRAQ-labeled peptide mixtures were reconstituted with 4 mL buffer A and loaded onto a 4.6 × 250 mm Ultremex SCX column containing 5-µm particles. The peptides were eluted at a flow rate of 1 mL min −1 with a gradient of buffer A for 10 min, 5-60% buffer B for 27 min, and 60-100% buffer B for 1 min. The system was then maintained at 100% buffer B for 1 min before equilibrating with buffer A for 10 min prior to the next injection. Elution was monitored by measuring the absorbance at 214 nm, and fractions were collected every 1 min. The eluted peptides were pooled into 20 fractions, desalted with a Strata X C18 column and vacuum-dried.

LC-MS/MS Analysis
Mass spectroscopy analysis was performed using an AB SCIEX TripleTOF TM 5600 mass spectrometer, coupled with an online micro-flow HPLC system as described in Section Strong Cation Exchange. The peptides were separated using the method of Qiao et al. (2012).

iTRAQ Protein Identification and Quantification
The acquired raw data files were converted into Mascot generic format files according the method of Lin et al. (2013). Protein identification was performed using the Mascot search engine against a Paulownia transcriptome database that contained 82,934 sequences. The sequencing data have been submitted to the Short Reads Archive under accession number SRP034738 . Protein identification and quantification was performed according the method of Guo et al. (2014).
To reduce the probability of false peptide identification, only peptides with significance scores in the 99% confidence interval greater than "identity" in a Mascot probability analysis were counted as identified. Each confident protein identification involved at least one unique peptide.
For protein quantization, each protein was required to contain at least two unique peptides. The quantitative protein ratios were weighted and normalized by the median ratio in Mascot. Only ratios with p < 0.05 and fold changes >1.2 were considered significant.

Bioinformatics Analysis
Functional analysis of the identified proteins was conducted using Gene Ontology annotations and the proteins were categorized according to their biological processes, molecular functions and cellular localizations. The proteins were further analyzed using the Clusters of Orthologous Groups of proteins database and the Kyoto Encyclopedia of Genes and Genomes database . Principal Component Analysis (PCA) of the samples were done using the software of SIMCA-P (Wu et al., 2010).

RNA Preparation and Quantitative RT-PCR
The RNA samples from Y2, Y4, Y2-2, and Y4-2 were extracted with Trizol (Sangon, Shanghai, China). The RNA was then precipitated with isopropanol. Purified and concentrated RNA was denatured and first-strand cDNAs for all the samples were synthesized using a PrimeScript RT reagent Kit (Takara, Dalian, China). Primers were designed using Beacon Designer version 7.7 (Premier Biosoft International, Ltd., Palo Alto, CA, USA; Table 1). The genes encoding 10 differentially abundant proteins (DAPs) were selected for qRT-PCR analysis. The qRT-PCR reactions were run in So Fast EvaGreen Supermix starting with 1 µL of the cDNA template in a standard 20-µL reaction. The cDNAs were then amplified in a Bio-Rad CFX96TM Real-Time System with SYBR Premix Ex Taq TM II. The PCR cycles were as follows: 95 • C for 1 min, followed by 40 cycles of 95 • C for 10 s, and 55 • C for 15 s. Relative expression levels of the genes were calculated using the 2 − Ct method and normalized with 18S rRNA from "Yuza 1."

Proteomics Characterization
A total of 366,435 spectra were generated from the iTRAQbased quantitative proteomics analysis of the proteins extracted from Y2, Y2-2, Y4, and Y4-2 seedling leaves. The analysis using Mascot software identified 21,423 spectra that matched known spectra. Among them, 18,165 unique spectra were matched to 7477 unique peptides and 2963 proteins ( Figure 1A), ∼54% of which consisted of at least two of the peptides ( Figure 1B). The vast majority of these proteins were larger than 10 kDa, although their molecular weights covered a wide range ( Figure 1C). Most of the identified proteins had good peptide coverage; 59% had more than 10% sequence coverage, and 33% had 20% sequence coverage ( Figure 1D). Further, to avoid identification omissions, we also confined the peptide matching error of the database search strategy to less than 2 ppm. The reproducibility of the proteomic analysis is shown in Figure 2.
These results indicate that the proteomics analyses were reliable. The results of PCA of the samples were shown in Figure S1. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE (Vizcaíno et al., 2016) partner repository with the dataset identifier PXD004237. Data are available via ProteomeXchange with identifier PXD004237 (Vizcaino et al., 2014). Reviewer account details were as follows: username was reviewer36112@ebi.ac.uk and password was TH1qVbgd.
To carry out a functional analysis, we assigned GO terms to all the quantitated proteins. Under biological process, metabolic process (1950) and cellular process (1830) were the most represented groups; under cellular component, cell (2290) and cell part (2290) were the two largest groups; and under molecular function, catalytic activity (1518) and binding (1331) were most represented (Figure 3). These results indicate that the identified proteins are involved in almost every aspect of "Yuza 1" metabolism.
To further understand the functions of the 2963 proteins, they were assigned to 23 categories using the COG database. The most highly represented functional categories were "general function prediction only" and "posttranslational modification, protein turnover, chaperones, " which represented ∼16 and 14% of the identified proteins, respectively ( Figure 4).
Next, the annotated proteins were mapped onto 121 KEGG pathways (Table S1). Among them, "metabolic pathways" (758) was significantly more highly represented than the other pathways, and was followed by "biosynthesis of secondary metabolites" (455) and "ribosome" (116). Based on the KEGG analysis, we concluded that most of the mapped proteins may affect cellular component biogenesis, posttranslational modification, protein turnover, and response to stimulus.

Overview and Analysis of Differentially Abundant Proteins Related to Autotetraploidy
DAPs were defined as proteins that showed a greater than 1.2-fold change in relative abundance and a p < 0.05. A total of 463 DAPs were identified in the Y4 vs. Y2 comparison; 265 were more abundant and 198 were less abundant ( Table S2). Among the top 10 more abundant DAPs, only five could be aligned to proteins with known functions in GenBank. These proteins were annotated as lipid transfer protein 2 (Salvia miltiorrhiza, ABP01769.1gi|144601657),

Comparison of the Abundance Profiles of DAPs between the Y2 and Y4 Samples
Based on the Y4 vs. Y2 comparison, DAPs were retrieved and analyzed using the GO, COG, and KEGG databases. The DAPs were assigned to 51 functional groups under the three main GO categories (Figure 5 and Table S3). Under biological process, metabolic process and cellular process were the most represented groups; under cellular component, cell and cell part were the two largest groups; and under molecular function, catalytic activity and binding were the most represented. Interestingly, the GO classifications for the DAPs were similar to the classifications for all the identified proteins. The most enriched GO terms under biological process, cellular component, and molecular function were the same for all the proteins and for the DAPs subset ( Table 2). We suggest that these enriched GO terms may be related to autotetraploidy.
To predict and classify the possible functions of the DAPs, we assigned them to COG categories. Based on their sequence homology with known proteins in GenBank, 355 DAPs, which comprised 11.98% of all the proteins, were divided into 20 categories ( Table 3). The "posttranslational modification, protein turnover, chaperones" category contained 48 proteins and was the largest, followed by "energy production and conversion" (47), "translation, ribosomal structure and biogenesis" (46), "general function prediction only" (41), "carbohydrate transport and metabolism" (38), "amino acid transport and metabolism" (27), and "coenzyme transport and metabolism" (17). The smallest group was "intracellular trafficking, secretion, and vesicular transport" with only one protein ( Figure S2).
The DAPs were mapped to 77 KEGG metabolic pathways (Table S4): "metabolic pathways, " "biosynthesis of secondary FIGURE 2 | Repeatability of the expression of duplicate samples. X-axis represents the difference of the quantitative ratios between the first and the second biological repeats of the two samples. The right y-axis represents the cumulative percentage between the proteins of a certain range and the quantitative proteins, while the left y-axis represents the number of total protein in a certain range. metabolites, " "ribosome, " and "photosynthesis" were highly enriched, followed by "glyoxylate and dicarboxylate metabolism, " "amino sugar and nucleotide sugar metabolism, " "pyruvate metabolism, " "carbon fixation in photosynthetic organisms, " "other glycan degradation, " "glycolysis/gluconeogenesis, " "starch and sucrose metabolism, " and "protein processing in endoplasmic reticulum."

Correlation between Proteins and Transcripts
To compare changes in protein abundance with transcript levels, we compared the more and less abundant DAPs identified by iTRAQ with the up-and down-regulated differentially expressed genes identified from a previous transcriptome analysis. Differentially expressed unigenes were screened based on an absolute fold change value of log2 ratio >1 with p < 0.001 and FDR < 0.001. A total of 1808 corresponding proteinencoding genes were detected by RNA sequencing. We found that 59 of the genes that showed noteworthy changes in their expression levels encoded proteins with corresponding changes in their abundance levels; 37 of these genes were up-regulated and 22 were down-regulated, while the corresponding proteins were more and less abundant, respectively (Table S5). A further 48 genes that showed noteworthy changes in their expression levels encoded proteins with opposite changes in their abundance levels ( Table S5). We found 252 genes that showed noteworthy changes in transcript levels, while the corresponding proteins showed no changes in their abundances (Table S6). Conversely, 356 proteins showed noteworthy changes in their abundance levels, while the corresponding encoding genes showed no changes in their expression levels ( Table S7).

Confirmation of Differentially Expressed Proteins by qRT-PCR
To confirm the differential expression of proteins identified by iTRAQ analysis, qRT-PCR was performed to detect the transcript expression levels of the corresponding genes. In Y4 vs. Y2, the qRT-PCR results shown that 15 DAPs were consistent with the iTRAQ LC-MS/MS analysis results, 2 DAPs displayed opposite protein and mRNA expression patterns (Figure 6), 1 transcript of the DAP failed to be amplified in the qRT-PCR experiment. This discrepancy may be attributed to posttranscriptional and posttranslational regulatory processes.
FIGURE 3 | Gene Ontology classification of distinct proteins that were detected in "Yuza 1" leaves 2683 proteins (90.55% of total) were categorized into 51 function groups.
FIGURE 4 | COG function classification of distinct proteins that were detected in "Yuza 1" leaves protein or domains were annotated and divided into 23 specific categories.
Frontiers in Plant Science | www.frontiersin.org  Immune system process 3 Localization 19 Metabolic process 239 Multi-organism process 9 Multicellular organismal process 9 Negative regulation of biological process 3 Positive regulation of biological process

Insights into the Proteomic and Transcriptomic Analyses of Autopolyploids
An important aspect of polypoid genomics research is the observation of genome expansion and contraction. The C-value, which is a measure of the amount of DNA in the unreplicated gametic nucleus, is often used to estimate the ploidy level in plants (Udall et al., 2006). It is assumed that polyploids have large C-values that increase in direct proportion with ploidy (Poggio et al., 2014). However, all three possible fates of genome size evolution after ploidization, namely expansion, contraction, and no change, have been reported. Genome expansion has been reported in only a few studies. In a recent study of five Nicotiana polyploids, the genomes of the polyploids were found to have increased in size (Leitch and Leitch, 2008). Conversely, a large-scale analysis of 3008 angiosperms that combined available genome size data, suggested that genome contraction was a widespread biological response after polyploid formation (Leitch and Bennett, 2004;Ammiraju et al., 2010). It is also been reported that in some polyploid series, there is no change in genome size after polyploidization, and such phenomenon especially takes plase in newly-formed ones (Yang et al., 2011). In our work, the genome size of the autotetraploid Paulownia was found to be less than twice the diploid genome size, suggesting genome contraction had occurred in the process of autopolyploidization. We predicted that the contraction may be due to an insertion/deletion bias that could result in frequent and larger deletions and result in a reduction in genome size because of an increase in illegitimate recombinations (Grover et al., 2007). Our proteomic and transcriptomic data gave quite different abundance/expression results. This may be because gene expression differences were not translated into protein abundance differences because of posttranscriptional regulation and/or posttranslational modification, or because differences between the autotetraploid and its diploid parent were underestimated at the proteomic level because the iTRAQbased analysis determines the number of DAPs based on an estimate of the fold change of different samples. It has been reported that iTRAQ-based methods may have problems with accuracy because they may systematically underestimate ratios, especially when the ratio changes are large (Bantscheff et al., 2008;Ow et al., 2009;Karp et al., 2010). In this work, as in most iTRAQ studies, fold changes ≥1.2 or ≤0.8 were employed because, in iTRAQ quantitation, small changes may reflect larger differences.

Regulation of Protein Expression in "Yuza 1" Autotetraploid Can Increase the Possibility of Adaptation to Various Stresses
Polyploids have been reported to acquire enhanced stress responses compared with the corresponding diploids (Saleh et al., 2008;Chandra and Dubey, 2010;Manzaneda et al., 2012). Casneuf et al. found through a genome-wide analysis in Arabidopsis, that stress-related genes were more likely to be retained during polyploidization compared with other genes (Casneuf et al., 2006). In a previous study, we also found that genes involved in stress responses, including drought, salt, and phytoplasma infection, were significantly up-regulated in Paulownia autotetraploids compared with their diploid parents (Liu et al., 2013;Dong et al., 2014a,b,c;Fan et al., 2014;Xu et al., 2014). Moreover, the regulation of microRNA expression in Paulownia autotetraploids also increased the probability of adaptation to climate because some down-regulated microRNAs were found to target genes that were related to response to stress in autotetraploids compared with their parents (Fan et al., 2015;Niu et al., 2016). In this study, we also found that 81 proteins that were involved in response to various stresses tended to be differentially abundant in "Yuza 1" autotetraploids compared with its diploid parent. For instance, during these proteins three monocopper oxidase-like proteins (XP_002266352.1, XP_002329905.1, XP_004234931.1) and one L-ascorbate peroxidase (XP_002513962.1), which play important roles in the process of paulownia development and stress tolerance, turned out higher abundance levels in "Yuza 1" autotetraploid compared with its parents. These findings suggest that the regulation of protein abundance in autotetraploid "Yuza 1" may make for the increased adaptation observed in autopolyploids.
In plants, it has been reported that the when some stressresponsive genes are constitutively expressed, accompanied fitness costs are usually observed (Shen et al., 2015). Therefore, when defense is not required, the costs to plants could be reduced by induced resistance (Heil, 2002;Heil and Baldwin, 2002;Accamando and Cronin, 2012). In Arabidopsis thaliana autotetraploids, it was found that the higher expression levels of genes that responded to stress might be linked with slow growth of the plant compared with its parents (Ng et al., 2012). Shen et al. also reported that, in a Brassica hexaploid, the up-regulation of stress responsive genes might slow down the growth of the plant by cellular and metabolic processes (Shen et al., 2015). Interestingly, in this study, we found that the higher abundance of stress-responsive proteins was accompanied by a higher level of autotetraploid plant growth compared with the parent. How the autotetraploid Paulownia coordinates these biological processes is still unclear and further studies are required to clarify the underlying mechanisms.

Correlation between Protein Abundance and Transcript Expression Is Limited
To interpret biological processes, such as gene expression, protein interactions, and the structures and functions of cellular systems, it is essential to investigate the correlation of protein abundance and mRNA transcript levels. However, the coefficient of correlation between them has been found to be different among different studies and different plant tissues. In yeast, good correlation has been reported between transcript levels and protein abundances (Lackner et al., 2012), while in most plant tissues, such as A. thaliana leaves and roots (Lan et al., 2012;Ng et al., 2012), Brassica napus (Marmagne et al., 2010), Tragopogon mirus (Koh et al., 2012), and Triticum aestivum (Song et al., 2007), only limited correlation has been found. In general, low correlation between protein abundance and mRNA expression levels is more often observed and is more widespread. In a study conducted by Lan et al. RNA sequencing and iTRAQ-based proteomics were both employed to generate genomic expression and proteomic abundance data of Arabidopsis roots, and discordant changes were reported. However, interestingly, they found that for highly up-regulated genes, the concordance between expression levels and protein abundance was generally high (Lan et al., 2012). In our study, we used RNA sequencing data and iTRAQ-based quantitative proteomics to compare the transcript levels and protein abundances in "Yuza 1" autotetraploid and diploid Paulownia leaves. Among the 2963 proteins detected by iTRAQ, 1808 genes that encoded these proteins were detected in the RNA sequencing data. Among them, 37 up-regulated genes corresponded to the more abundant proteins, and 22 down-regulated genes corresponded to the less abundant proteins in the Y4 vs. Y2 comparison. The expressions of a further 48 genes showed an opposite trend to the abundances of the corresponding proteins. We found that most of the genes and corresponding proteins that exhibited the same trend at the transcriptional and translational levels, respectively, were related to response to stimulus or oxidation-reduction. In an earlier study, we also found that up-regulated genes in Paulownia autotetraploids were significantly related with response to stimulus . These data imply that changes in transcript levels may be one of the main determinants for changes in protein abundances.
Several factors may explain the reported lack of concordance between protein abundance and mRNA expression levels.  First, it is widely assumed that posttranscriptional regulation and posttranslational modifications can control translational efficiency, and that these processes could lead to discordance between mRNA expression levels and protein abundance. For example, small RNAs, including microRNAs, which regulate the expression of their target genes, play important roles in many biological processes (Ha et al., 2009). Second, the limitations of the different experimental techniques in detecting the expression of mRNAs and abundance of proteins could contribute in a major way to this problem. Third, the use of different plant tissues might lead to different results because the transcriptional and translational processes are both temporal and spatial. Proteomic studies on whole tissues may draw a more comprehensive picture, but studies on specific organelles will likely show differences more exactly. In summary, despite the limited correlation, integrated analyses of the transcriptome and the proteome data are essential for understanding the effects of polyploidization.

AUTHOR CONTRIBUTIONS
GF conceived the experiments. YD wrote the paper. MD performed the experiments. ZZ analyzed the data. YD contributed reagents/materials/analysis tools.

ACKNOWLEDGMENTS
We thank WZ for their interest and invaluable assistance during this work.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpls.2016. 00892 Table S1 | KEGG pathway analysis of all identified proteins for "Yuza 1." Table S2 | Differentially abundant proteins identified in the comparison of Y4 vs. Y2 Up-regulated proteins were marked in red while down-regulated ones in green.    Figure S1 | Principal Component Analysis (PCA) of Y4 and Y2 samples. Figure S2 | Classification of the clusters of orthologous groups (COG) for the Itraq-based proteotome of "Yuza 1."