Metabolic fingerprinting of Arabidopsis thaliana accessions

In the post-genomic era, much effort has been put into the discovery of gene function using functional genomics. Despite the advances achieved by these technologies in the understanding of gene function at the genomic and proteomic level, a large genotype-phenotype gap remains. Metabolic profiling has been used to analyze organisms that have already been characterized genetically. However, only a few studies have compared the metabolic profiles of different tissues of distinct accessions. Here, we report the detection of over 14,000 and 17,000 features in inflorescences and leaves, respectively, in two widely used Arabidopsis thaliana accessions. A predictive Random Forest Model was developed, which was able to reliably classify tissue type and accession of samples based on their LC-MS profiles. We thereby demonstrate that the morphological differences among A. thaliana accessions are also reflected as distinct metabolic phenotypes within leaves and inflorescences.


Introduction
Biodiversity constitutes a valuable resource when searching for genes of interest. Natural variation in Arabidopsis has been observed for a variety of traits (Koornneef et al., 2004; Weigel, 2012) such as seed size (Alonso-Blanco et al., 1999), light and hormone sensitivity (Maloof et al., 2001), growth rate (Beemster et al., 2002), root growth responses to phosphate starvation (Chevalier et al., 2003), and cold stress responses (Barah et al., 2013), among others. Comparison of whole genomes from Arabidopsis thaliana accessions showed that genetic differences exist among them; for instance, over 200 genes found in different accessions are not present in the reference genome Col-0 (Gan et al., 2011; Schneeberger et al., 2011). Furthermore, natural variation has also been studied at the transcriptomic (Gan et al., 2011; Stein and Waters, 2012; Wang et al., 2013) and proteomic (Chevalier et al., 2004) levels.
Metabolomics is adding another dimension to the investigation of gene function (Fiehn et al., 2000; Saito and Matsuda, 2010). Metabolic analysis methods such as profiling and fingerprinting have evolved from diagnostic tools used to elucidate metabolite accumulation patterns in different tissues and cell compartments of individual plants (Matsuda et al., 2009, 2011; Krueger et al., 2011; Mintz-Oron et al., 2012) to integrative tools, enhancing the strength of functional genomics in the process of narrowing the genotype-phenotype gap (Fiehn et al., 2000; Taylor et al., 2002; Enot and Draper, 2007; Fernie and Schauer, 2009; García-Flores et al., 2012, 2015; Landesfeind et al., 2014). Recently, the attention in this area has expanded to the study of natural variation of metabolite levels between individual plants, a strategy that is suggested to provide useful information to improve crop quality (Fernie and Schauer, 2009; Montero-Vargas et al., 2013). In this sense, several studies in Arabidopsis combining metabolomic and QTL analysis showed that metabolite variation between different accessions exists (Keurentjes et al., 2006, 2008; Rowe et al., 2008; Fu et al., 2009; Chan et al., 2010; Joseph et al., 2013, 2014), and highlighted that interactions between transcript and metabolite variation are complex and governed by epistatic interactions (Wentzell et al., 2007; Rowe et al., 2008; Joseph et al., 2013, 2014). Moreover, the metabolic relationship between accessions depends on different factors such as tissue, plant age, and environment (Houshyani et al., 2012).
Here, we present a metabolite profiling study of two A. thaliana accessions frequently used in the laboratory: Columbia (Col-0) and Wassilewskija (Ws-3) (Alonso-Blanco and Koornneef, 2000). Col-0 was selected from the original Laibach Landsberg population and is the accession that was sequenced in the Arabidopsis Genome Initiative (Rédei, 1992; AGI, 2000), and Ws-3 is a Russian accession (Laibach, 1951).
We investigated whether, in addition to the morphological and developmental differences observed among the Arabidopsis accessions, a distinct metabolic phenotype could be distinguished in two different tissues.

Plant Growth and Plant Material
Col-0 and Ws-3 accessions of Arabidopsis (A. thaliana) plants were germinated in soil (3:1:1, peat moss:perlite:vermiculite) in a growth chamber at 22 °C under long-day conditions (16 h of light/8 h of dark) and transferred to standard greenhouse conditions (22-27 °C, natural light). All plants were grown at the same time under the same environmental conditions.

Sample Preparation
Fully expanded leaves after flowering and inflorescences were collected from 10 plants and pooled. Each pool was collected from different plants. Three biological replicates were used for each accession (with the exception of Ws-3 leaves, for which only two biological replicates were available). Frozen plant material (fully expanded leaves after flowering or whole flowers) was ground in liquid nitrogen. For each 100 mg of fresh tissue, 300 µL of cold acetone was added, and the mixture was vortexed, sonicated for 5 min, and then centrifuged at 16,100 × g to separate the crude extract from the tissue, as previously described (Sotelo-Silveira et al., 2013). The supernatant was lyophilized and used for analysis. The lyophilized samples were dissolved in 1000 µL of 100% MeOH and filtered through a 0.22 µm filter before injection into the chromatographic column. For each biological replicate we analyzed two analytical replicates, giving a total of 12 inflorescence and 10 leaf samples that were injected (SQLite database; Supplemental Data 3).
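The sample totals stated above follow directly from the replicate design. The following minimal Python sketch reproduces that arithmetic; the dictionary layout and the function name are illustrative assumptions and are not part of the original workflow.

```python
# Biological replicates per accession and tissue, as stated in the text.
biological_replicates = {
    ("Col-0", "inflorescence"): 3,
    ("Ws-3", "inflorescence"): 3,
    ("Col-0", "leaf"): 3,
    ("Ws-3", "leaf"): 2,  # only two biological replicates were available
}
ANALYTICAL_REPLICATES = 2  # injections per biological replicate


def injections_per_tissue(replicates, analytical):
    """Sum the injected samples for each tissue type."""
    totals = {}
    for (_accession, tissue), n_bio in replicates.items():
        totals[tissue] = totals.get(tissue, 0) + n_bio * analytical
    return totals


totals = injections_per_tissue(biological_replicates, ANALYTICAL_REPLICATES)
print(totals)  # {'inflorescence': 12, 'leaf': 10}
```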

Chromatography
Chromatographic separation was performed on an ACQUITY BEH C-18 column (2.1 × 50 mm i.d., 1.7 µm, Waters, Mexico) using an ACQUITY UPLC system (Waters Corp., Mexico), as previously described (Sotelo-Silveira et al., 2013). The column was maintained at 35 °C and eluted with a 30 min gradient. The mobile phase, at a flow rate of 0.2 mL/min, consisted of a starting mixture of solvents A:B (MeOH:H2O; 1:9; A: 100% MeOH; B: H2O + 0.1% formic acid). Solvent B was then decreased to 20% over 15 min. Solvent B was returned to its initial composition over 1 min, and the initial condition was maintained for 15 min in order to equilibrate the column. The volume of sample injected onto the column was 5 µL.

Mass Spectrometry
The eluent was introduced into the Q-Tof mass spectrometer (LCT Premier™ XE, Waters Corp., Mexico) by electrospray ionization, with capillary and cone voltages set in the positive ion mode to 3100 and 70 V, respectively, as previously described (Sotelo-Silveira et al., 2013). The desolvation gas was set to 850 L/h at a temperature of 350 °C for the positive mode. The cone gas was set to 10 L/h, and the source temperature to 80 °C for the positive mode. Continuum data were acquired from m/z 50-1000 using an accumulation time of 0.2 s per spectrum. All spectra were mass corrected in real time by reference to leucine enkephalin (2 µg/mL), infused at 5 µL/min through an independent reference electrospray. The resolution of the system was 11,000 for the positive mode.

Data Analysis
Waters LCT Premier™ XE *.raw data files were converted to the *.mzML community standard data format using ProteoWizard (Chambers et al., 2012) and processed with an OpenMS/TOPPAS pipeline (Sturm et al., 2008). A TOPPAS workflow containing the detailed parameters is provided as Supplemental Material (Supplemental Data 1). In short, the LC-MS features of each data set were detected with the FeatureFinderMetabo tool and subsequently merged to create a consensus map. The consensus features were exported to plain text format and manually analyzed using standard text processing and spreadsheet programs.
Only high-quality (HQ) features, which were quantified in all 12 evaluated inflorescence samples or all 10 evaluated leaf samples, respectively, were used for further data analyses. In total, 803 such HQ features were found for the inflorescence samples and 561 for the leaf samples. To identify the HQ features, a metabolite database (DB) for Arabidopsis was created from the KNApSAcK database (http://kanaya.naist.jp/knapsack_jsp/top.html) (Afendi et al., 2012) and experimental liquid chromatography-mass spectrometry (LC-MS) literature data. Automated DB generation and MS data matching were performed using SpiderMass (Winkler, 2015). The SpiderMass Meta-DB for Arabidopsis is provided as Supplemental Data 2. Mass spectrometry data processing was performed on the analysis platform MASSyPup (Winkler, 2014).
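The HQ criterion described above (a feature must be quantified in every sample of a tissue) can be expressed as a simple filter. The sketch below is only an illustration in Python under assumed data structures; in the study this step was carried out on exported consensus features with text processing and spreadsheet programs, and the feature identifiers and intensities shown here are made up.

```python
def select_hq_features(consensus, n_samples):
    """Keep consensus features that have an intensity in every sample.

    consensus: dict mapping a feature id to a list of intensities,
               one entry per sample, with None where no feature was detected.
    """
    return {
        feature_id: intensities
        for feature_id, intensities in consensus.items()
        if len(intensities) == n_samples
        and all(value is not None for value in intensities)
    }


# Hypothetical consensus map with three samples (values are made up).
toy_consensus = {
    "F001": [1520.0, 1490.5, 1610.2],
    "F002": [830.1, None, 799.4],   # missing in sample 2 -> dropped
    "F003": [None, None, 120.7],    # detected only once -> dropped
    "F004": [45.0, 51.3, 48.8],
}

hq = select_hq_features(toy_consensus, n_samples=3)
print(sorted(hq))  # ['F001', 'F004']
```

Applied to the real data, the same rule with n_samples set to 12 (inflorescences) or 10 (leaves) would be expected to yield the 803 and 561 HQ features reported above.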
Consensus features, HQ features and putative metabolite identifications with their compound classes were integrated into a SQLite (https://sqlite.org/) database, which is available as Supplemental Data 3. For statistical analysis we used the R script "MetabR" (Ernest et al., 2012), which calculates the fold-changes and p-values according to Tukey's Honest Significant Difference (HSD).
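As a rough illustration of what a fold-change between two groups means, the following Python sketch computes the log2 ratio of mean intensities for a single feature. It is an assumption-laden simplification: it does not reproduce the statistical model of MetabR or the Tukey HSD p-values, and the intensities are invented for the example.

```python
import math
from statistics import mean


def log2_fold_change(group_a, group_b):
    """log2 of the ratio of mean intensities (group_a over group_b)."""
    return math.log2(mean(group_a) / mean(group_b))


# Hypothetical intensities of one feature (made-up numbers).
col0 = [2000.0, 2100.0, 1900.0]
ws3 = [500.0, 520.0, 480.0]

lfc = log2_fold_change(col0, ws3)
print(round(lfc, 2))  # 2.0
```

A value of 2.0 on the log2 scale corresponds to a four-fold higher mean intensity in the first group.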
The R package and Graphical User Interface (GUI) "Rattle" (Williams, 2009, 2011) was employed to create and evaluate classification models for the metabolic data sets. Due to the characteristics of the data, i.e., relatively few samples but many numeric variables, we chose a Random Forest model (Williams, 1987, 1988) (Figure 3). For the model training sets, we only considered features present in all 22 LC-MS datasets. To avoid the necessity of imputing values, all features with missing data were omitted. We created a metavariable "Accession_Tissue," which describes the four possible combinations of our experiment and which was used as the target variable. In total, 16 datasets with 460 metabolic input variables were used for the model building. A total of 10,000 decision trees were calculated. The number of variables selected at each split was set to the square root of the number of variables (suggested default), which corresponded to 21 variables. The models were evaluated with five testing datasets that represented the four possible combinations of tissue type and accession. The Rattle project, which contains the final model and the metabolic feature data, is provided as Supplemental Data 4.
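Two of the settings described above can be checked with a few lines of code: the number of variables tried per split (the square root of 460, rounded down) and the four classes of the combined target variable. The sketch below uses Python rather than R/Rattle, and the exact label strings are an assumption made for illustration.

```python
import math

N_FEATURES = 460  # metabolic input variables reported in the text


def default_mtry(n_features):
    """Number of variables tried at each split: floor(sqrt(n_features))."""
    return math.isqrt(n_features)


def accession_tissue(accession, tissue):
    """Combine accession and tissue into a single class label (assumed format)."""
    return f"{accession}_{tissue}"


classes = sorted(
    accession_tissue(a, t)
    for a in ("Col-0", "Ws-3")
    for t in ("inflorescence", "leaf")
)

print(default_mtry(N_FEATURES))  # 21
print(len(classes))  # 4
```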

Selection of Accessions
A. thaliana has over 1000 natural accessions that have been collected from around the world (Alonso-Blanco and Koornneef, 2000;Gaut, 2012;Horton et al., 2012). Natural accessions are very variable in terms of shape, development, and physiology (Weigel, 2012). Plants of the commonly used laboratory strains (or accessions), Columbia (Col-0) and Wassilewskija (Ws-3), are distinguishable based on their morphology and development (Figure 1). Particularly, they show differences in rosette leaf development and flowering time. Col-0 plants produce more rosette leaves, have a longer duration of the leaf production period (i.e., they flower later), and have a final rosette leaf area significantly larger than Ws-3 (Massonnet et al., 2010) (Figure 1).

Distinct Metabolic Phenotypes Were Detected for Each Accession and Tissue
To assess the natural variation in metabolite content among Arabidopsis accessions in two different tissues, we performed UPLC-QTOF MS-based untargeted metabolic fingerprinting of crude acetone extracts from leaves and inflorescences collected and pooled from Col-0 and Ws-3. Notably, using an organic solvent favors an extraction toward hydrophobic compounds, which are under-represented in studies using polar solvent mixtures.
The metabolic profiles demonstrate considerable quantitative and qualitative differences between the tissues and accessions. More than 14,000 and 17,000 features from inflorescences and leaves, respectively, were detected in the two accessions (SQLite database; Supplemental Data 3). In total, 803 high quality features from inflorescences and 561 from leaves, which were quantified in all 12 evaluated inflorescence samples or all 10 evaluated leaf samples, respectively, were considered for identification by searching metabolite databases (Supplemental Data 2). In leaf samples, 222 high quality features presented significant differences (p ≤ 0.01), and in inflorescence samples 418 high quality features did (Figure 2). Among the features that presented significant differences, we could putatively identify 26 and 36 metabolites in leaf and inflorescence samples, respectively (Tables 1, 2).
To evaluate whether tissue types and their accessions can be identified based on their metabolic profiles, we created a predictive Random Forest Model (Figure 3). During the training of the model, an error rate of 12.5% was estimated. Applying the final model to a testing dataset (which was not involved in the model building) resulted in the Error Matrix shown in Table 3. All five test samples were identified correctly.
Consequently, the metabolic identity of both tissues and accessions is sufficiently distinct to allow for a reliable classification with a Random Forest Model using LC-MS data.
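The evaluation reported here amounts to comparing actual and predicted class labels. The following Python sketch shows how an error matrix and an error rate can be derived from such labels. The five test labels are hypothetical placeholders, since the real composition of the test set is given in Table 3 and is not reproduced here; the interpretation of the 12.5% training error as two of 16 samples is likewise an inference and not a statement from the original analysis.

```python
from collections import Counter


def error_matrix(actual, predicted):
    """Count (actual, predicted) label pairs."""
    return Counter(zip(actual, predicted))


def error_rate(actual, predicted):
    """Fraction of samples whose predicted label differs from the actual one."""
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)


# Hypothetical labels for five test samples (placeholders, not the real data).
actual = [
    "Col-0_leaf",
    "Ws-3_leaf",
    "Col-0_inflorescence",
    "Ws-3_inflorescence",
    "Col-0_inflorescence",
]
predicted = list(actual)  # all five classified correctly, as reported

print(error_rate(actual, predicted))  # 0.0

# An estimated error of 12.5% on 16 training samples would correspond to
# two misclassified samples, since 2 / 16 = 0.125.
print(2 / 16)  # 0.125
```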

Metabolites Differentially Accumulated Among Accessions and Tissues
To better understand the variation among tissues of the different accessions, we focused the analysis on putatively identified metabolites with significant differences that belong to one of the KEGG pathways (Kanehisa et al., 2014) of A. thaliana. With this criterion, putatively identified metabolites were classified into 9 classes (Tables 1, 2), and each class was analyzed to identify whether a conserved accumulation pattern among samples and accessions was present. We also searched for changes in the presence or difference in accumulation of specific metabolites in the different tissues and/or accessions, and these are described below for each class when pertinent.

Class 1: Phenylpropanoids, Monolignols, and Sinapate Derivatives
Four metabolites were found in leaf and inflorescence samples that were classified as belonging to class 1 (Tables 1, 2). Interestingly, qualitative and quantitative differences were found among tissues (Tables 1, 2). Furthermore, each accession has a different abundance of metabolites, reflected by the fold change in the intensity of each m/z. Col-0 leaves accumulated more m/z 311.1692 and 363.0737, putatively identified as Sinapine and Sinapoyl-(S)-malate, respectively, whereas Ws-3 leaf samples accumulated more m/z 195.0648 and 197.0803, putatively identified as Ferulic acid and 5-Hydroxyconiferyl alcohol, respectively (Table 1).
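Putative identifications of this kind rest on the agreement between an observed m/z and the theoretical m/z of a candidate compound. As a worked example, the Python sketch below compares the observed m/z 195.0648 with the theoretical value for Ferulic acid (C10H10O4), assuming a protonated ion [M+H]+ in positive mode. The adduct type is an assumption made for this illustration, and the matching tolerance used by SpiderMass is not stated in the text.

```python
# Monoisotopic masses in unified atomic mass units.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}
PROTON = 1.007276466879


def protonated_mz(formula):
    """Theoretical m/z of the singly charged [M+H]+ ion for an element count dict."""
    neutral = sum(MONOISOTOPIC[element] * n for element, n in formula.items())
    return neutral + PROTON


def ppm_error(observed, theoretical):
    """Relative mass deviation in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


ferulic_acid = {"C": 10, "H": 10, "O": 4}
theoretical = protonated_mz(ferulic_acid)
ppm = ppm_error(195.0648, theoretical)

print(f"{theoretical:.4f}")  # 195.0652
print(f"{ppm:.1f}")  # -2.0
```

Under these assumptions the observed value deviates from the theoretical one by roughly 2 ppm.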

Class 2: Prenol Lipids, Terpenoid Backbone Biosynthesis Mevalonate and MEP/DOXP Pathways
Six and nine metabolites found in leaf and inflorescence samples, respectively, belong to class 2. Interestingly, qualitative differences were identified among tissues, such as m/z 369.1222, which was present in leaf (Table 1) but not in inflorescence samples (Table 2). In contrast, m/z 251.0213, 265.1432, 287.1312, and 514.2115 were detected in inflorescence but not in leaf samples. Some of the metabolites were putatively identified as known hormones or as hormone precursors (Tables 1, 2).

Class 3: Biosynthesis of Phenylpropanoids, Biosynthesis of Flavonoids, Flavanones
Four metabolites detected in leaf and four in inflorescence samples belong to class 3. They were putatively identified as phenylpropanoids (Flavonols, Flavanones, Flavones, Anthocyanins, and Leucoanthocyanidins). Leaf and inflorescence samples showed distinctive metabolites, such as m/z 311.0458 and 595.1585, which were present only in leaves (Table 1), and m/z 595.1589 and 611.1567, which were detected only in inflorescences (Table 2).
These metabolites were differentially accumulated among accessions and tissues.

Class 4: Fatty Acids, Fatty Acyls, Octadecanoids, Jasmonic Acid
One and five metabolites were found in leaf and inflorescence samples, respectively, that belong to class 4. Col-0 leaf samples have more m/z 223.1695, putatively identified as Lauric acid (Table 1).

Class 5: Alkaloids
Only three metabolites with significant differences were found that belong to this class. Col-0 leaves accumulated more m/z 190.0039, putatively identified as Quinolic acid, whereas Ws-3 accumulated more m/z 199.0751, putatively identified as N-hydroxy tryptamine. The m/z 363.0392 was distinct in inflorescence samples and showed higher accumulation in Ws-3 than in the Col-0 accession.

Class 6: Glucosinolates Biosynthesis and Degradation
Three metabolites each were found in leaf and in inflorescence samples that belong to class 6.

Class 7: Amino Acids and Amino Acid Metabolism
Class 7 contains metabolites involved in the biosynthesis or metabolism of amino acids. Four and eight metabolites that belong to this class were identified in leaf and inflorescence samples, respectively. These metabolites were differentially accumulated among accessions and tissues.

Class 8: Carbohydrates
We putatively identified two metabolites that belong to this class and showed differential behavior among accessions. One of them was Sucrose (m/z 343.1212), which accumulated more in Col-0 than in Ws-3 leaves, but was more abundant in Ws-3 than in Col-0 inflorescences (Table 1). The second one was D-Ribulose 1,5-bisphosphate (m/z 328.9407), which was more abundant in Col-0 inflorescence samples.
In summary, metabolic profiling revealed distinct metabolic phenotypes for each accession and tissue. The metabolic phenotype included metabolites from at least 9 different classes. Figures 4, 5 show, as examples, some of the metabolic differences observed in leaf and inflorescence samples. Besides the potential use of these metabolites for the identification of accessions and tissues, some of them, such as sucrose, gibberellin A20, and D-Ribulose 1,5-bisphosphate, are interesting for further studies that could help to understand the morphological differences as well as the growth potential of the accessions.

There is evidence that different factors such as environment, tissue type, and plant age affect the outcomes of the genetic network controlling metabolism in Arabidopsis.
In our study we observed natural metabolic variation among two different tissues of the accessions Col-0 and Ws-3, which are commonly used in research. The mass profiles of each tissue of the different accessions presented quantitative and qualitative variation, allowing us to distinguish among these accessions and tissues in terms of their metabolic profiles.
We detected more than 14,000 and 17,000 peaks from inflorescences and leaves, respectively, in the two accessions. 222 high quality features presented significant differences (p ≤ 0.01) in leaf samples and 418 high quality features did in inflorescence samples (p ≤ 0.01). From these metabolites that presented significant differences we could putatively identify 26 and 36 metabolites in leaf and in inflorescence samples (Tables 1, 2), with 17 of those metabolites present in both tissues in the two accessions (Supplementary Table S1). Although many signals remain unidentified, we created a Random Forest Model, which permits the classification of both tissue and accession based on their metabolic fingerprint. The model is predictive and may be employed for the correct identification of otherwise indistinguishable plants.

Nine Metabolite Classes Were Differentially Accumulated in Different Accessions and Tissues
In this study we found quantitative variation in nine metabolite classes, indicating different metabolite compositions in each accession and tissue.
Several studies using A. thaliana natural accessions have shown that differential gene expression exists among the accessions. The most differentially expressed genes were related to the response to the biotic environment, including pathogen defense and the production of glucosinolates (West et al., 2006; Kliebenstein et al., 2006a,b; van Leeuwen et al., 2007; Gan et al., 2011). In agreement with these observations, our study showed quantitative and qualitative variation in metabolites related to pathogen defense.
Particularly, under our extraction conditions, we observed one distinct metabolite for leaves (m/z 235.0595; Table 1) and one for inflorescences (m/z 465.0834; Table 2). Other studies reported glucosinolate variation in leaves and seeds of Arabidopsis accessions (Kliebenstein et al., 2001b; Matsuda et al., 2010), and glucosinolates have been used to discriminate among some A. petraea populations (Davey et al., 2008). They have also been shown to be subject to genetic variation in Arabidopsis (Kliebenstein et al., 2001b).
Other metabolites, classified in our study as belonging to class 4, were differentially accumulated among tissues and accessions. Ws-3 inflorescences accumulated more (+)-Epijasmonic acid (m/z 233.1163; Table 2) than Col-0, while Col-0 inflorescences accumulated more Arabidopside B (m/z 825.4698; Table 2). Jasmonic acid and Methyl jasmonate play an essential role in plant defense responses, pollen development, and leaf growth control by repressing cell proliferation (Światek et al., 2004; Zhang and Turner, 2008; Chen et al., 2013; Noir et al., 2013). Arabidopside B seems to have an inhibitory effect on root growth and a possible role as a reservoir for the slow release of free OPDA, a Jasmonate precursor (Kourtchenko et al., 2007).
It was also interesting to note the differential accumulation of Lauric acid (m/z 223.1696) between the two accessions: Col-0 accumulated more in leaves, whereas Ws-3 accumulated more in inflorescences. It has been demonstrated that Lauric acid can be elongated and desaturated into Linolenic acid, which is then incorporated into Jasmonic acid and Methyl jasmonate (Afitlhile et al., 2004).
Phenylpropanoid pathway metabolites are also known for their protective roles (Buer et al., 2010;Fraser and Chapple, 2011). We observed a distinct pattern of accumulation of metabolites belonging to the flavonoid branch of this pathway among tissues and accessions, inflorescences being the samples that presented the most diversity in these compounds. Many of these compounds have also been considered as chemical messengers, physiological regulators and cell cycle inhibitors (Buer et al., 2010;Falcone Ferreyra et al., 2012). Furthermore, a distinct pattern of accumulation of lignin precursors was also identified here as well as differences in the content of Sinapate esters.
The rate of plant growth depends on a combination of photosynthetic carbon (C) assimilation rate and developmental programs that determine how rapidly metabolites are used for growth, although the molecular and genetic bases are not well understood (Sulpice et al., 2009). It has been reported that natural variation in the level of central metabolites exists (Loudet et al., 2003; Calenge et al., 2006) and that there are positive and negative correlations between these metabolites and biomass (Meyer et al., 2007; Sulpice et al., 2010). Metabolites that were negatively correlated with biomass were sucrose, glucose- and fructose-6-phosphate, which link carbon flow from photosynthesis and starch and sucrose metabolism with cell wall formation, the TCA cycle members citric acid, succinate, and malic acid, as well as the amino acids glutamine and phenylalanine (Meyer et al., 2007). In our study, we observed differences between accessions in sucrose content in leaf and inflorescence samples and in D-ribulose 1,5-bisphosphate content in inflorescence samples, as well as differences in the content of phenylalanine and phenylalanine-derived compounds that contribute to cell wall formation (ferulic acid, sinapine). Houshyani et al. (2012) observed significant differences among Arabidopsis accessions for some primary metabolites, e.g., fructose, 1-methyl-alpha-D-glucopyranoside, glucopyranose, sucrose, and L-glutamic acid. Metabolite QTLs were also associated with central metabolism, suggesting that differences in central carbon metabolism can exist among accessions (Rowe et al., 2008; Houshyani et al., 2012).
Plant hormones affect gene expression and transcription levels, promoting cellular division, growth, and differentiation, and directing developmental programs that determine how rapidly metabolites are used for growth (Alabadi et al., 2009; Sulpice et al., 2010). Paparelli et al. (2013) showed that plant size is determined by a mechanism in which carbohydrates produced by photosynthesis modulate the synthesis of gibberellins. In our study, we observed a different pattern of GAs and sucrose accumulation among accessions (Tables 1, 2 and Figures 4, 5), which could be further investigated to establish a correlation with the growth phenotypes.
Moreover, differences between accessions in the content of abscisic acid or intermediates in abscisic acid biosynthesis were found. Abscisic acid is well known for its effects on seed germination and flowering and for its role in plant responses to environmental stress and plant pathogens. Col-0 and Ws-3 differ in flowering time, which could eventually be explained by the differences found here in the content of GAs and ABA, hormones that jointly regulate this process. Recent studies showed that ABA potentially delays flowering under unstressed conditions but promotes it when plants are stressed (Finkelstein, 2013).
Further work must be done to investigate whether there is a correlation between the hormonal differences and the metabolic signatures found in each accession that could explain the morphological differences among accessions.

Concluding Remarks
In this work we found a distinct metabolic phenotype of each tissue and Arabidopsis accession studied. We found quantitative variation in nine metabolite classes, resulting in different compositions of metabolites in each accession and tissue. Moreover, a predictive Random Forest Model was made that is able to reliably classify tissue type and accession of samples based on LC-MS profiles.
The metabolite signatures of the accessions found in this work could set a basis for future studies to understand how these profiles correlate with their respective phenotypes, for example, by exploring their correlation with interesting developmental processes such as cell division, cell expansion, flowering time, and biomass production. Moreover, knowledge of natural metabolite diversity could help to direct plant breeding approaches.

Author Contributions
MS performed the sample preparations and AC performed chromatography and mass spectrometry experiments. MS, NM, RW, and SF conceived the project, designed the experiments, and analyzed the data. RW performed data processing, Random Forest Model development, and statistical analysis. MS, NM, RW, and SF drafted the manuscript. All authors read and approved the final manuscript.