Ultra Performance Liquid Chromatography and High Resolution Mass Spectrometry for the Analysis of Plant Lipids

Holistic analysis of lipids is becoming increasingly popular in the life sciences. Recently, several interesting mass spectrometry-based studies have been conducted, especially in plant biology. However, while great advancements have been made, we are still far from detecting all the lipid species in an organism. In this study we developed an ultra performance liquid chromatography-based method using a high-resolution, accurate-mass mass spectrometer for the comprehensive profiling of more than 260 polar and non-polar Arabidopsis thaliana leaf lipids. The method is fully compatible with the commonly used lipid extraction protocols and provides a viable alternative to the commonly used direct infusion-based shotgun lipidomics approaches. The whole process is described in detail and compared to alternative lipidomic approaches. In addition to the developed method we also introduce an in-house developed database search software (GoBioSpace), which allows one to perform targeted or un-targeted lipidomic and metabolomic analysis on mass spectrometric data of every kind.


INTRODUCTION
Holistic analysis of a cellular metabolome, the complement of all small molecules within a cell (Oliver et al., 1998), is still quite complicated due to the huge complexity and the large chemical heterogeneity of all the contained molecules. Besides the polar compounds, like sugars and amino and organic acids, there are also a large number of non-polar (water insoluble) compounds which need to be analyzed. The high complexity and chemical diversity, but also the huge difference in the molar abundance of these compounds, explain why up to now no single analytical platform has been developed that is able to detect and quantify all of these compounds in a single analysis (Oldiges et al., 2007). As a consequence, different sample extraction and fractionation methods have been developed which allow a rough separation of the metabolites into less complex and more homogeneous fractions before their analysis (Vuckovic et al., 2010). One functionally and chemically distinct metabolic fraction that can be efficiently separated from crude extracts contains the water insoluble, generally hydrophobic lipids.
Lipids have essential functions for all living cells, not only because they are the building blocks of the membranes, which enclose the cell and the internal organelles (Van Meer et al., 2008), but also because they function as energy storage or signaling molecules (Downes and Currie, 1998; Spiegel and Milstien, 2003; Wenk, 2005; Wymann and Schneiter, 2008). It is therefore not surprising that a completely new branch of metabolomics, namely the field of lipidomics, has emerged and made great advances within the last few years (Dennis, 2009; Blanksby and Mitchell, 2010; Wenk, 2010; Harkewicz and Dennis, 2011). Lipids, which are often defined by their inability to dissolve in water, still cover a broad spectrum of diverse substances ranging from slightly polar lipids [e.g., glycosylated sphingolipids (Merrill et al., 2009)] to highly non-polar lipids [e.g., triacylglycerol (Kuksis, 2007)]. Estimates of lipid numbers within eukaryotic cells range from a few hundred to several thousand lipid species (Dennis, 2009), indicating the expected high complexity. To structure this complexity and to generate a uniform nomenclature for the known lipids, a general classification and nomenclature system was required. The publicly funded LIPID MAPS Consortium (Fahy et al., 2005, 2009) provided a new definition system, which is mostly based on the biosynthetic origin of the different lipids and not only on the solubility of the compound. Lipids are therefore now defined as hydrophobic or amphipathic small molecules, which originate from carbanion-based condensation of thioesters or from carbocation-based condensation of isoprene units (Fahy et al., 2005).
This new definition is not only more precise than the old water insolubility-based definition, but it also allows the commonly known lipids to be classified into homogeneous functional subclasses: namely the fatty acids, glycerolipids, glycerophospholipids, sphingolipids, sterols, prenols, saccharolipids, and polyketides (Fahy et al., 2005).
The fact that no single analytical technology has allowed the identification and quantification of all metabolite species in a single experiment is also true for the analysis of all the different lipids from a cell (Wenk, 2010). Historically, lipids have been analyzed by diverse chromatography-based separation methods (Bausch, 1993). Commonly used technologies comprised methods like one or two dimensional thin layer chromatography in combination with different visualization strategies (Touchstone, 1995), but also high performance liquid chromatography (HPLC) methods in combination with various detection systems (Picchioni et al., 1996). Even though these methods have proven useful for many purposes, their limitations for large scale quantitative lipid analysis have become more evident (Blanksby and Mitchell, 2010). As a consequence, mass spectrometry (MS)-based methods, with or without chromatographic separation techniques, have evolved to fill this technological gap (Welti et al., 2007b; Griffiths and Wang, 2009; Blanksby and Mitchell, 2010; Wenk, 2010; Harkewicz and Dennis, 2011).
There are many different MS instruments available, which can be combined with an even larger number of separation systems (Griffiths and Wang, 2009; Wenk, 2010). Still, only two main strategies for the analysis of lipids have been used in most of the described reports. On the one hand, there is the most successfully used method, namely shotgun lipidomics, which relies on a separation-free (direct infusion) analysis of a crude lipid extract on triple quadrupole (QqQ) or quadrupole time-of-flight (qTOF) mass spectrometers (Welti and Wang, 2004; Ejsing et al., 2009; Yang et al., 2009). On the other hand, there is chromatography-based separation prior to the mass spectrometric measurement, which has been used only in a small number of studies thus far (Markham and Jaworski, 2007; Rainville et al., 2007; Glauser et al., 2008b; Nakanishi et al., 2009; Nygren et al., 2011). Both methods have their advantages and disadvantages: for example, the shotgun approach is prone to strong ion suppression effects, which can in part be compensated for by large sample dilutions or by the use of internal reference compounds (Moore et al., 2007). While the chromatography-based methods are less sensitive to these suppression effects, due to the chromatographic separation (Muller et al., 2002; Annesley, 2003), these approaches were thus far unsuitable for absolute lipid quantification (Stahlman et al., 2009).
In the field of plant metabolomics both technologies have found their applications: while the polar glycerolipids have been widely analyzed by the shotgun lipidomic approach (Welti et al., 2007b; Zhang et al., 2009; Kilaru et al., 2010), sphingolipids have been most successfully analyzed by targeted LC-MS-based approaches (Markham et al., 2006; Markham and Jaworski, 2007; Chen et al., 2008). Still, since most of these studies made use of highly sensitive, but low resolution mass spectrometers, they were mostly performed in a targeted way, by simply profiling a limited number of known lipid species (Lu et al., 2008).
In this report we describe a versatile and reproducible ultra performance liquid chromatography (UPLC)-based separation system, coupled to a high resolution mass spectrometer operating in MS as well as all-ion fragmentation mode. The developed system allows for the accurate qualitative and semi-quantitative targeted analysis of several hundred different lipid species extracted from a single plant sample. Additionally, due to the combination of chromatography, high resolution MS, and all-ion MS/MS, the method allows the data to be revisited long after the actual measurement, so that novel structures can be extracted and possibly elucidated (Harkewicz and Dennis, 2011). For the actual data mining we introduce a novel database search (GoBioSpace), which allows one to perform either targeted or un-targeted database searches with the acquired lipid data.

PLANT GROWTH
The Arabidopsis thaliana Col-0 plants used for the metabolite extraction were grown in a light and temperature controlled phytotron under constant CO2 conditions using a BioBox growth chamber (GMS Gaswechsel-Messsysteme GmbH, Berlin, Germany). The plant material preparation and the experimental settings for the BioBox were as previously described (Huege et al., 2007). Plant growth in the BioBox was performed for 42 days. The aerial parts of the plants were separated from the roots by cutting, and immediately snap frozen in liquid nitrogen.

LIPID EXTRACTION PROTOCOL
Lipids were extracted from three independent biological replicates of Arabidopsis thaliana leaves. In brief: 50 mg of frozen leaf tissue was homogenized in a 2 ml Eppendorf tube (Eppendorf, Hamburg, Germany) twice for 1 min at maximum speed within a Retsch mill (MM 301, Retsch, Düsseldorf, Germany). The lipids were extracted from each aliquot using 1 ml of a precooled (−20°C) homogeneous methanol:methyl-tert-butyl-ether (1:3) mixture, spiked with 0.1 μg/ml PE 34:0 (17:0, 17:0) and PC 34:0 (17:0, 17:0) as internal standards. For the extraction, the samples were incubated for 10 min in a shaker at 4°C (Thermostat Plus, Eppendorf), followed by another 10 min incubation in an ultrasonication bath at RT. After adding 500 μl of UPLC grade water:methanol (3:1), the homogenate was vortexed and centrifuged for 5 min at 4°C in a table top centrifuge (Eppendorf). The addition of water:methanol leads to a phase separation producing an upper organic phase, containing the lipids, and a lower phase containing the polar and semi-polar metabolites. The upper organic phase was removed, dried in a speed-vac concentrator, and stored at −80°C until used.

UPLC-FT-MS MEASUREMENT OF LIPIDS
The dried lipid extracts were re-suspended in 500 μl buffer B (see below) and transferred to a glass vial. Two microliters of this sample were injected on a C8 reversed phase column (100 mm × 2.1 mm, 1.7 μm particles, Waters), using a Waters Acquity UPLC system. The two mobile phases were water (UPLC MS grade, BioSolve) with 1% 1 M NH4Ac, 0.1% acetic acid (Buffer A), and acetonitrile:isopropanol (7:3, UPLC grade, BioSolve) containing 1% 1 M NH4Ac, 0.1% acetic acid (Buffer B). The gradient separation, which was performed at a flow rate of 400 μl/min, was: 1 min 45% A, 3 min linear gradient from 45% A to 35% A, 8 min linear gradient from 25 to 11% A, 3 min linear gradient from 11% A to 1% A. After washing the column for 3 min with 1% A the buffer was set back to 45% A and the column was re-equilibrated for 4 min (22 min total run time).
The mass spectra were acquired using an Exactive mass spectrometer (Thermo-Fisher, Bremen, Germany). The spectra were recorded using alternating full scan and all-ion fragmentation scan mode, covering a mass range from 100-1500 m/z. The resolution was set to 10,000 with 10 scans per second, restricting the Orbitrap loading time to a maximum of 100 ms with a target value of 1E6 ions. The capillary voltage was set to 3 kV with a sheath gas flow value of 60 and an auxiliary gas flow of 35. The capillary temperature was set to 150°C, while the drying gas in the heated electrospray source was set to 350°C. The skimmer voltage was held at 25 V while the tube lens was set to a value of 130 V. The spectra were recorded from min 1 to min 20 of the UPLC gradients.
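The gradient timetable above can be sanity-checked with a few lines of Python. This is only a sketch: the step list is transcribed from the text as written, and the variable names are illustrative rather than part of any instrument software. Summing the step durations reproduces the stated 22 min total run time.

```python
# Gradient steps as (description, duration in minutes), transcribed from the text.
GRADIENT_STEPS = [
    ("isocratic, 45% A", 1),
    ("linear gradient, 45% A to 35% A", 3),
    ("linear gradient, 25% A to 11% A", 8),
    ("linear gradient, 11% A to 1% A", 3),
    ("column wash, 1% A", 3),
    ("re-equilibration, 45% A", 4),
]

FLOW_RATE_UL_PER_MIN = 400  # flow rate stated in the text

# Total run time is the sum of all step durations.
total_run_time_min = sum(duration for _, duration in GRADIENT_STEPS)

# Mobile phase consumed per injection, converted from microliters to milliliters.
solvent_per_run_ml = total_run_time_min * FLOW_RATE_UL_PER_MIN / 1000

print(f"total run time: {total_run_time_min} min")
print(f"mobile phase consumed per run: {solvent_per_run_ml} ml")
```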

MANUAL AND AUTOMATED PEAK EXTRACTION AND ALIGNMENT
Chromatograms from the UPLC-FT-MS runs were analyzed and processed either by using Xcalibur (Version 2.10, Thermo-Fisher, Bremen, Germany), ToxID (Version 2.1.1, Thermo-Fisher), or automatically with the Refiner MS® software (Version 6.0, Genedata, Basel, Switzerland). In the automated approach the molecular masses, retention times, and associated peak intensities for the three replicates of each sample were extracted from the raw files, which contained the full scan MS and the all-ion fragmentation MS data. The full scan spectra and the all-ion fragmentation spectra were processed separately. Chemical noise was automatically removed from the spectra before the chromatograms were aligned using a pairwise alignment tree algorithm (Refiner MS 6.0).
Further peak filtering on the manually extracted spectra or the aligned data matrices was performed in Excel or Access (Microsoft, Seattle, WA, USA).

GOBIOSPACE DATABASE
Because the masses measured in the mass spectrometer are almost directly connected to the elemental composition of a measured analyte, once the addition or loss of a substructure is taken into account (so-called adducts, e.g., [M + H]+ protonation, [M − H]− de-protonation, [M + NH4]+ ammonium, [M + Na]+ sodium, or [M + Ca]+ calcium adduct), GoBioSpace (Golm Biochemical Space) was conceptualized as a repository of elemental compositions with source tagged annotations for properties such as InChI strings, CAS numbers, IUPAC names, synonyms, cross references or KEGG Pathway names, among others.
The source of an annotation, the so-called depositor, serves as a filter for the biological relevance of elemental compositions. The meaningful interpretation of search results in a biological context is accomplished by a targeted search limiting the formula to biology related depositors such as KEGG and BioCyc, among others. In contrast, searches that are relaxed with regard to the formula's depositor (i.e., including those elemental compositions only reported from vendors of potentially synthesized chemicals) result in search hits with lower biological interpretability.
To date, we have collected more than 366 million items of meta information for 2.1 million unique elemental compositions from more than 150 publicly available databases (143 included in PubChem), such as the chemically focused databases PubChem Substance and ChemSpider or biologically focused databases such as the Human Metabolome Database and Metabolome.JP, into the GoBioSpace repository. Our approach also facilitates the search against putative elemental compositions such as described for lipids in the chapter "Targeting Specific Lipids within the Total Ion Chromatogram: Pick What You Know." For high resolution mass search queries, the accurate isotopic masses for either ambient 12C or fully isotopically labeled 13C, 15N, and 34S formulae were calculated according to Böhlke et al. (2005). An indexed view in the database allows the single step matching of measured masses to elemental compositions, tolerating a given mass error and considering user defined sets of expected analytical adducts and depositors to correct the measured masses. In addition, the client side search application supports the restriction of elemental composition hits based on atom number constraints.
To make the mass search functionality accessible to the community, we implemented a Web Service within the Golm Metabolome Database (GMD; Kopka et al., 2005; Hummel et al., 2010) and integrated this web service into a graphical user interface, which is available at http://gmd.mpimp-golm.mpg.de/GoBioSpace.aspx. Here, elemental compositions and individual or batched (tabulator formatted text files) masses can easily be configured and searched against databases of interest. The matching results are returned as browsable and sortable tables, which can be exported for further analysis as tabular formatted text files. In addition, the web services can be integrated for non-commercial use into any data processing pipeline. All software is implemented using the Microsoft .NET 4.0 framework, the C# language, and Microsoft Visual Studio® 2010. The data back end is based on a Microsoft® SQL Server® 2005.
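The matching logic described above (adduct correction, mass error tolerance, depositor filter, atom number constraint) can be sketched as follows. This is a minimal illustration in Python and not the GoBioSpace implementation, which the text describes as a SQL Server indexed view with a .NET client. The names `Composition`, `search`, and `TOY_LIBRARY` are hypothetical, the depositor tags in the toy library are placeholders, and the adduct shifts are standard atomic masses corrected for the electron mass.

```python
from dataclasses import dataclass

# Mass shift (Da) from neutral monoisotopic mass M to the singly charged ion.
ADDUCT_SHIFTS = {
    "[M+H]+": 1.00727645,
    "[M-H]-": -1.00727645,
    "[M+NH4]+": 18.03382554,
    "[M+Na]+": 22.98922070,
}

@dataclass(frozen=True)
class Composition:
    formula: str
    monoisotopic_mass: float  # neutral mass in Da
    depositors: frozenset     # sources listing this composition (hypothetical tags)
    carbon_count: int

def search(mz, compositions, adducts, ppm, allowed_depositors=None, max_carbons=None):
    """Return (formula, adduct, error_ppm) for each composition matching a measured m/z."""
    hits = []
    for adduct in adducts:
        for comp in compositions:
            theoretical_mz = comp.monoisotopic_mass + ADDUCT_SHIFTS[adduct]
            error_ppm = (mz - theoretical_mz) / theoretical_mz * 1e6
            if abs(error_ppm) > ppm:
                continue  # outside the tolerated mass error
            if allowed_depositors is not None and not (comp.depositors & allowed_depositors):
                continue  # not reported by any accepted depositor
            if max_carbons is not None and comp.carbon_count > max_carbons:
                continue  # violates the atom number constraint
            hits.append((comp.formula, adduct, round(error_ppm, 2)))
    return hits

# Toy library: two sugars with illustrative depositor tags, for demonstration only.
TOY_LIBRARY = [
    Composition("C6H12O6", 180.063388, frozenset({"KEGG"}), 6),
    Composition("C12H22O11", 342.116212, frozenset({"vendor catalogue"}), 12),
]

biological_hits = search(203.0526, TOY_LIBRARY, ["[M+H]+", "[M+Na]+"],
                         ppm=5, allowed_depositors={"KEGG"})
print(biological_hits)
```

Restricting `allowed_depositors` mirrors the targeted, biology-oriented search; passing `None` corresponds to the relaxed search that also returns vendor-only compositions.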

UPLC-FT/MS-BASED SEPARATION AND MEASUREMENT OF CRUDE ARABIDOPSIS LIPID EXTRACTS
Arabidopsis thaliana lipids were extracted using a buffer system containing methyl-tert-butyl-ether instead of chloroform as the organic solvent (Matyash et al., 2008). This extraction protocol enabled us not only to extract the lipids with a higher efficiency, but also to extract lipids, polar and semi-polar metabolites, starch, and proteins from a single sample (Giavalisco et al., 2011). The extracted lipids were analyzed on a C8 reversed phase UPLC column, using 1.7 μm particles (Rainville et al., 2007), in a 22 min method. Both steps, the extraction as well as the chromatographic separation, are simple and high-throughput compatible methods, and are applicable to several different plants but also to other, non-photosynthetic organisms like, e.g., yeast, Drosophila, C. elegans, or mammalian tissue (data not shown).
All mass spectrometric measurements were performed on a standalone high resolution Orbitrap (Exactive) mass spectrometer (Lu et al., 2010), coupled to an ultra performance liquid chromatography system. This "smaller" version of an Orbitrap (lacking the linear ion trap in front of the Orbitrap analyzer), which does not cost more than a QqQ mass spectrometer, still matches all the demands of a high resolution mass spectrometer [fast scanning (up to 10 Hz), high resolution (up to 100,000 R), and accurate mass (<2 ppm)]. The combination of these attributes therefore allows one not only to distinguish compounds with very similar masses, but also to directly annotate elemental compositions, without a need for a reference compound, based on the measured accurate masses (Giavalisco et al., 2008; Xu et al., 2010).
Each lipid extract was separated and measured twice, once in positive ionization (Figure 1A) and once in negative ionization mode (Figure 1B). The reason for this duplicated measurement can be easily seen by looking at the two chromatograms, as they appear quite different. The explanation for this difference comes from the chemical nature of the detected lipid species (Devaiah et al., 2006). Even though all of these lipids are constructed from a small number of building blocks (a glycerol backbone linked to a number of fatty acids), their general mass spectrometric behavior is controlled by the chemical property of their class-specific head group (Yang et al., 2009). Accordingly, even though most of these lipids ionize in both ionization modes, they do have a clear bias for a specific adduct and, as a consequence, a specific polarity (Table 1).
The correct adduct annotation is of particular importance, especially for lipids where different adducts might have very similar (or even identical) masses. One example of such a case is given in Figure 2 for a phosphatidylserine (PS) and a phosphatidylglycerol (PG) lipid. The protonated PS 34:2 is only 0.02 ppm different from the ammonium adduct of PG 34:4, which means that for the mass of 760.51385 ± 5 ppm we get two lipid peaks from our positive mode spectrum. Looking at the adduct patterns of the spectra (including the negative ion mode spectra) helps to solve this annotation dilemma for the two compounds: only the peak with a retention time of 7.17 min pairs with a sister peak at a mass difference of 17.02, which indicates that this peak is the ammonium adduct of PG 34:4, while the peak at RT 7.97 min can be annotated as PS 34:2.
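The PS/PG ambiguity can be reproduced numerically. The sketch below is an illustration under stated assumptions: the neutral elemental compositions C40H74NO10P (PS 34:2) and C40H71O10P (PG 34:4) are not given in the text and are assumed here from the usual diacyl structures, and the atomic masses are standard monoisotopic values. Under these assumptions the two ions share one elemental composition, so their calculated m/z values coincide, and the ammonium and protonated ions of the same lipid are separated by the mass of NH3.

```python
# Monoisotopic atomic masses in Da (standard values).
ATOM_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307400,
             "O": 15.99491462, "P": 30.97376200}
ELECTRON = 0.00054858

def monoisotopic_mass(composition):
    """Sum the atomic masses of a composition given as {element: count}."""
    return sum(ATOM_MASS[element] * count for element, count in composition.items())

# Assumed neutral elemental compositions (not stated in the text).
PS_34_2 = {"C": 40, "H": 74, "N": 1, "O": 10, "P": 1}
PG_34_4 = {"C": 40, "H": 71, "O": 10, "P": 1}

# Mass added by the charge carrier: a proton, or an ammonium ion.
proton = ATOM_MASS["H"] - ELECTRON
ammonium = ATOM_MASS["N"] + 4 * ATOM_MASS["H"] - ELECTRON

ps_protonated = monoisotopic_mass(PS_34_2) + proton
pg_ammoniated = monoisotopic_mass(PG_34_4) + ammonium
pg_protonated = monoisotopic_mass(PG_34_4) + proton

print(f"[PS 34:2 + H]+   = {ps_protonated:.5f}")
print(f"[PG 34:4 + NH4]+ = {pg_ammoniated:.5f}")
print(f"[M+NH4]+ minus [M+H]+ for PG 34:4 = {pg_ammoniated - pg_protonated:.5f}")
```

Because the accurate mass alone cannot separate the two candidates, the sister peak spacing and the retention time carry the annotation, as described in the paragraph above.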

TARGETING SPECIFIC LIPIDS WITHIN THE TOTAL ION CHROMATOGRAM: PICK WHAT YOU KNOW
In almost all cases lipidomics studies performed in the plant field were conducted in a targeted way, meaning that a number (a few dozen to several hundred) of expected lipid species were profiled (Devaiah et al., 2006; Markham and Jaworski, 2007). To validate our system, we decided to profile the lipids from these previously conducted studies by selectively extracting the expected masses from our chromatograms. In total we prepared a target list containing 332 different lipid species [168 sphingolipids (Markham and Jaworski, 2007), 147 phosphoglycero- or galactolipids (Devaiah et al., 2006), and 17 oxylipin species (Buseman et al., 2006)], which were detected in three independent studies, using three different extraction protocols and three different types of mass spectrometers. As illustrated in Figure 3A, we conducted the peak extraction by simply extracting each single mass associated with a specific lipid and relatively quantified the intensity of the different adducts from each chromatogram (Table S1 in Supplementary Material). In the same way it is also possible to extract several masses, belonging to different lipids within a specific lipid class or from different classes, and quantitatively compare them to each other in parallel (e.g., the whole PC 36:1-6 and PC 34:1-6 series is displayed in Figure 3B).
By manually extracting the masses from the chromatograms we matched 187 of the 332 different lipids, including 127 of the 147 previously described phospho-, lysophospho-, and galactolipids (Devaiah et al., 2006), all 17 previously described oxylipins (Buseman et al., 2006), and 43 of the 168 possible sphingolipids (Markham and Jaworski, 2007). Compared to the excellent coverage of lipid species from the phosphoglycero- and galactolipids, the results achieved for the sphingolipids were less comprehensive, only covering the most abundant lipid species from the Markham and Jaworski (2007) study. This indicated that we did not have a general loss of sphingolipids in our method, but rather a sensitivity problem, which can often be observed if ion trap-like mass spectrometers are compared to QqQ-type mass spectrometers (McLuckey and Wells, 2001). Additionally, we noticed that the sample preparation method used in the sphingolipid study was highly sophisticated and specifically tailored to this lipid class, including a depletion step for the highly abundant phospholipids, which will lead to a higher detection sensitivity due to strongly decreased ion suppression effects (Markham and Jaworski, 2007).
Taken together, we can conclude that we do see most of the expected lipid species in our samples, and most of them with several different ion species (different adducts). The data for these initially extracted and validated lipid species are collected in Table S1 in Supplementary Material.
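The mass extraction step described above amounts to building an extracted ion chromatogram (XIC) for each target mass. The following Python sketch shows the idea under simplifying assumptions: scans are represented as plain lists of (m/z, intensity) pairs, the function names are hypothetical, and the three scans are synthetic values chosen only to exercise the code, not measured data. The target mass and the ±5 ppm window are the ones quoted earlier in the text.

```python
def ppm_window(mz, ppm):
    """Return the (low, high) m/z bounds of a symmetric ppm window."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta

def extract_ion_chromatogram(scans, target_mz, ppm):
    """scans: list of (rt_min, [(mz, intensity), ...]).

    Returns [(rt_min, summed intensity inside the ppm window), ...].
    """
    low, high = ppm_window(target_mz, ppm)
    return [
        (rt, sum(intensity for mz, intensity in peaks if low <= mz <= high))
        for rt, peaks in scans
    ]

def peak_apex(xic, rt_min, rt_max):
    """Return (rt, intensity) of the most intense XIC point inside an RT window."""
    in_window = [(rt, intensity) for rt, intensity in xic if rt_min <= rt <= rt_max]
    return max(in_window, key=lambda point: point[1]) if in_window else None

# Synthetic scans for demonstration only (not measured data).
toy_scans = [
    (7.10, [(760.5125, 1000.0), (778.5389, 500.0)]),
    (7.17, [(760.5130, 5000.0)]),
    (7.25, [(760.5200, 900.0)]),  # falls outside the 5 ppm window
]

xic = extract_ion_chromatogram(toy_scans, target_mz=760.51385, ppm=5)
apex = peak_apex(xic, rt_min=7.0, rt_max=7.3)
print(xic)
print(apex)
```

Repeating the extraction for every adduct mass of a lipid gives the per-adduct intensities that are compared in the relative quantification.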

SYSTEMATIC DISTRIBUTION OF RETENTION TIME AND MASS AIDS TO VALIDATE THE ANNOTATION OF THE MEASURED LIPIDS
Confidence in the annotation of a measured compound increases with the number of parameters this compound shares with related compounds. Since lipids are constructed as modular molecules (Fahy et al., 2009; Yang et al., 2009), which usually vary only slightly between the different species within a lipid class (extension of the fatty acid chain length or the degree of saturation), they have a very systematic mass and retention time behavior (Hermansson et al., 2005). Therefore, both these parameters allow the validation of lipids within a specific class by simply plotting the m/z and RT values of the most abundant adduct of each measured species in a scatter plot. As can be seen in Figure 4 (scatter plot for the measured PCs from Table S1 in Supplementary Material), lipids with longer fatty acid chains have a higher mass and increased retention time, while fatty acids with higher degrees of un-saturation result in lipids with lower masses and decreased retention times. As a consequence, diagonal series appear within the plots. These contain lipid species with the same number of carbon atoms in the fatty acid chains but show a decreasing number of double bonds from left to right (Figure 4). Wrongly annotated or unusually distributed lipids can be easily detected within these patterns since they appear as dots outside the systematic scatter pattern. A curious and unexplained example is given for the PCs with 40 carbons in the two fatty acid chains (Figure 4). Even though these lipids are systematically distributed by themselves, it is evident from the plot that they do not match the distribution of the other, shorter fatty acid chain lipids in this lipid class. PC 40:2, for example, which would be predicted to have a later elution time than PC 38:2, actually elutes almost a minute earlier than its shorter chain classmate (Figure 4). This could indicate that the PC 40:X lipids have either been annotated wrongly or that there is a systematic shift in these longer fatty acid chain lipids.

FIGURE 1 | [...] shows the mass spectrum from the apex of the MGDG 34:6 peak with the retention time of 7.08 min and its associated ionization adducts. (B) As above, but here the TIC and the spectrum of the negative ion mode measurements are shown.

Next to the exclusion of possibly wrongly annotated lipids, the scatter plot representation allows one to also quickly detect missing lipid species within a systematic series. In this case one or several dots would be missing within the diagonal line. In Figure 4 we can see for example that we could not detect PC 38:1. Even rechecking the spectra at the expected retention time did not allow us to detect the expected peak.
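The visual checks described for the scatter plot can also be expressed as two simple rules. The sketch below is one possible formalization, not the procedure used in the study, which relied on inspecting the plot: it flags pairs of species that break the expected reversed phase trend (more carbons at equal double bond number should elute later) and lists gaps in a series. The function names are hypothetical and the retention times are illustrative placeholders arranged to mimic the PC 40:2 and PC 38:1 observations, not values from Table S1.

```python
def find_rt_outliers(species):
    """species: dict mapping (carbons, double_bonds) -> retention time in min.

    Returns sorted pairs (shorter, longer) where the longer-chain species
    with the same number of double bonds does not elute later.
    """
    outliers = []
    for (c1, db1), rt1 in species.items():
        for (c2, db2), rt2 in species.items():
            if db1 == db2 and c2 > c1 and rt2 <= rt1:
                outliers.append(((c1, db1), (c2, db2)))
    return sorted(outliers)

def missing_species(species, carbons, double_bond_range):
    """List (carbons, double_bonds) combinations absent from a series."""
    return [(carbons, db) for db in double_bond_range if (carbons, db) not in species]

# Illustrative retention times in minutes (placeholders, not measured values).
toy_pc = {
    (34, 2): 8.6,
    (36, 2): 9.4,
    (38, 2): 10.2,
    (38, 3): 9.8,
    (40, 2): 9.3,  # elutes earlier than PC 38:2, as described in the text
}

print(find_rt_outliers(toy_pc))
print(missing_species(toy_pc, carbons=38, double_bond_range=range(1, 4)))
```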

ALL-ION FRAGMENTATION DATA FOR THE LIPID ANNOTATION VALIDATION
Using high resolution accurate mass data is in many cases sufficient to predict the elemental composition of a measured peak (Giavalisco et al., 2009). Still, the accuracy and probability of a correct annotation are increased if, along with the accurate mass of the intact molecule (precursor), an additional mass of a compound-specific fragment can be detected. The masses of the intact precursor and of one or several fragments are the essential values for peak identification in shotgun lipidomic analysis. The occurrence of these specific fragment ions results from either a specific loss of a charged molecule (e.g., choline head group from PC lipids) or from the loss of an uncharged fragment (neutral loss). This technique can also be used on LC-MS-based systems in non-shotgun lipidomic studies, but only if fragmentation mass spectra are recorded.

Frontiers in Plant Science | Plant Physiology
The main advantage of high pressure sub 2 μm particle UPLC systems, compared to conventional, lower pressure, larger particle HPLC systems, is their fast, sensitive, and highly reproducible chromatography (Plumb et al., 2004). The faster chromatography and the smaller peak width, which is a consequence of the higher plate number achieved in the UPLC system, turn into a disadvantage when the number of scans per unit time of the mass spectrometer is too low to perform the survey full scans and data-dependent MS/MS measurements of the most abundant peaks (Schmitt-Kopplin et al., 2008). The FT-MS instrument used in this study, which has a scan speed of up to 10 Hz at a resolution of 10,000, can partially circumvent this problem, but even 10 scans/s are not enough to perform classical data-dependent MS/MS analysis of several eluting masses while recording sufficient information for good peak integration, especially if the eluting peaks are only 3-6 s wide (Figure 3A). The solution to this problem, which was originally developed and implemented under the name MS^E as a scan method for qTOF mass spectrometers (Bateman et al., 2007), simply relies on the fragmentation of all precursor ions measured in the full scan instead of selecting individual masses. This approach has been used successfully in a proteomic study on the Exactive MS, where it was called all-ion fragmentation (Geiger et al., 2010). Figure 5A illustrates the measurement method used for our lipidomic analysis, showing that we constantly alternate between low energy full scans and high energy all-ion fragmentation scans throughout the whole chromatographic separation. The advantage of this procedure is that two independent MS data sets are generated: one contains the intact mass information for all the compounds eluting during the chromatographic separation, while the second contains the fragmentation data for the same compounds.
To integrate these data and to validate a predicted lipid it is only necessary to connect the elution profile of a full scan (low energy) mass to the similarly eluting masses from the all-ion MS/MS (high energy) spectra. In Figure 5B this procedure is illustrated for PC 36:6. As can be seen, three fragment masses (m/z 184.07381, m/z 500.31598, and m/z 518.32513) within the mass spectra between 7.2 and 7.8 min co-elute exactly with the phosphocholine lipid (m/z 778.53894) and should therefore be associated with it. Another two masses (m/z 728.52446 and m/z 573.48822), which elute close by, show clearly different elution profiles and can therefore be excluded from association with PC 36:6, indicating that they represent different lipids.
The systematic analysis of these all-ion MS/MS spectra therefore allows us to uncover a number of lipid specific fragments, which can be used to validate a specific lipid species, e.g., the masses m/z 500.31598 and m/z 518.32513, which are specific fragments of PC 36:6 (Figure 5B). We can also find class-specific fragments, like m/z 184.07381, which is the positively charged choline fragment that can be detected for all phosphocholine lipids.
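The scan budget behind this argument is simple arithmetic. The sketch below assumes, as a simplification not stated in the text, that the 10 scans per second are shared evenly between the two alternating scan modes; the function name is illustrative.

```python
SCAN_RATE_HZ = 10        # maximum scan speed stated in the text
N_ALTERNATING_MODES = 2  # full scan and all-ion fragmentation scan

def scans_per_peak(peak_width_s, scan_rate_hz=SCAN_RATE_HZ, n_modes=N_ALTERNATING_MODES):
    """Data points recorded per scan mode across one chromatographic peak."""
    return peak_width_s * scan_rate_hz / n_modes

for width_s in (3, 6):
    print(f"{width_s} s peak: {scans_per_peak(width_s):.0f} points per scan mode")
```

Under this assumption a 3-6 s wide peak is still sampled many times in each mode, whereas every additional data-dependent MS/MS event inserted per cycle would reduce the number of full scan points available for peak integration.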
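Linking fragments to a precursor by their shared elution profile can be approximated by correlating extracted ion chromatograms. The following sketch uses a Pearson correlation with a threshold of 0.95; both the choice of correlation and the threshold are assumptions made for illustration, since the text describes a comparison of elution profiles without naming a metric. The intensity traces are synthetic, and only the m/z labels are taken from the PC 36:6 example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long intensity traces."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(var_x * var_y)

def coeluting_fragments(precursor_xic, fragment_xics, min_r=0.95):
    """Return the fragment m/z values whose elution profile tracks the precursor."""
    return [mz for mz, xic in fragment_xics.items()
            if pearson(precursor_xic, xic) >= min_r]

# Synthetic intensity traces sampled at the same time points (illustration only).
precursor_778 = [0, 10, 50, 100, 50, 10, 0]
fragments = {
    184.07381: [0, 4, 20, 40, 20, 4, 0],     # same shape, lower intensity
    573.48822: [0, 0, 0, 10, 50, 100, 50],   # apex shifted to a later time
}

print(coeluting_fragments(precursor_778, fragments))
```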

AUTOMATED LIPID ANNOTATION STRATEGIES
The strategy presented for the analysis of lipids thus far still requires a high manual input, especially for the validation of the lipid annotation. Of course this is only true if a novel sample (a new organism or a new tissue) is analyzed. Once a sample is annotated and no major changes in the extraction procedure or the chromatographic separation are introduced, the following lipid profiles can be simply matched to the results of the initially performed peak annotation.
The chromatographic and the spectral compatibility between different samples, namely the retention time and the spectral intensities, is achieved by using the two internal standards (PE 34:0 and PC 34:0), which we spiked into the extraction buffer. Increasing the number of internal standards might be useful in the long run if the retention time system needs to be converted into a retention index system, which would possibly allow one to match lipids not only within a single experiment, but also between different experiments.

FIGURE 5 | Ultra performance liquid chromatography-MS measurement strategy employed for the lipid analysis in this study: (A) Illustration of the high and low energy alternation for the acquisition of full scan and all-ion MS/MS spectra. (B) Extracted ion chromatograms of the indicated masses (derived either from the high or low energy mass spectra) from a representative positive ion mode UPLC chromatogram. Peaks with the same elution profile can be regarded as co-eluting masses, which are derived from the same precursor molecule. Differentially eluting peaks have to be regarded as different compounds, requiring different annotations.

After having annotated the initial expected lipids from a novel matrix the data analysis can be automated by using one of the two different strategies depicted in Figure 6. The main distinction between the two approaches lies in the fact that one strategy directly targets only the peaks of interest by selectively extracting the masses of lipids of interest at specific retention times from the generated chromatograms (left part of Figure 6), while the second strategy relies on a slightly different approach. Here, all the peaks from the chromatograms are extracted and aligned into a data matrix before matching these peaks to the m/z and RT values of an annotated peak list (right part of Figure 6). The result in both cases should be almost identical. The major difference between the two approaches lies in the fact that in the first approach only annotated peaks can be used for the analysis, while the second approach allows for the further use of an un-annotated matrix, derived from the peak picking software, providing the basis for fully un-targeted lipidomics.
FIGURE 6 | Automated, software-assisted strategies for targeted and un-targeted lipid profiling. On the left hand side a purely targeted strategy is depicted, where, based on a target list, several chromatograms are searched for the occurrence of specific m/z values and retention times. If a peak is found within certain tolerance boundaries, its intensity is written to a result table. The right hand side shows a different strategy. Here all peaks are extracted first and written into an un-annotated data matrix. This matrix can then be compared against a target list (the same as for the first strategy) or used for the statistical analysis of significantly differential peaks, which then have to be annotated.
For targeted peak picking (left part of Figure 6), software is usually provided by the vendor of the mass spectrometer. This software is used by uploading a target list containing the name, the m/z, and the RT of the peaks of interest. The target list is then used to query the chromatograms generated during the analysis. The output of such a search is a list in which every peak of interest is associated with the compound name, the measured m/z and RT, and an intensity value, which reflects the relative amount of the compound within the sample. For the analysis of Exactive or other Thermo-Fisher MS data two options are available: either a processing method can be generated [which has to be entered compound by compound within Xcalibur (Thermo-Fisher, Bremen, Germany)], or, if the ToxID software package (Thermo-Fisher) is used, a comma-separated text file can be employed for the targeted analysis of the lipidomic data.
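The logic of such a targeted query can be sketched in a few lines of Python. This is only a minimal illustration and not the vendor software: the data layout (a list of centroided `(rt, mz, intensity)` points), the function name `extract_targets`, the tolerance values, and the example entries are all assumptions made for the sketch.

```python
from typing import NamedTuple

class Target(NamedTuple):
    name: str
    mz: float   # expected m/z of the ion
    rt: float   # expected retention time in minutes

def extract_targets(points, targets, ppm_tol=5.0, rt_tol=0.1):
    """Return {target name: apex intensity} for targets found in one chromatogram.

    points: iterable of (rt, mz, intensity) tuples (hypothetical centroided data).
    A target is reported only if at least one point lies inside both the
    m/z window (in ppm) and the retention time window (in minutes).
    """
    results = {}
    for t in targets:
        mz_window = t.mz * ppm_tol / 1e6
        hits = [
            inten
            for rt, mz, inten in points
            if abs(mz - t.mz) <= mz_window and abs(rt - t.rt) <= rt_tol
        ]
        if hits:
            results[t.name] = max(hits)  # apex intensity represents the peak
    return results

# Illustrative values only (not measured data from this study).
targets = [Target("lipid A", 762.6007, 10.50), Target("lipid B", 720.5538, 9.80)]
points = [
    (10.45, 762.6010, 4.0e5),
    (10.50, 762.6005, 9.0e5),
    (10.55, 762.6012, 3.5e5),
    (12.00, 762.6008, 7.0e5),  # right mass, wrong retention time
]
print(extract_targets(points, targets))  # {'lipid A': 900000.0}
```

Targets without any point inside both windows are simply absent from the result, which mirrors the behavior described above: only peaks found within the tolerance boundaries are written to the result table.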
For targeted as well as un-targeted data analysis (right part of Figure 6), peak picking and matrix alignment of all peaks is necessary first. Several commercial and open source software packages are available for this purpose (Katajamaa et al., 2006; Smith et al., 2006; Katajamaa and Oresic, 2007; Benton et al., 2008; Lommen, 2009; Pluskal et al., 2010). Once the initial, un-annotated matrix has been generated by a suitable software package, it can be further filtered and compared to the previously generated reference lists.
Usually a matrix from Arabidopsis leaf tissue contains 30,000 or more reproducible peaks which are above a minimal threshold of 10,000 counts (data not shown). The difference in dimensions between the target list and the global matrix already indicates that even though we are mining a significant portion of lipids from these samples (200-300 lipid species, Tables S1 and S2 in Supplementary Material), the majority of the detectable peaks remains un-annotated.
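The comparison of an aligned peak matrix with a reference list can be sketched as follows. The dictionary layout of the peak table, the function name `annotate_matrix`, and the tolerance values are assumptions for illustration; only the intensity threshold of 10,000 counts is taken from the text. Peak picking and alignment themselves are left to the packages cited above.

```python
def annotate_matrix(peaks, reference, ppm_tol=5.0, rt_tol=0.1, min_intensity=10_000):
    """Split an aligned peak table into annotated and un-annotated peaks.

    peaks: list of dicts with keys "mz", "rt", "intensity" (hypothetical layout).
    reference: list of (name, mz, rt) tuples, i.e. the annotated target list.
    Peaks below min_intensity are discarded before matching.
    """
    annotated, unannotated = [], []
    for peak in peaks:
        if peak["intensity"] < min_intensity:
            continue
        names = [
            name
            for name, mz, rt in reference
            if abs(peak["mz"] - mz) <= mz * ppm_tol / 1e6
            and abs(peak["rt"] - rt) <= rt_tol
        ]
        if names:
            annotated.append({**peak, "names": names})
        else:
            unannotated.append(peak)
    return annotated, unannotated

# Illustrative values only.
reference = [("lipid A", 762.6007, 10.50)]
peaks = [
    {"mz": 762.6010, "rt": 10.52, "intensity": 9.0e5},  # matches lipid A
    {"mz": 815.5301, "rt": 11.30, "intensity": 2.0e5},  # no reference entry
    {"mz": 762.6006, "rt": 10.49, "intensity": 5.0e3},  # below the threshold
]
annotated, unannotated = annotate_matrix(peaks, reference)
print(len(annotated), len(unannotated))  # 1 1
```

Keeping the un-annotated peaks in a separate list, rather than dropping them, is what makes the later un-targeted analysis possible.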

GOBIOSPACE: A DATABASE SEARCH INTERFACE FOR MASS SPECTROMETRIC DATA
As shown in Figure 6, the un-targeted global matrix, which contains all the extractable peaks from the recorded mass spectra, can be compared against a reference list of annotated compounds. The size and the content of these lists can vary significantly: one can use the reference list generated in this study (Table S1 in Supplementary Material) or other, more comprehensive custom-made lists. Furthermore, public and commercial databases such as Lipid Maps (Fahy et al., 2005, 2009), KNApSAcK (Shinbo et al., 2006), KEGG (Kanehisa et al., 2008), PubChem, or ChemSpider (Williams, 2008) can be employed for even more comprehensive or more specific database searches. The problem with these comparisons is that not all of these databases are easily accessible, and even if they are, it still requires experience, personal effort, and appropriate tools to compile them into a suitable resource. For this purpose we decided to develop a distributed client-server application with a graphical user interface, which supports the matching of measured masses to elemental compositions deposited in a relational database, and to make this tool publicly available. We named this software GoBioSpace (for Golm Biochemical Space). It can be installed on desktop computers running Microsoft Windows XP Service Pack 3 or later using ClickOnce deployment. The database server is accessed directly in-house using ADO.NET, while internet users fall back to web services based on the Web Services Description Language (WSDL; W3C, 2001).
The main functionality of GoBioSpace is to compare measured masses from mass spectrometric measurements, including all kinds of mass spectrometric data (high as well as lower mass accuracy), against a single or several databases (see Materials and Methods). As illustrated in Figure 7, the workflow for the data analysis is simple: a single mass or an elemental composition, but also a list of masses or formulas (tab-delimited text file), can be loaded into the software and searched against a single or several databases (at the moment more than 150 public databases are hosted, including the whole PubChem collection). Prior to the database search a number of parameters have to be specified, including the possible adducts of the measured mass, the mass accuracy of the entered data, and finally a selection of elements expected to be contained in the matching compounds. The database search itself (the in-house version) is quite fast and easily processes 2,000 searches per second, meaning that even a large list containing 30,000 peaks is processed within 15 s. However, owing to the additional protocol layers that use XML (eXtensible Markup Language) and HTTP (Hypertext Transfer Protocol) for data encapsulation and transport over the internet, we expect the performance of the internet version to fall below this value, depending also on the final capacity of the web and database servers. The result list, which is again a tab-delimited text file, contains all the information from the input table (measured m/z, RT, and intensity of the measured peaks), supplemented with the possible elemental composition of the measured mass, the adduct used to match measured and calculated mass, the database the hit was derived from, one or several compound name(s) if specified within the selected databases, and the mass error between the measured mass and the matched hit.
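The principle of matching a measured mass against elemental compositions can be illustrated with a short Python sketch. It is not the GoBioSpace implementation: the function names, the choice of three positive-mode adducts, the 5 ppm tolerance, and the two-entry toy "database" are assumptions made purely for illustration.

```python
import re

# Monoisotopic masses (u) of the elements used in this sketch.
ELEMENT_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
                "O": 15.9949146, "P": 30.9737620}

# Mass shifts of a few common singly charged adducts (assumed selection).
ADDUCT_SHIFT = {"[M+H]+": 1.007276, "[M+NH4]+": 18.033823, "[M+Na]+": 22.989218}

def monoisotopic_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C42H84NO8P'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += ELEMENT_MASS[element] * (int(count) if count else 1)
    return mass

def search(measured_mz, formulas, adducts=ADDUCT_SHIFT, ppm_tol=5.0):
    """Return (formula, adduct, ppm error) for every match within tolerance."""
    hits = []
    for formula in formulas:
        neutral = monoisotopic_mass(formula)
        for adduct, shift in adducts.items():
            theoretical = neutral + shift
            ppm = (measured_mz - theoretical) / theoretical * 1e6
            if abs(ppm) <= ppm_tol:
                hits.append((formula, adduct, round(ppm, 2)))
    return hits

# Toy "database" of two elemental compositions (illustrative only).
database = ["C42H84NO8P", "C39H78NO8P"]
hits = search(762.6010, database)
print(hits)
```

Each hit carries the elemental composition, the adduct that links measured and calculated mass, and the mass error, which are three of the output columns described above. Widening `ppm_tol` or adding adducts increases the number of candidate hits per measured mass, so these parameters should reflect the actual accuracy of the instrument.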
This database search initially resulted in a list of more than 4,000 hits for the positive mode spectra and 1,500 hits for the negative mode spectra. After filtering for the correct adducts (Table 1) and for the expected retention times of the lipids within their lipid classes (Table S1 in Supplementary Material), we annotated, still very conservatively, 577 distinct peaks, which correspond to 265 unique elemental compositions (Tables S1 and S2 in Supplementary Material). Still, the number of hits within this already highly targeted database search suggests that the data-set contains many more compounds awaiting a proper annotation.
To provide an overview and to visualize the annotated data, we mapped all the annotated lipids from Table S2 in Supplementary Material into a scatter plot (Figure A1 in Appendix) and show the different lipid classes and their distribution within the positive mode UPLC chromatogram (Figure 8).

PROS AND CONS OF DIFFERENT LIPIDOMIC STRATEGIES
The most common approach for systematic lipid profiling is still the well-established shotgun lipidomic approach (Welti et al., 2007b; Yang et al., 2009), which was conceptually developed more than 15 years ago (Han and Gross, 1994). Consequently, several publications are available (including comprehensive plant studies) that either directly made use of the QqQ approach (Welti and Wang, 2004; Devaiah et al., 2006; Welti et al., 2007b) or modified it for use on different mass spectrometers such as the qTOF (Ekroos et al., 2002; Ejsing et al., 2006; Esch et al., 2007) or the Orbitrap (Yang et al., 2007). As a consequence, different commercial and open source software packages were developed to make use of this kind of data (Ejsing et al., 2006; Graessler et al., 2009; Yang et al., 2009; Herzog et al., 2011).
The development and application of LC-MS lipidomics, especially in the plant field, seem to be less popular, even though a number of groups have developed open source software packages for these applications (Haimi et al., 2006, 2009; Taguchi and Ishikawa, 2010; Nygren et al., 2011). The lack of absolute quantification, or rather the lack of control over ion suppression in LC-MS-based lipidomic studies, and the increased analytical complexity seem to be the main reasons for this discrepancy (Stahlman et al., 2009).
FIGURE 7 | Overview with screenshots of the GoBioSpace-assisted database search procedure. The workflow is separated into three steps: data input (single mass, mass list, or formula), specification of search criteria (databases, expected mass adducts, mass error tolerance, expected elements, and isotope label), and data output.
Ion suppression in shotgun lipidomic studies cannot be eliminated, even if lipid class-specific internal standards are used. The function of these internal standards is basically to correct for the differential suppression effects on each measured lipid molecule (Stahlman et al., 2009; Yang et al., 2009). Making use of mixtures of internal standards [in the best case one or two standard lipids per lipid class (Welti and Wang, 2004)] in LC-MS-based lipidomic studies could be possible if these mixtures are spiked into the eluting sample post-column, using a second pump and a T-connection. Such an on-line LC-MS approach using continuously infused internal standards at low concentrations, which has not been demonstrated yet, would be an excellent compromise between complicated and time-consuming off-line sample pre-fractionation (Stahlman et al., 2009) and the use of strongly ion-suppressed shotgun lipidomics. Our system could therefore provide an excellent test case for such an approach.
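The correction by lipid class-specific internal standards amounts to a simple ratio, sketched below in Python. The function name, the data layout, and all numbers are hypothetical, and the sketch assumes that analyte and standard of the same class respond linearly and are suppressed to the same degree, which is exactly the assumption that has to be verified in practice.

```python
def normalize_to_standard(intensities, lipid_class, standards, spiked_amount):
    """Convert raw intensities to amounts relative to class-specific standards.

    intensities: {lipid name: raw intensity}
    lipid_class: {lipid name: class label, e.g. "PC"}
    standards: {class label: measured intensity of that class's internal standard}
    spiked_amount: {class label: amount of standard added (arbitrary units)}
    Lipids whose class has no usable standard are skipped rather than guessed.
    """
    normalized = {}
    for name, intensity in intensities.items():
        cls = lipid_class[name]
        if cls not in standards or standards[cls] <= 0:
            continue
        normalized[name] = intensity / standards[cls] * spiked_amount[cls]
    return normalized

# Illustrative values only.
intensities = {"PC 34:2": 4.0e6, "PE 36:4": 1.0e6, "MGDG 36:6": 8.0e6}
lipid_class = {"PC 34:2": "PC", "PE 36:4": "PE", "MGDG 36:6": "MGDG"}
standards = {"PC": 2.0e6, "PE": 5.0e5}    # no MGDG standard in this example
spiked_amount = {"PC": 1.0, "PE": 1.0}
print(normalize_to_standard(intensities, lipid_class, standards, spiked_amount))
# {'PC 34:2': 2.0, 'PE 36:4': 2.0}
```

In this toy example the two lipids have a fourfold difference in raw intensity but the same amount after normalization, which shows why class-specific standards matter when ionization efficiency differs between classes.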
Alternatively, fully labeled metabolomes or lipidomes (Ekroos et al., 2002; Hegeman et al., 2007; Giavalisco et al., 2008, 2009) could be used to quantify and annotate lipids in LC-MS-based studies. For this purpose, analytical samples are spiked with the same amount of the isotope-labeled matrix (Giavalisco et al., 2009). This approach, which has been tested by us (data not shown), is of course more complicated and expensive than post-column spiking with a handful of reference compounds, but next to the relative quantification it will also allow the reliable annotation of previously unknown compounds (Giavalisco et al., 2008, 2009).

ANNOTATING LIPIDS WITH DIFFERENT STRATEGIES: HOW MANY LIPIDS REMAIN UN-ANNOTATED?
One of the biggest differences between targeted and un-targeted lipid analysis lies in the fact that, even though 150 profiled and quantified lipids enable a meaningful analysis of an organism, many unidentified peaks remain to be annotated before we can really call it a lipidomic analysis. Our own data already show that of the 30,000 extractable peaks "only" 577 were annotated to a compound by using a targeted approach (Table S2 in Supplementary Material). Increasing the size of the employed databases would directly provide a larger number of possible annotations, but, depending on the size of the database used, this comes at the price of also annotating more false positives (Matsuda et al., 2009). Here the use of additional, orthogonal physico-chemical properties can strengthen the validation of the recorded data. While fragmentation data greatly help to exclude false positives, retention time information also improves the reliability of an annotation, which strongly argues in favor of LC-MS-based lipidomics (Figures 4 and 8).
Another advantage of LC-MS-based lipidomics in combination with global, un-targeted peak extraction is that the whole data-set of 30,000 peaks can be analyzed statistically prior to peak annotation. As a consequence, only the differential peaks would be regarded as potentially interesting and therefore subjected to more sophisticated peak annotation strategies. These strategies could include isotope labeling (see above) or preparative analytical techniques, including peak collection from the chromatographic run and subsequent analysis using higher order MS/MS, analysis on a high resolution mass spectrometer (Schwudke et al., 2007), or other orthogonal analytical techniques such as NMR.
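A minimal sketch of such a pre-annotation filter is given below. It uses Welch's t statistic on replicate intensities; the function names, the fixed cutoff, and the example values are assumptions for illustration, and a real analysis would derive p-values and correct for the roughly 30,000 simultaneous tests.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two groups of replicate intensities."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Identical replicates in both groups: no variance to scale by.
        return 0.0 if mean(a) == mean(b) else float("inf")
    return (mean(a) - mean(b)) / se

def differential_peaks(matrix, t_cutoff=4.0):
    """Return ids of peaks whose |t| exceeds the cutoff.

    matrix: {peak id: (control replicates, treatment replicates)}.
    The cutoff is an arbitrary placeholder, not a recommended value.
    """
    selected = []
    for peak_id, (control, treatment) in matrix.items():
        if abs(welch_t(control, treatment)) > t_cutoff:
            selected.append(peak_id)
    return selected

# Illustrative values only.
matrix = {
    "peak_0001": ([100.0, 110.0, 90.0], [300.0, 320.0, 310.0]),
    "peak_0002": ([200.0, 210.0, 190.0], [205.0, 195.0, 215.0]),
}
print(differential_peaks(matrix))  # ['peak_0001']
```

Only the peaks returned by such a filter would then be passed on to the more laborious annotation strategies.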

COME BACK LATER: REVISITING OLD SPECTRA WITH NEW KNOWLEDGE
High resolution full scan and all-ion fragmentation spectra containing thousands of peaks are not only a rich source of biological information for a "one-pass" analysis but could serve as a repository of information, which can be reused with new knowledge repeatedly.
We demonstrated in our study that the use of targeted data, derived from a limited number of plant lipidomic studies (Buseman et al., 2006; Devaiah et al., 2006; Esch et al., 2007; Markham and Jaworski, 2007; Welti et al., 2007a,b; et al., 2008a,b), allowed us to profile and annotate more than 260 lipid species. Extending the list of targets by annotating novel lipid species, or simply by checking the literature for previously un-targeted lipids such as N-acyl phosphatidylethanolamines (NAPE) and more complex sphingolipids (Welti and Wang, 2004) or tetra galactolipids (Moreau et al., 2008), will increase the number of lipids that can be profiled. This includes the retrospective profiling of old data. In the future, more knowledge about thus far unidentified lipid moieties will therefore allow us to annotate and profile more and more lipid species without rerunning all of our old experiments, since we can simply revisit and reexamine our old high resolution chromatograms. This cannot be done using shotgun lipidomics with highly sensitive, but low resolution mass spectrometers.