Lipidomics of Environmental Microbial Communities. I: Visualization of Component Distributions Using Untargeted Analysis of High-Resolution Mass Spectrometry Data

Lipids, as one of the main building blocks of cells, can provide valuable information on microorganisms in the environment. Traditionally, gas or liquid chromatography coupled to mass spectrometry (MS) has been used to analyze environmental lipids. The resulting spectra were then processed through individual peak identification and comparison with previously published mass spectra. Here, we present an untargeted analysis of MS1 spectral data generated by ultra-high-pressure liquid chromatography coupled with high-resolution mass spectrometry of environmental microbial communities. Rather than attempting to relate each mass spectrum to a specific compound, we have treated each mass spectrum as a component, which can be clustered together with other components based on similarity in their abundance depth profiles through the water column. We present this untargeted data visualization method on lipids of suspended particles from the water column of the Black Sea, which included >14,000 components. These components form clusters that correspond with distinct microbial communities driven by the highly stratified water column. The clusters include both known and unknown compounds, predominantly lipids, demonstrating the value of this rapid approach to visualize component distributions and identify novel lipid biomarkers.


INTRODUCTION
Lipids are the main building blocks of microorganisms and occur ubiquitously in the environment. A large number of lipids are synthesized by many different genera and orders of microbes but some lipids are unique to specific organisms or groups of organisms or to specific biogeochemical processes (e.g., Koga et al., 1998;Cvejic et al., 2000;Sinninghe Damsté et al., 2002b;Belt et al., 2007;Rossel et al., 2008;Bauersachs et al., 2009;Elling et al., 2017;Rush and Sinninghe Damsté, 2017;Longo et al., 2018;Gutiérrez et al., 2020), and hence serve as biomarker lipids. Furthermore, intact polar lipids (IPLs) are thought to degrade rapidly upon cell death indicating recent activity of microbial cells (White et al., 1979;Harvey et al., 1986). Therefore, analysis of IPLs and other lipids can, complementary to microbiological and molecular methods, provide valuable information on the diversity and activity of microorganisms in the environment. Early biomarker lipid studies of the microbial composition in the environment utilized gas chromatography (GC) often coupled to mass spectrometry (MS) to detect lipid biomarker such as steroids (Gaskell and Eglinton, 1976;Summons et al., 1992), and in particular phospholipid derived fatty acids (PLFAs) (White et al., 1979). In recent decades, the range of biomarker lipids utilized in environmental studies has been extended to include direct analysis of larger IPLs using liquid chromatography (LC) coupled to MS (e.g., Sturt et al., 2004).
Traditionally, the range of IPLs detected in the environment using LC-MS are constrained through individual peak identification and comparison with previously published mass spectra or standards (Sturt et al., 2004;Ertefai et al., 2008;Lipp and Hinrichs, 2009;Schubotz et al., 2009Schubotz et al., , 2018Van Mooy and Fredricks, 2010;Popendorf et al., 2011;Brandsma et al., 2012;Gibson et al., 2013;Bale et al., 2016;Ding et al., 2020). This generally results in the detection only of dominant (groups of) known IPLs. With the advent of ultra-high-pressure liquid chromatography coupled with multistage high-resolution mass spectrometry (UHPLC-HRMS n ) the richness of the mass spectral data has significantly increased, leading to the identification of many new lipid compounds (e.g., Liu et al., 2012a,b;Moore et al., 2013Moore et al., , 2016Zhu et al., 2014a,b;Yoshinaga et al., 2015;Sollai et al., 2018;Bale et al., 2019b), but this identification is selective and tedious and could be sped up by using computational approaches. Furthermore, compounds with unknown mass spectra may be ignored even though they could be indicative of the abundance of specific microbes.
Recently, a number of studies have applied lipidomic data handling strategies in marine environments (Hunter et al., 2015;Collins et al., 2016;Cutignano et al., 2016;Becker et al., 2018a;Li et al., 2018;Saleem et al., 2019;Zeng et al., 2019). These studies describe various methods for lipidomic workflows such as highthroughput annotation and identification of lipids in HRMS data and statistical analysis of the outputted lipid data. Some of which have taken an untargeted approach, rather than attempting to relate each mass spectrum to a certain compound, they extract all spectral components (sometimes denoted as "features") (Pluskal et al., 2010(Pluskal et al., , 2020. The composition of spectral components can then be used to, e.g., define specific environmental niches. Here, we perform an untargeted analysis of the lipidome of the Black Sea. We extract components based on high resolution MS 1 spectra from UHPLC-HRMS analysis of lipid extracts of suspended particles and use a statistical approach to compare the individual abundance depth profiles of all components. These depth profiles are clustered by their similarity to one another, without bias toward known or abundant components. We compare these findings with previous studies, which have examined the lipidome of the Black Sea water column using traditional IPL identification (e.g., Neretin et al., 2007;Wakeham et al., 2007;Schubotz et al., 2009;Sollai et al., 2018). Our rapid visualization method lays a foundation for the study of Ding et al. (2021), reported in parallel to this study, which used information theory framework combined with molecular networking to investigate lipid diversity and specificity as well as identify novel lipids.

Sampling and Environmental Setting
Suspended particulate matter (SPM) at various water depth in the water column ( Table 1) was collected during two cruises in 2013 and 2017 in the Black Sea. The 2013 Black Sea SPM was collected from 50 to 2000 meter below sea level (mbsl; see Table 1 for depths) during the PHOXY cruise (June 2013) aboard of the R/V Pelagia (Kraal et al., 2017;Sollai et al., 2018). The sampling station (PHOX2) was located at 42 • 53.8 N, 30 • 40.7 E in the western gyre of the Black Sea. The 2017 Black Sea SPM (50-2000 m, Table 1) was collected during the 64PE418 cruise (March 2017) also aboard of the R/V Pelagia. For the latter, the sampling station (Station "4") was located at 42 • 46.9 N, 29 • 21.1 E. Because the two SPM profiles (2013 and 2017) were collected from different stations (∼60 nautical miles apart, although both in the center of the western gyre of the Black Sea), they represent two distinct sample sets, both temporally and spatially.
Suspended particulate matter was collected on pre-ashed filters mounted on McLane WTS-LV in situ pumps (McLane Laboratories Inc., Falmouth, United Kingdom). In 2013 142mm-diameter 0.7-µm pore size glass fiber GF/F filters (Pall Corporation) were used and in 2017 they were 0.3 µm GF75 filters (Advantec). Upon the recovery of the in situ pumps on the deck of the ship, the filters were immediately stored at −80 • C until extraction. Physical parameters of the water column were recorded with a Sea-Bird SBE911C conductivitytemperature-depth (CTD) system equipped with a 24 × 12 L Niskin bottle rosette sampler. For the methods used for the measurement of dissolved oxygen (O 2 ) and inorganic nutrients see Sollai et al. (2018).

SPM Extraction and LC-MS Analysis
Lyophilized filters were extracted using a modified Bligh-Dyer procedure (similar to that described in Sturt et al., 2004) in batches of three alongside an extraction blank, consisting of pre-ashed glass fiber filter. Briefly, the cut-up filters were twice extracted ultrasonically for 10 min in a mixture of methanol, dichloromethane, and phosphate buffer (2:1:0.8, v:v) and the combined supernatants were phase-separated by adding additional dichloromethane and buffer to a final solvent ratio of 1:1:0.9 (v:v). The organic phase containing the IPLs was collected and the aqueous phase re-extracted three times with dichloromethane. All steps of the extraction were then repeated on the residue but with a mixture of methanol, dichloromethane, and aqueous trichloroacetic acid solution (TCA) pH 3 (2:1:0.8,  Sollai et al. (2018).
v:v). Finally, the organic extracts were combined and dried under a stream of N 2 gas. Before analysis the extract was redissolved in a mixture of MeOH:DCM (9:1, v:v) which contained two internal standards (IS), a platelet activating factor (PAF) standard (1-O-hexadecyl-2-acetyl-snglycero-3-phosphocholine) and a deuterated betaine lipid {1,2-dipalmitoyl-sn-glycero-3-O-4 -[N,N,N-trimethyl(d9)]-homoserine; Avanti Lipids}. Aliquots were filtered through 0.45 µm regenerated cellulose syringe filters (4 mm diameter; Grace Alltech). Extraction blanks were performed alongside the SPM extractions, using the same glassware, solvents and extraction methodology, but with no glass fiber or SPM material. Analysis of SPM extracts was carried out using UHPLC-HRMS according to the reversed phase method of Wörmer et al. (2013) with the following modifications. We used an Agilent 1290 Infinity I UHPLC, equipped with thermostatted autoinjector and column oven, coupled to a Q Exactive Orbitrap MS with Ion Max source with heated electrospray ionization (HESI) probe (Thermo Fisher Scientific). Separation was achieved on an Acquity BEH C18 column (Waters, 2.1 × 150 mm, 1.7 µm) maintained at 30 • C.  [50:50:0.12:0.04 (v:v)]. The elution program was: 95% A for 3 min, followed by a linear gradient to 40% A at 12 min and then to 0% A at 50 min, this was maintained until 80 min. The flow rate was 0.2 mL min −1 . Positive ion HESI settings were: capillary temperature, 300 • C; sheath gas (N 2 ) pressure, 40 arbitrary units (AU); auxiliary gas (N 2 ) pressure, 10 AU; spray voltage, 4.5 kV; probe heater temperature, 50 • C; S-lens 70 V. Lipids were analyzed with a mass range of m/z 350-2000 (resolving power 70,000 ppm at m/z 200), followed by data-dependent tandem MS/MS (resolving power 17,500 ppm), in which the 10 most abundant masses in the mass spectrum were fragmented successively (stepped normalized collision energy 15, 22.5, 30; isolation width, 1.0 m/z). The Q Exactive was calibrated within a mass accuracy range of 1 ppm using the Thermo Scientific Pierce LTQ Velos ESI Positive Ion Calibration Solution. During analysis dynamic exclusion was used to temporarily exclude masses (for 6 s) in order to allow selection of less abundant ions for MS/MS.
It should be noted that IPL species have diverse degrees of ionization efficiencies (Van Mooy and Fredricks, 2010) and hence the peak areas, in response units, of different components do not necessarily reflect their actual relative abundance. However, this method allows for comparison between samples when analyzed together.

Data Processing
Each SPM extract was analyzed in duplicate by UHPLC-HRMS, in sequence with extraction blanks (Bligh-Dyer extraction without SPM material) and analytical blanks (solvent injections on system between extract analyses). Extractions blanks were included in the subsequent data processing scheme, while analytical blanks were used for routine quality control checks but were not included in further data processing. For the Black Sea 2013 series, there were 30 analyses of SPM extracts (15 samples in duplicate) and 8 analyses of extraction blanks (4 extraction blanks in duplicate), totaling 38 analyses. In order to provide relevant information about background compounds and contamination, all extraction blanks were carried out using approximately the same solvent volumes as for the samples and extracts and were injected in the same injection volume. For the Black Sea 2017 series, there were 22 analyses of SPM extracts (11 samples in duplicate) and 8 analyses of extraction blanks (4 blanks in duplicate), totaling 30 analyses. For each UHPLC-HRMS analysis, the raw data files was converted to an.mzXML file using MSconvert of the ProteoWizard package (Chambers et al., 2012). The mzXML files were then processed using MZmine software (Version 2.34) (Pluskal et al., 2010). A signal threshold of 1 × 10 5 was applied for MS 1 mass detection within the mzXML files. The detected masses were used to build extracted ion chromatograms (EIC) with a minimum peak height of 1 × 10 6 (AU) and a 3 ppm relative mass tolerance. Chromatographic deconvolution (separation of detected masses into individual peaks) was carried out with the "baseline cutoff " algorithm where the minimum peak height was 1 × 10 5 AU and the maximum peak width was 5 min. Following the deconvolution, isotope peak grouping was carried out with a 3 ppm relative mass tolerance and a 0.2 min retention time tolerance followed by alignment of EICs with a 20 ppm mass tolerance and a 0.5 min retention time tolerance. Only aligned EICs containing a minimum of two isotope peaks and occurring in at least two samples were included. The next stage of processing removed duplicate peaks (within a window of 5 ppm mass and 0.4 min retention time). For each individual component, an MS 1 peak area was recorded for each sample, which was then corrected for the amount of sea water filtered (in L).
A preliminary statistical analysis of datasets was then carried out using Ward method average-neighbor hierarchical clustering (Ward, 1963) using JMP R software (Version 14.2.0., SAS Institute Inc.). For the 2013 dataset, for example, the 38 analyses were clustered according to the similarity in their component distributions (Figure 1). All duplicate analyses clustered together, demonstrating the reproducibility of the UHPLC-HRMS analytical method and the MZmine component extraction. The SPM extracts also clustered in a relatively good accordance to the different redox zones where they were collected ( Table 1) as indicated with red annotation in Figure 1. The duplicate SPM extracts from 2000 mbsl depth clustered most closely with the extraction blanks, indicative of the much lower concentration of organic matter at this depth.
For both datasets a blank subtraction method was subsequently carried out. For each component an average of the extraction blank peak areas was calculated (2013, n = 8; 2017, n = 8). This was then deducted from the average of the SPM extract peak areas (2013, n = 30; 2017, n = 20). While this led to the complete removal of many contaminants, some remained in the final dataset. When examining the most abundant components in the various clusters, eight known contaminants remained, mostly in the 2000 mbsl SPM extracts (Supplementary Table 1). It is possible these contaminants were introduced during sampling rather than extraction as they occurred in both duplicate samples. These eight contaminants were excluded from further analysis. In the 2013 dataset blank subtraction led to the complete removal of 996 components and in the 2017 dataset it was 562.
For the components remaining after blank extraction, an average value was calculated for each pair of duplicate SPM analyses. The resulting two datasets were then analyzed by two-way average-neighbor hierarchical clustering (using JMP R ), again following the Ward method, according first to component distribution in samples and then two way clustered to component abundance across samples. This clustering was visualized in a two-dimensional dendrogram with peak area abundance shown with a color scale.

Untargeted Analysis of Black Sea SPM Collected in 2013
The Black Sea is the world's largest permanently stratified anoxic basin. The SPM collected covered the full range of redox zones throughout the water column (Table 1): the oxic zone, the suboxic zone, which is defined as the zone where both oxygen and sulfide are below detection levels, and the euxinic zone where the water is both anoxic and sulfidic (Murray et al., 1999).
Lipid extracts of SPM (n = 15) produced a final dataset of 14,648 components. This entire final dataset then underwent two-way average-neighbor hierarchical clustering, dendrogram construction and a heat map plotting using JMP R software (Figure 2). The clustering of the SPM extracts (vertical axis, denoted by the depth at which the SPM was collected) is based on the distribution of all MS 1 spectral components within the SPM extracts. The clustering of the components (horizontal axis) is based on the similarity of the abundance depth profiles for each component. Each individual depth profile in Figure 2 has a color scale (heat map, red, highest abundance of component in depth profile; blue, lowest abundance of component in depth profile). This visualization of both the similarity between SPM extracts and the similarity between the component depth profiles is a useful resource for a rapid examination of the dataset. Firstly, it reveals component clusters that are associated with specific niches, environmental variables, or organisms of interest. Secondly, it is clusters that contain a known component can be used to find other components with a similar distribution

Component Clusters and Their Association With Specific Redox Zones
The vertical clustering generally followed the three redox zones (cf. Table 1) as marked (with red horizontal lines and red labels). The division between the redox zones was also visible in the distinct areas of component abundance maxima (red color) on the heat map, indicating a general division in the component signature of the SPM based on different chemical, and by extension, biological strata. The dendrogram of the 14,648 components (along the horizontal axis, Figure 2) showed four main clusters. We denoted these four large clusters as A-D (with red vertical lines and red labels). The four component clusters also correlated well with the different redox zones. Component cluster A (4846 components) corresponded most closely with the oxic zone (50 mbsl). Cluster B (4131 components) was most associated with the upper half of the suboxic zone and the lowest depth from the oxic zone (70-90 mbsl), while cluster C (1822 components) corresponded with the deeper part of the suboxic zone (95-110 mbsl). Component cluster D  Table 1.
(3849 components) was most associated with the euxinic zone (130-2000 mbsl). This division of the components into redox-defined niches of the Black Sea is in line with the earlier findings from traditional lipid analyses (Repeta and Simpson, 1991;Kuypers et al., 2003;Wakeham et al., 2003Wakeham et al., , 2007Coolen et al., 2007;Schubotz et al., 2009;Becker et al., 2018b). The benefit of this non-targeted visualization approach, however, is that it is quick, unbiased, uses all mass spectral information and does not require the laborious integration of all compounds, nor the plotting of their individual depth profiles. Furthermore, the components, while dominated by lipids due to the extraction and analytical methods applied, are expected to contain many other biomolecules, expanding the search for potential biomarkers associated with microorganisms or processes of interest.

Abundant Components in Major Clusters
In order to examine the output of our component extraction and visualization method, we examined the 10 most abundant components in each of the four component clusters A-D (Table 2) in order to identify them and to compare the results with previous studies of the Black Sea lipidome (e.g., Neretin et al., 2007;Wakeham et al., 2007;Schubotz et al., 2009;Sollai et al., 2018). Identification followed traditional MS interpretation methods, looking at the accurate mass of the component and its MS/MS spectrum in comparison with published data. In the component cluster A (associated with the oxic zone), seven of the 10 most abundant components were triacylglycerols (TAG; Hsu and Turk, 1999). TAG lipids can form an important fraction of the total fatty acid inventory, and are utilized as storage lipids in algae and zooplankton (i.e., Guschina and Harwood, 2009;Becker et al., 2018a). Also abundant components in cluster A were two betaine lipids: diacylglycerylcarboxyhydroxymethylcholine (DGCC) and diacylglyceryltrimethylhomoserine (DGTS; Li et al., 2017). Betaine lipids have been found in a variety of eukaryotes and photosynthetic bacteria (Dembitsky, 1996) and DGTS in particular is found in green algae (Li et al., 2017), while DGCCs have been shown to be a common membrane lipid in Haptophyceae algae (Kato et al., 1996). Also present was the photosynthetic pigment chlorophyll-a (chl-a) (Bale et al., 2011; Table 2). The dominance of TAGs, betaines lipids and chl-a in the component cluster associated with the oxic (photic) zone are all in line with a dominance of phytoplankton. In fact, the western Black Sea experiences extensive phytoplankton blooms, especially during the summer period (Moncheva et al., 2001) when our sampling occurred. In a previous study of IPLs throughout the Black Sea water column, betaine lipids were found to be very abundant in the oxic zone but TAGs and chlorophylls were not reported owing to the applied chromatographic method (Schubotz et al., 2009).
In cluster B (associated with the upper part of the suboxic zone, 70-90 mbsl), the eight of the 10 most abundant components were also TAG lipids. A DGTS betaine lipids was also in the most abundant components of this cluster along with a phosphatidylcholine (PC) lipid with a highly unsaturated diacyl glycerol (DAG) core: PC-DAG 38:6. PCs containing long chain highly unsaturated acyl moieties are associated with phytoplankton (Brett and Müller−Navarra, 1997) and have been observed previously in the oxic zone of the Black Sea (Schubotz et al., 2009). These results indicate that the upper part of the suboxic zone was still dominated by phytoplanktonderived lipids. The 10 most abundant components in cluster C (associated with the lower half of the suboxic zone; Table 2) included three DAG-phospholipids: a phosphoethanolamine (PE) and a monomethyl PE (MMPE) both with two 16:1 fatty acids, and a phosphoglycerol (PG)-DAG with 16:0,16:1 fatty acids. PE-IPLs are frequently the main lipid component of bacterial membranes. Schubotz et al. (2009) also found diacyl phospholipids to be characteristic of the suboxic zone of the Black Sea. The 2nd most abundant component was a TAG (with 16:1, 16:1, 16:1 fatty acids). As well as being storage lipids in living algae and zooplankton, TAGs are present in sinking particles (Wakeham, 1985) and hence their abundance here, well below the oxic zone, may be due to sinking material, including algal cells and zooplankton fecal pellets (Hay et al., 1990;Bologa et al., 1999;Moncheva et al., 2001;Wakeham et al., 2007). Two other abundant components in cluster C were acyl/ether glycerol lipids (termed AEGs; Sturt et al., 2004) without a polar head group: both with 16:2 ether bound chains and with a 16:1 fatty acid and a 16:2 fatty acid, respectively. AEGs have been reported mainly in anaerobic bacteria, although with some exceptions in aerobic bacteria, and are thought to improve cell resistance, relative to DAGs, to extreme external conditions (Grossi et al., 2015).
Also among the top six components in cluster C were two core glycerol dialkyl glycerol tetraethers (GDGTs): GDGT-0 and crenarchaeol. These archaeal membrane lipids are sn-2,3-dialkyl diglycerol tetraethers with two glycerol moieties connected by two C 40 isoprenoid chains which contain 0-8 cyclopentane moieties (i.e., GDGT-n, where n is the number of cyclopentane moieties) . To date, crenarchaeol (which has four cyclopentane moieties and one cyclohexane moiety) has only been associated with the archaeal phylum Thaumarchaeota and hence is considered to represent a specific biomarker for members of this phylum (e.g., Sinninghe Damsté et al., 2002b;Schouten et al., 2013;Bale et al., 2019a). GDGT-0 is a more cosmopolitan GDGT, produced by representatives of the Thaumarchaeota, Crenarchaeota, and Euryarchaeota (Villanueva et al., 2014). Previous studies of archaeal lipids in the Black Sea water column also reported both GDGT-0 and crenarchaeol throughout the water column (Wakeham et al., 2003;Coolen et al., 2007;Sollai et al., 2018). Sollai et al. (2018) found a maximum in archaeal 16S rRNA gene copies at 105 m, in the lower part of the suboxic zone, mainly attributed to Nitrosopumilus species, many of which are known to produce GDGT-0 and crenarchaeol as their dominant GDGTs (Sinninghe Damsté et al., 2002a;Schouten et al., 2008;Pitcher et al., 2011;Qin et al., 2015;Elling et al., 2017). Another abundant component at this depth was isorenieratene, a specific carotenoid of the photosynthetic sulfur bacteria Chlorobiaceae (Repeta, 1993;Koopmans et al., 1996;Sinninghe Damsté et al., 2001;Brocks et al., 2005). That isorenieratene is found in component cluster C, associated with the lower suboxic zone, is to be expected, as Chlorobiaceae perform photosynthesis using sulfide, and hence require photic zone anoxia, as has been shown to occur in the suboxic zone of the Black Sea (Repeta et al., 1989;Repeta and Simpson, 1991;Overmann et al., 1992;Becker et al., 2018b).
Finally, one of the most abundant components in cluster C was an unassigned component (I), with m/z 470.2520 ( Table 2; cf. Supplementary Figure 1A for MS/MS spectrum). The presence of this unassigned component in the 10 most abundant components of cluster C highlights that our untargeted method has the potential to identify novel compounds associated with specific redox zones and by extension, specific groups of organisms or their activity. Using traditional MS data interpretation approaches, this component would have very likely been overlooked as it occurred at only a few depths. Such components would be interesting targets for future rigorous identification approaches or for matching with the continuously increasing number of online spectral libraries.
In component cluster D, associated with the euxinic zone, the most abundant feature was a 22:3 fatty acid without a headgroup or glycerol moiety, followed by a PE-DAG with 16:1, 16:0 fatty acids, and a PE-AEG 32:1. A previous study described PE-AEG as an important component of the lipidome at the oxic zone/oxygen minimum zone (OMZ) transition in the eastern tropical North Pacific (Schubotz et al., 2018). Also in the 10 most abundant components were multiple diether glycerol (DEG)-based lipids: PE-DEG with O-16:1 and O-16:0 (denoting ether-bound alkyl chain), MMPE-DEG 30:0, a PG-DEG O-16:0, O-16:1 and a DEG O-16:0, O-16:2 without a polar headgroup. Diether lipids have been detected in a limited number of bacteria (e.g., Grossi et al., 2015) but their (core lipid) abundance in the Black Sea water column has been shown to correlate with sulfate reducing bacteria (SRB; Neretin et al., 2007). Two components, tentatively assigned as sulfate-1-deoxyceramides (Ding et al., 2021), were also among most abundant components in cluster D. Ding et al. (2021) hypothesize that, as bacterial sphingolipids are mainly produced by Bacteroidetes and selected Proteobacteria (Heaver et al., 2018) and because certain Bacteroidetes from the Black Sea are known to produce capnines Yadav et al., 2020), sulfonoanalogs of sphinganine (Godchaux and Leadbetter, 1980), that these novel sulfate-1-deoxyceramides, may also be produced by anaerobic heterotrophs related to Bacteroidetes.
In summary, our non-targeted data analysis resulted in clusters of components associated with specific depths. The zonation of these components (the major ones being lipids) due to microbial niches is in agreement with previous studies Schubotz et al., 2009), however, our approach is rapid and unbiased and includes all components within the analytical window whether they have been identified or not.

Rapid Clustering of Lipids Based on Depth Profile Similarity
The LCMS data visualization method we have presented here is useful to examine clusters that contain a known component, in order to find other components with the same distribution through the water column. Here, we demonstrate this by examining other components in a subcluster containing the sulfate-1-deoxyceramide with m/z 928.8001 (cluster D, associated with the euxinic zone, Table 2). As discussed in Ding et al. (2021), this novel 1-deoxysphingolipids may be produced by an as-yet uncultured anaerobic heterotroph related to Bacteroidetes. We examined components with a very similar depth profile, that may be associated with the same microbial producer or with producers that live in the same environmental niche. At a higher level of clustering, the sulfate-1-deoxyceramide with m/z 928.8001 was in a cluster of 24 components (Dx; as indicated in Figure 2). We present here the 10 most abundant components in this subcluster (Dx) ( Table 3). All 10 components followed a similar abundance depth profile with a distinct maximum at 130 m. Other than the sulfate-1-deoxyceramide with m/z 928.8001 (Ding et al., 2021), no other components in Dx were in the list of top 10 most abundant components in cluster D.
Here, we show that the selection of a subcluster of components around a component of interest is straightforward and quick and allows for rapid associations between different components to be made, based on their specific depth profiles, rather than their chemical composition.

Comparison Between SPM Collected in 2013 and 2017
In order to examine the robustness of our method for distinguishing the lipidome associated with different depths of the Black Sea water column, we followed the same method of component extraction and visualization on SPM extracts from the Black Sea water column, collected in March 2017 at a station located ∼110 km from that of 2013 cruise. SPM was collected at an earlier time in the year (March versus June), with a different distribution of sampling depths (more sampling in oxic zone, less in suboxic zone, Table 1), and using a smaller (nominal) pore size of filter (0.3 versus 0.7 µm). We followed the same data extraction and data processing procedure as described above for the 2013 sample set. The heat map generated from these data again shows division in the lipidome (Figure 3), supported by the vertical clustering, generally in line with the three redox zones. The dendrogram of the 12,372 components was again made up of four distinct clusters E-H (Figure 3), which corresponded with the redox zones. Component cluster E (6006 components) corresponded with the oxic zone. Cluster F (1292 components) with the suboxic zone, and again included the 70 mbsl depth SPM, from the base of the oxic zone. Cluster G (1941 components) corresponded with the uppermost sampling depth from the euxinic zone (130 mbsl), while cluster G (3133) with the remainder of the depths in the euxinic zone.
We again examined the 10 most abundant components ( Table 4) in component clusters E-H (Figure 3). In cluster E, associated with the oxic zone, six of the 10 most abundant components were TAG lipids, similar to the 2013 dataset ( Table 2). Two more abundant components were the chl-a alteration product pheophytin-a and the epimer of pheophytina (C-13 2 diastereomer). Demetalation of chl-a to produce pheophytin-a occurs readily when chl-a is free from its stable protein complex. While pheophytin can be formed naturally, e.g., during cell senescence or grazing (Louda et al., 1998(Louda et al., , 2002Bale et al., 2011), it is likely that in this study it represents an artifact of sampling and extraction (transformation of chla) as the sample collection and extraction were not optimized for pigment analysis. The presence of pheophytin-a, whether a natural occurring compound or a proxy for chlorophyll is not surprising as the oxic photic zone of the western Black Sea experiences extensive phytoplankton blooms (Moncheva et al., 2001). Pheophytin-a was not among the most abundant 10 components in cluster A (although chl-a was the 8th most abundant component). The same betaine lipid, DGTS 34:4 as was seen in cluster A from 2013 was among the 10 most abundant components in cluster E from 2017. Finally, an unassigned component (II), with m/z 547.2703 (Table 2; cf. Supplementary Figure 1B for MS/MS spectrum) was among the most abundant components in this cluster. Cluster F, associated with the suboxic zone (Table 4), included three TAG lipids, a similar result to that of 2013, where TAG lipids remained abundant below the oxic zone ( Table 2). Also abundant were several DAG lipids: PE-DAG 16:0, 16:1 and 16:1, 16:1. In the 2013 dataset, similar PE-DAGs were associated with the lower suboxic zone ( Table 2). Two monogalactosyl [MG]-DAGs (commonly referred to as a MGDG) with 16:0 and 16:1 fatty acids were dominant components in cluster F. MGDGs are predominately associated with eukaryotic chloroplasts and cyanobacteria, although they can also be produced by non-photosynthetic bacteria (Hölzl and Dörmann, 2007;Popendorf et al., 2011), including sulfatereducing bacteria when phosphate concentrations are below 20 µM (Bosak et al., 2016). This component was not listed among the top 10 components in any cluster from 2013. A DAG comprising of two 16:1 fatty acids and no polar head group was also among the abundant components of this cluster as were two isomers of AEG 32:3 without a polar head group (Table 4). AEG 32:3 was similarly abundant in the 2013 cluster associated with the lower suboxic zone ( Table 2).
In cluster G, which was associated most closely with the SPM from 130 mbsl (the upper euxinic zone), among of the most abundant component was a DGTS betaine lipid with a DAG 32:0 core, whereas in the 2013 a similar DGTS-DAG (30:1) was associated with the upper suboxic zone. The only other abundant DAG-based lipid in the top 10 components from this cluster was PE-DAG 14:0, 16:1. Similar PE-DAGs were present there in the 2013 clusters associated with the lower suboxic zone and the euxinic zone. PE-AEG 30:0 and 32:1 were also dominant in this cluster, of which the same or similar components had been associated with the euxinic zone from 2013 (which included 130 mbsl; Table 2). Also abundant were an AEG 32:3 (a later eluting isomer to the AEG 32:3 seen in cluster F), AEG 30:2, 33:3, and 35:3. Similar components were associated with the lower suboxic zone of the 2013 dataset. Only one DEG-based lipid was among the 10 most abundant in cluster G, PG-DEG with a 16:1 and 16:0 ether bound chain as had been seen in the euxinic zoneassociated cluster from 2013. Finally, an unassigned component (III), m/z 512.1589 (Table 4; cf. Supplementary Figure 1C for MS/MS spectrum).
In cluster H, associated with the remainder of the euxinic zone, among the 10 most abundant component were two DAG-based lipids, a betaine lipid, DGCC-DAG with a 16:0 and 18:5 fatty acid and PC-DAG 35:1. Neither DGCCs nor PC-DAGs were seen in the euxinic zone cluster from 2013 and both are associated with photosynthetic organisms. Their presence at depth in 2017 may be due to sinking algal material (typically in fecal pellets or marine snow) present in the euxinic zone as particles slowly descend to the sea floor. The majority of the 10 most abundant components in this cluster were PE and MMPE DEG lipids with O-15:0, O-16:0, and O-16:1 ether bound chains, as was also observed in the euxinic zone-associated cluster from 2013. MMPE-DEG 30:0 has previously be postulated to be associated with sulfate-reducing bacteria in the anoxic zone of the Black Sea (Schubotz et al., 2009) Overall, the UHPLC-HRMS component extraction and visualization method presented here provided, for both the 2013 and 2017 SPM datasets, a similar picture of a stratified lipidome, in accordance with the three main redox zones. There were many similarities between the two sample sets, for example the clusters associated with the oxic zone were in both years dominated by TAG lipids. The differences that were observed are probably attributable to the different distribution of sampling depths between the years (more sampling in of the oxic zone in 2017, more of the suboxic zone in 2013; Table 1), which led to a different clustering of depths. The similarity in the dominant components highlights the similarity in the Black Sea lipidome over time and hence the stability of the microbial niches in this system.

CONCLUSION
We used an untargeted UHPLC-HRMS data analysis approach to visualize components in the SPM from the Black Sea water column collected in June 2013. This revealed distinct clusters of known and unknown components, dominated by lipids, associated with specific depths. The approach allows for a rapid and unbiased view of the lipidome of an environment and for the identification of unknown lipids which potentially contain important ecological information. E.g., on (uncultivated) microbes. A second dataset from March 2017 provided similar results in the suboxic and euxinic zones. This rapid untargeted data visualization approach unlocks a hidden potential in UHPLC-HRMS data and provides a framework for further lipidomic method development which includes the utilization of MS/MS spectra (Ding et al., 2021).

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
NB and SD carried out the data analyses. NB wrote the manuscript. CB carried out the extractions and lipid analysis. EH initiated the research theme and oversaw the lipid identification. LV led 2017 sample collection and provided the ecological overview. MGIA and AH provided the methodological support. SS and JSSD supervised the study. All authors contributed to the article and approved the submitted version.