Analytical and Computational Advances, Opportunities, and Challenges in Marine Organic Biogeochemistry in an Era of “Omics”

Advances in sampling tools, analytical methods, and data handling capabilities have been fundamental to the growth of marine organic biogeochemistry over the past four decades. There has always been a strong feedback between analytical advances and scientific advances. However, whereas advances in analytical technology were often the driving force that made possible progress in elucidating the sources and fate of organic matter in the ocean in the first decades of marine organic biogeochemistry, today process-based scientific questions should drive analytical developments. Several paradigm shifts and challenges for the future are related to the intersection between analytical progress and scientific evolution. Untargeted “molecular headhunting” for its own sake is now being subsumed into process-driven targeted investigations that ask new questions and thus require new analytical capabilities. However, there are still major gaps in characterizing the chemical composition and biochemical behavior of macromolecules, as well as in generating reference standards for relevant types of organic matter. Field-based measurements are now routinely complemented by controlled laboratory experiments and in situ rate measurements of key biogeochemical processes. And finally, the multidisciplinary investigations that are becoming more common generate large and diverse datasets, requiring innovative computational tools to integrate often disparate data sets, including better global coverage and mapping. Here, we compile examples of developments in analytical methods that have enabled transformative scientific advances since 2004, and we project some challenges and opportunities in the near future. We believe that addressing these challenges and capitalizing on these opportunities will ensure continued progress in understanding the cycling of organic carbon in the ocean.


INTRODUCTION
The growth of marine organic biogeochemistry over the past several decades has been largely driven by advances in sampling tools, analytical methods and data handling capabilities. Max Blumer, one of the founding fathers of marine organic biogeochemistry, suggested in 1975 that limitations of analytical techniques are the major roadblock to understanding the chemical complexities of nature (Blumer, 1975). He asked whether we can conduct "realistic studies of nature that acknowledge the limitations of our present analytical powers and gaps in our understanding." It is therefore not surprising that every symposium, workshop or review on marine organic biogeochemistry since (reviewed by Wakeham and Lee, 2019) has repeated one theme: continuing development of analytical capabilities is necessary to effect progress in the future.
There has always been a strong feedback between analytical advances and scientific advances, but analytical technology should not be the driving force. Rather, process-based scientific questions should drive analytical developments. Several of the challenges identified at the 2004 symposium honoring the late John Hedges (Lee et al., 2004) remain. At the same time, the impressive growth of 'omics technologies since 2004 has changed the context in which marine organic biogeochemistry operates. For our purposes, we define 'omics as the set of analytical technologies and associated algorithms and databases that shine light on the internal workings of ecosystems via the sequences of biological molecules such as DNA, RNA, and protein, and the concentrations and structures of lipids and small organic molecules that are essential to life. Accordingly, we used the following themes to guide the discussions of this working group on analytical methodologies:
(i) Multidisciplinary investigations generating diverse data. Integration of disparate data sets, including better global coverage and mapping, is critical to understanding processes in the ocean.
(ii) "Molecular headhunting," or attempts to characterize the structures present in marine organic matter for their own sake, is being subsumed into process-driven targeted investigations requiring new analytical capabilities.
(iii) There are still major gaps in characterizing the composition and behavior of macromolecules, the molecularly uncharacterized component of marine organic matter after Hedges et al. (2000).
(iv) Improvements are needed in protocols for making in situ rate measurements of key biogeochemical processes.
Here, we present a perspective review on advances in analytical methods since 2004 derived from a 2019 Hanse-Wissenschaftskolleg Workshop on Marine Organic Geochemistry, and on some pressing analytical needs for the next decade or two. We do not attempt to create a totally comprehensive review of advances and needs; such an effort would require a book. Rather, we highlight advances and needs that workshop participants found particularly important, exciting, or underappreciated, as summarized in Figure 1. We hope that this review will serve as a guide and inspiration to those attempting to expand the limits of analytical chemistry to better understand the cycling of organic matter in the ocean.

Nano-Elemental Analysis
Advances in continuous flow elemental analysis-isotope ratio mass spectrometry (EA-IRMS) have facilitated the tandem isotopic analysis of carbon and nitrogen in suspended, sinking, and sedimentary organic matter for describing C and N cycling processes in the ocean. Modifications to EA-IRMS systems allow for small samples of ca. 10-100 nmol, termed "nano-EA" (Polissar et al., 2009; Ogawa et al., 2010; Langel and Dyckmans, 2014), and have been applied to the study of isolated biomarker compounds (Junium et al., 2015; Fulton et al., 2018; Isaji et al., 2020) and samples with low content of organic matter (Junium et al., 2018; Cui et al., 2019; Murray et al., 2019).
Other techniques to minimize sample size include the spooling wire microcombustion (SWiM) method. SWiM has been applied to the C isotope analysis of sorted microbial cells (Eek et al., 2007), biomarkers (Pearson et al., 2016), and diatoms from sediments (Hansman and Sessions, 2016). The "denitrifier method" (Sigman et al., 2001) has been applied to the N isotopic analysis of biomarker compounds, phytoplankton, and dissolved organic matter (Robinson et al., 2004; Knapp et al., 2005; Higgins et al., 2009).

Chromatography
Gas chromatography (GC) and high-performance liquid chromatography (HPLC) are common in the analysis of molecular biomarkers in marine biogeochemistry (Wakeham et al., 2007). GC and HPLC systems are coupled to a range of detectors, but most compound identifications are made by mass spectrometry (MS). Chromatographic methods generally target specific compound classes useful in biogeochemical proxy development, and recent developments have increased the analytical capacity to resolve previously intractable components. Comprehensive two-dimensional GC (GC × GC) has been applied to biomarker studies and unresolved complex mixtures in petroleomics (reviewed by Eiserbeck et al., 2014) and nontargeted marine contaminant research (Hoh et al., 2012). It also has great potential in metabolomics (Higgins Keppler et al., 2018).
FIGURE 1 | Schematic illustrating the state of analytical methodologies available for marine organic biogeochemistry prior to 2004 and at present. Abbreviations are described in the text.
Relatively large (m/z 500-2000) polar molecules (pigments and intact polar lipids) are most effectively separated by HPLC, with some of the most recent analytical advances summarized by Wörmer et al. (2015). Garrido et al. (2012) reviewed methods for pigment analysis, and Higgins et al. (2012) summarized pigment-based phytoplankton chemotaxonomy. Intact polar lipid analysis by HPLC is used in environmental proxy development (e.g., Rütters et al., 2002; Sturt et al., 2004; Van Mooy et al., 2009; Hopmans et al., 2016; Schubotz et al., 2018), and the range of compounds that can be detected during an individual analysis has expanded (Collins et al., 2016). Underivatized small polar metabolites are also analyzed by HPLC (Genta-Jouve et al., 2014; Johnson et al., 2017). Most of the recent technical improvements involve ultra-high-performance liquid chromatography (UHPLC or UPLC) systems, which operate at higher pressure than conventional HPLC systems to rapidly resolve molecular peaks, together with chromatographic columns containing sub-2-µm solid-core particles that enable large increases in efficiency. UHPLC has been applied to the analysis of pigments, intact polar lipids (Besseling et al., 2020), lipid biomarkers in temperature proxy calculations (de Bar et al., 2017), and organic ligands (Wichard, 2016).
The introduction of hydrophilic interaction liquid chromatography (HILIC) columns resulted in enhanced peak resolution and reduced analysis time for intact polar lipids (IPLs) using normal-phase chromatography, which separates molecules according to their polar headgroups. On the other hand, IPL analysis using reversed-phase chromatography provides compound separation by alkyl chain hydrophobicity, which improves the separation of compounds with the same headgroup and slight differences in the core lipid structure, and also allows for the simultaneous analysis of less polar compounds such as glycerol dialkyl glycerol tetraethers (GDGTs), pigments, alkenones, bacteriohopanepolyols (BHPs), and quinones. Core and intact archaeal isoprenoidal GDGTs can now be analyzed in a single run using reversed-phase chromatography (Zhu et al., 2013), while the analysis of core bacterial and archaeal GDGTs has been improved by the use of ethylene bridged hybrid (BEH) HILIC columns. The detection of BHPs has been improved by the analysis of non-derivatized compounds using either atmospheric pressure chemical ionization (APCI) (Talbot et al., 2016) or electrospray ionization (ESI) (Rush et al., 2019).

Fourier-Transform Ion Cyclotron Resonance Mass Spectrometry (FT-ICR-MS)
Fourier-transform ion cyclotron resonance mass spectrometry (FT-ICR-MS), with its unsurpassed mass resolution and mass accuracy (ultrahigh-resolution MS, UHRMS), provides the primary tool to study the composition of complex mixtures of organic compounds not accessible via other methods (Comisarow and Marshall, 1974; Kujawinski et al., 2004; Koch et al., 2005; Marshall and Chen, 2015). FT-ICR-MS provides elemental formulae, involving primarily C, H, and O and lesser amounts of N, S, and P. Elemental formulae may be converted into molecular formulae, and elemental compositions are visualized in van Krevelen plots of H/C and O/C ratios, where they may be compared with the same ratios in potential biochemical precursors (lipids, proteins, carbohydrates), between samples, and across various spatial scales. H/C and O/C ratios that diverge from those of precursors indicate organic matter (OM) that has been degraded or transformed. FT-ICR-MS as a semi-quantitative fingerprinting method can be combined with targeted assays quantifying specific fractions of the natural organic matter (NOM) pool such as amino acids, carbohydrates or lipids. The unconventional type of information derived from FT-ICR-MS analyses (relative intensities of up to tens of thousands of peaks) warrants the use and development of new tools for data evaluation and interpretation. We discuss these needs in more detail below, in the section labeled "Untargeted high-resolution analysis: FT-ICR-MS and Orbitrap." FT-ICR-MS has enabled detailed analysis of dissolved organic matter (DOM) composition across gradients from marine to freshwater biomes (Medeiros et al., 2015; Osterholz et al., 2016) and ocean basins (Lechtenfeld et al., 2014; Hansman et al., 2015; Martínez-Pérez et al., 2017) to sediment porewaters (Schmidt et al., 2011; Seidel et al., 2014) to understand its biotic and abiotic turnover.
Detection limits below femtomolar, or even subattomolar, concentrations have enabled the description of bacterial metabolites (Kujawinski, 2011; Schwedt et al., 2015; Noriega-Ortega et al., 2019; see further description below), trace metal-DOM complexes (Lechtenfeld et al., 2011; Waska et al., 2015), and anthropogenic contaminants (Wagner et al., 2015b; Powers and Gonsior, 2019).
A critical caveat for FT-ICR-MS analysis is that a single chemical formula could represent structurally distinct compounds, which may potentially exhibit divergent chemical properties. The extent to which this is a problem in practice is a subject of substantial controversy. Coupling FT-ICR-MS to trapped ion mobility mass spectrometry (TIMS-FT-ICR-MS) provides new insights into the presumed high structural diversity of DOM. Lower limits of isomeric diversity could be established (Tose et al., 2018;Leyva et al., 2019), while upper limit estimates still range widely and are dependent on detection limit and resolving power.
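The van Krevelen representation described above reduces each assigned formula to two atomic ratios. As a minimal sketch, assuming formulae are available as simple strings without brackets, charges, or isotope labels, the ratios can be computed as follows. The function names and the three example compounds are illustrative choices, not part of any FT-ICR-MS software or data set discussed here.

```python
import re
from collections import Counter

# One element symbol (capital letter plus optional lowercase letter) and an optional count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Count atoms in a simple formula string such as 'C6H12O6'.

    Assumption: no brackets, charges, or isotope labels are present.
    """
    counts = Counter()
    for element, number in _TOKEN.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def van_krevelen_ratios(formula):
    """Return the (H/C, O/C) atomic ratios for one assigned formula."""
    atoms = parse_formula(formula)
    carbon = atoms["C"]
    if carbon == 0:
        raise ValueError(f"formula {formula!r} contains no carbon")
    return atoms["H"] / carbon, atoms["O"] / carbon


# Illustrative reference compounds only (not measured data).
examples = {
    "glucose (carbohydrate-like)": "C6H12O6",
    "palmitic acid (lipid-like)": "C16H32O2",
    "alanine (amino-acid-like)": "C3H7NO2",
}
for label, formula in examples.items():
    h_to_c, o_to_c = van_krevelen_ratios(formula)
    print(f"{label:30s} H/C = {h_to_c:.2f}  O/C = {o_to_c:.2f}")
```

Because the ratios depend only on elemental composition, structurally distinct isomers plot at the same point, which is the caveat raised in the preceding paragraph.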

Orbitrap Mass Spectrometers
Orbitraps, a new type of Fourier-transform MS (Hu et al., 2005), use an electrostatic field rather than the superconducting magnets of FT-ICR-MS. Orbitraps are available to a much wider range of labs than FT-ICR-MS because purchase prices as well as maintenance expenses are much lower. Although maximum mass accuracy is slightly lower for Orbitraps than for FT-ICR-MS, acquisition rates are faster (15 Hz vs. 1 Hz), which is compatible with LC resolution and allows acquisition of multiple mass spectra across a chromatographic peak. This provides (ultra-) high-resolution accurate mass (HRAM) determination with resolution of up to 1,000,000 at m/z 200 and mass accuracies of <1 ppm and can be used for targeted and untargeted analyses. Extended mass range Orbitrap systems now have analytical windows of up to m/z 20,000, allowing, for example, intact protein analysis. Orbitrap also offers superior resolving power at the low m/z end of the analytical window in comparison to other HRAM instruments such as quadrupole time-of-flight (Q-TOF), which is important for structural elucidation in metabolomics and lipidomics. Thus, Orbitrap mass spectrometers provide the potential to greatly expand the number of labs that can use high-resolution mass spectrometry to characterize marine OM. Coupled with UHPLC and ESI, Orbitrap technology has proven to be a very powerful tool for proteomics, metabolomics, and lipidomics in marine organic geochemical research (Poulson-Ellestad et al., 2014; Saito et al., 2014; Hunter et al., 2015; Cantarero et al., in press) and allows the deconvolution of isotopic patterns of all common stable isotopes (compound-specific isotope analysis; CSIA) and position-specific isotope analysis (PSIA) of molecular fragment ions including multiply substituted isotopologues (Eiler et al., 2017).
Frontiers in Marine Science | www.frontiersin.org
PSIA holds promise to greatly increase our understanding of biogeochemical pathways, cell physiology and metabolic state, dietary patterns, and degradation effects, among other potential applications.
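To make the resolution and mass accuracy figures quoted above concrete, the following sketch applies the standard definitions of resolving power (m/z divided by peak width at half maximum) and relative mass error in parts per million. The function names are illustrative, and the measured value in the example is hypothetical.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Relative mass error in parts per million (ppm)."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def peak_width_fwhm(mz, resolving_power):
    """Peak width (full width at half maximum) implied by R = m / delta_m."""
    return mz / resolving_power


def mass_window(theoretical_mz, tolerance_ppm):
    """Lower and upper m/z bounds for matching a peak within a ppm tolerance."""
    half_width = theoretical_mz * tolerance_ppm * 1e-6
    return theoretical_mz - half_width, theoretical_mz + half_width


# Resolution of 1,000,000 at m/z 200, as quoted in the text.
print(f"FWHM at m/z 200: {peak_width_fwhm(200.0, 1_000_000):.6f}")

# Hypothetical measurement of an ion with theoretical m/z 200.0000.
print(f"mass error: {ppm_error(200.0001, 200.0):.2f} ppm")
print("1 ppm matching window:", mass_window(200.0, 1.0))
```

A 1 ppm tolerance at m/z 200 corresponds to a window only a few ten-thousandths of an m/z unit wide, which is why formula assignment at this accuracy is feasible for small molecules.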

Analysis of Polar, Biologically Labile Organic Molecules
A central goal of organic biogeochemistry has been to track the composition of biologically relevant molecules as they are produced, consumed or transformed by small and large marine organisms (Moran et al., 2016). These molecules can be categorized according to the major biochemicals within cells (i.e., lipids, proteins, carbohydrates, and nucleic acids), their monomeric counterparts (i.e., amino acids, sugars, and nucleobases) and other metabolic intermediates. Small polar intermediates in this last biochemical class have received attention in the last 5 years as advances in sample acquisition, liquid chromatography, and mass spectrometry have expanded the analytical window of marine organic compounds. These compounds can be detected and quantified in both dissolved (Johnson et al., 2017) and particulate (Durham et al., 2019; Johnson et al., 2020) pools, and include molecules such as growth substrates, metabolic cofactors, and signaling molecules (the collective is often referred to as "metabolites," implying a biological origin and sink).
Methods for metabolite analysis couple liquid chromatography to high-resolution mass spectrometry (LC/MS), thus targeting molecules that are polar and soluble in aqueous solution. The two most popular LC column materials are reversed-phase C18 (or C8) and hydrophilic-interaction liquid chromatography (HILIC) phases, which can be used in parallel to detect the largest suite of compounds within a single sample (Boysen et al., 2018). Detection limits with triple-quadrupole or Orbitrap mass spectrometry fall within the femtogram to picogram range (per 5-10 µL injection). Thus, reasonable signals are achievable with a few hundred mL to a few L of seawater for both dissolved and particulate metabolites. Smaller volumes can be used for metabolite analysis by GC-MS (Sogin et al., 2019), but fewer molecules are amenable to the derivatization required for GC analysis.
Targeted and untargeted analyses of polar metabolites generate large, complex data sets that require novel tools to analyze. Many of these tools cross disciplinary boundaries from biomedicine to marine science. Of interest to the marine science community will be advances in programs for chromatographic data analysis (MS-DIAL, Lai et al., 2018; MZmine, Pluskal et al., 2010; XCMS, Smith et al., 2006), programs to make existing tools easier to use (Li and Li, 2019; McLean and Kujawinski, 2019), and improvements in websites used to search and store mass spectrometry data (METLIN, Guijas et al., 2018; MetaboLights, Haug et al., 2020; the Metabolomics Workbench, Sud et al., 2016; MassIVE-KB, Wang et al., 2018). Several computational tools increase our ability to identify unknown compounds through searches and in silico calculations (CSI:FingerID, Dührkop et al., 2019; ClassyFire, Feunang et al., 2016; MetFrag, Ruttkies et al., 2019). The structural similarities in MS2 spectra have proven especially useful in identifying compounds (MolNetEnhancer, Ernst et al., 2019; MetDNA, Shen et al., 2019; MS2LDA, van der Hooft et al., 2020), which is evident in the expanded list of tools now available at Global Natural Products Social Networking (Aron et al., 2020). For example, users can search mass spectrometry data using MASST (Wang et al., 2020) and reuse publicly available data via ReDU (Jarmusch et al., 2019). Collectively, these tools allow marine scientists to expand our understanding of OM in seawater (Kharbush et al., 2016; Hartmann et al., 2017; Longnecker and Kujawinski, 2017; Petras et al., 2017).
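The idea of comparing MS2 spectra by similarity can be illustrated with a plain cosine score on binned fragment spectra. This is a deliberately simplified sketch: the tools cited above use more elaborate scoring and preprocessing that are not reproduced here, and the bin width, function names, and fragment lists below are assumptions made for illustration only.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=0.01):
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=0.01):
    """Cosine of the angle between two binned spectra (0 = no shared bins, 1 = identical shape)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical fragment lists as (m/z, intensity); not real measurements.
spectrum_1 = [(85.03, 120.0), (127.04, 900.0), (145.05, 300.0)]
spectrum_2 = [(85.03, 100.0), (127.04, 850.0), (163.06, 250.0)]
print(f"cosine similarity: {cosine_similarity(spectrum_1, spectrum_2):.3f}")
```

Fixed-width binning can split a fragment across two bins when its m/z falls near a bin edge, so production tools typically match peaks within a mass tolerance instead.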
Metabolite analyses offer a strong connection to genome-based assessments of community metabolism, both potential (diversity analysis or metagenomics) and realized (metatranscriptomics or metaproteomics). Metabolic capabilities can be inferred from diversity analysis while metabolic pathways can be identified through metagenomics. Pathway activity can be monitored through metatranscriptomics and metaproteomics and their end result within elemental cycles can be tracked through metabolomics. Early studies have used diversity analysis to connect culture-based metabolic inferences with metabolite analyses (Allen et al., 2008;Johnson et al., 2016;Heal et al., 2017). Field studies are connecting community composition (and inferred metabolic potential) with metabolite concentrations to postulate microbe-microbe interactions (Durham et al., 2019) and OM-related processes (Johnson et al., 2020). This field is rapidly expanding with new method developments presented at major conferences; significant advances in our understanding of microbe-microbe interactions through chemical exchange are likely in the near future.

Nuclear Magnetic Resonance (NMR)
Nuclear magnetic resonance spectroscopy (NMR) is a group of analytical techniques that rely on measuring how NMR-active nuclei (i.e., 13C, 1H, 15N, 31P) resonate in the presence of a strong magnetic field. Within the same nucleus (e.g., 13C), the resonance frequency differs slightly based on shielding of local electrons, deshielding from adjacent chemical functional groups, and coupling with other nuclei. These differences in resonance allow identification of the chemical functional groups in which nuclei reside, and in some cases larger-scale structural information.
In the 1980s and 1990s, cross-polarization (CP) and magic angle spinning (MAS) techniques were coupled to allow characterization of functional group abundance in marine DOM isolate fractions such as XAD-resin-extracted DOM, ultrafiltered DOM (Benner et al., 1992; Abdulla et al., 2010a,b), and DOM concentrated by reverse osmosis-electrodialysis (Koprivnjak et al., 2009), as well as ultrafiltered particulate organic matter (POM) (particles 0.1-60 µm; Sannigrahi et al., 2005) and sinking particles. One of the major drawbacks of CP-13C NMR is that it underestimates non-protonated carbons (e.g., carbonyl and substituted aromatic carbon), especially in the presence of paramagnetic iron, which makes it a semi-quantitative technique (Pfeffer et al., 1984; Mao et al., 2000; Abdulla et al., 2010a). Direct polarization (DP-MAS-13C NMR) is more quantitative, and has been used to measure changes in the composition of DOM isolated from coastal and open ocean sites (Helms et al., 2015; Cao et al., 2018). However, DP-13C NMR requires a longer acquisition time and its spectrum has a lower signal-to-noise (S/N) ratio relative to CP-13C NMR.
Recently, many sophisticated solid-state spectral-editing techniques (e.g., 13C chemical shift anisotropy filter, CH selection, CH2 selection) have been applied to resolve overlapping peaks within marine DOM (Helms et al., 2015; Cao et al., 2018). Two-dimensional correlation spectroscopy analysis techniques on CP-MAS-13C NMR spectra have been used to track the changes of different DOM components under different perturbations and to correlate 13C NMR with other spectroscopy techniques such as Fourier-transform infrared spectroscopy (FTIR) and FT-ICR-MS (Abdulla et al., 2010b, 2013a). DOM has also been characterized by 1H NMR, but this technique can be challenging due to interference from 1H in water. Therefore, 1H NMR has typically required extraction, drying, and redissolution in D2O (e.g., Repeta et al., 2002). However, recent advances in high magnetic field NMR (currently up to 900 MHz or 21 Tesla) and the development of water suppression techniques allow analysis of marine DOM using 1H NMR techniques without any isolation or pretreatment on open ocean and sediment pore water samples (Lam and Simpson, 2008; Zheng and Price, 2012; Fox et al., 2018).
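The pairing of "900 MHz" with "21 Tesla" above follows from the Larmor relation, in which the resonance frequency of a nucleus equals its gyromagnetic ratio (divided by 2π) times the field strength. The sketch below uses rounded textbook gyromagnetic ratios; the dictionary and function names are illustrative assumptions.

```python
# Gyromagnetic ratios divided by 2*pi, in MHz per tesla (rounded textbook values).
GAMMA_OVER_2PI_MHZ_PER_T = {
    "1H": 42.577,
    "13C": 10.708,
    "15N": -4.316,  # negative sign reflects the sense of precession
    "31P": 17.235,
}


def larmor_frequency_mhz(nucleus, field_tesla):
    """Magnitude of the resonance frequency (MHz) of a nucleus at a given field."""
    return abs(GAMMA_OVER_2PI_MHZ_PER_T[nucleus]) * field_tesla


# A magnet of about 21.1 T is conventionally described by its 1H frequency.
for nucleus in GAMMA_OVER_2PI_MHZ_PER_T:
    print(f"{nucleus:>3s} at 21.1 T: {larmor_frequency_mhz(nucleus, 21.1):7.1f} MHz")
```

On the same magnet, 13C resonates at roughly one quarter of the 1H frequency, which is one reason instruments are labeled by their 1H frequency rather than by a nucleus-specific value.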

Natural-Level Radiocarbon Analysis
Advancements in accelerator mass spectrometry (AMS) technology have dramatically reduced the cost and sample size required for 14C analysis. It is now possible to measure natural abundance 14C in samples containing as little as a few tens to a hundred µg C rather than the routine 1 mg C (Pearson et al., 1998; Santos et al., 2007; Shah and Pearson, 2007). Compact low energy AMS systems (e.g., the Mini Carbon Dating System; Synal et al., 2007) represent a milestone toward (ultra-) small-scale 14C analyses (<10 µg C at a precision of ±2 for modern samples; Wacker et al., 2010). At this point, 14C measurements are analytically no longer limited by sample size, but rather by sample pre-treatment methods and the associated blank C contamination. These improvements in sample size requirements have also allowed for natural-level compound-specific radiocarbon analysis, which has improved our understanding of carbon metabolisms of marine organisms, sedimentary processes in the ocean, and continental carbon cycling (Wakeham and McNichol, 2014; Druffel et al., 2016; Van der Voort et al., 2018). Methods allowing for the isolation of individual compounds, mostly based on preparative gas chromatography or preparative liquid chromatography, continue to be developed for multiple compound classes. Compound classes for which compound-specific radiocarbon analytical protocols have been established include alkanes, alkanoic acids (Eglinton et al., 1996), benzene polycarboxylic acids (BPCAs) (Gierga et al., 2014), polycyclic aromatic hydrocarbons (PAHs) (Reddy et al., 2002) and phospholipid fatty acids (Wakeham and McNichol, 2014), aliphatic alcohols, sterols and hopanols (Pearson et al., 2001), lignin phenols (Hou et al., 2010; Ingalls et al., 2010), GDGTs (Ingalls et al., 2006; Birkholz et al., 2013), amino acids (Bour et al., 2016; Ishikawa et al., 2018), pigments (Kusch et al., 2010), and diatom-bound organic compounds (Ingalls et al., 2004).
Several recent developments suggest further improvements to natural abundance 14C analysis. New CO2-accepting ion sources allow peripheral instruments, such as elemental analyzers, other oxidation or hydrolysis systems and potentially gas chromatography, to interface directly to AMS (Bronk Ramsey et al., 2004; Ruff et al., 2010; Haghipour et al., 2019). This eliminates the need for an offline graphitization step, which is labor-intensive and potentially introduces 14C contamination. Moreover, positive ion mass spectrometry (PIMS) uses an electron cyclotron resonance (ECR) plasma ion source generating high ion beam currents, which may allow further reductions in sample size. Finally, saturated absorption cavity ring-down spectroscopy (SCAR), a technique based on optical absorption of CO2 gas, is now capable of measuring 14C at natural abundance levels (Galli et al., 2016). The accuracy (±3%) and sample size required (6 mg) are not currently acceptable for most marine organic geochemical applications, but because this technique is inherently simpler than mass spectrometry, it has the potential to dramatically reduce the cost of natural abundance 14C analysis.
Improvements have also focused on peripheral instrumentation. Ramped pyrolysis/oxidation (RPO) involves progressively heating OM to 685 °C under helium, then oxidizing the pyrolysis products to CO2 (Rosenheim et al., 2008). This procedure exploits differences in stability of organic molecules to quantify and characterize different pools of OM. Coupled with radiocarbon dating, RPO is used to independently measure the age and reactivity distribution of fresh autochthonous OC and diagenetically stabilized allochthonous OM in terrestrial and marine settings. Recent applications of RPO include studies of riverine organic carbon (OC) (Rosenheim and Galy, 2012), oil-spill impacted sediments (Pendergraft et al., 2013), permafrost OC cycling in the Arctic (Zhang et al., 2017), weathering and recycling of petrogenic OC (Hemingway et al., 2018), the fate of terrestrial OC in the marine environment (Bao et al., 2019), soil OC preservation, and a global assessment of the controls on OC persistence in the environment (Hemingway et al., 2019). Because it characterizes the reactivity and age distribution of the entire OC pool, RPO is complementary to compound-specific 14C, which is much more specific but characterizes only a small portion of the bulk OC. Serial UV oxidation protocols aim at the separation of distinct DOC subpools for 14C analysis (Beaupré et al., 2007; Beaupré and Druffel, 2012). In both RPO and UV-oxidation, sample size limits have been pushed in recent years and more attention has been put toward minimizing potential blank C contamination.
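Why blank carbon, rather than instrument sensitivity, becomes limiting at small sample sizes can be seen from a simple two-component isotope mass balance. The sketch below assumes a single, constant blank of known mass and fraction modern (Fm); real blank assessments are more involved, and all numbers, names, and parameters here are hypothetical.

```python
def blank_corrected_fm(fm_measured, mass_measured_ug, fm_blank, mass_blank_ug):
    """Remove a constant blank from a measured fraction modern (Fm).

    Mass balance assumed:
        fm_measured * m_measured = fm_sample * m_sample + fm_blank * m_blank
    with m_measured = m_sample + m_blank (all masses in micrograms of C).
    """
    mass_sample_ug = mass_measured_ug - mass_blank_ug
    if mass_sample_ug <= 0:
        raise ValueError("blank mass must be smaller than the measured mass")
    return (fm_measured * mass_measured_ug - fm_blank * mass_blank_ug) / mass_sample_ug


# Hypothetical case: 1 microgram of modern blank C (Fm = 1.0) added during processing,
# and a measured Fm of 0.50 at two very different sample sizes.
for mass_ug in (1000.0, 10.0):
    corrected = blank_corrected_fm(0.50, mass_ug, fm_blank=1.0, mass_blank_ug=1.0)
    print(f"measured mass {mass_ug:7.1f} ug C -> corrected Fm = {corrected:.4f}")
```

The same blank that is negligible for a sample near 1 mg C shifts the result substantially at 10 µg C, so the uncertainty of the blank itself propagates directly into small-sample results.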

Enzyme Assays
Field measurements along spatial and temporal gradients have been the mainstay for characterizing OM cycling in the ocean. However, process-based measurements, such as on-deck or laboratory incubations, are essential to deconvolute environmental factors affecting OM decomposition, or to quantify rates of distinct processes, although these rates should be regarded as "potential." An essential step along the pathway from POM to DOM is extracellular enzymatic hydrolysis, in which high molecular weight DOM is cleaved into fragments smaller than ∼600-1000 Da, allowing for microbial uptake (Arnosti, 2011). Various simple substrate proxies, often fluorogenic or with a fluorescent tag, are utilized to evaluate hydrolytic rates in different aquatic environments (e.g., Hoppe, 1983; Somville and Billen, 1983). Interpreting results requires caution, since natural OM is far more complex than these simple proxies in terms of its structural diversity and three-dimensional conformation, and organic matter uptake mechanisms are complex (Reintjes et al., 2017). Nevertheless, these simple substrates integrate the activities of diverse enzymes reasonably well (Steen et al., 2015). Substrate proxies better resembling NOM in terms of structural complexity, such as polysaccharides and peptides with fluorescent tags, provide further insight (Arnosti, 1995; Pantoja et al., 1997). Extracellular enzymatic hydrolysis is often rapid and can outpace the uptake of hydrolysis products by microbes (Arnosti, 2004; Liu et al., 2013), and hydrolysis rate is highly dependent on substrate structure and environment (Arnosti et al., 2011). These methods collectively provide a bridge between genome/proteome-based methods to understand microbial metabolism and methods for the analysis of organic carbon composition.
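For fluorogenic substrate proxies, a potential hydrolysis rate is commonly derived from the increase in fluorescence over an incubation. The following is a minimal sketch, assuming a linear increase over the incubation period and a separately determined calibration factor relating fluorescence to fluorophore concentration; the function names, units, and data are hypothetical.

```python
def linear_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def potential_hydrolysis_rate(times_h, fluorescence, fluorescence_per_nM):
    """Potential hydrolysis rate in nM per hour.

    Assumptions: fluorescence rises linearly over the incubation, and
    fluorescence_per_nM comes from a standard curve of the free fluorophore.
    """
    return linear_slope(times_h, fluorescence) / fluorescence_per_nM


# Hypothetical incubation: time in hours, fluorescence in arbitrary units.
times_h = [0.0, 1.0, 2.0, 3.0]
fluorescence = [10.0, 30.0, 50.0, 70.0]
rate = potential_hydrolysis_rate(times_h, fluorescence, fluorescence_per_nM=2.0)
print(f"potential hydrolysis rate: {rate:.1f} nM per hour")
```

As noted above, such rates are "potential" rates: they describe hydrolysis of the added proxy under incubation conditions, not of natural OM in situ.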

Advances in Data Processing
The mandate for scientists to describe experiments in sufficient detail that they could be reproduced by others dates to at least the 17th century (Stodden, 2010). With respect to obtaining and physically processing samples, clearly written descriptions of analytical methods in manuscripts can suffice to make methods reproducible. However, many of the new analytical systems described above generate large quantities of data which must be extensively transformed and reduced prior to presentation, frequently requiring custom-written computer code. Accordingly, an increasing number of analytical chemists have learned to become programmers on the side. While this opens exciting new opportunities for discovery, it also places new burdens on analytical chemists to clearly describe the computational methods by which we obtain our results. Such descriptions are as essential to reproducibility as are descriptions of the chemical conditions by which a sample was analyzed.
Fortunately, several trends have facilitated reproducible data analysis. First, (relatively) user-friendly, free and open-source software languages such as R and Python, and to some extent Julia, have been developed into sophisticated analytical platforms, complete with packages for almost any type of statistical analysis that is common in the literature and high-quality integrated development environments. These languages have rapidly gained popularity compared to closed-source platforms such as SAS and SPSS (Muenchen, 2019).
Second, tools to integrate code with text, such as the R Markdown document specification and JupyterLab, allow seamless integration of text, code, and code results (Baumer and Udwin, 2015; Perkel, 2016, 2018). This makes it easy to relate code to the results that it produces, and to annotate code and results with a plain-language explanation of why the analysis was done in the way that it was. Such tools have long been available (Knuth, 1984; Ramsey, 1994; Leisch, 2002) but used the cumbersome TeX/LaTeX document preparation system (Knauff and Nejasmic, 2014) and did not support the languages most often used for data analysis.
Third, the availability of version control systems such as Git and Mercurial encourages integration of version control into the process of writing data analysis scripts and packages. Version control encourages a systematic approach to updating documents and tracking contributions by multiple authors, a key aspect of project management that scientists often manage informally (Cham, 2012). Associated hosting services such as GitHub and GitLab, each of which is a commercial service that offers a free tier, make it easy to share and collaborate on code. These tools collectively take some time to learn (and can be frustrating even for experienced users; Munroe, 2015) but once adopted, they improve research productivity (Wilson, 2006; Ram, 2013).
Fourth, systems to facilitate the preservation and sharing of raw data have developed considerably. Repositories such as PANGAEA, the National Centers for Environmental Information (NCEI), and the Biological-Chemical Oceanography Data Management Office (BCO-DMO) have grown rapidly over the past decade. As of early 2020, PANGAEA has over 392,000 unique, publicly available data sets (Grobe et al., 2006). Concurrently, academics have developed best practices and principles for data sharing and preservation, although there is disagreement in the literature as to how best to reconcile these ideals with the practicalities of sharing and preserving data with limited time and resources (Wilkinson et al., 2016; Tierney and Ram, 2020).
Finally, a broad ecosystem of support services for the above developments has emerged. Short courses such as Software Carpentry and Data Carpentry offer low-cost, evidence-based lessons in computational skills aimed specifically at scientific researchers (Wilson, 2014). Websites such as stackoverflow.com host questions about programming, and provide incentives for users to provide clear, accurate answers. Integrated development environments such as RStudio and JupyterLab, which are explicitly designed for data analysts, make the process of code development and debugging much easier (Gandrud, 2013). These trends have led to the development of a range of open-source tools that are valuable for marine organic geochemistry, particularly for instruments that generate large quantities of data such as ultrahigh-resolution mass spectrometers. Examples include XCMS (Smith et al., 2006), LOBSTAHS (Collins et al., 2016), and many of the tools listed in the "Analysis of polar, biologically labile organic molecules" section.
All of these trends have deep roots in computer science. For instance, Donald Knuth coined the term "literate programming" to describe the practice of mixing explanatory text with computer code in 1984 (Knuth, 1984) and the idea of reproducible data analysis has a long history in the computer science literature (as reviewed in Stodden, 2010). The main advances since 2004 have stemmed from the fact that the tools for reproducible data analysis have become much more widespread and accessible to researchers, such as marine organic geochemists, whose primary expertise lies outside of computer science. The new accessibility of these tools, combined with an increased need for them to be used by marine organic geochemists, suggests a bright future for reproducible data analysis in our community. In any case, increased transparency in data analysis is far preferable to what Rossini et al. (2003) identified as the alternative: "blind faith in our colleagues' programming skills."

Drawing From Other Fields
Progress in adjacent fields has also led to improved understanding of marine organic geochemistry. Most notably, the explosive progress in DNA and RNA sequencing technology, coupled with advances in bioinformatic tools to process those data, has led to a phase shift in our ability to draw inferences from biological sequence data. Hedges et al. (2000) predicted that "fast-developing capabilities [to identify marine microbes by taxonomy] should help compensate for our inability to culture most microorganisms." Microbial taxonomy does offer some insight into environmental function, but this insight is severely limited (Royalty and Steen, 2019). However, it is now routine for individual labs to assemble incomplete or even complete, closed genomes of the most abundant microbes in a marine environment, in the context of a single project. Given microbial genomes, researchers can make reasonable predictions about the broad nature of microbial interactions with organic matter (Rinke et al., 2013; Bird et al., 2019). It remains the case that bacterial and archaeal cells in seawater and sediments are only distantly related to microbes that have been grown in culture (Lloyd et al., 2018), and that the most common method of determining microbial abundance, surveys of primer-amplified 16S gene abundance, is biased toward taxa represented in culture collections. However, new culturing techniques and a renewed focus on culturing previously uncultured taxa have led to the culturing of quite a few abundant, environmentally important marine microbes (Könneke et al., 2005; Katayama et al., 2019; Imachi et al., 2020). Studies of microbes in pure culture act as "ground truthing" for inferences based on culture-independent techniques.
Other fields of ocean science have also impacted marine organic geochemistry. Improved understanding of the importance of mesoscale eddies to ocean physics has led biogeochemists to look for, and discover, differences in OM cycling in mesoscale eddies, for instance differences in D:L amino acid transformation rates apparently due to different microbial communities within vs. outside of eddies (e.g., Zhang et al., 2009; Sarma et al., 2019). Remote sensing techniques have advanced substantially since 2001, due to improvements in sensors, the recent availability of inexpensive research drones, and improvements in the algorithms used to remove the overprint of the atmosphere on light reaching satellites from the ocean (e.g., Werdell et al., 2013). Coupled with improved understanding of DOM optical characteristics (Coble, 2007; Helms et al., 2008), this permits more accurate and robust measurements of parameters such as DOC concentration and chromophoric dissolved organic matter (CDOM) content, and even indirect estimations of parameters such as methylmercury concentrations (Fichot et al., 2016; Slonecker et al., 2016).

CHALLENGES AND OPPORTUNITIES FOR THE NEXT DECADE(S)
The technical advances described create new challenges and opportunities for the next decade-plus of marine organic geochemistry research (Table 1). While a few of these challenges and opportunities are specific to individual technologies, for the most part they rely on integrating advances from different analytical techniques and fields of knowledge. Below, we describe some directions for future research that our working group found most exciting.

New Lipid Biomarkers and New Uses of Old Lipid Biomarkers
Biomarker lipids show the presence of particular source organisms or biosynthetic pathways and can be proxies for past environmental parameters, including temperature, pH, and input of soil organic matter to marine systems. Lipid chemical stability enables their preservation in the geological record and thus their extensive use as paleo-indicators. Novel HPLC-MS techniques have greatly widened the range of compounds amenable to analysis, both in terms of size and polarity (Schouten et al., 2000; Sturt et al., 2004), revealing a large diversity of microbial lipids in their intact (polar) forms in marine environments (e.g., Van Mooy and Fredricks, 2010; Schubotz et al., 2018) and making environmental proxies possible (e.g., the GDGT-based TEX86 sea surface temperature proxy; see Schouten et al., 2013b). These techniques facilitate quantification of microbial intact polar lipids, which provide high taxonomic resolution (Schubotz et al., 2009), and which can yield insight into microbial processes such as ammonia oxidation rates. Adding compound-specific isotope analysis in the context of natural isotope abundances (e.g., Schubotz et al., 2009; Pearson et al., 2016) or tracer experiments (e.g., Kellermann et al., 2016) can give further insight into elemental flow within ancient or currently active communities. However, analytical refinements, along with extensive large-scale environmental and laboratory-based surveys, have highlighted the complexity of interpreting the biomarker record from the point of view of a single influencing parameter (Sollai et al., 2019). Recent advances have focused on increasing the resolution and sensitivity of lipid analysis. For example, Wörmer et al. (2014) introduced direct laser ablation of lipids from sediment core surfaces, drastically improving spatial sampling resolution.
High-resolution mass spectrometry technology, such as Orbitrap instruments, has decreased detection limits and shown value as a tool to characterize and identify novel lipid structures (Moore et al., 2016; see also below). The identification of carboxyl-rich alicyclic material (CRAM; Hertkorn et al., 2006; see also below), carotenoid degradation products (CDP; Arakawa et al., 2017) and material derived from linear terpenoids (MDLT; Woods et al., 2011; Arakawa and Aluwihare, 2015) has shone a light on some of the most abundant classes of recalcitrant NOM in seawater. Although these structures have been inferred from NMR and ultrahigh-resolution mass spectrometry, and they appear related to biologically produced source molecules, there is not yet direct proof of how those molecules are produced, whether biotically or abiotically. Fractionation of DOM through hydrophilic interaction chromatography (HILIC) and multidimensional separation (2-D HPLC), coupled with ultrahigh-resolution mass spectrometry or in conjunction with NMR experiments (Woods et al., 2011; Spranger et al., 2019), represents a promising tool to determine if these materials might be useful as biomarkers or indicators of biogeochemical processes. At the same time, the data produced by these high-resolution instruments have highlighted the lack of 'omics' data pipelines available for environmental lipidomic work.
To fully exploit the potential of these techniques, high-throughput methods need to be developed. For example, LOBSTAHS is an open-source lipidomics workflow, written in R, for high-throughput annotation and putative identification of lipids in high-mass-accuracy HPLC-MS data that can identify thousands of compounds (>14,000 unique entries) in a sample (Collins et al., 2016). In our view, improved access to high-resolution mass spectrometers and to lipidomics workflows for high-throughput annotation and putative identification of lipids not only increases the number of structures that scientists can identify for targeted lipidomics, but also allows us to explore the patterns of unknown compounds in sample sets with contrasting conditions. The latter has the potential to increase the amount of biologically and biogeochemically relevant information obtained from non-targeted, exploratory lipidomic analysis.
Future challenges require better comprehension of the biomarker producers, through understanding the environmental constraints and the biosynthetic origin of lipid biomarkers. For example, targeting the diversity of lipid biosynthetic genes has already helped us identify limitations of using long-established hopane biomarkers (Welander et al., 2010). Lipid isotope probing assays, recently adapted to use radiotracer applications (Evans et al., 2018), hold high promise in revealing lipid biosynthetic pathways in pure cultures, but also in environmental settings. By working together with microbiologists in the future, we will fulfill the need for a better grasp of the effects of microbial physiology on biomarker synthesis.

More and Improved Data Comparisons
State-of-the-art analytical techniques, such as FT-ICR-MS, allow rapid chemical measurements of small samples and can generate large amounts of data with greater efficiency. Consequently, the amount of data available is expanding, presenting new challenges for data analysis, integration and intercalibration. Additionally, small variations in sample collection, physicochemical experiments, extraction protocols, fractionations, mass spectrometry analysis and data processing can lead to challenges in intercomparing data from multiple studies.
Differences in data processing algorithms can also impact acquired data files. As described above, computational methods to transform raw data should be understood as part of the analytical method, not as something downstream of the analysis. For example, comparison of three software tools and their parameters for processing mass spectral data showed large differences in the results depending on the tools and parameters chosen, specifically for the degree of false positive or negative peak detections (Cajka and Fiehn, 2016). This is especially important in the area of peak picking and identification in untargeted mass spectrometry analysis. There is a need to build on existing, successful data intercomparison (Schouten et al., 2013a) and sharing of information. Strengthened international cooperation and connections between diverse research fields could play a pivotal role in standardizing protocols and advancing understanding in the field of marine organic biogeochemistry.
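The sensitivity of peak detection to parameter choice can be illustrated with a minimal sketch. The example below is not taken from any of the tools compared by Cajka and Fiehn (2016); it uses `scipy.signal.find_peaks` on a synthetic spectrum, and the peak positions, heights, noise level, and thresholds are all arbitrary assumptions chosen for illustration.

```python
import numpy as np
from scipy.signal import find_peaks


def synthetic_spectrum(seed=0):
    """Toy spectrum: three Gaussian peaks of very different height plus noise."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 100.0, 2001)
    y = np.zeros_like(x)
    # (center, height) pairs are hypothetical values, not real analytes
    for center, height in [(20.0, 100.0), (50.0, 10.0), (80.0, 3.0)]:
        y += height * np.exp(-0.5 * ((x - center) / 0.5) ** 2)
    y += rng.normal(0.0, 1.0, size=x.size)  # unit-variance baseline noise
    return x, y


def count_peaks(y, min_height, min_prominence):
    """Number of local maxima that pass both thresholds."""
    indices, _ = find_peaks(y, height=min_height, prominence=min_prominence)
    return len(indices)


x, y = synthetic_spectrum()
# Strict thresholds keep only the dominant peak (false negatives for the rest).
strict = count_peaks(y, min_height=50.0, min_prominence=50.0)
# Permissive thresholds also admit noise maxima (false positives).
permissive = count_peaks(y, min_height=2.0, min_prominence=1.0)
print(f"strict settings: {strict} peaks; permissive settings: {permissive} peaks")
```

The same raw data yield very different peak lists depending only on the thresholds, which is why such parameters arguably belong in the methods description alongside the instrument settings.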

Untargeted High-Resolution Analysis: Ultrahigh-Resolution Mass Spectrometry
Advances in sample preparation, analysis and interpretation for FT-ICR-MS of NOM are ongoing and will aid in further deciphering the structural diversity of NOM. Molecular analysis of DOM from marine samples is hindered by the salt matrix in which the compounds are present. Extraction with commercially available PPL columns (Dittmar et al., 2008) and analysis of the obtained methanol extract in ESI-negative mode represent the most widely applied combination. Different isolation techniques with variable efficiency have been proposed and compared (e.g., Sleighter and Hatcher, 2008;Green et al., 2014;Schmidt et al., 2014;Li et al., 2017;Stücheli et al., 2018). Because ESI primarily ionizes polar to semi-polar compounds and in itself represents a complex technique affecting compounds differently based on polarity, molecular size, acid/base character and concentration, no single ionization method captures the whole suite of compounds present in NOM (Ohno and Bro, 2006;Hertkorn et al., 2008;Reemtsma, 2009). Accordingly, extraction and ionization methods need to be tailored to the scientific question (Kido Soule et al., 2010;Sleighter et al., 2012).
FT-ICR-MS analyzers provide the highest accuracy and resolving power (>1,000,000 at m/z 400, Junot et al., 2014) for exact molecular formula determination. Direct infusion techniques are able to produce detailed information on sample composition in a relatively short time, allowing fingerprinting of complex mixtures in high-throughput studies. However, one molecular formula can represent an unknown number of isomers (Zark et al., 2017), and co-suppression of signals may occur when analyzing highly complex mixtures of organics such as (marine) DOM. Chemical separation techniques, such as liquid chromatography or electrophoresis, are therefore applied prior to sample injection to overcome these issues and to provide more detailed structural information behind molecular formula assignments. While these techniques are commonly used in conjunction with tandem mass spectrometers including Orbitrap, we expect one-dimensional as well as multi-dimensional chromatographic separations prior to FT-ICR-MS analysis to become more prevalent in the future (Ghaste et al., 2016). However, the relatively slow acquisition rates of FT-ICR-MS instruments limit their applicability to hyphenated approaches often used in metabolomics and lipidomics research (Junot et al., 2010). Coupling FT-ICR-MS to trapped ion mobility spectrometry (TIMS-FT-ICR-MS) has provided new insights into the presumed high structural diversity of DOM. Lower limits of isomeric diversity have been established (Tose et al., 2018; Leyva et al., 2019), while upper limit estimates still range widely and are dependent on detection limit and resolving power. Additionally, fragmentation approaches have been applied to confirm molecular formula attributions and to elucidate the structural diversity of DOM (Wagner et al., 2015a; Kujawinski et al., 2016; Zark et al., 2017).
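To give a sense of scale for the resolving power quoted above, the following sketch uses the simple definition R = m/Δm. The choice of the C3 versus SH4 substitution as an example of a close mass doublet is our assumption for illustration and is not discussed in the text; the isotope masses are standard monoisotopic values.

```python
# Monoisotopic masses (u) of the most abundant isotopes.
MASS = {"C": 12.0, "H": 1.00782503, "S": 31.97207117}


def peak_width_at(mz, resolving_power):
    """Peak width (in m/z units) implied by R = m / delta_m."""
    return mz / resolving_power


def required_resolving_power(mz, mass_difference):
    """Resolving power needed to separate two ions that differ by mass_difference."""
    return mz / abs(mass_difference)


# Replacing C3 by SH4 in a formula leaves the nominal mass unchanged (36 u)
# but shifts the exact mass by a few millidalton.
delta_c3_sh4 = (MASS["S"] + 4 * MASS["H"]) - 3 * MASS["C"]

print(f"peak width at m/z 400, R = 1e6: {peak_width_at(400.0, 1e6):.4f}")
print(f"C3 vs SH4 mass difference: {delta_c3_sh4 * 1000:.2f} mDa")
print(f"R needed at m/z 400: {required_resolving_power(400.0, delta_c3_sh4):,.0f}")
```

Under these assumptions, a resolving power of 1,000,000 at m/z 400 corresponds to a peak width well below the millidalton differences that distinguish such formula pairs.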
The complex, high-dimensional information provided by ultrahigh-resolution MS requires development of new mathematical tools. The first, fundamental step is the assignment of unique molecular formulas. All tools in one way or another use mathematical transformations to improve resolving power, mass accuracy and sensitivity (e.g., Kilgour et al., 2013), or apply chemical and stochastic rules (Dittmar and Koch, 2006;Kind and Fiehn, 2007). After formula assignment, elemental ratios, molecular mass or mass defect analyses (Hughey et al., 2001) can provide first insights into NOM composition. Due to the high number of molecular formulae and increasing sample throughput, the application of multivariate statistics has been established in the field (Kujawinski et al., 2009;Sleighter et al., 2010;Longnecker and Kujawinski, 2016).
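The post-assignment steps mentioned above (elemental ratios and mass defect analysis) can be sketched in a few lines. This is a minimal illustration, not the implementation of any cited tool: the helper names and the example formulas are hypothetical, the parser handles only simple formulas without brackets or charges, and the Kendrick mass defect is taken here as the nominal (rounded) Kendrick mass minus the Kendrick mass on a CH2 basis.

```python
import re

# Monoisotopic masses (u) for the elements handled by this sketch.
MONO = {
    "C": 12.0,
    "H": 1.00782503223,
    "N": 14.00307400443,
    "O": 15.99491461957,
    "S": 31.9720711744,
}
CH2_EXACT = MONO["C"] + 2 * MONO["H"]  # about 14.01565 u


def parse_formula(formula):
    """Turn a simple formula string such as 'C10H12O5' into element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(counts):
    """Neutral monoisotopic mass from element counts."""
    return sum(MONO[element] * n for element, n in counts.items())


def elemental_ratios(counts):
    """Return (H/C, O/C), the axes of a van Krevelen-type diagram."""
    carbon = counts["C"]
    return counts.get("H", 0) / carbon, counts.get("O", 0) / carbon


def kendrick_mass_defect(mass):
    """Kendrick mass defect on a CH2 basis: nominal Kendrick mass minus Kendrick mass."""
    kendrick_mass = mass * 14.0 / CH2_EXACT
    return round(kendrick_mass) - kendrick_mass


# Two hypothetical formulas that differ by one CH2 unit
for formula in ("C10H12O5", "C11H14O5"):
    counts = parse_formula(formula)
    mass = monoisotopic_mass(counts)
    print(formula, elemental_ratios(counts), round(kendrick_mass_defect(mass), 4))
```

Members of a CH2 homologous series share the same Kendrick mass defect, which is what makes this transformation useful for sorting thousands of assigned formulas into related families.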

Untargeted High-Resolution Analysis: Coupled NMR and Ultrahigh-Resolution Mass Spectrometry
The alliance of NMR and ultrahigh-resolution mass spectrometry will continue to provide valuable insights into the sources of refractory DOM components (Hertkorn et al., 2006; Aluwihare and Meador, 2008; Abdulla et al., 2013b; DiDonato and Hatcher, 2017). For example, combined NMR and FT-ICR-MS analyses show that the major components of the polysaccharide fraction of HMW-DOM are acylated polysaccharides (APS; N-acyl amino acids bound with neutral and amino sugars and non-acylated heteropolysaccharides) and CRAM (Hertkorn et al., 2006) with smaller amounts of aromatics and aromatic N-heterocyclics. CRAM is the most abundant identified component of DOM. Increasing depth in the water column conserves increasingly branched CRAM and heteroatom-containing components at the expense of carbohydrates, contributing to the resistance of DOM to biodegradation (Hertkorn et al., 2013).
Additional information may be gained by utilizing different NMR techniques. Higher magnetic field NMR instruments will likely not advance solid-state NMR techniques, because higher magnetic fields require a higher spinning speed for the sample rotor to remove spinning sidebands, which is not currently feasible (see Mopper et al., 2007 for a more complete explanation). However, higher magnetic field NMR and NMR cryoprobes may allow the use of water-suppression 1H NMR techniques to analyze the entire marine DOM composition even in the deep ocean (Lam and Simpson, 2008; Fox et al., 2018). Diffusion-ordered spectroscopy (DOSY) and 2D-NMR techniques (1H-1H TOCSY and 1H-1H COSY) can estimate the molecular weight of DOM components, resolve overlapping peaks, confirm the interpretation of chemical shifts, and help to identify the structures of specific DOM components. Coupling 1H NMR techniques with offline chromatography and high-resolution mass spectrometry techniques will advance our understanding of the transformation and cycling of marine DOM (Simpson et al., 2004).
Chemometric statistical methods allow combining NMR data with mass spectrometry data using multivariate statistics, for example to identify structural components and pathways of metabolic perturbations or to determine the biotransformation of metabolites on short timescales (Jaeger and Aspers, 2014). These methods use the intrinsic covariance between signal intensities in the same and related molecules measured by different techniques across great numbers of samples (Crockford et al., 2006). Another approach identifies single metabolites by generating theoretical NMR and MS/MS spectra for possible chemical structures and by comparing them directly against experimental NMR and ultrahigh-resolution MS2 spectra of the unknowns (Boiteau et al., 2018). This approach does not require identified compounds from experimental metabolomics databases, providing means for the identification of unknown and uncatalogued metabolites in natural DOM. Further development of chemometric tools will be needed in the future.
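The covariance idea described above can be illustrated with a minimal sketch. It is not the procedure of Crockford et al. (2006); it simply computes the Pearson correlation between every NMR variable and every MS variable across samples with NumPy. The two "compounds", their concentrations, and the response factors are hypothetical values chosen so that the expected result is obvious.

```python
import numpy as np


def cross_correlation(nmr, ms):
    """Pearson correlation of each NMR variable with each MS variable.

    nmr: array of shape (n_samples, n_nmr_variables)
    ms:  array of shape (n_samples, n_ms_variables)
    Returns an array of shape (n_nmr_variables, n_ms_variables).
    Columns with zero variance would produce NaN and should be removed first.
    """
    nmr = np.asarray(nmr, dtype=float)
    ms = np.asarray(ms, dtype=float)
    nmr_z = (nmr - nmr.mean(axis=0)) / nmr.std(axis=0)
    ms_z = (ms - ms.mean(axis=0)) / ms.std(axis=0)
    return nmr_z.T @ ms_z / nmr.shape[0]


# Hypothetical concentrations of two compounds in six samples;
# the two series were chosen to be uncorrelated with each other.
compound_a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
compound_b = np.array([2.0, 1.0, 3.0, 3.0, 1.0, 2.0])

# Each technique sees each compound with its own (assumed) response factor.
nmr_data = np.column_stack([2.0 * compound_a, 1.0 * compound_b])
ms_data = np.column_stack([5.0 * compound_a, 3.0 * compound_b])

r = cross_correlation(nmr_data, ms_data)
print(np.round(r, 3))
```

Signals arising from the same compound correlate strongly across samples regardless of the technique-specific response, whereas signals from unrelated compounds do not; with real data, noise and overlapping signals make the pattern far less clean.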

Coupling Untargeted Chemical and Microbiological Assays
Information on microbial metabolite transformations will enhance our understanding of carbon cycling in the oceans. Parallel trends between the complex but closely connected non-living and living worlds are established by combining chemical with, e.g., untargeted microbiological assessments (Kujawinski, 2011; Osterholz et al., 2016). The application of stable isotope tracers will enable the delineation of causal links (e.g., Biddle et al., 2006; Orsi et al., 2018; Seyler et al., 2018). Understanding structural diversity as a meaningful property of DOM adds a new facet to the interpretation of UHRMS data. Borrowing from traditional ecological concepts, this approach may increase the understanding of the long-term stability of carbon compounds dissolved in the ocean, sustaining highly diverse microbial communities (Dittmar, 2015; Osterholz et al., 2016). Likewise, bioassay experiments employing metabolic profiling via NMR and FT-ICR-MS show that bacterial DOM has a chemical composition and structural diversity similar to refractory natural DOM in seawater (Lechtenfeld et al., 2015). In the future, incorporating DOM fingerprint information into numerical models such as ocean circulation models can provide mechanistic explanations for observations and may enable us to better place our findings within the bigger picture of global carbon cycling.
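One way to treat diversity as a quantitative property of a DOM fingerprint is to apply an ecological diversity index to the relative signal intensities of assigned molecular formulas. The sketch below uses the Shannon index; this choice is an assumption made for illustration, since the text does not specify which diversity measure is used, and the intensity values are invented.

```python
import math


def shannon_diversity(intensities):
    """Shannon index H = -sum(p * ln p) over relative signal intensities.

    Zero intensities are ignored, because p * ln p tends to 0 as p tends to 0.
    """
    total = sum(intensities)
    proportions = [value / total for value in intensities if value > 0]
    return -sum(p * math.log(p) for p in proportions)


# Hypothetical peak intensities for two samples with four formulas each
even_sample = [25.0, 25.0, 25.0, 25.0]
uneven_sample = [97.0, 1.0, 1.0, 1.0]

print(round(shannon_diversity(even_sample), 3))
print(round(shannon_diversity(uneven_sample), 3))
```

A sample whose signal is spread evenly over many formulas scores higher than one dominated by a single formula. Because signal intensity in ESI depends on ionization efficiency, such an index describes the fingerprint rather than true molar abundances.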

Consensus Reference Materials
A broad range of chemical reference materials for ocean sciences remains elusive, despite the recognition that accuracy of data depends on calibration and intercomparison over time and among laboratories. To that end, the US National Research Council prepared a report identifying the most critically needed reference materials and recommending the most appropriate approaches for their development, with an emphasis on organic materials (Committee on Reference Materials for Ocean Science, 2002). Some of the recommendations of this committee remain unfulfilled.
The marine DOC community has adopted the consensus reference material available from D. Hansell's lab (University of Miami) to determine reproducibility and accuracy of oceanic DOC analyses (Sharp et al., 2002). When several researchers promoted the need for a collection of humic and fulvic acids to be made available to the scientific community, the International Humic Substances Society (IHSS) was organized in 1981. Since then, NOM research has benefited significantly from sharing common standard and reference materials with known elemental and stable isotopic ratios. Intercomparison of UHRMS data is challenging, as the "true" result is not known. A first interlaboratory comparison including 17 different high-resolution mass spectrometers and NOM reference material from the IHSS showed that while differences exist, common trends in elemental ratios are reproduced (Hawkes et al., 2020). Efforts such as this one will lead to the development of consensus values for quality control of these complex analyses. However, a marine reference material that is similar to the freshwater-sourced NOM from IHSS and also suitable for fingerprinting techniques such as FT-ICR-MS or Orbitrap is missing to date, and we call upon the community to establish such a material.
Reference materials for biological, particulate, and sedimentary OM are also conspicuously absent, in part because of logistical difficulties in preparing and certifying them and, perhaps a more complex issue, because of a lack of consensus as to what would best constitute an appropriate reference material and what organic parameters should be targeted. Intercomparison exercises have been conducted for organic contaminants in biological tissues (mussel) and sediments (New York/New Jersey estuary) in the marine environment (Schantz et al., 2008), but comparable exercises for marine biogeochemical materials have generally not been carried out. These difficulties are partially alleviated by targeting specific biomarkers or ratios of biomarkers. Two successful small-scale round-robin laboratory intercomparison exercises, in which a few kilograms of sediment or extracts/isolates were distributed to a limited number of laboratories, are the alkenone intercomparison (Rosell-Melé et al., 2001) and the GDGT intercomparison (Schouten et al., 2013a). It should be noted that in both of these round-robin exercises, the targeted parameters were biomarker ratios (UK37, and TEX86 and BIT, respectively) rather than a rigorous assessment of absolute concentrations. Nonetheless, the marine organic biogeochemical community needs to address this shortfall.
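The distinction between ratios and absolute concentrations can be made concrete with a short sketch. The TEX86 expression used below is the commonly cited definition based on GDGT-1, GDGT-2, GDGT-3 and the crenarchaeol isomer; it is not given in this text and is stated here as an assumption, and all abundance values and the response factor are invented for illustration.

```python
def tex86(gdgt1, gdgt2, gdgt3, cren_isomer):
    """TEX86 as commonly defined: (GDGT-2 + GDGT-3 + Cren') / (GDGT-1 + GDGT-2 + GDGT-3 + Cren')."""
    numerator = gdgt2 + gdgt3 + cren_isomer
    return numerator / (gdgt1 + numerator)


# Hypothetical abundances reported by laboratory A (arbitrary units)
lab_a = {"gdgt1": 10.0, "gdgt2": 6.0, "gdgt3": 2.0, "cren_isomer": 2.0}

# Laboratory B measures the same sample but, in this sketch, recovers only
# half of every compound (a uniform multiplicative bias).
response_factor = 0.5
lab_b = {name: value * response_factor for name, value in lab_a.items()}

print("TEX86, lab A:", tex86(**lab_a))
print("TEX86, lab B:", tex86(**lab_b))
print("summed abundance, lab A vs lab B:", sum(lab_a.values()), sum(lab_b.values()))
```

The ratio is identical for both laboratories while the absolute abundances differ by a factor of two. Only a bias that is uniform across compounds cancels in this way; compound-specific differences in response would shift the ratio as well, and absolute concentrations can be compared only against a shared reference material.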

CONCLUDING REMARKS
Marine organic biogeochemistry has made significant contributions toward a better understanding of the fundamental processes that affect element cycles in the ocean, especially for carbon and nitrogen. As new questions arise and new paradigms are developed, there is a never-ending need for novel analytical methodologies, ranging from advanced instrumentation and cross-disciplinary field observations and experimentation to new thinking about how the massive amounts of data generated by biogeochemists are handled and interpreted (Figure 1). This review has summarized the state of the art with respect to analytical instrumentation in use by marine organic biogeochemists (e.g., chromatography, mass spectrometry, NMR spectroscopy, radiocarbon analysis, enzyme assays) and chemometrics for data handling. Key challenges for the future are also highlighted. These include identifying new biomarkers, employing advanced high-resolution mass spectrometry and NMR spectroscopy for characterizing dissolved organic matter and its behavior, and coupling chemical analyses with microbiological assays. The review concludes with a plea to move forward in the long-standing but as yet unrealized effort to develop and distribute reference materials to the biogeochemical community, whether certified standard reference materials (SRMs) for intercalibration exercises or consensus reference materials for intercomparisons. Identifying and preparing reference materials, while not an easy task, is critical to guarantee that data generated by laboratories worldwide will truly be applicable and usable worldwide.

AUTHOR CONTRIBUTIONS
SW conceived the workshop and convened the group of authors. AS, SK, and SW organized the manuscript and led the writing. SK created the figure. All co-authors contributed text and ideas. All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

FUNDING
The Hanse-Wissenschaftskolleg Delmenhorst, Germany, sponsored the "Marine Organic Biogeochemistry" workshop in April 2019, of which this working group report was a part. The workshop was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), project number 422798570. The Geochemical Society provided additional funding for the conference. AS was supported by DOE grant DE-SC0020369.