Representing Organic Matter Thermodynamics in Biogeochemical Reactions via Substrate-Explicit Modeling

Predictive biogeochemical modeling requires data-model integration that enables explicit representation of the sophisticated roles of microbial processes that transform substrates. Data from high-resolution organic matter (OM) characterization are increasingly available and can serve as a critical resource for this purpose, but their incorporation into biogeochemical models is often prohibited due to an over-simplified description of reaction networks. To fill this gap, we proposed a new concept of biogeochemical modeling—termed substrate-explicit modeling—that enables parameterizing OM-specific oxidative degradation pathways and reaction rates based on the thermodynamic properties of OM pools. Based on previous developments in the literature, we characterized the resulting kinetic models by only two parameters regardless of the complexity of OM profiles, which can greatly facilitate the integration with reactive transport models for ecosystem simulations by alleviating the difficulty in parameter identification. The two parameters include maximal growth rate (μmax) and harvest volume (Vh) (i.e., the volume that a microbe can access for harvesting energy). For every detected organic molecule in a given sample, our approach provides a systematic way to formulate reaction kinetics from chemical formula, which enables the evaluation of the impact of OM character on biogeochemical processes across conditions. In a case study of two sites with distinct OM thermodynamics using ultra high-resolution metabolomics datasets derived from Fourier transform ion cyclotron resonance mass spectrometry analyses, our method predicted how oxidative degradation is primarily driven by thermodynamic efficiency of OM consistent with experimental rate measurements (as shown by correlation coefficients of up to 0.61), and how biogeochemical reactions can vary in response to carbon and/or oxygen limitations. Lastly, we showed that incorporation of enzymatic regulations into substrate-explicit models is critical for more reasonable predictions. This result led us to present integrative biogeochemical modeling as a unifying framework that can ideally describe the dynamic interplay among microbes, enzymes, and substrates to address advanced questions and hypotheses in future studies. Altogether, the new modeling concept we propose in this work provides a foundational platform for unprecedented predictions of biogeochemical and ecosystem dynamics through enhanced integration with diverse experimental data and extant modeling approaches.

1 Pacific Northwest National Laboratory, Richland, WA, United States, 2 Departments of Biological Systems Engineering and Food Science and Technology, University of Nebraska-Lincoln, Lincoln, NE, United States, 3 Los Alamos National Laboratory, Los Alamos, NM, United States Predictive biogeochemical modeling requires data-model integration that enables explicit representation of the sophisticated roles of microbial processes that transform substrates. Data from high-resolution organic matter (OM) characterization are increasingly available and can serve as a critical resource for this purpose, but their incorporation into biogeochemical models is often prohibited due to an over-simplified description of reaction networks. To fill this gap, we proposed a new concept of biogeochemical modeling-termed substrate-explicit modeling-that enables parameterizing OM-specific oxidative degradation pathways and reaction rates based on the thermodynamic properties of OM pools. Based on previous developments in the literature, we characterized the resulting kinetic models by only two parameters regardless of the complexity of OM profiles, which can greatly facilitate the integration with reactive transport models for ecosystem simulations by alleviating the difficulty in parameter identification. The two parameters include maximal growth rate (µ max ) and harvest volume (V h ) (i.e., the volume that a microbe can access for harvesting energy). For every detected organic molecule in a given sample, our approach provides a systematic way to formulate reaction kinetics from chemical formula, which enables the evaluation of the impact of OM character on biogeochemical processes across conditions. In a case study of two sites with distinct OM thermodynamics using ultra high-resolution metabolomics datasets derived from Fourier transform ion cyclotron resonance mass spectrometry analyses, our method predicted how oxidative degradation is primarily driven by thermodynamic efficiency of OM consistent with experimental rate measurements (as shown by correlation coefficients of up to 0.61), and how biogeochemical reactions can vary in response to carbon and/or oxygen limitations. Lastly, we showed that incorporation of enzymatic regulations into substrateexplicit models is critical for more reasonable predictions. This result led us to present integrative biogeochemical modeling as a unifying framework that can ideally describe the dynamic interplay among microbes, enzymes, and substrates to address advanced

INTRODUCTION
Organic matter (OM) is a key determinant of global biogeochemistry and exerts far-reaching impacts on regional and global ecosystem health. Degradation of complex OM is primarily driven by microbial and enzymatic activities around environmental substrates (Moorhead et al., 2013;Manzoni et al., 2016;Paul, 2016;Varjani, 2017). Therefore, a proper representation of the dynamic interplay among microbes, enzymes, and substrates is key for reliable prediction of OM cycling and ecosystem functioning. Despite an increasing ability to generate high-resolution data for each of these components, understanding of fundamental processes that govern their interplay is limited and, consequently, is poorly represented in models (Fatichi et al., 2019). To date, no current-generation modeling framework has been available to interpret and utilize increasingly available high-resolution metabolite data for predicting biogeochemical cycles at the ecosystem level.
Earlier biogeochemical models provide a lumped description of microbial and enzymatic activities, as well as chemical entities (Schimel and Schaeffer, 2012;Blankinship et al., 2018). This reductionist approach significantly limits the predictive ability of models. For the past decade, growing recognition of the significance of microbial activities on biogeochemistry has led to the development of microbial-explicit models (MXMs) (Todd-Brown et al., 2012;Wieder et al., 2015;Allison, 2017;Sulman et al., 2018), which include DEMENT (Allison, 2012), CORPSE (Sulman et al., 2014), MIMICS (Wieder et al., 2014), MEND (Wang et al., 2015), RESOM (Tang and Riley, 2015), and other functional guild-based models (Hood et al., 2006;Jin and Roden, 2011;Bouskill et al., 2012). With or without coupled consideration of enzymatic processes, these models account for microbial physiology and interactions to provide a deeper understanding of microbially mediated OM decomposition.
Enzyme-based or enzyme-explicit models (EXMs) are a complementary approach that collectively describes biogeochemical functions performed by a microbial community with a focus on extracellular enzymes, alleviating the difficulty in identifying functional traits of individual organisms or their groups (Moorhead et al., 2013;Song et al., 2014Song et al., , 2017Song and Liu, 2015). In contrast with such increasing details considered in MXM and EXM, over-simplified descriptions of substrate pools remain as a serious bottleneck in building predictive biogeochemical models because further elaboration of microbial and enzyme activities becomes difficult without expanded consideration of complex OM chemistry that influences the specific metabolic pathways employed by microorganisms.
Extension of biogeochemical models to include detailed OM chemistry is critically important to pushing the boundaries of environmental science forward. For example, a present paradigm in environmental science views aerobic respiration rates as being primarily determined by kinetics [i.e., organic carbon (OC) and oxygen concentrations]. However, recent field and laboratory studies suggest that OM thermodynamics can be a main driver of aerobic respiration (Graham et al., 2017(Graham et al., , 2018Stegen et al., 2018b;Garayburu-Caruso et al., 2020). Conventional lumped biogeochemical models are not completely effective for addressing this issue because the thermodynamic properties of substrates are a function of chemical composition of compounds constituting OM pools. This implies that identification of underlying key processes that drive aerobic respiration requires advanced biogeochemical models that can properly reflect all relevant kinetics and thermodynamics in representing actual oxidation rates.
Toward filling this gap, we propose a new modeling concept drawn from thermodynamic theory that can explicitly account for the chemical composition of individual molecules in OM pools when generating biogeochemical rate estimates, therefore having the potential to significantly advance predictive capabilities of current-generation ecosystem models. Due to the ability of our approach to directly incorporate OM chemistry for every compound, we termed it substrate-explicit modeling (SXM), as opposed to MXM and EXM. SXM features flexibility in input data types, enabling incorporation of an unlimited number of compounds detected from various state-of-the-art instrumentation including GC-MS, LC-MS/MS, HPLC-MS, NMR, Orbitrap MS, and Fourier transform ion cyclotron resonance (FTICR-MS). Our modeling framework therefore overcomes a central challenging in using such informationhow to condense the massive amount of data produced by these technologies into variables that are both useful and computationally feasible in predicting biogeochemical dynamics and function.
By combining a suite of previously developed thermodynamic theories (McCarty, 2007;Kleerebezem and Van Loosdrecht, 2010;LaRowe and Van Cappellen, 2011;Desmond-Le Quemener and Bouchez, 2014), we developed a systematic procedure to convert chemical formulae of all organic compounds detected in an environment, regardless of the number of compounds or measurement technique, into two rate parameters that integrate seamlessly into a variety of modeling constructs. We evaluated the effectiveness of our approach through the analysis of OM characterized via FTICR-MS data obtained from two biogeochemically distinct sites. Model outputs were compared to experimental work on the same samples by Graham et al. (2017Graham et al. ( , 2018, who showed that aerobic respiration varied between across sites by several-fold range. We chose to model two representative OM profiles from the samples with high (high activity, HA) and low (low activity, LA) rates of aerobic respiration to test model predictions against the broadest range of biogeochemical activity in the dataset.
Consistent with the findings from Graham et al. (2017Graham et al. ( , 2018, comparative analyses of the two SXMs constructed for HA and LA zones showed that respiration reactions of OM pools in HA zones were thermodynamically more favorable than those in LA zones and that this difference of OM thermodynamic properties was associated with elevated respiration rates in the HA zone. Further comparison of predicted reaction rates and experimentally determined aerobic respiration predicted field conditions as being limited by carbon. Lastly, we showed that SXMs have a flexible structure to be synergistically integrated with other existing frameworks (such as EXMs and/or MXMs) toward more comprehensive predictive biogeochemical modeling. Through coupling with reactive transport models, this hybridized or integrative biogeochemical modeling (IBM) is expected to significantly expand our ability to predict complex ecosystem functions.

MATERIALS AND METHODS
Our development of SXMs from chemical formula of OC go through multiple-step procedures as illustrated in Figure 1, which are classified into two major parts: (1) derivation of stoichiometric equations for catabolic, anabolic, and metabolic reactions by combining a set of standard thermodynamic analyses (McCarty, 2007;Kleerebezem and Van Loosdrecht, 2010;LaRowe and Van Cappellen, 2011); and (2) formulation of kinetic equations for the final oxidative degradation reaction of OC using a relatively recent thermodynamic theory for microbial growth (Desmond-Le Quemener and Bouchez, 2014).

Stoichiometric Representation of Oxidative Respiration Reactions
Stoichiometric equations provide quantitative relationships among the reactants and products involved in given reactions. We derived a stoichiometric equation for oxidative degradation of OC by accounting for catabolism (i.e., all processes for obtaining energy through substrate oxidation or other means) and anabolism (i.e., synthesis of biomass using the energy provided from catabolism) (Figure 1). Combination of catabolic and anabolic reactions through energy coupling leads to metabolic reactions. We formulate catabolic reactions (and anabolic reactions as well in many cases) as combinations of a pair of redox half reactions, i.e., for an electron donor (Ed) and an electron acceptor (Ea), while cases exist where catabolism can occur without involving electron donor/acceptor coupling, e.g., in disproportionation reactions. Therefore, oxidative degradation of OC described here as a metabolic reaction can be systematically derived in a bottom-up fashion. We provide step-by-step procedures of developing stoichiometric equations following the standard approaches outlined in the literature (Kleerebezem and Van Loosdrecht, 2010;Rittmann and McCarty, 2012). For a general representation of all associated reactions, we use the following convention for stoichiometric representation: where S i denotes chemical species i, y S i (≥ 0) is the stoichiometric coefficient of S i , which is set to be negative for reactants and positive for products. For a defined array of S i 's, Eq.
(1) can be conveniently represented as a column vector of stoichiometric coefficients: where the superscript T denotes the vector transpose. We specified 10 chemical species (S i 's) involved in OC oxidation reactions (i.e., n = 10) and provided standard state Gibbs energy of formation of each compound ( Table 1). The standard state of a compound in aqueous phase denotes is unit activity in a hypothetical one molal solution referenced to infinite dilution at any temperature and pressure (Amend and LaRowe, 2019).

Metabolic Reactions
A stoichiometric equation of metabolic reactions can be derived by accounting for the energetic coupling between catabolism and anabolism, i.e., the balance between the energy production through substrate degradation and the energy consumption for cell synthesis (i.e., biomass production) (see Supplementary Text for derivation of stochiometric equations of catabolic and anabolic reactions). Two approaches commonly considered for this purpose include the dissipation method (Heijnen et al., 1992;Heijnen and Vandijken, 1993) and the thermodynamic electron equivalents model (TEEM) (McCarty, 2007). In this work, we combined the dissipation method and TEEM. We used the dissipation method as a basic framework, which determines the stoichiometric coefficient vector for metabolic reaction (y) by coupling the catabolic and anabolic reactions based on the parameter λ, i.e., Here, y Cat and y An are column vectors of stoichiometric coefficients in catabolic and anabolic reactions, and G r,Cat and G r,Ana , respectively, denote Gibbs energies of catabolic and anabolic reactions, and G r,Dis is dissipation energy. The parameter λ implies how many times the catabolic reaction needs   (Stephanopoulos et al., 1998;Kleerebezem and Van Loosdrecht, 2010). *All thermodynamic properties of H + and e − including Gibbs energies are 0 at all pressures and temperatures and all other aqueous species properties are relative to this.** G 0 i for Biom was taken from Kleerebezem and Van Loosdrecht (2010).
to run to provide the energy required for the synthesis of a unit C-mole of biomass. Lower values of λ therefore represent higher thermodynamic favorability of metabolic reactions involving given OM. The value of λ is expected to increase under stress conditions where energy efficiency for growth can become low, e.g., due to increased energy dissipation.
We used TEEM to calculate the value of λ and subsequently dissipation energy ( G r,Dis ) because the formula in the dissipation method (used to calculate G r,Dis ) is not extendable to complex OCs with more than six-carbon OCs, while most of the compounds profiled in our samples are more complex than those. The TEEM considers the energy provision of catabolic reaction to meet the energy spent in the following two steps of energy conversion in anabolism: (1) the carbon source to biomass building blocks ( G r,Block ), and (2) the conversion of biomass building blocks to biomass ( G r,Syn ). The original formulation includes the energy conversion from the nitrogen source to ammonium, which is neglected in our case where ammonium is taken as the nitrogen source. The energy balance is then represented using the following equation: where G r,Cat is the Gibbs free energy change in catabolism, λ is a parameter that couples catabolism and anabolism as defined earlier, and η is an energy transfer efficiency parameter. The exponent m accounts for energy transfer efficiency depending on whether energy is generated or consumed in the process of converting the carbon source to biomass building blocks by setting it to +1 (if G r,Block < 0) or −1 (if G r,Block > 0). Following Kleerebezem and Van Loosdrecht (2010), we chose G r,Syn = 200 kJ/ mol Biom , and η = 0.43; assumed the composition of biomass building block to be the same as Biom (i.e., CH 1.8 N 0.2 O 0.5 ). Constant composition of biomass and building block means that G r,Block is the same across samples. Specific biomass composition used in this work (CH 1.8 N 0.2 O 0.5 ) has been widely used in the literature (Stephanopoulos et al., 1998;Kleerebezem and Van Loosdrecht, 2010), while experimental determination would enhance the calculation of Gibbs energies (LaRowe and Amend, 2016).
Determination of λ from Eq. (5) then requires the calculation of the Gibbs free energy G r,Block , which is assumed to be the same as G r,Ana in our formulation (where the elemental composition of biomass building block was treated as the same as Biom). With the value of λ calculated as such, one can now determine the value of G r,Dis from Eq. (4) and the full stoichiometric equation of metabolic reaction from Eq. (3). In the following section, we describe how to calculate the Gibbs free energy changes in a general context.

Gibbs Free Energy Change
The Gibbs free energy change of a reaction ( G r ) can be calculated from where G 0 r denotes the standard state Gibbs energy of a reaction, R is the gas constant [= 0.008314 kJ/(K · mol)], T is temperature Frontiers in Microbiology | www.frontiersin.org in Kelvin, and Q r is the reaction quotient defined by where a i and y i denote the activity and stochiometric coefficient of the i th chemical species. For aqueous systems, Gibbs free energy of a reaction is often calculated for the standard state where substances take unit activities (i.e., ln Q r = 0 and therefore G r reduces to G 0 r ) in a hypothetical one molal solution referenced to infinite dilution. Standard state Gibbs energies of a reaction ( G 0 r ) can be calculated based on standard state Gibbs energy of formation of all associated compounds ( G 0 i ) and their stoichiometry (y S i ), i.e., As the standard state assuming one molal (i.e., unity activities) of aqueous species (i.e., thus pH = 0) does not properly represent the state of biological cells, one can also define the biological standard state where pH is set to 7 (i.e., a H + = 10 −7 ).
The biological standard state (denoted as by G 0 r ) is then calculated by The above equation defines the biological standard state Gibbs energies for most cells (neutrophils), while acidophiles and alkaliphiles optimally grow at lower and higher pH than 7 (Amend and LaRowe, 2019; Keenleyside, 2019). The use of G 0 r is generally regarded acceptable for thermodynamic analysis of redox reactions in biochemical systems involving oxidative respiration (Kleerebezem and Van Loosdrecht, 2010).
For all compounds considered in this work except OC, the values of G 0 i in Eq. (8) are readily obtainable from public databases or the literature (see Table 1) (Kleerebezem and Van Loosdrecht, 2010;Dick, 2019). Theoretical estimation of the standard formation energy of OC ( G 0 OC ) using the group contribution theory is infeasible because the structural information of compounds is not available in FTICR-MS. Therefore, we estimated G 0 OC in the following way. First, we estimated the standard Gibbs free energy change for an electron donor half reaction ( G 0 r,D ) using the formula from LaRowe and Van Cappellen (2011). They provided a linear relationship between the nominal oxidation state of carbon (NOSC) and the Gibbs energies for the oxidation half reactions of OC represented on a C-mole basis ( G 0 Cox ), i.e., where NOSC is obtained from the exchanged electron moles in the half reaction, and the number of carbon in OC: where n e − is the number of electrons transferred in the half reaction (= y e − ) and a denotes the number of C elements in OC.
As G 0 Cox [kJ/C-mol] represents the Gibbs free energy per one C-mole, G 0 r,D is obtained simply by multiplying the number of carbon in OC, i.e., Then, the substation of G 0 r,D in the above into Eq. (8) leads to where G 0 r,D is standard Gibbs energy of an electron-donor half reaction, y D i denotes the stoichiometric coefficient of the i th compound in an electron-donor half reaction, and y D OC = −1 in our formulation (see Supplementary Text for the formulation of an electron donor half reaction). Once G 0 OC is known from Eq. (13), it is straightforward to calculate the standard (at pH = 0) and biochemical standard (at pH = 7) Gibbs free energy changes (i.e., G 0 r and G 0 r ) for any given reactions. Hereafter, we drop the superscript 0 to denote thermodynamic functions at pH = 7, while we still keep the superscript 0 to denote their values at pH = 0.

Reaction Kinetics
Thermodynamic theory by Desmond-Le Quemener and Bouchez (2014) enables formulating microbial growth kinetics from stoichiometric equations derived in the previous section (see also Supplementary Text for derivation). In the case of oxidative degradation of OC, the microbial growth on the i th OC (OC i ) can be represented by where µ max is the maximal specific growth rate, V h is the volume that a microbe can access for harvesting energy from the environment (thus termed harvest volume), y OC , O 2 -limited in excessive supply of OC (15) Consumption and production rates of other chemicals (such as OC, C, O 2 , and HCO − 3 ) can be obtained simply by multiplying stoichiometric coefficients and µ i in Eq. (14) or (15), i.e., Finally, it is straightforward to set up dynamic mass balances. For example, in a homogeneous batch configuration, dynamic SXMs for key metabolites that are associated with oxidative respiration of OC i can be written as follows:

Integration With MXM and EXM
The SXM derived in Eq. (17) can be combined with other complementary approaches. Integration of SXM with MXM requires explicit consideration of distinct microbial groups, which leads to where the subscript j denotes the contribution of the j th microbial group to the production and consumption of metabolites, and µ i,j is the growth rate of the j th microbial group on OC i , i.e., Note that this formulation accounts for the difference in growth rate among microbial groups. Production and consumption rates of metabolites are accordingly dependent on microbial groups, i.e., For the integration of SXM with EXM, we introduce enzyme concentrations as additional variables. Dynamic mass balances in Eq. (17) remain the same, but we consider individual reactions to be catalyzed by distinct enzymes, i.e., where the reaction rate (µ i ) is composed of two parts: enzymatic regulation (e rel i ) and kinetic conversion (µ kin i ), and e rel i denotes relative level of enzyme, i.e., Enzyme concentrations and their maximal values can be determined from the following equation: where three terms on the right hand side of the above equation respectively denote constitutive enzyme synthesis rate, inductive enzyme synthesis rate, and enzyme degradation rate, u i is the variable that controls inductive enzyme synthesis, and r kin E,i is the kinetic rate of enzyme synthesis. We formulated the control variable u i based on the cybernetic modeling approach Song, 2012, 2018), which views microorganisms as a dynamic control system that maximizes a given metabolic objective (such as biomass production rate) through optimal synthesis of enzymes. The cybernetic model has been successfully applied to simulate complex microbial growth patterns on multiple nutrients and electron acceptors (Song and Liu, 2015;Song et al., 2017). Instead of solving separate enzyme equations, we considered a simplified form of cybernetic model  where e rel i in Eq. (17) is approximated by u i Ramkrishna, 2010, 2011), which is in turn determined based on µ kin i . Then, Eq. (17) becomes where

Field Samples
To demonstrate our modeling concept described above, we used the datasets generated in a previous work, described in detail in Graham and colleagues (Graham et al., 2017(Graham et al., , 2018. Briefly, sediment profiles (0-60 cm) were collected along two shoreline transects perpendicular to the Columbia River within the Hanford Site 300 Area in eastern Washington State (Graham et al., 2016(Graham et al., , 2017Slater et al., 2010;Zachara et al., 2013): one with riparian vegetation (HA zone) and the other without (LA zone).
In each transect, profiles were collected from three locations: upper, mid, and lower banks, and sectioned into 10 cm vertical intervals. FTICR-MS was used to characterize C chemistry in each sample, and Raz reduction assay was performed as a proxy of the rate of aerobic respiration per sample. For each peak detected in FTICR-MS spectra, we assigned a chemical formula using the following steps: (1) transformation of raw spectra to m/z (i.e., mass divided by charge number) values using BrukerDaltonik software; (2) chemical formula assignment using in house software following the Compound Identification Algorithm (Kujawinski and Behn, 2006;Minor et al., 2012;Tfaily et al., 2017). In the case that one m/z value can be matching with multiple chemical formulae, consistent assignment was made based on a set of prescribed rules, including the preference of the formula with the lowest error and with the lowest number of heteroatoms and the requiring the presence of at least four oxygen atoms for the assignment of one phosphorus atom. Peaks not satisfying these criteria were not assigned chemical formulae. For detailed analysis and comparison of two sites, we chose one sectioned sample from the LA and HA profiles: (1) Upper bank at unvegetated site (N1), 40-50 cm depth interval (sample N1-40-50) and (2) Upper bank at vegetated site (S1), 0-10 cm depth interval (sample S1-00-10). These two samples respectively represent the lowest and highest activity in aerobic respiration except an outlier (see section "Results: Comparison of Low-Activity and High-Activity Samples").

Evaluation of Thermodynamic Parameters as an Indicator of Oxidative Respiration
Previous studies have shown that aerobic respiration rates from HA samples were higher than LA samples and provided the hypothesis that thermodynamic properties of OM might be a key factor in regulating respiration rate (Graham et al., 2017(Graham et al., , 2018. While the original papers provided datasets that convincingly support this hypothesis, the question of what thermodynamic parameters (among many introduced in section "Materials and Methods") can serve as reliable indicators of aerobic respiration remains unknown. We therefore chose multiple thermodynamic parameters and examined to what extent individual parameters are correlated with aerobic respiration. Three key parameters include Gibbs free energy changes for electron donor half reactions ( G r,D ), Gibbs free energy changes for catabolic reactions ( G r,Cat ), and the parameter quantifying the energy coupling between catabolic and anabolic reactions (λ). The first two parameters ( G r,D and G r,Cat ) evaluate the thermodynamic character of OM with a focus on the energy generation (through a half or complete catabolic reaction, respectively). By contrast, the parameter λ quantifies the same based on the energy balance (i.e., the total energy generation to meet the demand for the synthesis of a unit C-mole of biomass). Therefore, all of these three parameters measure thermodynamic inefficiency of OM (in the sense that lower values of parameters imply higher thermodynamic efficiencies of OM), but at different levels, i.e., half/complete catabolic reactions for G r,D and G r,Cat , and entire metabolic reaction for λ. We examined their correlations with experimentally determined aerobic respiration at pH = 0 and pH = 7 (Figure 2). We also examined whether the thermodynamic favorability of compound can be better characterized on a C-mole or OM-mole basis (Supplementary Figure S1).
Regardless of pH values, G r,Cat and λ showed negative correlations with aerobic respiration, which implies that thermodynamic properties of OM lead to the difference in respiration rates (the middle and right panels in Figures 2A,B). By contrast, the results for G r,D were pH-dependent, i.e., G r,D showed a positive correlation with aerobic respiration at pH = 0 (the left panel of Figure 2A) (indicating that G r,D is not a good estimator of aerobic respiration under this condition), while it showed a negative correlation at pH = 7 (the left panel of Figure 2B).
As C-mole, the parameter λ still showed negative correlations with aerobic respiration at both pH values (Supplementary  Figure S1). G r,D and G r,Cat showed weak relationships with aerobic respiration with correlation coefficients between −0.1 and 0.1, except for G r,D at pH = 7 where the correlation coefficient was positive.
Together, these results identify the parameter λ as the most robust indicator of aerobic respiration, which shows consistent correlations across all conditions considered above. The results also indicate that the usefulness of the other thermodynamic parameters ( G r,D and G r,Cat ) as a respiration indicator is relatively limited.

Model Validation by Comparing Predicted Reaction Rates With Aerobic Respiration
Comparison of predicted reaction rates with experimentally determined aerobic respiration provides a means to validate the model. The experiment of Raz reduction assay in colleagues (Graham et al., 2017, 2018) was designed as a proxy measurement for the rate of oxygen consumption. For validation, we checked to what degree our model can predict oxygen consumption rates that are positively correlated with experimental estimation. As reaction rates are functions of substrate concentrations, we performed this comparison under three different limiting conditions: (1) C, (2) O 2 , and (3) both C&O 2 limitation. For each, we differentiated the level of limitation to be severe vs. moderate.
For simplicity, we set µ max = 1 in Eqs. (14) and (15) (Figure 3) and 0.2 (severe limitation), respectively (Figure 4) in the following four reaction rates: biomass production (i.e., growth) rate (µ), C consumption rate [r C ≡ (# of C) × r OC ], O 2 consumption rate (r O 2 ), and inorganic carbon production rate (r HCO − 3 ). For the case of moderate substrate limitation (Figure 3), for example, predicted µ showed a positive correlation under the C-limited condition (meaning that microbial growth rate is increasing with respiration rate), which was however turned into negative correlations under O 2 − or both C&O 2 -limited conditions (meaning that microbial growth rate decreases as respiration rate increases). Similar patterns were observed for |r C | (i.e., the absolute value of r C ) and r HCO − 3 , while their  correlations with aerobic respiration were weak under the C-limited condition. As an exception, the correlation of |r O 2 | with aerobic respiration was consistently positive across all three different limiting conditions. As mentioned above, this result partially validates our model because Raz reduction to resorufin represents an estimate of the consumption rate of O 2 , rather than other chemicals (Gonzalez-Pinzon et al., 2012).
The results above were significantly changed when substrate limitation was severe (Figure 4). In this case, correlation patterns among four different reaction rates were shown to be similar. That is, they all showed positive correlations with aerobic respiration under C-limited conditions and negative correlations under O 2 − and both C&O 2 -limited conditions. Interestingly, the positive correlation of aerobic respiration under C-limited conditions was the highest for microbial growth and O 2 consumption rates. Together with the previous result, this shows that experimental data in Graham et al. (2017) may be consistently interpreted as being C-limited.

Comparison of Low-Activity and High-Activity Samples
To understand how biogeochemistry in low-and high-activity zones is differentiated, we performed detailed analyses of two selected samples: one from the low-activity zone (sample N1-40-50) and the other from the high-activity zone (sample S1-00-10) (see section "Materials and Methods"). With outliers removed, these two samples represent the highest and lowest activity as shown in the correlations of aerobic respiration with λ and r O 2 (Supplementary Figure S2).
Distributions of all three thermodynamic parameters (λ, G r,Cat , and G r,D ) indicated that respiration reactions of the OM pools at HA are thermodynamically more favorable than those at LA (Figure 5). The distributions of parameter λ in the HA and LA samples, respectively, showed an exponential decay and bell-shaped patterns, indicating that the HA sample contained a predominantly large portion of OM, metabolic reactions of which are thermodynamically more favorable in comparison to the LA sample ( Figure 5A). By contrast, the distributions of G r,Cat ( Figure 5B) and G r,D (Figure 5C) suggested that the HA sample contained a lower portion of OM, catabolic and half reactions of which are less favorable compared to the LA sample.
We further compared distributions of model-predicted oxidative respiration rates (|r O 2 |) at LA vs. HA under C-, O 2 -, and both C&O 2 -limited conditions. In the case that substrates were moderately limited (i.e., V h [OC] = 1 and/or V h [O 2 ] = 1) (Figure 6), predicted oxidative respiration at HA was higher than at LA, when C or both C&O 2 were limited (as indicated by higher portions of faster |r O 2 | in the HA distribution) (Figures 6A,C); however, when O 2 was limited, there was no significant difference between the two samples ( Figure 6B).
Under severe substrate limitation (i.e., V h [OC] = 0.2 and/or V h [O 2 ] = 0.2) (Figure 7), predicted oxidative respiration was FIGURE 5 | Box plots (left) and histograms (right) for distributions of three key thermodynamic functions: (A) the energy coupling parameter (λ), (B) Gibbs free energy change for catabolic reaction ( G r,Cat ), and (C) Gibbs free energy change for an electron donor half reaction ( G r,D ). LA, low activity zone; and HA, high activity zone.
higher at HA and at LA when C was limited (Figure 7A), while the increase in oxidative respiration at HA was moderate when O 2 or C&O 2 were limited (Figures 7A,B). Interestingly, when C was limited, |r O 2 | showed bimodal distributions in both HA and LA samples, indicating the non-linear relationship between thermodynamic parameters to reaction rates.
Ratios of predicted reaction rates between the two samples (HA/LA) showed the level of elevated oxidative respiration in the HA zone (Figure 8). Although these ratios varied among reaction rates and were affected by the level of substrate limitation, their trends across three different substrate-limited conditions were consistent. That is, in cases of moderate ( Figure 8A) and severe ( Figure 8B) substrate limitation, we consistently found: (1) that the ratios of reaction rates were lowest under O 2 limitation, intermediate when both C and O 2 were limited, and highest under C limitation; and (2) that the magnitude of the ratio always maintained the following order: where |r O 2 | = respiration rate; µ = microbial growth rate; |r C | = carbon consumption rate; r HCO − 3 = bicarbonate production rate). Lastly, we performed dynamic simulations of OM consumption in LA and HA using two types of models: (1) SXM, and (2) a hybrid model that integrates SXM and EXM. The latter is a special case of a general IBM platform that hybridizes three complementary modeling platforms approaches: SXM-EXM-MXM. In hybridizing SXM with EXM, we used the cybernetic approach Song, 2012, 2018) to account for regulation of enzyme synthesis to simulate selective activation of oxidative reactions among a number of pathways (see section "Materials and Methods"), a key aspect that was missing in the case of using the SXM alone. Although both SXM and hybrid models correctly predicted    that specific oxidative reaction rates at HA were higher than at LA (Supplementary Figure S3), the time scale of the SXM was shown to be very small, which led specific reaction rates to be unrealistically high (i.e., more than 300 mol/CMB/d) (Supplementary Figures S3a,b). By contrast, the hybrid model predicted specific reaction rates consistent with the literature values (Supplementary Figures S3c,d). We further compared the two models with respect to how specific OM consumption rates would change with the number of incorporated compounds. As specific rates are defined per unit C-mole of biomass, their dependency on the number of compounds is expected to be insignificant, but specific rates calculated by the SXM linearly increased with the number of compounds (Figure 9A), while the hybrid model predicted specific rates to be almost constant regardless of the number of compounds ( Figure 9B). These results together indicate that SXM-EXM coupling significantly improved model predictions.

DISCUSSION
In this work, we proposed a novel and flexible biogeochemical modeling concept, termed SXM. As a key feature, SXM uses thermodynamics to incorporate data from increasingly highresolution metabolomics technologies into biogeochemical models by formulating OM-specific reaction kinetics for an unlimited number of organic compounds in a sample. The entire set of the resulting reaction kinetics is then represented by only two parameters (i.e., maximal growth rate and harvest volume). Our framework is a unique and scalable tool for modeling complex biogeochemical cycles at the ecosystem-scale, as no other approach can describe dynamic biogeochemical reaction networks composed of thousands of compounds with a small, computationally feasible set of parameters.
The substrate-explicit model is built upon recent experimental studies that reveal a close relationship between OM thermodynamics and aerobic respiration (Graham et al., 2017(Graham et al., , 2018Garayburu-Caruso et al., 2020). In consistent with the conclusions of these studies, our test cases showed that aerobic respiration was driven by thermodynamic favorability of metabolic reactions involving OM pools (as shown by the correlation of measured respiration rates with a thermodynamic parameter λ). These results together challenge classical theories that use the concentrations of bulk substrate pools (such as organic carbon and FIGURE 10 | The integrated biogeochemical modeling (IBM) concept that combines MXM, SXM, and EXM, which are respective representations of significant expansions from typical lumped models by integrating multi-omics data to identify functional contributions of individual organisms and/or functional guilds, substrate-specific degradation pathways, and detailed enzymatic processes. Consequently, the IBM may enable providing a mechanistic understanding of dynamic linkage and interactions among substrates, enzymes, and microbes at a molecular level and significantly improving the performance of complex ecosystem models in predicting OC consumption and CO 2 emission in space and time. COC, complex organic carbon; LOC, labile organic carbon; Enz, enzyme; Biom, biomass; Mic, microbes; and Sub, substrates. oxygen) as the sole driving factors of aerobic respiration, notably excluding the influence of OM chemistry on biogeochemistry, highlighting the criticality of incorporation of OM thermodynamics into our understanding and modeling of aerobic respiration.
In this regard, it has been customary to use standard state Gibb energies ( G 0 r ) in accounting for thermodynamics in biogeochemical modeling, but improved calculation of thermodynamic functions requires accounting for realistic non-standard conditions (i.e., activities of aqueous species deviating from one molal) (Larowe and Amend, 2015;Amend and LaRowe, 2019). While accurate values of actual Gibbs energies can be obtained by rigorous consideration of activities of all associated chemical species through the reaction quotient term Q r , we used thermodynamic functions corrected to the biological standard state ( G 0 r ) (i.e., pH = 7) as a partial remedy in the absence of information on activities of compounds other than H + . As a result, G 0 r (which is denoted as G r without the superscript in this article) provided the more interpretable results than G 0 r . G 0 Cox [kJ/C-mol] is one of the most frequently used metrics to evaluate thermodynamic favorability of compounds. However, our model suggests λ as a new thermodynamic parameter to evaluate thermodynamic character of OM. In contrast with G 0 Cox , which considers energy generation only through an electron donor half reaction, λ accounts for the energy balance between catabolic and anabolic reactions, thus evaluating thermodynamic properties of compounds based on the complete chemistry. Indeed, λ consistency showed negative correlations with aerobic respiration, regardless of pH values.
Our model was validated through a consistency check between predicted oxygen consumption rates with experimentally determined aerobic respiration. Across all three conditions (C, O 2 and, both C&O 2 limitation), aerobic respiration showed positive correlations with oxygen consumption rate when substrates are moderately limited. By contrast, in the case of severe C and O limitation, a positive correlation between oxygen consumption rate and aerobic respiration was obtained only for C-limited conditions. This matches field conditions for the study system in which OC concentrations are low and porewater within saturated sediments is consistently aerobic (Stegen et al., 2018b). These results together suggest that the field data is consistently interpretable due to C limitation and O 2 excess. In support of this, recent experimental work has shown a dependency of aerobic respiration on OM thermodynamics when C is limited but has shown no effect of C thermodynamics on respiration when C is widely available .
In conventional dynamic biogeochemical modeling approaches, the difficulty in finding reliable parameter values is often a major hurdle in scaling up to large-scale complex systems because the number of kinetic constants is increasing in proportion to the number of compounds. As mentioned earlier, this barrier is overcome by our modeling approach that formulates compound-specific reaction rates with only two parameters, i.e., maximal growth rate (µ max ) [1/h] or [1/d] and harvest volume (V h ) [m 3 ] or [L]. They are tunable parameters, which are quantitatively determinable via data fitting, e.g., when dynamic concentration profiles of substrates are available. Our model can guide experimental design to identify parameters and understand factors driving their variation across systems. In the present study using OM profiles from FTICR-MS, lack of quantitative information (i.e., concentrations) on individual OM molecules is an intrinsic limitation. Consequently, we focused on evaluating carbon quality based on the normalized distribution of OM, but this gap can be filled for improving predictions, e.g., through the integration with other complementary metabolomics approaches that can provide quantitative data (Hertkorn et al., 2013;McCallister et al., 2018).
The capability of our method that incorporates highresolution mass spectrometry (or OM characterization methods) data into biogeochemical modeling greatly facilitates other research programs in the field that collect OM chemistry datasets. The Worldwide Hydrobiogeochemistry Observation Network for Dynamic River Systems (WHONDRS), for example, is a global research consortium that aims at understanding crossscale dynamic interactions among hydrology, biogeochemistry, and microbiology in river corridors (Stegen and Goldman, 2018). As an initial effort, WHONDRS provides the collection of highresolution OM profiles such as FTICR-MS data across rivers in the world (Stegen et al., 2018a;Chu et al., 2019;Danczak et al., 2019Danczak et al., , 2020Garayburu-Caruso et al., 2019;Goldman et al., 2019;Renteria et al., 2019;Stegen et al., 2019;Wells et al., 2019). Other networked efforts, such as the National Ecological Observation Network (Teeri and Raven, 2002;Barnett et al., 2019), provides similar kinds of data that are amenable for analysis via SXM. Analysis of these data, which are collected and analyzed consistently across systems, using the SXM framework will significantly improve our understanding of the level of heterogeneity across space and time in OM consumption and respiration, and thus could be used as a critical tool for more mechanistic predictions of spatial and temporal variation in stream/river CO 2 emissions and other coupled biogeochemical rates from local to global scales.
As shown, accounting for enzymatic regulation in SXMs is critical for more reasonable simulations. Considering metabolic regulation can be best described by EXMs with direct association with specific enzymes, this results suggests that a synergistic integration of SXMs with other existing frameworks is an important future direction to fill the gaps that each approach has and to address advanced science questions that could not be addressed individually. We illustrate a conceptual master platform of integrative biogeochemical modeling (IBM) that accounts for the interaction among substrates, microbes, and enzymes at an unprecedented level of detail (Figure 10). We envision that the IBM will significantly contribute to create a molecular-level understanding of biogeochemistry that can be translated to complex ecosystem modeling. In this regard, the SXM concept proposed in this work provides a key component of IBM that has been missing so far.

DATA AVAILABILITY STATEMENT
All relevant data and numerical codes including KBase apps for computing thermodynamic parameters and reaction rates are available at: https://github.com/hyunseobsong/lambda.

AUTHOR CONTRIBUTIONS
H-SS, JCS, EBG, VAG-C, and TDS conceptualized the study. H-SS developed the codes and performed the theoretical analyses. J-YL, XC, and JDM developed the SDK Apps for the implementation of the codes in KBase. J-YL and WCN contributed to the data processing. H-SS drafted out the manuscript, which was edited by JCS and EBG. All the authors contributed to the writing.