# FLUXOMICS AND METABOLIC ANALYSIS IN SYSTEMS MICROBIOLOGY

EDITED BY: Wei Xiong, Yinjie Tang and Lars Keld Nielsen

PUBLISHED IN: Frontiers in Microbiology

#### Frontiers Copyright Statement

© Copyright 2007-2019 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.

The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.

Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.

Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.

As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.

All copyright, and all rights therein, are protected by national and international copyright laws.

The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use. ISSN 1664-8714 ISBN 978-2-88963-069-1 DOI 10.3389/978-2-88963-069-1

# FLUXOMICS AND METABOLIC ANALYSIS IN SYSTEMS MICROBIOLOGY

Topic Editors:

Wei Xiong, National Renewable Energy Laboratory (DOE), United States

Yinjie Tang, Washington University in St. Louis, United States

Lars Keld Nielsen, University of Queensland, Australia

Citation: Xiong, W., Tang, Y., Nielsen, L. K., eds. (2019). Fluxomics and Metabolic Analysis in Systems Microbiology. Lausanne: Frontiers Media. doi: 10.3389/978-2-88963-069-1

# Table of Contents


Martin Beyß, Salah Azzouzi, Michael Weitzel, Wolfgang Wiechert and Katharina Nöh

*13C-Metabolic Flux Analysis Reveals Effect of Phenol on Central Carbon Metabolism in* Escherichia coli

Sayaka Kitamura, Yoshihiro Toya and Hiroshi Shimizu

*EMUlator: An Elementary Metabolite Unit (EMU) Based Isotope Simulator Enabled by Adjacency Matrix*

Chao Wu, Chia-hsin Chen, Jonathan Lo, William Michener, Pin-Ching Maness and Wei Xiong


Penghui He, Ni Wan, Dongbo Cai, Shiying Hu, Yaozhong Chen, Shunyi Li and Shouwen Chen

*Tandem Mass Spectrometry for 13C Metabolic Flux Analysis: Methods and Algorithms Based on EMU Framework*

Jungik Choi and Maciek R. Antoniewicz

*Flux Connections Between Gluconate Pathway, Glycolysis, and Pentose–Phosphate Pathway During Carbohydrate Metabolism in* Bacillus megaterium *QM B1551*

Julie A. Wushensky, Tracy Youngster, Caroll M. Mendonca and Ludmilla Aristilde


## Membrane-Inlet Mass Spectrometry Enables a Quantitative Understanding of Inorganic Carbon Uptake Flux and Carbon Concentrating Mechanisms in Metabolically Engineered Cyanobacteria

*Damien Douchi<sup>1</sup>, Feiyan Liang<sup>2</sup>, Melissa Cano<sup>1</sup>, Wei Xiong<sup>1</sup>, Bo Wang<sup>1</sup>, Pin-Ching Maness<sup>1</sup>, Peter Lindblad<sup>2</sup> and Jianping Yu<sup>1</sup>\**

*<sup>1</sup>Biosciences Center, National Renewable Energy Laboratory, Golden, CO, United States, <sup>2</sup>Microbial Chemistry, Department of Chemistry-Ångström, Uppsala University, Uppsala, Sweden*

#### *Edited by:*

*Marc Strous, University of Calgary, Canada*

#### *Reviewed by:*

*Kathleen Scott, University of South Florida, United States Xuefeng Lu, Qingdao Institute of Bioenergy and Bioprocess Technology (CAS), China*

> *\*Correspondence: Jianping Yu jianping.yu@nrel.gov*

#### *Specialty section:*

*This article was submitted to Microbial Physiology and Metabolism, a section of the journal Frontiers in Microbiology*

> *Received: 31 October 2018 Accepted: 31 May 2019 Published: 25 June 2019*

#### *Citation:*

*Douchi D, Liang F, Cano M, Xiong W, Wang B, Maness P-C, Lindblad P and Yu J (2019) Membrane-Inlet Mass Spectrometry Enables a Quantitative Understanding of Inorganic Carbon Uptake Flux and Carbon Concentrating Mechanisms in Metabolically Engineered Cyanobacteria. Front. Microbiol. 10:1356. doi: 10.3389/fmicb.2019.01356*

Photosynthesis uses solar energy to drive inorganic carbon (Ci) uptake, fixation, and biomass formation. In cyanobacteria, Ci uptake is assisted by carbon concentrating mechanisms (CCM), and CO2 fixation is catalyzed by RubisCO in the Calvin-Benson-Bassham (CBB) cycle. Understanding the regulation that governs CCM and CBB cycle activities in natural and engineered strains requires methods and parameters that quantify these activities. Here, we used membrane-inlet mass spectrometry (MIMS) to simultaneously quantify Ci concentrating and fixation processes in the cyanobacterium *Synechocystis* 6803. By comparing cultures acclimated to ambient air conditions to cultures transitioning to high Ci conditions, we show that acclimation to high Ci involves a concurrent decline of Ci uptake and fixation parameters. By varying light input, we show that both CCM and CBB reactions become energy limited under low light conditions. A strain over-expressing the gene for the CBB cycle enzyme fructose-bisphosphate aldolase showed higher CCM and carbon fixation capabilities, suggesting a regulatory link between CBB metabolites and CCM capacity. While the engineering of an ethanol production pathway had no effect on CCM or carbon fixation parameters, additional fructose-bisphosphate aldolase gene over-expression enhanced both activities while simultaneously increasing ethanol productivity. These observations show that MIMS can be a useful tool to study the extracellular Ci flux and how CBB metabolites regulate Ci uptake and fixation.

Keywords: MIMS, carbon uptake rate, cyanobacteria, FbaA, carbon fixation

### INTRODUCTION

Photosynthesis has been responsible for decreasing CO2 in the atmosphere from 30–35% to 0.04% over the last 3 billion years or so (Blankenship, 2010) and is the major physicochemical process that generates organic molecules and, as such, supports most of the life on Earth. The main enzyme responsible for this CO2 fixation, ribulose-1,5-bisphosphate carboxylase/oxygenase (RubisCO), appeared 2.5 billion years ago and has evolved to adapt to the decreasing CO2 concentrations and concurrently increasing O2 levels (Whitney et al., 2011). CO2 and O2 are competitive substrates for RubisCO, leading to either CO2 fixation (carboxylation) or photorespiration (oxygenation). As a result, photosynthetic organisms have evolved various strategies to favor carboxylation over oxygenation by increasing carbon availability for RubisCO.

In cyanobacteria, inorganic carbon (Ci) is concentrated in the cytoplasm to levels over 100 times the external concentration under air-saturated conditions (Woodger et al., 2005) *via* carbon concentrating mechanisms (CCM; **Figure 1**). At least five different uptake proteins or complexes are involved in this process, each with a different affinity for Ci. Among them are BicA and SbtA, sodium/bicarbonate symporters powered by a sodium gradient across the plasma membrane. NdhD3 and D4 were proposed to be involved in regenerating this sodium gradient, powered by NADPH or ferredoxin (Wang et al., 2004; Woodger et al., 2007). The involvement of plasma membrane sodium/proton antiporters and ATPase was also hypothesized (Kamennaya et al., 2015). Another bicarbonate transporter is the Bct1 complex, which has its own ATPase activity. In addition, the CO2 uptake systems NDH1-3 and NDH1-4 directly convert CO2 to bicarbonate in the cytoplasm using energy from photosynthetic or respiratory thylakoid electron flow (Battchikova et al., 2011; Artier et al., 2018), trapping incoming CO2, which otherwise diffuses freely across the membrane, and limiting Ci leakage.

RubisCO is confined to a bacterial micro-compartment, the carboxysome, along with carbonic anhydrase enzymes, which convert HCO3− into CO2. Together, the membrane transporters and the cytoplasmic and carboxysomal carbonic anhydrases form a Ci conduit from the medium to the carboxysome, so that the concentration of CO2 within the carboxysome is up to 4,000-fold higher than it is externally (Sültemeyer et al., 1995; Price et al., 1998; Kaplan and Reinhold, 1999; Woodger et al., 2005). RubisCO converts one molecule of ribulose bisphosphate (RuBP) and one molecule of CO2 to two molecules of 3-phosphoglycerate (3PG), which enter the Calvin-Benson-Bassham (CBB) cycle. While RuBP is regenerated, other intermediates are formed and connect to the central carbon metabolism and anabolic pathways, feeding the production of cellular constituents.

[Figure caption, beginning missing] … chain (PETC), inorganic carbon (Ci) uptake and fixation, and the anabolic metabolism. Orange arrows indicate chemical energy fluxes. Black arrows indicate carbon flux.

An in-depth understanding of photosynthetic mechanisms, such as the CCM, requires a more comprehensive, systems-level approach to measure photosynthetic carbon flux within a biochemical network. With respect to photosynthetic organisms, the intracellular carbon flux can be determined through a new 13CO2/NaH13CO3 labeling approach: isotopically nonstationary metabolic flux analysis (INST-MFA) (Adebiyi et al., 2015). This method allows the estimation of relative photosynthesis and photorespiration fluxes yielding sugar phosphates, organic acids, and other intracellular metabolites in a model phototroph such as *Synechocystis* sp. PCC 6803 (hereafter *Synechocystis*). To obtain an absolute quantification of fluxomic values, an isotope tracer experiment must be coupled to fundamental determinants of *in vivo* cell physiology. For example, measurements of cell-specific rates of nutrient uptake and product formation (i.e., normalized to cell density) allow for intracellular flux calculations using INST-MFA. The measured Ci fixation kinetics are a key input to these methods, because they constrain the solution space of feasible intracellular fluxes. Therefore, an accurate estimation of Ci fixation kinetics and their associated uncertainties is an essential task in the construction of accurate metabolic flux maps for phototrophs.

*In vivo* measurement of Ci utilization rates in a photosynthetic system is challenging. In aqueous solution, dissolved CO2 exists in equilibrium with bicarbonate ions, and both forms can be taken up into photosynthetic cells. Ci uptake can be measured by various methods. We have previously used sealed tubes and gas chromatography to measure the difference of Ci concentration over time (Xiong et al., 2015). This method provides the averaged Ci uptake rate over a longer time (hours) but cannot distinguish Ci uptake from fixation kinetics nor measure real time performance at specific Ci concentrations. The isotope labeling method has also been widely used. However, it has a higher leakage rate and suffers from interference caused by non-labeled Ci brought into the growth medium by some of its constituents (Eichner et al., 2015). An alternative technique is to measure gas phase CO2 by infrared absorption (Oakley et al., 2012). This technique requires higher cell density, which causes self-shading that affects the reliability of experiments. It also requires large culture volumes and is dependent on the equilibrium of CO2 between the gas phase and the liquid phase, which can differ under various conditions including temperature, pressure, medium composition, and pH. Additionally, water vapor has a high absorption coefficient in the infrared region, making the efficiency of the desiccant or compensation mechanisms critical for reliable measurements, especially at very low Ci concentrations. The measurement of O2 evolution at different Ci concentrations is also a common method to determine the overall affinity of cells for Ci (Woodger et al., 2005). However, this is an indirect measurement and is complicated by the existence of multiple O2 consumption pathways in cyanobacteria and other phototrophs.

In the 1960s, a technique was developed based on the permeation of gases through a silicone membrane that separates the culture medium from the high vacuum line leading to the detector of a mass spectrometer (Hoch and Kok, 1963). This experimental setup, now called MIMS (Membrane-Inlet Mass Spectrometry), was used to measure algal Ci consumption directly and in real time in the growth medium (Radmer and Kok, 1976). This instrument measures only small molecules that can pass through the silicone membrane. While CO2 is detectable, carbonates and bicarbonates are not permeable. Therefore, the measure of total inorganic carbon relies on the existing equilibrium that results from the fast interconversion $\mathrm{CO_2 \leftrightarrow HCO_3^- \leftrightarrow CO_3^{2-}}$, which is highly sensitive to the medium pH.
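To illustrate why the CO2 signal depends so strongly on pH, the following minimal sketch computes the equilibrium fractions of the three Ci species. It assumes textbook dissociation constants for dilute water at 25°C (pKa1 ≈ 6.35, pKa2 ≈ 10.33); these constants and the function name are assumptions for illustration, not values from this study, and they shift with temperature and ionic strength.

```python
def carbonate_fractions(ph, pka1=6.35, pka2=10.33):
    """Return equilibrium fractions (CO2, HCO3-, CO3 2-) of total Ci at a given pH.

    pka1 and pka2 are assumed textbook values for dilute water at 25 C.
    """
    h = 10.0 ** (-ph)      # proton concentration
    k1 = 10.0 ** (-pka1)   # CO2(aq) <-> HCO3- + H+
    k2 = 10.0 ** (-pka2)   # HCO3- <-> CO3 2- + H+
    denom = h * h + k1 * h + k1 * k2
    return (h * h / denom, k1 * h / denom, k1 * k2 / denom)


co2, hco3, co3 = carbonate_fractions(7.4)
print(f"pH 7.4: CO2 {co2:.1%}, HCO3- {hco3:.1%}, CO3(2-) {co3:.2%}")
```

Under these assumed constants, dissolved CO2 is only a small share of total Ci near neutral pH, so a modest pH drift would change the CO2 reading even at constant total Ci.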

While MIMS was used to study cyanobacterial CCM in the past (Espie et al., 1988; Miller et al., 1988; Tchernov et al., 2003), it has not been widely used in recent years. We adapted this method to measure Ci uptake flux in engineered cyanobacteria quantitatively by determining the extracellular Ci consumption rate per Ci concentration in real time. The kinetic parameters at various Ci concentrations depend on intracellular physiology and are used to differentiate Ci uptake by CCM from Ci consumption by the carbon fixation machinery. We found that CBB engineering influences the regulation of Ci uptake as well as fixation activities. We compared Ci uptake flux in wild-type (WT) *Synechocystis* and engineered strains over-expressing the fructose-bisphosphate aldolase gene (*fbaA*) either in a WT background or in an ethanol-producing strain (Liang and Lindblad, 2016, 2017). *fbaA* over-expression affects the carbon metabolites in the CBB cycle and likely Ci fluxes. Interestingly, the results indicate a positive relationship between the activity of the CBB cycle and the kinetics of CCM, which will help inform genetic engineering strategies of cyanobacterial central carbon metabolism for enhanced carbon utilization.

#### MATERIALS AND METHODS

#### Growth of Cyanobacteria

The *Synechocystis* strains used in this study were previously reported (Liang and Lindblad, 2016, 2017). Here, we compared the WT strain (empty vector control strain with kanamycin resistance, named Km in the Liang and Lindblad papers cited above) to over-expressing strains generated *via* the same vector. They were grown on a modified, filter-sterilized BG-11 without added carbonate or bicarbonate, supplemented with 50 mM NaCl, 20 mM TES, and 50 mg/L kanamycin. The pH was adjusted to 7.4. Plates were kept under 5% CO2 at 30°C and 50 μE m−2 s−1 provided by cool white fluorescent light tubes. The physiological tests were carried out after at least 48 h of incubation in 100 ml of BG-11 in 250 ml baffled Erlenmeyer flasks, continuously air bubbled at 100 ml min−1, at 30°C and 100 μE m−2 s−1 of white LED light (4,500 K). The OD730 was always kept under 1, and the culture was diluted about 16 h prior to the experiment and harvested at an OD730 of about 0.4.

The MIMS set up, operation, and data processing are detailed in the section "Results."

#### Doubling Time

Cultures with an OD730 < 1 pre-acclimated for at least 48 h at 100 μE m−2 s−1 with air bubbling were diluted to an OD730 of about 0.050 in pre-warmed 30°C BG-11. The first density measurement was taken 1 h after the dilution. A second measurement was carried out after 16 h, when the OD730 was still under 0.5. These two measurements were used to calculate the doubling time.
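As a hedged illustration of how a doubling time can be derived from two density readings, the sketch below assumes exponential growth between the two measurements. The function name and the example readings are hypothetical and are not data from this study.

```python
import math


def doubling_time(od_start, od_end, elapsed_h):
    """Doubling time (h), assuming exponential growth between two OD730 readings."""
    if od_start <= 0 or od_end <= od_start or elapsed_h <= 0:
        raise ValueError("need od_end > od_start > 0 and elapsed_h > 0")
    return elapsed_h * math.log(2) / math.log(od_end / od_start)


# Hypothetical readings: OD730 rising from 0.05 to 0.25 over 15 h.
print(round(doubling_time(0.05, 0.25, 15.0), 2))
```

Keeping the second reading below an OD730 of 0.5, as described above, helps keep the exponential-growth assumption reasonable.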

#### Net O2 Production

Cultures grown overnight with an OD730 lower than 0.4 were spun down and resuspended in fresh BG-11 medium to an OD730 of 0.7–0.8. Samples were incubated in 50 ml Falcon tubes in the dark under agitation for 30 min. About 9 ml aliquots of the sample were transferred to the measurement cuvette in the dark, stirred at 700 rpm, and air bubbled for 10 min with 0.0022% antifoam 204 while kept at 30°C by a water bath. Measurement by MIMS started with this 10-min dark incubation. The air bubbling led to an air-saturated sample, which was used to calibrate at 278 μM O2. When the incubation was completed, the bubbling was turned off, and after 10–20 s, the light was turned to 150 μE m−2 s−1 (LED, 6,000 K) for 4 min to measure oxygen production. The measurement was maintained for 4 more minutes in the dark to measure respiration. Finally, the sample was bubbled with N2 to a steady state to calibrate at 0 μM O2.
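The two calibration points described above (air saturation at 278 μM O2 and N2 bubbling at 0 μM O2) lend themselves to a linear conversion of the raw signal. The following is a minimal sketch under the assumption that the detector response is linear between the two points; the function names and the raw signal values are hypothetical.

```python
def calibrate_two_point(signal, signal_zero, signal_sat, conc_sat=278.0):
    """Convert a raw MIMS signal to uM O2 by linear two-point calibration."""
    span = signal_sat - signal_zero
    if span == 0:
        raise ValueError("calibration points must differ")
    return (signal - signal_zero) / span * conc_sat


def net_rate(conc_start, conc_end, minutes):
    """Average rate of change (uM O2 per min) over an interval."""
    if minutes <= 0:
        raise ValueError("interval must be positive")
    return (conc_end - conc_start) / minutes


# Hypothetical raw detector readings (arbitrary units), not data from the study.
SIGNAL_N2 = 0.02    # reading after N2 bubbling (0 uM O2)
SIGNAL_AIR = 1.02   # reading at air saturation (278 uM O2)

light_start = calibrate_two_point(1.02, SIGNAL_N2, SIGNAL_AIR)
light_end = calibrate_two_point(1.10, SIGNAL_N2, SIGNAL_AIR)
print(net_rate(light_start, light_end, 4.0))  # net O2 change during the 4-min light phase
```

The same `net_rate` helper applied to the following 4-min dark phase would give a negative value corresponding to respiration.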

#### High Ci Acclimation

The WT strain was cultured in BG-11 medium (with 50 mM NaHCO3 replacing the NaCl) for at least 3 days prior to the measurement (re-diluted every 24 h), without air bubbling, but under the same stirring, temperature, and light conditions. The cells were harvested by centrifugation with the same procedure described in the Ci uptake method, except that the cells went through an extra washing step with a medium without HCO3−.

#### Chlorophyll Content Measurement

The WT strain and the *fbaA* OE strain were grown overnight from a healthy low Ci-acclimated culture. About 14 ml of culture with an OD730 < 0.4 were spun down at 2,500 g for 5 min at room temperature, and the pellet was resuspended in 2 ml of −80°C methanol and left overnight at −80°C. After another centrifugation, the absorbance of the supernatant was measured at 665 and 720 nm. The chlorophyll content was calculated using the equation μg Chl *a*/ml = 12.9447 × (OD665 − OD720) (Ritchie, 2006).
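The chlorophyll equation above can be sketched as a small helper. The scaling from the 2 ml methanol extract back to the 14 ml of culture is an assumption added here for illustration (the text gives only the extract equation), and both function names are hypothetical.

```python
def chl_a_extract(od665, od720):
    """Chlorophyll a (ug per ml of methanol extract) from the equation in the text."""
    return 12.9447 * (od665 - od720)


def chl_a_culture(od665, od720, extract_ml=2.0, culture_ml=14.0):
    """Chlorophyll a per ml of culture.

    Assumption: the whole pellet from culture_ml was extracted in extract_ml.
    """
    return chl_a_extract(od665, od720) * extract_ml / culture_ml


# Hypothetical absorbance readings, not data from the study.
print(chl_a_extract(0.50, 0.02), chl_a_culture(0.50, 0.02))
```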

#### RESULTS

#### MIMS Experimental Setup and the Biological Significance of the Measurement

It is widely accepted that CO2 diffuses through biological membranes *via* aquaporins and a number of other routes (Tchernov et al., 2001). Under high Ci levels at pH 7–8, bicarbonate transporters and other constituents of the CCM are bypassed by the saturating flow of CO2 entering the cell. In this work, Ci uptake was studied under conditions where all CCM components are actively involved. Air bubbling into the cultures was used to maintain a low but constant Ci supply while avoiding over-accumulation of the O2 generated by the photosynthetic electron transport chain (PETC). Cultures were maintained in BG-11 medium below an OD730 of 0.4, where self-shading is minimal and nutrients are replete. Cells were harvested by centrifugation and re-suspended in fresh BG-11 medium to an OD730 of 0.5. For consistent results in this measurement, the same medium batch was used for all compared samples and replicates. About 9 ml aliquots of this suspension were subjected to CO2 concentration monitoring by MIMS (HIDEN HAS-301-1503A) using the SEM detector (settle time 400 ms, dwell time 2,750 ms). The sample was injected into a 20 ml measurement cuvette, which was custom designed and crafted (Allen Glass, Boulder, Colorado; **Figure 2**). In such conditions, buildup of photosynthetic oxygen was minor, which was confirmed by MIMS measurements. The pH was observed to be constant during the course of the experiment, because the 20 mM TES in the BG-11 medium is in large excess over the 100 μM of Ci consumed. The suspension was supplemented with 0.0022% antifoam 204 (Sigma), stirred at 700 rpm, and bubbled with air for 8 min 30 s in 10 μE m−2 s−1 (room light). The MS measurement was then commenced, and the bubbling stopped (T0). The measurement vessel was sealed, and the light was turned on to about 200 μE m−2 s−1 (white LED at 6,000 K). Dissolved CO2 was measured over time, from T0, when the sample is air saturated, to T-final, when CO2 consumption becomes undetectable. A calibration was performed after each sample, where zero Ci was set by bubbling the sample with N2 for 10 min, and 100 μM Ci was set by an injection of freshly prepared NaHCO3 at pH 7.4. The concentration of dissolved CO2 was considered a readout of the total Ci.

[Figure caption, beginning missing] … in this paper. (C) The MIMS experimental setup used in this study.

While it was hypothesized that in some cases, in the vicinity of RubisCO, the consumption of CO2 can drive an imbalance in the CO2/HCO3− equilibrium (Sun et al., 2019), this is compensated intracellularly by carbonic anhydrases. The measurement of extracellular CO2 could also be affected by an imbalanced Ci hydration equilibrium. However, under our conditions, it was shown that *Synechocystis* takes up more HCO3− than CO2 (Benschop et al., 2003), yet we observed an immediate decrease in CO2 readings upon illumination as illustrated in **Figure 2A** (data not shown), indicating equilibrium of the Ci species. This is also supported by the absence of a growth delay in a cytoplasmic carbonic anhydrase mutant, indicating that the natural homeostasis between the two Ci species is able to support growth in laboratory conditions (So et al., 1998). In addition, the conditions we use are moderate in terms of pH, light, and cell density, and are thus unlikely to drive extracellular Ci out of equilibrium. Based on Dreybrodt et al. (1996) and our calculation, we consider that CO2 hydration is not limiting on the time scale of a 1 h experiment.

Because *Synechocystis* acclimation to low Ci concentration begins after about 2 h (Benschop et al., 2003), our method was designed so that each individual measurement did not last more than an hour.

The Ci uptake rate was calculated from the slope of the Ci concentration versus time over a 20-timepoint window, and this rate was plotted against the Ci concentration of the first point used to calculate the slope. A fitting curve was calculated using a modified five-parameter Hill equation.

$$Y = Y0 + \frac{a\left(X - X0\right)^b}{c^b + \left(X - X0\right)^b}$$

*Y* is the slope, plotted in ordinate; *X* is the Ci concentration plotted in abscissa; *a*, *b*, *c*, *Y*0, and *X*0 are variables used to fit the curve to experimental data.
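The rate calculation and the fitting model described above can be sketched as follows. This is a minimal illustration, not the authors' processing code: the least-squares slope, the sign convention (uptake reported as positive), the clamping of *X* below *X*0, and the function names are all assumptions, and the model is evaluated only for *b* > 0 and *c* > 0.

```python
def window_slopes(times, ci, window=20):
    """Return (Ci at window start, uptake rate) pairs from sliding linear fits.

    Each rate is the least-squares slope of Ci versus time over `window`
    consecutive points, with the sign flipped so that uptake is positive.
    """
    pairs = []
    for i in range(len(times) - window + 1):
        t = times[i:i + window]
        c = ci[i:i + window]
        t_mean = sum(t) / window
        c_mean = sum(c) / window
        sxx = sum((x - t_mean) ** 2 for x in t)
        sxy = sum((x - t_mean) * (y - c_mean) for x, y in zip(t, c))
        pairs.append((c[0], -sxy / sxx))
    return pairs


def hill5(x, y0, a, b, c, x0):
    """Modified five-parameter Hill equation: Y = Y0 + a(X-X0)^b / (c^b + (X-X0)^b)."""
    d = max(x - x0, 0.0) ** b  # clamp so that X < X0 does not yield complex values
    return y0 + a * d / (c ** b + d)


# Hypothetical trace: Ci falling linearly by 2 uM per time unit.
times = [float(i) for i in range(30)]
ci = [100.0 - 2.0 * t for t in times]
print(window_slopes(times, ci)[:2])
```

The five parameters could then be estimated from the `(Ci, rate)` pairs with any nonlinear least-squares routine, for example `scipy.optimize.curve_fit`; the text does not state which fitting software was used.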

As shown in **Figures 2, 3**, the MIMS method provides the real-time Ci uptake rate over a range of external Ci concentrations. At higher external Ci concentrations (air bubbling), Ci import into cells is considered not limiting, and the measured Ci uptake rates are limited by carbon fixation reactions. In contrast, as external Ci concentrations drop toward zero, CCM activity should become the rate-limiting step for the measured Ci uptake rates. Thus, the fitting curve can provide a quantitative measurement of both CCM activity and Ci fixation (CBB cycle) activity. On the low Ci end, we set 20 μM Ci as the reliable detection limit; thus, the region between 20 and 25 μM Ci on these fitting curves was used to calculate an initial slope, designated as the *CCM coefficient*. On the high Ci end, we designate the *Ci fixation coefficient* as the maximum of the fitting sigmoid, representing the Ci fixation rate under Ci-sufficient conditions. The kinetics observed here are non-Michaelian, as is common for multicomponent reactions. Thus, the parameters described are not *V*max and *K*m. Other parameters may be calculated from the curve as our understanding of Ci uptake advances.
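One possible reading of the two coefficients can be sketched from a fitted curve. The assumptions here are that the CCM coefficient is the secant slope of the fitted curve between 20 and 25 μM Ci, and that the Ci fixation coefficient is the maximum of the fitted curve over the measured range, found by grid evaluation; the parameter values and function names are hypothetical.

```python
def hill5(x, y0, a, b, c, x0):
    """Modified five-parameter Hill equation (evaluated for b > 0, c > 0)."""
    d = max(x - x0, 0.0) ** b
    return y0 + a * d / (c ** b + d)


def ccm_coefficient(params, lo=20.0, hi=25.0):
    """Slope of the fitted curve between lo and hi uM Ci (rate units per uM Ci)."""
    return (hill5(hi, *params) - hill5(lo, *params)) / (hi - lo)


def ci_fixation_coefficient(params, ci_max, steps=1000):
    """Maximum of the fitted curve on [0, ci_max], by evaluation on a regular grid."""
    return max(hill5(ci_max * i / steps, *params) for i in range(steps + 1))


# Hypothetical fitted parameters (y0, a, b, c, x0), not values from the study.
params = (0.0, 10.0, 2.0, 30.0, 0.0)
print(ccm_coefficient(params), ci_fixation_coefficient(params, ci_max=300.0))
```

Because both values are read from the fitted curve rather than from raw slopes, they inherit the quality of the fit near the 20 μM detection limit and near air saturation.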

Designation of the two coefficients allows quantitative differentiation between Ci fixation and CCM activities. An inhibitor study was performed to support this differentiation. Partial inhibition of the CBB cycle with 5 mM glycolaldehyde, which targets phosphoribulokinase (Miller and Canvin, 1989), does not affect CCM (Rotatore et al., 1992). When this inhibitor was used in the MIMS method, we observed, as predicted, partial inhibition of Ci uptake in the upper phase (−35%) but not in the lower phase of the curve (**Figure 3**). In the lower phase, Ci becomes very low, and the Ci supply to the Ci fixation reactions *via* CCM activity becomes limiting. The results obtained here indicate that the lower Ci phase of the uptake curve can serve as an indicator of CCM.

We found that the growth phase and cell density of the culture were critical factors for data reproducibility. We also observed that WT strains from different laboratories do not always behave similarly; some do not induce complete CCM when placed in air bubbling.

#### Acclimation to High Ci Conditions Involves a Concurrent Decline of the CO2 Fixation and CCM Coefficients

The acclimation of *Synechocystis* to high Ci concentrations, from bicarbonate addition or bubbling with CO2 enriched air, triggers a decrease in the level of active transporters in CCM (Burnap et al., 2015; Holland et al., 2016), and their apparent affinities for Ci (Skleryk et al., 2002; Benschop et al., 2003). We attempted to verify the previous observation of a delayed acclimation of the CCM machinery to high Ci (Benschop et al., 2003) using the MIMS method and parameters.

A WT culture adapted to low Ci was subjected to the addition of a saturating level of bicarbonate (50 mM) and sampled for MIMS measurement at 0, 8, or 48 h. Acclimation to high Ci conditions affected both the CCM coefficient and the Ci fixation coefficient, but with different timing: the Ci fixation coefficient decreased by 19% in 8 h, while the CCM coefficient had just started to decrease (note the higher *p*). This was followed by a 25% decrease in the Ci fixation coefficient at 48 h, when the CCM coefficient had decreased by 37% (**Figure 4**). These observations are in agreement with prior results (Benschop et al., 2003) and suggest that the synthesis/activity of CCM and CBB enzymes are both regulated during adaptation to high Ci conditions. Interestingly, the two responses appear to be shifted in time. They also further support the assignment of the two phases of the curve to CCM and carbon fixation, respectively.

#### Energy Input Affects Both CCM and Ci Fixation as Exhibited by MIMS Measurement

Both the concentrating of Ci and the fixation of Ci *via* the CBB cycle require energy input. The energy needed for those processes is provided by PETC, which harvests light energy and carries out a series of redox reactions leading to the production of energy carriers, mainly NADPH and ATP (**Figure 1**). We tested the influence of a higher light intensity (200 vs. 750 μE m−2 s−1) on the Ci uptake kinetics and observed enhanced Ci fixation (+60%) and CCM (+45%) coefficients (**Figure 5**). While Ci uptake increased with increasing light intensity as previously reported (Kranz et al., 2010), our method further indicates that energy input can be a rate-limiting step for both Ci concentrating and Ci fixation activities in *Synechocystis*.

#### Over-Expression of the CBB Cycle Enzyme *fbaA* Gene Impacts CCM and Ci Fixation

Central carbon metabolites, such as 2-phosphoglycerate (2PG), RuBP, 3PG, and alpha-ketoglutarate (AKG), as well as the electron carrier NAD(P)+ and the size of the intracellular Ci pool play an important role in the transcriptional regulation of cyanobacterial Ci uptake activity (Burnap et al., 2015). Based on these observations, we postulated that substrate regeneration by the CBB cycle could be engineered to enhance Ci uptake and fixation. We tested this hypothesis using engineered strains, including an ethanol-producer, and strains over-expressing the fructose bisphosphate aldolase gene (*fbaA OE*) alone or in combination with the ethanol-producer (Liang and Lindblad, 2016; Liang et al., 2018). MIMS was used to evaluate the effects of FbaA on carbon uptake and fixation machinery and to further characterize the effects of *fbaA OE* in carbon accumulation. FbaA is involved in two reactions within the CBB cycle. First, it catalyzes the combination of erythrose-4-phosphate (E4P) and dihydroxyacetone phosphate (DHAP) to form sedoheptulose-1,7-bisphosphate (SBP). Second, it also combines glyceraldehyde-3-phosphate (G3P) with DHAP to form fructose-1,6-bisphosphate (FBP). The *fbaA OE* strain grew faster (doubling time of 6.0 vs. 6.5 h), had a higher chlorophyll content (+20%), and had a higher net O2 evolution rate (+21%) compared to the WT (**Figure 6**). Both the Ci fixation and the CCM coefficients were about 14% higher in *fbaA OE* than in WT (**Figure 7**).

FIGURE 5 | Energy input affects both CCM and Ci fixation. (A) Ci consumption rate over media Ci concentration is shown for the low Ci-acclimated WT strain. Ci uptake was measured either in 200 μE m−2 s−1 light as in all other experiments (black) or in 750 μE m−2 s−1 light (green). All curves display 20-point standard deviations as vertical bars. (B) The carbon fixation coefficient and (C) the CCM coefficient calculated from the curves in (A). *p* values from two-tailed Student's *t*-tests are shown, *n* = 3.
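The figure captions report *p* values from Student's *t*-tests with *n* = 3 per group. As a hedged sketch of the underlying statistic, the code below computes the pooled-variance *t* value and its degrees of freedom; equal variances are assumed, the replicate values are hypothetical, and the conversion to a *p* value (which needs the *t* distribution) is left to a statistics library.

```python
import math
import statistics


def pooled_t(sample_a, sample_b):
    """Student's t statistic (equal variances assumed) and degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va = statistics.variance(sample_a)  # sample variance (n - 1 denominator)
    vb = statistics.variance(sample_b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    return diff / math.sqrt(pooled * (1 / na + 1 / nb)), na + nb - 2


# Hypothetical replicate coefficients for two strains, not data from the study.
wt = [1.00, 1.05, 0.95]
oe = [1.15, 1.12, 1.18]
t_value, dof = pooled_t(oe, wt)

# For 4 degrees of freedom, the two-tailed 5% critical value of t is 2.776.
print(t_value, dof, abs(t_value) > 2.776)
```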

#### *fbaA* Over-Expression Enhances Ci Uptake and Fixation in an Ethanol-Producing Strain

Many cyanobacterial strain engineering efforts involve the production of a target metabolite that is volatile or excreted into the medium, thus expanding the cell's metabolic sinks. A number of such strains are reported to show enhanced carbon fixation, although there is limited knowledge at the molecular level on how this enhancement is achieved (Nozzi and Atsumi, 2015; Xiong et al., 2015; Gao et al., 2016; Zhou et al., 2016). As a test case, we measured Ci uptake parameters in an ethanol-producing strain over-expressing pyruvate decarboxylase and alcohol dehydrogenase from *Zymomonas mobilis* (Luan et al., 2015; Liang and Lindblad, 2016, 2017; Liang et al., 2018). Over-expression of these two ethanol pathway genes and ethanol production had no effect on Ci uptake and fixation kinetics compared to WT (**Figure 8**). The additional *fbaA OE* feature in this strain nearly doubled ethanol productivity (Liang et al., 2018). We compared the growth rates of the WT and the two ethanol-producing strains. The introduction of the ethanol pathway increased the doubling time (8.6 h) versus WT (6.4 h), while *fbaA* over-expression in the ethanol-producer restored WT-like growth (7.1 h) (**Figure 6D**). The doubling time of the *fbaA* OE in a WT background, on the other hand, was about 10% shorter than that of the WT (**Figure 6C**). While expressing ethanol production genes alone had no effect on Ci uptake and fixation kinetics, their co-expression with *fbaA OE* enhanced both CCM and Ci fixation parameters (10 and 7%, respectively; **Figure 8**), comparable to the enhancement observed in the WT background (**Figure 6**).

content relative to the OD730. (C) Doubling time under growth conditions of 100 μE m−2 s−1 light, 30°C, and air bubbling. (D) Doubling time of ethanol-producing strains with or without simultaneous over-expression of *fbaA*. *p*-values from one-sided Student's *t*-tests are shown; *n* = 3.

### DISCUSSION

#### The MIMS Method Can Quantify Both Ci Concentrating and Fixation Activities

Photosynthesis is nature's primary Ci utilization pathway and is the basis for developing phototroph biotechnology for enhanced Ci utilization and the production of fuels and chemicals. Toward this end, a standard and simple method is needed to measure the Ci concentrating and fixing activities *in vivo*. Accurate measurement of the Ci uptake rate is also essential for metabolic flux analysis. In this study, a MIMS method capable of measuring Ci uptake in real time in declining external Ci concentrations is used to quantify CCM and Ci fixation kinetics separately. The results from three different conditions (inhibition of the CBB cycle, acclimation to high Ci, and an increase of the light intensity during the measurement) were in agreement with the literature and further support the dual phase readings we report here (Rotatore et al., 1992; Benschop et al., 2003; Kranz et al., 2010; Burnap et al., 2015; Holland et al., 2016).

The MIMS data yielded Ci uptake rates that were then fitted to a plot against the Ci concentration using the Hill equation. The fit allowed calculation of the Ci fixation coefficient reflecting the organism's maximum Ci uptake rate in air bubbling conditions, as well as the CCM coefficient reflecting the organism's carbon concentrating capability. The distinction between these two parameters was confirmed by experiments using a sub-inhibitory concentration of the CBB cycle inhibitor glycolaldehyde. Future studies may assign specific CCM mechanisms to various regions along the curve based on Ci affinity, enabling more detailed studies of the individual mechanisms. The MIMS method can also potentially accommodate variations in light intensity and other physiological parameters during measurement.
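As a rough illustration of the fitting step, the sketch below evaluates a generic Hill function and recovers its parameters from noise-free synthetic data with a coarse grid search. The parameter names (`v_max`, `k_half`, `n_hill`), the grid ranges, and all numbers are assumptions made for this example. How the Ci fixation and CCM coefficients are derived from the fitted curve is not restated here, so the sketch stops at the curve fit itself.

```python
import itertools


def hill(ci, v_max, k_half, n_hill):
    """Generic Hill function: uptake rate as a function of Ci concentration."""
    return v_max * ci**n_hill / (k_half**n_hill + ci**n_hill)


def sum_sq(params, conc, rates):
    """Sum of squared differences between the Hill curve and observed rates."""
    return sum((hill(c, *params) - r) ** 2 for c, r in zip(conc, rates))


def grid_fit(conc, rates, v_grid, k_grid, n_grid):
    """Pick the parameter triple on the grid with the smallest misfit."""
    candidates = itertools.product(v_grid, k_grid, n_grid)
    return min(candidates, key=lambda p: sum_sq(p, conc, rates))


# Synthetic, noise-free data generated from assumed "true" parameters.
true_params = (100.0, 50.0, 2.0)
conc = [5.0, 10.0, 25.0, 50.0, 100.0, 200.0, 400.0]
rates = [hill(c, *true_params) for c in conc]

v_grid = [80.0 + 5.0 * i for i in range(9)]  # 80 ... 120
k_grid = [30.0 + 5.0 * i for i in range(9)]  # 30 ... 70
n_grid = [1.0 + 0.5 * i for i in range(5)]   # 1 ... 3

best = grid_fit(conc, rates, v_grid, k_grid, n_grid)
```

With real MIMS traces, a gradient-based least-squares routine would normally replace the grid search; the grid is used here only to keep the sketch free of external dependencies.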

#### Genetic Modification of the CBB Cycle Can Stimulate Both Ci Fixation and Uptake Reactions

Genetic modification of cyanobacterial central carbon metabolism can positively impact culture growth and carbon fixation. However, there has been limited understanding of how the enhancement of carbon utilization is achieved. Here, we tested Ci uptake parameters in a strain over-expressing the *fbaA* gene and in ethanol-producing strains with and without concomitant over-expression of the *fbaA* gene. While it is known that metabolites from the central carbon metabolism directly regulate the expression of CCM genes (Burnap et al., 2015; Orf et al., 2016), it has not been shown *in vivo* whether modifying the metabolic flux in the CBB cycle impacts CCM activity. Our observation that *fbaA OE* increased both the Ci fixation coefficient and the CCM coefficient (**Figure 7**) provides *in vivo* evidence of a regulatory link between CBB metabolites and CCM enhancement. The *fbaA OE* strain is expected to have a modified metabolic flux toward the regeneration of RuBP and the depletion of 3PG. Our data indicate that the enhancement of RuBP regeneration in the CBB cycle likely improves the overall Ci concentrating and fixation processes. The increased fluxes of Ci uptake and CBB cycle demand more energy, which is provided by an increased photosynthetic machinery activity and light harvesting. This was evidenced by about 20% increase in O2 evolution rate and chlorophyll content (**Figures 6A,B**). The stimulation of photosynthesis, including CCM, by CBB cycle modification represents another example of metabolic plasticity in cyanobacteria (Xiong et al., 2017).

The mechanism by which *fbaA OE* regulates CBB activity has been studied in transgenic plants. It was observed that FbaA activity affected photosynthetic efficiency, and its activity can be linked to the CBB metabolic balance and carbon partitioning in potatoes (Haake et al., 1998). Changes in CBB metabolite flux or concentrations are thought to trigger CBB enzyme regulation, likely through gene expression (Henkes et al., 2001; Cai et al., 2016; Simkin et al., 2017). Although contradictory results were found in other plants (Uematsu et al., 2012), all studies agree that the enhancement of photosynthetic activity is likely limited by 3PG depletion and RuBP regeneration. The enhancement of the overall carbon fixation by *fbaA OE* is even stronger at higher Ci conditions (Uematsu et al., 2012), where the rate of Ci fixation by RubisCO depends on the ability of the CBB cycle to regenerate RuBP (Raines, 2003; Ma et al., 2005; Farazdaghi, 2011), assuming that RubisCO quantity itself is not limiting under high Ci conditions (Kanno et al., 2017). Whether the increase in the Ci fixation coefficient in the *fbaA OE* cyanobacterium is due to changes in CBB metabolite levels and/or enzyme rearrangement remains to be studied.

In contrast, the mechanism by which *fbaA OE* regulates CCM may be suggested from prior studies of cyanobacterial CCM gene expression in response to metabolite signaling (Burnap et al., 2015). Metabolites, including 2PG, RuBP, 3PG, and AKG, as well as cofactor NAD(P) directly affect major LysR-type transcriptional regulators such as the NAD(P)H dehydrogenase regulator (NdhR), the *cmp* operon regulator (CmpR), and the *rubisCO* operon regulator (RbcR) (Burnap et al., 2015). The internal concentration of Ci itself and the Ci/O2 ratio may also be a major signal for gene regulation (Woodger et al., 2005). Further studies of our strains, including quantification of CBB metabolites and transcriptome/proteome analyses, will be needed to clarify the regulatory mechanisms involved in photosynthetic enhancement by *fbaA* OE. In general, the positive effects on Ci utilization in some engineered strains could be attributed to metabolites/transcription regulation pathways as described above and may also depend on biochemical constraints, abundance of transporters, affinity, and the half-lives of the proteins involved in the process. The MIMS method can determine *in vivo* how engineering affects RubisCO Ci fixation kinetics and/or carbon concentrating kinetics and thus help delineate these regulatory mechanisms.

Our observations in *fbaA* OE cyanobacterium extend to engineered strains producing ethanol. The production of ethanol alone negatively impacted the growth rate, suggesting that the diversion of pyruvate toward the ethanol pathway depleted carbon flux toward anabolic metabolism. This negative effect was compensated by the simultaneous *fbaA* OE in the combined strain, in which ethanol production doubled and the growth rate was restored close to WT levels, while both CCM and Ci fixation coefficients increased (**Figure 8**). It appears that the loss of carbon to ethanol production is compensated in the combined strain by an increase in Ci uptake and metabolic fluxes within the CBB cycle, therefore providing enough 3PG to feed both ethanol production and cell growth. These observations suggest that both growth and ethanol production are limited by Ci uptake and fixation in the ethanol-producing strain, while *fbaA* OE stimulated Ci uptake and fixation, thus improving both ethanol production and growth rate.

As discussed above, *fbaA* OE may lead to a modified balance between metabolites that may enhance the regeneration phase of the CBB cycle and therefore speed up the RubisCO reactions. A similar hypothesis was made about the over-expression of another CBB enzyme gene, *fsbp*, in the green alga *Chlorella* (Yang et al., 2017). Indeed, it is likely that the increased accumulation of other enzymes favoring the recycling of RuBP would also enhance Ci fixation. The benefit of increased growth potential is then balanced by the cost of producing additional proteins. It could be postulated that the metabolic changes within the CBB cycle would be different by over-expressing one gene versus another. If a step in the regulation of the CCM is modified by intermediate products of the CBB cycle, it is expected that the phenotype would be different among different overexpressers. Preliminary data show that a *Synechocystis* strain over-expressing the fructose/sedoheptulose bisphosphatase gene displays an increase in Ci fixation and a decrease in Ci uptake (data not shown). Thus, over-expression of genes for different central metabolism enzymes could provide more complete information on the regulation of Ci uptake, fixation, and carbon partitioning. Investigating how such changes in the metabolite balance of engineered strains improve or impair Ci uptake machinery and Ci fixation could guide future strain development for enhanced carbon utilization toward production of fuels and chemicals.

The dynamics of Ci consumption from genetically modified strains and cultures adapted to different growth conditions can be precisely measured using a simple reactor system equipped with MIMS. Such measurements can provide additional information or constraints for model-based <sup>13</sup>C-metabolic flux analysis and thus contribute to systematic studies of photosynthetic phenotype and fluxome.

#### AUTHOR CONTRIBUTIONS

DD and JY conceived the study and drafted the manuscript. DD conducted the experiments. FL and PL provided engineered strains. FL, MC, BW, WX, P-CM, PL, and JY assisted with experimental design, data interpretation, and troubleshooting. All authors edited and approved the manuscript. We thank Ms. Sunnyjoy Dupuis for language editing.

#### FUNDING

This work was authored in part by Alliance for Sustainable Energy, LLC, the manager and operator of the National Renewable Energy Laboratory for the U.S. Department of Energy (DOE) under Contract No. DE-AC36-08GO28308. Funding provided by DOE Office of Energy Efficiency and Renewable Energy BioEnergy Technologies Office (WX, MC, BW, P-CM, JY). DD is supported by a Swiss National Science Foundation Postdoc Mobility Fellowship number: P2GEP3-168265. FL and PL acknowledge funding support from the NordForsk NCoE program "NordAqua" (project # 82845) and from the Swedish Energy Agency (project CyanoFuels, # P46607-1).

#### REFERENCES

the internal inorganic carbon pool and involves oxygen. *Plant Physiol.* 139, 1959–1969. doi: 10.1104/pp.105.069146


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*The views expressed in the article do not necessarily represent the views of the DOE or the U.S. Government. The U.S. Government retains and the publisher, by accepting the article for publication, acknowledges that the U.S. Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this work, or allow others to do so, for U.S. Government purposes.*

# The Design of FluxML: A Universal Modeling Language for <sup>13</sup>C Metabolic Flux Analysis

Martin Beyß<sup>1†</sup>, Salah Azzouzi<sup>1†</sup>, Michael Weitzel<sup>1†</sup>, Wolfgang Wiechert<sup>1,2‡</sup> and Katharina Nöh<sup>1</sup>\*<sup>‡</sup>

<sup>1</sup> Institute of Bio- and Geosciences, IBG-1: Biotechnology, Forschungszentrum Jülich GmbH, Jülich, Germany, <sup>2</sup> Computational Systems Biotechnology (AVT.CSB), RWTH Aachen University, Aachen, Germany

#### Edited by:

Lars Keld Nielsen, University of Queensland, Australia

#### Reviewed by:

Sonia Cortassa, National Institutes of Health (NIH), United States; Hiroshi Shimizu, Osaka University, Japan; Maciek R. Antoniewicz, University of Delaware, United States; Fumio Matsuda, Osaka University, Japan

> \*Correspondence: Katharina Nöh k.noeh@fz-juelich.de

†These authors have contributed equally to this work

‡These authors have contributed equally to this work

#### Specialty section:

This article was submitted to Microbial Physiology and Metabolism, a section of the journal Frontiers in Microbiology

> Received: 16 October 2018 Accepted: 24 April 2019 Published: 24 May 2019

#### Citation:

Beyß M, Azzouzi S, Weitzel M, Wiechert W and Nöh K (2019) The Design of FluxML: A Universal Modeling Language for <sup>13</sup>C Metabolic Flux Analysis. Front. Microbiol. 10:1022. doi: 10.3389/fmicb.2019.01022

<sup>13</sup>C metabolic flux analysis (MFA) is the method of choice when a detailed inference of intracellular metabolic fluxes in living organisms under metabolic quasi-steady state conditions is desired. Continuously developed over two decades, the technology has made major contributions to the quantitative characterization of organisms in all fields of biotechnology and health-related research. <sup>13</sup>C MFA, however, stands out from other "-omics sciences," in that it requires not only experimental-analytical data, but also mathematical models and a computational toolset to infer the quantities of interest, i.e., the metabolic fluxes. At present, these models cannot be conveniently exchanged between different labs. Here, we present the implementation-independent model description language FluxML for specifying <sup>13</sup>C MFA models. The core of FluxML captures the metabolic reaction network together with atom mappings, constraints on the model parameters, and the wealth of data configurations. In particular, we describe the governing design processes that shaped the FluxML language. We demonstrate the utility of FluxML to represent many contemporary experimental-analytical requirements in the field of <sup>13</sup>C MFA. The major aim of FluxML is to offer a sound, open, and future-proof language to unambiguously express and conserve all the necessary information for model re-use, exchange, and comparison. Along with FluxML, several powerful computational tools are supplied for easy handling, but also to maintain a maximum of flexibility. Altogether, the FluxML collection is an "all-around carefree package" for <sup>13</sup>C MFA modelers. We believe that FluxML improves scientific productivity as well as transparency and thereby contributes to the efficiency and reproducibility of computational modeling efforts in the field of <sup>13</sup>C MFA.

Keywords: <sup>13</sup>C metabolic flux analysis, FluxML, machine-readable format, model specification language, computational modeling, reproducible science, data models, model exchange

#### INTRODUCTION

Systems Biology combines high-throughput experimentation with quantitative analysis and computational modeling to approach an understanding of how cellular phenotypes emerge from molecular interactions (Wolkenhauer, 2001; Westerhoff and Hofmeyr, 2005). To this end, a comprehensive set of "omics" techniques has been developed, ranging from transcriptomics, proteomics, and metabolomics to fluxomics, the quantification of metabolic reaction rates (fluxes) in vivo (Nielsen, 2003). In the field of fluxomics, metabolic flux analysis (MFA) with stable isotope tracers, typically a <sup>13</sup>C labeled carbon source, is considered the "gold standard" for flux quantification under metabolic quasi-steady state conditions (Wiechert, 2001; Sauer, 2006). Systematically developed since the mid-1990s (Marx et al., 1996; Christensen and Nielsen, 1999), <sup>13</sup>C MFA has been applied to a wide variety of organisms (microbes, plants, mammalian cell lines), cultivated under different conditions (chemostat, batch, fed-batch), in single, coculture and host-pathogen systems (Beste et al., 2013; Ghosh et al., 2014; Gebreselassie and Antoniewicz, 2015), and probed with diverse labeling strategies (e.g., <sup>13/14</sup>C, <sup>2</sup>H, <sup>15</sup>N)<sup>1</sup> within isotopic transient or steady-state regimes (Zamboni et al., 2009; Niedenführ et al., 2015; Allen, 2016; Schwechheimer et al., 2018). For introductory texts on <sup>13</sup>C MFA, the reader is referred to the literature (Zamboni et al., 2009; Wiechert et al., 2015; Dai and Locasale, 2017).

Direct procedures to measure fluxes exist solely for extracellular rates, i.e., uptake and secretion fluxes. The determination of intracellular fluxes in vivo requires two additional ingredients. First, the incorporation of label into the intracellular metabolites has to be measured. To this end, various analytical techniques such as homo- or heteronuclear, scalar- or multi-dimensional nuclear magnetic resonance (NMR) as well as single or tandem mass spectrometry (MS) are nowadays applied (Wittmann and Heinzle, 2001; Luo et al., 2007; Lane et al., 2008; Yuan et al., 2010; Giraudeau et al., 2011; Blank et al., 2012; Chu et al., 2015; Kappelmann et al., 2017). Second, and in contrast to other omics technologies, a powerful computational machinery is mandatory for data evaluation and flux inference. This means that the measured information, i.e., the isotopic data of intracellular metabolites together with the extracellular rates, does not directly uncover the desired flux information. The relation between isotopic enrichments and the fluxes is captured in a mathematical model which predicts the emerging fractional labeling patterns from given flux values. Clearly, this model has to be operated in the inverse direction to infer the fluxes, which are in reality unknown, from the observed data. These fluxes are then determined in an iterative fitting procedure in which the negative log-likelihood function, expressing the discrepancies between the model-predicted and measured quantities, is minimized. Finally, statistical measures estimate the confidence with which the fluxes are inferred from the data in view of the data's precision (Wiechert et al., 1997; Theorell et al., 2017).
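The forward/inverse relationship described above can be made concrete with a deliberately tiny toy problem. In the sketch below, a single pool is fed by two pathways; the forward model predicts its labeled fraction from one free flux fraction, and the inverse step estimates that fraction by minimizing a variance-weighted sum of squared residuals. The network, the function names, and all numbers are hypothetical and far simpler than any real <sup>13</sup>C MFA model; golden-section search merely stands in for the iterative optimizers used in practice.

```python
def predict_labeling(v1, a1, a2):
    """Forward model: labeled fraction of a pool fed by two pathways.

    v1 is the flux fraction through pathway 1 (pathway 2 carries 1 - v1);
    a1 and a2 are the labeled fractions delivered by each pathway.
    """
    return v1 * a1 + (1.0 - v1) * a2


def weighted_ssr(v1, measurements):
    """Variance-weighted sum of squared residuals.

    For independent Gaussian errors this matches the negative
    log-likelihood up to a factor and an additive constant.
    """
    return sum(
        ((predict_labeling(v1, a1, a2) - y) / sd) ** 2
        for (a1, a2, y, sd) in measurements
    )


def golden_section_min(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (5 ** 0.5 - 1) / 2
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            hi = d  # minimum lies in [lo, d]
        else:
            lo = c  # minimum lies in [c, hi]
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
    return (lo + hi) / 2


# Two hypothetical tracer experiments: (a1, a2, measured fraction, std dev).
measurements = [
    (1.0, 0.0, 0.62, 0.01),
    (0.5, 0.1, 0.34, 0.02),
]

v1_hat = golden_section_min(lambda v: weighted_ssr(v, measurements), 0.0, 1.0)
```

Because the more precise measurement carries the larger weight, the estimate lands close to 0.62 rather than midway between the two values the experiments would suggest individually (0.62 and 0.60).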

As a consequence of this procedure, the results of any <sup>13</sup>C MFA intimately depend on the metabolic network model used. Metabolic networks for <sup>13</sup>C MFA vary greatly in size, from focused representations consisting of only a few tens of reaction steps (Zamboni et al., 2009) to comprehensive descriptions with hundreds of reactions (Gopalakrishnan and Maranas, 2015; McCloskey et al., 2016b). Since the flux estimation procedure with such networks is computationally demanding, a number of algorithms have been proposed over the last two decades to speed up the core computation steps (Wiechert et al., 1999; Zamboni et al., 2005; Antoniewicz et al., 2007; Weitzel et al., 2007; Tepper and Shlomi, 2015). Unsurprisingly, these developments have led to the emergence of a variety of software tools that are almost as diverse as the experimental scenarios of <sup>13</sup>C MFA (see **Supplementary S1 Table 1.1**).

More on the <sup>13</sup>C MFA methodology and the assortment of flux analysis methods being applied is found elsewhere in the literature (Zamboni et al., 2009; Niedenführ et al., 2015; Wiechert et al., 2015). For the following considerations, it is sufficient to recognize that <sup>13</sup>C MFA in practice means a combinatorial variety of possible experimental, analytical, and computational configurations as well as model incarnations. The pros and cons of these different frameworks are not scrutinized here. However, one aspect has to be emphasized: Despite the heterogeneity of use cases, there is little debate about the principal conditions under which a <sup>13</sup>C MFA experiment must be conducted (i.e., metabolic pseudo-stationarity, homogeneous cell populations), and the input required for setting up the computational model (e.g., the structural description of the biochemical network underlying the model, specification of tracers, and measurements). Consequently, the precise configuration for an individual case comprises many specific details about the experimental-analytical setup and is, as evidenced in Section FluxML IN A NUTSHELL, rather complex.

## Why Is There a Need for a Standardized Model Exchange Format in <sup>13</sup>C MFA?

Bundling all the aspects specific to an individual <sup>13</sup>C MFA study in a standardized document is undoubtedly of tremendous value for the community. This has already been proven by the success of the Systems Biology Markup Language (SBML, Hucka et al., 2003), which is today used as lingua franca to handle model exchange between hundreds of different computational systems biology tools, as well as various other established modeling languages such as CellML (Lloyd et al., 2004) and NeuroML (Gleeson et al., 2010). Transferred to the <sup>13</sup>C MFA domain, this means having a flux document formulated in a universal, i.e., network-, algorithm-, tool-, and measurement-independent modeling language that is governed by controlled vocabularies and covers all current application cases.

A universal <sup>13</sup>C MFA modeling language allows sharing and publishing models in a complete, unambiguous, and reusable way. At present, this is only wishful thinking as existing guidelines (Crown and Antoniewicz, 2013) are, as we argue, not sufficiently strict. As a result, published papers almost never supply all the information required to enable full reproduction of the model(s) used in the study. Partly, this incompleteness is due to configuration processes that are too complex for full reproduction in a paper. In addition, implicit assumptions made in the modeling process, either by the modeler or hidden in the encoding of the software tool, remain undocumented, perhaps unintentionally. In this sense, a standardized <sup>13</sup>C MFA modeling language provides a rule set to scientists for reporting re-usable models.

<sup>1</sup>The use of <sup>13</sup>C labeled tracers is the mainstream scenario for <sup>13</sup>C MFA. The utilization of other stable isotope labeling strategies such as <sup>15</sup>N or simultaneous hetero-isotopic tracer combinations such as <sup>13</sup>C-<sup>15</sup>N is conceptually equivalent. Since it is the established notion in the field, the term "<sup>13</sup>C MFA" is used throughout, although we extend its meaning to all of these alternative labeling strategies.

**Abbreviations:** FluxML, Flux Markup Language; FTBL, Flux TaBuLar format; ILE, isotope labeling experiment; INST, isotopically non-stationary; MFA, metabolic flux analysis; MID, mass isotopomer distribution; MS, mass spectrometry; NMR, nuclear magnetic resonance; OED, optimal experimental design; (S)LOC, (single) lines of code.

In a wider context, model exchange formats are an essential component for the reproduction of simulation results within complex computational pipelines (Ebert et al., 2012; Dalman et al., 2016). As a practical benefit, a <sup>13</sup>C MFA modeling language empowers modelers to concentrate on the specification of the underlying network model, independent of the specific implementation in a software tool (cf. **Figure 1**). Such an "Esperanto" format is, thus, the central component for serving the FAIR Data Principles (Wilkinson et al., 2016). Put plainly, a standardized model exchange format fills the void and resolves many, if not all, of the current deficiencies. In addition, it paves the way for extending the models' shelf lives and increases the efficiency of modeling efforts.

In this work we discuss the question: What should a universal model specification look like that digitally codifies all data required to carry out a <sup>13</sup>C MFA? By expanding on our former work (Wiechert et al., 2001), we motivate the benefits of a modern computer readable markup language for <sup>13</sup>C MFA, called Flux Markup Language (FluxML), and describe the governing principles of its design. To this end, we work out the required content that constitutes a model, formally known as syntax standard. Here, special focus is given to the model-data integration and extensibility aspects to keep pace with ongoing experimental-analytical developments. Clearly, to be adopted, such a general model representation effort must be accompanied by a set of supporting tools facilitating validation and modification tasks. We supply several computational tools with the modeling language, making the FluxML collection an "all-around carefree package" for modelers. The collection is illustrated with typical <sup>13</sup>C MFA examples: First, we demonstrate that the FluxML model format unlocks the comparability of state-of-the-art simulators, an aspect that is sorely missing, even 20 years after the advent of the first simulators. Second, we illustrate how easy the configuration of parallel labeling experiments is with FluxML.

### A CONTENT STANDARD FOR THE EXCHANGE OF MODELS AND DATA IN <sup>13</sup>C MFA

A model exchange document has to encapsulate all (necessary and optional) components, their interconnections, and the parameter information from which the computational model is built. In addition, since <sup>13</sup>C MFA is an experimental method, experimental data descriptors have to be defined from which the fluxes are to be inferred. To illustrate this in more detail, the specification of an isotope labeling experiment (ILE) and the corresponding measurement data includes the following elements (Wiechert, 2001; Wiechert et al., 2001):

	- a. Extracellular rates (or external fluxes), as derived from concentration profiles of exometabolites or bioprocess models.
	- b. Fractional labeling enrichments obtained by analytical instruments (e.g., positional labeling, mass isotopomer fragments, multiplets, etc.). For isotopically steady state conditions one set of labeling data is specified, while under isotopically non-stationary conditions (INST <sup>13</sup>C MFA) a time series of such sets is to be integrated representing the transient incorporation of label.
	- c. Intracellular pool sizes (i.e., concentrations) are key determinants of the labeling incorporation velocity and should, thus, be specified in INST <sup>13</sup>C MFA if experimentally accessible (Wiechert and Nöh, 2005; Nöh et al., 2006).

Importantly, each measurement must be accompanied by an associated standard deviation quantifying its precision.

6. A set of variables that parametrizes the underlying computational model and enables its execution, e.g., the set of free fluxes (Wiechert and de Graaf, 1996).
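To make the measurement-related requirements above more tangible, the following sketch models a measurement record that refuses to exist without a positive standard deviation and flags non-stationary labeling data that lack a sampling time. This is a hypothetical in-memory structure for illustration only; the class, field, and function names are assumptions and do not reproduce FluxML syntax or any tool's API.

```python
from dataclasses import dataclass
from typing import Optional

# Measurement categories named in the list above (labels are assumptions).
KINDS = {"extracellular_rate", "labeling", "pool_size"}


@dataclass(frozen=True)
class Measurement:
    """One measured quantity together with its precision."""
    kind: str                      # one of KINDS
    target: str                    # flux, fragment, or metabolite name
    value: float
    std_dev: float                 # required for every measurement
    time: Optional[float] = None   # sampling time, relevant for INST data

    def __post_init__(self):
        if self.kind not in KINDS:
            raise ValueError(f"unknown measurement kind: {self.kind}")
        if not self.std_dev > 0:
            raise ValueError(f"{self.target}: standard deviation must be positive")


def check_data_set(measurements, non_stationary):
    """Return a list of human-readable problems found in a measurement set."""
    problems = []
    for m in measurements:
        if non_stationary and m.kind == "labeling" and m.time is None:
            problems.append(f"{m.target}: INST labeling data needs a sampling time")
    return problems


# Hypothetical example data for an isotopically non-stationary experiment.
data = [
    Measurement("extracellular_rate", "glucose_uptake", 1.0, 0.05),
    Measurement("labeling", "Ala_m+1", 0.32, 0.01, time=30.0),
    Measurement("pool_size", "Ala", 1.8, 0.2),
]
problems = check_data_set(data, non_stationary=True)
```

Rejecting incomplete records at construction time is one way to enforce a content standard mechanically rather than by convention.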

This list can be regarded as a minimal content standard for <sup>13</sup>C MFA models in the notion of Minimum Information Requested In the Annotation of biochemical Models (MIRIAM) (Le Novère et al., 2005). However, to define a language, a syntax standard (or format) is needed that provides structures for formatting the information laid down in the content standard. In addition, terminology and rules to specify valid models have to be declared, thereby enabling the semantic interpretation of the model descriptions.

The requirement to deal with the broad diversity of <sup>13</sup>C MFA options renders the design of a modeling language a challenging endeavor. Before discussing the design decisions for FluxML in detail, former developments in the computational field should be briefly reviewed.

### A Short History of <sup>13</sup>C MFA Modeling

Software systems developed in the past have used different approaches to supply the information needed to execute <sup>13</sup>C MFA. Several of the first generation flux analysis tools developed in the 1990s did not rely on dedicated specification formats but rather formulated the network and associated measurements by a set of matrices: atom mapping matrices to describe atom transitions (Zupke and Stephanopoulos, 1994) or isotopomer mapping matrices that unfold the system of isotopomer balance equations (Schmidt et al., 1997). One obvious problem with this matrix-centered approach is that it is prone to specification errors which are hard to detect afterwards.
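A minimal sketch of the atom mapping matrix idea follows, assuming the common convention of one row per product atom and one column per substrate atom. The reaction is a hypothetical cleavage of a six-carbon substrate into two three-carbon products with an assumed carbon mapping; the matrix names and enrichment values are illustrative only. The small validity check hints at why hand-written matrices are error-prone: a single misplaced entry yields a different, but still computable, model.

```python
def apply_amm(amm, substrate_enrichment):
    """Map positional 13C enrichments of a substrate onto a product.

    amm[i][j] == 1 means product atom i originates from substrate atom j.
    """
    return [
        sum(row[j] * substrate_enrichment[j] for j in range(len(row)))
        for row in amm
    ]


def is_valid_amm(amm):
    """Each product atom (row) must stem from exactly one substrate atom."""
    return all(sorted(row) == [0] * (len(row) - 1) + [1] for row in amm)


# Assumed mapping: product P receives substrate carbons 3, 2, 1 (reversed),
# product Q receives substrate carbons 4, 5, 6.
AMM_S_TO_P = [
    [0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
]
AMM_S_TO_Q = [
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]

# Substrate labeled at carbon 1; other positions near natural abundance.
substrate = [0.99, 0.011, 0.011, 0.011, 0.011, 0.011]

p_enrichment = apply_amm(AMM_S_TO_P, substrate)  # label ends up at P carbon 3
q_enrichment = apply_amm(AMM_S_TO_Q, substrate)  # Q stays unlabeled
```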

To overcome this weakness, many second generation tools such as FiatFlux (Zamboni et al., 2005), tcaSIM/tcaCALC (Sherry et al., 2004), Metran (Young et al., 2008), INCA (Young, 2014), or WuFlux (He et al., 2016) have been equipped with graphical user interfaces (GUI) for a convenient model formulation (cf. **Supplementary S1 Table 1.1**). Such solutions are designed with the end-user in mind, typically an experimentalist who does not want to care about too many technical details. While the user-friendliness of these GUI-based tools is unrivaled, it comes at the price of a substantially restricted modeling flexibility: the abilities to change the reaction network or to formulate different measurement configurations are rather limited.

The first software framework for <sup>13</sup>C MFA that was able to deal with any isotopically stationary experimental setup in a freely configurable manner was 13CFLUX (Wiechert et al., 2001). Owing to the popularity of spreadsheets among experimentalists, 13CFLUX relies on tabulator-delimited text files for model and data specification, the FTBL (Flux TaBuLar) format. FTBL's concept of dividing the required information into several contextual sections has been adopted by many software packages such as OPENFLUX(2) (Quek et al., 2009; Shupletsov et al., 2014), FIA (Srour et al., 2011), and influx\_s(i) (Sokol et al., 2012).

Despite the widespread use of FTBL, recent trends for automated lab experimentation and computational analysis pipelines (Dalman et al., 2010, 2016; Heux et al., 2017) call for contemporary model specification formats that are computationally easier to access and better verifiable than spreadsheets. Consequently, with our second generation <sup>13</sup>C MFA software 13CFLUX2 (Weitzel et al., 2013) an update to FTBL was proposed: the Flux Markup Language FluxML. FluxML exploits the powerful eXtensible Markup Language (XML) framework which has been designed to ease the computational processing of structured text documents. However, at the time of its publication, FluxML supported exclusively the formulation of isotopically stationary <sup>13</sup>C MFA models.

## DECISIONS ON THE DESIGN OF FLUXML

### Universal <sup>13</sup>C MFA Model Exchange Formats: Why an Update Is Needed

<sup>13</sup>C MFA has been developed rapidly in the last decade. These developments have been impelled, in particular, by advances in analytical measurement technologies where MS and NMR based approaches have been extended in scope and optimized in speed, resolution, precision, and accuracy (Moseley et al., 2011; Choi et al., 2012; Giraudeau et al., 2012; McCloskey et al., 2016a; Nilsson and Jain, 2016; Borkum et al., 2017; Kappelmann et al., 2017; Mairinger and Hann, 2017; Su et al., 2017). In turn, these developments triggered the setup of more comprehensive network models (Gopalakrishnan and Maranas, 2015; McCloskey et al., 2016b; Nilsson and Jain, 2016). Also INST <sup>13</sup>C MFA application scenarios have become more commonplace (Niedenführ et al., 2015; Cheah and Young, 2018; Delp et al., 2018; Gopalakrishnan et al., 2018). In view of these developments, existing formats have several limitations making a revision necessary.

Two decades of experience with planning, modeling and analyzing ILEs and the continuous exchange with the 13CFLUX(2) user community have led to the specification of the updated FluxML format, which we present in this work. FluxML now covers isotopically stationary and non-stationary ILEs and is fully universal in terms of network, atom transition, measurement (error), and constraint formulation, including the use of multiple isotopes as tracers. It should be noted that the involved design processes, which we discuss in the following, were driven by the pragmatic aim of supporting modelers. Nevertheless, the FluxML format aims at a canonical model representation and follows the recommendations provided by the COMBINE (COmputational Modeling in BIology NEtwork, http://co.mbine.org/) initiative.

### Design Decision 1—Scope: Data Pre-processing Is Not Part of FluxML

Measurement instruments generate raw data that first must be processed to be utilizable for <sup>13</sup>C MFA. For example, fractional labeling patterns must be extracted from NMR or MS spectra. This includes the identification of target fragments followed by the determination of their abundance by peak integration. For INST <sup>13</sup>C MFA, in addition, absolute intracellular pool sizes are to be determined. Here, special care has to be taken to correct for known biases in the sampling procedure (e.g., quenching, cell separation, and metabolite extraction). For example, the loss of intracellular metabolites during quenching (known as leakage effect) has to be counteracted by application of advanced protocols (Noack and Wiechert, 2014). For both quantities, the labels and pool sizes, standardization and modeling the propagation of the measurement error throughout the analytical processing pipelines is becoming best practice (Tillack et al., 2012; Mairinger et al., 2018).

On the other hand, most software systems for <sup>13</sup>C MFA emulate metabolite backbones rather than the analytically observed molecules. This means that the data derived from the raw mass spectra must be corrected for "artificial" and/or "natural" isotope labeling contributions before they can be used in <sup>13</sup>C MFA (Lee et al., 1991; Fernandez et al., 1996; Wahl et al., 2004; Jungreuthmayer et al., 2016; Niedenführ et al., 2016; Su et al., 2017). Also, the specific chemical nature of the analyte mixture and the analysis technique employed might lead to distorted observations, such as proton-loss/gain, which require correction prior to model integration (Poskar et al., 2012). In addition, non-negligible inoculation residues or preliminary labeling sampling times may bias the interpretation of labeling enrichments in the classical case and thus need to be corrected (van Winden et al., 2001; Wiechert and Nöh, 2005). Finally, cell-specific external rates and their errors are calculated from cultivation data (concentration time courses of extracellular metabolites, off-gas analysis, biomass composition etc.) by means of simple regression (Murphy and Young, 2013), differentiation after smoothing (Llaneras and Picó, 2007), stochastic filtering (Cinquemani et al., 2017), or tailored bioprocess models (Noack et al., 2011).

That said, it becomes clear that such pre-processing procedures are extremely eclectic and heterogeneous, require a high degree of expertise, and are subject to continuous change due to changing experimental setups, instrumentation, vendor formats, and analytical method developments. Recently, the metabolomics community has become sensitized to its need for reporting standards. Data formats and repositories are now under development to report and store raw data along with its meta-information (Kale et al., 2016; Rocca-Serra et al., 2016). To avoid duplication, FluxML includes only those details about the evaluation procedures that contain the necessary key information about the measurement data that is actually used for producing the flux map (i.e., the use data, see also Section Experimental Data). The decision to not incorporate data pre-processing is also reasonable from a computer science perspective, since encapsulating complex designs in compact, orthogonal modules limits the overall complexity of the specification and eases future developments.

#### Design Decision 2—Technical Considerations: An XML Format for <sup>13</sup>C MFA

Generally, a modeling language must have a clearly defined syntax and a succinct and precise specification of its semantics, for the computer but also for the human if necessary. Reemploying language concepts that are accepted by the target audience helps to reduce learning hurdles. With SBML, an XML dialect is already available that is familiar to systems biologists. Technically, the design of FluxML was influenced by SBML as well as the following general considerations:


computational model is actually created and mapped to the internal data structures of a simulator, are error-prone. For this reason, all required data structures must be generated automatically by some kind of model compiler. For dealing with XML files, hundreds of off-the-shelf parsing, verification and transformation tools are available. This eases the writing of processing software for developers.


### Design Decision 3—FluxML a Domain-Specific Language

One of the key design objectives of FluxML was to allow for automated model interpretation (analysis and code generation) for large-scale isotope labeling networks without forcing the modeler to resort to text-based specifications of low-level model description languages. Here, it could be argued that the flexibility of general description languages like CellML, offering a low-level description of the mathematical equations, is unraveled when new experimental or analytical paradigms become available. However, the generality comes at the price of readability and clearly challenges the proofreading capabilities of the modeler.

On the other hand, isotope labeling networks share many aspects with stoichiometric metabolic network models. For this reason, FluxML and SBML have a common subset of information that contains the metabolite and reaction names as well as the network stoichiometry and flux constraints. While reaction kinetic information is currently not in the scope of <sup>13</sup>C MFA, atom transitions, tracer mixtures, as well as experimental data are not part of SBML. Thus, the set of common features is not that large. Recently, an attempt has been made to encode the surplus information required for <sup>13</sup>C MFA in the SBML notation (Birkel et al., 2017). Here, the construct notes (extending reaction and species in the notion of SBML) has been utilized to express carbon atom mappings and measurement data. However, because atom transitions and measurement specifications are vital for generating the essential mathematical system (Weitzel et al., 2007), it is clear that specifying this information in optional add-on elements, such as notes, complicates validation and consistency checking enormously. Hence, such a solution is not recommended by the SBML designers<sup>2</sup> .

Taken together, these reasons speak in favor of the domain-specific standalone XML-based language. We followed the example of SBML and adopted those parts belonging to the common language subset with only minor changes to FluxML. The common subset is then extended by the information necessary to specify ILEs. Firstly, this way the entry level for a newcomer already familiar with SBML is lowered. Secondly, extracting the common information from a FluxML file and generating a rudimentary SBML document, or vice versa, is fairly straightforward.

## FLUXML IN A NUTSHELL

FluxML development branches are organized in major Levels and minor Versions. Level 1 is dedicated to isotopically stationary <sup>13</sup>C MFA (Weitzel et al., 2013) while Level 2 covers both, the isotopically stationary and non-stationary cases. During language design special care has been taken to keep Level 2 backward compatible to Level 1, meaning that existing simulation tools designed for using the published FluxML version (Weitzel et al., 2013) do not need adaption when being used with Level 2 files. This helps third-party software developers using Level 1-models as input in keeping their versions stable. Lastly, FluxML Level 3 has been developed which extends Level 2 to the general case of multiple isotopically labeled elements. Here, for obvious reasons, backward compatibility could no longer be maintained.

The general hierarchical structure of FluxML documents, which is common to all Levels, is shown in **Figure 2**. **Figure 2A** gives an overview of the main elements of the FluxML language while **Figure 2B** shows a code excerpt from the serialization of a model. The top-level element fluxml contains the elements info, for providing basic information about the model, and reactionnetwork, containing metabolites and reactions which, together with the constraints element, define the isotope network structure. An important key concept of FluxML, which is not present in SBML, is that of configurations. Configurations offer a convenient way to connect the same model structure with different experimental or simulation settings. In this way, instances that, for example, differ in the selection of the tracer mixture, flux parametrization, and/or measurement configuration can be stored in different configurations sections within one model file. Another core concept of FluxML, distinguishing it from SBML, is the incorporation of experimental data. Here, the measurement data declaration is separated from the data specification by the measurement sub-elements model and data, respectively. Finally, the simulation element contains details about the model parameterizations in terms of free model parameters as well as their values.

<sup>2</sup> "In particular, it is critical that data essential to a model definition . . . is not stored in annotations" [http://sbml.org/special/specifications/sbml-level-3/version-2/core/release-1-rc1/sbml-level-3-version-2-core.pdf].

Instead of an exhaustive language description, only the major features of each FluxML section are highlighted in the following, in particular those that eliminate limitations of the FTBL format and represent novel developments in the field.

#### Metabolites, Reaction Network Structure, and Atom Mappings

Clearly, biochemical reaction steps and their atom transitions constitute the core of any <sup>13</sup>C MFA model. The section reactionnetwork defines the metabolite pools, the reactions interconnecting them, and atom transitions which, altogether, give rise to the network structure of the <sup>13</sup>C MFA model. Each metabolite and reaction is labeled with a unique identifier (id) which assures its consistent usage throughout the FluxML document (cf. **Figure 2B**). Here, the atom enumerations are of particular importance not only for tracking the atoms, but also the correct association of the measured labeling fractions with the reactants.

Before going into specification details, it is appropriate to briefly summarize the most important facts about the network and atom transition compilation. Although there are plenty of ways to retrieve information from reaction databases (KEGG [http://www.genome.jp/kegg/], BioCyc [https://biocyc.org/], MetRxn; Kumar et al., 2012), fluxomics collections (Zhang et al., 2014) [http://www.cecafdb.org], model repositories such as Biomodels [https://www.ebi.ac.uk/biomodels-main/] and BIGG [http://bigg.ucsd.edu/], as well as algorithmic approaches (Kumar and Maranas, 2014; Hadadi et al., 2017), there is currently no "one" curated source containing all the structural information needed for setting up a <sup>13</sup>C MFA model. In this context it is worthwhile to remember that solid biochemical knowledge beyond simple net reaction stoichiometry is needed. One prominent example is the transketolase- and transaldolase-catalyzed reaction complex in the pentose phosphate pathway (PPP) where the kinetic enzyme mechanism impacts the formulation of the associated carbon atom transitions (van Winden et al., 2001; Kleijn et al., 2005). Considering this, it is fallacious to solely rely on information available in biochemistry textbooks and reaction databases. Further study-specific factors to be considered are reaction reversibilities, transamination reactions, isoenzymes showing evidence for differences in substrate affinity and activity, and (micro-)compartmentalization due to metabolite channeling or metabolically inactive pools (van Winden et al., 2001). All these factors may influence flux inference from available labeling distributions. On the other hand, it is common to simplify reaction networks, e.g., by lumping "linear" reaction chains into one surrogate reaction when the labeling distribution (and incorporation speed in case of INST) is not affected.

The mentioned considerations imply that the <sup>13</sup>C MFA model compilation procedure is hardly automatable, at least for non-standard cases. Currently the best way to build and verify the network model from scratch is to use various information sources and a visual tool for specification and proofreading purposes (Nöh et al., 2015). Having a list of relevant reactions and metabolites at hand, different naming conventions exist for representing the associated atom transitions. Traditionally, case-sensitive characters have been used to specify carbon transitions, as exemplified by the Fructose-bisphosphate aldolase reaction in glycolysis (in biochemical enumeration and FTBL notation):

```
emp4: FBP > GAP + DHAP
      #abcdef > #cba + #def
```
Although this notation is convenient for an end-user, and still used by many software tools, it obviously does not fulfill the aforementioned requirements of a universal language. For this reason, atom transitions are specified in FluxML as follows:

```
<reaction id="emp4">
      <reduct cfg="abcdef" id="FBP"/>
      <rproduct cfg="cba" id="GAP"/>
      <rproduct cfg="def" id="DHAP"/>
</reaction>
```
This way, a reaction (reaction) can accommodate an arbitrary number of educts (reduct) and products (rproduct). These refer to unique metabolite names that are declared in the metabolitepools section of the FluxML document along with the definition of label-carrying atom types and numbers (cf. **Figure 3B**).
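One practical benefit of the XML form is that off-the-shelf parsers can check a reaction mechanically. The following is a minimal sketch, not part of any FluxML tooling: it reads the letter-coded emp4 reaction shown above with Python's standard library and tests whether every atom label on the educt side reappears exactly once on the product side. The function name `atoms_balanced` is hypothetical, and the snippet assumes a reaction element without XML namespace.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The emp4 reaction in letter notation, copied from the listing above.
REACTION_XML = """
<reaction id="emp4">
  <reduct cfg="abcdef" id="FBP"/>
  <rproduct cfg="cba" id="GAP"/>
  <rproduct cfg="def" id="DHAP"/>
</reaction>
"""

def atoms_balanced(reaction_xml: str) -> bool:
    """Return True if educts and products carry the same multiset of atom labels."""
    reaction = ET.fromstring(reaction_xml)
    educt_atoms = Counter()
    product_atoms = Counter()
    for educt in reaction.findall("reduct"):
        # Reactants without a cfg attribute carry no tracked atoms.
        educt_atoms.update(educt.get("cfg", ""))
    for product in reaction.findall("rproduct"):
        product_atoms.update(product.get("cfg", ""))
    return educt_atoms == product_atoms

print(atoms_balanced(REACTION_XML))  # True
```

A check of this kind catches dropped or duplicated letters, but it cannot tell whether the mapping itself is biochemically correct.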

An implicit assumption underlying both emp4 representations is the use of the IUPAC recommendation for coding the carbon atom-character relation of the metabolites. Herein, the lettering starts with the highest oxidized group of a molecule following the main carbon chain etc. For instance, following the biochemical enumeration the first carbon atom of glyceraldehyde 3-phosphate (GAP) is the one in proximity of the phosphate group (cf. **Figure 3A**). Due to its popularity among biochemists the IUPAC "biochemical enumeration scheme" has settled as pseudo-standard.

However, having genome-sized networks and multi-element ILEs in mind, this enumeration practice becomes questionable. In this situation, a veritable alternative is the International Chemical Identifier (InChI) (Heller et al., 2015). The InChI identifier is a computer-generated unique character string for encoding molecular structures that is widely accepted in the chemical community. The InChI identifier does not only facilitate database/web-search and information exchange in the field of metabolomics, it also comes with an outstanding merit for <sup>13</sup>C MFA model exchange: InChI gives an identifier and canonical ordering to each atom of a metabolite (except for hydrogen). Thereby, employing InChI strings for metabolite declaration and atom enumeration makes network descriptions self-contained and exchangeable.

As an example, **Figure 3** shows the atom numbering as provided by the InChI software [http://www.inchi-trust.org/]. Accordingly, the carbon atom transitions for the aldolase reaction emp4 in FluxML notation are:

```
<reaction id="emp4">
      <reduct cfg="C#1@1 C#2@1 C#3@1 C#4@1 C#5@1
       C#6@1" id="FBP"/>
      <rproduct cfg="C#4@1 C#1@1 C#3@1"
       id="GAP"/>
      <rproduct cfg="C#5@1 C#2@1 C#6@1"
       id="DHAP"/>
</reaction>
```
Herein, the atoms of the educt FBP are represented by a whitespace-separated list of entries of the form element#canonical\_atom\_index@educt\_index, which are mapped to the respective atom positions in the products. Of course, the mapping can still be expressed by letters<sup>3</sup>. However, the use of the more complicated #@ notation pays off immediately when ILEs with multiple isotopic tracers are considered. Using the InChI notation, generalization of transitions is straightforward, without losing readability, as exemplified with the glutamate dehydrogenase gdhA converting α-ketoglutarate (AKG) and ammonia (NH3) to L-glutamate (GLU):

```
<reaction bidirectional="false" id="gdhA">
      <reduct cfg="C#1@1 C#2@1 C#3@1 C#4@1
       C#5@1" id="AKG"/>
      <reduct cfg="N#1@2" id="NH3"/>
      <reduct id="NADPH"/>
      <reduct id="H"/>
      <rproduct cfg="C#1@1 C#2@1 C#3@1 C#4@1
       C#5@1 N#1@2" id="GLU"/>
      <rproduct id="H2O"/>
      <rproduct id="NADP"/>
</reaction>
```
Herein co-factors NADPH, NADP, H, and H2O (i.e., metabolites that do not carry labeled material in the scope of the model) are explicitly specified as reaction partners, a feature that helps to keep FluxML and SBML reaction network representations consistent.
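To illustrate how the #@ notation can be processed, the sketch below splits each cfg string into (element, atom index, educt index) triples and checks that the atoms offered by the educts are exactly those consumed by the products. It is an assumption-laden illustration rather than the validation logic of any FluxML tool: the regular expression, the function names, and the namespace-free reaction element are hypothetical choices.

```python
import re
import xml.etree.ElementTree as ET

# One atom entry: element symbol, '#', canonical atom index, '@', educt index.
ATOM = re.compile(r"([A-Z][a-z]?)#(\d+)@(\d+)")

# The gdhA reaction, copied from the listing above.
GDHA_XML = """
<reaction bidirectional="false" id="gdhA">
  <reduct cfg="C#1@1 C#2@1 C#3@1 C#4@1 C#5@1" id="AKG"/>
  <reduct cfg="N#1@2" id="NH3"/>
  <reduct id="NADPH"/>
  <reduct id="H"/>
  <rproduct cfg="C#1@1 C#2@1 C#3@1 C#4@1 C#5@1 N#1@2" id="GLU"/>
  <rproduct id="H2O"/>
  <rproduct id="NADP"/>
</reaction>
"""

def parse_cfg(cfg: str):
    """Split a cfg string into (element, atom_index, educt_index) tuples."""
    return [(el, int(atom), int(educt)) for el, atom, educt in ATOM.findall(cfg)]

def mapping_is_consistent(reaction_xml: str) -> bool:
    """True if every educt atom is used exactly once on the product side."""
    reaction = ET.fromstring(reaction_xml)
    educt_atoms = [a for r in reaction.findall("reduct")
                   for a in parse_cfg(r.get("cfg", ""))]
    product_atoms = [a for p in reaction.findall("rproduct")
                     for a in parse_cfg(p.get("cfg", ""))]
    return sorted(educt_atoms) == sorted(product_atoms)

print(parse_cfg("C#1@1 N#1@2"))         # [('C', 1, 1), ('N', 1, 2)]
print(mapping_is_consistent(GDHA_XML))  # True
```

Co-factors without a cfg attribute contribute no atoms and are therefore ignored by the check.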

### Stoichiometric Constraints

Constraints on the fluxes that impose bounds on the reaction rates on top of the stoichiometric mass balances are important components of any flux model. Typically, such constraints express principled condition-dependent biological or simulation settings. Unfortunately, these equality or inequality relations remain undocumented in <sup>13</sup>C MFA publications and are, in our experience, a frequent reason why the reproduction of published flux maps fails. Hence, it is vitally important to bundle the complete constraint set together with the model.

An aspect which is conceptually closely related to flux constraints is that of reaction directionality. Here, it often depends on the actual in vivo conditions whether a reversible reaction operates in forward and backward direction (bidirectional) or only in one of the directions (unidirectional). In <sup>13</sup>C MFA this setting must be carefully considered since it impacts flux inferences. Technically, in purely stoichiometric models bidirectional reactions are split into non-negative forward and backward parts. In <sup>13</sup>C MFA, however, it is common to use an alternative description for bidirectional reactions, i.e., that of net and exchange fluxes (Wiechert and de Graaf, 1997). Exchange fluxes are net-neutral intracellular material exchanges between reactants (not to be confused with extracellular rates). An advantage of the net/exchange flux system over the backward/forward formulation is that it leads to a "decoupling" of the underlying mathematical equation system for the two flux types, making it easier to express assumptions on both of them.
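The relation between the two flux coordinate systems can be made concrete with a short sketch. It assumes the commonly used convention that the net flux is the difference of forward and backward flux, while the exchange flux is the smaller of the two; the function names are hypothetical.

```python
def to_net_xch(fwd: float, bwd: float) -> tuple:
    """Convert non-negative forward/backward fluxes to (net, exchange)."""
    if fwd < 0 or bwd < 0:
        raise ValueError("forward and backward fluxes must be non-negative")
    return fwd - bwd, min(fwd, bwd)

def to_fwd_bwd(net: float, xch: float) -> tuple:
    """Convert (net, exchange) back to non-negative forward/backward fluxes."""
    if xch < 0:
        raise ValueError("exchange fluxes must be non-negative")
    return xch + max(net, 0.0), xch + max(-net, 0.0)

print(to_net_xch(5.0, 2.0))   # (3.0, 2.0)
print(to_fwd_bwd(3.0, 2.0))   # (5.0, 2.0)
print(to_fwd_bwd(-3.0, 2.0))  # (2.0, 5.0)
```

In these coordinates a sign constraint only touches the net flux and an upper bound on reversibility only touches the exchange flux, which is the "decoupling" referred to above.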

<sup>3</sup>Notice that a simple permutation transforms the atom positions in the biochemical lettering and the InChI-based canonical enumeration into one another.

FIGURE 3 | Fructose-bisphosphate aldolase reaction emp4. (A) Carbon atom transitions. Atoms are enumerated according to their appearance in the InChI string. Off-the-shelf chemistry programs provide visualization of the molecule structures and atom numbers. Specialized tools, such as the Omix visualization software, allow for visual specification of atom transitions as well as the export of the results in FluxML, releasing the user from any peculiar enumeration issue (Nöh et al., 2015). (B) Metabolite specifications in FluxML format annotated with InChI strings. The cfg argument reports the atomic elements involved in the transition network (C6 for the six carbon atoms of F6P), while the InChI string implicitly contains the enumeration order of the atoms. Once specified in the reactionnetwork section of the FluxML tree, atoms and cfg specifications for the metabolite with the name id are binding for the whole document.

In FluxML, reaction directionalities are set with the Boolean attribute bidirectional="true" or bidirectional="false" (cf. gdhA reaction above). Since net fluxes can take positive and negative values (n.b., exchange fluxes are always non-negative), typical assumptions on net fluxes are "sign" constraints (e.g., v<sup>net</sup> ≥ 0) indicating known net flux directions owing to thermodynamic reasoning, upper limits to individual fluxes from enzyme capacity measurements (v<sup>net</sup> ≤ v<sup>net</sup><sub>max</sub>), or specific flux ranges (v<sup>net</sup><sub>min</sub> ≤ v<sup>net</sup> ≤ v<sup>net</sup><sub>max</sub>). Similarly, upper boundaries for exchange fluxes may be applicable for thermodynamic (Wiechert, 2007) or numerical reasons (Theorell et al., 2017). Finally, net and exchange fluxes, respectively, can be related through equality and inequality relations to express further specific relationships such as the rate equalities of scrambling reactions (cf. Section Symmetric (Scrambling) Reactions). The following excerpt gives a typical example:

```
<constraints>
      <net>
           <!-- BM-coeff: [mumol/gCDW], fluxes
                           [mumol/gCDW/s] -->
           <textual>
                 Glc_upt=2.38;
                 TCA7_v26_1=TCA7_v26_2;
                 0=488*mu_v-Ala_bm;
           </textual>
      </net>
      <xch>
           <apply>
           <!-- MathML -->
                 <eq/><ci>TCA7_v26_1</ci>
                  <ci>TCA7_v26_2</ci>
           </apply>
      </xch>
      <psize>
           <!-- Poolsizes [mumol/gCDW] -->
           <textual>
                 ALA&gt;=0.2327;
                 ALA&lt;=1.49286;
           </textual>
      </psize>
</constraints>
```
Here the glucose uptake rate (Glc\_upt) is assigned a value of 2.38 [µmol/gCDW/s] and net as well as exchange fluxes of the two succinate dehydrogenase reaction variants (TCA7\_v26\_1,2) converting succinate to fumarate are equalized. The third entry encodes a biomass efflux (Ala\_bm) that is proportional to the cell growth flux (mu\_v). Importantly, mathematical relations can be expressed in human-friendly text-string representation as well as in Content-MathML [https://www.w3.org/TR/MathML3/chapter4.html] (cf. **Supplementary S1 Section 4.1** for an example). Besides the fluxes, pool sizes may also be subject to restrictions. For alanine (ALA) a lower and upper boundary is specified, indicated by the XML entities `&gt;` (>) and `&lt;` (<), respectively.
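How such textual constraints might be read by a program is sketched below. The example is a simplified illustration under stated assumptions: it handles only the textual form, ignores the MathML variant, assumes elements without XML namespace, and uses a hypothetical helper `read_constraints`. It also shows that a standard XML parser decodes the entities for > and < automatically.

```python
import re
import xml.etree.ElementTree as ET

# Textual constraints taken from the excerpt above (MathML part omitted).
CONSTRAINTS_XML = """
<constraints>
  <net>
    <textual>
      Glc_upt=2.38;
      TCA7_v26_1=TCA7_v26_2;
      0=488*mu_v-Ala_bm;
    </textual>
  </net>
  <psize>
    <textual>
      ALA&gt;=0.2327;
      ALA&lt;=1.49286;
    </textual>
  </psize>
</constraints>
"""

# Left-hand side, relation symbol, right-hand side.
RELATION = re.compile(r"^(.+?)(<=|>=|=)(.+)$")

def read_constraints(xml_text: str) -> dict:
    """Return {section tag: [(lhs, relation, rhs), ...]} for textual constraints."""
    root = ET.fromstring(xml_text)
    result = {}
    for section in root:
        textual = section.find("textual")
        if textual is None or textual.text is None:
            continue
        relations = []
        for statement in textual.text.split(";"):
            statement = "".join(statement.split())  # remove all whitespace
            if not statement:
                continue
            match = RELATION.match(statement)
            if match is None:
                raise ValueError(f"cannot parse constraint: {statement!r}")
            relations.append(match.groups())
        result[section.tag] = relations
    return result

for tag, relations in read_constraints(CONSTRAINTS_XML).items():
    print(tag, relations)
```

The right-hand sides are kept as strings here; evaluating expressions such as 488*mu_v-Ala_bm would require a proper expression parser.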

#### Symmetric (Scrambling) Reactions

Scrambling reactions constitute a special class of reactions that involve symmetric molecules, i.e., molecules that are biochemically indistinguishable due to their rotational symmetry. For instance, the metabolite LL-2,6-diaminopimelate (LL-DAP), an intermediate of the lysine biosynthesis pathway, contains a rotation axis which gives two symmetric groups (cf. **Figure 4**). In the general case of n symmetric groups, n! different mapping variants exist, which all have to be specified to describe the emerging labeling patterns correctly.

Technically, any scrambling reaction can be specified as a set of reaction variants, implementing the alternative atom mappings. Here, it is typically assumed that the catalyzing enzyme treats all biochemically indistinguishable isotopomers equally, resulting in identical fluxes of each of the mapping variants. In turn, the associated fluxes are set equal by formulating appropriate equality constraints. Depending on the symmetry level this approach can lead to numerous "virtual" reactions that have to be handled appropriately, also in the post-processing of the results, e.g., the visualization of the flux map. To alleviate the specification process, specific elements (variant) and attributes (ratio) for modeling scrambling reactions have been introduced to FluxML. The following listing showcases the specification of the diaminopimelate decarboxylase scrambling reaction AA13\_v49 by means of the variant notation (cf. **Figure 4** and **Supplementary S1 Section 4.2** for the traditional specification):

```
<reaction id="AA13_v49_1 AA13_v49_2">
   <annotation name="pathway">
        Lysine Biosynthesis
   </annotation>
   <annotation name="name">
        AA13_v49
   </annotation>
   <reduct id="LL_DAP">
      <variant cfg="C#1@1 C#2@1 C#3@1 C#4@1
       C#5@1 C#6@1 C#7@1 N#8@1 N#9@1"
       ratio="0.5"/>
      <variant cfg="C#1@1 C#3@1 C#2@1 C#5@1
       C#4@1 C#7@1 C#6@1 N#9@1 N#8@1"
       ratio="0.5"/>
   </reduct>
   <rproduct cfg="C#6@1" id="CO2"/>
   <rproduct cfg="C#1@1 C#2@1 C#3@1 C#4@1 C#5@1
    C#7@1 N#8@1 N#9@1" id="LYS"/>
</reaction>
```
Herein, two reaction variants AA13\_v49\_1 and AA13\_v49\_2 are specified, induced by the symmetry of the educt LL\_DAP, having fixed equal fluxes (ratio="0.5"). Furthermore, the FluxML excerpt shows how elements can be enriched with additional information, e.g., associating the reaction variants to their superordinate reaction (AA13\_v49) and the pathway names.
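The bookkeeping implied by the variant notation can be sketched as follows. The snippet assumes, as a labeled simplification, that the whitespace-separated ids of the reaction correspond to the variant elements in document order and that the ratio attributes split a given total flux; the function `expand_variants` and the total flux value are hypothetical.

```python
import math
import xml.etree.ElementTree as ET

# Scrambling reaction from the listing above (annotations omitted).
SCRAMBLING_XML = """
<reaction id="AA13_v49_1 AA13_v49_2">
  <reduct id="LL_DAP">
    <variant cfg="C#1@1 C#2@1 C#3@1 C#4@1 C#5@1 C#6@1 C#7@1 N#8@1 N#9@1"
             ratio="0.5"/>
    <variant cfg="C#1@1 C#3@1 C#2@1 C#5@1 C#4@1 C#7@1 C#6@1 N#9@1 N#8@1"
             ratio="0.5"/>
  </reduct>
  <rproduct cfg="C#6@1" id="CO2"/>
  <rproduct cfg="C#1@1 C#2@1 C#3@1 C#4@1 C#5@1 C#7@1 N#8@1 N#9@1" id="LYS"/>
</reaction>
"""

def expand_variants(reaction_xml: str, total_flux: float) -> dict:
    """Distribute a total flux over the mapping variants according to their ratios."""
    reaction = ET.fromstring(reaction_xml)
    variant_ids = reaction.get("id").split()
    variants = reaction.findall("./reduct/variant")
    if len(variant_ids) != len(variants):
        raise ValueError("number of ids and variants differ")
    ratios = [float(v.get("ratio")) for v in variants]
    if not math.isclose(sum(ratios), 1.0):
        raise ValueError("variant ratios must sum to 1.0")
    return {vid: ratio * total_flux for vid, ratio in zip(variant_ids, ratios)}

# Hypothetical total flux of 2.0 through AA13_v49.
print(expand_variants(SCRAMBLING_XML, 2.0))
# {'AA13_v49_1': 1.0, 'AA13_v49_2': 1.0}
```

For n symmetric groups the same routine would have to handle n! variants, which is why a compact notation is preferable to spelling out each virtual reaction and its equality constraint by hand.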

### Configurations

Experience shows that after an initial set-up phase, <sup>13</sup>C MFA evaluation workflows are accompanied by a series of minor model modifications. Here, the majority of differences lie in the settings of constraints, parameter sets and values, and the composition of data sets, while the model structure itself remains largely untouched. Configurations are created having these experiences in mind. In a configuration branch of the FluxML tree, input-, constraints-, measurement-, and simulation-settings are bundled, each specific to one ILE or simulation experiment. A FluxML document can then contain an arbitrary number of such configurations.

```
<configuration name="config_0">
     <comment>
        some comment about config_0
     </comment>
     <input pool="input_pool_0"
      type="isotopomer"> ... </input>
     <constraints> ... </constraints>
     <measurement> ... </measurement>
     <simulation method="auto" type="auto">
           <variables>
                 <fluxvalue flux="flux_0"
                  type="net">2.3</fluxvalue>
                 <fluxvalue flux="flux_1"
                  type="xch">70.1</fluxvalue>
                 <poolvalue pool="pool_0">
                  0.36725</poolvalue>
                 ...
           </variables>
     </simulation>
</configuration>
<configuration name="config_1"> ...
</configuration>
```
Combined with the reaction network, each single configuration constitutes a complete <sup>13</sup>C MFA model. Consequently, the use of configurations frees the modeler from needlessly duplicating files and, thus, makes model management more transparent and less error-prone. A typical application scenario where this is enormously useful is that of so-called parallel ILEs (cf. Section Parallel Labeling Experiments for a worked example). Configurations are therefore one of the most powerful paradigms of FluxML, as compared to its predecessor FTBL and other modeling languages such as SBML. In the following, the single configuration elements are briefly overviewed.
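As a rough illustration of how several configurations in one file can be accessed, the sketch below collects the start values of each configuration into a dictionary. The document string is a stripped-down, namespace-free stand-in for a FluxML file; the values for config_1 and the function `start_values` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Minimal stand-in for a FluxML document with two configurations.
DOC = """
<fluxml>
  <configuration name="config_0">
    <simulation method="auto" type="auto">
      <variables>
        <fluxvalue flux="flux_0" type="net">2.3</fluxvalue>
        <fluxvalue flux="flux_1" type="xch">70.1</fluxvalue>
        <poolvalue pool="pool_0">0.36725</poolvalue>
      </variables>
    </simulation>
  </configuration>
  <configuration name="config_1">
    <simulation method="auto" type="auto">
      <variables>
        <!-- hypothetical alternative start value -->
        <fluxvalue flux="flux_0" type="net">1.9</fluxvalue>
      </variables>
    </simulation>
  </configuration>
</fluxml>
"""

def start_values(fluxml_text: str) -> dict:
    """Return {configuration name: {(variable, kind): value}}."""
    root = ET.fromstring(fluxml_text)
    values = {}
    for config in root.iter("configuration"):
        entries = {}
        for fv in config.iter("fluxvalue"):
            entries[(fv.get("flux"), fv.get("type"))] = float(fv.text)
        for pv in config.iter("poolvalue"):
            entries[(pv.get("pool"), "pool")] = float(pv.text)
        values[config.get("name")] = entries
    return values

for name, entries in start_values(DOC).items():
    print(name, entries)
```

Because the reaction network is shared, only the settings that actually differ between experiments need to be repeated per configuration.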

#### Input Mixture Specification

A broad variety of labeled substrates has been used in <sup>13</sup>C MFA, individually or in mixtures, to elucidate metabolic fluxes (Crown et al., 2015; Nöh et al., 2018). Optimal experimental design (OED) heuristics give guidance on the selection of the tracer mixture to maximize the chance of the ILE to be informative about the fluxes. How to select the labeled species for a specific question under study, rather than taking a standard experimental design, is a computational question par excellence (see Section Special Settings for ILE Design). As such, the composition of the substrate pools in terms of labeled species has been subject of various design studies and the OED of ILEs has become a built-in feature of contemporary software systems.

In FluxML, the composition of a substrate labeling is specified in the input section by supplying the fractions of the input species present in the substrate pool(s), usually in form of isotopomers. Here, it must be taken into account that neither "unlabeled" nor "labeled" proportions are 100% pure in practice: the abundance of <sup>12</sup>C and <sup>13</sup>C isotopes (0.9893 and 0.0107, respectively) leads to a natural variation in the isotopomer compositions. In case of naturally labeled substrates, it is sufficient to correct for the variation in each single atom position while neglecting occurrences of combinations of two or more labeled positions (the error due to the occurrence of multiple labeled molecules is below 1.1·10<sup>−4</sup> and decreases rapidly with increasing number of labeled positions). As an example, the formulation for [<sup>12</sup>C]-glucose is:

```
<input pool="GLC_ext" type="isotopomer">
      <!-- the set has to sum up to 1.0 -->
      <label cfg="000000">0.9375</label>
      <label cfg="000001">0.0107</label>
      <label cfg="000010">0.0107</label>
      <label cfg="000100">0.0107</label>
      <label cfg="001000">0.0107</label>
      <label cfg="010000">0.0107</label>
      <label cfg="100000">0.0107</label>
</input>
```
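The origin of such fractions can be retraced with a small calculation. The sketch below assumes that every carbon position is labeled independently with the natural <sup>13</sup>C abundance of 0.0107; under this assumption the unlabeled species of a six-carbon molecule has the fraction 0.9893<sup>6</sup> ≈ 0.9375. The function name and the optional truncation to a maximum number of labeled positions, followed by renormalization so that the set sums to 1.0, are illustrative choices and not prescribed by FluxML.

```python
from itertools import product

P13C = 0.0107  # natural 13C abundance quoted in the text

def natural_isotopomers(n_atoms: int, p_heavy: float = P13C, max_labeled=None) -> dict:
    """Isotopomer fractions of a naturally labeled species with n_atoms positions.

    Positions are assumed to be labeled independently with probability p_heavy.
    If max_labeled is given, species with more labeled positions are dropped
    and the remaining fractions are renormalized to sum to 1.0.
    """
    fractions = {}
    for bits in product("01", repeat=n_atoms):
        k = bits.count("1")  # number of labeled positions
        if max_labeled is not None and k > max_labeled:
            continue
        fractions["".join(bits)] = p_heavy**k * (1.0 - p_heavy) ** (n_atoms - k)
    total = sum(fractions.values())
    return {cfg: value / total for cfg, value in fractions.items()}

full = natural_isotopomers(6)
single = natural_isotopomers(6, max_labeled=1)
print(len(full), len(single))     # 64 7
print(round(full["000000"], 4))   # 0.9375
print(round(sum(single.values()), 12))
```

Restricting the set to at most one labeled position leaves the seven species used in the listing, while the renormalization keeps the requirement that the fractions sum to 1.0.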
Commercially available isotopic tracers vary in their isotopic purity in a cost-dependent manner, implying that not only the natural abundance impacts the fractions of the single labeled species, but also the manufacturing and purification quality. In FluxML, the attributes purity and cost have been created to precisely express these contributions. As an example, a glucose mixture consisting of 77% [1-<sup>13</sup>C]-, 20.5% [U-<sup>13</sup>C]-, and 2.5% [<sup>12</sup>C]-glucose is specified in the following succinct way:

```
<input pool="GLC_ext" type="isotopomer">
      <!-- InChI numbering; EUR/g -->
      <label cfg="000001" purity="0.996"
       cost="147.0">0.770</label>
      <label cfg="111111" purity="0.995"
       cost="134.0">0.205</label>
      <label cfg="000000" purity="0.989"
       cost="0.3">0.025</label>
</input>
```
The extension to the multiple-element input substrate specification is then straightforward (cf. **Supplementary S1 Section 4.3**).

For designing ILEs, different substrate sources are mixed with the aim to determine those tracer proportions that are optimally informative about the fluxes. Arbitrary mixtures of labeled substrates are modeled in FluxML by specifying one uptake flux per tracer in the metabolic network. All these uptake fluxes then amount to the total uptake rate of the corresponding substrate which is specified in the constraint section of a FluxML document, for example:

```
<!-- tracer specification -->
<input pool="GLC_ext_12C" type="isotopomer">
      <label cfg="000000" purity="0.989"
       cost="0.04">1</label>
</input>
<input pool="GLC_ext_13C1" type="isotopomer">
      <label cfg="000001" purity="0.996"
       cost="93.0">1</label>
</input>
<input pool="GLC_ext_U13C" type="isotopomer">
      <label cfg="111111" purity="0.995"
       cost="188.0">1</label>
</input>
...
<constraints>
     <net>
           <textual>
              Glc_upt=Glc_upt_12C+
              Glc_upt_13C1+Glc_upt_U13C
           </textual>
     </net>
</constraints>
```
where Glc\_upt is the total uptake rate and Glc\_upt\_12C, Glc\_upt\_13C1, Glc\_upt\_U13C are the individual uptake rates of naturally [12C]-, [1-13C]-, and fully [U-<sup>13</sup>C]-labeled glucoses, respectively. Uptake fluxes are canonically unidirectional, usually with an extracellular rate assigned (cf. Section Extracellular Rates). For the case that intra- and extracellular metabolites are exchanged, a specification example is given in **Supplementary S1 Section 4.4**.

Principally, the labeling states of the intracellular metabolites depend on the input labeling composition which is usually constantly administered. For INST ILEs such kind of restriction is no longer mandatory, therewith paving the way for the targeted exploitation of dynamic labeling profiles to design highly informative ILEs. Certainly, the most simplistic form of labeling profiles is a repetitive switch between two isotopomer species of a substrate. But also more sophisticated profiles, such as sinusoidal and pulse-width modulated waveforms, have been considered theoretically (Sokol and Portais, 2015). Another scenario, where profiles are of practical value, is when ILEs are conducted under cultivation conditions where the administered carbon source is present in excess. In FluxML, such labeling profile functions can be flexibly specified (cf. **Figure 5**).

FIGURE 5 | Isotopic substrate labeling profiles. Showcased are a simple repetitive switch between fully (red solid line) and naturally labeled glucose (blue dashed line) specified via a Boolean condition for each time interval (A), a sine wave with wavelength 2π (B), and an exponential enrichment-decay curve (C). Shown are fractional labeling enrichments (FLE). The species of each input pool must sum up to 1.0 to define a valid profile. Contributions attributed to impurities are not displayed in the charts.
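The three profile shapes mentioned in Figure 5 can be mimicked numerically as follows. This is only an illustration of time-dependent input fractions that sum to 1.0 at every time point; it does not reproduce the FluxML syntax for profiles, and the function names, the period of the switch, and the rate constant are assumptions made for the example.

```python
import math

def switch_profile(t: float, period: float = 2.0) -> float:
    """Repetitive switch: fully labeled during the first half of each period."""
    return 1.0 if (t % period) < period / 2.0 else 0.0

def sine_profile(t: float) -> float:
    """Sine wave with wavelength 2*pi, scaled to the range [0, 1]."""
    return 0.5 * (1.0 + math.sin(t))

def exponential_profile(t: float, rate: float = 1.0) -> float:
    """Exponential enrichment that saturates at 1.0 (assumed shape)."""
    return 1.0 - math.exp(-rate * t)

def input_fractions(profile, t: float) -> dict:
    """Fractions of fully labeled and unlabeled glucose at time t."""
    labeled = profile(t)
    return {"111111": labeled, "000000": 1.0 - labeled}

for t in (0.0, 0.5, 1.5, 3.0):
    print(t, input_fractions(sine_profile, t))
```

Defining the unlabeled fraction as the complement of the labeled one guarantees a valid profile in which the species of the input pool sum to 1.0.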

#### Specific Constraints

Besides constraints that are inherently linked to the network structure irrespective of the experimental conditions (i.e., globally valid constraints, cf. Section Stoichiometric Constraints), FluxML configurations allow specifying additional specific constraints, i.e., those that may only be valid in the context of a concrete experimental setting. For instance, the flux solution space can be tightened by such specific constraints in the context of simulation experiments. Both types of constraints are syntactically equivalent.

### Experimental Data

Measurements are an integral part of <sup>13</sup>C MFA models, being the basis of flux inference. Moreover, <sup>13</sup>C MFA codes are tuned for specific measurement types (mostly MS, cf. **Supplementary S1 Table 1.1**). The reason is that the labeling system that is actually needed to describe the sub-set of observable labeling states can be tremendously smaller than the labeling network describing all intracellular labeling states. For the reduction of the high-dimensional labeling systems powerful graph theoretic algorithms have been developed (Weitzel et al., 2007), which are implemented in the high-performance code 13CFLUX2. Consequently, the resulting reduced labeling systems intimately rely on the specific measurement configuration. Notably, the reduction crucially impacts the computational efficiency of flux fitting, rather than the final flux map. Before explaining how the specific measurement setup of an ILE is specified in FluxML, some general remarks on the present measurement equipment are appropriate.

#### Modeling Data

Measurement models provide a link between the models' state variables and parameters (fluxes, pool sizes in the case of INST) and the observables (extracellular rates, pool sizes, labeling measurements). These three data models are essentially linear, which is trivial to see for the first two types. Therefore, we concentrate on the modeling of the labeling patterns. Consider a metabolite fragment M with n atom positions. Each atom can be present in one of k labeling states ({0,1} for <sup>12</sup>C, <sup>13</sup>C, and <sup>14</sup>N, <sup>15</sup>N, {0,1,2} for <sup>16</sup>O, <sup>17</sup>O, <sup>18</sup>O etc.). For the isotopomer fractions of M then it holds:

$$0 \le m\_{k\_1, k\_2, \dots, k\_n} \le 1.0 \quad \text{with} \quad \sum\_{k\_1, k\_2, \dots, k\_n} m\_{k\_1, k\_2, \dots, k\_n} = 1.0 \quad \text{(1)}$$

With the isotopomer fractions any labeling measurement is formulated based on the following criteria, which should be obeyed by any well-calibrated measurement procedure:


To make these considerations more concrete, the case of a mass isotopomer distribution (MID) generated in MS is discussed. The MID of an analyte is the vector of fractional labeling enrichments that are derived from the contribution of the single peak areas relative to the sum of all peak areas of the respective analyte. Apart from aspects of pre-processing (cf. Section Design Decision 1—Scope: Data Pre-processing Is Not Part of FluxML), an ideal MS ion chromatogram of a metabolite fragment M with three carbon atoms contains four distinguishable peaks (m.0, m.1, m.2, m.3), to which in total 2<sup>3</sup> = 8 isotopomers contribute. Precisely, the M000 isotopomer contributes to the m.0 peak, the M001, M010, and M100 isotopomers to the m.1 peak, the M011, M101, and M110 isotopomers to the m.2 peak, and the M111 isotopomer to the m.3 peak. The relation between the isotopomers (**x**<sub>M</sub>) and the MID of M (**y**<sub>M</sub>) can be represented using matrix notation:

$$
\underbrace{\begin{pmatrix} y\_{M}^{m.0} \\ y\_{M}^{m.1} \\ y\_{M}^{m.2} \\ y\_{M}^{m.3} \end{pmatrix}}\_{\mathbf{y}\_{M}} = \underbrace{\begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}}\_{\mathbf{M}\_{M}} \cdot \underbrace{\begin{pmatrix} m\_{000} \\ m\_{001} \\ m\_{010} \\ m\_{011} \\ m\_{100} \\ m\_{101} \\ m\_{110} \\ m\_{111} \end{pmatrix}}\_{\mathbf{x}\_{M}} \tag{2}
$$

with **M**<sub>M</sub> the MS measurement matrix of metabolite fragment M. In principle, this measurement matrix scheme holds true for other analytical techniques (with a different sparsity pattern of the measurement matrix **M**<sub>M</sub>), as long as the measurements obey the common criteria of good analytical practice described before. Because in Equation (2) appropriate re-scaling of the intensities by (unknown) group-specific scaling factors ω<sub>M</sub> may be required to match the simulated enrichments (Wiechert et al., 1999), the general measurement models read:

$$\mathbf{y}\_M = \boldsymbol{\omega}\_M \cdot \mathbf{M}\_M \cdot \mathbf{x}\_M, \qquad \mathbf{y}\_M(t\_i) = \boldsymbol{\omega}\_{M, t\_i} \cdot \mathbf{M}\_{M, t\_i} \cdot \mathbf{x}\_M(t\_i) \tag{3}$$

for the isotopically stationary and non-stationary cases, respectively. It should be remarked that isotopomer fractions are not the only framework that can be used for expressing labeling states. Alternatives are cumomers (Möllney et al., 1999), EMUs (Antoniewicz et al., 2007), or tandemers (Tepper and Shlomi, 2015). Since all three alternative labeling frameworks can be linearly transformed into isotopomer fractions, the general measurement model formulations given in Equation (3) are equally valid for them.
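The linear MS measurement model can be illustrated with a short Python sketch. It builds the measurement matrix for a fragment with n atoms that each have two labeling states, following the peak assignment described above, and evaluates the stationary model y = ω · M · x. The function names and the isotopomer fractions are hypothetical and serve only as an illustration; the sketch is not taken from any <sup>13</sup>C MFA code.

```python
from itertools import product

def isotopomers(n):
    """All 2**n isotopomer index strings of an n-atom fragment: '000' ... '111'."""
    return ["".join(bits) for bits in product("01", repeat=n)]

def ms_measurement_matrix(n):
    """(n+1) x 2**n matrix: row w has a 1 for every isotopomer with w labeled atoms."""
    iso = isotopomers(n)
    return [[1 if s.count("1") == w else 0 for s in iso] for w in range(n + 1)]

def simulate_mid(x, n, omega=1.0):
    """y = omega * M * x for isotopomer fractions x ordered as isotopomers(n)."""
    if abs(sum(x) - 1.0) > 1e-9 or any(v < 0.0 or v > 1.0 for v in x):
        raise ValueError("isotopomer fractions must lie in [0, 1] and sum to 1.0")
    M = ms_measurement_matrix(n)
    return [omega * sum(m * v for m, v in zip(row, x)) for row in M]

# Hypothetical isotopomer fractions of a three-carbon fragment
x = [0.5, 0.1, 0.1, 0.05, 0.1, 0.05, 0.05, 0.05]
mid = simulate_mid(x, 3)
```

The validity check inside `simulate_mid` corresponds to Equation (1); the scaling factor `omega` is simply applied here, whereas in a flux fit it would be an unknown to be estimated.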

#### Measurement Specification in FluxML

Experimental data are located in the measurement branch of the FluxML tree. By design, we distinguish between the declaration of the measurements (<model>) and the specification of the quantitative data (<data>):

```
<measurement>
      <mlabel>some notes on the measurements
      </mlabel>
      <model>
            <fluxmeasurement> ...
            </fluxmeasurement>
            <labelingmeasurement> ...
            </labelingmeasurement>
            <poolsizemeasurement> ...
            </poolsizemeasurement>
      </model>
      <data> ... </data>
</measurement>
```
As in the case of reactions and metabolite pools, each measurement group must be accompanied by a unique identifier (id) to unambiguously crosslink the declared reactions and metabolite pools with the specified measured entities (cf. **Figure 2**).

#### **Extracellular rates (<fluxmeasurement> section, Level 1+)**

Flux measurements are essential to any network-wide <sup>13</sup>C MFA study. Uptake and secretion fluxes are net rates, specified one-by-one with the following notation:

```
<netflux id="fm_0">
     <textual>Glc_upt</textual>
</netflux>
```
On the other hand, FluxML also allows for the formulation of functional relations between model parameters and to equip these with measurements. This feature can be used to incorporate flux ratios, e.g., obtained using FiatFlux or SUMOFLUX (Kogadeeva and Zamboni, 2016):

```
<netratio id="frm_0">
      <!-- flux ratio between glycolysis and
       PPP -->
      <textual>
            emp2/(emp2+ppp1)
      </textual>
</netratio>
```
#### **Isotopic labeling (<labelingmeasurement> section, Level 1+)**

The remarks on measurement models above make clear that in practice only one approach works for a universal specification language: The user should be enabled to compose specific measurement configurations from predefined basic expressions (primitives) with which more complex measurement specifications can be expressed. These primitives describe (real or envisioned) measurements with concise code fragments. Consequently, in FluxML labeling spectra are composed by linear combinations of measured signals:

1. The most basic primitive specifies a single isotopomer fraction:

M#010

This means that the isotopomer M010, which carries a labeled atom only at its second atom position, contributes to the measurement matrix. As an extension of the isotopomer notation, a positional atom entry can be marked by an "x", expressing that no information is available for this position or, in other words, that any labeling state is allowed. For example, M#01x denotes the set of isotopomers {M010, M011} (if the third atom position of M codes for an element with two possible isotopic labeling states). In terms of the measurement models in Equation (3), this means that all isotopomers of the set contribute a "1" to the row of the measurement matrix, while all other isotopomers lead to a zero entry in **M**<sub>M</sub>. If only the symbols "1" and "x" are used, the notation coincides with the cumomer notation (Wiechert et al., 1999). Labeling patterns of fragments are identified by the associated atom numbers given in square brackets, e.g., M[1-2]#. This way, the seven EMUs (moieties comprising any distinct subset of the compound's atoms; Antoniewicz et al., 2007) of M are represented by M[1]#, M[2]#, M[3]#, M[1-2]#, M[1,3]#, M[2-3]#, and M[1-3]# (or M#).

	- a. One-dimensional <sup>1</sup>H-NMR generates positional enrichment information:

```
<group id="NMR1H_Ala_23">
      <textual>ALA#P2,3</textual>
</group>
```
Here, the positions P = 2 and 3 of the metabolite Alanine (ALA) are specified, coding for isotopomer fractions that are <sup>13</sup>C labeled at position P. Since the two sets of positional isotopomers interfere, they are combined to one measurement group, named NMR1H\_Ala\_23.

b. Beyond positional observations, two-dimensional <sup>13</sup>C-NMR can discriminate between certain labeling positions in the direct neighborhood of a <sup>13</sup>C-labeled position, giving rise to multiplets: singlets (S) occur when the observed position is surrounded by unlabeled atoms. Right or left doublets (DR, DL) emerge if exactly one of the adjacent carbon atoms is labeled with <sup>13</sup>C. If both surrounding positions are occupied by <sup>13</sup>C isotopes, double doublets (DD) or triplets (T) may be obtained. In the following FluxML snippet, two measurement groups of ALA are listed, targeting the second and third carbon position, respectively:

```
<group id="NMR13C_Ala_2">
      <textual>ALA#S2,DL2,DR2,DD2</textual>
</group>
<group id="NMR13C_Ala_3">
      <textual>ALA#S3,DL3</textual>
</group>
```
c. In MS measurements all isotopomers with the same number of labeled atoms are pooled, resulting in MIDs as exemplified for the C9 metabolite phenylalanine (PHE):

```
<group id="MS_Phe" scale="auto">
      <textual>PHE#M0,1,2,3,4,5,6,7,8,9</textual>
</group>
```

This measurement group specifies 10 mass isotopomers (m.0, ..., m.9), which share a common scaling factor ω, as represented by the scale attribute (Möllney et al., 1999). The scale factor is a nuisance parameter that translates between the simulated ([0,1]) and measured ranges of the enrichment data (cf. Equation (3)). This attribute is either one, meaning that all measurement values of the PHE measurement group are taken as specified, or auto, meaning that the scale factor is to be determined within the fitting procedure. With FluxML Level 3, MS data from multiple-isotope tracer experiments can also be conveniently specified (cf. **Supplementary S1 Section 4.5** for an example).

d. Beyond simple MS, tandem MS has proved to be very informative about fluxes, since it can deliver positional information. In FluxML, tandem MS measurements of PHE are specified as follows:

```
<group id="MSMS_Phe_2_9" scale="auto">
      <textual>
            PHE[1-9:2-9]#M(0,0),(1,0),
            (1,1),(2,1),(2,2),(3,2),(3,3),
            (4,3),(4,4),(5,4),(5,5),(6,5),
            (6,6),(7,6),(7,7),(8,7),(8,8),
            (9,8)
      </textual>
</group>
```
Here, the first atom range (1-9) refers to the precursor ion, whereas the second range (2-9) relates to the product ion (i.e., the first carbon atom is filtered). The tuples specify the tandem mass isotopomers defined by the precursor and product ion, respectively.
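Which tuples are feasible follows from the atom ranges alone: the precursor weight equals the product ion weight plus the number of labeled atoms among those filtered out. The Python sketch below enumerates the feasible tuples under this assumption, with two labeling states per atom; the function name is hypothetical and the sketch is not part of the FluxML tools.

```python
def tandem_tuples(precursor_atoms, product_atoms):
    """Feasible (precursor weight, product weight) pairs when the product ion
    contains a subset of the precursor atoms (two labeling states per atom)."""
    precursor, fragment = set(precursor_atoms), set(product_atoms)
    if not fragment <= precursor:
        raise ValueError("product ion atoms must be a subset of the precursor atoms")
    lost = len(precursor - fragment)           # atoms filtered out in the product ion
    return [(q + extra, q)
            for q in range(len(fragment) + 1)  # labeled atoms in the product ion
            for extra in range(lost + 1)]      # labeled atoms among the lost ones

# PHE[1-9:2-9]: precursor atoms 1..9, product ion atoms 2..9
pairs = tandem_tuples(range(1, 10), range(2, 10))
```

For the PHE example this yields 18 pairs, from (0,0) up to (9,8), in the same order as in the listing above; the pair (9,9) cannot occur because the product ion contains only eight carbon atoms.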

Aside from such shortcuts to specify measurement configurations, the FluxML notation is fully universal because any possible linear measurement combination can be described. This way, arbitrary setups can be expressed, for instance, a <sup>13</sup>C-NMR measurement of valine (VAL, cf. **Figure 6**). The flexibility of the FluxML syntax is further demonstrated with the formulation of the summed fractional labeling, the sum of the fractional labeling of the atoms contained in a molecule (fragment) (Christensen et al., 2002). The summed fractional labeling can be specified by either using the generalized isotopomer notation:

```
<group id="gen_SFL_Ala_1" scale="auto">
      <textual>
          (ALA#1xx+ALA#x1x+ALA#xx1)/3
      </textual>
</group>
```

or, alternatively, in terms of MS measurements:

```
<group id="gen_SFL_Ala_0" scale="auto">
      <textual>
          (0*ALA#M0+1*ALA#M1+2*ALA#M2+3*ALA#M3)/3
      </textual>
</group>
```
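That the two formulations describe the same quantity can be checked numerically. The Python sketch below expands generalized isotopomer patterns such as 01x into sets of isotopomers and compares the positional form of the summed fractional labeling with the mass isotopomer form. It assumes that the generalized notation expresses the quantity as the mean of the three positional enrichments (patterns 1xx, x1x, xx1); the function names and the isotopomer fractions are hypothetical.

```python
from itertools import product

def expand(pattern):
    """Expand a generalized isotopomer pattern, e.g. '01x' -> ['010', '011']."""
    choices = ["01" if c == "x" else c for c in pattern]
    return ["".join(p) for p in product(*choices)]

def sfl_positional(x):
    """Mean positional enrichment: (M#1xx + M#x1x + M#xx1) / 3."""
    patterns = ["1xx", "x1x", "xx1"]
    return sum(x[iso] for pat in patterns for iso in expand(pat)) / 3

def sfl_mass(x):
    """Same quantity from mass isotopomers: (0*M0 + 1*M1 + 2*M2 + 3*M3) / 3."""
    return sum(iso.count("1") * frac for iso, frac in x.items()) / 3

# Hypothetical isotopomer fractions of a three-carbon fragment (sum to 1.0)
x = dict(zip(expand("xxx"), [0.5, 0.1, 0.1, 0.05, 0.1, 0.05, 0.05, 0.05]))
```

Both `sfl_positional(x)` and `sfl_mass(x)` agree for any valid set of fractions, because each isotopomer is counted once per labeled position in either formulation.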
The isotopically non-stationary case (Level 2+): In contrast to classical, isotopically stationary <sup>13</sup>C MFA where labeling data sets consist of one single labeling measurement vector, in the INST case labeling measurements are time series data. In FluxML, with Level 2 upwards, the measurement time points are introduced as attributes of the measurement groups. This way, time resolved MIDs of ALA can be formulated as follows:

```
<group id="MS_Ala_0" scale="auto"
 times="0.0,0.1,0.5,1.0,INF">
      <textual>ALA#M0,1,2,3</textual>
</group>
```
expressing that MIDs of ALA are available at five time points (0.0, 0.1, 0.5, 1.0, ∞). This notation enables joining isotopically stationary and non-stationary data in a single measurement group.
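A reader of such a times attribute has to map the INF token to the isotopically stationary limit. A minimal Python sketch of this step is given below; the function name is hypothetical, and the requirement that time points are unique and increasing is an assumption made here for illustration, not a rule quoted from the FluxML Schema.

```python
import math

def parse_times(attribute):
    """Parse a comma-separated list of time points; 'INF' marks the stationary limit."""
    times = [math.inf if tok.strip().upper() == "INF" else float(tok)
             for tok in attribute.split(",")]
    # Assumed sanity rule: non-negative, unique, and increasing time points
    if any(t < 0 for t in times) or times != sorted(set(times)):
        raise ValueError("time points must be non-negative, unique and increasing")
    return times

times = parse_times("0.0,0.1,0.5,1.0,INF")
```

The last entry of `times` is infinite and thus represents the isotopically stationary measurement, while the first four entries are the transient sampling times.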

A consideration that becomes especially important in the INST case is that fluxes, pool sizes, and time need to be formulated in a coherent physical unit system to produce meaningful results<sup>4</sup>. Metabolic fluxes (and extracellular rates) are amounts of substance transported per time unit. Practically, this means that with the choice of the flux unit, the units of the pool sizes and time are implicitly determined as well. Fluxes are reported in diverse units, e.g., [mmol/gCDW/h], [mmol/LCell/s], or [nmol/10<sup>6</sup> cells/h]. Due to this variety, FluxML does not enforce a specific unit system. However, modelers are strongly advised to document the applied units in the FluxML elements fluxunit, poolsizeunit, and timeunit.

#### **Pool sizes (<poolsizemeasurement> section, Level 2+)**

Likewise important for flux inference in the INST case are intracellular pool size data. In FluxML they are specified as follows:

```
<poolsize id="psm_0">
      <textual>ALA</textual>
</poolsize>
<poolsize id="psm_1">
      <textual>RU5P+R5P</textual>
</poolsize>
```
for single and pooled measurements, respectively. Due to the metabolic steady state, pool sizes and extracellular rates remain constant throughout the ILE. Thus, specification of one measurement per entity is appropriate in both cases.

#### Specification of Experimental Data

Decoupling the model structure from the corresponding measurement data is good practice in model-based data evaluation. In FluxML the network formulation and the data descriptors are located in different sub-branches of the document tree (cf. **Figure 2**). This way, model specification and data can be combined in one single document or, alternatively, in two separate files. In principle, all measured quantities have to be supplied together with a (strictly positive) measurement error. This measurement uncertainty may refer to data precision or accuracy and may cover solely technical or also biological uncertainty. Although standard deviations are often based on experience, such assumptions are better documented explicitly. In FluxML the description of the measurements and their errors is located within the data branch:

```
<data>
      <dlabel>
             <date>2018-08-30</date>
             <strain>E. coli K12</strain>
             <experiment>Experiment 2018-09-01
             </experiment>
      </dlabel>
      ...
      <datum id="MS_Ala_0" stddev="0.01"
       weight="0" time="10.0">0.40669
      </datum>
      <datum id="MS_Ala_0" stddev="0.01"
       weight="1" time="10.0">0.03156
      </datum>
      <datum id="MS_Ala_0" stddev="0.01"
       weight="2" time="10.0">0.30904
      </datum>
      <datum id="MS_Ala_0" stddev="0.01"
       weight="3" time="10.0">0.03579
      </datum>
      ...
      <datum id="fm_0" stddev="0.12">2.39
      </datum>
      ...
      <datum id="psm_0" stddev="0.26">0.78
      </datum>
      ...
</data>
```
Here, the identifier MS\_Ala\_0 refers to the time-resolved MIDs of ALA, distinguished by the number of labeled atoms contained (weight). Each datum element specifies exactly one measurement value along with its associated standard deviation (stddev) and sampling time point. The measurements for the uptake rate Glc\_upt (fm\_0) and the pool size of ALA (psm\_0) are specified similarly.
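Programmatic access to such a data section can be sketched with the Python standard library. The snippet below is a simplified, namespace-free excerpt written for this illustration (real FluxML documents may declare an XML namespace, which would have to be taken into account), and the function name is hypothetical. It groups the datum entries by measurement identifier and enforces the strictly positive standard deviation mentioned above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Simplified excerpt for illustration, not a complete FluxML document
SNIPPET = """
<data>
  <datum id="MS_Ala_0" stddev="0.01" weight="0" time="10.0">0.40669</datum>
  <datum id="MS_Ala_0" stddev="0.01" weight="1" time="10.0">0.03156</datum>
  <datum id="fm_0" stddev="0.12">2.39</datum>
</data>
"""

def read_data(xml_text):
    """Group datum values by measurement id; reject non-positive standard deviations."""
    groups = defaultdict(list)
    for datum in ET.fromstring(xml_text).iter("datum"):
        stddev = float(datum.get("stddev"))
        if stddev <= 0.0:
            raise ValueError(f"stddev of {datum.get('id')} must be strictly positive")
        groups[datum.get("id")].append({
            "value": float(datum.text),
            "stddev": stddev,
            "weight": datum.get("weight"),  # None for flux and pool size data
            "time": datum.get("time"),      # None if no time point is given
        })
    return dict(groups)

data = read_data(SNIPPET)
```

The identifiers used as dictionary keys are the same ones that crosslink the data with the measurement groups declared in the model part of the measurement branch.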

#### Simulation

<sup>4</sup>Note that in classical <sup>13</sup>C MFA the fluxes can be scaled arbitrarily, as long as they share the same units.

Whether fluxes and pool sizes are specified as free parameters or are constrained to fixed values impacts flux estimation and the statistical assessment of the final flux map (Heise et al., 2015; Theorell et al., 2017). Thus, the parametrization should also be part of the model. To this end, in FluxML the variables element within the simulation branch collects the model's variables (free fluxes and, in case of INST, pool sizes) and their values as well as the minimum information to connect the model description with the simulation framework of choice:

```
<simulation method="auto" type="auto">
<!-- example: method = cumomer, type = INST -->
      <variables>
            <fluxvalue flux="Glc_upt"
             type="net">2.234</fluxvalue>
            <fluxvalue flux="emp6_v6"
             type="xch">1383.3</fluxvalue>
            <poolsizevalue pool="ALA"
             edweight="0.1">0.46</poolsizevalue>
             ...
      </variables>
</simulation>
```
Because FluxML is designed as a simulator-independent language, details about specific simulation scenarios and settings (solver parametrization, integration times etc.) are intentionally not part of it. For this purpose, scientific workflow and provenance data description languages have been developed, such as CWL [www.commonwl.org/].

### Special Settings for ILE Design

OED of ILEs aims at customizing the experimental settings in a way that the ILE's information gain is maximized. As such, optimal ILE design has become an integral part of <sup>13</sup>C MFA workflows. Many contemporary software systems provide decision metrics for selecting "informative" tracer mixtures (Möllney et al., 1999; Weitzel et al., 2013; Millard et al., 2014; Shupletsov et al., 2014; Young, 2014). In optimal ILE design, the information gain of an ILE is tested in silico by assuming hypothetical experimental-analytical settings. In this context it is important to recognize that OED strategies require not only the measurement model of the envisioned (but physically not yet available) data sets, but also an estimation of their associated standard deviations. Literature mining indicates that errors of labeling enrichments can be heteroscedastic, rather than obeying constant absolute or relative variances (Nöh et al., 2018). For a universal modeling language this implies the need to formulate arbitrary functional dependencies between the "envisioned" measurements and their errors to overcome a lack of real data. How this is solved in FluxML, is exemplified for a tandem-MS measurement group of ALA:

```
<group id="LCMSMS_Ala_3_2">
      <errormodel>
      <!-- one model for all tandem mass
        isotopomers -->
            <textual>
                    0.01650*meas_sim+0.0017
            </textual>
      </errormodel>
      <textual>
            ALA[1-3:1-2]#M(0,0),(1,0),(1,1),
            (2,1),(2,2),(3,2)
      </textual>
</group>
```
Within the errormodel construct, functional expressions derived from analytical expert knowledge relate the simulated measurement values (meas\_sim) to their associated errors. In an analogous manner, error models for extracellular rate and pool size measurements can be formulated.
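The effect of such a linear error model can be illustrated with a few lines of Python. The coefficients are those of the listing above; the function name and the simulated tandem mass isotopomer fractions are hypothetical values chosen for illustration.

```python
def error_model(meas_sim, slope=0.01650, offset=0.0017):
    """Standard deviation predicted for a simulated measurement value."""
    return slope * meas_sim + offset

# Hypothetical simulated tandem mass isotopomer fractions
simulated = [0.60, 0.25, 0.10, 0.05]
stddevs = [error_model(v) for v in simulated]
```

The predicted standard deviations grow with the simulated signal but never fall below the offset, which is one simple way to represent heteroscedastic errors.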

Besides a full experimental design, scenarios can be envisioned in which some parameters are given a higher importance than others. The importance of a parameter can be specified by the edweight attribute in the variables section (cf. listing in Section Simulation, where the pool size of ALA is given minor importance compared to the two flux values). In this way, partial experimental designs can be realized (Möllney et al., 1999).

#### Housekeeping: Enriching FluxML With Annotations

Developing a model requires documentation that puts the model into the context of the analysis scenario it is built for. FluxML has various dedicated fields to deposit such information. For instance, the top-level info element contains the necessary information to achieve MIRIAM-compliance:

```
<info>
     <date>2018-08-28</date>
     <modeler>info@13cflux.net</modeler>
     <strain>Escherichia coli K12</strain>
     <version>2.0</version>
     <comment>
              Version 1.0 from
              10.1016/j.ymben.2015.01.001
              Version 2.0: N transitions added
              Document license: CC BY-SA 4.0
     </comment>
</info>
```
Owing to the configuration concept, the data branch also contains dedicated elements to carry information about the experiment, analytics and data, such as units. Furthermore, annotation elements can be added to any FluxML element, in which XML-compliant content can be stored. For instance, pathway information is helpful to structure comprehensive models, and associating a reactant with its InChI code enables metabolite identification and database matching.

### FLUXML COLLECTION AND SUPPORTING TOOLS

Although human-readable, FluxML documents are not made for direct editing by modelers. Additional software tools are necessary to verify, read, write, and edit the information contained in a document, to display its contents in a digestible form, and to check the documents' syntax and semantics. These tools fall into three categories:


#### FluxML Schema

A language is commonly defined by a formal syntax description (the grammar). Accordingly, for each released FluxML Level, the formal syntax is defined in W3C XML Schemas [http://www.13cflux.net/fluxml]. Each XML Schema Definition (XSD) describes the structure of a FluxML document and defines strict syntax rules for the elements and attributes contained. This grammar definition constitutes the essential basis for checking the well-formedness of model files and, therefore, any further FluxML processing procedure. The checking procedure itself is the task of the FluxML parser.

#### The FluxML Parser fmllint

The parser **fmllint** is an error-detection oriented software tool that analyzes the syntactic and semantic validity of FluxML model files according to the rules defined in the associated FluxML Schema. The parser loads a specified FluxML document, traverses the tree structure and turns the textual representation into a set of objects, the in-memory Document Object Model (DOM) tree [www.w3.org/DOM]. To this end, **fmllint** uses the capabilities of the DOM XML parser Xerces-C [www.xerces.apache.org] to perform strict validation according to the XSD file. To facilitate precise semantic model validation, an extensive set of semantic rules is implemented in **fmllint** in addition to the grammar. Thus, during parsing, inconsistencies in the document structure and context-sensitive issues are detected, and expressive error messages and warnings are reported, mostly along with specific correction suggestions.

**Figure 7** gives an example where the metabolite pool F, which participates in the reaction w, has not been declared. Here, during the parsing process, **fmllint** detects the missing metabolite F and reports an error. The error message provides precise information such as the error location (row and column number), which helps to quickly fix the issue. Several examples, typical for erroneous <sup>13</sup>C MFA models, are given in the **Supplementary S1 Section 3**. Some of the most important validity checks of **fmllint**, specific to <sup>13</sup>C MFA models, are:

	- Missing labeling sources or effluxes
	- Dead-end and disconnected metabolites
	- Traps and isles in the metabolic and atom transition networks
	- Missing metabolite/reaction declarations or duplicates
	- Invalid and elementally imbalanced atom transitions
	- Infeasible/inconsistent input (mixture) specification or purities
	- Too few/many equality constraints leading to under- or overdetermined stoichiometry
	- Duplicate or linearly dependent equality constraints
	- Infeasible inequality constraints
	- Too few/many free parameters
	- Infeasible parameter values violating the set of constraints
	- Missing measurement declarations or duplicates
	- Invalid measurement specification or duplicates
	- Invalid attributes or attribute combinations


FIGURE 7 | Error backtracking provided by the FluxML parser fmllint. In the FluxML model network.fml, the metabolite pool F, which participates in the reaction w is not declared in the metabolitepools section. fmllint provides an expressive error log, pointing to the origin of the error.
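One of the semantic checks named above, the detection of undeclared metabolites (the situation shown in Figure 7), can be sketched in a few lines of Python. This is an illustration of the idea only and not the implementation used in **fmllint**; the network is represented by plain Python containers rather than a FluxML document, and all names are hypothetical.

```python
def undeclared_metabolites(pools, reactions):
    """Return {reaction id: sorted undeclared metabolite names} for a network given
    as a set of declared pools and a dict reaction id -> (educts, products)."""
    issues = {}
    for rid, (educts, products) in reactions.items():
        missing = sorted((set(educts) | set(products)) - set(pools))
        if missing:
            issues[rid] = missing
    return issues

# Hypothetical toy network in which pool "F" was not declared
pools = {"A", "B", "C", "D", "E"}
reactions = {"v": (["A"], ["B"]), "w": (["E"], ["F"])}
issues = undeclared_metabolites(pools, reactions)
```

For the toy network, only reaction w is reported, together with the name of the missing pool, which is the kind of information needed to locate and fix the error.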


standard ANSI C libraries only and is, thus, highly portable.

#### Converters and Utilities

#### Tools for Model Reuse

Enabling the effective reuse of models was the main driver for developing the FluxML language. To support this goal, the following language translators are supplied:


#### Auxiliary Tools for Everyday Operations

From a user's perspective, graphical tools for model building and configuration are preferable. To this end, the comprehensive Fluxomix modeling suite has been developed as a plugin suite for the visualization software Omix (Nöh et al., 2015). However, for integrating model configuration procedures into computational evaluation workflows, programmatic model access is much more convenient than visual modeling. To support commonly performed steps, tools that were initially released with the 13CFLUX2 software suite are provided as standalone Python tools:


#### A Software Library for FluxML Tool Developers

The software library **libFluxML** is a library for reading, writing and altering FluxML documents. The library provides a rich application programming interface (API) enabling full access to the FluxML language content and a range of functions that facilitate the creation, validation, and manipulation of FluxML documents. **libFluxML** offers helper functions for processing and manipulating mathematical formulas in both human-readable textual notation and machine-readable Content-MathML format, as well as the ability to interconvert mathematical expressions between these forms. Many higher-level convenience features are included, such as obtaining the number of reactions or constructing the stoichiometric matrix of the reaction network. The library is written in standard ANSI/ISO C/C++ and uses the FluxML parser **fmllint** for parsing and validity checking.
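What constructing the stoichiometric matrix amounts to is sketched below in Python. The sketch does not use or reproduce the **libFluxML** API; the data structure, the function name, and the toy pathway are assumptions made for illustration.

```python
def stoichiometric_matrix(reactions):
    """Build S (metabolites x reactions) from reaction id -> {metabolite: coefficient},
    with negative coefficients for educts and positive ones for products."""
    metabolites = sorted({m for coeffs in reactions.values() for m in coeffs})
    rids = list(reactions)
    S = [[reactions[r].get(m, 0) for r in rids] for m in metabolites]
    return metabolites, rids, S

# Hypothetical linear pathway: uptake -> A -> B -> secretion
reactions = {
    "upt": {"A": 1},
    "v1": {"A": -1, "B": 1},
    "out": {"B": -1},
}
metabolites, rids, S = stoichiometric_matrix(reactions)
```

Each row of `S` states the mass balance of one intracellular metabolite, so that at metabolic steady state the product of `S` with the flux vector is zero.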

#### Availability

The FluxML collection, consisting of the formal schema definitions, the **fmllint** parser, versatile tools and the core library **libFluxML**, represents an all-inclusive suite to validate and manipulate FluxML documents. Schema files are located at http://www.13cflux.net/fluxml. The source codes of the FluxML parser **fmllint**, the **libFluxML** library, and the auxiliary tools are available at the GitHub repository https://github.com/modsim/FluxML/ with full build instructions, comprehensive documentation and usage examples. In addition, precompiled binary distributions for Linux and Mac OS X are provided. The FluxML collection is licensed under the open-source Creative Commons Attribution-ShareAlike (CC BY-SA 4.0)<sup>5</sup> and MIT<sup>6</sup> licenses. In addition, for model checking without installation, a web-based FluxML validator is available at http://www.13cflux.net/fluxml/validator/. Altogether, this collection provides a set of tools for interfacing and validating FluxML documents and, as such, provides a solid tool base for future developments of the modeling language FluxML.

### HARNESSING THE BENEFITS OF FLUXML

Finally, we give two examples for the utility and usability of the FluxML language. We first illustrate how one single model, formulated in FluxML, can be used with different <sup>13</sup>C MFA tools to facilitate the comparison of results. Secondly, we demonstrate how parallel ILEs are efficiently modeled starting from a single ILE setup.

#### FluxML for Simulator Comparisons

From a user's perspective, the inability to compare and validate numerical results generated by different <sup>13</sup>C MFA tools is unsatisfactory. Clearly, a precise and unambiguous representation of a model provides the basis for any of these tasks. Extracting the encoding of a model formulated for one piece of software and transferring it to another format is an error-prone step that should be delegated to converters. Here, we exemplify a simulator comparison, taking the deterministic forward simulation step with 13CFLUX2 (v2.0) and Sysmetab (v5.1, Mottelet et al., 2017) as a representative test case. The comparison is done with a central metabolism model of E. coli contained in the Sysmetab distribution, precisely, an isotopically stationary and a non-stationary variant mimicking ILEs with a 3:7 [U-<sup>13</sup>C]:[1-<sup>13</sup>C]-glucose mixture. The **fmlstats** tool reports that the network consists of 51 metabolites and 86 reactions. In total, 9 MS measurement groups and one extracellular flux measurement are contained.

In the classic isotopically stationary case, the corresponding Sysmetab FluxML file conformed to the FluxML Level 1 definition. Both simulators were invoked and the simulated labeling patterns were extracted from the tools' output. The comparison of the simulated fractional enrichments shows perfect agreement (cf. **Figure 8A**). For the isotopically non-stationary case, it turned out that the model shipped with Sysmetab lacks pool sizes and is, thus, syntactically invalid with respect to the Level 2 specification for INST <sup>13</sup>C MFA. Since Sysmetab internally allocates positive random values to the pool sizes in the simulation step, these model parameters had to be extracted from the simulation output. After updating the INST FluxML model to conform with the Level 2 Schema definition using the **fmlupdate** tool, the pool sizes were incorporated into the file utilizing the **setparameters** command. Lastly, the 13CFLUX2 simulator was called to execute the simulation step. Again, the simulated MIDs for the nine measurement groups produced by the two simulators were extracted from the output files and plotted in a parity plot, showing excellent agreement (cf. **Figure 8B**). This example shows the importance of syntactic model validation in view of reporting standards. Besides a clear and complete language definition, appropriate converters and auxiliary tools are needed to tame the zoo of available model files, which are often specific to individual <sup>13</sup>C MFA software systems.

<sup>5</sup>https://creativecommons.org/licenses/by-sa/4.0/legalcode

<sup>6</sup>https://opensource.org/licenses/mit-license.php

#### Parallel Labeling Experiments

Experimental design has been an essential part of the general flux analysis workflow since the beginnings of <sup>13</sup>C MFA. As such, numerous studies have investigated how specific experimental configurations, predominantly the input tracers and substrate mixtures, but also the number, type and quality of measurements, influence the statistical quality of the flux estimates. To increase the information gain about fluxes, it has been suggested to use multiple experiments operated under exactly the same physiological conditions, each with a different tracer (Schwender et al., 2006; Antoniewicz, 2015). The evaluation of such so-called parallel ILEs (pILE) requires the modeler to merge all data sets in one measurement specification. By expressing pILEs in the FluxML language, their simulation can be readily achieved by employing standard <sup>13</sup>C MFA tools. In particular, we show that the evaluation of pILEs becomes a special case of the traditional single ILE-based <sup>13</sup>C MFA.

The general principle is in fact simple: When N different experiments are performed (in practice usually up to tens), one option is that the modeler supplies the original network formulation in N copies. In each of the copies, all metabolites and reactions are duplicated (in practice, by appending to their identifiers a suffix that relates them to the experiment to which their associated measurement sets belong). In addition, for each flux (and pool size, in the case of INST) an additional constraint must be specified in the FluxML document which ensures that the values of the model parameters are the same for all network copies<sup>7</sup>. Clearly, performing these operations manually is laborious and demands painstaking attention to detail.
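The duplication principle can be illustrated on a toy network. The sketch below is not the **multiply\_fml** implementation and does not read or write FluxML. It assumes a hypothetical in-memory representation in which each reaction maps to its substrate and product lists, and it uses an illustrative two-digit suffix scheme.

```python
from typing import Dict, List, Tuple

Reaction = Tuple[List[str], List[str]]  # (substrates, products)


def multiply_network(
    reactions: Dict[str, Reaction], n_experiments: int
) -> Tuple[Dict[str, Reaction], List[Tuple[str, str]]]:
    """Copy a network once per experiment and tie the copies' fluxes together."""
    copies: Dict[str, Reaction] = {}
    constraints: List[Tuple[str, str]] = []
    for k in range(1, n_experiments + 1):
        suffix = f"_{k:02d}"
        for name, (subs, prods) in reactions.items():
            # Every metabolite and reaction identifier receives the suffix.
            copies[name + suffix] = (
                [m + suffix for m in subs],
                [m + suffix for m in prods],
            )
            if k > 1:
                # Equality constraint: flux in copy k equals flux in copy 1.
                constraints.append((name + "_01", name + suffix))
    return copies, constraints


# Hypothetical two-reaction network, duplicated for three experiments.
toy = {"v1": (["A"], ["B"]), "v2": (["B"], ["C"])}
network, equalities = multiply_network(toy, 3)
```

For N experiments and R reactions this yields N·R reactions and (N−1)·R equality constraints, which shows why manual duplication quickly becomes tedious.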

By using the available FluxML capabilities, automation of this operation is straightforward. To this end, the program **multiply\_fml** was implemented with only 400 source lines of code (SLOC) of Python. **multiply\_fml** expects a FluxML file with N configurations, each equipped with the input specification and a corresponding measurement set for one ILE. The duplication process is showcased with 14 different ILEs with the setting reported by Crown et al. (2015). First, the different input mixtures are specified one at a time in 14 different configurations of the reference network by invoking the **setinputs** function:

<sup>7</sup>Another option is to double the metabolites and add the pool copies to the reactions, along with the formulation of pool size constraints in case of INST.

setinputs -i crown.fml -c config\_01 -C input\_01.csv -o crown.fml

etc. Secondly, the measurement data sets are sequentially incorporated by using the **setmeasurements** tool:

setmeasurements -i crown.fml -c config\_01 -C data\_01.csv -o crown.fml

Finally, the network duplication step is performed using the **multiply\_fml** program:

```
multiply_fml -i crown.fml -o crown_multiplied.fml
```
The resulting model file (crown\_multiplied.fml) consists of one network description and a single configuration comprising all 14 ILE data sets (all FluxML and CSV files used in this showcase are available in the **Supplementary Data Sheet 3**). With this model at hand, all <sup>13</sup>C MFA tools can be invoked. For example, optimal tracer design for a series of pILEs is possible, rephrased as the choice of the best substrates per experiment. This makes application of experimental design tools of <sup>13</sup>C MFA software straightforward.
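The three-step workflow lends itself to scripting. The sketch below only builds the command lines as strings and does not execute anything. The flags are copied from the commands shown above, whereas the file and configuration names for experiments 02 to 14 are assumed to follow the same numbering pattern as the first one, and `pile_commands` is a hypothetical helper name.

```python
from typing import List


def pile_commands(model: str, n_experiments: int) -> List[str]:
    """Build the command lines for configuring and merging n parallel ILEs."""
    commands: List[str] = []
    # Step 1: one input mixture per configuration.
    for k in range(1, n_experiments + 1):
        tag = f"{k:02d}"
        commands.append(
            f"setinputs -i {model} -c config_{tag} -C input_{tag}.csv -o {model}"
        )
    # Step 2: one measurement data set per configuration.
    for k in range(1, n_experiments + 1):
        tag = f"{k:02d}"
        commands.append(
            f"setmeasurements -i {model} -c config_{tag} -C data_{tag}.csv -o {model}"
        )
    # Step 3: merge all configurations into a single multiplied network.
    stem = model.rsplit(".", 1)[0]
    commands.append(f"multiply_fml -i {model} -o {stem}_multiplied.fml")
    return commands


commands = pile_commands("crown.fml", 14)
```

Writing the list to a shell script, or passing each entry to a process launcher, would then reproduce the showcase, provided the assumed file names match the supplied CSV files.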

#### CONCLUSION

<sup>13</sup>C MFA is the primary experimental technique for measuring intracellular fluxes at metabolic pseudo-steady state conditions. After two decades of active research there is consensus about the minimal information set needed to specify a computational <sup>13</sup>C model and its associated data. However, this consensus has not yet found its way into a model format that contains the complete information set of an ILE configuration in a well-structured manner. Most importantly, implicit assumptions made in the modeling process are rarely included in publications because they are considered to be common sense or of purely technical nature. This makes it essentially impossible to reproduce many published flux analysis results.

On the one hand, the complexity and depth of ILE specifications should not hinder experimentalists from delivering complete <sup>13</sup>C MFA models. In this context, it is of great advantage that tailored model templates can be configured (often only once for an organism or strain) whereas experiment-specific data is fed into these templates using preconfigured scripts. For power users, on the other hand, computational model components should be programmatically accessible, so that they are embeddable in computational pipelines.

Following these two guiding principles, here we describe the Flux Markup Language FluxML along with its design. The major aim of FluxML is to offer a sound, universal, open source, simulator-independent, and future-proof platform that conserves all the necessary and optional information for model description, reuse, exchange and comparison. Specifically, FluxML enables practitioners to describe valid isotopically stationary and non-stationary models, while the format is fully universal in terms of network, atom mapping, measurement (error) and constraint formulation, including the use of homo- and hetero-isotopic tracers. Together with the language, the FluxML collection is supplied, which contains the powerful FluxML parser **fmllint** for model (in)validation and several auxiliary tools that ease handling while allowing for a maximum of flexibility. We believe that FluxML improves scientific productivity, efficiency as well as transparency and contributes to the reproducibility of computational modeling efforts in the field of <sup>13</sup>C MFA.

#### AUTHOR CONTRIBUTIONS

WW and KN conceived the work. MW developed FluxML Level 1. SA and MB developed Levels 2/3 and performed the computational analyses. WW and KN wrote the manuscript to which SA and MB contributed. All authors approved the content of the manuscript.

#### FUNDING

MW was funded by Deutsche Forschungsgemeinschaft (DFG) Grant WI 1705/12-1.

#### ACKNOWLEDGMENTS

The authors thank Peter Droste for insightful discussions.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmicb.2019.01022/full#supplementary-material



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Beyß, Azzouzi, Weitzel, Wiechert and Nöh. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# <sup>13</sup>C-Metabolic Flux Analysis Reveals Effect of Phenol on Central Carbon Metabolism in Escherichia coli

#### Sayaka Kitamura, Yoshihiro Toya and Hiroshi Shimizu\*

Department of Bioinformatic Engineering, Graduate School of Information Science and Technology, Osaka University, Osaka, Japan

Phenol is an important chemical product that can be used in a wide variety of applications, and it is currently produced from fossil resources. Fermentative production of phenol from renewable biomass resources by microorganisms is highly desirable for sustainable development. However, phenol toxicity hampers phenol production in industrial microorganisms such as Escherichia coli. In the present study, it was revealed that culturing E. coli in the presence of phenol decreased not only the growth rate but also the biomass yield. This suggests that phenol affects the carbon flow of the metabolism, but the mechanism is unknown. To investigate the effect of phenol on the flux distribution of central carbon metabolism, <sup>13</sup>C-metabolic flux analysis (<sup>13</sup>C-MFA) was performed on cells grown under different phenol concentrations (0, 0.1, and 0.15%). <sup>13</sup>C-MFA revealed that, in the presence of 0.1% phenol, the TCA cycle flux decreased by 25% and acetate production from acetyl-CoA increased by 30%. This trend became more pronounced at a phenol concentration of 0.15%. Although the expression level of citrate synthase, which catalyzes the first reaction of the TCA cycle, did not change regardless of phenol concentration, the in vitro enzyme activity assay showed that the reaction was inhibited by phenol. These results suggest that the TCA cycle flux decreased due to phenol inhibition of citrate synthase; therefore, ATP could not be sufficiently produced by respiration, and the growth rate decreased. Furthermore, since carbon was lost as acetate due to overflow metabolism, the biomass yield became low in the presence of phenol.

Keywords: phenol toxicity, Escherichia coli, <sup>13</sup>C-metabolic flux analysis, enzymatic assay, citrate synthase, acetate overflow metabolism

#### INTRODUCTION

Phenol is a valuable compound used in various applications, such as in the production of polycarbonate, epoxy resin, phenol resin, and aniline (Schmidt, 2005). Although it is mainly produced by the cumene method using benzene derived from fossil resources at present (Wierckx et al., 2005), fermentation production using microorganisms from renewable biomass resources

#### Edited by:

Wei Xiong, National Renewable Energy Laboratory (DOE), United States

#### Reviewed by:

Volker F. Wendisch, Bielefeld University, Germany Chen Shouwen, Hubei University, China

> \*Correspondence: Hiroshi Shimizu shimizu@ist.osaka-u.ac.jp

#### Specialty section:

This article was submitted to Microbial Physiology and Metabolism, a section of the journal Frontiers in Microbiology

> Received: 30 October 2018 Accepted: 23 April 2019 Published: 07 May 2019

#### Citation:

Kitamura S, Toya Y and Shimizu H (2019) <sup>13</sup>C-Metabolic Flux Analysis Reveals Effect of Phenol on Central Carbon Metabolism in Escherichia coli. Front. Microbiol. 10:1010. doi: 10.3389/fmicb.2019.01010

is required to achieve sustainable development and solve environmental problems. A tyrosine phenol-lyase (TPL), which has been discovered in some microorganisms such as Bacterium coli phenologenes, Pantoea agglomerans, and Clostridium tetanomorphum, catalyzes the conversion between tyrosine and phenol (Brot et al., 1965; Kumagai et al., 1970). Microbial phenol production has been reported in Pseudomonas putida by overexpressing the heterologous TPL of P. agglomerans (Wierckx et al., 2005). More recently, Escherichia coli, widely used as an industrial bio-production host, has been engineered to produce phenol from glucose by introducing the TPL from P. agglomerans (Kim et al., 2014; Thompson et al., 2016). However, productivity is still too low for industrial applications.

Organic solvents, including phenol, are toxic toward a wide range of microorganisms, including E. coli, and reduce their growth (Ramos et al., 2002). In the phenol-producing E. coli strain, low phenol productivity due to toxicity remains a challenge (Kim et al., 2014). Phenol tolerance varies among microorganisms, with some species, including P. putida, which is used for bioremediation, being able to tolerate the presence of phenol (Ramos et al., 2002). Although it is also a gram-negative bacterium, E. coli is sensitive to phenol and stops growing at 1.2 g/L phenol, corresponding to 0.11% (v/v) (Kim et al., 2014). Despite this disadvantage, E. coli is an excellent host for industrial applications because of its rapid growth rate, easy genetic manipulation, and abundant biological knowledge. Elucidating why E. coli is sensitive to phenol would make it possible to overcome this weakness and enhance phenol production.

Various organic solvents are toxic to E. coli, and this toxicity correlates with the logarithm of the partition coefficient between n-octanol and water (log P<sub>ow</sub>) (Aono et al., 1994). The log P<sub>ow</sub> of phenol is 1.46 (Verschueren, 1983), which places phenol among the highly toxic organic solvents. In E. coli, some efflux pumps such as AcrAB-TolC and AcrEF-TolC are involved in solvent tolerance (Ramos et al., 2002). Furthermore, it has been reported that the composition of the cell membrane changes in the presence of phenol (Keweloh et al., 1990), and that phenol tolerance increases through alterations of the fatty acid composition of membranes (Keweloh et al., 1991). Because the respiratory activity of E. coli is not reduced in the presence of phenol (Keweloh et al., 1990), the cause of growth reduction is unlikely to be respiratory chain inhibition due to membrane damage. We found in this work that culturing E. coli in the presence of phenol decreased not only the growth rate but also the biomass yield. This suggests that phenol affects the carbon flow of central carbon metabolism, but the mechanism is unknown. <sup>13</sup>C-metabolic flux analysis (<sup>13</sup>C-MFA) is an effective approach to investigate the carbon flux distribution in central carbon metabolism (Wittmann, 2007; Zamboni et al., 2009).

In the present study, we cultured wild type E. coli under different phenol concentrations (0, 0.1, and 0.15%) and compared the resulting flux distributions to identify the effect of phenol on metabolism. To investigate the cause of the flux changes, an in vitro enzyme assay was performed, which revealed that citrate synthase is strongly inhibited by phenol.

## MATERIALS AND METHODS

#### Strains and Culture Conditions

The E. coli strains used in this study are shown in **Supplementary Table S1**. In preculture, the wild type, gltA+, and Δpta strains were aerobically grown at 37°C overnight using 5 mL of M9 medium containing 4 g/L glucose in a test tube. For the gltA+ strain, 40 mg/mL histidine and 10 mg/mL thiamine hydrochloride were supplemented to the M9 medium in accordance with the previous report (Kitagawa et al., 2005).

To evaluate phenol toxicity using an automated optical density monitoring system, the preculture was inoculated into 5 mL of the same M9 medium in an L-shaped test tube at an initial optical density at 660 nm (OD<sub>660</sub>) of 0.05. Phenol was added to final concentrations of 0, 0.04, 0.08, 0.1, 0.12, and 0.16% (v/v). The cultures were incubated at 37°C at 70 rpm using a TVS062CA incubator (Advantec, Tokyo, Japan). The OD<sub>660</sub> was measured every 5 min.

For evaluating the fermentation profiles at flask scale, the precultures of the wild type and Δpta strains were inoculated into 50 mL of M9 medium containing 4 g/L glucose as the sole carbon source in a 200 mL baffled flask at an initial OD<sub>600</sub> of 0.05, and incubated at 37°C at 200 rpm using a BR-43FL incubator (TAITEC, Saitama, Japan). Phenol was added to final concentrations of 0 and 0.15% (v/v). For <sup>13</sup>C-MFA, the cultures were performed under the same conditions except that the glucose was replaced with [1-<sup>13</sup>C]glucose. Phenol was added to final concentrations of 0, 0.1, and 0.15% (v/v). All cultures were performed in triplicate.

### Measurement of Cell Concentration and Extracellular Metabolites

The OD<sub>600</sub> was measured using a UVmini-1240 UV-VIS spectrophotometer (Shimadzu, Kyoto, Japan). The dry cell weight (DCW) was calculated using a conversion coefficient of 0.3 g/L/OD<sub>600</sub> based on a previous report (Soini et al., 2008). Concentrations of glucose, acetate, formate, ethanol, lactate, and succinate in the culture were measured using a high-performance liquid chromatography system (Shimadzu, Kyoto, Japan) with an Aminex HPX-87H column (Bio-Rad, Hercules, CA, United States). The detailed method is described in Okahashi et al. (2017). The detection limits were 5 mM for ethanol and 1 mM for organic acids including lactate, acetate, formate, and succinate.

## Measurement of Proteinogenic Amino Acids

During the mid-log phase, an appropriate amount of cells (0.0015 gDCW) was collected by centrifugation. After hydrolysis, the proteinogenic amino acids were derivatized with tert-butyldimethylsilyl (tBDMS). The mass isotopomer distributions of amino acids were measured using a gas chromatography/mass spectrometer (GC/MS) (Agilent Technologies, United States) with a DB-5MS column (Agilent Technologies, United States). The mass isotopomer distributions were corrected for the presence of naturally occurring isotopes (van Winden et al., 2002). The detailed protocols for sample preparation and measurement are described in Okahashi et al. (2017).
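As a worked example of the conversions used in these methods, the sketch below turns an OD<sub>600</sub> reading into dry cell weight with the stated coefficient of 0.3 g/L/OD<sub>600</sub> and estimates the culture volume needed to collect 0.0015 gDCW. The OD value at sampling is a hypothetical mid-log reading, and the function names are illustrative.

```python
OD600_TO_DCW = 0.3  # g DCW per litre per OD600 unit, as stated in the text
TARGET_BIOMASS_G = 0.0015  # g DCW collected per sample, as stated in the text


def dry_cell_weight(od600: float) -> float:
    """Convert an OD600 reading into dry cell weight in g/L."""
    if od600 < 0:
        raise ValueError("optical density cannot be negative")
    return OD600_TO_DCW * od600


def sample_volume_ml(od600: float, biomass_g: float = TARGET_BIOMASS_G) -> float:
    """Culture volume (mL) that contains the requested amount of biomass."""
    dcw = dry_cell_weight(od600)
    if dcw == 0:
        raise ValueError("no biomass present at OD600 = 0")
    return biomass_g / dcw * 1000.0  # litres to millilitres


# Hypothetical mid-log reading of OD600 = 1.0.
dcw_mid_log = dry_cell_weight(1.0)
volume_mid_log = sample_volume_ml(1.0)
```

Under these assumptions a culture at OD<sub>600</sub> = 1.0 contains 0.3 gDCW/L, so about 5 mL would be harvested.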

### <sup>13</sup>C-Metabolic Flux Analysis

The central carbon metabolism, including glycolysis, the pentose phosphate pathway, the TCA cycle, the Entner–Doudoroff (ED) pathway, the anaplerotic pathway, and the glyoxylate shunt, was considered for <sup>13</sup>C-MFA based on previous analyses of E. coli. The elementary metabolite unit framework was used for modeling the carbon atom transitions (Antoniewicz et al., 2007). All reactions and carbon atom transitions are summarized in **Supplementary Table S2**. The flux distribution was optimized to minimize the residual sum of squares (RSS) between calculated and measured amino acid mass isotopomer distributions. The objective function is given below:

$$\begin{aligned} \text{Minimize RSS} &= \sum_{i=1}^{n} \left( \frac{MID_i^{measured} - MID_i^{simulated}}{SD_i} \right)^2 \\ &+ \sum_{j=1}^{m} \left( \frac{r_j^{measured} - r_j^{simulated}}{SD_j} \right)^2 \end{aligned}$$

where $MID_i^{measured}$ and $MID_i^{simulated}$ are the measured and simulated mass isotopomer distributions of the ith amino acid, respectively. The standard deviation (SD) was set to 0.01 for GC/MS measurements (Wada et al., 2017). The $r_j^{measured}$ and $r_j^{simulated}$ are the measured and simulated fluxes of the jth reaction, respectively. The difference between the calculated and measured values of the mass isotopomer distribution was statistically evaluated by the χ<sup>2</sup> test. The 95% confidence intervals of each flux were evaluated by the grid search method (Antoniewicz et al., 2006). All calculations were performed using the OpenMebius software (Kajihata et al., 2014) with MATLAB 2018a (MathWorks, Natick, MA, United States).
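The objective function above can be written as a short routine. This is a minimal sketch of the weighted residual sum of squares only. It is not the OpenMebius implementation, it performs no flux optimization, and all numbers except the GC/MS standard deviation of 0.01 are hypothetical.

```python
from typing import Sequence


def weighted_rss(
    measured: Sequence[float], simulated: Sequence[float], sd: Sequence[float]
) -> float:
    """Sum of squared residuals, each scaled by its standard deviation."""
    if not (len(measured) == len(simulated) == len(sd)):
        raise ValueError("inputs must have equal length")
    return sum(((m - s) / e) ** 2 for m, s, e in zip(measured, simulated, sd))


# Hypothetical MID fractions; the SD of 0.01 follows the text.
mid_meas = [0.60, 0.30, 0.10]
mid_sim = [0.61, 0.29, 0.10]
mid_sd = [0.01] * 3

# Hypothetical measured and simulated extracellular rates with their SDs.
rate_meas = [10.0, 3.0]
rate_sim = [10.2, 3.1]
rate_sd = [0.5, 0.2]

# Total objective: MID term plus flux term, as in the equation.
rss = weighted_rss(mid_meas, mid_sim, mid_sd) + weighted_rss(rate_meas, rate_sim, rate_sd)
```

In an actual analysis the simulated values are functions of the free fluxes, and an optimizer adjusts those fluxes until this sum is minimal.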

#### Measurement of Enzyme Activities

To measure the citrate synthase activity of the E. coli cells cultured at the different phenol concentrations, the cells obtained from 1 mL of culture were suspended in 0.1 M Tris–HCl (pH 8.2) and disrupted by sonication, repeating on/off cycles 10 times at an output level of 50%, using a UD-100 ultrasonic homogenizer (Tomy Seiko, Tokyo, Japan). After removing the debris by centrifugation (20,000 × g, 4°C, 30 min), the crude extract was used for citrate synthase activity measurement. The protein concentration of the crude extract was measured using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, United States). Citrate synthase activity was measured by Ellman's method, which detects the thiol group of CoA by a colorimetric reaction using 5,5′-dithiobis(2-nitrobenzoic acid) (DTNB) (Srere, 1969). The DTNB reaction was monitored at 412 nm using a UVmini-1240 UV-VIS spectrophotometer (Shimadzu, Kyoto, Japan). After incubating a mixture containing 10 µL of crude extract, 190 µL of MilliQ water, 12.5 µL of 10 mM acetyl-CoA, and 25 µL of 1 mM DTNB in 0.1 M Tris–HCl (pH 8.2) at 25°C for 5 min, 12.5 µL of 10 mM oxaloacetate in 0.1 M Tris–HCl (pH 8.2) was added to start the reaction. The absorbance at 412 nm was measured every 10 s.

Phenol inhibition of the E. coli citrate synthase reaction in vitro was examined using a sample obtained from a gltA overexpressing strain, JW0710-AM, from the ASKA library (Kitagawa et al., 2005). In preculture, a single colony of this mutant was inoculated into 5 mL of Luria-Bertani (LB) medium and incubated overnight at 37°C at 150 rpm. The preculture was inoculated into 50 mL of LB medium at an initial OD<sub>660</sub> of 0.05 and incubated at 37°C at 150 rpm using a BR-43FL incubator (TAITEC, Saitama, Japan). At an OD<sub>600</sub> of approximately 0.3, 0.1 mM of isopropyl β-D-1-thiogalactopyranoside was added to induce gene expression. After 2 h, the cells were collected by centrifugation (3000 rpm, 4°C, 10 min) and washed with 5 mL of filter-sterilized buffer containing 50 mM sodium phosphate, 200 mM NaCl, and a protease inhibitor tablet (cOmplete tablets, Roche). To maintain the plasmid overexpressing the gltA gene, 25 mg/L of chloramphenicol was added to the media. The obtained cell pellet was extracted by sonication and used for in vitro enzymatic assays in the same manner as described above.

Phenol inhibition of the glucose 6-phosphate isomerase (PGI) reaction in vitro was also evaluated using a sample obtained from wild type E. coli. PGI activity was measured as NADPH formation through a coupled reaction with glucose 6-phosphate dehydrogenase (G6PDH) (Salas et al., 1965; Peng and Shimizu, 2003). After addition of an appropriate amount of crude cell extract to a reaction mixture containing 0.1 M Tris–HCl (pH 7.8), 10 mM MgCl<sub>2</sub>, 0.5 mM NADP<sup>+</sup>, 1 U G6PDH (Sigma-Aldrich, Merck, Darmstadt, Germany), and 2 mM D-fructose 6-phosphate, NADPH formation was measured by absorption at 340 nm. One unit (U) is defined as the amount of enzyme required to convert 1 µmol of substrate into product per minute per milligram protein.
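For the NADPH-coupled assay, an absorbance slope can be converted into the unit defined above via the Beer-Lambert law. The sketch below assumes a millimolar extinction coefficient of 6.22 mM<sup>−1</sup> cm<sup>−1</sup> for NADPH at 340 nm and a 1 cm light path. Neither value is stated in the text, and the slope, reaction volume and protein amount are hypothetical.

```python
NADPH_EXT_COEFF = 6.22  # mM^-1 cm^-1 at 340 nm (assumed literature value)


def specific_activity(
    slope_per_min: float,
    volume_ml: float,
    protein_mg: float,
    path_cm: float = 1.0,
    ext_coeff: float = NADPH_EXT_COEFF,
) -> float:
    """Specific activity in µmol per minute per mg protein."""
    if protein_mg <= 0 or volume_ml <= 0 or path_cm <= 0:
        raise ValueError("volume, protein amount and path length must be positive")
    # Beer-Lambert: concentration change in mM/min, i.e. µmol/mL/min.
    rate_mm_per_min = slope_per_min / (ext_coeff * path_cm)
    # Scale to the cuvette volume, then normalize by the protein in the assay.
    return rate_mm_per_min * volume_ml / protein_mg


# Hypothetical reading: 0.0622 absorbance units per minute,
# 1 mL reaction volume, 0.02 mg protein in the cuvette.
activity = specific_activity(0.0622, 1.0, 0.02)
```

The same arithmetic applies to the DTNB-based citrate synthase assay when the extinction coefficient of the colored product at 412 nm is substituted.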

### RESULTS

#### Evaluation of Phenol Toxicity in E. coli

The wild type E. coli BW25113 was cultured using a synthetic M9 medium with different phenol concentrations (0, 0.04, 0.08, 0.1, 0.12, and 0.16%) in L-shaped test tubes. **Figure 1** shows the specific growth rate and the maximum OD<sub>660</sub> at each phenol concentration. The specific growth rate decreased by 40% under 0.1% phenol, and by 84% under 0.16% phenol, as compared to that without addition of phenol. A previous study has also reported that an increase in the phenol concentration of the culture led to a decrease in the growth rate of E. coli, and eventually cell growth was completely inhibited (Kim et al., 2014). These phenotypes are consistent with our result. Furthermore, the maximum OD<sub>660</sub> also decreased by 16% under 0.1% phenol and by 92% under 0.16% phenol. The drastic decrease of biomass yield suggests that phenol affects the carbon flux of the central carbon metabolism.
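The text does not state how the specific growth rate was derived from the optical density readings. A common approach, sketched here as an assumption, is a linear least-squares fit of ln(OD) against time over the exponential phase. The data below are synthetic and sampled every 5 min to mirror the monitoring interval.

```python
import math
from typing import Sequence


def specific_growth_rate(times_h: Sequence[float], od: Sequence[float]) -> float:
    """Slope of ln(OD) versus time (h^-1) by ordinary least squares."""
    if len(times_h) != len(od) or len(times_h) < 2:
        raise ValueError("need at least two paired observations")
    y = [math.log(v) for v in od]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in zip(times_h, y))
    return sxy / sxx


# Synthetic exponential growth at 0.6 h^-1, one reading every 5 min for 1 h.
times = [i * 5 / 60 for i in range(13)]
od_values = [0.05 * math.exp(0.6 * t) for t in times]
mu = specific_growth_rate(times, od_values)

# Relative decrease of a second, hypothetical growth rate versus the first.
mu_stressed = 0.36
relative_decrease = (mu - mu_stressed) / mu
```

With real data, the fit window has to be restricted to the exponential phase, since lag and stationary phases would bias the slope.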

### Growth Characteristics of E. coli Under Different Phenol Concentrations

In order to reveal the influence of phenol on the carbon flow of the metabolism, the cells were cultured in 50 mL of M9 medium using a baffled flask with three different phenol concentrations (0, 0.1, and 0.15%). The time courses of cell, glucose, and acetate concentrations in the culture broth are shown in **Figure 2**.

At 0% phenol, the OD<sub>600</sub> reached the maximum value of 4.7 at 9.3 h. Glucose consumption was the fastest among the three conditions, and glucose was depleted by 8.3 h. Acetate concentration reached a maximum of 12.4 mM at 8.3 h, and acetate was quickly consumed after glucose depletion. At 0.1% phenol, the maximum OD<sub>600</sub> was 4.4 at 24.2 h. Glucose was consumed more slowly and was depleted by 10.4 h. Acetate concentration reached a maximum of 20.1 mM (1.6 times that at 0% phenol) at 10.4 h, and acetate was then slowly consumed. At 0.15% phenol, the maximum OD<sub>600</sub> decreased significantly to 2.6. Glucose was consumed most slowly and was depleted at 14 h. Acetate production was further enhanced to 34.5 mM at 24 h. No acetate consumption was observed after depletion of glucose. Furthermore, no ethanol, lactate, succinate, or formate production was detected throughout the culture period under any condition. The specific rates during the exponential growth phase are summarized in **Table 1**. Compared to the 0% phenol condition, the specific growth rate and specific glucose consumption rate at 0.15% phenol decreased by 42 and 24%, respectively, whereas the specific acetate production rate increased by 56%.

TABLE 1 | Specific growth, glucose consumption, and acetate production rates of wild type E. coli under different phenol concentrations.

To further investigate the effect of acetate overflow on phenol tolerance, a Δpta strain, which lacks the phosphotransacetylase required for acetate formation, was cultured in the absence and presence of phenol (**Figure 3**). Interestingly, the growth profile of the Δpta strain was similar to that of the wild type in the absence of phenol, whereas the strain did not grow in the presence of 0.15% phenol. The result suggests that the acetate overflow is related to E. coli growth in the presence of phenol.

FIGURE 1 | Relationships between phenol concentration and specific growth rate (A) and maximum OD<sub>660</sub> (B) of wild type E. coli BW25113. Closed circles and squares represent the maximum specific growth rate (h<sup>−1</sup>) and the maximum OD<sub>660</sub> during the 24 h period.


### Flux Distributions Under Different Phenol Concentrations by <sup>13</sup>C-Metabolic Flux Analysis (<sup>13</sup>C-MFA)

To identify the reaction in which the flux changed depending on the phenol concentration, <sup>13</sup>C-MFA was performed for each condition. The cells were cultured with [1-<sup>13</sup>C]glucose as the sole carbon source. The mass isotopomer distributions of proteinogenic amino acids at the exponential growth phase were measured by GC/MS. The mass isotopomer distributions of amino acids, corrected for natural isotope abundances, are shown in **Supplementary Table S3**. The flux distributions were optimized to minimize the RSS between calculated and measured amino acid mass isotopomer distributions. The threshold of the χ<sup>2</sup> test for the goodness of fit was 106.4 (104 independent data and 18 degrees of freedom for the model). As a result of flux optimization, the RSS values of the conditions with 0, 0.1, and 0.15% phenol were 92.1, 104.5, and 103.3, respectively (passed χ<sup>2</sup> test, p < 0.05). The 95% confidence interval of each flux that explains the measured mass isotopomer distributions of amino acids was also calculated. The best fitted flux distribution and the 95% confidence interval are shown in **Supplementary Table S4**.
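The goodness-of-fit decision reduces to comparing each RSS with a χ<sup>2</sup> quantile. The sketch below checks the reported RSS values against the reported threshold. It also includes a helper that approximates a χ<sup>2</sup> quantile with the Wilson-Hilferty formula, which is swapped in here as a standard-library stand-in for an exact quantile function. The degrees of freedom are left as an argument because they depend on how independent measurements and free parameters are counted.

```python
from statistics import NormalDist


def chi2_quantile(p: float, dof: int) -> float:
    """Approximate chi-square quantile (Wilson-Hilferty approximation)."""
    if not 0.0 < p < 1.0 or dof <= 0:
        raise ValueError("need 0 < p < 1 and dof > 0")
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3


def fit_accepted(rss: float, threshold: float) -> bool:
    """A fit passes when its RSS does not exceed the chi-square threshold."""
    return rss <= threshold


REPORTED_THRESHOLD = 106.4  # threshold quoted in the text
reported_rss = {"0%": 92.1, "0.1%": 104.5, "0.15%": 103.3}  # values from the text

accepted = {cond: fit_accepted(rss, REPORTED_THRESHOLD) for cond, rss in reported_rss.items()}

# Sanity check of the approximation against a tabulated value (about 18.31 for 10 dof).
approx_10 = chi2_quantile(0.95, 10)
```

All three conditions fall below the quoted threshold, which is what the text reports as passing the test.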

Although no significant change was observed in the upstream pathways, including glycolysis, the pentose phosphate pathway, and the ED pathway, upon phenol addition, a large change occurred in the downstream pathways, including pyruvate metabolism and the TCA cycle. The flux distributions focused on the downstream pathways are shown in **Figure 4**. The flux of the TCA cycle, excluding malate dehydrogenase, was significantly decreased in the presence of phenol. At the 0% phenol condition, the flux of the citrate synthase reaction, the first reaction of the TCA cycle, was 3.6 mmol gDCW<sup>−1</sup> h<sup>−1</sup>. However, the fluxes of citrate synthase at 0.1 and 0.15% phenol decreased to 0.5 and 0.4 mmol gDCW<sup>−1</sup> h<sup>−1</sup>, respectively. This suggests that acetyl-CoA does not flow into the TCA cycle, thereby increasing the overflow from acetyl-CoA to acetate.

Although the upper bound of the estimated flux of the glyoxylate shunt seems to decrease as the phenol concentration increases, it is difficult to draw conclusions about the flux changes because the confidence interval of this flux is wide at the low phenol concentration. The flux of this pathway is usually low in E. coli grown on glucose without phenol addition (Leighty and Antoniewicz, 2012). Therefore, it is considered that this pathway was hardly active, regardless of the phenol concentration.

#### Effect of Phenol on Enzyme Activities

<sup>13</sup>C-metabolic flux analysis revealed that the TCA cycle flux decreased in the presence of phenol in E. coli. Because no leakage of the TCA cycle intermediates, such as citrate and α-ketoglutarate, was detected in the culture broths, we assumed that phenol inhibits the citrate synthase reaction at the entrance of the TCA cycle. In order to evaluate the changes in the protein expression level of citrate synthase, the specific activities of the crude extracts obtained from the cells cultured with three different phenol concentrations (0, 0.1, and 0.15%) were measured using an in vitro enzymatic assay. The specific activities of the crude cell extracts obtained from the cultures with 0, 0.1, and 0.15% phenol were 0.095 ± 0.007, 0.087 ± 0.009, and 0.100 ± 0.009 U, respectively. No significant difference was observed. This result suggests that the expression level of citrate synthase does not change among the culture conditions. Assuming 50% protein in the DCW, the citrate synthase activity (0.095 U) under 0% phenol was calculated as 2.9 mmol/gDCW/h. In the <sup>13</sup>C-MFA, the 95% confidence interval of this reaction was 3 to 4.5 mmol gDCW<sup>−1</sup> h<sup>−1</sup>. Since the citrate synthase activity is almost consistent with the range of the flux, the in vitro enzyme assay results are reasonable.
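The conversion from specific activity to a flux-equivalent capacity follows from the unit definition (µmol per minute per mg protein) and the stated assumption of 50% protein in the dry cell weight. The sketch below spells out the arithmetic, and the function name is illustrative.

```python
def activity_to_flux(units_per_mg: float, protein_fraction: float = 0.5) -> float:
    """Convert U (µmol/min/mg protein) into mmol/gDCW/h."""
    if not 0.0 < protein_fraction <= 1.0:
        raise ValueError("protein fraction must lie in (0, 1]")
    mg_protein_per_gdcw = protein_fraction * 1000.0  # mg protein per g DCW
    umol_per_gdcw_h = units_per_mg * 60.0 * mg_protein_per_gdcw  # 60 min per hour
    return umol_per_gdcw_h / 1000.0  # µmol to mmol


# 0.095 U under 0% phenol gives about 2.85, reported as 2.9 in the text.
flux_capacity = activity_to_flux(0.095)
```

The result scales linearly with the assumed protein fraction, so a different protein content would shift the estimate proportionally.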

Next, the effect of phenol on the reaction rate of citrate synthase was investigated using a crude cell extract obtained from a gltA overexpressing strain. The activities of citrate synthase relative to the phenol-free condition were 59.8, 56.6, and 24.6% at phenol concentrations in the reaction mixture of 0.1, 0.15, and 0.3%, respectively (**Figure 5A**). This result is consistent with the <sup>13</sup>C-MFA results showing that the flux of the reaction from acetyl-CoA and oxaloacetate to citrate was reduced from 3.6 to 0.37 mmol gDCW<sup>−1</sup> h<sup>−1</sup> by addition of 0.15% phenol. Since high concentrations of phenol cause protein denaturation, the decreased citrate synthase activity upon phenol addition may be due to denaturation of the enzyme. To rule out this possibility, the effect of phenol on the activity of another enzyme was examined. As a representative enzyme, phenol inhibition of PGI was investigated (**Figure 5B**), because little flux change in this reaction was observed in the <sup>13</sup>C-metabolic flux analysis in response to phenol addition. No change in the PGI activity was observed upon addition of phenol to the reaction mixture. This result supports the view that the decrease of citrate synthase activity in the presence of phenol is not caused by denaturation of the protein.

If the inhibition of citrate synthase by phenol is the cause of growth reduction, its overexpression would be expected to restore growth in the presence of phenol. To evaluate the effect of citrate synthase overexpression on phenol tolerance, the citrate synthase overexpressing (gltA+) strain was cultured using the M9 medium with different phenol concentrations. Because the expression of citrate synthase is regulated by an IPTG-inducible T5-lac promoter (Kitagawa et al., 2005), 50 µM IPTG was added to the cultures. **Supplementary Figure S1** shows the specific growth rate and the maximum OD<sub>660</sub> at each phenol concentration. A decrease in cell growth in the presence of phenol was also observed in this overexpressing strain. Because the overexpression of citrate synthase caused a severe growth defect even in the absence of phenol, fine-tuning of the expression level would be needed to alleviate the phenol stress.

### DISCUSSION

In the present study, we investigated in detail the cause of phenol toxicity in E. coli from the viewpoint of metabolism. <sup>13</sup>C-MFA showed an enhanced overflow of acetyl-CoA to acetate at phenol concentrations of 0.1 and 0.15%, indicating that the TCA cycle flux was significantly decreased. Further analysis based on measuring the specific enzyme activity in vitro revealed that citrate synthase, at the entrance reaction of the TCA cycle, was inhibited by phenol and led to the flux repression of this reaction.

<sup>13</sup>C-metabolic flux analysis can reveal how fluxes changed across the metabolic pathways, but it does not provide the mechanism that caused the change. In addition, since it assumes a steady-state condition, a flux change is detected along an entire sequential pathway, and the analysis cannot identify which reaction in the pathway caused it. The mechanism of flux changes must therefore be discussed together with knowledge about cellular systems, such as enzyme expression and reaction inhibition. For instance, the flux information alone cannot determine whether the enhanced acetate production led to the decrease of the TCA cycle flux, or the decrease of the TCA cycle flux led to the enhanced acetate production. In E. coli, it is known that acetate synthesis occurs due to overflow metabolism (Castaño-Cerezo et al., 2009). Therefore, based on the citrate synthase activity assays in vitro, it is considered that the overflow to acetate increased because the TCA cycle flux was repressed by phenol.

How did the decrease of the TCA cycle flux occur in the presence of phenol? Because respiration and the TCA cycle are linked through NADH, damage to the respiratory chain may cause a decrease in the TCA cycle flux. If so, however, NADH would accumulate, and lactate and ethanol would be produced in order to recycle the NADH produced by glycolysis, as in anaerobic fermentation. These fermentation products were not detected in the culture broth analysis by HPLC. Furthermore, inhibition of the respiratory chain increases uptake of glucose in E. coli (Kihira et al., 2012), but the specific glucose uptake rate decreased in the presence of phenol (**Table 1**). If α-ketoglutarate dehydrogenase or aconitase were inhibited by phenol, α-ketoglutarate or citrate, respectively, should overflow, but neither compound was detected by the culture broth analysis. Based on these findings, it is considered that the reaction stopped at the point of citrate synthase, which catalyzes the first step of the TCA cycle. In addition, the culture profiles in **Figure 2** show that acetate was consumed promptly after depletion of glucose in the absence of phenol. On the other hand, consumption of acetate after glucose depletion was remarkably decreased at 0.1% phenol, and was completely stopped at 0.15% phenol. In E. coli grown with acetate as a carbon source, acetate is converted to acetyl-CoA and is catabolized via the glyoxylate shunt into the TCA cycle to avoid CO<sub>2</sub> emission (Zhao and Shimizu, 2003). Therefore, the reaction directly inhibited by phenol must be one that is shared with the acetate catabolic pathways. Acetate uptake and the glyoxylate shunt do not operate during the glucose consumption phase. Because the reactions from acetyl-CoA to isocitrate are common to both conditions, we focused on citrate synthase, which catalyzes this step.

Metabolic flux changes are caused by several factors, such as enzyme expression level and reaction inhibition. Citrate synthase is an important enzyme that is regulated at the gene expression level by various transcription factors such as Cra, ArcAB, and Fnr (Matsuoka and Shimizu, 2011). First, in order to investigate the change in enzyme expression, the specific activity of the crude extract of cells cultured at different phenol concentrations was evaluated, but no activity change was observed. Next, in order to investigate whether phenol affected the citrate synthase reaction in E. coli, activity was evaluated by adding phenol to the in vitro reaction solution. As shown in **Figure 5A**, the enzyme activity decreased to 56.6% upon addition of 0.15% phenol. Phenol readily permeates the cell membrane owing to its hydrophobicity. Assuming equal concentrations of phenol inside and outside the cell, the examined phenol concentrations would affect citrate synthase activity in vivo. This enzyme assay result suggests that phenol acts at the level of the enzyme reaction, and not at the level of gene expression. In the <sup>13</sup>C-MFA results, the citrate synthase flux was inhibited more severely, to 11% at 0.15% phenol, which suggests that other factors also contribute to the decrease in the TCA cycle flux. A comprehensive analysis of gene expression profiles might provide clues to identify these factors.

To decrease the acetate overflow in the presence of phenol, a knockout of pta, which encodes the phosphotransacetylase of the acetate formation pathway, was investigated (**Figure 3**). The phenol tolerance of this mutant was lower than that of the wild-type strain. No growth of the Δpta strain was observed in the presence of 0.15% phenol. Since the knockout mutant cannot discard the end product of glycolysis as acetate in the presence of phenol, excess intermediates would accumulate in the cell and inhibit the metabolic flow in glycolysis. As another hypothesis, post-translational modification of proteins by acetylation may have affected the metabolic behavior. Acetyl-phosphate, an intermediate in the acetate synthesis pathway, is an acetyl group donor for lysine acetylation of various proteins. It has been reported that acetylation affects isocitrate lyase, and thereby the activity of the glyoxylate shunt, as well as the transcription factor RcsB, which is related to acid stress susceptibility (Castaño-Cerezo et al., 2014).

In the present study, <sup>13</sup>C-MFA and enzyme assays revealed that the TCA cycle flux decreased due to phenol inhibition of citrate synthase; therefore, ATP could not be sufficiently produced by respiration, and the growth rate decreased. Furthermore, since carbon was lost as acetate due to overflow metabolism, the biomass yield became low in the presence of phenol. In the central carbon metabolism of E. coli, the acetyl-CoA node is a branch point between the TCA cycle and acetate synthesis. The inhibition of citrate synthase by phenol blocks carbon flux into the TCA cycle and enhances the overflow into acetate synthesis. Furthermore, since acetyl-phosphate is an intermediate of the acetate synthesis pathway, its accumulation may affect the TCA cycle flux through changes in enzyme activity and gene expression via protein acetylation. Investigations of gene expression profiles and post-translational modifications under phenol stress would be necessary to further understand the metabolic regulation. However, the fluxes of upstream pathways such as glycolysis and the pentose phosphate pathway did not decrease. Phenol is synthesized via the aromatic amino acid synthesis pathway, with phosphoenolpyruvate from glycolysis and erythrose 4-phosphate from the pentose phosphate pathway as precursors. So, although phenol has a negative effect on E. coli growth, phenol would not inhibit its own biosynthesis from glucose. Regarding phenol production by E. coli, then, it would be effective to separate the growth phase from the production phase, because the TCA cycle is not involved in phenol synthesis. The reactions for phenol synthesis should be suppressed during the growth phase, and should be induced after the cells have grown sufficiently.

#### AUTHOR CONTRIBUTIONS

SK, YT, and HS conceived the study. SK designed the study and performed all the experiments and computational analysis. YT designed the study and helped with the experiments and computational analysis. HS designed and supervised the study. SK and YT drafted the manuscript. All authors contributed to preparing the final version of the manuscript and approved its submission to this journal.

#### FUNDING

This work was supported by Grant-in-Aid for Scientific Research (B) No.16H04576, and Grant-in-Aid for Scientific Research (C) No. 18K04850.

#### ACKNOWLEDGMENTS

We thank Prof. Fumio Matsuda (Osaka University, Japan) for his helpful comments.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmicb.2019.01010/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Kitamura, Toya and Shimizu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# EMUlator: An Elementary Metabolite Unit (EMU) Based Isotope Simulator Enabled by Adjacency Matrix

Chao Wu<sup>1</sup>, Chia-hsin Chen<sup>2</sup>, Jonathan Lo<sup>1</sup>, William Michener<sup>1</sup>, PinChing Maness<sup>1</sup> and Wei Xiong<sup>1</sup>\*

*<sup>1</sup> National Renewable Energy Laboratory, Golden, CO, United States, <sup>2</sup> Institute of Nuclear Energy Research, Taoyuan, Taiwan*

#### *Edited by:*

*Manuel Kleiner, North Carolina State University, United States*

#### *Reviewed by:*

*Lian He, University of Washington, United States Joshua Chan, Colorado State University, United States*

> *\*Correspondence: Wei Xiong wei.xiong@nrel.gov*

#### *Specialty section:*

*This article was submitted to Microbial Physiology and Metabolism, a section of the journal Frontiers in Microbiology*

> *Received: 09 February 2019 Accepted: 11 April 2019 Published: 30 April 2019*

#### *Citation:*

*Wu C, Chen C, Lo J, Michener W, Maness P and Xiong W (2019) EMUlator: An Elementary Metabolite Unit (EMU) Based Isotope Simulator Enabled by Adjacency Matrix. Front. Microbiol. 10:922. doi: 10.3389/fmicb.2019.00922*

Stable isotope-based metabolic flux analysis is currently the only methodology that allows the experimental study of the integrated responses of metabolic networks. This method relies primarily on isotope labeling and modeling, which can be challenging in both experimental and computational biology. In particular, implementing the algorithm for isotope simulation is a critical step that limits wider use of this powerful approach. Here, we introduce *EMUlator*, a Python-based isotope simulator built on the Elementary Metabolite Unit (EMU) algorithm, an efficient and powerful algorithm for isotope modeling. We propose a novel adjacency matrix method to implement EMU modeling and exemplify it stepwise. This method is intuitively straightforward and can be readily adapted for various customized purposes. We apply this computational pipeline to understand the phosphoketolase flux in the metabolic network of an industrial microbe, *Clostridium acetobutylicum*. The resulting design enables a high-throughput and non-invasive approach for estimating phosphoketolase flux *in vivo*. Our computational insights allow the systematic design and prediction of isotope-based metabolic models and yield a comprehensive understanding of their limitations and potential.

Keywords: adjacency matrix, elementary metabolite unit (EMU), fractional labeling (FL), *Clostridium acetobutylicum*, phosphoketolase

### INTRODUCTION

<sup>13</sup>C metabolic flux analysis (MFA) is currently the only experimental methodology to quantitatively understand intracellular biochemical networks by means of stable isotope tracing, labeling pattern analysis and, indispensably, metabolic modeling based on mass and isotope balancing (Stephanopoulos, 1999; Sauer, 2006). Importantly, isotope modeling can provide additional key information that further defines the system, enabling quantification of the fluxes in parallel or cyclic pathways that cannot be estimated reliably by mass balance alone. A variety of mathematical models have been developed to establish the relationship between isotope distribution and metabolic flux, including isotopomers (Schmidt et al., 1997), cumomers (Wiechert et al., 1999), and bondomers (Van Winden et al., 2002). However, these methods become computationally demanding for realistic, large-scale metabolic networks, because a large number of isotopomer equations need to be solved.

To address this limitation, a computational framework based on the Elementary Metabolite Unit (EMU) was proposed (Antoniewicz et al., 2007a). The EMUs of a metabolite are defined as the non-empty subsets of that compound's atoms (usually carbon atoms). EMUs can be cataloged by size, i.e., the number of atoms they contain. Using the atom transition map, this framework traces and identifies the minimal metabolic information needed to simulate isotope patterns and solve the optimization problem. It therefore greatly reduces the number of balance equations and the computational burden, e.g., a 95% reduction of the variables needed for the simulation in an E. coli model without any loss of information (Antoniewicz et al., 2007b). The EMU algorithm has garnered increasing attention in metabolic analysis (Quek et al., 2009; Sokol et al., 2012; Weitzel et al., 2013; Kajihata et al., 2014; Shupletsov et al., 2014; Young, 2014). Better leveraging this approach to understand cell metabolism will require novel computational packages that are easy to program and can fit new and broad application scenarios. However, computational approaches that implement the EMU algorithm straightforwardly and efficiently are still scarce.
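The definition of EMUs as non-empty atom subsets can be made concrete with a short enumeration. The following is a minimal sketch, not part of EMUlator itself; the function name `enumerate_emus` and the tuple representation of an EMU are assumptions chosen for illustration.

```python
from itertools import combinations

def enumerate_emus(metabolite, n_atoms):
    """Return {size: [(metabolite, atom_positions), ...]} for all EMUs.

    An EMU is any non-empty subset of the metabolite's atoms, so a
    metabolite with n atoms has 2**n - 1 EMUs in total.
    """
    atoms = range(1, n_atoms + 1)
    return {
        size: [(metabolite, subset) for subset in combinations(atoms, size)]
        for size in range(1, n_atoms + 1)
    }

# Glutamate has five carbon atoms (illustrative example)
glu_emus = enumerate_emus("Glu", 5)
n_emus = sum(len(group) for group in glu_emus.values())
print(n_emus)        # total number of EMUs of glutamate
print(glu_emus[5])   # the single size-5 EMU, i.e. Glu_12345
```

Only a small fraction of these EMUs is usually needed to simulate a given measurement, which is where the reduction in variables described above comes from.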

Here, we develop a new computational toolbox for steady-state metabolic modeling analysis, the EMUlator, which accomplishes EMU modeling through an adjacency matrix-based approach. In graph theory, an adjacency matrix is used to represent a graph quantitatively; its elements indicate the connectivity of the vertex pairs given by the rows and columns. Essentially, a metabolic network is a directed graph with branches, so it can be transformed into an adjacency matrix for ease of programming. Utilizing the adjacency matrix, the EMUlator can efficiently simulate isotope distributions for <sup>13</sup>C-MFA. To demonstrate its functionality, we decompose the EMUs for isotope simulation in a tricarboxylic acid (TCA) cycle, which represents a realistic metabolic network. Furthermore, we applied this newly developed software to model and analyze the flux of the phosphoketolase pathway in Clostridium acetobutylicum xylose catabolism. By decomposing the network and simulating metabolite isotopomer patterns, we found a good correlation between phosphoketolase flux and the fractional labeling of acetate, which has not previously been characterized in an isotope tracer experiment. Coupled with GC-MS analysis of acetate, this EMUlator-enabled analysis leads to a novel and high-throughput methodology for quantitatively understanding the phosphoketolase pathway in response to environmental and genetic perturbation. As exemplified, EMUlator aims to be a universal and powerful tool for isotope tracer modeling and for gaining a quantitative understanding of cell metabolism. The software and its instructions are available in the **Supplementary Information**.

#### RESULTS

#### Overview of the *EMUlator* Pipeline

The EMUlator pipeline is written in Python and is capable of performing a complete isotope simulation and prediction of a metabolic network for <sup>13</sup>C-MFA. Previous tools, such as Metran (Antoniewicz et al., 2006, 2007a), OpenFlux (Quek et al., 2009; Shupletsov et al., 2014) and INCA (Young, 2014), are able to perform such modeling; however, they are based on the Matlab platform, which is not an open and free computing environment. In addition, previous EMU modeling was substantially less transparent than the one we present here. A key distinguishing feature of EMUlator is its use of the adjacency matrix. This gives a graphical expression of the algorithm that can be understood intuitively and implemented iteratively. In particular, EMUlator provides a more detailed and principled procedure of EMU modeling, which decomposes the metabolic network into EMU reactions, sets up EMU balances and simulates labeling distributions. Most importantly, the EMU decomposition reduces the metabolic network model to a smaller set of EMU reactions that preserves all the information contained in it but decreases running time significantly.

#### *EMUlator* Transforms Metabolic Network Into Adjacency Matrix

To illustrate the algorithm in a way that allows direct comparison, we use a TCA cycle model as an example, because this representative metabolic network was also used in the original EMU work (Antoniewicz et al., 2007a). In this network, aspartate and acetyl coenzyme A (AcCoA) are the substrates, while CO<sub>2</sub> and glutamate are the final products. Reactions with carbon atom transitions are listed in **Figure 1**. As a directed graph, any metabolic network with branches (due to cleavage and condensation reactions) can be transformed into a metabolite adjacency matrix (MAM). All metabolites are listed along both the row and column coordinates, thus forming a square matrix. Row metabolites appear as reactants, while column metabolites are the products of each reaction. The element determined by a row and a column holds the reactions that connect that reactant to that product. An element may hold more than one reaction, because the same reactant and product can be involved in different reactions. With this layout, inputs and outputs of the network are easy to identify. Columns without elements are identified as substrates since they have no precursors (i.e., columns for AcCoA and aspartate in dashed red boxes, **Figure 2**), while rows without elements are identified as final products (i.e., rows for CO<sub>2</sub> and glutamate in dashed green boxes, **Figure 2**). Overall, the MAM reflects the connectivity of metabolites in a network.
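As an illustration of the MAM construction described above, the sketch below builds the matrix as a nested dictionary and reads off substrates and final products from empty columns and rows. It is a minimal sketch under stated assumptions: the reaction list is assumed from the simplified TCA cycle example (reaction v4, AKG to Suc and CO2, is not spelled out in the text), and the names `build_mam` and `substrates_and_products` are hypothetical, not EMUlator's API.

```python
# Assumed reaction list for the simplified TCA cycle: id -> (reactants, products)
REACTIONS = {
    "v1": (["OAA", "AcCoA"], ["Cit"]),
    "v2": (["Cit"], ["AKG", "CO2"]),
    "v3": (["AKG"], ["Glu"]),
    "v4": (["AKG"], ["Suc", "CO2"]),
    "v5": (["Suc"], ["Fum"]),
    "v6f": (["Fum"], ["OAA"]),
    "v6b": (["OAA"], ["Fum"]),
    "v7": (["Asp"], ["OAA"]),
}

def build_mam(reactions):
    """Build the metabolite adjacency matrix: mam[reactant][product] = [reaction ids]."""
    mets = sorted({m for reac, prod in reactions.values() for m in reac + prod})
    mam = {row: {col: [] for col in mets} for row in mets}
    for rid, (reactants, products) in reactions.items():
        for r in reactants:
            for p in products:
                mam[r][p].append(rid)
    return mets, mam

def substrates_and_products(mets, mam):
    """Empty columns are network substrates; empty rows are final products."""
    substrates = [c for c in mets if not any(mam[r][c] for r in mets)]
    finals = [r for r in mets if not any(mam[r][c] for c in mets)]
    return substrates, finals

mets, mam = build_mam(REACTIONS)
substrates, finals = substrates_and_products(mets, mam)
print("substrates:", substrates)
print("final products:", finals)
print("OAA -> Cit via", mam["OAA"]["Cit"])
```

Storing a list of reaction identifiers in each element keeps the case where one reactant/product pair is connected by several reactions.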

### *EMUlator* Decomposes MAM Into EMU Adjacency Matrix (EAM)

FIGURE 1 | Simplified tricarboxylic acid (TCA) cycle to illustrate adjacency matrix-based EMU decomposition. Reactions involved in the metabolic model are listed on the right. Lowercase letters in brackets demonstrate atom transition in each reaction. Decimals indicate EMU equivalents due to rotation axis within molecule. AcCoA, acetyl coenzyme A; AKG, α-ketoglutarate; Asp, aspartate; Cit, citrate; Fum, fumarate; Glu, glutamate; OAA, oxaloacetate; Suc, succinate; subscript f, forward reaction; subscript b, backward reaction.

EMU decomposition of a metabolic network can start at the size of the EMU(s) that need to be simulated. In this example, we simulate the Mass Distribution Vector (MDV, the fractional abundance of each isotopolog normalized to the sum of all possible isotopologs) (Nanchen et al., 2007) of glutamate Glu<sub>12345</sub> (size 5). All EMU reactions that are needed for this simulation can be identified iteratively via the MAM. Glu<sub>12345</sub>, as a product, is found in a column of the EAM of size 5 (see **Figure 3A**), and its precursor AKG<sub>12345</sub> appears in the corresponding row through reaction v<sub>3</sub>. Similarly, for the product AKG<sub>12345</sub>, we can locate its precursor Cit<sub>12345</sub> via v<sub>2</sub>. Lastly, for the product (in column) Cit<sub>12345</sub>, we identify reaction v<sub>1</sub>, in which both OAA<sub>234</sub> and AcCoA<sub>12</sub> are the reactants. Since EMUs of smaller size are identified in condensation reactions, they are used as new starting points for the search. Therefore, we follow OAA<sub>234</sub> (AcCoA<sub>12</sub> is identified as an EMU of a substrate, so the search stops there) and search for all other EMUs of size 3 (**Figure 3B**). When an EMU has multiple precursors [e.g., due to equivalent EMUs (Antoniewicz et al., 2007a)], all of them can be identified through breadth-first search. In this way, the adjacency matrix provides a straightforward and iterative path for tracing back EMUs of smaller sizes until the EMUs of the network substrates are reached (i.e., aspartate and AcCoA in this example). Once the full set of EMUs is obtained, the EMUs can be arranged into different EAMs according to size. As in the MAM, the row and column coordinates of an EAM correspond to reactants and products of each reaction, respectively, with two differences: EMUs subject to convolution also appear in the rows of the EAM, and the coefficient of an element equals the stoichiometric coefficients of the corresponding reactant and product. Complete EAMs after EMU decomposition of the example are shown in **Figure 3**.
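The backward search described here can be sketched as a breadth-first traversal over atom mappings. The code below is an illustration under stated assumptions rather than EMUlator's implementation: the atom mappings are assumed from the simplified TCA cycle (they reproduce the precursors named in the text, such as Cit<sub>12345</sub> from OAA<sub>234</sub> and AcCoA<sub>12</sub>), the rotational symmetry of succinate and fumarate is deliberately ignored, and `precursors` and `decompose` are hypothetical helper names.

```python
from collections import deque

# Assumed atom mappings (letters track individual carbons through a reaction).
# Symmetry of Suc and Fum is ignored to keep the sketch short.
REACTIONS = {
    "v1": ([("OAA", "abcd"), ("AcCoA", "ef")], [("Cit", "dcbfea")]),
    "v2": ([("Cit", "abcdef")], [("AKG", "abcde"), ("CO2", "f")]),
    "v3": ([("AKG", "abcde")], [("Glu", "abcde")]),
    "v4": ([("AKG", "abcde")], [("Suc", "bcde"), ("CO2", "a")]),
    "v5": ([("Suc", "abcd")], [("Fum", "abcd")]),
    "v6f": ([("Fum", "abcd")], [("OAA", "abcd")]),
    "v6b": ([("OAA", "abcd")], [("Fum", "abcd")]),
    "v7": ([("Asp", "abcd")], [("OAA", "abcd")]),
}
SUBSTRATES = {"Asp", "AcCoA"}

def precursors(emu):
    """List (reaction id, precursor EMUs) for every reaction producing `emu`."""
    metabolite, positions = emu
    found = []
    for rid, (reactants, products) in REACTIONS.items():
        for prod_name, prod_map in products:
            if prod_name != metabolite:
                continue
            # Letters carried by the requested atom positions of the product
            letters = {prod_map[i - 1] for i in positions}
            parts = []
            for reac_name, reac_map in reactants:
                pos = tuple(i + 1 for i, ch in enumerate(reac_map) if ch in letters)
                if pos:
                    parts.append((reac_name, pos))
            found.append((rid, tuple(parts)))
    return found

def decompose(target):
    """Breadth-first search from the target EMU back to substrate EMUs."""
    seen, queue, emu_reactions = {target}, deque([target]), []
    while queue:
        emu = queue.popleft()
        if emu[0] in SUBSTRATES:
            continue  # substrate EMUs terminate the search
        for rid, parts in precursors(emu):
            emu_reactions.append((rid, parts, emu))
            for part in parts:
                if part not in seen:
                    seen.add(part)
                    queue.append(part)
    return emu_reactions, seen

emu_reactions, emus = decompose(("Glu", (1, 2, 3, 4, 5)))
for size in sorted({len(pos) for _, pos in emus}):
    print(size, sorted(e for e in emus if len(e[1]) == size))
```

Grouping the discovered EMUs by size, as in the final loop, corresponds to arranging them into one EAM per size.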

### *EMUlator* Significantly Reduces the Size of EAMs

EMUlator can also reduce the scale of an EAM. In steady-state isotope modeling, the labeling pattern can only be modified by the convolution of two or more metabolites, so unimolecular reactions may be lumped without affecting the simulation, which helps to reduce the scale of the EAM. Unimolecular reactions can be easily identified in an EAM as those with a solo element in a column, which means that the corresponding product has only a single source. Those columns are deleted, and the identical metabolites in the rows are all renamed by their precursors. For example, in the EAM of size 1, after the AKG<sub>3</sub> column is eliminated, AKG<sub>3</sub> in the row is renamed as Cit<sub>3</sub>, which produces AKG<sub>3</sub> by reaction v<sub>2</sub> (**Figure 4**). Moreover, since Cit<sub>3</sub> also comes from a unimolecular reaction (v<sub>1</sub>), we eliminate that column as well, and Cit<sub>3</sub> in the row is renamed by its precursor OAA<sub>2</sub> via v<sub>1</sub>. The deletion of columns and renaming of the corresponding rows continue until there is no solo-element column left in the EAM. Finally, multiple rows with an identical metabolite name are combined, indicating that the same reactant and product are connected by different reactions.
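The lumping rule can be sketched on a toy size-1 EAM. In this illustration the EAM is stored as a dictionary of columns (product EMU mapped to its source EMUs and the fluxes connecting them); this layout, the function name `lump`, and the omission of symmetry-related equivalents are assumptions made for brevity. A full implementation would also need to protect the EMUs that are to be simulated from being eliminated.

```python
def lump(eam):
    """Eliminate solo-element columns and rename their rows by the precursor.

    eam: {product_emu: {source_emu: [flux labels]}}, i.e. one dict per column.
    """
    eam = {prod: {src: list(fl) for src, fl in srcs.items()}
           for prod, srcs in eam.items()}
    while True:
        # A column is lumpable if it holds exactly one element with one reaction
        solo = next((p for p, s in eam.items()
                     if len(s) == 1 and len(next(iter(s.values()))) == 1), None)
        if solo is None:
            return eam
        (precursor, _), = eam.pop(solo).items()
        # Rename every row entry of the removed EMU by its precursor;
        # rows that now share a name are combined in the same step
        for srcs in eam.values():
            if solo in srcs:
                srcs.setdefault(precursor, []).extend(srcs.pop(solo))

# Toy size-1 EAM (symmetry ignored): column -> {row: [fluxes]}
eam_size1 = {
    "AKG3": {"Cit3": ["v2"]},
    "Cit3": {"OAA2": ["v1"]},
    "Suc2": {"AKG3": ["v4"]},
    "Fum2": {"Suc2": ["v5"], "OAA2": ["v6b"]},
    "OAA2": {"Fum2": ["v6f"], "Asp2": ["v7"]},
}
reduced = lump(eam_size1)
print(reduced)
```

In this toy case the AKG<sub>3</sub>, Cit<sub>3</sub> and Suc<sub>2</sub> columns disappear, and Fum<sub>2</sub> ends up with OAA<sub>2</sub> as a source through both v<sub>6b</sub> and v<sub>5</sub>.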

### *EMUlator* Identifies and Combines Equivalent EMUs

Rotationally symmetric molecules (e.g., fumaric acid and succinic acid) give rise to equivalent EMUs (Antoniewicz et al., 2007a), which are indistinguishable to enzymes and react identically. EMUlator can combine those EMUs, as they have the same probability of acquiring a given labeling pattern. Metabolites that can generate equivalent EMUs are indicated with fractional carbon atoms in **Figure 1**. Fum<sub>2</sub> and Fum<sub>3</sub> are equivalent EMUs of size 1 in this example (**Figure 5**). Element coefficients of row equivalent EMUs are combined, while coefficients of column equivalent EMUs are combined and divided by the number of equivalents. Eventually, the number of EMU variables was reduced from 24 in the initial EMU model to 9 after lumping unimolecular reactions and combining equivalent EMUs, yielding the same results as the original EMU work (Antoniewicz et al., 2007a).

### *EMUlator* Establishes EMU Balances From EAMs

To simulate the labeling pattern of a given metabolite, EMU balances are established from EMU reaction subnetworks of different size, and MDVs can be calculated according to:

$$X_i = A_i^{-1} \cdot B_i \cdot Y_i$$

where A<sub>i</sub> and B<sub>i</sub> are matrix functions of the flux variables for size i. A<sub>i</sub> is a square matrix whose shape depends on the number m of EMU balances of the current size. The shape of B<sub>i</sub> is m × n, where n is the number of available EMU variables (Antoniewicz et al., 2007a). Simulation starts from size 1, where Y<sub>1</sub> contains the MDVs of the network substrates. The other MDVs are then calculated and used in Y of larger sizes. A<sub>i</sub> and B<sub>i</sub> can be easily deduced from the EAM, as demonstrated in **Figure 6**. The diagonal elements of the EAM are first set to the negative sum of the other elements of the current column. The transposed upper square submatrix then gives A<sub>1</sub>, and the lower submatrix gives B<sub>1</sub> after transposition and negation of all elements. The transformation follows from the isotopomer balance, which states that the sum of all influxes to an EMU multiplied by its MDV (Σv<sub>i</sub> · MDV) is equal to the sum of the products of each influx v<sub>i</sub> and the MDV of its source, Σ(v<sub>i</sub> · MDV<sub>i</sub>). The diagonal element thus represents the total influx of the corresponding EMU. Multiplied by the MDV of the balanced EMU, it is equal to the sum over all labeling pattern sources, including both unknown (X<sub>i</sub>) and known (Y<sub>i</sub>) EMU variables. EMU balances of larger sizes can be established likewise until Glu<sub>12345</sub> is eventually simulated. EMU balances of all sizes are shown below:

EMU balance of size 1

$$
\begin{bmatrix}
-\nu_{6b} - \nu_{5} & 0.5\nu_{6b} + 0.5\nu_{5} & 0.5\nu_{6b} \\
\nu_{6f} & -\nu_{6f} - \nu_{7} & 0 \\
\nu_{6f} & 0 & -\nu_{6f} - \nu_{7}
\end{bmatrix} \cdot \begin{bmatrix}
MDV_{Fum_{2}} \\
MDV_{OAA_{2}} \\
MDV_{OAA_{3}}
\end{bmatrix} = \begin{bmatrix}
0 & 0 & -0.5\nu_{5} \\
-\nu_{7} & 0 & 0 \\
0 & -\nu_{7} & 0
\end{bmatrix} \cdot \begin{bmatrix}
MDV_{Asp_{2}} \\
MDV_{Asp_{3}} \\
MDV_{AcCoA_{2}}
\end{bmatrix}
$$

EMU balance of size 2

$$
\begin{bmatrix}
-\nu_{6b} - \nu_{5} & \nu_{6b} \\
\nu_{6f} & -\nu_{6f} - \nu_{7}
\end{bmatrix} \cdot \begin{bmatrix}
MDV_{Fum_{23}} \\
MDV_{OAA_{23}}
\end{bmatrix} = \begin{bmatrix}
0 & -\nu_{5} \\
-\nu_{7} & 0
\end{bmatrix} \cdot \begin{bmatrix}
MDV_{Asp_{23}} \\
MDV_{OAA_{2}} \times MDV_{AcCoA_{2}}
\end{bmatrix}
$$
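In the EMU framework, the × between two MDVs denotes a convolution (Cauchy product): the MDV of a condensation product is obtained by combining the mass distributions of its two precursor EMUs. A minimal sketch follows; the function name `convolve` and the example MDV values are assumptions chosen only for illustration.

```python
def convolve(mdv_a, mdv_b):
    """Cauchy product of two MDVs; the result has one entry per mass shift."""
    out = [0.0] * (len(mdv_a) + len(mdv_b) - 1)
    for i, a in enumerate(mdv_a):
        for j, b in enumerate(mdv_b):
            out[i + j] += a * b  # i labeled atoms from A plus j from B
    return out

# Hypothetical size-1 MDVs given as [M+0, M+1]
mdv_oaa2 = [0.5, 0.5]
mdv_accoa2 = [0.8, 0.2]
mdv_product = convolve(mdv_oaa2, mdv_accoa2)  # size-2 MDV [M+0, M+1, M+2]
print(mdv_product)
```

Because both inputs sum to one, the convolved MDV also sums to one, so the result can be used directly as a known variable in the balance of the next size.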

EMU balance of size 3

$$
\begin{bmatrix}
-\nu_{6b} - \nu_{5} & 0.5\nu_{6b} & 0.5\nu_{6b} \\
\nu_{6f} & -\nu_{6f} - \nu_{7} & 0 \\
\nu_{6f} & 0 & -\nu_{6f} - \nu_{7}
\end{bmatrix} \cdot \begin{bmatrix}
MDV_{Fum_{123}} \\
MDV_{OAA_{123}} \\
MDV_{OAA_{234}}
\end{bmatrix} = \begin{bmatrix}
-0.5\nu_{5} & -0.5\nu_{5} & 0 & 0 \\
0 & 0 & -\nu_{7} & 0 \\
0 & 0 & 0 & -\nu_{7}
\end{bmatrix} \cdot \begin{bmatrix}
MDV_{OAA_{2}} \times MDV_{AcCoA_{12}} \\
MDV_{OAA_{23}} \times MDV_{AcCoA_{2}} \\
MDV_{Asp_{123}} \\
MDV_{Asp_{234}}
\end{bmatrix}
$$

EMU balance of size 5

$$[-\nu_{3}] \cdot \left[ MDV_{Glu_{12345}} \right] = [-\nu_{3}] \cdot \left[ MDV_{OAA_{234}} \times MDV_{AcCoA_{12}} \right]$$
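To show how a balance of the form A·X = B·Y is evaluated numerically, the sketch below solves the size-1 balance with a small Gauss-Jordan routine. Everything numeric here is an assumption for illustration: the flux values are arbitrary steady-state values, the substrate MDVs are hypothetical, and `solve` and `matmul` are plain-Python helpers rather than EMUlator functions.

```python
def solve(A, R):
    """Solve A @ X = R by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(x) for x in A[i]] + [float(x) for x in R[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def matmul(P, Q):
    """Plain-Python matrix product."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

# Assumed steady-state fluxes (arbitrary units), not taken from the article
v5, v6f, v6b, v7 = 50.0, 125.0, 75.0, 50.0

# Size-1 balance; unknowns ordered as Fum2, OAA2, OAA3
A1 = [[-v6b - v5, 0.5 * v6b + 0.5 * v5, 0.5 * v6b],
      [v6f, -v6f - v7, 0.0],
      [v6f, 0.0, -v6f - v7]]
B1 = [[0.0, 0.0, -0.5 * v5],
      [-v7, 0.0, 0.0],
      [0.0, -v7, 0.0]]
# Hypothetical substrate MDVs [M+0, M+1] for Asp2, Asp3 and AcCoA2
Y1 = [[1.0, 0.0], [1.0, 0.0], [0.5, 0.5]]

X1 = solve(A1, matmul(B1, Y1))
for name, mdv in zip(["Fum2", "OAA2", "OAA3"], X1):
    print(name, [round(m, 4) for m in mdv])
```

Each simulated MDV sums to one, as expected, because every row of A<sub>1</sub> and the matching row of B<sub>1</sub> sum to the same value. The resulting size-1 MDVs would then enter the size-2 and size-3 balances through convolution with the AcCoA EMUs.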

### *EMUlator* Designs Optimal <sup>13</sup>C-Tracer Experiment for Quantifying Metabolic Flux of Interest

To demonstrate the performance of EMUlator in a large-scale setting that reflects the complexity of realistic cell metabolism, we next simulated isotope distributions in a larger network model of C. acetobutylicum. C. acetobutylicum is a solventogenic clostridium and represents a promising chassis microbe capable of utilizing lignocellulose-derived pentose sugars (e.g., xylose) for biofuel production (Mitchell, 1998; Gu et al., 2011). In this case, pentose catabolism through the phosphoketolase pathway (Grimmler et al., 2010; Servinsky et al., 2010; Liu et al., 2012) is of special interest, because this pathway has been recognized as a key target for constructing synthetic pathways (e.g., non-oxidative glycolysis) that bypass CO<sub>2</sub> loss via pyruvate decarboxylase and thus enhance carbon yield in final products (Bogorad et al., 2013). Here we used the adjacency matrix-based EMUlator to simulate labeling patterns of metabolites and show how it facilitates the selection of the best isotope substrates and readouts for quantifying the in vivo phosphoketolase activity.

FIGURE 4 | Reduced EAM of size 1. Column metabolites with only one element can be lumped because they have a solo influx. These columns can therefore be eliminated, and their corresponding row metabolites are replaced by their precursors. Elimination and replacement occur iteratively until no solo-element column exists. Rows with an identical metabolite are combined.

First, a biochemical network for xylose metabolism of C. acetobutylicum was constructed based on the genome information (Nölling et al., 2001; Bao et al., 2011). As shown in **Figure 7A**, after phosphorylation, xylose can be metabolized either through the non-oxidative pentose phosphate pathway or cleaved by phosphoketolase to form acetyl-phosphate and glyceraldehyde-3-phosphate. Acetyl-phosphate is further directed to generate extracellular fermentative products (i.e., acetate, ethanol, acetone, butanol, butyrate), and glyceraldehyde-3-phosphate can enter the pentose phosphate pathway, in which reactions are highly reversible due to the nature of isomerase, epimerase, transketolase, and transaldolase. The oxidative pentose phosphate pathway is not considered, because it was verified to be inactive in C. acetobutylicum (Au et al., 2014). The TCA cycle is not included, as it does not influence the labeling patterns of the upstream metabolites.

We selected acetate (AC), ethanol, 3-phosphoglycerate, erythrose-4-phosphate and ribose-5-phosphate as candidate readouts for reflecting phosphoketolase activity, since the MDVs of these metabolites can be obtained experimentally, either by direct determination or by derivation from amino acid MDVs (Nanchen et al., 2007). Meanwhile, we tested all commercially available xylose tracers: 1-<sup>13</sup>C xylose, 2-<sup>13</sup>C xylose, 3-<sup>13</sup>C xylose, 4-<sup>13</sup>C xylose, 5-<sup>13</sup>C xylose, 1,2-<sup>13</sup>C xylose, and U-<sup>13</sup>C xylose. MDVs were simulated using EMUlator for all possible combinations of these candidate readouts and tracers. The goodness of correlation between Fractional Labeling (FL) (defined in Materials and Methods) and the flux ratio through phosphoketolase (i.e., v<sub>2</sub>/v<sub>1</sub> in **Figure 7A**), together with the range of effective FL, was used as the selection criterion. The modeling results are shown in **Figure 8**.
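For readers unfamiliar with FL, the sketch below computes it from an MDV. The exact definition used in this study is given in Materials and Methods; here we assume the common definition, FL = Σ(i · m<sub>i</sub>) / n for an EMU with n atoms, so the formula, the function name `fractional_labeling` and the example MDV are all assumptions for illustration.

```python
def fractional_labeling(mdv):
    """Fractional labeling of an EMU from its MDV [M+0, M+1, ..., M+n].

    Assumed definition: the average fraction of labeled atoms,
    FL = sum(i * m_i) / n, where n is the number of atoms in the EMU.
    """
    n = len(mdv) - 1
    if n < 1 or abs(sum(mdv) - 1.0) > 1e-6:
        raise ValueError("MDV must have at least two entries and sum to 1")
    return sum(i * m for i, m in enumerate(mdv)) / n

# Hypothetical MDV of a two-carbon EMU such as AC_12
fl_ac12 = fractional_labeling([0.5, 0.1, 0.4])
print(round(fl_ac12, 3))
```

Under this definition, a fully unlabeled EMU has FL = 0 and a fully labeled one has FL = 1, which makes FL values of EMUs of different sizes directly comparable.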

Among the various xylose tracers, 1,2-<sup>13</sup>C xylose yields the EMU AC<sub>12</sub> with the best correlation between its FL and the phosphoketolase flux ratio (Spearman correlation coefficient = 0.7). The widest effective range of FL (slope of regression line = 0.43) indicates good sensitivity to the flux ratio of phosphoketolase (**Figure 8A**). The EMUs AC<sub>1</sub> and AC<sub>2</sub> show a correlation between FL and flux ratio identical to that of AC<sub>12</sub> (**Figures 8B,C**), because the carbons of acetate originate from C2 and C1 of xylose, which are both labeled (and probably also from C4 and C5 of xylose by way of AcCoA), and behave equivalently in the atom transition. The FL of G3P<sub>123</sub> could have a good correlation with the phosphoketolase flux ratio, which is, however, disturbed by many randomly distributed points (**Figure 8E**). This is probably due to the reversibility of the reactions in the glycolytic and pentose phosphate pathways. EtOH<sub>12</sub>, E4P<sub>1234</sub> and R5P<sub>12345</sub> all show poor correlation, and thus cannot be used as indicators of phosphoketolase flux (**Figures 8D,F,G**). As for the other tracers, 1-<sup>13</sup>C xylose and 2-<sup>13</sup>C xylose labeling result in poor correlation in AC<sub>1</sub> and AC<sub>2</sub>, respectively (**Supplementary Figures 1, 2**). AC<sub>12</sub> and EtOH<sub>12</sub> are totally unlabeled with the 3-<sup>13</sup>C xylose tracer, as no C3 of xylose is transferred into these metabolites. G3P<sub>123</sub> shows a good correlation with phosphoketolase flux, but the range of effective FL is too small to determine phosphoketolase activity (**Supplementary Figure 3**). If 4-<sup>13</sup>C xylose or 5-<sup>13</sup>C xylose is used as the substrate, the correlation between FL and flux ratio is inverted for most of the EMUs. In addition, AC<sub>2</sub> and AC<sub>1</sub> are totally unlabeled with 4-<sup>13</sup>C xylose and 5-<sup>13</sup>C xylose, respectively (**Supplementary Figures 4, 5**). As a control, the FLs of all MDVs are constant when cells are fed with a mixture of U-<sup>13</sup>C labeled and unlabeled xylose (**Supplementary Figure 6**), which is therefore excluded from the isotope tracer selection. Among all tracer/readout combinations, 1,2-<sup>13</sup>C xylose/AC<sub>12</sub> performed the best, and this selection paves the way to the experimental measurement of flux in the phosphoketolase pathway.

Guided by the above simulations, C. acetobutylicum was then cultivated on 5 g L<sup>−1</sup> 1,2-<sup>13</sup>C xylose in the following experiment. Fermentation kinetics, including cell growth and the formation of products over time, are shown in **Supplementary Figure 7**. The flux ratio through the phosphoketolase pathway was then quantified by harvesting the supernatant and determining the isotope pattern of AC with GC-MS. MDVs of AC<sub>1</sub> and AC<sub>12</sub> were measured, and the MDV of AC<sub>2</sub> could be deduced. Accordingly, the FLs of AC<sub>1</sub>, AC<sub>2</sub> and AC<sub>12</sub> were calculated to be 0.462 ± 0.018, 0.469 ± 0.012 and 0.465 ± 0.015, respectively, which is consistent with our prediction that the FLs of all fragments' EMUs should be identical. A distribution of the phosphoketolase flux ratio was obtained from the simulated sample points whose corresponding FLs of AC<sub>1</sub>, AC<sub>2</sub> and AC<sub>12</sub> fell into the measured ranges. The average and 95% confidence interval of the flux ratio through phosphoketolase were estimated as 22.8% and 14.3–36.6%, respectively, under the current conditions (**Figure 7B**).
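The estimation step, keeping simulated sample points whose FL falls within the measured range and summarizing their flux ratios, can be sketched as follows. This is a simplified illustration under several assumptions: only one FL readout is filtered (the study uses AC<sub>1</sub>, AC<sub>2</sub> and AC<sub>12</sub> together), the interval is taken as the 2.5th to 97.5th percentiles of the retained ratios, and the sample points are a hypothetical linear toy relationship rather than EMUlator output.

```python
import statistics

def estimate_flux_ratio(samples, fl_low, fl_high):
    """Summarize flux ratios of simulated points whose FL lies in [fl_low, fl_high].

    samples: iterable of (flux_ratio, simulated_FL) pairs.
    Returns (mean, 2.5th percentile, 97.5th percentile, number of points kept).
    """
    kept = sorted(ratio for ratio, fl in samples if fl_low <= fl <= fl_high)
    if len(kept) < 2:
        raise ValueError("too few simulated points fall in the measured FL range")
    # 39 cut points for n=40: the first is the 2.5th, the last the 97.5th percentile
    cuts = statistics.quantiles(kept, n=40, method="inclusive")
    return statistics.mean(kept), cuts[0], cuts[-1], len(kept)

# Hypothetical simulated points: FL rises linearly with the flux ratio
samples = [(i / 100, 0.3 + 0.5 * (i / 100)) for i in range(100)]
mean_ratio, ci_low, ci_high, n_kept = estimate_flux_ratio(samples, 0.447, 0.483)
print(n_kept, round(mean_ratio, 3), round(ci_low, 3), round(ci_high, 3))
```

With real simulations the relationship between FL and flux ratio is scattered rather than linear, which is why the retained points form a distribution instead of a single value.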

#### DISCUSSION

Although equivalent to isotopomer models, the EMU method significantly reduces the number of variables needed to simulate the labeling patterns of metabolites (a 90% reduction in our C. acetobutylicum case) without any loss of information, which greatly facilitates <sup>13</sup>C flux modeling of realistic, large-scale metabolic networks and dynamic systems (Young et al., 2008). Here we introduce a computational tool, EMUlator, to the metabolic research community. This engine uses an adjacency matrix-based approach that is intuitive and easy to program. A metabolic network can be decomposed into EAMs of different sizes, which can be further reduced by lumping unimolecular reactions and combining equivalent EMUs. For MDV simulation, matrix multiplication starts from the EAM of the smallest size and continues iteratively to larger sizes until the required EMU is simulated. Overall, the computation time for EMUlator to perform an MDV simulation depends on the network complexity, or connectivity, which can be represented by the number of non-zero elements in the EAMs. For a realistic, moderate-sized network such as those shown in the above examples, the time complexity is roughly O(n<sup>3</sup>), where n is the number of EMUs.

EMUlator allows large-scale and efficient isotope modeling. To exemplify its capability, we applied it to a quantitative analysis of the phosphoketolase pathway in the central carbon metabolism of an industrial model microbe (C. acetobutylicum). We simulated the labeling patterns of both intracellular and secreted metabolites using different xylose tracers, and identified the best tracer/readout combination reflecting phosphoketolase activity. <sup>13</sup>C-flux measurement of the phosphoketolase pathway was enabled recently (Liu et al., 2012), in which 3-phosphoglycerate was used as the indicator for phosphoketolase flux in a 1-<sup>13</sup>C-xylose labeling experiment. This design is reasonable, as the FL of 3-phosphoglycerate decreases monotonically with increasing flux through phosphoketolase. The limitation is that 3-phosphoglycerate is not a direct product of phosphoketolase; the flux estimate therefore depends largely on the assumption that the labeling patterns of 3-phosphoglycerate and glyceraldehyde 3-phosphate, the product of phosphoketolase, are identical. In our case, we compared all available xylose tracers and readouts using a more realistic metabolic network that also takes reversible reactions into account. Our results demonstrate that acetate (with EMUs AC<sup>1</sup>, AC<sup>2</sup>, and AC<sup>12</sup>) can be a better readout owing to its strong correlation with phosphoketolase flux and its wide effective range of FL. Indeed, acetate can be derived exclusively from acetyl phosphate, which is produced directly by phosphoketolase. More importantly, acetate is an extracellular metabolite, and its isotope pattern can be easily measured by GC-MS without derivatization. The convenience of measurement without breaking the cells provides a high-throughput and noninvasive method for prompt <sup>13</sup>C-flux estimation, which, to our knowledge, has not been developed previously. This estimation is based on the numerical relationship between the fractional labeling of acetate and the flux ratio through phosphoketolase, which are positively correlated, even though the relationship may not be strictly linear. It should be noted that in the FL range 0.5–0.65, multiple flux ratio values are possible for a given value of FL. This scatter could be due to the reversibility of biochemical reactions and the partial dependency of the EMU basis vectors (Crown and Antoniewicz, 2012), so that not all free fluxes can be uniquely solved using acetate labeling data alone. To obtain a more precise prediction of phosphoketolase activity, advanced multiple regression methods, such as machine learning using the FLs of other relevant EMUs as training features, could be applied.

generated 1,000 times subject to the xylose metabolism network. MDVs of (A) AC<sup>1</sup>, (B) AC<sup>2</sup>, (C) AC<sup>12</sup>, (D) EtOH<sup>12</sup>, (E) G3P<sup>123</sup>, (F) E4P<sup>1234</sup>, and (G) R5P<sup>12345</sup> are simulated using the adjacency matrix-based EMU decomposition method proposed in this work. Metabolite FLs and the flux ratio from PKT are subsequently calculated and plotted. Regression lines and 95% confidence intervals are also plotted.

We believe that EMU modeling based on the adjacency matrix approach opens a number of new possibilities in metabolic network analysis and <sup>13</sup>C-MFA. First, as exemplified, EMUlator can be used to select metabolites as readouts that directly reflect in vivo enzyme activities, and can also be used for tracer simulations (Metallo et al., 2009; Young, 2014), which predict the labeling results of metabolic network models ahead of "wet" experiments, leading to further refinement. More promisingly, EMUlator is being developed toward solving the inverse problem of estimating intracellular fluxes through an optimization search that minimizes the sum of squared residuals between computationally simulated and experimentally determined measurements. With the EMU method, metabolic flux estimation can be further extended to computation-intensive scenarios such as genome-scale networks (Gopalakrishnan and Maranas, 2015) and transient labeling processes (Hendry et al., 2019). This task cannot be accomplished by other isotope modeling methods because of the tremendous computational burden. Currently, we are developing an updated version focusing on the de novo and complete solution of <sup>13</sup>C-MFA, in which a global flux distribution can be estimated from either steady-state or kinetic labeling experiments. It is our hope that EMUlator will benefit the community and fuel metabolic research as the basis for the innovative development of metabolic analysis tools.
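For orientation, the inverse problem mentioned above can be written in a generic, unweighted form as shown below. The notation is introduced here for illustration only and is not taken from EMUlator: v is the flux vector, S is the stoichiometric matrix of the network, and x<sub>k</sub><sup>sim</sup>(v) and x<sub>k</sub><sup>meas</sup> are the simulated and measured values of the k-th labeling measurement. In practice the residuals are commonly weighted by the measurement variances, and further bounds on v may apply.

$$\hat{v} = \arg\min_{v} \sum_{k} \left( x_k^{\mathrm{sim}}(v) - x_k^{\mathrm{meas}} \right)^2 \quad \text{subject to} \quad S\,v = 0$$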

### MATERIALS AND METHODS

### Implementation of EMU Algorithm in *EMUlator*

The TCA cycle example in the Results illustrates the adjacency matrix approach for implementing the EMU algorithm. The software and its instructions are detailed in the **Supplementary Information**.

#### Strain, Culture Conditions, and Medium

Clostridium acetobutylicum ATCC 824 was used in all experiments. For growth studies and biochemical characterization, C. acetobutylicum cells were grown anaerobically at 37°C in CTFUD defined medium (Olson and Lynd, 2012), which contains 3 g L<sup>−1</sup> Na<sub>3</sub>C<sub>6</sub>H<sub>5</sub>O<sub>7</sub>·2H<sub>2</sub>O, 1.3 g L<sup>−1</sup> (NH<sub>4</sub>)<sub>2</sub>SO<sub>4</sub>, 1.5 g L<sup>−1</sup> KH<sub>2</sub>PO<sub>4</sub>, 0.13 g L<sup>−1</sup> CaCl<sub>2</sub>·2H<sub>2</sub>O, 0.5 g L<sup>−1</sup> L-cysteine–HCl, 11.56 g L<sup>−1</sup> MOPS sodium salt, 2.6 g L<sup>−1</sup> MgCl<sub>2</sub>·6H<sub>2</sub>O, 0.001 g L<sup>−1</sup> FeSO<sub>4</sub>·7H<sub>2</sub>O, and 0.5 mL L<sup>−1</sup> resazurin 0.2% (w/v), supplemented with Wolfe's vitamin solution (ATCC). D-xylose was supplied at a concentration of 5 g L<sup>−1</sup> as the carbon source. Cultures were started at an optical density at 600 nm (OD<sub>600</sub>) of 0.05–0.08, and experiments were performed in mid-log phase. For labeling experiments, 1,2-<sup>13</sup>C-labeled xylose (99% pure; Cambridge Isotope Laboratories, Tewksbury, MA) was added to the medium at a concentration of 5 g L<sup>−1</sup>. C. acetobutylicum strains were preserved by freezing log-phase cultures at −80°C with 10% glycerol.

#### Quantitative Analysis of Fermentation Products

Cell growth was monitored by measuring the optical density at 600 nm (OD<sub>600</sub>) with a Spectronic 21D UV-Visible Spectrophotometer (Milton Roy, Houston, TX). To analyze extracellular metabolites, culture samples were centrifuged at 13,000 g for 10 min. After filtration through a 0.2 µm filter, the supernatant was analyzed on an Agilent 1200 high-pressure liquid chromatography (HPLC) system (Agilent Technologies, Santa Clara, CA) equipped with a Bio-Rad Aminex HPX-87H column and a Micro Guard Cation H Cartridge. The mobile phase was 4 mM H<sub>2</sub>SO<sub>4</sub> at a flow rate of 0.6 mL/min. The column temperature was set to 55°C. Metabolites were detected with a refractive index detector and a UV/VIS detector.

#### Isotope Analysis

Labeling patterns of proteinogenic amino acids from cell mass were analyzed by gas chromatography-mass spectrometry (GC-MS) as detailed in Xiong et al. (2018). The labeling pattern of acetate in the supernatant was analyzed directly by GC-MS without derivatization. Samples were analyzed on an Agilent 6890N GC equipped with a 5973 MS detector (Agilent Technologies, Palo Alto, CA). Samples were injected at a volume of 1 µL in splitless mode, and the analyte of interest was separated on a Restek Stabilwax-DA column (Restek Corporation, Bellefonte, PA). A flow of 1 mL min<sup>−1</sup> was held constant throughout the run with the following temperature profile: 35°C, hold for 3 min; ramp at 10°C min<sup>−1</sup> to 225°C, hold for 1 min; ramp at 15°C min<sup>−1</sup> to 250°C, hold for 5 min.

#### Isotope Modeling From the Metabolic Network of *C. acetobutylicum*

Simulations were repeated 1,000 times with metabolic fluxes randomly generated subject to the mass balances determined by the reactions listed in **Supplementary Table 1**. Fractional labeling (FL) was calculated to indicate the labeling status of metabolites according to:

$$FL = \frac{\sum_{i=0}^{n} i \cdot m_i}{n}$$

where n is the number of carbons in an EMU, and m<sub>i</sub> is the i-th component of the MDV, i.e., the fraction of molecules carrying i labeled carbons (Nanchen et al., 2007).
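The FL definition above translates directly into code. The following is a minimal sketch with a hypothetical function name; it assumes the MDV is supplied as a list of n + 1 fractions (m<sub>0</sub> to m<sub>n</sub>) that sum to 1.

```python
def fractional_labeling(mdv):
    """Fractional labeling of an EMU from its mass distribution vector.

    mdv[i] is the fraction of molecules carrying i labeled carbons,
    so an EMU with n carbons has an MDV of length n + 1.
    """
    n = len(mdv) - 1
    if n < 1:
        raise ValueError("an EMU must contain at least one carbon")
    if abs(sum(mdv) - 1.0) > 1e-6:
        raise ValueError("MDV components must sum to 1")
    return sum(i * m for i, m in enumerate(mdv)) / n
```

For a two-carbon EMU such as AC<sup>12</sup>, an MDV of `[0.5, 0.0, 0.5]` (half unlabeled, half fully labeled) gives an FL of 0.5.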

#### REFERENCES


#### AUTHOR CONTRIBUTIONS

CW developed the software and analyzed data. WX and CW designed the experiment. CC, JL, and WM conducted the experiment. WX proposed the idea and research questions, guided all stages of the research. CW and WX prepared the manuscript with editing from PM, JL, and CC.

#### ACKNOWLEDGMENTS

We appreciate Prof. Maciek R. Antoniewicz at the University of Delaware for his helpful suggestion on this work. This work was supported by the National Renewable Energy Laboratory LDRD project 0600.10001.18.42.01 (CW, JL, PM, and WX), the DOE Bioenergy Technology Office project under the agreement number 34715 (CW and WX), and Institute of Nuclear Energy Research in Taiwan ROC (CC).

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmicb.2019.00922/full#supplementary-material

UTEX 2973 Using Transient <sup>13</sup>C-labeling data. Plant Physiol. 179, 761–769. doi: 10.1104/pp.18.01357


modeling software package adjusted for the comprehensive analysis of single and parallel labeling experiments. Microb. Cell. Fact. 13, 152. doi: 10.1186/PREACCEPT-1256381938128538


of a model cellulolytic bacterium clostridium thermocellum. Front. Microbiol. 9:1947. doi: 10.3389/fmicb.2018.01947


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Wu, Chen, Lo, Michener, Maness and Xiong. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Dietary Energy Level Promotes Rumen Microbial Protein Synthesis by Improving the Energy Productivity of the Ruminal Microbiome

#### Zhongyan Lu<sup>1†</sup>, Zhihui Xu<sup>2,3†</sup>, Zanming Shen<sup>1</sup>, Yuanchun Tian<sup>4</sup> and Hong Shen<sup>2,3</sup>\*

<sup>1</sup> The Key Laboratory of Animal Physiology and Biochemistry, College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, China, <sup>2</sup> College of Life Science, Nanjing Agricultural University, Nanjing, China, <sup>3</sup> Bioinformatics Center, Nanjing Agricultural University, Nanjing, China, <sup>4</sup> College of Agriculture, Nanjing Agricultural University, Nanjing, China

#### Edited by:

Garret Suen, University of Wisconsin-Madison, United States

#### Reviewed by:

Shengguo Zhao, Institute of Animal Sciences (CAAS), China; Luciano Takeshi Kishi, São Paulo State University, Brazil; Hilario C. Mantovani, Universidade Federal de Viçosa, Brazil

#### \*Correspondence:

Hong Shen hongshen@njau.edu.cn †These authors have contributed equally to this work

#### Specialty section:

This article was submitted to Systems Microbiology, a section of the journal Frontiers in Microbiology

Received: 29 November 2018 Accepted: 02 April 2019 Published: 17 April 2019

#### Citation:

Lu Z, Xu Z, Shen Z, Tian Y and Shen H (2019) Dietary Energy Level Promotes Rumen Microbial Protein Synthesis by Improving the Energy Productivity of the Ruminal Microbiome. Front. Microbiol. 10:847. doi: 10.3389/fmicb.2019.00847

Improving the yield of rumen microbial protein (MCP) is important for promoting animal performance and reducing protein feed waste. The amount of energy supplied to rumen microorganisms is an important factor affecting the amount of protein nitrogen incorporated into rumen MCP. Substrate-level phosphorylation (SLP) and electron transport phosphorylation (ETP) are the two major mechanisms of energy generation within microbial cells. However, how the energy and protein levels in the diet affect the energy productivity of the ruminal microbiome and, in turn, rumen MCP yields is not yet known. In the present study, we investigated, by animal experiments and metagenome shotgun sequencing, the effects of energy-rich and protein-rich diets on rumen MCP yields, as well as on the SLP-coupled and ETP-coupled energy productivity of the ruminal microbiome. We found that an energy-rich diet induces a significant increase in rumen MCP yield, whereas a protein-rich diet has no significant impact on it. Based on 10 reconstructed pathways related to the energy metabolism of the ruminal microbiome, we determined that the energy-rich diet induces significant increases in the total abundance of SLP enzymes coupled to nicotinamide adenine dinucleotide (NADH) oxidation in glucose fermentation and of the F-type ATPase of the electron transport chain, whereas the protein-rich diet has no significant impact on the abundance of these enzymes. At the species level, the energy-rich diet induces significant increases in the total abundance of 15 ETP-related genera and 40 genera that have SLP-coupled fermentation pathways, whereas the protein-rich diet has no significant impact on the total abundance of these genera. Our results suggest that an increase in dietary energy level promotes rumen energy productivity and MCP yield by improving the levels of ETP and of SLP coupled to glucose fermentation in the ruminal microbiome, whereas an increase in dietary protein level has no such effects.

Keywords: rumen microbiome, energy productivity, substrate-level phosphorylation, electron transport phosphorylation, microbial protein synthesis, dietary modulation

## INTRODUCTION

fmicb-10-00847 April 15, 2019 Time: 17:41 # 2

Dietary protein for ruminants includes nitrogen (N) occurring in true protein and in non-protein compounds. In the rumen, true protein is degraded into amino acids (AAs) and ammonia, which are then utilized by ruminal microorganisms to synthesize microbial protein (MCP). In the small intestine, more than 80% of rumen MCP is digested, accounting for 50–80% of the total absorbable protein contained there (Tas et al., 1981; Storm et al., 1983). In dairy cows, rumen MCP provides all of the AAs needed for milk protein synthesis (Virtanen, 1966). Because of the high digestibility and good AA composition of MCP, increasing the MCP yield in the rumen is important for promoting animal performance. Moreover, increasing the MCP yield is the most effective strategy for reducing protein feed waste in livestock, since dietary protein that exceeds the requirement of ruminal microorganisms is degraded to ammonia in the rumen, metabolized to urea in the liver, and lost in the urine.

Previous studies have shown that the source and amount of fed carbohydrate are the major factors affecting the energy, in the form of adenosine triphosphate (ATP), available for rumen microbial growth (synthesis of MCP in particular), and that the source and amount of fed protein affect the production of microbial dry matter (DM) per unit of carbohydrate fermented (Maeng et al., 1976; Hoover and Stokes, 1991). In the dairy cow, a 12–13% protein content is needed to maximize the ruminal synthesis of MCP (Satter and Slyter, 1974; Satter and Roffler, 1975). More protein N is incorporated into rumen MCP only if more non-fiber carbohydrate (NFC), known to be the major energy substrate for ruminal microorganisms, is fed to the animals (Schwab et al., 2005). Microbiological studies have revealed that, once an extracellular AA is transported into the microbial cell, its fate depends on the availability of ATP within the cell. If ATP is available, the AA is transaminated or used directly for MCP synthesis. However, if ATP is limited, the AA is deaminated into ammonia and excreted from the cytoplasm (Tamminga, 1979). Accordingly, when dietary protein meets the requirement of ruminal microorganisms, increasing the content of dietary NFC promotes rumen MCP yields by improving the amount of energy supplied to the ruminal microorganisms, whereas increasing the content of dietary protein may have no benefit for rumen MCP yields, since a protein-rich diet might not meet the energy requirement of ruminal microorganisms. So far, much attention has been paid to the ratio of dietary NFC to protein that is optimal for MCP yields (Casper and Schingoethe, 1989; Cameron et al., 1991; Henning et al., 1991). However, the mechanisms by which the ruminal microbiome produces energy, and the way that an NFC-rich or protein-rich diet, compared with a basal diet, affects the energy productivity of the ruminal microbiota, have not yet been studied.

It is known that ATP is produced in microbial metabolism via two major mechanisms: (1) substrate-level phosphorylation (SLP), and (2) electron transport phosphorylation (ETP). In the former, the high-energy phosphate in the substrate molecule is directly donated to ADP to produce ATP. In the latter, electrons obtained from energy sources are moved by the electron transport chain (ETC) to reduce oxygen (aerobic respiration) or other oxidized components (anaerobic respiration). Meanwhile, the transmembrane electrochemical potential (Δµ<sub>H+</sub>/Δµ<sub>Na+</sub>) generated by the movement of electrons drives the synthesis of ATP in the cell. In the rumen, SLP mainly occurs in glycolysis and in the production of short-chain fatty acids (SCFAs). During these processes, the electrons generated in the degradation of glucose to pyruvate are transferred from nicotinamide adenine dinucleotide (NADH) to produce lactate, ethanol, butanol, and malate (this is referred to as fermentation). ETP (anaerobic respiration) is known to be coupled only with the generation of propionate via the succinate pathway and with methanogenesis (Tamminga, 1979). Based on a comparison of incubation results using carbohydrates (cellobiose and maltose) and pyruvate as substrates, ETP was shown to play a minor part in ATP yields, accounting for 20% of the total ATP yield in carbohydrate fermentation (Demeyer and Van Nevel, 1986). However, recent studies have detected several new ETP-coupled pathways under anaerobic conditions, for example, acetogenesis via the reductive acetyl-CoA pathway (Wood-Ljungdahl pathway) (Schuchmann and Muller, 2014) and butyrate formation via the reduction of crotonyl-CoA (Herrmann et al., 2008). Accordingly, the importance of ETP to ATP productivity in the ruminal microbiome, as well as the effects of fed NFC/protein on ETP- and SLP-coupled ATP productivity, needs to be re-evaluated.

Here, we present the first in vivo investigation of the microbial species, pathways, and enzymes related to energy metabolism in the rumen, plus a systematic comparison of the energy productivity of the ruminal microbiome in goats receiving a protein-rich or an NFC-rich diet with that of goats receiving a basal diet. We first collected the ruminal microbiota from the goats receiving the three treatments, and compared the rumen MCP yields and fermentation parameters between the groups. Subsequently, by applying metagenome shotgun sequencing and bioinformatics analysis, we reconstructed the pathways related to SLP and the pathways related to ETP in the rumen. Next, we examined the enzymes that catalyze SLP in these pathways and the enzymes that belong to the ETC, and picked out the species whose genomes encoded those specific enzymes. Finally, we compared the abundance of the enzymes and the relative abundance of the target species between the groups. By means of the above steps, we aimed to determine the way that dietary protein or NFC affects the energy productivity of the ruminal microbiome and, concomitantly, to re-evaluate the importance of ETP in the ATP productivity of the ruminal microbiome.

#### MATERIALS AND METHODS

#### Ethics Statement

This study was approved by the Animal Care and Use Committee of Nanjing Agricultural University, in compliance with the Regulations for the Administration of Affairs Concerning Experimental Animals (No. 588 Document of the State Council of China, 2011).

#### Animals

Twenty-four goats (Boer × Yangtze River Delta White, aged 4 months) were randomly allocated into three groups and received an NFC-rich diet (G group, n = 8), a protein-rich diet (P group, n = 8), or a basal diet (B group, n = 8). The ingredients and chemical compositions of the experimental diets are listed in **Table 1**.

Goats were housed in individual tie-stall barns and had free access to water. To avoid the selection of dietary components and to maintain the desired ratio, a total mixed ration (TMR) was offered at 0800 and 1700 daily for the 42-day experimental period, with the first 14 days dedicated to adaptation. The feed intake and refusals of individual goats were measured daily during the experiment. The amount of diet offered during days 15–42 was adjusted on a weekly basis to allow about 10% orts. Feeds were sampled at the beginning and end of the experiment. The DM, ash, crude fat, and crude protein contents of the samples were analyzed according to the procedures of AOAC (Cunniff and AOAC International, 1997). The acid detergent fiber (ADF) and neutral detergent fiber (NDF) values of the samples were analyzed according to the procedures of Van Soest et al. (1991).

### Sample Collection, Microbial DNA Extraction, and Metagenome Shotgun Sequencing

Urine samples were collected from 1200 on day 41 to 1200 on day 42 via indwelling catheters connected to containers containing 50% HCl, and then stored at −20°C for the later analysis of rumen MCP. All goats were killed at the local slaughterhouse 6 h after receiving their morning feed on day 43. Ruminal contents were strained through 4-layer cheesecloth and immediately subjected to pH measurement. An aliquot (10 mL) of ruminal fluid was immediately stored at −20°C for the extraction of metagenomic DNA. Another aliquot (10 mL) of ruminal fluid was mixed with 1 mL of 5% HgCl<sub>2</sub> solution to inactivate the microbial proteases and subsequently stored at −20°C for the determination of volatile fatty acid (VFA) concentrations. A further aliquot (5 mL) of ruminal fluid was stored at −20°C for the determination of ammonia N.

TABLE 1 | Ingredient and chemical composition of the experimental diets in the present study.

DM, dry matter; NFC, non-fiber carbohydrate; NDF, neutral detergent fiber; ADF, acid detergent fiber. <sup>1</sup>The values are means ± SE. <sup>2</sup>Aneurolepidium chinense. <sup>3</sup>The additive was composed of calcium phosphate, limestone, trace mineral salt, and vitamin premix (vitamins A, D, and E). <sup>4</sup>NFC = 100 – (NDF + CP + ether extract + ash).

The metagenomic DNA was extracted from the ruminal fluid by using a Bacterial DNA Kit, following the supplier's instructions (Omega, Shanghai, China). The DNA concentration was determined with a Nanodrop 1000 (Thermo Fisher Scientific, Wilmington, DE, United States). The integrity of the microbial DNA was evaluated on a 1.0% agarose gel. Metagenomic DNA libraries were constructed by using the TruSeq DNA Sample Prep kit (Illumina, San Diego, CA, United States). Libraries were sequenced via paired-end chemistry (PE150) on an Illumina HiSeq X Ten platform (Illumina, San Diego, CA, United States) at Biomarker Technologies, Beijing, China.

### Rumen MCP and Fermentation Parameters Determination and Analysis

The amount of rumen MCP was calculated from purine derivative excretion in the urine by using the method described by Chen and Gomes (1995) on a spectrophotometer (721, INESA analytical instrument Co., LTD, Shanghai, China). The concentration of ruminal VFAs was determined by using a gas chromatograph (HP6890N, Agilent Technologies, Wilmington, DE, United States) as described by Yang et al. (2012). The ammonia N was determined by using a colorimetric method (Weatherburn, 1967) on a spectrophotometer.

The two-tailed t-test was used in the analysis of the rumen MCP amount and rumen fermentation parameters (VFAs concentration, ruminal pH and ammonia N concentration). Differences were considered significant when p < 0.05. These analyses were performed by using SPSS software package (SPSS Inc., Chicago, IL, United States).

### Metagenome Shotgun Sequencing Analysis

#### Genome Assembly

Raw reads were first filtered by using FastX v0.0.13 (Gordon and Hannon, unpublished), with a quality cutoff of 20. Reads shorter than 30 bp were discarded from the sample. The reads that were likely to originate from the host and feeds were removed by using DeconSeq v0.4.3 (Schmieder and Edwards, 2011), with the NCBI goat, corn, soybean, and grass genome sequences as references. The remaining high-quality reads of all samples were taken together and then assembled into scaftigs by using IDBA-UD 1.1.1 (Peng et al., 2012) with the standard parameters.

#### Non-redundant Gene Set Construction

Genes were predicted from the scaftigs by using FragGeneScan 1.31 (Rho et al., 2010). Predicted genes from all samples were gathered together to form a large gene set. BLAT v35 (Kent, 2002) was used to construct the non-redundant gene set. Any two genes with more than 95% identity and more than 90% coverage of the shorter gene were picked out, and subsequently, the shorter one was removed from the large gene set.
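The redundancy rule described above can be sketched as follows. This is an illustration only, not the BLAT-based pipeline used in the study: the input is assumed to be a pre-computed list of pairwise alignments, and the function name, the tuple layout, and the tie-breaking rule for genes of equal length are assumptions made here.

```python
def nonredundant(lengths, hits, min_identity=95.0, min_coverage=0.90):
    """Return the gene IDs kept after removing redundant shorter genes.

    lengths: {gene_id: gene length in bp}
    hits:    iterable of (gene_a, gene_b, percent identity, aligned length in bp)
    A pair is redundant when identity exceeds min_identity and the alignment
    covers more than min_coverage of the shorter gene; the shorter gene is
    then dropped. Equal-length ties are broken by gene ID (an arbitrary choice).
    """
    removed = set()
    for gene_a, gene_b, identity, aligned in hits:
        if gene_a == gene_b:
            continue  # ignore self-alignments
        shorter, _longer = sorted((gene_a, gene_b), key=lambda g: (lengths[g], g))
        coverage = aligned / lengths[shorter]
        if identity > min_identity and coverage > min_coverage:
            removed.add(shorter)
    return {gene for gene in lengths if gene not in removed}
```

The defaults mirror the thresholds stated in the text (more than 95% identity and more than 90% coverage of the shorter gene).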

#### Gene Abundance Calculation

High-quality reads of each sample were mapped to the non-redundant gene set by using Bowtie2 v2.3.4 (Langmead and Salzberg, 2012) with default parameters. MarkDuplicates in the Picard toolkits version 2.0.1 was used to remove the PCR duplicates in the reads, and then, genomeCoverageBed in BEDTools 2.27.0 was employed to calculate the gene coverage. The reads per kilobase of exon model per million mapped reads (RPKM) of the gene, calculated by [gene coverage × 10<sup>6</sup> /(total mapped reads × gene length)], was used to normalize the gene abundance between the treatments.
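The normalization in brackets above can be written as a small function. This sketch rests on two assumptions not spelled out in the text: "gene coverage" is taken as the number of reads assigned to the gene, and the gene length in the denominator is expressed in kilobases (which makes the 10<sup>6</sup> factor consistent with the usual RPKM definition). The function name is hypothetical.

```python
def rpkm(gene_reads, total_mapped_reads, gene_length_bp):
    """Reads per kilobase per million mapped reads.

    Implements gene_reads * 1e6 / (total_mapped_reads * gene_length_kb),
    with the gene length converted from base pairs to kilobases.
    """
    if total_mapped_reads <= 0 or gene_length_bp <= 0:
        raise ValueError("read total and gene length must be positive")
    gene_length_kb = gene_length_bp / 1000
    return gene_reads * 1e6 / (total_mapped_reads * gene_length_kb)
```

For example, 100 reads on a 1,000-bp gene in a sample with one million mapped reads corresponds to an RPKM of 100.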

#### Gene Function Annotation and Pathway Construction

The predicted genes were annotated to the Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology databases via the KEGG Automatic Annotation Server (KAAS) (Moriya et al., 2007), and subsequently mapped to KEGG pathways (**Supplementary Table 1**). The pathways related to SLP and ETP (for details, see Results) were constructed by using Photoshop CS 8.01 on the background of KEGG pathways. Finally, the enzymes related to SLP, ETC components (details see below), and the enzymes catalyzing the oxidation of NADH during glucose fermentation were picked out according to KEGG Orthology (KO) annotation.

#### Taxonomic Annotation and Relative Abundance Calculation

The high-quality reads of each sample were mapped back to the scaftigs by using Bowtie2. PCR duplicates were removed by means of MarkDuplicates. The coverage of scaftigs in each sample was computed by using genomeCoverageBed and normalized to the relative abundance of 1 000 000. The composition-based rank-flexible classifier Epsilon-NB in FCP v1.0.7 (Parks et al., 2011) was used to assign the taxonomy of the scaftigs, with reference to the in-house reference catalog of 12 607 bacterial and 21 537 archaeal complete genomes downloaded from the NCBI RefSeq database<sup>1</sup> . The relative abundance and taxonomic annotation of each genus are listed in **Supplementary Table 2**.
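The scaling of scaftig coverage to a relative abundance of 1,000,000 per sample can be sketched as below. The function name and the dictionary layout are assumptions for illustration; the study's own scripts are not described at this level of detail.

```python
def normalize_per_million(coverage):
    """Scale per-scaftig coverage so that the values of one sample sum to 1,000,000.

    coverage: {scaftig_id: coverage in one sample}
    """
    total = sum(coverage.values())
    if total <= 0:
        raise ValueError("total coverage must be positive")
    return {scaftig: value * 1_000_000 / total for scaftig, value in coverage.items()}
```

On this scale, the threshold of 10 used later to define major contributors corresponds to 0.001% of the total coverage in a sample.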

#### Metagenome Binning

Metagenome bins were recovered from the scaftigs by using the CONCOCT package version 0.4.1 (Alneberg et al., 2014) with the number of clusters set to 300. The coverage of the recovered bins was calculated by using ClusterMeanCov.pl in the CONCOCT package. CheckM (Parks et al., 2015) was used to estimate the completeness, contamination, strain heterogeneity, and genome size of the recovered metagenome bins. Unfortunately, only 3/208 bins showed more than 90% completeness and less than 10% contamination, and 22/208 bins showed more than 70% completeness and less than 20% contamination in this study (**Supplementary Table 3**). Accordingly, scaftigs were used to represent the species in the present study.
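The bin-quality tally reported above amounts to counting bins that pass a pair of thresholds. A minimal sketch follows, assuming the completeness and contamination estimates have already been parsed into (completeness %, contamination %) tuples; the function name is hypothetical and no CheckM file format is implied.

```python
def count_bins(bins, min_completeness, max_contamination):
    """Count bins with completeness above and contamination below the thresholds.

    bins: iterable of (completeness in %, contamination in %)
    """
    return sum(
        1
        for completeness, contamination in bins
        if completeness > min_completeness and contamination < max_contamination
    )
```

Calling `count_bins(bins, 90, 10)` and `count_bins(bins, 70, 20)` on the same list gives the two tallies; any bin counted by the stricter call is also counted by the looser one.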

#### Gene and Species Abundance Comparison

The abundances of the enzymes and of the species were each compared between groups by using the Wilcoxon test in R. Differences were considered significant when p < 0.05 and |log2(G/B)| or |log2(P/B)| > 1.
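The two-part significance criterion can be illustrated with the sketch below, written in Python rather than R. Several points are assumptions: the p-value is computed with an exact two-sided Wilcoxon rank-sum test by enumerating all group assignments (feasible for groups of eight), which can differ from the output of R's `wilcox.test` when ties are present; the ratio G/B or P/B is taken as the ratio of group means; and both means are assumed to be strictly positive. All function names are hypothetical.

```python
from itertools import combinations
from math import log2


def midranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def rank_sum_exact_p(x, y):
    """Exact two-sided p-value of the rank-sum test by full enumeration."""
    pooled = list(x) + list(y)
    r = midranks(pooled)
    n = len(x)
    expected = n * (len(pooled) + 1) / 2  # mean rank sum under the null
    deviation = abs(sum(r[:n]) - expected)
    extreme = total = 0
    for combo in combinations(range(len(pooled)), n):
        total += 1
        if abs(sum(r[i] for i in combo) - expected) >= deviation - 1e-9:
            extreme += 1
    return extreme / total


def is_significant(treated, basal, alpha=0.05, min_abs_log2fc=1.0):
    """Apply p < alpha and |log2(mean treated / mean basal)| > threshold."""
    p = rank_sum_exact_p(treated, basal)
    fold = log2((sum(treated) / len(treated)) / (sum(basal) / len(basal)))
    return p < alpha and abs(fold) > min_abs_log2fc
```

The fold-change filter means that a statistically clear but small shift in abundance (less than twofold) is not reported as significant.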

#### Detection of the Major Contributors to SLP Enzymes and the Specific Fermenters in the Ruminal Microbiome

Based on the gene locations and the taxonomic annotation, the species whose genomes encoded the SLP enzymes or the enzymes catalyzing the oxidation of NADH during glucose fermentation were picked out and then summarized at the genus level. In the present study, any genus whose relative abundance was more than 10 in at least one sample was defined as a major contributor to the specific SLP enzyme. A genus that was a major contributor to all of the NADH oxidases of a specific fermentation pathway, as well as to the specific SLP enzyme coupled to that pathway, was considered to be a fermenter for the corresponding product (for details, see **Figure 1**).
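The two definitions above (major contributor and fermenter) can be expressed as simple set operations. The sketch below uses hypothetical function names and data layouts and is not drawn from the study's scripts.

```python
def major_contributors(abundance, threshold=10.0):
    """Genera whose relative abundance exceeds the threshold in at least one sample.

    abundance: {genus: [relative abundance in each sample]}
    """
    return sorted(
        genus
        for genus, values in abundance.items()
        if any(value > threshold for value in values)
    )


def fermenters(contributors_by_enzyme, required_enzymes):
    """Genera that are major contributors to every required enzyme of a pathway.

    contributors_by_enzyme: {enzyme: iterable of major-contributor genera}
    required_enzymes: the NADH-oxidizing enzymes of the pathway plus its SLP enzyme
    """
    sets = [set(contributors_by_enzyme[enzyme]) for enzyme in required_enzymes]
    return sorted(set.intersection(*sets)) if sets else []
```

`major_contributors` would be applied once per enzyme, and `fermenters` then intersects the resulting genus lists for the enzymes of one fermentation pathway.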

#### Detection of the Major Contributors to the ETC Components in the Ruminal Microbiome

The ETC is composed of a series of transmembrane ion pumps and the F-type ATPase. By checking available publications, we found the following enzymes that have been reported as ETC components under anaerobic conditions: FADH2-NAD oxidoreductase Rnf; membrane-bound hydrogenases Ech and Mbh; methyltransferase Mtr; NADH dehydrogenases Nuo, Ndh, and Hox; cytochrome c reductases Pet and TorC; cytochrome bd oxidase Cyd; sulfate reductase Apr; nitrite reductase NrfA; nitrate reductases Nar and NapA; fumarate reductase Sdh; and cytochrome o oxidase Cyo (Bongaerts et al., 1995; Meuer et al., 1999; Leahy et al., 2010; Fu et al., 2014; De Rosa et al., 2015; Kracke et al., 2015). Except for Ech, Mbh, and Cyo, we detected all of these enzymes (13 kinds) and all subunits of the F-type ATPase in the sequences (**Table 2**). Based on the gene locations and the taxonomic annotation, the species whose genomes encoded the ETC components were picked out. Since most enzymes consist of several subunits, only those species whose genomes encoded more than half of the subunits of the specific enzyme were used in the subsequent analysis. Finally, the species whose genomes encoded the ETC components were summarized at the genus level. Any genus whose relative abundance was more than 10 in at least one sample was defined as a major contributor to the specific ETC component in the microbiome.
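The "more than half of the subunits" rule can be sketched as follows. The function name and the use of plain subunit labels are assumptions for illustration; in the study the subunits would be identified through their KO annotations.

```python
def encodes_component(detected_subunits, all_subunits):
    """True if strictly more than half of the enzyme's subunits were detected.

    detected_subunits: subunit labels found in the species' scaftigs
    all_subunits:      complete list of subunit labels of the enzyme
    """
    complete = set(all_subunits)
    found = len(complete & set(detected_subunits))
    return found > len(complete) / 2
```

Under this rule a species encoding exactly half of the subunits (for example two of four) is excluded, and detected labels that do not belong to the enzyme are ignored.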

#### Result Visualization

The comparisons of the relative abundances of the species were visualized using the R package ggplot2.

#### Data Submission

The metagenome sequences have been submitted to the NCBI under BioProject PRJNA492173.

<sup>1</sup> ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/

enzymes are shown in Supplementary Table 1.

### RESULTS

### Comparison of Rumen Fermentation Parameters Between Groups

As shown in **Table 3**, rumen MCP, ammonia N, and total VFAs were significantly increased, whereas ruminal pH was significantly decreased when the diet shifted from diet B to G. Similarly, rumen pH was significantly decreased, whereas ammonia N and total VFAs were significantly increased when the diet shifted from B to P.

With regard to the concentrations of individual VFAs, all of them were significantly increased when the diet shifted from B to G. On the other hand, acetate was significantly increased, whereas the other VFAs showed no significant changes when the diet shifted from B to P.

### Pathways Related to Energy Metabolism in the Rumen Microbiome

In the rumen, monosaccharides are degraded into pyruvate through glycolysis, the pentose phosphate pathway, and the Entner-Doudoroff (ED) pathway. Pyruvate is subsequently fermented into formate, acetate, propionate, butyrate, lactate, and ethanol through pyruvate metabolism, propanoate metabolism, and butanoate metabolism. Accordingly, we reconstructed these pathways related to SLP-coupled energy metabolism (**Figure 1**). To date, ETP has been reported in the dissimilatory reduction of inorganic molecules (such as N and S), fumarate reduction, metal reduction, methanogenesis via the reduction of carbon dioxide (CO2), acetogenesis via the reductive acetyl-CoA pathway, and butyrate formation via the reduction of crotonyl-CoA. Therefore, based on the genes detected in the sequences, we reconstructed the following pathways related to ETP-coupled energy metabolism: methane metabolism, nitrogen metabolism, sulfur metabolism, and the reductive acetyl-CoA pathway (**Figure 2**). Here, we have named the pathways according to the pathway names in the KEGG database.

### Comparison of Abundance of SLP Enzymes and Their Major Contributors Between the Groups

From the above pathways, we detected 5 kinds of SLP enzymes (**Figures 1**, **2** and **Table 2**), and the abundances of these enzymes were high in the gene pools of the ruminal microbiome (153–1491). The total abundance of SLP enzymes was 2762, 4173, and 2788 in the B, G, and P groups, respectively. According to the Wilcoxon test, the total abundance increased significantly (p < 0.05) when the diet shifted from B to G and showed no significant change when the diet shifted from B to P. For the individual SLP enzymes, only the abundance of butyrate kinase (buk; EC 2.7.2.7), which catalyzes the reaction of butanoyl-P and ADP to form butanoate and ATP, was significantly increased when the diet shifted from B to G (p < 0.05).

TABLE 2 | Continued

1 3 the electron transporters were reported in many kinds of ETCs.

∗p < 0.05 and |value| > 1 in the Wilcoxon test between groups.

<sup>1</sup>Values are mean ± standard error (SE). <sup>2</sup>Total VFA = Acetate + Propionate + Butyrate. <sup>a</sup> p < 0.05 in the two-tailed T-test of B group and G group. <sup>b</sup> p < 0.05 in the two-tailed T-test of B group and P group.

Within the ruminal microbiome, 21 of the 68 detected archaeal genera and 258 of the 549 detected bacterial genera included species whose genomes encoded the SLP enzymes (in addition to unclassified genera). However, the changes in the relative abundance of these microorganisms did not correspond to the changes in the total abundance of the SLP enzymes (**Table 2**): their relative abundance showed a slight increase when the diet shifted from B to G (from 0.52 to 0.61%) and a significant decrease when the diet shifted from B to P (from 0.52 to 0.46%, p < 0.05). Although at least 8 G of high-quality data was obtained for each sample, around 90% of the metagenome bins had less than 70% completeness at a threshold of less than 20% contamination in the present study, possibly because the diversity and abundance of the ruminal microbiome are greater than we are aware. The diet-induced changes in the relative abundance of the major contributors to the individual SLP enzymes are shown in **Supplementary Figure 1**.
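The group comparisons reported above rest on two criteria quoted in the table notes: p < 0.05 in a Wilcoxon test and an absolute log2 ratio greater than 1. The following is a minimal sketch of how such a comparison might be computed for one enzyme. It assumes an unpaired (rank-sum) form of the test, an exact p-value obtained by enumeration, and a log2 ratio of group means; none of these details is confirmed by the text, and the sample values are hypothetical.

```python
from itertools import combinations
from math import log2
from statistics import mean


def rank_values(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def exact_rank_sum_p(x, y):
    """Two-sided exact p-value of the rank-sum test by full enumeration."""
    ranks = rank_values(list(x) + list(y))
    n_x = len(x)
    expected = n_x * (len(ranks) + 1) / 2   # mean rank sum under H0
    observed = abs(sum(ranks[:n_x]) - expected)
    sums = [sum(c) for c in combinations(ranks, n_x)]
    extreme = sum(1 for s in sums if abs(s - expected) >= observed - 1e-9)
    return extreme / len(sums)


def compare_groups(treatment, control, alpha=0.05, min_abs_log2fc=1.0):
    """Apply both criteria: p < alpha and |log2 ratio| > threshold."""
    log2fc = log2(mean(treatment) / mean(control))
    p_value = exact_rank_sum_p(treatment, control)
    flagged = p_value < alpha and abs(log2fc) > min_abs_log2fc
    return log2fc, p_value, flagged


# Hypothetical per-sample abundances for one enzyme (not from the study).
group_b = [350.0, 362.0, 340.0, 371.0]
group_g = [760.0, 801.0, 745.0, 772.0]
log2fc, p_value, flagged = compare_groups(group_g, group_b)
print(round(log2fc, 2), round(p_value, 4), flagged)
```

Full enumeration is only practical for small groups; for larger sample sizes a statistics library with a normal approximation would be the more usual choice.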

#### Comparison of Abundance of Enzymes Catalyzing the Oxidation of NADH in the Glucose Fermentation and the Specific Fermenters Between the Groups

Under anaerobic conditions, NADH produced by glycolysis is oxidized back to NAD<sup>+</sup> by using pyruvate or one of its derivatives as an electron acceptor. In the present study, we analyzed the enzymes that catalyzed the oxidation of NADH during glucose fermentation between the groups. As a result, 11 enzymes belonging to 4 types of fermentation (heterolactic fermentation, homolactic fermentation, butanol fermentation, and propionate fermentation) were detected (**Table 4**). Among them, the acetaldehyde dehydrogenase (adhE; EC 1.2.1.10) was the common enzyme used in the heterolactic fermentation and butanol fermentation. In a comparison of their abundances, five alcohol dehydrogenases (EC 1.1.1.1 and EutG) and adhE were significantly increased when the diet shifted from B to G. On the other hand, the abundance of one alcohol dehydrogenase (EC 1.1.1.1) was significantly decreased when the diet shifted from B to P (**Table 2**).

Within the ruminal microbiome, other than the unclassified genera, we detected 10 butanol fermenters, 11 heterolactic fermenters, 2 homolactic fermenters, and 6 propionate fermenters in the G group. Among them, the relative abundance of 5 butanol fermenters, 7 heterolactic fermenters, 2 homolactic fermenters, and 3 propionate fermenters, as well as the total abundance of all fermenters, was significantly increased, whereas the relative abundance of one butanol fermenter and one heterolactic fermenter was significantly decreased when the diet shifted from B to G (**Table 4**). On the other hand, other than the unclassified genera, 10 butanol fermenters, 8 heterolactic fermenters, one homolactic fermenter, and 6 propionate fermenters were detected in the P group. Among them, the relative abundance of four butanol fermenters, four heterolactic fermenters, one homolactic fermenter, and one propionate fermenter was significantly increased, whereas the relative abundance of two butanol fermenters and two propionate fermenters was significantly decreased when the diet shifted from B to P. The total abundance of all fermenters showed no significant change when the diet shifted from B to P.

#### Comparison of Abundance of ETC Components and Their Major Contributors Between the Groups

As shown in **Table 2**, the mean abundances of the subunits of Rnf, Nuo, NrfA, and Sdh were significantly increased, whereas the mean abundance of the subunits of Nar was significantly decreased when the diet shifted from B to G. On the other hand, only the mean abundance of NapA subunits showed a significant increase, and the others exhibited no significant changes when the diet shifted from B to P. As the final step of the ETC, F-type ATPase is known to be the most widely used ATP synthase in both prokaryotes and eukaryotes. In the present study, all subunits of F0-F1 ATPase were detected in the groups. The mean abundance of its subunits was 354, 769, and 383 in the B, G, and P groups, respectively. Furthermore, it was significantly increased when the diet shifted from B to G, whereas no significant changes were seen when the diet shifted from B to P.

Within the ruminal microbiome, other than the unclassified genera, 17 major contributors to Rnf, 13 major contributors to Cyd, 20 major contributors to Sdh, 7 major contributors to Nuo, 7 major contributors to NrfA, and 11 major contributors to ATPase were detected in the G group (**Supplementary Table 4** and **Figure 3**). Among them, the abundance of 5/5/9/6/5/3 major contributors to Rnf/Cyd/Sdh/Nuo/NrfA/ATPase was significantly increased, whereas the abundance of 8/5/6/2 major contributors to Rnf/Cyd/Sdh/ATPase was significantly decreased when the diet shifted from B to G. On the other hand, other than the unclassified genera, 16 major contributors to Rnf, 10 major contributors to Cyd, 22 major contributors to Sdh, 5 major contributors to Nuo, 2 major contributors to NrfA, and 12 major contributors to ATPase were detected in the P group (**Supplementary Table 4** and **Figure 3**). Among them, the abundance of 1/4/12/1/5 major contributor(s) to Rnf/Cyd/Sdh/Nuo/ATPase was significantly increased, whereas the abundance of 3/2/2/2 major contributor(s) to Rnf/Cyd/Sdh/ATPase was significantly decreased when the diet shifted from B to P.

#### DISCUSSION

## Effects of Dietary NFC and Protein on Rumen MCP Synthesis

In the present study, the significant increase in rumen ammonia N was associated with a significant increase in MCP yields when the diet shifted from B to G. This is in accordance with the study of Kljak et al. (2017) and indicates that the G diet provides sufficient N and energy to the ruminal microbiome and therefore has advantages for MCP synthesis compared with the B diet. On the other hand, rumen ammonia N was significantly increased, whereas ruminal MCP yields exhibited no significant change when the diet shifted from B to P. This result suggests that the P diet does not meet the energy requirement of ruminal microorganisms and therefore has no advantages for rumen MCP synthesis compared with the B diet. Secondly, total VFAs were significantly increased in both the G and P groups compared with the B group, indicating an increase in rumen fermentation in these groups. However, the fermentation substrates that induced the increase in rumen VFAs differed between these groups: the G diet mainly induced an increase in glucose fermentation, whereas the P diet mainly induced an increase in protein fermentation.

<sup>1</sup> mean value ± standard error; the relative abundance was normalized to 1,000,000 for the sample. <sup>2</sup> the numbers were calculated from log2(G/B) or log2(P/B). <sup>∗</sup>p < 0.05 and |value| > 1 in the Wilcoxon test between groups.

FIGURE 3 | Comparisons of the relative abundance of major contributors to F-type ATPase between the groups. The relative abundance was normalized to 1,000,000 for each sample. "<sup>∗</sup>" indicates a significant change between the G and B groups. "#" indicates a significant change between the P and B groups.


TABLE 5 | Detected types of electron transporters in ETP-related genera.

Cyd, cytochrome bd oxidase; Mtr, methyltransferase; Ndh and Nuo, NADH dehydrogenases; NrfA, nitrite reductase; Rnf, FADH2-NAD oxidoreductase; Sdh, fumarate reductase/succinate oxidase; ATPase, F-type ATPase.

### Effects of Dietary NFC and Protein on the Levels of SLP-Coupled Energy Productivity Within the Ruminal Microbiome

Our metagenomics analysis showed that (1) the total abundance of SLP enzymes was significantly increased when the diet shifted from B to G, whereas there was no significant change when the diet shifted from B to P; (2) 21 of 68 archaeal genera and 258 of 549 bacterial genera were related to SLP within the ruminal microbiome; and (3) the relative abundance of SLP-related species showed no significant change when the diet shifted from B to G, whereas it showed a significant decrease when the diet shifted from B to P. Considering the low completeness of the detected genomes, these results suggest that the NFC-rich diet provides significant benefits with regard to the amount of SLP-related microorganisms/gene pool, whereas the protein-rich diet has no significant impact on the amount of SLP-related microorganisms/gene pool.

In the rumen, the conversion of glucose to two pyruvate molecules during glycolysis gives a net gain of two ATPs and two NADHs. Subsequently, the NADHs are oxidized to NAD<sup>+</sup> by using pyruvate or one of its derivatives as an electron acceptor. This process, referred to as fermentation, may lead to the production of more ATPs, since SLP is coupled with the oxidation of NADH in many kinds of fermentation. In the present study, four kinds of fermentation were detected within the rumen (**Table 4**). Among them, butanol fermentation, heterolactate fermentation, and propionate fermentation give rise to additional ATPs (**Figure 1**). According to our analysis, the total abundance of NADH oxidases located on the above three fermentation pathways was significantly increased when the diet shifted from B to G (from 1437 to 3054) but showed no significant change when the diet shifted from B to P (from 1437 to 1388). This result suggests that the level of SLP-coupled energy productivity is promoted by the NFC-rich diet but not affected by the protein-rich diet in comparison with the basal diet. In addition, a recent study has shown that an increase in intracellular NAD<sup>+</sup> promotes the transfer of electrons along the ETC and thus promotes the generation of ATPs in metal-respiring microorganisms (Li et al., 2018). In the present study, the total abundances of NADH oxidases in all fermentation pathways showed the same trends during the shift of diets, suggesting that the amount of NAD<sup>+</sup> and its cycling rate are promoted by dietary NFC but not affected by dietary protein. Accordingly, we speculate that the energy productivity of the ruminal microbiome might be promoted by dietary NFC via its effects on the levels of SLP coupled to glucose fermentation and on the amount of intracellular NAD<sup>+</sup>.

#### Effects of Dietary NFC and Protein on the Levels of ETP-Related Genes and Species Within the Ruminal Microbiome

According to our analysis, first, Ech and cytochrome bo were not present among the ruminal microorganisms. Second, the abundance of cytochrome c reductase (Pet and TorC) and sulfate reductase Apr was less than 1 in all samples, and the species whose genomes encoded them were not detected in this study. We therefore speculated that species whose genomes encode cytochrome c reductase and Apr might not be present in the ruminal microbiome. Third, although the methyltransferase Mtr had low abundance (around 2 in all samples), it was encoded only in the genomes of archaeal methanogens, which accounted for 1.2–1.4% of the ruminal microbiome. The abundances of the NADH dehydrogenases Ndh and Hox were more than 10 in at least one group; however, the species whose genomes encoded them were not detected. Since the completeness of the bins was inadequate here, we speculated that the species whose genomes encoded Mtr, Ndh, and Hox existed in the rumen but that their relative abundances were low. Fourth, the abundances of fumarate reductase Sdh, FADH2-NAD oxidoreductase Rnf, NADH dehydrogenase Nuo, nitrite reductase NrfA, and the cytochrome bd complex subunits cydA and cydB were high, indicating that these components are abundant in the ruminal microbiome.

Microbial ETCs are diverse and typically end with an F-type ATPase. Because the low completeness of the recovered genomes made it impossible to detect an intact ETC in any specific species, we evaluated the role of ETP in the energy productivity of the ruminal microbiome by using the results for ATPase. According to our analysis, the total mean abundance of ATPase subunits was significantly increased when the diet shifted from B to G, whereas no significant changes were seen when the diet shifted from B to P. This result indicates that ETP-coupled energy productivity was promoted by the NFC-rich diet but not affected by the protein-rich diet in comparison with the basal diet.

The comparison of the gene abundances of the SLP enzymes and the ATPase within the groups may imply that ETP-based ATP productivity accounts for 13–18% of SLP-based ATP productivity in the rumen microbiome. At the genus level, 15 bacterial genera were considered to have an ETP-type energy-producing mechanism within the ruminal microbiome. The detected ETC components of these 15 ETP-related genera are listed in **Table 5**. According to previous studies, Arthrobacter, Geobacter, and Paenibacillus are metal-respiring bacteria (Bond and Lovley, 2003; Van Driessche et al., 2005). Our analysis showed that Cyd, NrfA, Nuo, Rnf, and Sdh were used as electron transporters in these bacteria. Clostridium and Eggerthella are CO2-respiring acetogenic bacteria (Harris et al., 2018). Our analysis showed that Nuo, Rnf, and Sdh were commonly used as electron transporters in them. These results are supported by previous studies showing that cytochromes, NADH dehydrogenase, NrfA, and fumarate reductase play roles in the electron transport pathway of metal respiration, and that Rnf and NADH dehydrogenase have important functions in the electron transport pathway of reductive acetogenesis via the Wood–Ljungdahl pathway. Next, we showed Sdh to be the common electron transporter in the sulfate-respiring Desulfovibrio and Desulfomicrobium (Kushkevych, 2014) and in the fumarate-respiring Syntrophobacter (Plugge et al., 2012). Rnf was shown to be the common electron transporter in the nitrate-reducing bacteria Bacillus and Selenomonas (Zhao et al., 2015; Bruna et al., 2018), the nitrogen-fixing bacterium Sinorhizobium (Delamuta et al., 2015), the proteolytic bacterium Bacteroides (Macfarlane et al., 1995), the lactate-fermenting bacterium Bifidobacterium (Macfarlane et al., 1995), and the xenobiotic-degrading bacterium Slackia (Cho et al., 2016). This is similar to recently discovered ETCs, all of which use Rnf as their electron transporter.
In the present study, most kinds of electron transporters were detected in Prevotella. The high abundance and the high species diversity of this genus in the rumen might be the reason for this result. Together, these results suggest that (1) various kinds of ETCs exist in the ruminal microorganism population, and (2) ETP-type energy-producing mechanism might also play important roles in the energy productivity of the ruminal microbiome.
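The 13–18% figure quoted above can be reproduced from the abundances reported in the Results (total SLP enzyme abundance and mean abundance of F-type ATPase subunits per group). The short calculation below only restates that arithmetic; reading the gene-abundance ratio as a ratio of ATP productivity is the authors' inference, not something the calculation itself establishes.

```python
# Values quoted in the Results: total SLP enzyme abundance and mean
# abundance of F-type ATPase subunits in the B, G and P groups.
slp_total = {"B": 2762, "G": 4173, "P": 2788}
atpase_mean = {"B": 354, "G": 769, "P": 383}

ratios = {group: atpase_mean[group] / slp_total[group] for group in slp_total}
for group, ratio in ratios.items():
    print(f"{group}: ATPase/SLP = {ratio:.1%}")
# B: ATPase/SLP = 12.8%
# G: ATPase/SLP = 18.4%
# P: ATPase/SLP = 13.7%
```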

## CONCLUSION

In conclusion, our data showed that, at the gene level, the mean abundance of the ATPase subunits accounted for less than 20% of the total abundance of SLP enzymes within the ruminal microbiome. At the genus level, 279 genera were shown to be able to generate ATP via the SLP mechanism, whereas only 15 of the 606 detected genera were shown to be able to generate ATP via the ETP mechanism. Next, an increase in dietary energy level increased rumen MCP yields, whereas an increase in dietary protein level had no significant impact on them. At the gene level, an increase in dietary energy level increased the total abundance of SLP enzymes that were coupled to NADH oxidation in butanol fermentation, heterolactate fermentation, and propionate fermentation, as well as the mean abundance of F-type ATPase, which catalyzes the generation of ATP in ETP within the ruminal microbiome, whereas an increase in dietary protein level had no significant impact on the abundances of these enzymes. At the genus level, an increase in dietary energy level increased the total abundance of the 15 ETP-related genera and the total abundance of the 40 genera that possess an SLP-coupled fermentation pathway, whereas an increase in dietary protein level had no significant impact on the relative abundance of these genera. In summary, the comparison of the abundances of SLP enzymes and ATPase suggested that SLP occupies a more important position than ETP in energy generation within the ruminal microbiome, accounting for more than 80% of total energy productivity. Furthermore, an increase in dietary energy level promotes rumen energy productivity and MCP yield by improving the levels of ETP and of SLP coupled to glucose fermentation in the ruminal microbiome. However, an increase in dietary protein level has no such effects.

#### ETHICS STATEMENT

This study was approved by the Animal Care and Use Committee of Nanjing Agricultural University, in compliance with the Regulations for the Administration of Affairs Concerning Experimental Animals (The State Science and Technology Commission of China, 2010).

### AUTHOR CONTRIBUTIONS

HS wrote the manuscript. HS and ZX analyzed data. ZL and ZS designed the research. ZL and YT performed the experiments. All authors approved the final manuscript.

### FUNDING

This work was supported by the National Natural Science Foundation of China (A0201800763), the Fundamental Research Funds for the Central Universities (Y0201801248), and the Science Foundation of Jiangsu Province (BK20180542).

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmicb.2019.00847/full#supplementary-material

#### REFERENCES

sp. PCC 6803 aerobic growth in the dark. Sci. Rep. 5:12424. doi: 10.1038/srep12424

bioelectrochemical systems. Front. Microbiol. 6:575. doi: 10.3389/fmicb.2015.00575


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Lu, Xu, Shen, Tian and Shen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Approaches to Computational Strain Design in the Multiomics Era

Peter C. St. John and Yannick J. Bomble\*

National Renewable Energy Laboratory, Golden, CO, United States

Modern omics analyses are able to effectively characterize the genetic, regulatory, and metabolic phenotypes of engineered microbes, yet designing genetic interventions to achieve a desired phenotype remains challenging. With recent developments in genetic engineering techniques, timelines associated with building and testing strain designs have been greatly reduced, allowing for the first time an efficient closed-loop iteration between experiment and analysis. However, the scale and complexity associated with multi-omics datasets complicate manual biological reasoning about the mechanisms driving phenotypic changes. Computational techniques therefore form a critical part of the Design-Build-Test-Learn (DBTL) cycle in metabolic engineering. Traditional statistical approaches can reduce the dimensionality of these datasets and identify common motifs among high-performing strains. While successful in many studies, these methods do not take full advantage of known connections between genes, proteins, and metabolic networks. There is therefore a growing interest in model-aided design, in which modeling frameworks from systems biology are used to integrate experimental data and generate effective and non-intuitive design predictions. In this mini-review, we discuss recent progress and challenges in this field. In particular, we compare methods augmenting flux balance analysis with additional constraints from fluxomic, genomic, and metabolomic datasets, methods employing kinetic representations of individual metabolic reactions, and machine learning. We conclude with a discussion of potential future directions for improving strain design predictions in the omics era and of the remaining experimental and computational hurdles.

Keywords: constraint-based methods, kinetic metabolic models, machine learning, multiomics, strain engineering

### INTRODUCTION

The biorefinery concept involves the development of sustainable and low-impact production routes for major commodity chemicals and fuels from biomass (Bozell and Petersen, 2010). Biomanufacturing using engineered microbes is a critical component of many production pathways, and offers the opportunity for high selectivity and yield (Nielsen and Keasling, 2016). However, optimizing microbial metabolism for a given process is time intensive and costly, limiting microbial bioconversions at present to only a few commercially successful compounds (Van Dien, 2013; Chubukov et al., 2016). This difficulty is primarily due to the complex relationship between genotype and phenotype, involving regulation at the metabolic, translational, and transcriptional levels. In recent years, the procedure of strain engineering has been formalized through the Design-Build-Test-Learn (DBTL) cycle, which takes advantage of recent improvements in genetic engineering and high-throughput characterization in the Build and Test stages, respectively, to efficiently screen larger libraries of strain modifications (Liu et al., 2015). The Learn and Design stages use computational techniques to interpret experimental results and suggest further modification targets. The Learn step is perhaps the most weakly developed step of the DBTL cycle, and can take the form of a wide range of computational techniques from statistical analysis to detailed simulations (Nielsen and Keasling, 2016). In this mini-review, we discuss recent research in methodology for integrating biological data, particularly in the form of multiomics analyses, into developing new and efficient strain designs. We first review relevant experimental considerations from the Test stage and summarize the types of data available for informing strain designs. We next cover constraint-based methods, kinetic simulations, and machine learning approaches, as well as recent studies that have used these methods in strain design. Lastly, we finish by discussing available software implementations and future directions for tackling the Learn step.

#### Edited by:

Yinjie Tang, Washington University in St. Louis, United States

#### Reviewed by:

Hector Garcia Martin, Lawrence Berkeley National Laboratory, United States Department of Energy (DOE), United States

Ilya R. Akberdin, San Diego State University, United States

#### \*Correspondence:

Yannick J. Bomble Yannick.bomble@nrel.gov

#### Specialty section:

This article was submitted to Systems Microbiology, a section of the journal Frontiers in Microbiology

Received: 21 December 2018 Accepted: 08 March 2019 Published: 05 April 2019

#### Citation:

St. John PC and Bomble YJ (2019) Approaches to Computational Strain Design in the Multiomics Era. Front. Microbiol. 10:597. doi: 10.3389/fmicb.2019.00597

#### EXPERIMENTAL INPUTS

A number of recent reviews have covered the growing usefulness of omics approaches in characterizing cell physiology (Petzold et al., 2015; Nielsen, 2017; Becker and Wittmann, 2018; Yurkovich and Palsson, 2018), and therefore we only briefly cover the relevant data generated in typical strain characterization experiments. Frequently used omics data include transcriptomics, proteomics, metabolomics, and fluxomics, which measure gene expression, protein expression, metabolite concentrations, and intracellular fluxes, respectively. Transcriptomics is typically performed using next-generation sequencing methods that quantify relative differences in RNA expression within a given biological sample (Petzold et al., 2015). Relative comparisons between samples are also possible using statistical techniques (Wagner et al., 2012). Due to the similar physical nature of RNA transcripts, transcriptomics approaches are among the easiest to perform at the genome scale, but their distance from metabolic networks by several layers of regulation makes direct understanding of metabolic function using these data difficult. Proteomics is one step closer to the determination of metabolic fluxes and uses mass spectrometry to quantify protein expression through the amino acid sequences of digested peptides (Kolker et al., 2006). Similar to transcriptomics, proteomics experiments typically measure relative protein expression within a sample, although statistical and experimental methods for comparing relative protein expression between samples are available (Petzold et al., 2015). Absolute quantification of protein expression is feasible but more difficult, with a range of accuracies depending on the method used (Arike et al., 2012). While more involved than transcriptomics due to the 3D structure of proteins and the lack of amplification techniques, proteomic analyses are still able to survey a similar fraction of the protein-coding genome (Haider and Pal, 2013).
Metabolomics poses an even greater challenge, as the high turnover of metabolites requires fast quenching and processing of samples (Petzold et al., 2015). As a result, the scope of metabolomic analyses is typically restricted to a smaller fraction of the organism's metabolism. Similar to transcriptomics and proteomics, metabolite concentrations are typically measured as relative quantities in high-throughput exploratory experiments (Lei et al., 2011). Absolute metabolite quantifications are possible in targeted metabolomic studies using external or isotope-labeled standards. Lastly, fluxomics is concerned with accurately measuring internal fluxes of key metabolic reactions directly using isotopic labeling. While an excellent indicator of metabolic state, fluxomics is performed less frequently than the previously discussed methods due to its experimental difficulty (Blank, 2016). In addition to careful cell culture and sample processing, fluxomics requires an accurate mathematical model that tracks atom transitions during metabolic reactions (Wiechert, 2001). This mathematical model is used in conjunction with 13C isotope labeling patterns to infer fluxes through each reaction, and as a result, inferred fluxes have typically been restricted to the main reactions in central carbon metabolism. However, extensions of 13C metabolic flux analysis (MFA) to include genome-scale flux analysis have been proposed (Gopalakrishnan and Maranas, 2015). Some genome-scale MFA methods leverage metabolism's bow-tie structure to constrain fluxes through peripheral pathways with a high degree of confidence (García Martín et al., 2015; Ando and Garcia Martin, 2018).

Even with access to direct measurements of activity for a wide range of cellular machinery components, using these data to enhance metabolic flux for a desired pathway remains challenging. We next discuss Learn techniques that synthesize these vast data sources together with generalized knowledge of biological function.

### LEARN METHODOLOGY

The goal of the Learn and Design steps is to use the characterization of previously engineered strains to develop improved strain designs. In its most basic form, this step can be accomplished by examining biological features (e.g., differentially expressed genes) correlated with improved strain performance, and overexpressing those likely involved in the pathway of interest (Yoshikawa et al., 2012). Designs based on rational consideration of omics data have proven successful (Guan et al., 2017), validating the human-in-the-loop approach. However, model-driven designs will likely be critical to speeding up the DBTL cycle and revealing non-intuitive targets (Vickers, 2016). In the next sections, we review several lines of research into model-driven interpretation of omics data. A schematic of these approaches is shown in **Figure 1**.
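As an illustration of the most basic form of the Learn step, the sketch below ranks genes by how strongly their expression correlates with strain performance across a small strain library. It is a simplified sketch, not a method taken from the cited studies: the gene names, expression values, titers, and the choice of Pearson correlation are all assumptions made for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature carries no ranking information
    return cov / sqrt(var_x * var_y)


def rank_features(expression, performance):
    """Sort genes by |correlation| with strain performance, strongest first."""
    scores = {gene: pearson(levels, performance)
              for gene, levels in expression.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical data: relative expression in four strains and their titers.
titer = [1.0, 2.0, 3.0, 4.0]
expression = {
    "geneA": [0.9, 2.1, 2.9, 4.2],   # rises with titer
    "geneB": [4.0, 3.1, 2.2, 0.8],   # falls with titer
    "geneC": [1.0, 1.0, 1.0, 1.0],   # unchanged across strains
}
ranking = rank_features(expression, titer)
for gene, r in ranking:
    print(f"{gene}: r = {r:+.2f}")
```

Positively correlated genes would be candidates for overexpression and negatively correlated ones for down-regulation, but correlation alone cannot separate causes from side effects, which is part of the motivation for the model-driven approaches discussed next.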

#### Constraint-Based Methods

Constraint-Based Reconstruction and Analysis (COBRA) methods use biological knowledge and data to place constraints on intracellular fluxes, and in recent years have expanded to consider a wide range of recent omics techniques. Here we focus on extensions of COBRA methods that pertain to guiding strain designs from omics data, while a number of recent reviews have covered COBRA methods in greater depth (O'Brien et al., 2015; Campbell et al., 2017; Stalidzans et al., 2018). A central technique in COBRA methods is flux balance analysis (FBA), which assumes that metabolite concentrations in the cell reach a pseudo steady-state when compared to the time scales associated with substrate uptake and cell division (Orth et al., 2010). This assumption allows fluxes to be constrained by mass balance equations developed from databases of biochemical reaction stoichiometry. Mass balance constraints alone (in the absence of 13C isotope labeling or other product data) are often not sufficient to determine a unique vector of metabolic fluxes. By assuming a cellular objective such as maximizing biomass or ATP production, unique flux vectors can be predicted. The accuracy of these predicted flux values depends on the objective chosen, and some objective functions have shown good correlation with experimental omics data (Lewis et al., 2010). Since such models can be simulated quickly and rely primarily on well-curated databases of metabolic reactions, many genome-scale models (GEMs) of microbial metabolism have been created (Henry et al., 2010; King et al., 2015). While useful in understanding metabolic functionality and predicting the results of gene manipulation, these assumptions are not sufficient to fully incorporate the phenotypic observations resulting from omics analyses.
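For reference, the FBA problem described above is commonly written as a linear program. The notation here is an assumption introduced for illustration and does not appear in the original text: $S$ is the stoichiometric matrix (metabolites by reactions), $v$ the vector of reaction fluxes, $c$ a weight vector selecting the cellular objective (for example, the biomass reaction), and $v^{\min}$, $v^{\max}$ the lower and upper flux bounds.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\min}_{j} \le v_{j} \le v^{\max}_{j} \quad \text{for every reaction } j .
\end{aligned}
```

The equality $S\,v = 0$ encodes the pseudo steady-state mass balance, and the omics-derived constraints discussed below typically enter by tightening the bounds or by adding further linear constraints on $v$.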

Extensions to the COBRA framework have therefore been proposed to impose additional constraints from experimental observations. One of the earliest such studies used transcriptomic data to block flux through reactions where gene expression for required enzymes was not observed (Åkesson et al., 2004). This method considered gene product expression through Boolean logic; more recent studies, however, have explicitly included gene product expression in the constraint-based framework (Becker and Palsson, 2008; Shlomi et al., 2008). Metabolism and gene-expression models (ME-models) explicitly model reactions involved in transcription and translation to build a quantitative model of enzyme production and usage (Lerman et al., 2012). These models therefore allow direct comparison of model predictions with transcriptomic and proteomic data (O'Brien et al., 2014, 2015). In a similar method, genome-scale models with protein structures (GEM-PROs) include structural information about each enzymatically catalyzed reaction (Chang et al., 2013). Such models allow the explicit simulation of the proteome fraction devoted to different cellular activities (Basan et al., 2015), and therefore might also be used to add additional constraints from proteomic analyses. The GECKO method combines literature knowledge of enzyme kinetics with proteomics data to constrain metabolic fluxes (Sánchez et al., 2017). However, while many enzymes have been kinetically characterized for well-studied species, these data are typically not available for non-model microbes (Nilsson et al., 2017).
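The Boolean treatment of expression data described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure of any cited study: the gene names, reaction names, rule encoding, and bounds are all hypothetical.

```python
# Gene-protein-reaction (GPR) rules as nested tuples: ("and" | "or", operand, ...).
# All gene and reaction names are hypothetical.
GPR = {
    "R1": "geneA",
    "R2": ("and", "geneB", "geneC"),                    # complex: both subunits required
    "R3": ("or", "geneD", ("and", "geneA", "geneE")),   # alternative isozymes
}


def is_active(rule, expressed):
    """Evaluate a GPR rule against the set of genes observed as expressed."""
    if isinstance(rule, str):
        return rule in expressed
    op, *operands = rule
    results = [is_active(r, expressed) for r in operands]
    return all(results) if op == "and" else any(results)


def constrain_bounds(bounds, expressed):
    """Set both flux bounds to zero for reactions whose GPR rule evaluates to False."""
    return {
        rxn: (lo, hi) if is_active(GPR[rxn], expressed) else (0.0, 0.0)
        for rxn, (lo, hi) in bounds.items()
    }


bounds = {"R1": (0.0, 10.0), "R2": (-5.0, 5.0), "R3": (0.0, 10.0)}
expressed = {"geneA", "geneB", "geneD"}   # hypothetical transcriptomic call
constrained = constrain_bounds(bounds, expressed)
print(constrained)
```

The constrained bounds would then be passed to the flux balance problem in place of the original ones.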

Metabolomics data are typically incorporated in constraint-based models through the explicit consideration of reaction thermodynamics. If absolute metabolite concentrations are available, thermodynamic metabolic flux analysis can provide more condition-specific information on irreversible reactions (Henry et al., 2007). These principles have been successfully applied to select the most promising pathways for the synthesis of a variety of products (Averesch et al., 2017). Further extensions to the COBRA framework will likely include even more cellular functionality. Toward this goal, whole-cell models that integrate gene expression, protein production, and cell cycle have been constructed (Karr et al., 2012).
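The link between measured concentrations and reaction direction rests on the standard relation ΔG' = ΔG'° + RT ln Q. The sketch below applies it to a single reaction; the standard Gibbs energy and the concentrations are hypothetical values chosen only to show how a reaction that is unfavorable under standard conditions can become feasible in the cell.

```python
import math

R = 8.314e-3  # gas constant in kJ/(mol*K)


def reaction_gibbs_energy(dg0_prime, substrates, products, temperature=298.15):
    """Return dG' = dG'0 + R*T*ln(Q), with Q = prod(products) / prod(substrates).

    Concentrations are given in mol/L; dg0_prime and the result are in kJ/mol.
    """
    q = math.prod(products) / math.prod(substrates)
    return dg0_prime + R * temperature * math.log(q)


# Hypothetical reaction S -> P with a positive standard Gibbs energy.
dg_forward = reaction_gibbs_energy(5.0, substrates=[1e-3], products=[1e-5])
dg_reverse_conditions = reaction_gibbs_energy(5.0, substrates=[1e-5], products=[1e-3])

# A negative value means the forward direction is thermodynamically feasible.
print(f"high substrate, low product: {dg_forward:.2f} kJ/mol")
print(f"low substrate, high product: {dg_reverse_conditions:.2f} kJ/mol")
```

In a constraint-based setting, the sign of the computed value would be used to fix the admissible direction of the corresponding flux.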

Constraint-Based Reconstruction and Analysis methods therefore represent an extensible and computationally efficient framework for connecting omics data of different types and have been used to successfully interpret omics data and improve strain designs in a number of studies (Wisselink et al., 2010; Brunk et al., 2016). An advantage of COBRA methods is the limited number of parameters that must be fit from experimental data, and therefore they are often able to suggest strain designs without substantial experimental support. In particular, these methods are especially efficient in determining metabolic changes that couple product production to cell growth (Long et al., 2015). The accuracy of constraint-based models in predicting de novo experimental results has not been rigorously evaluated; such an evaluation would serve as a useful measure of the progress in our understanding of cellular behavior. However, even modest success rates from predictive tools are useful in guiding experimental efforts where the search space is vast. A limitation of constraint-based methods is that they are often less suitable for suggesting improvements to fine-tune the enzyme expression of an existing pathway. Such a task typically requires a kinetic description of the reactions in question, which we discuss in the next section.

#### Kinetic Metabolic Models

The goal of kinetic metabolic models is to capture the dynamic behavior of individual enzymes and integrate these expressions into the behavior of the full metabolic network. These models allow the direct prediction of steady-state flux distributions as a function of enzyme expression; such flux distributions typically serve as the most reliable experimental data for validation. However, models that explicitly incorporate enzyme kinetics (if parameterized correctly) are also capable of predicting finer details of pathway dynamics, including the effect of slight changes in enzyme activity on metabolic flux. In constraint-based models, metabolite pools are assumed to be in a pseudo steady-state, and thus the rate rules governing flux through each reaction can be ignored. While the steady-state assumption may be justified, the specific steady-state reached inside the cell is determined, among a multitude of factors, by external metabolite conditions as well as by the kinetics and expression levels of metabolic enzymes. Kinetic modeling frameworks therefore seek to estimate these reaction rate rules from observed metabolic phenotypes to predict how enzyme perturbation will affect steady-state concentrations and fluxes.
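How rate rules determine the steady state, and how an enzyme perturbation shifts it, can be seen in a two-enzyme sketch. The pathway, the rate laws (product-inhibited supply and Michaelis-Menten demand), and every parameter value below are assumptions made for illustration; they do not come from any model cited in the text.

```python
def supply(x, e1, k1=1.0, s=10.0, ki=1.0):
    """Production of intermediate X by enzyme 1, inhibited by X itself."""
    return e1 * k1 * s / (1.0 + x / ki)


def demand(x, e2, k2=5.0, km=0.5):
    """Michaelis-Menten consumption of X by enzyme 2."""
    return e2 * k2 * x / (km + x)


def steady_state(e1, e2, x_hi=1e6, iterations=200):
    """Find X where supply equals demand by bisection.

    supply(x) - demand(x) decreases monotonically in x, so the root is unique.
    Returns the steady-state concentration and the pathway flux.
    """
    lo, hi = 0.0, x_hi
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if supply(mid, e1) > demand(mid, e2):
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    return x, demand(x, e2)


x_ref, j_ref = steady_state(e1=1.0, e2=1.0)   # reference expression levels
x_up, j_up = steady_state(e1=1.0, e2=2.0)     # enzyme 2 expressed twofold
print(f"reference:  X = {x_ref:.3f}, flux = {j_ref:.3f}")
print(f"2x enzyme2: X = {x_up:.3f}, flux = {j_up:.3f}")
```

In this sketch, doubling the downstream enzyme lowers the intermediate pool, relieves inhibition of the upstream step, and raises the flux, an effect that a purely stoichiometric model cannot express.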

Small-scale kinetic models of core carbon metabolism can leverage in vitro enzyme kinetics and time-course metabolite concentration measurements in fitting parameter values (Chassagnole et al., 2002). However, transient cellular responses are difficult to measure at the genome scale, and direct enzyme kinetic measurements are sparser for peripheral pathways. Large-scale dynamic metabolic simulations are therefore largely based on steady-state flux and concentration data (Vasilakou et al., 2016). Because these data are limited, quantifying parameter uncertainty is a critical challenge in large-scale kinetic models (Tummler and Klipp, 2018). Metabolic ensemble modeling addresses this challenge directly by finding distributions in parameter values that all reproduce the observed experimental data (Tran et al., 2008). This approach has been used to suggest subsequent enzymes in a linear pathway for overexpression (Contador et al., 2009), and an ensemble-based kinetic model of Escherichia coli has demonstrated superior predictive ability of steady-state flux distributions (Khodayari et al., 2014).

Smaller-scale, hand-curated kinetic models can use rate rules for individual enzymes with experimentally validated functional forms. However, traditional rate rule expressions (such as Michaelis–Menten kinetics) become difficult to construct for reactions with many participating species. Accordingly, larger-scale kinetic models typically choose a generalizable framework for constructing rate rule expressions. These frameworks range in computational complexity and faithfulness to the underlying enzyme-substrate system, and we leave a detailed comparison of these approaches to a number of recently published reviews (Heijnen, 2005; Hadlich et al., 2009; Du et al., 2016; Saa and Nielsen, 2017). Software available for kinetic modeling has continued to improve, and typically allows the user to specify reaction stoichiometry and rate rules independently from the chosen simulation algorithm. Such software includes COPASI (Hoops et al., 2006), CellDesigner (Funahashi et al., 2008), and MATCONT (Dhooge et al., 2003).

Regardless of the framework chosen, a major hurdle in using kinetic models for interpretation of omics data is the computational effort required in parameter estimation. In metabolic ensemble modeling, parameters are sampled at random and retained in the final ensemble only if they match all the considered experimental data (Tran et al., 2008). As a result, as more data are added or the model is expanded, the computational costs increase substantially. Methods for improving the computational speed of the approach have been developed (Greene et al., 2017), but calculating steady states of the dynamic model remains a computational bottleneck. Ensemble-based inference approaches are therefore typically applied to smaller, core-carbon metabolic networks (Khodayari et al., 2014). A recent genome-scale kinetic modeling study optimized only a single parameter set due to the added cost of ensemble-based parameter estimation (Khodayari and Maranas, 2016). However, this single parameter set demonstrated a superior ability to reproduce a wide range of experimental observations compared with constraint-based methods (Khodayari and Maranas, 2016). The ensemble modeling sampling approach has been recently formalized as a form of Bayesian inference (Saa and Nielsen, 2016), demonstrating that detailed posterior distributions in parameter estimates and model predictions could be found. Kinetic models therefore offer a promising future direction for incorporating vast quantities of omics data in metabolic reconstructions if computational bottlenecks can be circumvented (St. John et al., 2018). While difficult to fit, the added parameters from kinetic representations give these models more expressive power in fitting experimental data.
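The sample-and-retain idea behind ensemble modeling can be sketched as plain rejection sampling on a single Michaelis-Menten reaction. This is a simplified stand-in, not the published algorithm: the rate law, the sampling ranges, the tolerance, and the "observed" flux are all hypothetical, and published implementations are considerably more elaborate.

```python
import math
import random


def flux(vmax, km, s):
    """Michaelis-Menten rate for substrate concentration s."""
    return vmax * s / (km + s)


def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))


def build_ensemble(observed_flux, s_ref, n_samples=20000, rel_tol=0.05, seed=0):
    """Keep only parameter sets that reproduce the observed flux within rel_tol."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_samples):
        vmax = sample_log_uniform(rng, 0.1, 100.0)
        km = sample_log_uniform(rng, 0.01, 100.0)
        if abs(flux(vmax, km, s_ref) - observed_flux) <= rel_tol * observed_flux:
            ensemble.append((vmax, km))
    return ensemble


# Hypothetical observation: flux of 2.0 at a substrate concentration of 1.0.
ensemble = build_ensemble(observed_flux=2.0, s_ref=1.0)

# Every retained model agrees with the data, yet predictions for a new
# condition (substrate raised to 5.0) differ across the ensemble.
predictions = [flux(vmax, km, s=5.0) for vmax, km in ensemble]
print(len(ensemble), "parameter sets retained")
print(f"predicted flux range: {min(predictions):.2f} to {max(predictions):.2f}")
```

Only a small fraction of the draws is retained, which hints at why the cost grows quickly as more data must be matched simultaneously.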

A factor complicating the analysis of experimental data with kinetic models is the stochasticity introduced by low cell volumes and small copy numbers of several key enzymes (Levine and Hwa, 2007; Kiviet et al., 2014). Cell-to-cell heterogeneity therefore imposes unique challenges in understanding microbial kinetics that might be resolved through the use of explicit stochastic simulation algorithms (Gillespie, 1977) as implemented in a variety of software packages (Hoops et al., 2006; Sanft et al., 2011; Abel et al., 2016). In the subsequent section we discuss machine learning approaches that add even more parameters to be fit, but may prove useful as high-throughput strain construction and characterization techniques improve.
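A minimal version of the direct stochastic simulation method of Gillespie (1977) is sketched below for a single species that is produced at a constant rate and degraded in proportion to its copy number. The birth-death system and its rate constants are hypothetical and were chosen only to keep the example short.

```python
import random


def gillespie_birth_death(k_prod, k_deg, n0, t_end, seed=0):
    """Simulate production (rate k_prod) and degradation (rate k_deg * n) of one species."""
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_prod = k_prod          # propensity of the production event
        a_deg = k_deg * n        # propensity of the degradation event
        a_total = a_prod + a_deg
        if a_total == 0:
            break                # no event can occur any more
        t += rng.expovariate(a_total)   # exponentially distributed waiting time
        if t > t_end:
            break
        # Choose which event fires, with probability proportional to its propensity.
        if rng.random() * a_total < a_prod:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


# Hypothetical enzyme with an expected copy number of k_prod / k_deg = 10.
times, counts = gillespie_birth_death(k_prod=1.0, k_deg=0.1, n0=0, t_end=200.0)
print(len(times) - 1, "events simulated; final copy number:", counts[-1])
```

Repeating the simulation with different seeds gives different trajectories, which is the cell-to-cell variability that deterministic rate equations average away.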

#### Machine Learning

Machine learning methods for interpreting omics data have taken a wide range of forms, largely due to the many varied biological questions that can be asked. In this section, we focus on methods that predict future targets for strain engineering. Integrative omics analyses attempt to draw connections between disparate omics data sources, either with or without prior biological knowledge (Berger et al., 2013; Bersanelli et al., 2016). These methods have been used to predict key regulatory genes correlated with metabolic productivity (Larsen et al., 2018), and inferred regulatory networks have also been incorporated into FBA models (Chandrasekaran and Price, 2010). Other studies have used machine learning to understand and predict metabolic performance from hyperparameters associated with cell growth. Wu et al. (2016) explored methods for machine learning in meta-analysis to predict likely pathway success as a function of the complexity of the engineered pathway and other factors. In Kim et al. (2016), machine learning methods are used both for data reconciliation between omics sources and to directly map the genotype-phenotype relationship. Another interesting study used machine learning methods as a replacement for the traditional rate equation frameworks discussed in the previous section (Costello and Martin, 2018). In that study, rate equations were learned directly from time-series metabolomics and were successful in predicting medium-producing strains given high- and low-producing varieties. Costello and Martin (2018) also quantified the amount of data required for accurate rate determination at approximately 10 strains. Given the rapid advancement of machine learning methods and biological data collection, these approaches may offer flexible and efficient ways of directly incorporating biological data in new strain designs.
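The idea of learning a rate law from time-series data, rather than specifying it in advance, can be illustrated in its simplest possible form: estimating derivatives by finite differences and regressing them on the measured concentration. The sketch below is far simpler than the supervised-learning approach of the cited study; the first-order model, the synthetic data, and all names are assumptions made for illustration.

```python
import math


def fit_first_order_rate(times, conc):
    """Least-squares estimate of k in dx/dt = -k * x.

    Derivatives are approximated by central finite differences at interior
    time points; minimizing sum((dx/dt + k*x)^2) gives k in closed form.
    """
    num = den = 0.0
    for i in range(1, len(times) - 1):
        dxdt = (conc[i + 1] - conc[i - 1]) / (times[i + 1] - times[i - 1])
        num += conc[i] * dxdt
        den += conc[i] ** 2
    return -num / den


# Synthetic, noise-free time series generated with a known rate constant.
TRUE_K = 0.5
times = [0.1 * i for i in range(51)]
conc = [math.exp(-TRUE_K * t) for t in times]

k_est = fit_first_order_rate(times, conc)
print(f"estimated rate constant: {k_est:.4f} (value used to generate data: {TRUE_K})")
```

With real metabolomics data, measurement noise makes the derivative estimates much less reliable, which is one reason more flexible learned models and multiple strains are needed.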

#### DISCUSSION

Since Learn lags behind the rest of the DBTL methodology in the development of validated and standardized techniques, feasible computational techniques are still being explored and improved upon. As a result, software libraries for performing the analyses described in this minireview are relatively scarce. As the most mature of the three approaches, COBRA methods have relatively strong software support in both the MATLAB (Heirendt et al., 2017) and Python (Ebrahim et al., 2013) ecosystems. Dependent packages have also been created for a number of the COBRA extensions for integrating or predicting omics-level data. Kinetic models, in contrast, have relatively poor support in the software landscape. This is likely due to the multitude of kinetic frameworks available as well as their slow (but parallelizable) convergence, requiring hardware-dependent simulation strategies. For machine learning, several actively developed packages are available that implement common approaches. Scikit-Learn for Python implements a variety of machine learning strategies under a consistent API (Pedregosa et al., 2011). Deep learning frameworks such as TensorFlow or PyTorch simplify the process of constructing deep neural networks and training them on specialized hardware. Compared to the availability of general-purpose machine learning, omics-specific machine learning analyses have substantially fewer libraries under active development. However, creating and distributing standardized Learn workflows will be critical to enabling the reproducible analyses required of the iterative DBTL cycle. Such standardized approaches will necessarily require the development and maintenance of software and best practices in the metabolic modeling community.

#### AUTHOR CONTRIBUTIONS

PSJ and YB contributed to the conception and writing of the manuscript. PSJ created the figure. YB supervised the research.

#### FUNDING

We thank the U.S. Department of Energy Bioenergy Technologies Office for funding under Contract DE-AC36-08GO28308 with the National Renewable Energy Laboratory. This work was authored by Alliance for Sustainable Energy, LLC, the Manager and Operator of the National Renewable Energy Laboratory for the United States Department of Energy (DOE) under Contract No. DE-AC36-08GO28308. The views expressed in the article do not necessarily represent the views of the DOE or the U.S. Government.

#### ACKNOWLEDGMENTS

The U.S. Government retains and the publisher, by accepting the article for publication, acknowledges that the U.S. Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this work, or allow others to do so, for U.S. Government purposes.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 St. John and Bomble. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

fmicb-10-00105 January 30, 2019 Time: 17:59 # 1

# <sup>13</sup>C-Metabolic Flux Analysis Reveals the Metabolic Flux Redistribution for Enhanced Production of Poly-γ-Glutamic Acid in dlt Over-Expressed Bacillus licheniformis

Penghui He<sup>1</sup>, Ni Wan<sup>2</sup>, Dongbo Cai<sup>1</sup>, Shiying Hu<sup>1</sup>, Yaozhong Chen<sup>1</sup>, Shunyi Li<sup>1</sup> and Shouwen Chen<sup>1,3</sup>\*

<sup>1</sup> State Key Laboratory of Biocatalysis and Enzyme Engineering, Environmental Microbial Technology Center of Hubei Province, College of Life Sciences, Hubei University, Wuhan, China, <sup>2</sup> Mechanical Engineering and Materials Science, Washington University, St. Louis, MO, United States, <sup>3</sup> State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China

#### Edited by:

Wei Xiong, National Renewable Energy Laboratory (DOE), United States

#### Reviewed by:

Deng Liu, Washington University in St. Louis, United States Joshua Chan, Colorado State University, United States

> \*Correspondence: Shouwen Chen mel212@126.com

#### Specialty section:

This article was submitted to Microbial Physiology and Metabolism, a section of the journal Frontiers in Microbiology

> Received: 30 October 2018 Accepted: 17 January 2019 Published: 01 February 2019

#### Citation:

He P, Wan N, Cai D, Hu S, Chen Y, Li S and Chen S (2019) <sup>13</sup>C-Metabolic Flux Analysis Reveals the Metabolic Flux Redistribution for Enhanced Production of Poly-γ-Glutamic Acid in dlt Over-Expressed Bacillus licheniformis. Front. Microbiol. 10:105. doi: 10.3389/fmicb.2019.00105

Poly-γ-glutamic acid (γ-PGA) is an anionic polymer with various applications. Teichoic acid (TA) is a special component of the cell wall in gram-positive bacteria, and its D-alanylation can change the net negative charge of the cell surface, autolysin activity and cation binding efficiency, and might further affect metabolic production. In this research, four genes (dltA, dltB, dltC, and dltD) of the dlt operon were individually deleted and overexpressed in the γ-PGA producing strain Bacillus licheniformis WX-02. Our results implied that overexpression of each of these genes significantly increased γ-PGA synthetic capability. Among these strains, the dltB overexpression strain WX-02/pHY-dltB had the highest γ-PGA yield (2.54 g/L), which was 93.42% higher than that of the control strain WX-02/pHY300 (1.31 g/L). In contrast, the gene deletion strains produced lower γ-PGA titers. Furthermore, <sup>13</sup>C-metabolic flux analysis was conducted to investigate the influence of dltB overexpression on metabolic flux redistribution during γ-PGA synthesis. The simulation data demonstrated that the fluxes of the pentose phosphate pathway and tricarboxylic acid cycle in WX-02/pHY-dltB were 36.41 and 19.18 mmol/g DCW/h, increased by 7.82 and 38.38% compared to WX-02/pHY300 (33.77 and 13.86 mmol/g DCW/h), respectively. The synthetic capabilities of ATP and NADPH were also increased slightly. Meanwhile, the fluxes of the glycolytic and by-product synthetic pathways were all reduced in WX-02/pHY-dltB. All of these changes were beneficial for γ-PGA synthesis. Collectively, this study clarified that overexpression of dltB strengthened the fluxes of the PPP pathway, TCA cycle and energy metabolism for γ-PGA synthesis, and provided an effective strategy for enhanced production of γ-PGA.

Keywords: Bacillus licheniformis, poly-γ-glutamic acid, <sup>13</sup>C-metabolic flux analysis, dltB, cell surface negative charge

### INTRODUCTION

Poly-γ-glutamic acid (γ-PGA) is an important anionic polypeptide consisting of D-glutamic acid and/or L-glutamic acid residues, which are linked together via amide bonds between the α-amino and γ-carboxyl groups (Feng et al., 2015; Sirisansaneeyakul et al., 2017). Since γ-PGA has excellent properties such as biocompatibility, biodegradability, water solubility, edibility, non-toxicity, and environmental friendliness (Cao et al., 2018), it has wide-ranging applications in the food, medicine, and cosmetics industries, among others (Cai et al., 2017).

Bacillus species have proven to be efficient γ-PGA producers, and a number of metabolic engineering strategies have been developed to improve γ-PGA yield. For example, knocking out the glutamate dehydrogenase genes rocG and gudB improved glutamic acid accumulation, which led to a 38% increase of γ-PGA yield in Bacillus amyloliquefaciens (Zhang et al., 2015). The synthesis of extracellular polysaccharide and lipopolysaccharide was blocked to decrease by-product yields for γ-PGA production in B. amyloliquefaciens LL3 (Feng et al., 2015). In addition, strengthening the NADPH and ATP supplies benefited γ-PGA production (Cai et al., 2017, 2018). Recently, cell surface engineering was proven to be an effective strategy for enhancing the production of metabolites. Overexpression of the phosphatidylserine synthase gene pssA enhanced cell membrane integrity and hydrophilicity, and further improved cell tolerance and the yields of biorenewables (short-chain fatty acids, organic alcohols, organic acids and other aromatic compounds, etc.) in E. coli (Tan et al., 2017). Elevation of membrane cardiolipin levels via overexpression of the cardiolipin synthase gene clsA significantly increased hyaluronic acid titer by 204% in Bacillus subtilis (Westbrook et al., 2018). However, no research has focused on the relationship between cell surface engineering and γ-PGA synthesis.

Teichoic acids (TAs) are anionic polymers composed of ribitol and glycerol residues linked by phosphodiester bonds in the cell wall. The dlt operon, which consists of the four genes dltA, dltB, dltC, and dltD, is responsible for incorporating D-alanine into TAs in Bacillus (Koprivnjak et al., 2006). Previously, removal of D-alanyl esters from the TAs of Staphylococcus aureus increased the amount of Mg<sup>2+</sup> bound to the cell wall, which might affect the activities of several membrane proteins or enzymes (Lambert et al., 1975). Previous research implied that the expression of the dlt operon could regulate the cell surface net negative charge, and that changes in surface charge could mediate the recruitment of signaling molecules (lignin) (Hyyrylainen et al., 2000; Heit et al., 2011). γ-PGA synthesis is regulated by the two-component systems ComP∼ComA and DegS∼DegU and by the regulators DegQ and SwrA (Tran et al., 2000). The recruitment of ComP might be affected by the reduction of cell surface negative charge, which might in turn activate these regulators for γ-PGA synthesis. Furthermore, overexpression of dltA triggered electrostatic repulsion between S. aureus and daptomycin (DAP), which represented the keystone of DAP resistance (Cafiso et al., 2014). γ-PGA and TAs are both anionic polymers, and the electrostatic repulsion generated between γ-PGA and TAs is not conducive to γ-PGA secretion. Thus, reducing the cell surface negative charge might be an effective strategy for enhancing the production of γ-PGA.

Biochemical reactions in cells are difficult to describe in depth with existing techniques. As a newer method, <sup>13</sup>C metabolic flux analysis (<sup>13</sup>C-MFA) is a promising tool for quantifying the rates of metabolic reactions in a network via isotopic tracer experiments (Long et al., 2016). Moreover, <sup>13</sup>C-MFA is a powerful tool for quantifying the energy generation and consumption rates in metabolic pathways (He et al., 2014; Yao et al., 2016), and it has also been applied to analyze changes in central carbon metabolism under aerobic and anaerobic conditions. For instance, the optimal tracers [1,2-<sup>13</sup>C] glucose, [1,6-<sup>13</sup>C] glucose, [1,2-<sup>13</sup>C] xylose, and [5-<sup>13</sup>C] xylose were applied in Escherichia coli under aerobic and anaerobic conditions, and the results demonstrated that the fluxes of the EMP pathway and TCA cycle were all increased under anaerobic conditions, which generated more energy, formate, alcohol and succinate to adapt to the adverse environment (Gonzalez et al., 2017).

In this study, the genes dltA, dltB, dltC, and dltD of the dlt operon were individually deleted and overexpressed in B. licheniformis WX-02, and <sup>13</sup>C-MFA was performed to examine the effect of dltB overexpression on metabolic flux redistribution. In addition, transcriptional levels and by-product contents were measured during γ-PGA synthesis. The aim of this study was to illustrate the relationship between dlt operon overexpression and γ-PGA synthesis by <sup>13</sup>C-MFA, and to provide an efficient strategy of strain improvement for γ-PGA production.

#### MATERIALS AND METHODS

#### Strains and Culture Conditions

Strains and plasmids used in this study are listed in **Table 1**. B. licheniformis WX-02 served as the original strain for constructing recombinants, and E. coli DH5α served as the host strain for plasmid construction. B. licheniformis and E. coli were grown in LB medium with the appropriate antibiotic (20 µg/mL kanamycin, 100 µg/mL ampicillin, or 20 µg/mL tetracycline) when required.

The γ-PGA fermentation medium contained 20 g/L glucose, 5 g/L NH4NO3, 11.35 g/L Na2HPO4, 8.15 g/L KH2PO4, 0.20 g/L MgSO4·7H2O, 0.01 g/L MnSO4·H2O, 0.0008 g/L CaCl2, and 0.0045 g/L Na2-EDTA. B. licheniformis was grown in 24-well plates with a 2 mL working volume and cultivated at 220 rpm and 37°C for 14 h. For <sup>13</sup>C-MFA, glucose was replaced by [1,2-<sup>13</sup>C] glucose (Sigma-Aldrich, CAS#138079-87-5) in the γ-PGA production medium.

**Abbreviations:** 2,3-BDO, 2,3-butanediol; 3PG, 3-phosphoglyceric acid; 6PG, 6-phosphogluconolactone; AceCoA, acetyl-CoA; AKG, α-ketoglutarate; CIT, citrate; DHAP, dihydroxyacetone phosphate; E4P, erythrose 4-phosphate; EMP pathway, glycolysis pathway; F6P, fructose 6-phosphate; FBP, fructose 1,6-bisphosphate; FUM, fumarate; G6P, glucose-6-phosphate; GAP, glyceraldehyde 3-phosphate; Glu, glutamic acid; GLX, glyoxylate; ICIT, isocitrate; MAL, malate; OAA, oxaloacetate; PEP, phosphoenolpyruvate; PPP pathway, pentose phosphate pathway; PYR, pyruvate; R5P, ribose 5-phosphate; Ru5P, ribulose 5-phosphate; S7P, sedoheptulose 7-phosphate; SUC, succinate; SucCoA, succinyl-CoA; TCA cycle, tricarboxylic acid cycle; X5P, xylulose 5-phosphate.

TABLE 1 | The strains and plasmids used in this study.

#### Strain Construction

The construction procedures for the dltB deletion and overexpression strains serve as examples. For the dltB deletion strain, the upstream and downstream arms of gene dltB were amplified with the primers ΔdltB-F1/R1 and ΔdltB-F2/R2, respectively (**Supplementary Table S1**), from the genomic DNA of B. licheniformis WX-02, and fused by Splicing Overlapping Extension PCR (SOE-PCR) with primers ΔdltB-F1/R2. The fused fragment was inserted into the plasmid T2(2)-Ori at SacI/XbaI. Colony PCR and DNA sequencing confirmed that the dltB deletion vector, named T2-ΔdltB, was constructed successfully. Then, T2-ΔdltB was electro-transferred into B. licheniformis WX-02, and the positive transformant was cultivated in LB medium with 20 µg/mL kanamycin at 220 rpm and 45°C, and sub-cultured three times to obtain the single-crossover recombinants. The recombinants were grown in LB medium at 37°C for six subcultures, the kanamycin-sensitive colonies were further confirmed by colony PCR and DNA sequencing, and the dltB deletion strain was named WX-02ΔdltB.

For the gene overexpression strain, the P43 promoter from B. subtilis 168, and the gene dltB and amyL terminator from B. licheniformis WX-02, were each amplified (**Supplementary Table S1**) and fused by SOE-PCR. The fused fragment was inserted into pHY300PLK at the restriction sites EcoRI/XbaI, resulting in the plasmid pHY-dltB. Then, pHY-dltB was introduced into WX-02 by electro-transformation, resulting in the dltB overexpression strain, named WX-02/pHY-dltB.

#### Analytic Methods

The cell biomass was monitored by measuring OD<sub>600</sub> using a UV-spectrophotometer-752 N (Shanghai Instrument Analysis Instrument Co., Ltd., Shanghai, China), and glucose concentrations were determined using the Enzyme Electrode Analyzer SBA-40E (Shandong Academy of Sciences, Shandong, China). The γ-PGA yield was measured according to our previously reported method (Cai et al., 2018). Briefly, 2 mL of fermentation broth was mixed with 4 mL distilled water, and cells were separated by centrifugation at 12000 rpm for 6 min after adjusting the pH to 2.5–3.0. Three volumes of absolute ethanol were then added to the supernatant after adjusting the pH to 7.0, and the precipitate was dried at 80°C to a constant weight for measuring γ-PGA yield. The net negative charge was measured by determining the cation binding rate of the cell surface in LB medium, according to the previously described method (Cao et al., 2017; Chen et al., 2018).

The concentrations of the by-products acetoin, 2,3-butanediol and acetic acid were measured by GC/MS (Gas chromatograph Trace, Thermo; Triple Quadrupole Mass Spectrometer, Thermo; column: DB-WAXMS, 30 m × 0.25 mm × 0.25 µm, Thermo; United States). Briefly, 0.5 mL of fermentation broth was mixed with 1.5 mL absolute ethanol, and the supernatant was separated by centrifugation at 12000 rpm for 10 min. An equal volume of ethyl acetate was added to the supernatant for extracting acetoin, 2,3-butanediol and acetate. The GC-MS parameters were set as follows: the gas chromatographic inlet temperature was 220°C, the injection volume was 1 µL, the carrier gas flow rate was 1 mL/min and the split ratio was 20:1. The temperature program was: hold at 40°C for 1 min, increase to 160°C at 6°C/min, hold for 1.5 min, then raise to 220°C at 30°C/min and hold for 2 min. The solvent delay was set to 4 min. The range of the mass to charge ratio (m/z) in MS was set between 35 and 500.
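As a quick consistency check on the oven program described above, the total run time implied by the holds and ramps can be computed directly. The step encoding and function name below are hypothetical conveniences; only the temperatures, rates, and hold times are taken from the text.

```python
def program_duration(start_temp, steps):
    """Total duration (min) of a GC oven program.

    steps is a list of ("hold", minutes) or ("ramp", target_temp_C, rate_C_per_min).
    """
    temp, total = start_temp, 0.0
    for step in steps:
        if step[0] == "hold":
            total += step[1]
        else:
            _, target, rate = step
            total += abs(target - temp) / rate   # time needed to reach the target
            temp = target
    return total


# By-product method: 40 C (1 min) -> 160 C at 6 C/min (1.5 min) -> 220 C at 30 C/min (2 min).
byproduct_program = [
    ("hold", 1.0),
    ("ramp", 160.0, 6.0),
    ("hold", 1.5),
    ("ramp", 220.0, 30.0),
    ("hold", 2.0),
]
print(program_duration(40.0, byproduct_program), "min")  # 26.5 min
```

The 4 min solvent delay falls well within this run time, so all three analytes elute while the detector is recording.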

Gene transcriptional levels were measured with TRIzol<sup>®</sup> Reagent (Invitrogen, United States) and the PrimeScript<sup>TM</sup> II 1st Strand cDNA Synthesis Kit (TaKaRa, Japan) according to our previously reported method (Cai et al., 2017). The primers used for RT-qPCR are listed in **Supplementary Table S2**, and the housekeeping gene 16S rRNA was used to normalize the gene expression data. The data were averaged and presented as the mean ± SD.

#### Mass Isotopomer Distribution Analysis of Amino Acids

To analyze the mass isotopomer distributions (MIDs) of amino acids, [1,2-<sup>13</sup>C] glucose served as the sole carbon source for cell growth and γ-PGA synthesis. The cell pellets at the mid-logarithmic phase (7 h) were collected and hydrolyzed with 6 M HCl at 100°C for 24 h. The supernatant of the hydrolyzate was air-dried, and the residue was subsequently derivatized with N-tert-butyldimethylsilyl-N-methyltrifluoroacetamide (TBDMS). Then, the derivatization products were quantified by GC/MS (Gas chromatograph Trace, Thermo; Triple Quadrupole Mass Spectrometer, Thermo; column: TG-5MS, 30 m × 0.25 mm × 0.25 µm, Thermo; United States). The injection volume and injection split ratio were 1 µL and 1:10, respectively, and the flow rate of the carrier gas helium was 1.2 mL/min. The GC temperature program was set as follows: hold at 150°C for 2 min, increase to 280°C at 3°C/min, increase to 300°C at 20°C/min, and then hold for 5 min. The solvent delay was set to 5 min, and the range of the mass to charge ratio (m/z) in MS was set between 60 and 500 (You et al., 2012). The software WuFlux-Ms Tool was used to analyze and correct the amino acid MS data (fragments of [M-57]<sup>+</sup>, [M-159]<sup>+</sup>, or [M-85]<sup>+</sup>, and [f302]) (Wahl et al., 2004). Isotopomer labeling fractions (M0, M1, M2, etc.) represent the fractions of a fragment that are unlabeled, singly labeled, doubly labeled, and so on (Wan et al., 2017).
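The meaning of the labeling fractions M0, M1, M2 can be made concrete with a short calculation. The sketch below only normalizes ion intensities and computes the mean enrichment; it does not perform the natural-isotope correction that dedicated tools apply, and the intensities and the two-carbon fragment are hypothetical.

```python
def mass_isotopomer_distribution(intensities):
    """Normalize ion intensities (ordered M0, M1, ..., Mn) so that they sum to 1."""
    total = sum(intensities)
    return [x / total for x in intensities]


def mean_enrichment(mid):
    """Average fraction of labeled carbons for a fragment with n = len(mid) - 1 carbons."""
    n_carbons = len(mid) - 1
    return sum(i * m for i, m in enumerate(mid)) / n_carbons


# Hypothetical intensities for a two-carbon fragment: M0, M1, M2.
raw = [6000.0, 1000.0, 3000.0]
mid = mass_isotopomer_distribution(raw)
print("MID:", mid)
print("mean 13C enrichment:", mean_enrichment(mid))
```

A flux estimation tool compares distributions of this kind, after correction, with the distributions simulated from a candidate set of fluxes.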

#### <sup>13</sup>C-Metabolic Flux Analysis With INCA

<sup>13</sup>C-MFA was performed using the INCA software (Young, 2014), which is based on the elementary metabolite units (EMU) framework (Antoniewicz et al., 2007). The central metabolic network was modified according to previous research (He et al., 2017) and the KEGG pathway database. Because γ-PGA is the target product and acetoin and 2,3-butanediol are the main by-products, several steps for their syntheses, such as 2 Pyruvate → α-acetolactate + CO2, α-acetolactate → Acetoin + CO2, Acetoin + NADH ↔ 2,3-Butanediol + NAD<sup>+</sup>, Acetoin + NAD<sup>+</sup> → Acetyl-CoA + NADH, Acetoin + ATP → Acetoin\_ex, 2,3-Butanediol → 2,3-Butanediol\_ex, Glutamate + ATP → γ-PGA, and γ-PGA → γ-PGA\_ex, were added to the metabolic network (see **Supplementary Table S4**). The labeling information of amino acids (alanine, glycine, valine, leucine, serine, threonine, phenylalanine, aspartic acid, glutamic acid, and histidine) (**Supplementary Table S3**) was used in the model for calculating the metabolic fluxes of central metabolism (Wan et al., 2017). The standard deviation of <sup>13</sup>C-enrichment was set to 0.0107 for statistical analysis (Nanchen et al., 2007), and the flux estimations were performed using MATLAB R2014a (The Mathworks Inc.).
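Flux estimation in <sup>13</sup>C-MFA is commonly posed as minimizing a variance-weighted sum of squared residuals (SSR) between measured and simulated labeling fractions. The sketch below evaluates such an objective with the standard deviation quoted above; it is not INCA code, the function name is hypothetical, and the measured and simulated values are invented for illustration.

```python
SIGMA = 0.0107  # standard deviation of 13C-enrichment, as quoted in the text


def weighted_ssr(measured, simulated, sigma=SIGMA):
    """Variance-weighted sum of squared residuals between two labeling vectors."""
    if len(measured) != len(simulated):
        raise ValueError("measured and simulated vectors must have equal length")
    return sum(((m - s) / sigma) ** 2 for m, s in zip(measured, simulated))


# Hypothetical labeling fractions (M0, M1, M2) for one amino acid fragment.
measured = [0.600, 0.100, 0.300]
simulated_good = [0.605, 0.098, 0.297]   # candidate flux set that fits well
simulated_poor = [0.500, 0.200, 0.300]   # candidate flux set that fits poorly

print(f"SSR, good fit: {weighted_ssr(measured, simulated_good):.2f}")
print(f"SSR, poor fit: {weighted_ssr(measured, simulated_poor):.2f}")
```

Because each residual is scaled by the assumed measurement error, a deviation of one standard deviation contributes exactly 1 to the SSR, which is what makes the final value interpretable in a goodness-of-fit test.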

#### Statistical Analyses

All samples were analyzed in triplicate, and the data were presented as the mean ± SD for each sample point. Significant differences were determined by one-way analysis of variance (ANOVA). Statistical significance was defined as P < 0.05.

### RESULTS

#### Effects of Deletion and Overexpression of dlt Operon on γ-PGA Production

To evaluate the relationship between dlt operon expression and γ-PGA synthesis, the genes dltA, dltB, dltC, and dltD of the dlt operon were deleted and overexpressed in the γ-PGA production strain B. licheniformis WX-02, resulting in the deletion strains WX-02ΔdltA, WX-02ΔdltB, WX-02ΔdltC, and WX-02ΔdltD and the overexpression strains WX-02/pHY-dltA, WX-02/pHY-dltB, WX-02/pHY-dltC, and WX-02/pHY-dltD, respectively. Deletion of dltA, dltB, dltC, or dltD reduced the γ-PGA yield by 19.81, 41.55, 22.22, and 33.33%, respectively, compared with the control strain WX-02 (2.07 g/L) (**Figure 1A**). Deletion of one gene had no effect on the transcriptional levels of the other genes of the dlt operon (**Supplementary Figure S2**). Furthermore, the γ-PGA yields of the dlt overexpression strains were 2.13, 2.55, 2.05, and 2.25 g/L, which were 62.21, 94.66, 56.49, and 71.76% higher than that of WX-02/pHY300 (1.31 g/L), respectively (**Figure 1B**). Collectively, these results indicated that overexpression of the dlt operon benefited γ-PGA synthesis.

Previously, the dlt operon was shown to play an important role in determining the net negative charge of the cell surface. As shown in **Figure 2**, the negative charge of the cell surface increased in all dlt deletion strains and decreased in all dlt overexpression strains. In the dltB overexpression strain WX-02/pHY-dltB, the negative charge was reduced by 48.93%.

### Fermentation Process Curves of WX-02/pHY300 and WX-02/pHY-dltB

Furthermore, the fermentation process curves of WX-02/pHY300 and WX-02/pHY-dltB were determined; cell growth, glucose uptake, γ-PGA yield, and by-product (acetoin, 2,3-butanediol, and acetic acid) concentrations were measured during γ-PGA production. As shown in **Figure 3A**, the cell biomass of WX-02/pHY-dltB was slightly lower than that of the control strain WX-02/pHY300 before 9 h and increased faster thereafter. In addition, the logarithmic growth phase of the dltB overexpression strain WX-02/pHY-dltB was delayed by 2 to 3 h, and no obvious decline phase was observed. The specific growth rate of WX-02/pHY-dltB was 0.32 h<sup>−1</sup>, 7.58% higher than that of WX-02/pHY300 (**Table 2**). The glucose uptake rate of WX-02/pHY-dltB was 7.32 mmol/g DCW/h, similar to that of WX-02/pHY300 (7.23 mmol/g DCW/h), and the γ-PGA synthesis rate of WX-02/pHY-dltB (0.92 mmol/g DCW/h) was 41.54% higher than that of WX-02/pHY300 (0.65 mmol/g DCW/h). The concentrations of by-products (acetoin, 2,3-butanediol, and acetic acid) were significantly lower in WX-02/pHY-dltB than in WX-02/pHY300 (**Figure 3B**). Moreover, the formation rates of acetic acid and 2,3-butanediol showed no significant difference between the two strains, whereas the acetoin formation rate was decreased by 50.00% in WX-02/pHY-dltB (0.56 mmol/g DCW/h) (**Table 2**). Assuming that no other significant by-products or CO<sub>2</sub> were produced, the carbon recoveries of WX-02/pHY300 and WX-02/pHY-dltB were roughly consistent (**Table 2**).

FIGURE 3 | Fermentation process curves of WX-02/pHY300 and WX-02/pHY-dltB. (A) Biomass, glucose concentration, and γ-PGA yield. (B) Acetic acid, acetoin, and 2,3-butanediol yields. The solid black line indicates the control strain WX-02/pHY300, and the solid red line represents the recombinant strain WX-02/pHY-dltB. Triangle, glucose; pentagram, biomass; diamond, γ-PGA; circle, acetic acid; inverted triangle, 2,3-butanediol; cross, acetoin.
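The percentage changes quoted for the fermentation rates can be reproduced with a one-line relative-change formula. The following is a minimal sketch; the function name `percent_change` is introduced here for illustration only, and the control acetoin rate of 1.12 mmol/g DCW/h is taken from the carbon recovery calculation that accompanies Table 2, because the running text quotes only the value for the engineered strain.

```python
def percent_change(new, reference):
    """Relative change of `new` versus `reference`, in percent."""
    return (new - reference) / reference * 100.0


# Rates in mmol/g DCW/h, given as (WX-02/pHY-dltB, WX-02/pHY300).
# The control acetoin rate (1.12) comes from the carbon recovery
# calculation for Table 2; the text itself quotes only 0.56.
rates = {
    "glucose uptake": (7.32, 7.23),
    "gamma-PGA synthesis": (0.92, 0.65),
    "acetoin formation": (0.56, 1.12),
}

changes = {name: percent_change(new, ref) for name, (new, ref) in rates.items()}

for name, change in changes.items():
    print(f"{name}: {change:+.2f}%")
```

With these inputs, the γ-PGA synthesis rate changes by about +41.54% and the acetoin formation rate by −50.00%, matching the values reported in the text, while glucose uptake changes by only about +1.2%.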

### <sup>13</sup>C-Metabolic Flux Analysis

The metabolic flux distributions were determined from the measured mass isotopomer distributions (MIDs) of proteinogenic amino acids and the synthesis rates of γ-PGA, acetic acid, acetoin, and 2,3-butanediol. The measured MIDs of proteinogenic amino acids agreed well with the simulated MIDs (**Supplementary Table S3**), indicating a good fit between the measured and simulated data and high flux precision (Yao et al., 2016). The measured biomass formation rates were not used as constraints in the <sup>13</sup>C-MFA model (**Table 2**). The flux values and exchange coefficients were estimated with the <sup>13</sup>C-MFA model listed in **Supplementary Table S4**.

As shown in **Figure 4**, the fluxes in the central metabolism of B. licheniformis were redistributed in WX-02/pHY-dltB. Firstly, the flux from pyruvate to acetyl-CoA was increased by 11.12%, and the flux into the tricarboxylic acid (TCA) cycle was enhanced by 38.38% in WX-02/pHY-dltB. The flux for acetic acid production was increased slightly, whereas the flux of overflow metabolism (from pyruvate to acetoin and 2,3-butanediol) was decreased by 26.93%. Secondly, the flux of the PP pathway was increased by 7.82% in the dltB overexpression strain, which might generate more NADPH for γ-PGA synthesis. Thirdly, the flux through pyruvate carboxylation (PYR + CO<sub>2</sub> → OAA), the major anaplerotic flux into the TCA cycle, was increased by 25.40% in WX-02/pHY-dltB. The increase in pyruvate carboxylation could direct more carbon from glucose to the TCA cycle and further supply the demand of oxaloacetate-based biomass synthesis (**Table 2**; He et al., 2014). Fourthly, the fluxes from α-ketoglutaric acid to glutamic acid and to γ-PGA were increased by 19.47 and 57.92%, respectively. Fifthly, the flux of the EMP pathway was decreased by 4.13%, which further reduced NADH generation and by-product (acetoin, 2,3-butanediol, and acetic acid) synthesis. Lastly, the estimated biomass biosynthesis flux was slightly higher (about 6.40%) in WX-02/pHY-dltB (**Figure 4**), which correlated positively with the measured cell biomass (**Table 1**).

Cofactors (such as NADH and NADPH) and ATP play important roles in metabolite production (Kind et al., 2013), including γ-PGA synthesis (Cai et al., 2017, 2018). Based on our metabolic flux model of B. licheniformis, the NADPH and ATP formation rates in WX-02/pHY-dltB were increased by 12.50 and 3.54%, respectively, which was beneficial for γ-PGA synthesis (**Table 3**).

### Transcriptional Level Analysis

The transcriptional levels of genes related to glucose metabolism and γ-PGA synthesis were measured in WX-02/pHY300 and WX-02/pHY-dltB (**Figure 5**). Compared to WX-02/pHY300, the transcription levels of the glucose transporter genes ptsG, glcU, and glcP were clearly reduced in the dltB overexpression strain. The glucose-6-phosphate dehydrogenase gene zwf of the PP pathway showed a higher expression level, while the transcription level of the glucose-6-phosphate isomerase gene pgi of the EMP pathway

#### TABLE 2 | The fermentation characteristics of WX-02/pHY300 and WX-02/pHY-dltB.


The carbon recoveries of WX-02/pHY300 and WX-02/pHY-dltB were calculated as follows:

In control strain WX-02/pHY300:

Carbon consumption of glucose: 7.23 × 6 = 43.38 mmol C/g DCW/h.

Carbon consumption for biomass and product synthesis:

0.65 × 5 + 3.48 × 2 + 1.12 × 4 + 1.14 × 4 + 0.29/24.6 × 1000 = 31.04 mmol C/g DCW/h.

In engineered strain WX-02/pHY-dltB:

Carbon consumption of glucose: 7.32 × 6 = 43.92 mmol C/g DCW/h.

Carbon consumption for biomass and product synthesis:

0.92 × 5 + 3.57 × 2 + 0.56 × 4 + 1.10 × 4 + 0.32/24.6 × 1000 = 31.39 mmol C/g DCW/h.

(Biomass was calculated using the elemental formula CH<sub>1.8</sub>O<sub>0.5</sub>N<sub>0.2</sub>. The carbon in each product was calculated as the product formation rate × the number of carbon atoms.)
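The carbon balance above can be checked with a short script. This is a sketch under stated assumptions: the assignment of each rate to a product is inferred from the carbon multipliers in the calculation (×5 for the glutamate unit of γ-PGA, ×2 for acetic acid, ×4 for acetoin and 2,3-butanediol), and the names `CARBONS` and `carbon_balance` are introduced here for illustration.

```python
# Carbon atoms per molecule (gamma-PGA is counted per glutamate unit).
# The mapping of rates to products is inferred from the multipliers
# used in the calculation above, so treat it as an assumption.
CARBONS = {"gamma_pga": 5, "acetic_acid": 2, "acetoin": 4, "butanediol": 4}
GLUCOSE_CARBONS = 6
BIOMASS_G_PER_CMOL = 24.6  # molar mass of CH1.8O0.5N0.2 per C-mol


def carbon_balance(glucose_uptake, product_rates, growth_rate):
    """Return (carbon in, carbon out, recovery); rates in mmol/g DCW/h, growth in 1/h."""
    carbon_in = glucose_uptake * GLUCOSE_CARBONS
    carbon_products = sum(rate * CARBONS[name] for name, rate in product_rates.items())
    carbon_biomass = growth_rate / BIOMASS_G_PER_CMOL * 1000.0  # mmol C/g DCW/h
    carbon_out = carbon_products + carbon_biomass
    return carbon_in, carbon_out, carbon_out / carbon_in


control = carbon_balance(
    7.23,
    {"gamma_pga": 0.65, "acetic_acid": 3.48, "acetoin": 1.12, "butanediol": 1.14},
    0.29,
)
engineered = carbon_balance(
    7.32,
    {"gamma_pga": 0.92, "acetic_acid": 3.57, "acetoin": 0.56, "butanediol": 1.10},
    0.32,
)

for label, (c_in, c_out, recovery) in (("WX-02/pHY300", control), ("WX-02/pHY-dltB", engineered)):
    print(f"{label}: in={c_in:.2f}, out={c_out:.2f}, recovery={recovery:.1%}")
```

Under these inputs the carbon accounted for is about 71.5% of the glucose carbon in both strains, which is the sense in which the two recoveries are roughly consistent.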

Fluxes shown are normalized to glucose uptake rate of 100 for each strain (estimated flux ± SD). (A) WX-02/pHY300. (B) WX-02/pHY-dltB.


TABLE 3 | The synthesis and consumption contents of NADPH, NADH, ATP, and FADH2 of WX-02/pHY300 and WX-02/pHY-dltB (mmol/g CDW/h).

All the flux values are normalized to the glucose uptake rate of 100 mmol h<sup>−1</sup> for each strain.

was decreased by 25.01%, indicating elevated PP pathway and depressed EMP pathway fluxes in WX-02/pHY-dltB. In addition, the transcription levels of the genes citB (encoding citrate synthase) and icd (encoding isocitrate dehydrogenase) of the TCA cycle were enhanced by 90.03 and 105.12%, respectively, which strengthened the metabolic flux through the TCA cycle.

Furthermore, the transcription level of the glutamate dehydrogenase gene rocG, whose product mainly catalyzes the formation of glutamic acid from α-ketoglutaric acid in B. licheniformis WX-02 (Tian et al., 2017), was increased by 83.25% in WX-02/pHY-dltB, and those of the γ-PGA synthetase genes pgsB and pgsC were increased by 2.62- and 2.12-fold, respectively, which indicated that more glutamic acid was synthesized and converted to γ-PGA in WX-02/pHY-dltB. In addition, the transcription level of the gene pgdS (encoding γ-DL-glutamyl hydrolase) was also increased in WX-02/pHY-dltB; increased pgdS expression could decrease the molecular weight of γ-PGA (Tian et al., 2017), which was consistent with the actual measurements (data not shown). Moreover, the transcription factor genes comA, comP, degS, and degU, which are associated with the synthesis of γ-PGA, were all upregulated in the dltB overexpression strain.

#### DISCUSSION

The cell wall and cell membrane form the selective permeation barrier of bacteria, and engineering of the cell surface negative charge, the membrane phospholipid head groups, and the fatty acid hydrophobic tail composition can affect bacterial growth and product formation (Hyyrylainen et al., 2007; Ghorbal et al., 2013; Tan et al., 2017). Previous studies demonstrated that an increase in the negative charge of the cell surface could improve the secretion efficiency of target proteins, and that the improvement was greater for proteins with lower isoelectric points (pI) than for proteins with higher pI (Cao et al., 2017; Chen et al., 2018). In contrast, our results showed that increasing the cell surface negative charge decreased the γ-PGA yield in the dlt operon mutants.

First, overexpression of dltB slowed the uptake and utilization of glucose (**Figure 3A**) and further decreased the flux of the EMP pathway. Previous research implied that a reduction of the glycolytic flux in E. coli THRD could reduce the accumulation of acetate and increase NADPH and L-threonine generation (Xie et al., 2014), which was consistent with our results (**Figure 3A** and **Table 2**). Moreover, the alanylation of teichoic acids could modulate the negative charge of the cell wall to protect secretory or cell wall-associated proteins against degradation during post-translocational folding (Hyyrylainen et al., 2000), whereas an enhanced cell surface negative charge could increase the binding capacity of autolysin and accelerate the autolysis of bacteria (Steen et al., 2005). **Figure 3A** showed that the logarithmic phase of WX-02/pHY-dltB was extended by 2 to 3 h, and the growth of WX-02ΔdltB was clearly impaired in the late fermentation stage compared with WX-02 (**Supplementary Figure S1**). These results indicated that overexpression of dltB could promote cell growth by reducing cell autolysis. Therefore, overexpression of dltB reduced the uptake and utilization rate of glucose, which decreased the flux of the EMP pathway and the synthesis of by-products (acetoin, 2,3-butanediol, and acetic acid), and extended the logarithmic growth phase; all of these effects were beneficial for γ-PGA synthesis.

Then, <sup>13</sup>C-labeled isotope tracing and <sup>13</sup>C-MFA were applied to further evaluate the metabolic flux redistribution in the dltB overexpression strain. Firstly, the flux of the EMP pathway was weakened, which reduced NADH generation and by-product synthesis, consistent with previous research (Lee and Oh, 2016). Secondly, the flux of the PP pathway and the NADPH supply were both enhanced in the dltB overexpression strain. Since the conversion of α-ketoglutarate to glutamic acid requires NADPH in B. licheniformis WX-02, the enhanced NADPH supply contributes to γ-PGA synthesis (Cai et al., 2017; Tian et al., 2017). In addition, the flux of the TCA cycle was enhanced by 38.38% in the dltB overexpression strain, and the increased flux of α-ketoglutaric acid synthesis was beneficial for cell growth (Wan et al., 2017) and γ-PGA production. Besides, the flux of the anaplerotic pathway (PYR → OAA) was enhanced by 25.40%, which is beneficial for oxaloacetate accumulation and cell growth (Sauer et al., 1999). Finally, the fluxes from α-ketoglutarate to glutamate and from glutamate to γ-PGA were increased by 19.47 and 57.92%, respectively, which was consistent with the fermentation characteristics (**Figure 1**).

Furthermore, overexpression of the dlt operon decreased the cell surface negative charge (**Figure 2**) and further changed the extracellular microenvironment (Hyyrylainen et al., 2000), which might promote the recruitment of the signal molecule ComP to activate the two-component system ComP-ComA for γ-PGA synthesis. Based on our results, the transcriptional levels of comP/comA and degU/degS were all increased in the dltB overexpression strain, which further affected the transcription levels of genes involved in γ-PGA synthesis and carbon metabolism (**Figure 5**). Moreover, the decrease of the cell surface negative charge could reduce the binding of cations (e.g., Mg<sup>2+</sup> and Ca<sup>2+</sup>) and cationic antimicrobial peptides to the cell surface (Neuhaus and Baddiley, 2003), and low levels of cations or cationic antimicrobial peptides could activate the PhoQ(R)∼PhoP system (Garcia Vescovi et al., 1996; Ren et al., 2017). In this study, the concentration of phosphate in the culture medium was too high to activate the PhoQ(R)∼PhoP system (Devine, 2018) in WX-02/pHY300; however, due to the low levels of Mg<sup>2+</sup> and Ca<sup>2+</sup> on the cell surface of the dltB overexpression strain, the PhoQ(R)∼PhoP system might be activated in WX-02/pHY-dltB, which was consistent with our unpublished results that strengthening the PhoQ(R)∼PhoP system benefits γ-PGA synthesis. Finally, γ-PGA is an anionic polypeptide, and the decrease of the cell surface negative charge might reduce the electrostatic repulsion (Cafiso et al., 2014) between teichoic acids (TAs) and γ-PGA, which further benefited γ-PGA secretion. To test this hypothesis, negatively charged products (such as lichenysin, bacitracin, and pulcherrimin) were analyzed, and our results showed that the yields of these negatively charged products were all increased in the dlt overexpression strains (**Supplementary Figure S3**).

#### CONCLUSION

Cell surface engineering is a promising tactic for enhanced production of metabolites. This study demonstrated for the first time that overexpression of the dlt operon could increase γ-PGA production. Furthermore, <sup>13</sup>C-MFA was applied to illustrate the effect of dltB overexpression on γ-PGA synthesis. Our results showed that overexpression of dltB could reduce the negative charge of the cell surface, which further reduced the uptake and utilization rate of glucose and the flux of the EMP pathway. Meanwhile, the increases in TCA cycle flux, NADPH and ATP supply, and glutamic acid formation benefited γ-PGA synthesis. This work demonstrated that <sup>13</sup>C-MFA provides distinct metabolic insights into the changes in central carbon metabolism of engineered microbes.

#### AUTHOR CONTRIBUTIONS

PH and SC designed the study. PH, SH, and YC carried out the molecular biology studies and construction of recombinant strains. PH, DC, SH, and YC carried out the fermentation studies. PH and NW carried out the metabolic flux analysis. PH, DC, SL, and SC analyzed the data and wrote the manuscript. All authors read and approved the final manuscript.

#### FUNDING

This work was supported by the National Program on Key Basic Research Project (973 Program, No. 2015CB150505), the Technical Innovation Special Fund of Hubei Province (2018ACA149), and the China Postdoctoral Science Foundation (2018M642802).

#### ACKNOWLEDGMENTS

We thank Lian He from the University of Washington for his help with <sup>13</sup>C-metabolic flux analysis.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmicb.2019.00105/full#supplementary-material

All the primer sequences for strain construction and RT-qPCR are listed in **Supplementary Tables S1** and **S2**, respectively. The results of labeled amino acids are provided in **Supplementary Table S3**, and the specific data of the metabolic pathways are provided in **Supplementary Table S4**.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 He, Wan, Cai, Hu, Chen, Li and Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Tandem Mass Spectrometry for <sup>13</sup>C Metabolic Flux Analysis: Methods and Algorithms Based on EMU Framework

#### Jungik Choi and Maciek R. Antoniewicz\*

#### Edited by:

Yinjie Tang, Washington University in St. Louis, United States

#### Reviewed by:

Lifeng Peng, Victoria University of Wellington, New Zealand Chao Wu, National Renewable Energy Laboratory (DOE), United States

> \*Correspondence: Maciek R. Antoniewicz mranton@udel.edu

#### Specialty section:

This article was submitted to Microbial Physiology and Metabolism, a section of the journal Frontiers in Microbiology

> Received: 30 October 2018 Accepted: 09 January 2019 Published: 24 January 2019

#### Citation:

Choi J and Antoniewicz MR (2019) Tandem Mass Spectrometry for <sup>13</sup>C Metabolic Flux Analysis: Methods and Algorithms Based on EMU Framework. Front. Microbiol. 10:31. doi: 10.3389/fmicb.2019.00031

Department of Chemical and Biomolecular Engineering, Metabolic Engineering and Systems Biology Laboratory, University of Delaware, Newark, DE, United States

In the past two decades, <sup>13</sup>C metabolic flux analysis (<sup>13</sup>C-MFA) has matured into a powerful and widely used scientific tool in metabolic engineering and systems biology. Traditionally, metabolic fluxes have been determined from measurements of isotopic labeling by means of mass spectrometry (MS) or nuclear magnetic resonance (NMR). In recent years, tandem MS has emerged as a new analytical technique that can provide additional information for high-resolution quantification of metabolic fluxes in complex biological systems. In this paper, we present recent advances in methods and algorithms for incorporating tandem MS measurements into existing <sup>13</sup>C-MFA approaches that are based on the elementary metabolite units (EMU) framework. Specifically, efficient EMU-based algorithms are presented for simulating tandem MS data, tracing isotopic labeling in biochemical network models, and correcting tandem MS data for natural isotope abundances.

Keywords: elementary metabolite units, metabolic flux analysis, tandem mass spectrometry, stable isotope tracers, metabolism

## INTRODUCTION

<sup>13</sup>C-Metabolic flux analysis (<sup>13</sup>C-MFA) is a widely used technique in metabolic engineering and biomedical sciences for quantifying rates of metabolite interconversions inside living cells (i.e., in vivo metabolic fluxes) (Antoniewicz, 2015a,b; Ahn et al., 2016). In <sup>13</sup>C-MFA, labeling experiments are performed by introducing a <sup>13</sup>C-labeled substrate (the tracer), followed by measurement of <sup>13</sup>C labeling incorporation and calculation of metabolic fluxes using one of several available software packages for <sup>13</sup>C-MFA (Yoo et al., 2008; Quek et al., 2009; Weitzel et al., 2013; Young, 2014).

Currently, the main techniques used to measure <sup>13</sup>C labeling are mass spectrometry (MS) (Antoniewicz et al., 2007a, 2011; McConnell and Antoniewicz, 2016), nuclear magnetic resonance (NMR) spectroscopy (Masakapalli et al., 2014), and tandem MS (Antoniewicz, 2013; Okahashi et al., 2016). Previous studies have demonstrated that tandem MS measurements can provide more labeling information than MS resulting in improved performance of <sup>13</sup>C-MFA in complex biological systems (Jeffrey et al., 2002; Choi and Antoniewicz, 2011; Choi et al., 2012).

Various modeling approaches have been applied to simulate tandem MS data for applications in <sup>13</sup>C-MFA. These include manually derived algebraic equations for specific network models (Jeffrey et al., 2002), isotopomers (Choi and Antoniewicz, 2011; Chandra and Peng, 2012), and tandemers (Tepper and Shlomi, 2015). Despite the demonstrated advantages of tandem MS, the technique is still not fully embraced by the <sup>13</sup>C-MFA community, likely because the modeling approaches have not been based on the elementary metabolite units (EMU) framework (Antoniewicz et al., 2007b; Crown and Antoniewicz, 2012), which is the most widely used approach for modeling isotopic labeling in <sup>13</sup>C-MFA (Young et al., 2008). The EMU framework is at the core of all major software packages for <sup>13</sup>C-MFA (Yoo et al., 2008; Quek et al., 2009; Weitzel et al., 2013; Young, 2014). In this methods paper, we describe efficient algorithms for modeling tandem MS data that are firmly based on the EMU framework. By building upon the EMU framework, we illustrate that the presented algorithms can be easily incorporated into existing software packages to take full advantage of the additional information provided by tandem MS for high-resolution flux measurements.

## SIMULATION OF TANDEM MS DATA

#### Compact Tandem MS Matrix

First, we introduce here the new concept of the compact tandem MS matrix that will be used throughout this paper to describe tandem MS data. In later sections, it will be demonstrated that the compact tandem MS matrix is compatible with the EMU framework to allow tandem MS data to be incorporated into existing EMU algorithms for <sup>13</sup>C-MFA.

Tandem MS data were previously represented by the so-called tandem MS matrix (**Figure 1A**; Choi and Antoniewicz, 2011), where the columns correspond to m/z of the parent fragment and the rows correspond to m/z of the daughter fragment. As an example, consider a metabolite A that has four carbon atoms, and assume that carbon atoms C1–C4 are present in the parent fragment and that carbon atoms C3–C4 are present in the daughter fragment. In this case, the tandem MS matrix is a 3 × 5 matrix (rows m0–m2, columns M0–M4; **Figure 1A**). A disadvantage of this representation is that several matrix fields will have zero values by definition. For example, in the first column of the tandem MS matrix, which corresponds to the unlabeled parent fragment (M0), the only possible daughter fragment is unlabeled (m0). A more convenient way of representing tandem MS data is by defining the compact tandem MS matrix, which is constructed by shifting rows of the tandem MS matrix to eliminate fields that are infeasible (i.e., which are zero by definition). In this example, the compact tandem MS matrix is a 3 × 3 matrix (**Figure 1B**).
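The row-shifting construction can be sketched in a few lines. In this illustration, which is not taken from the paper, row j of the tandem MS matrix (daughter m_j) is shifted left by j columns, so that column c of the compact matrix counts the labeled atoms in the part of the parent fragment that is not in the daughter fragment. The function names and the numerical values are hypothetical.

```python
def to_compact(tandem, n_daughter, n_rest):
    """Shift row j of the tandem MS matrix left by j columns.

    tandem[j][i] is the abundance of daughter m_j observed from parent M_i.
    n_daughter is the number of atoms in the daughter fragment and n_rest is
    the number of parent atoms that are not in the daughter fragment.
    """
    return [[tandem[j][j + c] for c in range(n_rest + 1)]
            for j in range(n_daughter + 1)]


def to_full(compact):
    """Inverse operation: rebuild the tandem MS matrix, zero-filling infeasible fields."""
    n_daughter = len(compact) - 1
    n_rest = len(compact[0]) - 1
    full = [[0.0] * (n_daughter + n_rest + 1) for _ in range(n_daughter + 1)]
    for j, row in enumerate(compact):
        for c, value in enumerate(row):
            full[j][j + c] = value
    return full


# Hypothetical 3 x 5 tandem MS matrix for metabolite A
# (parent C1-C4, daughter C3-C4); the numbers are made up for illustration.
tandem_a = [
    [0.50, 0.10, 0.05, 0.00, 0.00],  # m0
    [0.00, 0.10, 0.10, 0.05, 0.00],  # m1
    [0.00, 0.00, 0.04, 0.04, 0.02],  # m2
]

compact_a = to_compact(tandem_a, n_daughter=2, n_rest=2)
```

Because only infeasible fields are discarded, `to_full(compact_a)` reproduces the original 3 × 5 matrix, so the compact form loses no information.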

### Simulating Tandem MS Data Using 2D-Convolutions

Next, we describe an algorithm to simulate the compact tandem MS matrix for metabolites that are naturally labeled and for isotopic tracers that are labeled at specific carbon positions. Previously, we demonstrated that mass isotopomer distributions (MID) can be simulated efficiently using a series of convolutions (function conv in Matlab) (Antoniewicz et al., 2007b), essentially by reconstructing a metabolite atom-by-atom (**Figure 1C**). Here, we demonstrate that the compact tandem MS matrix can be simulated analogously using a series of 2D-convolutions (function conv2 in Matlab), again by reconstructing a metabolite atom-by-atom. This approach can be used to simulate both naturally labeled compounds and tracers that are labeled at one or more specific carbon positions.
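As a point of reference for the 1D case, the atom-by-atom reconstruction of an MID can be sketched as follows. This is an illustrative Python analogue of Matlab's `conv`, not the authors' code. It assumes a natural <sup>13</sup>C abundance of about 1.07% and 100% isotopic purity at labeled positions, and it uses a hypothetical four-carbon metabolite labeled at its first carbon.

```python
def conv(a, b):
    """Full 1D convolution of two sequences (plain-Python analogue of Matlab's conv)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


NATURAL_C = [0.9893, 0.0107]  # assumed natural 13C abundance of about 1.07%
LABELED_C = [0.0, 1.0]        # assumed 100% isotopic purity at a labeled position


def simulate_mid(atom_mids):
    """Build the MID of a fragment by convolving the MIDs of its atoms one by one."""
    mid = [1.0]
    for atom in atom_mids:
        mid = conv(mid, atom)
    return mid


# Hypothetical four-carbon metabolite labeled at its first carbon.
mid_example = simulate_mid([LABELED_C, NATURAL_C, NATURAL_C, NATURAL_C])
print([round(x, 6) for x in mid_example])
```

The result has five entries (M0 to M4) that sum to one; M0 is zero because one carbon is always labeled.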

As an example, assume that we want to simulate the compact tandem MS matrix for metabolite A that is labeled at the second and third carbon positions, i.e., [2,3-<sup>13</sup>C]A, and assume that the parent fragment contains all four carbon atoms and the daughter fragment contains carbon atoms 3 and 4. Simulation of the compact tandem MS matrix is achieved by a sequence of 2D-convolutions as shown in **Figure 1D**. The 2D-convolution is performed with the transpose of the atom's MID vector if the atom is present in the daughter fragment, and with the atom's MID vector itself if the atom is not present in the daughter fragment. The simulated compact tandem MS matrix for [2,3-<sup>13</sup>C]A is shown in **Figure 1D**. The same approach can be used to simulate compact tandem MS matrices for metabolites of any size. This approach is not limited to carbon atoms, but can be used to include any and all atoms. This is important since in LC-MS/MS the natural abundances of, e.g., sulfur (4.2% M+2) and oxygen (0.2% M+2) contribute to shifts in tandem MS distributions, and even more importantly, in GC-MS/MS analysis compounds are often derivatized (e.g., with TBDMS), which adds other atoms to the fragment that cause even more dramatic shifts in tandem MS distributions (Choi et al., 2012; Okahashi et al., 2016).
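The sequence of 2D-convolutions for [2,3-<sup>13</sup>C]A can be sketched as below. This is a plain-Python illustration in the spirit of Matlab's `conv2`, not the authors' implementation. It assumes that rows index the daughter fragment, that the natural <sup>13</sup>C abundance is about 1.07%, and that labeled positions are 100% pure; the function names are introduced here for illustration.

```python
def conv2(a, b):
    """Full 2D convolution of two matrices given as lists of rows."""
    rows = len(a) + len(b) - 1
    cols = len(a[0]) + len(b[0]) - 1
    out = [[0.0] * cols for _ in range(rows)]
    for i, a_row in enumerate(a):
        for j, a_val in enumerate(a_row):
            for k, b_row in enumerate(b):
                for l, b_val in enumerate(b_row):
                    out[i + k][j + l] += a_val * b_val
    return out


NATURAL_C = [0.9893, 0.0107]  # assumed natural 13C abundance of about 1.07%
LABELED_C = [0.0, 1.0]        # assumed 100% isotopic purity


def simulate_compact(atom_mids, in_daughter):
    """Build the compact tandem MS matrix atom by atom.

    Atoms in the daughter fragment enter as column vectors (the transposed
    MID), all other atoms as row vectors, so rows index the daughter fragment.
    """
    matrix = [[1.0]]
    for mid, is_daughter in zip(atom_mids, in_daughter):
        kernel = [[p] for p in mid] if is_daughter else [list(mid)]
        matrix = conv2(matrix, kernel)
    return matrix


# [2,3-13C]A: parent fragment C1-C4, daughter fragment C3-C4.
compact = simulate_compact(
    [NATURAL_C, LABELED_C, LABELED_C, NATURAL_C],
    [False, False, True, True],
)
for row in compact:
    print([round(x, 6) for x in row])
```

Under these assumptions, almost all of the signal falls in the central field (one labeled atom in the daughter fragment and one in the remainder of the parent). Non-carbon atoms or derivatization atoms can be included in the same way by appending their MIDs to the list.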

## CORRECTION OF TANDEM MS DATA

### Parent, Daughter, and Complement Fragments

In order to use tandem MS data for <sup>13</sup>C-flux calculations, the data must be corrected for natural isotope abundances. Efficient algorithms have been developed for correcting MS data for use in <sup>13</sup>C-MFA (Fernandez et al., 1996); however, existing algorithms for correcting tandem MS data tend to be more tedious (Rantanen et al., 2002; Niedenfuhr et al., 2016). Here, we describe a more convenient algorithm based on the compact tandem MS matrix formulation described above. First, we must define a new term, the "complement fragment," shown in **Figure 2A**. The complement fragment is defined as the part of the parent fragment that is not present in the daughter fragment. The daughter fragment and the complement fragment can then be further imagined to consist of core carbon atoms that originate from the metabolite that is measured, and other atoms (which may include both carbon and non-carbon atoms) from the derivatizing agent. For <sup>13</sup>C-MFA calculations, we are only interested in the labeling of the core C-atoms; thus, tandem MS data must be corrected for the skewing effects resulting from the presence of carbon atoms that are not core C-atoms.
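The split into daughter, complement, core, and other atoms is simple bookkeeping on elemental formulas. The sketch below illustrates it with the TBDMS-aspartate fragment pair m/z 418 > 244 discussed in the next subsection. The helper names `parse_formula` and `subtract_formula` are hypothetical, and the parser handles only flat formulas without brackets.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Parse a flat elemental formula such as 'C18H40O4NSi3' into atom counts."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def subtract_formula(whole, part):
    """Atoms left after removing `part` from `whole`; fails if `part` does not fit."""
    rest = Counter(whole)
    rest.subtract(part)
    if any(n < 0 for n in rest.values()):
        raise ValueError("part is not contained in whole")
    return {element: n for element, n in rest.items() if n > 0}


parent = parse_formula("C18H40O4NSi3")    # m/z 418
daughter = parse_formula("C10H22O2NSi2")  # m/z 244
core = parse_formula("C2")                # two aspartate carbons in each part

complement = subtract_formula(parent, daughter)        # parent minus daughter
complement_other = subtract_formula(complement, core)  # non-core atoms of the complement
daughter_other = subtract_formula(daughter, core)      # non-core atoms of the daughter

print(complement_other, daughter_other)
```

The result reproduces the non-core compositions quoted for this fragment pair: C6H18O2Si for the complement fragment and C8H22O2NSi2 for the daughter fragment.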

### Correcting Tandem MS Data for Natural Abundances

The correction of tandem MS data for natural isotope abundances is accomplished using 2D-deconvolution. To illustrate this, consider the case of TBDMS-derivatized aspartate. In a previous study, we validated several parent-daughter GC-MS/MS fragments for analysis of aspartate labeling (Choi et al., 2012). One of the validated parent-daughter pairs was m/z 418 > 244 (**Figure 2A**). In this case, the parent fragment (m/z 418, C18H40O4NSi3) contains all four C-atoms of aspartate and the daughter fragment (m/z 244, C10H22O2NSi2) contains the first two C-atoms of aspartate. Following the definitions in the previous section, the daughter fragment is thus imagined to consist of two core C-atoms of aspartate (i.e., C2, first two carbon atoms), and various other atoms (C8H22O2NSi2); the complement fragment consists of two core C-atoms of aspartate (i.e., C2, last two carbon atoms), and various other atoms (C6H18O2Si). If the labeling of the core C-atoms is known, then we can predict the theoretical measured compact tandem MS matrix by 2D-convolutions shown in **Figure 2B**, where S is the natural abundance MID vector of C6H18O2Si, and Q is the natural abundance MID vector of C8H22O2NSi2. The inverse operation, or 2D-deconvolution, can therefore be used to correct the measured tandem MS data for natural isotope abundances (**Figure 2C**). It should be noted that 2D-deconvolutions of this type are widely used in many fields of science such as image processing and filtering (e.g., functions fft2 and ifft2 in Matlab).
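The forward prediction and its inverse can be sketched as follows. This illustration swaps in a direct recursive deconvolution (forward substitution) for the FFT-based route mentioned above, which is exact for noise-free data; with real measurements a least-squares fit may be preferable. The vectors `Q` and `S` are made-up placeholders rather than the true natural abundance MIDs of C8H22O2NSi2 and C6H18O2Si, and the core matrix is hypothetical.

```python
def conv2(a, b):
    """Full 2D convolution of two matrices given as lists of rows."""
    rows = len(a) + len(b) - 1
    cols = len(a[0]) + len(b[0]) - 1
    out = [[0.0] * cols for _ in range(rows)]
    for i, a_row in enumerate(a):
        for j, a_val in enumerate(a_row):
            for k, b_row in enumerate(b):
                for l, b_val in enumerate(b_row):
                    out[i + k][j + l] += a_val * b_val
    return out


def deconv2(measured, kernel, n_rows, n_cols):
    """Recover the n_rows x n_cols matrix `core` with conv2(core, kernel) == measured.

    Direct forward substitution; requires kernel[0][0] != 0 and uses only the
    top-left n_rows x n_cols block of `measured`.
    """
    core = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            acc = measured[i][j]
            for k in range(min(i, len(kernel) - 1) + 1):
                for l in range(min(j, len(kernel[0]) - 1) + 1):
                    if k or l:
                        acc -= kernel[k][l] * core[i - k][j - l]
            core[i][j] = acc / kernel[0][0]
    return core


# Placeholder natural-abundance MIDs (NOT the real values for these formulas).
Q = [0.75, 0.15, 0.08, 0.02]  # non-core atoms of the daughter fragment (rows)
S = [0.82, 0.12, 0.05, 0.01]  # non-core atoms of the complement fragment (columns)
kernel = [[q * s for s in S] for q in Q]

# Hypothetical labeling of the core C-atoms (rows: daughter, columns: complement).
core_true = [
    [0.000, 0.010, 0.000],
    [0.010, 0.970, 0.005],
    [0.000, 0.005, 0.000],
]

measured = conv2(core_true, kernel)           # predicted measurement
core_corrected = deconv2(measured, kernel, 3, 3)  # corrected 3 x 3 matrix
```

In this noise-free round trip, `core_corrected` matches `core_true` to within floating-point error.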

To illustrate the application of this correction algorithm, we have applied it to correct the tandem MS data that were previously measured for [1,4-<sup>13</sup>C]aspartate. **Figure 3D** shows the measured tandem MS matrix for the parent-daughter fragment pair m/z 418 > 244, as reported by Choi et al. (2012) (**Figure 3A**). The measured tandem MS matrix was first transformed into the compact tandem MS matrix by row shifting (**Figure 3E**), and then corrected for natural isotope abundances using 2D-deconvolution, which produced a 3 × 3 matrix that now reflects the labeling of only the core C-atoms of aspartate (**Figure 3F**). For comparison, we also simulated the theoretical compact tandem MS matrix for [1,4-<sup>13</sup>C]aspartate, assuming 99% isotopic purity of the tracer (**Figure 3B**), as well as the corresponding corrected compact tandem MS matrix (**Figure 3C**). Overall, there was very good agreement between the measured and simulated matrices. This example demonstrates that correction for natural isotope abundances can be accomplished easily in a single step by 2D-deconvolution using the compact tandem MS matrices.

FIGURE 5 | Simple example network model (also used in the original EMU paper), used here to illustrate simulation of tandem MS data using the EMU approach.

TABLE 1 | Stoichiometry and atom transitions for the reactions in the example metabolic network.

<sup>∗</sup>For each metabolite, atoms are identified using lower case letters to represent successive atoms in the metabolite.

## <sup>13</sup>C-METABOLIC FLUX ANALYSIS WITH TANDEM MS DATA

### EMU Framework for Simulating Tandem MS

Lastly, we demonstrate that tandem MS data can be used for <sup>13</sup>C-MFA calculations using the widely used EMU framework. As described in the original EMU paper (Antoniewicz et al., 2007b), there are three types of reactions that must be considered when constructing EMU models: a condensation reaction, a cleavage reaction, and a unimolecular reaction. **Figure 4** shows that for all three types of reactions the EMU product can be computed from the corresponding EMU educts. To simulate tandem MS data, compact tandem MS matrices can be used as state variables (note that MID vectors were used as state variables for simulating MS data in the original EMU paper). For an EMU condensation reaction, the compact tandem MS matrix of the EMU product is computed by a 2D-convolution as described above (**Figure 4A**). For the cleavage reaction and the unimolecular reaction, the compact tandem MS matrix of the EMU product is equal to the compact tandem MS matrix of the EMU educt.

FIGURE 7 | EMU balances for simulation of tandem MS data in the simple example network model (shown in Figure 5). The EMU balances were constructed based on the EMU model decomposition shown in Table 2.

[…] network to simulate tandem MS data for metabolite F (parent fragment C1–C3, daughter fragment C2–C3). For this simulation, the fluxes shown in Figure 5 were used and metabolite A was assumed to be 100% <sup>13</sup>C-labeled at the second carbon position.
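The three EMU reaction rules described in this section can be expressed as a small dispatch function. The following is a minimal sketch rather than the authors' implementation. It assumes that an educt EMU lying entirely inside the daughter fragment is stored as a column vector and one lying entirely outside it as a row vector, consistent with the atom-level rule given earlier; the function name `emu_product` and the numerical values are hypothetical.

```python
def conv2(a, b):
    """Full 2D convolution of two matrices given as lists of rows."""
    rows = len(a) + len(b) - 1
    cols = len(a[0]) + len(b[0]) - 1
    out = [[0.0] * cols for _ in range(rows)]
    for i, a_row in enumerate(a):
        for j, a_val in enumerate(a_row):
            for k, b_row in enumerate(b):
                for l, b_val in enumerate(b_row):
                    out[i + k][j + l] += a_val * b_val
    return out


def emu_product(reaction_type, educts):
    """Compact tandem MS matrix of an EMU product from those of its educts."""
    if reaction_type == "condensation":
        product = [[1.0]]
        for educt in educts:
            product = conv2(product, educt)
        return product
    if reaction_type in ("cleavage", "unimolecular"):
        (educt,) = educts               # exactly one educt EMU
        return [row[:] for row in educt]
    raise ValueError(f"unknown reaction type: {reaction_type}")


# Hypothetical condensation forming a three-carbon product whose daughter
# fragment holds carbons 2-3: a one-carbon educt outside the daughter
# (row vector) and a two-carbon educt inside the daughter (column vector).
educt_outside = [[0.0, 1.0]]            # fully labeled single carbon
educt_inside = [[0.9], [0.1], [0.0]]    # made-up MID of the two-carbon EMU

condensed = emu_product("condensation", [educt_outside, educt_inside])
passed_on = emu_product("unimolecular", [condensed])
```

Here `condensed` is a 3 × 2 matrix (rows m0–m2 of the daughter, columns for zero or one labeled atom outside it), and the unimolecular step passes it on unchanged.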

#### EMU Decomposition of an Example Metabolic Network Model

At the core of the EMU methodology is the decomposition of the biochemical reaction network into EMU networks, which are then solved sequentially. The EMU decomposition is performed by tracing the origin of carbon atoms of a particular metabolite to carbon atoms of substrates. For more details, the reader is referred to the original EMU paper (Antoniewicz et al., 2007b). In the original EMU framework, EMU decomposition was accomplished by keeping track of C-atoms for the parent fragment only. To generate EMU networks for simulation of tandem MS data, we must also keep track of C-atoms of the daughter fragment. To illustrate this, we will use a simple example network model shown in **Figure 5** (that was also used in the original EMU paper), with the corresponding atom transitions shown in **Table 1**. The network model was decomposed here in order to simulate tandem MS data for the parent-daughter fragment pair F<sub>123</sub> > F<sub>23</sub>. The complete EMU decomposition is shown in **Table 2**. Note that the approach described above is conceptually the same as the tandemers approach described by Tepper and Shlomi (2015).

#### <sup>13</sup>C-MFA With Tandem MS Data and the EMU Framework

To determine fluxes with <sup>13</sup>C-MFA, isotopic labeling must be simulated by solving the EMU network models, which are represented mathematically in the form (see the original EMU paper for more details) (Antoniewicz et al., 2007b):

$$\mathbf{A} \cdot \mathbf{X} = \mathbf{B} \cdot \mathbf{Y}$$

Here, A and B are matrices that are functions of fluxes, and X and Y are matrices that contain the unknown and known EMU variables, respectively. In the original EMU framework, each row in the X and Y matrices contained MID vectors of the respective EMU variables. For simulations of tandem MS data, X and Y are now 3-dimensional matrices that contain the compact tandem MS matrices of the respective EMU variables (**Figure 6A**). For convenience, these 3D matrices can be transformed into 2D matrices as shown in **Figure 6B**. By performing this 3D-to-2D transformation, the same EMU algorithms that are currently used to simulate MS data can be used to simulate tandem MS data. Thus, current software packages can be upgraded easily to accommodate tandem MS measurements for flux calculations.
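A minimal sketch of the 3D-to-2D transformation and the subsequent linear solve is given below. The flux-dependent matrices `A` and `B`, the sign convention, the 2 × 2 matrix size, and the row-major flattening are all hypothetical choices for illustration; the exact arrangement used in Figure 6B may differ.

```python
import numpy as np

# Hypothetical EMU balances for two unknown EMUs (X) and two known EMUs (Y):
#   150*X1 = 100*Y1 + 50*X2   and   120*X2 = 100*X1 + 20*Y2
A = np.array([[-150.0, 50.0],
              [100.0, -120.0]])
B = np.array([[-100.0, 0.0],
              [0.0, -20.0]])

# Known compact tandem MS matrices stacked along the first axis
# (3D array: EMU index x matrix rows x matrix columns).
Y = np.array([
    [[1.0, 0.0], [0.0, 0.0]],   # hypothetical unlabeled substrate EMU
    [[0.0, 0.0], [0.0, 1.0]],   # hypothetical fully labeled substrate EMU
])

n_known, n_rows, n_cols = Y.shape
Y2d = Y.reshape(n_known, n_rows * n_cols)     # 3D -> 2D: one flattened matrix per row
X2d = np.linalg.solve(A, B @ Y2d)             # same solve as used for MID vectors
X = X2d.reshape(A.shape[0], n_rows, n_cols)   # 2D -> 3D: back to compact matrices
print(X)
```

Since the solve acts only on the EMU index, flattening each compact matrix into a row leaves the algebra unchanged, which is why an existing MID-based solver can be reused.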

To illustrate the simulation of tandem MS data using the EMU framework and compact tandem MS matrices, **Figure 7** shows the EMU balances for the simple example model, and **Figure 8** shows the numerical simulation results. Here, we used the fluxes shown in **Figure 5** with metabolite A being 100% <sup>13</sup>C-labeled at the second carbon position. This simple example clearly illustrates that tandem MS data can be efficiently simulated using the existing EMU framework without any major modifications. Recent work has demonstrated that EMU decompositions of large-scale models are computationally tractable (Gopalakrishnan and Maranas, 2015). Thus, the methods and algorithms presented in this paper can be applied to realistically sized network models.

#### CONCLUDING REMARKS

Tandem mass spectrometry is a promising new analytical approach that provides additional labeling information for <sup>13</sup>C-flux studies. Previously, we demonstrated that this additional labeling information can significantly improve flux precision and resolution in complex biological systems (Choi and Antoniewicz, 2011). In this paper, we have presented a set of tools and algorithms for efficient simulation of tandem MS data using the EMU framework and for correction of tandem MS data for natural isotope abundances. By building upon the EMU framework, which is used by all current software packages for <sup>13</sup>C-MFA, we hope to accelerate the acceptance of the tandem MS technique by the <sup>13</sup>C-MFA community and to encourage software developers to include capabilities for tandem MS <sup>13</sup>C-flux analysis in future software updates.

#### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

#### FUNDING

This work was supported by the NSF CAREER Award (CBET-1054120).





**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Choi and Antoniewicz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Flux Connections Between Gluconate Pathway, Glycolysis, and Pentose–Phosphate Pathway During Carbohydrate Metabolism in Bacillus megaterium QM B1551

Julie A. Wushensky<sup>1</sup>, Tracy Youngster<sup>2</sup>, Caroll M. Mendonca<sup>1</sup> and Ludmilla Aristilde<sup>1,2</sup>\*

<sup>1</sup> Department of Biological and Environmental Engineering, College of Agriculture and Life Sciences, Cornell University, Ithaca, NY, United States, <sup>2</sup> Soil and Crop Sciences Section, School of Integrative Plant Science, College of Agriculture and Life Sciences, Cornell University, Ithaca, NY, United States

#### Edited by:

Yinjie Tang, Washington University in St. Louis, United States

#### Reviewed by:

Wei Xiong, National Renewable Energy Laboratory (DOE), United States Junyoung O. Park, University of California, Los Angeles, United States Shan Yi, University of California, Berkeley, United States

\*Correspondence:

Ludmilla Aristilde ludmilla@cornell.edu; la31@cornell.edu

#### Specialty section:

This article was submitted to Microbial Physiology and Metabolism, a section of the journal Frontiers in Microbiology

> Received: 15 September 2018 Accepted: 30 October 2018 Published: 21 November 2018

#### Citation:

Wushensky JA, Youngster T, Mendonca CM and Aristilde L (2018) Flux Connections Between Gluconate Pathway, Glycolysis, and Pentose–Phosphate Pathway During Carbohydrate Metabolism in Bacillus megaterium QM B1551. Front. Microbiol. 9:2789. doi: 10.3389/fmicb.2018.02789

Bacillus megaterium is of great importance as a plant-beneficial bacterium in agricultural applications and in industrial bioproduction of proteins. Understanding intracellular processing of carbohydrates in this species is crucial to predicting natural carbon utilization as well as informing strategies in metabolic engineering. Here, we applied stable isotope-assisted metabolomics profiling and metabolic flux analysis to elucidate, at high resolution, the connections of the different catabolic routes for carbohydrate metabolism immediately following substrate uptake in B. megaterium QM B1551. We performed multiple <sup>13</sup>C tracer experiments to obtain both kinetic and long-term <sup>13</sup>C profiling of intracellular metabolites. In addition to the direct phosphorylation of glucose to glucose-6-phosphate (G6P) prior to oxidation to 6-phosphogluconate (6P-gluconate), the labeling data also captured glucose catabolism through the gluconate pathway involving glucose oxidation to gluconate followed by phosphorylation to 6P-gluconate. Our data further confirmed the absence of the Entner–Doudoroff pathway in B. megaterium and showed that subsequent catabolism of 6P-gluconate was instead through the oxidative pentose–phosphate (PP) pathway. Quantitative flux analysis of glucose-grown cells showed equal partition of consumed glucose from G6P to the Embden–Meyerhof–Parnas (EMP) pathway and from G6P to the PP pathway through 6P-gluconate. Growth on fructose alone or xylose alone was consistent with the ability of B. megaterium to use each substrate as a sole source of carbon. However, a detailed <sup>13</sup>C mapping during simultaneous feeding of B. megaterium on glucose, fructose, and xylose indicated non-uniform intracellular investment of the different carbohydrate substrates. Flux of glucose-derived carbons dominated the gluconate pathway and the PP pathway, whereas carbon flux from both glucose and fructose populated the EMP pathway; there was no assimilatory flux of xylose-derived carbons. Collectively, our findings provide new quantitative insights on the contribution of the different catabolic routes involved in initiating carbohydrate catabolism in B. megaterium and related Bacillus species.

Keywords: metabolomics of carbohydrate catabolism, glucose and fructose assimilation, gluconate uptake and metabolism, fructose metabolism, oxidative pentose–phosphate pathway, Bacillus megaterium

## INTRODUCTION


Bacillus megaterium, an aerobic bacterium ubiquitous in a diverse range of environments, has been of special investigative interest for its applications in promoting plant health (Eppinger et al., 2011; Santos et al., 2014), bioremediation of contaminants (Quinn et al., 1989), and industrial bioproduction (Vary, 1994; Vary et al., 2007; Biedendieck et al., 2010). Despite extensive genetic characterization of B. megaterium, detailed elucidation of its metabolic network is still lacking (Kanehisa and Goto, 2000; Eppinger et al., 2011; Kanehisa et al., 2017). Much of what is known about carbon metabolism in Bacillus species is derived from Bacillus subtilis, the primary model organism for Gram-positive bacteria (Sauer et al., 1997; Dauner and Sauer, 2001; Dauner et al., 2001; Fuhrer et al., 2005). Previous metabolic studies highlighted incongruent metabolic fluxes between B. subtilis and B. megaterium species (Sauer et al., 1997; Dauner and Sauer, 2001; Dauner et al., 2001; Fuhrer et al., 2005; Furch et al., 2007a,b; Tannler et al., 2008; Youngster et al., 2017). Moreover, metabolic flux modeling of B. megaterium has been performed on mutant strains that lacked specific metabolic functions (Furch et al., 2007b). Therefore, the operational network of wild-type B. megaterium remains to be evaluated experimentally. Of particular interest is an investigation of the different catabolic routes that initiate carbohydrate metabolism in B. megaterium due to the ubiquitous presence of carbohydrate-containing feedstocks.

Here, we investigated the metabolic fluxes underlying the catabolism of glucose, fructose, and xylose in B. megaterium QM B1551. Glucose, a monomer of the biopolymers cellulose and starch, is a common carbohydrate in environmental matrices (Koegel-Knabner, 2002). The disaccharide sucrose, a dimer of glucose and fructose, is common in plant materials and can serve as a carbon source to B. megaterium (Youngster et al., 2017). Xylose, a pentose monosaccharide, is a major monomer in hemicellulose, an abundant component of plant cell walls (Koegel-Knabner, 2002). The enzymes involved in the uptake pathway of all three carbohydrates (i.e., glucose, fructose, and xylose) have been annotated in the genome of B. megaterium (**Figure 1**). The metabolic pathways that can be involved in carbohydrate catabolism are the Embden–Meyerhof–Parnas (EMP) pathway, the gluconate pathway, the oxidative pentose–phosphate (PP) pathway, the non-oxidative PP pathway, and the Entner–Doudoroff (ED) pathway (**Figure 1**).

It is well established that Bacillus species undergo glycolysis via the EMP pathway wherein glucose is first phosphorylated to glucose-6-phosphate (G6P), then isomerized to fructose-6-phosphate (F6P) before phosphorylation to fructose-1,6-bisphosphate (FBP) (**Figure 1**). An important step that provides the driving force for the yields of ATP downstream of the EMP pathway is the lysis of FBP to dihydroxyacetone phosphate (DHAP) and glyceraldehyde-3-phosphate (GAP) (**Figure 1**; Rabinowitz et al., 2015). The metabolites in the EMP pathway can also contribute to the PP pathway (**Figure 1**). The oxidative phase of the PP pathway involves the oxidation of G6P to 6-phosphogluconate (6P-gluconate) followed by decarboxylation to PPs; the non-oxidative phase involves the transketolase and transaldolase reactions that combine F6P and GAP to yield intermediates in the PP pathway [xylulose-5-phosphate (Xu5P), erythrose-4-phosphate (E4P), and sedoheptulose-7-phosphate (S7P)] (**Figure 1**). Carbon fluxes from the EMP and PP pathways, which are eventually channeled to the tricarboxylic acid (TCA) cycle, generate reducing equivalents [NAD(P)H], energy (ATP), and metabolite precursors for biomass growth (i.e., biosynthesis of ribonucleotides for RNA and DNA, biosynthesis of amino acids for proteins) (**Figure 1**).

There is a lack of consensus regarding the distribution of the flux from G6P to the EMP pathway relative to the flux from G6P to the PP pathway via 6P-gluconate (Sauer et al., 1997; Dauner et al., 2001; Furch et al., 2007a,b; Tannler et al., 2008). Certain strains of B. subtilis (PRF93) were reported to utilize the PP pathway more than the direct EMP pathway (Sauer et al., 1997), whereas the opposite was reported for other B. subtilis strains (wild-type strain 168 and mutant strains RB50::[pRF69]n) (Dauner et al., 2001; Tannler et al., 2008). In one mutant strain of B. megaterium (WH323), carbon starvation was shown to promote usage of the EMP pathway over the PP pathway, whereas recombinant gene induction increased activity in the PP pathway relative to the EMP pathway (Furch et al., 2007a,b). Furthermore, only a forward flux of the EMP pathway has been portrayed in previous models of Bacillus species, including B. megaterium (Sauer et al., 1997; Dauner et al., 2001; Furch et al., 2007b; Tannler et al., 2008). However, the genome of B. megaterium QM B1551 annotates the enzymes that can operate both forward and backward fluxes through the EMP pathway (Kanehisa and Goto, 2000; Kanehisa et al., 2017; **Figure 1**). The contribution of both of these fluxes, which has implications for the net thermodynamic driver of the EMP pathway, has been unexplored in previous Bacillus studies (Sauer et al., 1997; Dauner et al., 2001; Furch et al., 2007a,b; Tannler et al., 2008).

Regarding the gluconate pathway, which involves the oxidation of glucose to gluconate followed by phosphorylation to 6P-gluconate (**Figure 1**), annotation of the B. megaterium QM B1551 genome implied an incomplete gluconate pathway (Kanehisa and Goto, 2000; Kanehisa et al., 2017). Specifically annotated in the genome were glucose-1-dehydrogenase and gluconate kinase, which convert glucose to glucono-1,5-lactone and gluconate to 6P-gluconate, respectively (**Figure 1**; Kanehisa and Goto, 2000; Kanehisa et al., 2017). However, the enzyme gluconolactonase for the hydrolysis of glucono-1,5-lactone to gluconate was not present, albeit this reaction can happen spontaneously in highly alkaline conditions (**Figure 1**; Kanehisa and Goto, 2000; Kanehisa et al., 2017). Interestingly, despite the lack of genome annotation of gluconolactonase, evidence of the gluconate pathway was found in germinating spores of B. megaterium QM B1551 wherein significant gluconate evolution from glucose was detected (Otani et al., 1986; Sano et al., 1988). The connection between the gluconate pathway and the remaining network for carbohydrate catabolism in B. megaterium remains unclear. Pseudomonas species, which are well known to exhibit the gluconate pathway, rely significantly on the ED pathway to connect the gluconate pathway to downstream metabolism (del Castillo et al., 2007; Nikel et al., 2015; Sasnow et al., 2016; Wilkes et al., 2018). Due to the absence of all the relevant enzymes, a functional ED pathway in Bacillus species has not been considered in previous metabolic studies (Furch et al., 2007a,b; Tannler et al., 2008). Specifically, in B. megaterium QM B1551, genome-level characterization has annotated one of the two essential enzymes required for the ED pathway (Kanehisa and Goto, 2000; Kanehisa et al., 2017) (**Figure 1**). This characterization has noted the presence of 2-keto-3-deoxyphosphogluconate aldolase but the upstream enzyme, 6-phosphogluconate dehydratase, is absent from the genome of B. megaterium QM B1551 (**Figure 1**). Neither the gluconate pathway nor the ED pathway has been considered in previously reported metabolic network models of Bacillus species (Sauer et al., 1997; Dauner et al., 2001; Furch et al., 2007a,b; Tannler et al., 2008; Korneli et al., 2012).

2-keto-3-deoxy-6-phosphogluconate; Ru5P, ribulose 5-phosphate; Xu5P, xylulose 5-phosphate; S7P, sedoheptulose 7-phosphate; E4P, erythrose 4-phosphate; F6P, fructose 6-phosphate; FBP, fructose 1,6-bisphosphate; DHAP, dihydroxyacetone-3-phosphate; GAP, glyceraldehyde 3-phosphate; 1,3-bisPG, 1,3-biphosphoglycerate; 3PG, 3-phosphoglycerate; 2PG, 2-phosphoglycerate; PEP, phosphoenolpyruvate. The enzymes corresponding to the gene annotations are as follows: amyD, carbohydrate ABC transporter permease AmyD; ptsG, PTS system glucose-specific transporter subunit IIBC; ptsH, phosphocarrier protein HPr; ptsI, PTS system transporter I; gdh, glucose 1-dehydrogenase; pgi, glucose-6-phosphate isomerase; gntK, gluconate kinase; BMQ\_1157, fructokinase; xylT, xylose permease; xylA, D-xylose isomerase; xylB, xylokinase; BMQ\_0309, 6-phosphogluconolactonase; BMQ\_3633, 2-dehydro-3-deoxyphosphogluconate aldolase; fruA, PTS system fructose-specific II subunit IIA; fruB, PTS system fructose-specific II subunit IIB; zwf, glucose-6-phosphate 1-dehydrogenase; gntZ, phosphogluconate dehydrogenase; rpe, ribulose-phosphate 3-epimerase; rpiA, ribose 5-phosphate isomerase A; tkt, transketolase; tal, transaldolase; pfk, 6-phosphofructokinase; fbp, fructose-1,6-bisphosphatase; fba, fructose-1,6-bisphosphate aldolase; tpiA, triosephosphate isomerase; gap, glyceraldehyde-3-phosphate dehydrogenase; pgk, phosphoglycerate kinase; pgm, phosphoglucomutase; eno, enolase; pgy, pyruvate kinase.

Previous metabolic flux analysis (MFA) of Bacillus species has relied on <sup>13</sup>C labeling of amino acids to estimate labeling at specific metabolic nodes (Sauer et al., 1997; Dauner et al., 2001; Furch et al., 2007a,b; Tannler et al., 2008; Korneli et al., 2012). However, because there are only two amino acid precursors from intermediates in the PP pathway [ribose-5-phosphate (R5P) and E4P], this method is not suitable for high-resolution determination of the carbon fluxes through the PP pathway as well as the other vicinal metabolic pathways such as the gluconate pathway and the ED pathway. Furthermore, labeling of free metabolites is needed for more accurate quantitative determination of flux distribution between the EMP pathway and the adjacent pathways. Here, we employed a <sup>13</sup>C-assisted metabolomic approach using high-resolution liquid chromatography–mass spectrometry (LC–MS) to detect free metabolites involved in all four pathways. We obtained both kinetic and long-term <sup>13</sup>C labeling of intracellular metabolites. We combined the cellular <sup>13</sup>C mapping with growth phenotypes to perform quantitative flux analysis of carbohydrate catabolism. Our results provide new insights on the metabolic network structure and flux distribution in B. megaterium (strain QM B1551). These findings have broader implications regarding the optimization of Bacillus and related bacterial species in biotechnological applications.

#### EXPERIMENTAL METHODS

#### Culturing Conditions

The B. megaterium QM B1551 cells were obtained from the Bacillus Genetic Stock Center (Columbus, OH, United States). Cell cultures were grown at 30 °C in a G24 environmental incubator shaker (New Brunswick Scientific, Edison, NJ, United States) at 220 rpm. The growth medium, which was pH-adjusted (pH 7.0) and filter-sterilized (0.22 µm nylon; Waters Corporation, MA, United States), contained the following major salts: 18.7 mM NH4Cl, 0.81 mM MgSO4, 0.034 mM CaCl2.2H2O, 89.4 mM K2HPO4, 56.4 mM NaH2PO4.H2O, and 8.6 mM NaCl. The trace metal concentrations were as follows: 30 µM FeSO4.7H2O, 1.9 µM H3BO3, 0.86 µM CuSO4.5H2O, 7.7 µM ZnSO4.7H2O, 0.75 µM MnSO4.H2O, 0.26 µM NiCl2.6H2O, and 0.31 µM Na2MoO4.2H2O. The carbohydrate composition was 330 mM C total as glucose alone (equivalent to 55 mM or 9.91 g L<sup>−1</sup> glucose), fructose alone, xylose alone, 1:1 glucose:gluconate mixture, or equimolar glucose:fructose:xylose mixture. All chemicals listed above were analytical grade, purchased from Sigma-Aldrich (St. Louis, MO, United States) and Fisher Scientific (Pittsburgh, PA, United States). For all growth conditions, the cells were transferred twice into fresh growth media to ensure that the cells were well acclimated to their nutrient growth conditions prior to experimental sampling. Cell growth (three biological replicates per condition) was monitored both via optical density at 600 nm (OD<sub>600</sub>), using an Agilent Cary UV-visible spectrophotometer (Santa Clara, CA, United States), and by measuring cell dry weight (in grams, gCDW). To obtain accurate readings at OD<sub>600</sub> above 0.5, cell suspensions were diluted. To determine the cell dry weight (three biological replicates), we harvested 1.5-mL culture aliquots and centrifuged them (9,391 g for 5 min at 4 °C); the retentate was frozen (−20 °C) overnight prior to lyophilization using a Labconco freeze-dryer (Kansas City, MO, United States). The conversion factors of gCDW L<sup>−1</sup> per OD<sub>600</sub> were found to be: 0.61 ± 0.08 for glucose; 0.43 ± 0.10 for glucose:fructose:xylose; and 0.53 ± 0.07 for glucose:gluconate.
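The carbon-equivalent substrate concentrations and the OD<sub>600</sub>-to-biomass conversion can be reproduced with a few lines of arithmetic. In the sketch below, the molar masses are standard values, the function names are hypothetical, and "equimolar" is assumed to mean equal molar concentrations of each sugar.

```python
CARBON_TOTAL_MM = 330.0  # total substrate carbon in mM C (from the text)

# name: (carbon atoms per molecule, molar mass in g/mol)
SUGARS = {
    "glucose": (6, 180.16),
    "fructose": (6, 180.16),
    "xylose": (5, 150.13),
}

# gCDW per liter per OD600 unit (mean values reported in the text)
CDW_PER_OD = {"glucose": 0.61, "glucose:fructose:xylose": 0.43, "glucose:gluconate": 0.53}

def single_substrate(name, carbon_mm=CARBON_TOTAL_MM):
    """Return (mM, g/L) of one sugar supplying all of the substrate carbon."""
    n_c, molar_mass = SUGARS[name]
    conc_mm = carbon_mm / n_c
    return conc_mm, conc_mm * molar_mass / 1000.0

def equimolar_mixture(names, carbon_mm=CARBON_TOTAL_MM):
    """Return the mM of each sugar in an equimolar mixture with the given total carbon."""
    return carbon_mm / sum(SUGARS[name][0] for name in names)

def biomass_from_od(od600, condition):
    """Convert an OD600 reading to gCDW per liter."""
    return od600 * CDW_PER_OD[condition]

glucose_mm, glucose_g_per_l = single_substrate("glucose")
xylose_mm, _ = single_substrate("xylose")
each_mm = equimolar_mixture(["glucose", "fructose", "xylose"])
print(glucose_mm, round(glucose_g_per_l, 2), xylose_mm, round(each_mm, 2))
```

For glucose alone this gives 55 mM and 9.91 g L<sup>−1</sup>, in agreement with the values stated above.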

#### Carbohydrate Consumption

The depletion of substrate in the extracellular medium was taken to account for sugar consumption; this was confirmed by intracellular <sup>13</sup>C labeling (Aristilde et al., 2015). To quantify substrate depletion by exponentially growing cells under each growth condition, 0.7 mL aliquots were harvested at different times (two independent biological replicates at each timepoint) throughout cell growth and analyzed via <sup>1</sup>H nuclear magnetic resonance (NMR). The extracted aliquots were centrifuged for 30 min at 15,871 g and 4 °C in filter-containing (0.22-µm pore size, nylon) microcentrifuge tubes. The filtered supernatants were frozen at −20 °C until further processing. In preparation for the NMR analysis, 200 µL of the filtered samples were mixed with 60 µL of 100% D2O, 50 µL of 6 mM 2,2-dimethyl-2-silapentane-5-sulfonate (DSS) as an internal standard, 240 µL of 100 mM sodium bicarbonate as a pH control, and 50 µL of 10 mM sodium azide as an antimicrobial agent. Samples were stored at 4 °C until analysis (Aristilde et al., 2015). The <sup>1</sup>H NMR measurements were performed using a Varian Unity INOVA 600-MHz NMR spectrometer at 25 °C, with a relaxation delay of 5 s, recording of 16 scans per sample, and receiver gain of 32 dB (Aristilde et al., 2015). Substrate depletion rates (in mmol gCDW<sup>−1</sup> h<sup>−1</sup>) were determined subsequently via regression analysis of carbohydrate depletion over time combined with biomass growth rate.
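One common way to combine substrate depletion with the growth rate is sketched below. It assumes balanced exponential growth with a constant specific uptake rate q, so that substrate concentration is linear in biomass with slope −q/µ. This is an illustrative assumption; the exact regression procedure used in the study is not specified in the text, and the function name and data are hypothetical.

```python
import numpy as np

def specific_uptake_rate(t_h, biomass_gcdw_l, substrate_mm):
    """Estimate growth rate (1/h) and specific uptake rate (mmol gCDW^-1 h^-1).

    Assumes exponential growth, X(t) = X0 * exp(mu * t), and constant q,
    which gives S(t) = S0 - (q / mu) * (X(t) - X0).
    """
    mu = np.polyfit(t_h, np.log(biomass_gcdw_l), 1)[0]       # slope of ln(X) vs t
    slope = np.polyfit(biomass_gcdw_l, substrate_mm, 1)[0]   # dS/dX = -q / mu
    return mu, -slope * mu

# Synthetic data for illustration only (not measured values).
t = np.linspace(0.0, 4.0, 9)                  # h
x = 0.1 * np.exp(0.5 * t)                     # gCDW/L, mu = 0.5 1/h
s = 55.0 - (8.0 / 0.5) * (x - x[0])           # mM, q = 8 mmol gCDW^-1 h^-1

mu_est, q_est = specific_uptake_rate(t, x, s)
print(round(mu_est, 3), round(q_est, 3))
```

With real measurements, OD<sub>600</sub> readings would first be converted to gCDW L<sup>−1</sup> using the condition-specific conversion factor.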

#### Metabolite Excretion

To determine excretion rates of metabolites, 50 µL of filtered cell suspensions (three biological replicates per condition) were obtained during growth and subsequently diluted with LC–MS grade water (Fisher Scientific, Pittsburgh, PA, United States) at 1:20 v/v during early exponential phase, or 1:200 v/v during late exponential and stationary phases. Different dilution ratios were used to account for elevated concentrations of extracellular metabolites in the media as a function of cell growth. The samples were stored at 4 °C (for 5 h or less) prior to processing via LC–MS. Excretion rates (µmol gCDW<sup>−1</sup> h<sup>−1</sup>) were calculated via regression analysis.

### Kinetic Metabolite <sup>13</sup>C Labeling

Kinetic flux experiments were performed to monitor in vivo cellular incorporation of glucose (Sasnow et al., 2016). Batch cultures (two independent biological replicates) were grown as described above for the glucose-growth condition until early exponential phase, corresponding to OD<sub>600</sub> 0.4–0.6. Aliquots (3 mL) of the cultures were filtered (0.22-µm nylon filter). Each cell-containing filter disk was then placed cell-side up on an agar plate containing unlabeled carbon in minimal media to allow the cells to acclimate and reach an exponentially growing phase, corresponding to OD<sub>600</sub> 0.5. To determine when this OD<sub>600</sub> value was reached, cells on filters from parallel plates were rinsed off into a 3-mL suspension for OD<sub>600</sub> reading. After the cells were adequately acclimated on the agar plate, the cell-containing filter disks were transferred to agar plates containing [U-<sup>13</sup>C<sub>6</sub>]-glucose and, after a set period of time following the isotopic switch (0 and 20 s; 1, 4, 12, and 30 min), the filter disks were removed from the labeled plate and metabolism was immediately quenched by flipping the disks cell-side down into a 2-mL cold (4 °C) methanol:acetonitrile:water quenching solution (2:2:1). The quenched solution containing the lysed cells scraped from the filter disks was then centrifuged (5 min at 9,391 g and 4 °C). We dried 100-µL aliquots of the supernatants under N<sub>2</sub> gas prior to re-suspension in 100-µL LC–MS water for use in metabolomics analysis as described below.

### Long-Term Intracellular Metabolite <sup>13</sup>C Labeling

For long-term isotopic enrichment of the intracellular metabolites, liquid cultures were prepared with the growth medium containing the major and minor salts as listed above and the following <sup>13</sup>C-labeled substrates: [1,2-<sup>13</sup>C<sub>2</sub>]-glucose alone, equimolar [U-<sup>13</sup>C<sub>6</sub>]-glucose and unlabeled gluconate, or equimolar [1,2,3-<sup>13</sup>C<sub>3</sub>]-glucose, [1,6-<sup>13</sup>C<sub>2</sub>]-fructose, and unlabeled xylose. All labeled carbohydrates were purchased from Cambridge Isotopes (Tewksbury, MA, United States) or Omicron Biochemicals (South Bend, IN, United States). For these labeling experiments, batch cultures (two independent biological replicates) were grown twice in labeled minimal media and sampling was done at two timepoints (when the cells reached OD<sub>600</sub> of 1.0 and 2.0) during exponential growth (**Supplementary Figure S1**). At each sampling time, 3 mL aliquots of cell suspensions were filtered through 0.22-µm nylon filters in a sterile environment. The cell-containing filter disks were immediately quenched in 2 mL of cold (4 °C) methanol:acetonitrile:water solution (2:2:1). The lysed cells in the quenching solution were processed as described in the previous section in preparation for metabolomics analysis as described below.

#### Metabolomics Analysis via LC–MS

The samples prepared as described above were analyzed by reversed-phase ion-pairing ultra-high performance LC (Thermo Scientific Dionex UltiMate 3000) coupled to high-resolution/accurate MS (Thermo Scientific Q Exactive quadrupole-Orbitrap hybrid MS) with electrospray ionization run in full-scan negative mode (Aristilde et al., 2017). Details of the LC–MS protocol used here were previously reported (Aristilde et al., 2017). The following metabolites were isolated by LC–MS: gluconate, 6P-gluconate, G6P, F6P, FBP, pyruvate, Xu5P, R5P, DHAP, and S7P. As previously detailed and illustrated by Aristilde et al. (2017), the analytical isolation of the different compound isomers (i.e., G6P/F6P, Xu5P/R5P, and DHAP/GAP) was made possible by their chromatographic separation, despite their similar mass-over-charge ratios. Metabolite standards at various concentrations (10–1,000 nM) were also run in parallel to verify LC–MS identification and quantitation. The <sup>13</sup>C labeling patterns were analyzed with the MAVEN (Metabolomic Analysis and Visualization Engine) software suite (Melamud et al., 2010; Clasquin et al., 2012). The labeling data were corrected subsequently for natural abundance of <sup>13</sup>C.
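To make the natural-abundance correction step concrete, the following sketch applies a generic correction-matrix approach to a mass isotopomer distribution. It is a simplified, carbon-only illustration that assumes a natural <sup>13</sup>C abundance of about 1.07% and ignores isotopes of other elements; it is not necessarily the procedure applied in the study, and the function names and example values are hypothetical.

```python
import numpy as np
from math import comb

P13C = 0.0107  # approximate natural 13C abundance (assumed value)

def correction_matrix(n_carbons, p=P13C):
    """Column j: measured distribution of a molecule with j tracer-derived 13C atoms."""
    size = n_carbons + 1
    C = np.zeros((size, size))
    for j in range(size):
        free = n_carbons - j                 # carbons that may carry natural 13C
        for k in range(free + 1):            # k of them are naturally 13C
            C[j + k, j] = comb(free, k) * p**k * (1 - p)**(free - k)
    return C

def correct_mid(measured, n_carbons):
    """Remove the natural 13C contribution from a measured distribution."""
    C = correction_matrix(n_carbons)
    corrected, *_ = np.linalg.lstsq(C, np.asarray(measured, dtype=float), rcond=None)
    corrected = np.clip(corrected, 0.0, None)   # suppress small negative values
    return corrected / corrected.sum()

# Round-trip check on a hypothetical three-carbon metabolite.
true_mid = np.array([0.2, 0.1, 0.0, 0.7])
measured_mid = correction_matrix(3) @ true_mid
recovered_mid = correct_mid(measured_mid, 3)
print(np.round(recovered_mid, 4))
```

The correction matrix is lower triangular with a non-zero diagonal, so the system is always solvable; least squares is used here only so that noisy measurements are handled gracefully.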

### Quantitative Metabolic Flux Analysis

The MFA of B. megaterium cells grown on glucose alone or the glucose:fructose:xylose mixture was constrained by the following experimental data: substrate consumption rates, metabolite excretion rates, long-term <sup>13</sup>C labeling data of intracellular metabolites, and biomass growth. We employed previously reported stoichiometric biomass composition for B. subtilis (Dauner and Sauer, 2001) to estimate metabolite effluxes (from G6P, R5P, E4P, 3-PG, pyruvate, and DHAP) to sustain the biomass requirements for cell growth under each condition. The modeling software suite 13CFLUX2 was used to conduct the MFA of the pathway network involving metabolic nodes for substrate uptake, the gluconate pathway, the EMP pathway, the PP pathway, and the biomass effluxes (Weitzel et al., 2012). Metabolite secretion rates were also included for gluconate. The model was initialized using a number of free fluxes, informed by previous values of fluxes in B. megaterium (Furch et al., 2007a). From these free fluxes, which were unconstrained values assigned by the user to provide an initial set of flux values, the modeling algorithm optimized all flux values in accordance with the experimental constraints through an iterative process (**Supplementary Tables S1, S2**). At each iteration, the quality of estimated fluxes was evaluated by calculating residual errors between the model-estimated metabolite labeling data and the corresponding experimental data (**Supplementary Figures S2, S3**).
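The residual-error evaluation can be illustrated with a variance-weighted sum of squared residuals, a standard objective in <sup>13</sup>C-MFA. The sketch below is a generic illustration and does not use the 13CFLUX2 interface; the function name, labeling fractions, and measurement standard deviation are hypothetical.

```python
import numpy as np

def weighted_ssr(simulated, measured, sigma):
    """Sum of squared residuals, each scaled by its measurement standard deviation."""
    residuals = (np.asarray(simulated) - np.asarray(measured)) / np.asarray(sigma)
    return float(np.sum(residuals**2))

# Hypothetical labeling fractions of one metabolite (M+0, M+1, M+2).
simulated = [0.50, 0.30, 0.20]
measured = [0.52, 0.29, 0.19]
ssr = weighted_ssr(simulated, measured, sigma=0.01)
print(round(ssr, 3))
```

An optimizer would adjust the free fluxes, re-simulate the labeling, and repeat until this value no longer decreases.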

## RESULTS

#### Glucose Catabolism Involves the Gluconate Pathway Without the ED Pathway

Following uptake of extracellular glucose, glucose catabolism can be initiated by either phosphorylation to G6P or oxidation to gluconate (**Figure 1**). Kinetic <sup>13</sup>C-labeling data following feeding of cells on [U-<sup>13</sup>C<sub>6</sub>]-glucose showed, by 30 min, nearly complete isotopic enrichment of G6P and 6P-gluconate and ∼75% isotopic enrichment of gluconate (**Figure 2A**). The <sup>13</sup>C labeling of gluconate was indicative of glucose oxidation to gluconate, which was also reported in spores of B. megaterium QM B1551 (Otani et al., 1986; Sano et al., 1988). The plateau of gluconate <sup>13</sup>C labeling below 100% may be due to carryover of residual non-labeled gluconate in the extracellular milieu because gluconate secretion was evident during growth of B. megaterium on glucose alone (**Supplementary Table S1**). Evidence of the gluconate pathway, which would include phosphorylation of gluconate to 6P-gluconate, could not be resolved from the kinetic intracellular labeling from <sup>13</sup>C-glucose alone. The similar labeling kinetics for 6P-gluconate and G6P implied that 6P-gluconate was primarily produced from G6P but these data could not confirm either the presence or lack of gluconate flux to 6P-gluconate (**Figure 2A**). To capture specifically the incorporation of gluconate-derived carbons into 6P-gluconate, we fed the cells equimolar concentrations of fully <sup>13</sup>C-labeled glucose and unlabeled gluconate (**Figure 2B**). Across the two timepoints during exponential growth on this glucose:gluconate mixture, the fraction of non-labeled 6P-gluconate increased from 18% to 36%, confirming the presence of an active gluconate pathway (**Figure 2B**). However, given the high fraction of <sup>13</sup>C-labeled 6P-gluconate (above 60%), these kinetic data also indicated that the assimilation route of glucose-derived <sup>13</sup>C carbons from G6P to 6P-gluconate was favored over the gluconate pathway (**Figure 2B**).

There are two possible fates of 6P-gluconate: the oxidative PP pathway and the ED pathway (**Figure 2B**). The ED pathway involves the splitting of 6P-gluconate into GAP and pyruvate (**Figure 2B**). At the timepoint when 6P-gluconate had the highest non-labeled fraction (∼36%), the non-labeled fractions of both pyruvate and DHAP (an isomer of GAP) were less than 5% (i.e., within the error range of the LC–MS detection) (**Figure 2B**). The oxidative PP pathway was expected to generate non-labeled PPs (such as R5P) as evidence of catabolism of gluconate-derived carbons via this pathway (**Figure 2B**). We found that nearly 20% of R5P was non-labeled; relative to the ∼36% non-labeled fraction of 6P-gluconate, this implied that about 56% (20/36) of the R5P pool was synthesized from 6P-gluconate via the oxidative PP pathway (**Figure 2B**). Therefore, these data collectively demonstrated that the fate of 6P-gluconate was through the oxidative PP pathway and that the ED pathway was inactive in B. megaterium QM B1551.

We compared the metabolite pools in cells grown on the equimolar glucose:gluconate mixture to those in cells grown on glucose alone – the total substrate carbon concentration (330 mM C) was the same in both conditions (**Figure 2C**). There was no appreciable change in the levels of G6P and F6P (**Figure 2C**). However, during growth on the glucose:gluconate mixture, the level of 6P-gluconate was severely depleted (16-fold reduction), in agreement with limited flux from gluconate to 6P-gluconate, as indicated by the labeling data (**Figures 2B,C**). In accordance with the lack of the ED pathway, we observed a decrease in both DHAP (∼3-fold reduction) and pyruvate (∼30-fold reduction) during feeding on the glucose:gluconate mixture (**Figure 2C**). Interestingly, despite the depletion in DHAP, which is downstream of FBP, there was an accumulation of FBP (∼18-fold increase) during feeding on the glucose:gluconate mixture (**Figure 2C**). Given the constant level of F6P upstream of FBP, these data implied that, when gluconate is present with glucose, the flux of FBP production from F6P exceeded the flux of FBP lysis to generate DHAP and GAP (**Figure 2C**).

#### Glucose Catabolism to Pyruvate Is Through a Partially Reversible EMP Pathway

Following the <sup>13</sup>C probing approach described above, [1,2-<sup>13</sup>C<sub>2</sub>]-glucose was chosen as a substrate to evaluate possible backward fluxes in the EMP pathway (**Figure 3**). These backward fluxes, which are implied from the annotated genome of B. megaterium (Kanehisa and Goto, 2000; Kanehisa et al., 2017), have not been probed previously. The forward EMP pathway involves carbon flux from glucose through G6P, F6P, FBP, and onward to DHAP and GAP following FBP lysis (**Figure 3**). In agreement with the assimilation of the doubly <sup>13</sup>C-labeled glucose, we obtained ∼90% doubly <sup>13</sup>C-labeled fractions for gluconate, 6P-gluconate, G6P, and F6P (**Figure 3**). However, there was a discrepancy between the labeling patterns of F6P and those of FBP (**Figure 3**). Compared to F6P, there was a 30% depletion in the doubly <sup>13</sup>C-labeled fraction of FBP (**Figure 3**). Moreover, non-labeled FBP was 10%, compared to 1% in F6P; the quadruply <sup>13</sup>C-labeled fraction was 14% in FBP, compared to 2% in F6P (**Figure 3**). Non-labeled intermediates in the EMP pathway were generated during lysis of the doubly labeled [1,2-<sup>13</sup>C<sub>2</sub>]-FBP into [2,3-<sup>13</sup>C<sub>2</sub>]-DHAP and non-labeled GAP (**Figure 3**). Equilibrium isotopic labeling between the isomers DHAP and GAP led to the measurement of near-equal fractions (∼50% each) of non-labeled and doubly <sup>13</sup>C-labeled DHAP (**Figure 3**). Operating the EMP pathway in the reverse direction by combining the labeling scheme for DHAP and GAP to produce FBP explained the non-labeled and quadruply <sup>13</sup>C-labeled fractions in FBP, in addition to the doubly <sup>13</sup>C-labeled fraction (**Figure 3**). However, the fact that F6P was primarily doubly <sup>13</sup>C-labeled indicated that the EMP pathway was only partially reversible (**Figure 3**).
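The reverse-flux argument above can be illustrated with a minimal Python sketch, which is not part of the original analysis. It assumes that FBP formed in the backward direction pairs two triose-phosphates drawn at random from one equilibrated pool, and it rounds that pool to 50% non-labeled and 50% doubly labeled. The function name `condense` and the rounded pool are illustrative assumptions.

```python
def condense(mid_a, mid_b):
    """Convolve two mass isotopomer distributions (index i = fraction with i labeled carbons)."""
    out = [0.0] * (len(mid_a) + len(mid_b) - 1)
    for i, a in enumerate(mid_a):
        for j, b in enumerate(mid_b):
            out[i + j] += a * b
    return out

# Assumed triose-phosphate pool (3 carbons): 50% M+0, 50% M+2, as a rounding of the reported ~50% fractions.
triose = [0.5, 0.0, 0.5, 0.0]

# FBP (6 carbons) formed by combining two trioses from this pool.
fbp_reverse = condense(triose, triose)
print(fbp_reverse)  # [0.25, 0.0, 0.5, 0.0, 0.25, 0.0, 0.0]
```

Under these assumptions, FBP formed in the backward direction would carry non-labeled (M+0) and quadruply labeled (M+4) fractions, whereas FBP formed only from doubly labeled F6P would remain doubly labeled. A measured FBP pool that mixes both signatures is therefore consistent with partial reversibility.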

### Glucose Catabolism and Gluconate Pathway Are Both Connected to a Primarily Oxidative PP Pathway

The PP pathway possesses both an oxidative route and a non-oxidative route (**Figures 4A,B**). The oxidative PP pathway involves the decarboxylation of 6P-gluconate to Ru5P, which subsequently interconverts to R5P and Xu5P; the non-oxidative PP pathway combines metabolites from the EMP pathway, F6P and GAP, to eventually generate Xu5P and R5P (**Figures 4A,B**). According to our labeling scheme, decarboxylation of doubly <sup>13</sup>C-labeled 6P-gluconate through the oxidative PP pathway would produce singly <sup>13</sup>C-labeled R5P and Xu5P, but the non-oxidative PP pathway would use EMP-pathway metabolites to generate triply <sup>13</sup>C-labeled R5P and Xu5P (**Figures 4A,B**). We found that R5P and Xu5P were primarily singly <sup>13</sup>C-labeled (77 and 72%, respectively) and only up to 18% of these metabolites were triply <sup>13</sup>C-labeled (**Figure 4**); in accordance with the labeling of Xu5P and R5P, we also obtained primarily doubly <sup>13</sup>C-labeled S7P (**Figure 4**). Thus, the labeling patterns of Xu5P and R5P were consistent with a significant involvement of the oxidative PP pathway with a relatively minor contribution from the non-oxidative pathway (**Figure 4**). Therefore, through the oxidative PP pathway, PP metabolites were primarily produced from 6P-gluconate, which was generated from both the gluconate pathway and the direct glucose catabolism through G6P.

### Simultaneous Processing of Different Carbohydrates Is Partitioned Into Different Catabolic Routes

Previous reports of carbon metabolism and metabolic flux modeling have focused on glucose-grown Bacillus species (Sauer et al., 1997; Dauner et al., 2001; Tannler et al., 2008). However, glucose is often present with other carbohydrates in carbon feedstocks (Wendisch et al., 2016). In addition to glucose, we were able to obtain growth of the B. megaterium cells when the sole carbon source was given as fructose (another common hexose) or xylose (a common pentose) (**Supplementary Figure S1**). These data thus indicated that, in accordance with the genome annotation (**Figure 1**), the cells have the transporters and catabolic pathways to assimilate these carbohydrates. We also investigated how B. megaterium QM B1551 simultaneously processes glucose, fructose, and xylose by monitoring intracellular incorporation of equimolar carbon-equivalent concentrations of [1,2,3-<sup>13</sup>C<sub>3</sub>]-glucose, [1,6-<sup>13</sup>C<sub>2</sub>]-fructose, and unlabeled xylose (**Figure 5**). The labeling data revealed a non-uniform intracellular routing of each substrate in the metabolic network (**Figure 5**).

Consistent with the oxidation of the triply <sup>13</sup>C-labeled glucose, gluconate was predominantly triply <sup>13</sup>C-labeled (at ∼85%) (**Figure 5**). From the phosphorylation of triply <sup>13</sup>C-labeled glucose to G6P, the G6P pool had a high triply <sup>13</sup>C-labeled fraction (60–73%) with a lesser amount of doubly <sup>13</sup>C-labeled (19%) fraction (**Figure 5**). Downstream of G6P and gluconate, 6P-gluconate was ∼85% triply <sup>13</sup>C-labeled and 11% doubly <sup>13</sup>C-labeled (**Figure 5**). The enrichment in triply <sup>13</sup>C-labeled fraction (by up to 20%) from G6P to 6P-gluconate emphasized the contribution of gluconate to generate 6P-gluconate, thus underscoring an active gluconate pathway in initiating the catabolism of glucose-derived carbons during growth on multiple carbohydrates (**Figure 5**). The small doubly <sup>13</sup>C-labeled fractions in both G6P and 6P-gluconate reflected the minor incorporation of fructose-derived carbons in these metabolites (**Figure 5**).

Notably, for the F6P pool, we obtained an equal amount (∼45% each) of triply <sup>13</sup>C-labeled and doubly <sup>13</sup>C-labeled fractions, respectively from glucose-derived and fructose-derived carbons (**Figure 5**). Forward flux through the EMP pathway was expected to generate <sup>13</sup>C labeling for FBP similar to F6P and, subsequently, the splitting of the triply <sup>13</sup>C-labeled and doubly <sup>13</sup>C-labeled fractions of FBP would generate equal proportions of non-labeled, singly <sup>13</sup>C-labeled, and triply <sup>13</sup>C-labeled fractions of triose-phosphates (**Figure 5**). In accordance with this labeling scheme in the EMP pathway, we did obtain an equal proportion of non-labeled, singly <sup>13</sup>C-labeled, and triply <sup>13</sup>C-labeled fractions of DHAP, 3-PG, and pyruvate (**Figure 5**). However, in addition to doubly and triply <sup>13</sup>C-labeled FBP (at 20–32%), there were other isotopolog fractions of FBP (at 5–15%) (**Figure 5**). Specifically, the latter fractions were non-labeled, singly <sup>13</sup>C-labeled, quadruply <sup>13</sup>C-labeled, and sextuply <sup>13</sup>C-labeled FBP (**Figure 5**). Thus, the labeling pattern of FBP reflected backward flux from the combination of DHAP and GAP to produce FBP (**Figure 5**). However, this backward flux did not extend to F6P (**Figure 5**). Therefore, the partial reversibility of carbon flux in the upper EMP pathway previously discussed with the glucose-grown cells was also evident in cells grown on the glucose:fructose:xylose mixture (**Figures 3**, **5**). It is important to note that, in addition to indicating that the upper EMP pathway was only partially reversible, the distinct difference between the labeling patterns of F6P and FBP revealed that F6P was the entry point for the catabolism of fructose-derived carbons (**Figure 5**).
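The forward-flux labeling scheme described above can be sketched by counting labeled carbons on each side of the FBP cleavage. This Python sketch is illustrative only: it assumes that hexose carbon positions are retained from the substrate through to FBP and that cleavage sends C1 to C3 to DHAP and C4 to C6 to GAP. The function name `aldolase_split` is hypothetical, and relative abundances are not modeled.

```python
def aldolase_split(fbp_labels):
    """Return (labeled carbons in DHAP, labeled carbons in GAP) for an FBP labeling pattern.

    fbp_labels lists C1..C6 of FBP, with 1 for 13C and 0 for 12C.
    """
    if len(fbp_labels) != 6:
        raise ValueError("FBP has six carbons")
    return sum(fbp_labels[:3]), sum(fbp_labels[3:])

glucose_derived_fbp = [1, 1, 1, 0, 0, 0]   # from [1,2,3-13C3]-glucose
fructose_derived_fbp = [1, 0, 0, 0, 0, 1]  # from [1,6-13C2]-fructose

print(aldolase_split(glucose_derived_fbp))   # (3, 0)
print(aldolase_split(fructose_derived_fbp))  # (1, 1)
```

Under these assumptions, forward flux alone yields only non-labeled, singly labeled, and triply labeled trioses, which matches the set of isotopologs reported for DHAP, 3-PG, and pyruvate. Recombining such trioses in the backward direction can then produce the additional FBP isotopologs noted in the text.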

With respect to xylose assimilation, the incorporation of xylose-derived carbons was expected to introduce non-labeled carbons in the PP pathway. However, non-labeled fractions in the PP pathway metabolites (R5P, Xu5P, and S7P) were only 5% or less (**Figure 5**). By contrast, in accordance with the oxidative PP pathway, triply <sup>13</sup>C-labeled 6P-gluconate from assimilated glucose-derived carbons led to 52–60% doubly <sup>13</sup>C-labeled fractions in both R5P and Xu5P (**Figure 5**). Consistent with the combination of these latter two metabolites to produce S7P, we obtained quadruply <sup>13</sup>C-labeled S7P (∼46%) (**Figure 5**). Despite the predominance of the oxidative PP pathway, minor fractions of other <sup>13</sup>C isotopologs of the PP pathway metabolites were consistent with minor flux of the non-oxidative PP pathway (**Figure 5**). In sum, during growth on the carbohydrate mixture, glucose-derived carbons primarily populated the gluconate pathway and the oxidative PP pathway, both fructose-derived and glucose-derived carbons equally contributed to the EMP pathway, and the assimilation of xylose was insignificant (**Figure 5**).

non-oxidative pentose–phosphate pathway. Measured data (average ± data range) were collected at two different timepoints during cell growth (T1, OD<sub>600</sub> = 1.0; T2, OD<sub>600</sub> = 2.0). Growth curves in Supplementary Figure S1 indicate when data were acquired during biomass growth. Definitions for metabolite abbreviations are as shown in Figure 1 caption.

### Quantitative Flux Modeling During Growth on Glucose Alone Versus a Mixture With Glucose, Fructose, and Xylose

Using the <sup>13</sup>C labeling data and the growth phenotypes, we performed MFA to quantitate explicitly up to 18 fluxes through the different catabolic routes for carbohydrate utilization in B. megaterium during growth on glucose alone or with fructose and xylose (**Figure 6**). The fluxes were normalized to the glucose uptake rate in both growth conditions (**Figure 6**) – the absolute fluxes are presented in **Supplementary Table S2**. The cellular fluxes demonstrated that different catabolic routes were accentuated in B. megaterium depending on whether the cells were processing a single carbohydrate substrate or a mixture (**Figure 6**).
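As a minimal illustration of the normalization step, the Python sketch below rescales a set of absolute fluxes so that the glucose uptake rate equals 100. The flux names and numerical values are hypothetical placeholders rather than values from Supplementary Table S2, and the choice of 100 as the reference scale is an assumption.

```python
def normalize_fluxes(absolute_fluxes, reference="glucose_uptake"):
    """Express each flux as a percentage of the reference flux."""
    ref = absolute_fluxes[reference]
    if ref <= 0:
        raise ValueError("reference flux must be positive")
    return {name: 100.0 * value / ref for name, value in absolute_fluxes.items()}

# Hypothetical absolute fluxes in arbitrary units (not measured data).
example = {"glucose_uptake": 4.0, "G6P_to_F6P": 2.0, "6PG_to_Ru5P": 1.0}
normalized = normalize_fluxes(example)
print(normalized)  # {'glucose_uptake': 100.0, 'G6P_to_F6P': 50.0, '6PG_to_Ru5P': 25.0}
```

Expressing fluxes relative to glucose uptake allows the two growth conditions to be compared even when their absolute uptake rates differ.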

Compared to the metabolism of glucose alone, there was a two-fold increase of the flux through the gluconate pathway in cells grown on the mixture (**Figures 6A,B**). This increased flux through the gluconate pathway during processing of the mixture was accompanied by a decrease (by up to 12%) in the carbon flux from G6P to F6P (**Figures 6A,B**). The increased carbon flux through the gluconate pathway led to a 17% increase in the activity of the oxidative PP pathway, from 6P-gluconate to Ru5P, in cells grown on the mixture relative to glucose alone (**Figures 6A,B**). Subsequent fluxes through the PP pathway were also increased (**Figures 6A,B**). The exception was the decrease (by 25%) in the reaction flux that combines Xu5P and E4P to produce F6P and GAP (**Figures 6A,B**), which may be influenced by the influx of fructose-derived carbons into the EMP pathway (**Figures 5**, **6A,B**). During growth on the mixture, the uptake flux from fructose to F6P was about half of the glucose uptake flux (**Figure 6B**). The second annotated pathway for fructose uptake (i.e., from fructose to FBP) was found to be inactive (**Figures 1**, **6B**). Due to the incorporation of fructose-derived carbons at the metabolic node of F6P, the flux through the remaining EMP pathway downstream of F6P was increased by up to 70% (**Figures 6A,B**). Thus, the overall flux profile in the gluconate pathway, EMP pathway, and PP pathway was reprogrammed to accommodate the assimilation of fructose (**Figures 6A,B**).

#### DISCUSSION

Here, we employed <sup>13</sup>C-carbon mapping of intracellular metabolism to address knowledge gaps regarding the routes for carbohydrate catabolism in B. megaterium QM B1551. We also resolved inconsistencies from previous studies on Bacillus species regarding the relative flux from glucose catabolism through either the EMP pathway or the PP pathway (Sauer et al., 1997; Dauner et al., 2001; Furch et al., 2007a,b; Tannler et al., 2008; Korneli et al., 2012). Our <sup>13</sup>C labeling data presented evidence of both glucose oxidation to gluconate and flux from gluconate to 6P-gluconate, thus confirming the presence of the complete gluconate pathway in B. megaterium (**Figures 2A,B**). Our data also revealed that flux through the gluconate pathway was compromised by congestion at the gluconate node (**Figures 2A,B**). Further studies are needed to elucidate the regulatory mechanism for the gluconate pathway in B. megaterium QM B1551, which was beyond the scope of our data. Interestingly, in Pseudomonas species, the gluconate pathway is strongly linked to a very active ED pathway (Nikel et al., 2015; Sasnow et al., 2016; Wilkes et al., 2018). Although the ED pathway was widely neglected in prior MFAs conducted on Bacillus species (Furch et al., 2007a,b; Tannler et al., 2008), genomic characterization of B. megaterium QM B1551 annotated one of the two essential enzymes in the ED pathway (**Figure 1**; Kanehisa and Goto, 2000; Kanehisa et al., 2017). However, our data demonstrated that the ED pathway was not active in B. megaterium QM B1551 (**Figure 2B**). Regarding the traditional glycolytic pathway (i.e., the EMP pathway), our flux modeling determined a net forward flux through the EMP pathway (**Figure 6**), in agreement with previous MFA studies on Bacillus species including B. megaterium (Sauer et al., 1997; Dauner et al., 2001; Furch et al., 2007a,b; Tannler et al., 2008). However, our discrete metabolite labeling data revealed that the FBP lysis was reversible (**Figures 3**, **5**). The consequence of this reversibility regarding the optimization of bioproduct generation in B. megaterium species remains to be determined.

The resolution of metabolite labeling in the PP pathway of Bacillus species has previously been deduced from <sup>13</sup>C labeling of their amino acid derivatives (Sauer et al., 1997; Dauner et al., 2001; Furch et al., 2007b; Korneli et al., 2012). Here, using an LC–MS approach to obtain the labeling patterns of three free metabolites specific to the PP pathway, we determined that the flux through the oxidative PP pathway (i.e., from 6P-gluconate to Ru5P) was ∼6-fold to ∼10-fold greater than the flux through the non-oxidative PP pathway (from F6P and GAP) (**Figures 4C**, **6B**). This relative contribution of the two routes in the PP pathway was larger than previously published estimates, which reported a 3.5- to 5-fold greater activity of the oxidative PP pathway than of the non-oxidative PP pathway in Bacillus species (Sauer et al., 1997; Dauner et al., 2001; Furch et al., 2007b). The increased flux through the oxidative PP pathway determined from our modeled fluxes implied that the B. megaterium QM B1551 cells can afford a greater production of reducing power through the generation of NADPH than previously estimated.

While glucose is the substrate of choice to elucidate the metabolism of most bacterial model systems, a mixture of carbohydrates is typically present in environmental systems or in engineered bioreactors for industrial applications (Wendisch et al., 2016). Biomass growth on only glucose, fructose, or xylose indicated that B. megaterium can rely on each substrate as a sole source of carbon in its metabolism (**Supplementary Figure S1**). However, our metabolic study of cells fed on equimolar carbon-equivalent concentrations of glucose, fructose, and xylose revealed that B. megaterium QM B1551 utilizes different catabolic routes to process this carbohydrate mixture (**Figures 5**, **6**). No appreciable incorporation of xylose-derived carbons was detected (**Figure 5**). Carbons derived from fructose and glucose contributed equally to the flux in the EMP pathway from F6P to the triose phosphates (**Figure 5**). The gluconate pathway was only populated by glucose-derived carbons (**Figure 5**). Flux of glucose-derived carbons was also significant from the gluconate pathway through the oxidative PP pathway (**Figures 5**, **6**). Our findings thus demonstrated that B. megaterium QM B1551 grown on a mixture can exhibit both complete repression of the assimilation of a given carbohydrate and preferential channeling of the assimilated carbohydrates into different catabolic routes. These findings have important implications for the processing of different carbon feedstocks. Follow-up metabolic flux studies and proteomics profiling are needed to gain further insights into how B. megaterium regulates this hierarchy in carbohydrate catabolism in response to different compositions and concentrations of carbohydrate mixtures.

Given the emergent usage of different feedstocks in engineered bioproduction, it is critical to understand the metabolic network in biotechnologically important bacterial species in response to various carbon sources (Wendisch et al., 2016). Due to the abundance of carbohydrate-containing feedstocks, we have focused here on determining the flux through the different catabolic pathways available for carbohydrate processing in B. megaterium QM B1551. The <sup>13</sup>C-metabolomics approaches employed here present a powerful tool to elucidate new pathways and gain insights on existing pathways involved in the intracellular network of cellular metabolism. These approaches could be instrumental to the broader field of metabolic characterization of environmentally-relevant bacteria and optimization of bacterial cells in biotechnological applications.

#### AUTHOR CONTRIBUTIONS

LA supervised the research. JW and LA designed the research, conducted data analysis, and wrote the manuscript. JW and TY performed the experiments. JW and CM performed the quantitative flux modeling. All authors contributed to the final draft of the manuscript.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmicb.2018.02789/full#supplementary-material

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Wushensky, Youngster, Mendonca and Aristilde. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Bacterial Metabolism During Biofilm Growth Investigated by <sup>13</sup>C Tracing

Ni Wan<sup>1</sup>, Hao Wang<sup>2</sup>, Chun Kiat Ng<sup>3,4†</sup>, Manisha Mukherjee<sup>3,4</sup>, Dacheng Ren<sup>2,5</sup>, Bin Cao<sup>3,4</sup>\* and Yinjie J. Tang<sup>6</sup>\*

<sup>1</sup> Mechanical Engineering and Materials Science, Washington University, St. Louis, MO, United States, <sup>2</sup> Department of Biomedical and Chemical Engineering, Syracuse Biomaterials Institute, Syracuse University, Syracuse, NY, United States, <sup>3</sup> School of Civil and Environmental Engineering, Nanyang Technological University, Singapore, Singapore, <sup>4</sup> Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore, <sup>5</sup> Department of Civil and Environmental Engineering, and Biology, Syracuse Biomaterials Institute, Syracuse University, Syracuse, NY, United States, <sup>6</sup> Energy, Environmental and Chemical Engineering, Washington University, St. Louis, MO, United States

#### Edited by:

Weiwen Zhang, Tianjin University, China

#### Reviewed by:

Saori Kosono, The University of Tokyo, Japan Chao Wu, National Renewable Energy Laboratory (DOE), United States

#### \*Correspondence:

Bin Cao BinCao@ntu.edu.sg Yinjie J. Tang yinjie.tang@seas.wustl.edu

#### †Present address:

Chun Kiat Ng, Department of Engineering Science, University of Oxford, Oxford, United Kingdom

#### Specialty section:

This article was submitted to Microbial Physiology and Metabolism, a section of the journal Frontiers in Microbiology

> Received: 01 June 2018 Accepted: 17 October 2018 Published: 20 November 2018

#### Citation:

Wan N, Wang H, Ng CK, Mukherjee M, Ren D, Cao B and Tang YJ (2018) Bacterial Metabolism During Biofilm Growth Investigated by <sup>13</sup>C Tracing. Front. Microbiol. 9:2657. doi: 10.3389/fmicb.2018.02657

This study investigated the metabolism of Pseudomonas aeruginosa PAO1 during its biofilm development via microscopy imaging, gene expression analysis, and <sup>13</sup>C-labeling. First, dynamic labeling was employed to investigate the glucose utilization rate in fresh biofilms (thickness 40∼60 µm). The labeling turnover time of glucose-6-P indicated that biofilm metabolism was substantially slower than that of planktonic cells. Second, PAO1 was cultured in continuous tubular biofilm reactors or shake flasks. Then <sup>13</sup>C-metabolic flux analysis of PAO1 was performed based on the isotopomer patterns of proteinogenic amino acids. The results showed that PAO1 biofilm cells during growth conserved the flux features of their planktonic mode. (1) Glucose could be degraded by two cyclic routes (the TCA cycle and the Entner-Doudoroff-Embden-Meyerhof-Parnas loop) that facilitated NAD(P)H supplies. (2) Anaplerotic pathways (including pyruvate shunt) increased flux plasticity. (3) Biofilm growth phenotype did not require significant intracellular flux rewiring (variations between biofilm and planktonic flux network, normalized by glucose uptake rate as 100%, were less than 20%). (4) Transcription analysis indicated that key catabolic genes in fresh biofilm cells had expression levels comparable to planktonic cells. Finally, PAO1, Shewanella oneidensis (as the comparison group), and their c-di-GMP transconjugants (with different biofilm formation capabilities) were <sup>13</sup>C-labeled in biofilm reactors or under planktonic conditions. Analysis of amino acid labeling variances from different cultures indicated that the Shewanella flux network changed more flexibly than that of PAO1 during biofilm formation.

Keywords: c-di-GMP, dynamic labeling, Entner-Doudoroff pathway, pyruvate shunt, tubular biofilm reactors

## INTRODUCTION

Biofilm is a heterogeneous and dynamic system. Its development consists of steps of adhesion of planktonic microbes, colony formation and growth, and detachment/migration of dispersed cells to new surfaces. Moreover, cells at different locations inside a biofilm may have distinct metabolisms (e.g., different transcriptomic and proteomic profiles) due to intrinsic chemical gradients (Williamson et al., 2012). The physiological differences between biofilm and planktonic cells have attracted extensive studies (O'Toole et al., 2000; Bester et al., 2005). To quantify biofilm physiologies, diverse technologies including crystal violet assay, transcription/protein/metabolite analyses, and imaging (e.g., SEM, TEM, confocal microscopy) have been applied (Pantanella et al., 2013). Moreover, genetic mutations are used to reveal regulatory mechanisms of cell survival in various biofilm environments (Ding et al., 2014; Zhang et al., 2014). However, there is still little knowledge of metabolic fluxomes that describe in vivo enzyme activities inside biofilm cells for carbon/energy metabolism.

To decipher flux distributions in biofilm cells, the present study investigated the opportunistic pathogen Pseudomonas aeruginosa PAO1 for its metabolic functions under both planktonic and biofilm modes. Particularly, <sup>13</sup>C-fingerprinting of proteinogenic amino acids was used to trace carbon fluxes for substrate utilization and biomass synthesis. In parallel, dynamic labeling via <sup>13</sup>C-glucose pulses was used to reveal the speed of <sup>13</sup>C percolating through central pathways in fresh biofilms as well as planktonic cells. This study also examined the c-di-GMP transconjugant of PAO1 via <sup>13</sup>C-fingerprinting. The transconjugant overexpressed c-di-GMP and produced excess extracellular polymer substances (EPS) to enhance biofilm formation (Chua et al., 2015). To broaden our perception of the degree of flux profile conservation between planktonic and biofilm cells, the same isotopic approaches were also used to investigate Shewanella oneidensis MR-1 (a metal-reducing bacterium capable of proliferating in both aerobic and anaerobic conditions) (Tang et al., 2007). The outcomes improved our understanding of how bacterial species reorganize their flux networks during biofilm development.

### MATERIALS AND METHODS

#### Strains and Cultivations

Pseudomonas aeruginosa PAO1 and its c-di-GMP transconjugants (i.e., a high c-di-GMP transconjugant with twofold more biofilm formation and a low c-di-GMP transconjugant with biofilm formation reduced by ∼30%) were grown in an M9 medium using 20 mM glucose. S. oneidensis MR-1 and its high c-di-GMP transconjugant for enhanced biofilm formation, as additional tests, were grown in a modified MR-1 medium (Cao et al., 2011) using 20 mM sodium lactate. For planktonic cultures, bacteria (20 mL) were grown in flasks (150 mL) with an inoculation volume ratio of 0.5% (at room temperature, shaking at 200 rpm). To produce sufficient biofilm biomass for metabolic flux analysis (**Figure 1**), PAO1 or MR-1 was grown in tubular biofilm reactors (sets of 20 cm long O<sub>2</sub>-permeable silicone tubing with an inner diameter of 3 mm) at room temperature (Ding et al., 2014), where the respective media were continuously pumped through the tubular reactor by a Low-Speed Digital Peristaltic Pump system (Cole-Parmer, Singapore) (Sternberg and Tolker-Nielsen, 2006). Each tubular reactor was inoculated by injecting diluted planktonic culture using a syringe, resulting in an initial OD<sub>600</sub> of ∼0.01. After inoculation, the media flow was stopped for 1 h to allow initial attachment, followed by continuous media flow at a rate of 6 mL/h. The biofilm reactor was operated at pseudo-steady state for 3 days and the average wet biomass generation rate was ∼0.03 g/day/reactor.

### <sup>13</sup>C-Fingerprinting Amino Acids to Trace Flux Distributions

For labeled experiments, 20 mM [1,2-<sup>13</sup>C] glucose was used for cultivating both PAO1 and its c-di-GMP transconjugants, while [3-<sup>13</sup>C] sodium lactate was used for cultivating MR-1 and its c-di-GMP transconjugant. In planktonic mode, pseudo-steady-state shake flask cultures were harvested by centrifugation during mid-exponential phase. Cell pellets and supernatant were stored at −20°C before further analysis. For biofilm mode, <sup>13</sup>C-labeled biomass in the tubular reactor was squeezed out for amino acid analysis. Substrate concentrations (including glucose, lactate, and acetate) in the labeled experiments were measured using HPLC (Sivakumar et al., 2014). EPS formation was also determined (Jiao et al., 2010). To analyze proteinogenic amino acids, biomass pellets were hydrolyzed in 6 M HCl at 100°C, then air-dried and derivatized with N-tert-butyldimethylsilyl-N-methyltrifluoroacetamide (TBDMS) prior to GC-MS measurement (You et al., 2012). Published software was used to correct MS peaks (i.e., [M-57] and [M-159]) (Wahl et al., 2004). Mass isotopomer distributions (MID) (M0, M1, M2, …) represent fragments with (0, 1, 2, …) labeled carbons in amino acids. Due to overlapping peaks or product degradation, proline, arginine, cysteine, and tryptophan were not analyzed (Antoniewicz et al., 2007).
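To illustrate how a mass isotopomer distribution can be summarized, the Python sketch below computes the average <sup>13</sup>C enrichment per carbon from a corrected MID. It is a generic calculation rather than the software cited above. The function name `fractional_enrichment` and the example values are hypothetical, and the MID is assumed to have already been corrected for natural isotope abundance.

```python
def fractional_enrichment(mid):
    """Average labeled fraction per carbon for an MID given as [M0, M1, ..., Mn]."""
    if len(mid) < 2:
        raise ValueError("MID needs at least M0 and M1")
    total = sum(mid)
    if total <= 0:
        raise ValueError("MID must have positive total abundance")
    n_carbons = len(mid) - 1
    # Weight each isotopomer by its number of labeled carbons, then divide by fragment size.
    labeled = sum(i * abundance / total for i, abundance in enumerate(mid))
    return labeled / n_carbons

# Hypothetical two-carbon fragment: 20% M0, 30% M1, 50% M2.
example_mid = [0.2, 0.3, 0.5]
enrichment = fractional_enrichment(example_mid)
print(round(enrichment, 2))  # 0.65
```

A value near 0 indicates a largely unlabeled fragment, and a value near 1 indicates nearly complete labeling.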

#### Biofilm Imaging and Viability Analysis

Fresh PAO1 cells were grown on glass slides (1 cm<sup>2</sup>) for biofilm imaging and viability analyses. Briefly, PAO1 overnight cultures were used to inoculate Petri dishes containing M9 medium supplemented with 1 g/L unlabeled glucose. Biofilm cultures were incubated for 96 h (replacing spent medium with fresh M9 medium containing 1 g/L glucose every 48 h). After washing with PBS buffer, glass slides with attached biofilms (thickness 40∼60 µm) were transferred into new Petri dishes containing fresh M9 medium. To observe the attachment/settlement of planktonic cells on biofilms, biofilm cells were stained using SYTO 9 green fluorescent nucleic acid stain (Thermo Fisher Scientific, Waltham, MA, United States), then PAO1 planktonic cells (OD<sub>600</sub> 0.7∼0.8) stained by orange dye Alexa Fluor 555 (Thermo Fisher Scientific, Waltham, MA, United States) were added into Petri dishes and incubated with biofilm slides for 1 h. The resulting biofilm was imaged using an Axio Imager M1 fluorescence microscope (Carl Zeiss, Inc., Germany) (note: the green color represents biofilm cells and the orange color represents planktonic cells settled on the slide surface). For parallel samples, live/dead staining images of PAO1 biofilm were also collected, where biofilm slides were stained with SYTO 9 (green) and propidium iodide (red) for 15 min at room temperature before imaging (note: green stains all cells, while red indicates DNA in dead cells or extracellular DNA).

### Comparison of Glucose Catabolic Rates in PAO1 Cultures Using Dynamic <sup>13</sup>C Labeling

Glucose uptake in planktonic and biofilm PAO1 was measured by tracking <sup>13</sup>C incorporation rates of two key metabolites (glucose-6-P and glutamate) after pulsing fully labeled glucose into unlabeled cultures at room temperature. For planktonic <sup>13</sup>C-experiments, PAO1 was grown in shake flasks with 1 g/L unlabeled glucose. Once cells reached late exponential phase (OD<sub>600</sub> 0.7∼0.8) and ∼90% of the non-labeled glucose was consumed, fully labeled <sup>13</sup>C-glucose was added to the culture at a final concentration of 2 g/L. After <sup>13</sup>C-glucose addition, 15 mL of cell culture was harvested at each of four sampling points (0, 0.2, 1, and 5 min) by mixing with 5 mL of pre-chilled M9-ice solution. The samples were further quenched in an ethanol-dry ice bath (−70°C) to reduce the culture temperature to ∼0°C. Samples (with ice particles) were centrifuged at 8,000 rpm for 1 min and the pellets were kept at −20°C before LC–MS measurement. For dynamic <sup>13</sup>C-experiments on biofilm, fresh PAO1 biofilm cells were prepared using glass slides (same as that for cell imaging). Before labeling experiments, glass slides with fresh unlabeled biofilm cells were washed with phosphate-buffered saline (PBS, 1X) buffer and then soaked in 25 mL M9 medium containing 1 g/L fully labeled <sup>13</sup>C-glucose for 0.2, 1, 5, 30, and 180 min. To harvest time-course samples, glass slides were placed in PBS-ice solution to quench cell metabolism. Free metabolites were measured by LC–MS (Hollinshead et al., 2016). Briefly, quenched planktonic or biofilm cells were placed in cold methanol/chloroform solution (7:3 v/v) and shaken at 150 rpm at 4°C overnight. Deionized water was added to the solvent mix to extract cell metabolites. The aqueous phase was filtered through an Amicon Ultra centrifuge filter (3000 Da; EMD Millipore, Billerica, MA, United States) and then lyophilized. The dried samples were dissolved in acetonitrile and water (6:4, v/v) solution for LC–MS analysis (Agilent Technologies 1200 Series equipped with a SeQuant Zic-pHILIC column) to determine MS distributions of targeted metabolites (**Figure 2**).

#### Gene Expressions in Fresh Biofilm Cells

qPCR was used to compare the expression of glycolytic pathway genes between fresh biofilm cells from glass slides and planktonic cells. The protocol has been reported in our previous research (Choudhary et al., 2015). Briefly, cDNA was synthesized from RNA isolated from PAO1 planktonic cells and glass slide biofilms using the iScript cDNA Synthesis Kit (Bio-Rad, Hercules, CA, United States). The primers were designed with Primer-BLAST (NCBI). The qPCR samples were prepared by mixing cDNA, primers, and iTaq Universal SYBR Green Supermix (Bio-Rad, Hercules, CA, United States). The qPCR reactions were run on an Eppendorf Mastercycler Realplex thermal cycler (Eppendorf, Hauppauge, NY, United States). The qPCR conditions were: heat activation at 95°C for 1 min, followed by 40 cycles of denaturation at 95°C for 10 s and annealing/extension at 60°C for 1 min. The melting curve was set at 95°C for 30 s, 45°C for 30 s, 20 min hold with temperature gradient, and 95°C for 1 min. The relative expression ratios of the selected genes were analyzed using the LinRegPCR program (Heart Failure Research Center, Netherlands) and the equation below (Pfaffl, 2001):

$$\text{Log2Ratio} = \log_2\left[\frac{E_{\text{Target}}^{\Delta C_q^{\text{Target}}(\text{Planktonic} - \text{Biofilm})}}{E_{\text{Reference}}^{\Delta C_q^{\text{Reference}}(\text{Planktonic} - \text{Biofilm})}}\right]$$

ΔC<sub>q</sub> represents the difference in quantitation cycle between planktonic and biofilm samples, and E is the qPCR efficiency. Both ΔC<sub>q</sub> and E were calculated by the LinRegPCR program from the raw qPCR data. The targets were seven selected genes (PA4732, PA5110, PA3131, PA5192, PA5435, PA1580, and PA2828) related to glucose metabolism of P. aeruginosa (**Figure 3**). The reference was the housekeeping gene proC (Savli et al., 2003).

relative expression ratio of target genes (Pfaffl, 2001; Savli et al., 2003; Pan et al., 2012).
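The efficiency-corrected ratio defined above can be illustrated with a short Python sketch. It is only a minimal illustration of the Pfaffl-style formula; the function name, argument names, and the numbers in the usage example are hypothetical and do not come from the study or from LinRegPCR.

```python
import math


def log2_expression_ratio(e_target, dcq_target, e_reference, dcq_reference):
    """Log2 of the efficiency-corrected expression ratio.

    e_*   : amplification efficiency per cycle (2.0 means perfect doubling)
    dcq_* : Cq(planktonic) - Cq(biofilm) for the target or reference gene
    All names here are illustrative assumptions, not the authors' code.
    """
    ratio = (e_target ** dcq_target) / (e_reference ** dcq_reference)
    return math.log2(ratio)


# Hypothetical numbers: the target gene crosses threshold one cycle earlier in
# biofilm than in planktonic cells, while the reference gene does not shift.
example = log2_expression_ratio(2.0, 1.0, 2.0, 0.0)
print(example)
```

With perfect efficiencies and an unchanged reference gene, the log2 ratio reduces to the target ΔC<sub>q</sub> itself, which is a convenient sanity check on the sign convention.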

#### Metabolic Flux Analysis of Planktonic Culture and Tubular Reactor Biofilm Cells

<sup>13</sup>C-MFA was performed based on isotopomer data from proteinogenic amino acids from biofilm reactor and shake flask cultures (**Supplementary Material**). The software WUflux (He et al., 2016) was used for flux calculations. Biomass composition was modified based on a previous study (Bartell et al., 2016). The MFA model included the EMP (Embden-Meyerhof-Parnas) pathway, the OPP (oxidative pentose phosphate) pathway, the ED (Entner-Doudoroff) pathway, the TCA cycle, the glyoxylate shunt, and biomass synthesis (Stover et al., 2000). Based on the KEGG database, PAO1 contains fructose-1,6-bisphosphatase but lacks phosphofructokinase, and thus the reaction F6P→FBP was deleted from the model. Because the actual glucose utilization for biofilm production was very difficult to measure precisely (both planktonic and biofilm cells are present in tubular reactors), <sup>13</sup>C-MFA profiled relative fluxes by setting the glucose uptake rate to 100 units. The relative fluxes were solved by minimizing a quadratic error function that calculated the differences between predicted and measured isotopomer patterns (n = 2). The confidence intervals of fluxes were estimated as follows. The model randomly perturbed both the biomass equation for EPS formation (by ±10%) and the amino acid MID data (within measurement standard deviations) 500 times to simulate experimental uncertainty. For each new dataset, the model re-calculated the fluxes. Confidence intervals were then estimated from the variation of the resulting fluxes (He et al., 2016).
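The perturb-and-refit procedure for confidence intervals can be sketched on a toy problem. The example below is not WUflux and not the PAO1 model: it assumes a single branch point whose measured labeling pattern is a mixture of two hypothetical pathway patterns, fits the split ratio by least squares, and repeats the fit on 500 perturbed datasets. Gaussian noise, the percentile-based interval, and all names and numbers are assumptions made for illustration.

```python
import random


def fit_split_ratio(measured, pattern_a, pattern_b):
    """Least-squares f for measured ~ f * pattern_a + (1 - f) * pattern_b."""
    num = sum((m - b) * (a - b) for m, a, b in zip(measured, pattern_a, pattern_b))
    den = sum((a - b) ** 2 for a, b in zip(pattern_a, pattern_b))
    # Keep the flux ratio physically meaningful (between 0 and 1).
    return min(1.0, max(0.0, num / den))


def monte_carlo_interval(measured, sd, pattern_a, pattern_b, n_trials=500, seed=0):
    """Refit the split ratio on perturbed data and return a 95% percentile interval."""
    rng = random.Random(seed)
    fits = []
    for _ in range(n_trials):
        # Assumption: each measurement is perturbed with Gaussian noise of its SD.
        perturbed = [rng.gauss(m, s) for m, s in zip(measured, sd)]
        fits.append(fit_split_ratio(perturbed, pattern_a, pattern_b))
    fits.sort()
    lower = fits[int(0.025 * n_trials)]
    upper = fits[int(0.975 * n_trials) - 1]
    return lower, upper


# Hypothetical labeling patterns produced by two alternative routes.
route_a = [0.8, 0.2, 0.0]
route_b = [0.2, 0.3, 0.5]
measured_mid = [0.62, 0.23, 0.15]   # consistent with a 70:30 split
best_fit = fit_split_ratio(measured_mid, route_a, route_b)
interval = monte_carlo_interval(measured_mid, [0.01] * 3, route_a, route_b)
print(best_fit, interval)
```

A real flux model has many coupled free fluxes and requires a nonlinear optimizer, but the logic of reporting "best fit ± confidence interval" from repeated refits is the same.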

### RESULTS

### Dynamic Labeling of Free Metabolites in Planktonic and Biofilm Cells

Some bacterial species favor growth on solid surfaces, while others favor the planktonic mode. Comparisons between biofilm and planktonic cell growth have been extensively reported (Heffernan et al., 2009). To understand overall PAO1 biofilm physiology, we prepared fresh PAO1 biofilms on glass slides. Before pulsing <sup>13</sup>C-glucose for the dynamic labeling of biofilm cells, we washed the glass slides to remove planktonic cells attached to the biofilm surface. Fluorescence microscope imaging confirmed that few planktonic cells (pre-labeled with orange dye) remained on the biofilm surface (in green color) (**Figure 1**). These biofilm cells on glass slides could be easily sampled and quenched for fast-turnover metabolite analysis or cell imaging. Here, the dynamic labeling technique was used to measure metabolite turnover rates in biofilm cells from glass slides, which were then compared with shake flask cultures. **Figure 2** shows labeling rates for two key metabolites after <sup>13</sup>C-glucose was pulsed into biofilm or planktonic cells. As expected, G6P (the first metabolic node after glucose uptake) was labeled much faster in planktonic cells than in biofilm cells, and its <sup>13</sup>C enrichment reached saturation within 5 min. In contrast, it took 180 min for G6P labeling to reach saturation in biofilm cells. Interestingly, the final labeling percentage of G6P reached > 85% in biofilm cells, indicating that the majority of biofilm cells were metabolically active in glucose utilization despite the slow rate. Spatial stratification of oxygen and glucose within the biofilm is a possible explanation for the slow labeling. Moreover, free glutamate (the key downstream product from the TCA cycle for biomass synthesis) from both planktonic and biofilm cells was labeled much more slowly than G6P (20∼25% after 5 min). This observation could be explained by metabolite turnover in amino acid synthesis pathways being much slower than glucose uptake under both biofilm and planktonic modes.

FIGURE 4 | Flux ratio of Pseudomonas aeruginosa as planktonic (black) and biofilm (red). The fluxes were normalized to the glucose uptake rate (represented as 100), and the fluxes are represented as 'best fit ± confidence intervals' based on the measured isotopomer distributions (biological duplicates). The arrow thickness relates to the magnitude of flux. The white arrows represent the fluxes toward biomass synthesis. 3PG, 3-phosphoglycerate; 6PG, 6-phosphogluconate; AceCoA, acetyl-CoA; DHAP, dihydroxyacetone phosphate; E4P, erythrose 4-phosphate; FBP, fructose 1,6-bisphosphate; F6P, fructose 6-phosphate; G6P, glucose 6-phosphate; GAP, glyceraldehyde 3-phosphate; GLX, glyoxylate; ICT, isocitrate; MAL, malate; OAA, oxaloacetate; PEP, phosphoenolpyruvate; PYR, pyruvate; R5P, ribose 5-phosphate; Ru5P, ribulose-5-phosphate; RuBP, ribulose-1,5-diphosphate; S7P, sedoheptulose-7-phosphate; SUC, succinate; X5P, xylulose-5-phosphate.

## Fluxomes of Planktonic and Biofilm Pseudomonas Cells

Planktonic fluxes in P. aeruginosa have been reported (Berger et al., 2014; Lassek et al., 2016; Opperman and Shachar-Hill, 2016). These studies highlighted that the glucokinase route (phosphorylation of glucose to G6P, then conversion to 6PG) and the ED pathway are mainly responsible for glucose catabolism. The magnitude of fluxes through the oxidative pentose phosphate pathway, the glyoxylate shunt, and the TCA cycle varied among reports. This study examined P. aeruginosa metabolism in both planktonic and biofilm modes at room temperature. After cultivation with [1,2-<sup>13</sup>C]-labeled glucose in tubular reactors, the resulting proteinogenic amino acids stored labeling information (i.e., <sup>13</sup>C-fingerprinting) that could be used for <sup>13</sup>C-MFA. In contrast to the dynamic labeling experiments (i.e., G6P turnover rates), which showed that overall metabolic rates in biofilm cells were much slower than in their free-floating counterparts, metabolic flux distributions varied less between biofilm and planktonic modes after normalizing glucose uptake to 100. Most flux values in biofilm cells differed by less than 20% from those in planktonic cells. For both cultivation modes, the flux network showed a complete carbohydrate degradation loop, the Entner-Doudoroff-Embden-Meyerhof-Parnas (EDEMP) cycle (G6P→6PG→GAP→F6P→G6P) (**Figure 4**), possibly due to metabolic congestion at the lower segment of glycolysis. Compared to biofilm cells, planktonic cells had moderately higher fluxes through the TCA cycle. A similar EDEMP cycle has been observed in Pseudomonas putida (Nikel et al., 2015). Pseudomonas is well known for using the ED pathway rather than the EMP pathway for glucose catabolism due to the absence of phosphofructokinase (Berger et al., 2014). The ED pathway is not beneficial to ATP generation, but it reduces the metabolic cost of enzyme synthesis (Stettner and Segrè, 2013). More importantly, the formation of the EDEMP cycle could improve NADPH generation to diminish oxidative stress and promote the biosynthesis of C6 sugar phosphates (the precursors of EPS).

The TCA cycle in Pseudomonas species was reported to operate with the pyruvate shunt, which is catalyzed by malic enzyme and pyruvate carboxylase (malate→pyruvate→OAA) (Fuhrer et al., 2005; del Castillo et al., 2007). The same pyruvate shunt was observed in both PAO1 planktonic and biofilm cells (**Figure 4**). For example, very little malate dehydrogenase flux was observed in planktonic cells, and a significant amount of OAA was synthesized from pyruvate. The pyruvate shunt, coupled with other anaplerotic pathways (including the glyoxylate shunt), could regulate fluxes between glycolysis nodes (PEP and pyruvate) and TCA nodes (malate and OAA) to increase flux network plasticity.

The variations of the flux network between planktonic and biofilm cells were further investigated via qPCR analysis. We compared the expression levels of seven key genes related to glucose metabolism (including pgi, fbp, edaB, pckA, oadA, gltA, and an aminotransferase) along with the housekeeping gene proC. According to the results shown in **Figure 3**, proC was not differentially expressed under the two culture modes, which was consistent with previous research (Savli et al., 2003). The qPCR results also indicated that the expression levels of all selected genes differed only slightly between biofilm and planktonic samples (less than twofold). This result, though insufficient to reflect global genetic regulation, suggested that PAO1 could maintain normal functions of many central genes for glucose catabolism during the active biofilm growth phase.

### <sup>13</sup>C Fingerprinting of the PAO1 Transconjugant and Shewanella Under Planktonic and Biofilm Conditions

We examined the transconjugants (i.e., high or low c-di-GMP expression) of PAO1 via <sup>13</sup>C-labeling of proteinogenic amino acids from tubular reactors or shake flask cultures. The <sup>13</sup>C-fingerprints (MID of amino acid labeling) of PAO1 and its high c-di-GMP transconjugant were collected from planktonic cultures and biofilm reactors and plotted in **Figure 5A**. Compared to PAO1 wild type, the high c-di-GMP transconjugant produced 1.9-fold more EPS and twice as much biofilm in tubular reactors. Moreover, using the labeling data of the PAO1 planktonic culture as the baseline, MID data were highly correlated (R<sup>2</sup> = 0.99) between PAO1 and its transconjugant samples from planktonic and biofilm cultures (**Figure 5A**). This observation implied that the mutant and the wild type shared similar flux distributions (i.e., a change of planktonic or biofilm growth rate does not require significant intracellular flux rewiring). To obtain a broader understanding of flux regulation, similar <sup>13</sup>C-fingerprinting experiments were performed on S. oneidensis MR-1 and its high c-di-GMP transconjugant. The MID of proteinogenic amino acids also demonstrated strong correlations (R<sup>2</sup> = 0.99) between MR-1 and its c-di-GMP transconjugant (**Figure 5B**). However, the root-mean-square error (RMSE) of labeling data variations between planktonic and biofilm cells in MR-1 was 1.5-fold higher than the RMSE obtained from PAO1 cultures (**Figure 5A**). Further principal component analysis (PCA) examined the MID (as the features) of amino acids from different <sup>13</sup>C-cultures (planktonic or biofilm cultures of PAO1, MR-1, and their transconjugants) (**Figure 5C**). Both the RMSE and PCA results indicated that MR-1 metabolism could be more affected by the biofilm growth mode than PAO1 metabolism. This observation (i.e., the MR-1 flux network was more flexible) was consistent with the reported versatility of MR-1 metabolism (Guo et al., 2015). For example, O<sub>2</sub> conditions could influence acetate overflow and the intracellular fluxome in MR-1 (Tang et al., 2007). Taken together, different bacteria may differ in their capability to minimize changes in the flux network when cells switch from planktonic to biofilm growth.
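The two similarity measures used for the <sup>13</sup>C-fingerprints can be sketched as follows. The text does not state how R<sup>2</sup> was computed, so the squared Pearson correlation used here is an assumption; the function names and the example MID values are hypothetical.

```python
import math


def rmse(baseline, sample):
    """Root-mean-square error between two equally long MID vectors."""
    squared = [(b - s) ** 2 for b, s in zip(baseline, sample)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(baseline, sample):
    """Squared Pearson correlation (assumed definition of R^2)."""
    n = len(baseline)
    mean_b = sum(baseline) / n
    mean_s = sum(sample) / n
    cov = sum((b - mean_b) * (s - mean_s) for b, s in zip(baseline, sample))
    var_b = sum((b - mean_b) ** 2 for b in baseline)
    var_s = sum((s - mean_s) ** 2 for s in sample)
    return cov ** 2 / (var_b * var_s)


# Hypothetical concatenated MID values for two cultures.
planktonic = [0.55, 0.30, 0.15, 0.70, 0.20, 0.10]
biofilm = [0.53, 0.31, 0.16, 0.68, 0.22, 0.10]
print(r_squared(planktonic, biofilm), rmse(planktonic, biofilm))
```

Because R<sup>2</sup> saturates near 1 for any pair of broadly similar labeling patterns, RMSE is the more sensitive of the two when comparing how far each species shifts between growth modes.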

#### DISCUSSION

There is a consensus that cell attachment onto surfaces strongly influences microbial metabolism. For example, P. aeruginosa displays phenotypic changes during biofilm development (Sauer et al., 2002). Because of temporal and structural variations, conflicting observations have been reported on biofilm growth kinetics and metabolic activities compared to free-floating cells (van Loosdrecht et al., 1990; Heffernan et al., 2009). In this study, glucose uptake by fresh biofilm cells (based on G6P labeling) was found to be much slower than that by planktonic cells, while both planktonic and biofilm cells had sluggish glutamate synthesis (**Figure 2**). Moreover, biofilm cells employed a flux network relatively similar to that of planktonic cultures: PAO1 glucose catabolism was mainly dependent on the EDEMP/TCA loops, the pyruvate shunt, and several anaplerotic pathways. Meanwhile, expression levels of essential genes in PAO1 central pathways were analyzed, and no target gene in glucose catabolism was highly up-regulated or down-regulated (Log2 ratio of 2 as the cutoff, **Figure 3**) between planktonic and fresh biofilm cells (note: only two genes in glycolysis, pgi and fbp, appeared to be moderately repressed in biofilms compared to planktonic cells). The gene expression in fresh biofilm cells indicated that their metabolism could maintain stable catabolic functions. These biofilm metabolic features could be explained by three reasons. First, our cultivations offered optimal biofilm growth and minimized biofilm heterogeneity. For example, the majority of cells were alive in the freshly prepared biofilm (thickness of only 40∼60 µm) on glass slides (dominating green signals compared to red signals in **Figure 6**), while the use of a silicone tubing bioreactor improved oxygen and nutrient transport for biofilm biomass generation. Second, cells located in the peripheral layers of biofilms might contribute significantly to biofilm growth since these cells received nutrients at a level similar to that of planktonic cells. Third, bacterial metabolism inherently demonstrated robust ratios for resource allocation.

In a broader perspective, bacterial flux networks are not straightforwardly correlated with gene expressions (Chubukov et al., 2013) or proteomic profiles (Lassek et al., 2016). Although bacterial physiologies are sensitive to nutrient and growth conditions, flux ratios/network may demonstrate small perturbations or certain rigidity against genetic and environmental changes (Fischer and Sauer, 2005). For example, bacterial flux distribution under salt stresses could remain the same as normal growth conditions, which was in stark contrast to slower growth rate and high changes of transcript profiles (Tang et al., 2009). The conservation of microbial fluxomics (i.e., metabolic robustness) is regarded as the principle of how cell metabolism distributes resources for biomass growth, while microbial species may demonstrate different degrees of flux conservations during their biofilm growth.

The methods and observations in this study still have limitations. First, variation in growth conditions and surface materials among lab cultures may influence cell metabolism. Second, a biofilm culture includes at least three sub-populations (planktonic cells, fast-growing biofilm cells, and dormant/dead biofilm cells in deep layers, as shown in **Figure 6**). <sup>13</sup>C-fingerprinting of proteinogenic amino acids could only track the actively growing cells (i.e., on the top of the biofilms or deposited from the planktonic phase) that consumed major nutrient resources for biomass synthesis. This approach failed to provide unique insights into the metabolic topology or flux network plasticity of the dormant/slow-growing biofilm cells under environmental stresses. To further reveal metabolic activities in heterogeneous biofilms, new tools (such as population snapshot measurements by cell sorting) need to be integrated with <sup>13</sup>C-labeling techniques. Some cell patterning technologies may also be adapted to obtain biofilms with well-defined structures (and thus reduced heterogeneity) to allow better understanding of biofilm metabolism (Ren et al., 2012, 2013a,b; Gu et al., 2013). This is part of our ongoing work.

#### REFERENCES


### CONCLUSION

The flux network in biofilm cells is not yet well understood. This study elucidated metabolic features of PAO1 biofilm cells via comparative <sup>13</sup>C labeling. Bacterial cells within biofilms differ in physiology because of nutrient and oxygen limitations, but biofilm flux distributions could still show a certain degree of invariability. Specifically, PAO1 cells could largely maintain the flux distributions and gene expression of planktonic cultures during active biofilm development. To further decipher biofilm metabolism and regulation in different bacterial species, our future work aims to expand metabolite coverage as well as spatial and temporal analysis of biofilm subpopulations.

### AVAILABILITY OF DATA AND MATERIALS

All isotopomer data and metabolic reactions are included as supporting information.

### AUTHOR CONTRIBUTIONS

YT, DR, and BC initiated the project and designed experiments. NW, HW, CN, and MM performed experiments and modeling analysis. All authors wrote and approved the final version of the manuscript.

### FUNDING

This research was supported by US NSF grants (CBET 1700881 and 1706061). This research is also supported by the Ministry of Education (MOE) in Singapore with Academic Research Fund (AcRF) Tier 1 Grant RG50/16 (M4011622.030 to BC), AcRF Tier 2 Grant (MOE2017-T2-2-042 to BC), and the National Research Foundation under its Research Centre of Excellence Program (Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore).

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmicb.2018.02657/full#supplementary-material

to interrogate virulence factor synthesis. Nat. Commun. 8:14631. doi: 10.1038/ncomms14631

Berger, A., Dohnt, K., Tielen, P., Jahn, D., Becker, J., and Wittmann, C. (2014). Robustness and plasticity of metabolic pathway flux among uropathogenic isolates of Pseudomonas aeruginosa. PLoS One 9:e88368. doi: 10.1371/journal.pone.0088368


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Wan, Wang, Ng, Mukherjee, Ren, Cao and Tang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Isotope-Assisted Metabolite Analysis Sheds Light on Central Carbon Metabolism of a Model Cellulolytic Bacterium *Clostridium thermocellum*

Wei Xiong\*, Jonathan Lo, Katherine J. Chou, Chao Wu, Lauren Magnusson, Tao Dong and PinChing Maness\*

National Renewable Energy Laboratory, Golden, CO, United States

#### *Edited by:*

Ivan A. Berg, Universität Münster, Germany

#### *Reviewed by:*

Wolfgang Buckel, Philipps-Universität Marburg, Germany Wolfgang Eisenreich, Technische Universität München, Germany

*\*Correspondence:*

Wei Xiong Wei.Xiong@nrel.gov PinChing Maness PinChing.Maness@nrel.gov

#### *Specialty section:*

This article was submitted to Microbial Physiology and Metabolism, a section of the journal Frontiers in Microbiology

> *Received:* 23 May 2018 *Accepted:* 31 July 2018 *Published:* 23 August 2018

#### *Citation:*

Xiong W, Lo J, Chou KJ, Wu C, Magnusson L, Dong T and Maness P (2018) Isotope-Assisted Metabolite Analysis Sheds Light on Central Carbon Metabolism of a Model Cellulolytic Bacterium Clostridium thermocellum. Front. Microbiol. 9:1947. doi: 10.3389/fmicb.2018.01947

Cellulolytic bacteria have the potential to perform lignocellulose hydrolysis and fermentation simultaneously. The metabolic pathways of these bacteria, therefore, require more comprehensive and quantitative understanding. Using isotope tracers, gas chromatography-mass spectrometry, and metabolic flux modeling, we decipher the metabolic network of Clostridium thermocellum, a model cellulolytic bacterium which represents an attractive platform for conversion of lignocellulose to dedicated products. We uncover that the Embden–Meyerhof–Parnas (EMP) pathway is the predominant glycolytic route whereas the Entner–Doudoroff (ED) pathway and oxidative pentose phosphate pathway are inactive. We also observe that C. thermocellum's TCA cycle is initiated by both Si- and Re-citrate synthase, and it is disconnected between 2-oxoglutarate and oxaloacetate in the oxidative direction; C. thermocellum uses a citramalate shunt to synthesize isoleucine; and both the one-carbon pathway and the malate shunt are highly active in this bacterium. To gain a quantitative understanding, we further formulate a fluxome map to quantify the metabolic fluxes through central metabolic pathways. This work represents the first global in vivo investigation of the principal carbon metabolism of C. thermocellum. Our results elucidate the unique structure of the metabolic network in this cellulolytic bacterium and demonstrate the capability of isotope-assisted metabolite studies in understanding microbial metabolism of industrial interest.

Keywords: <sup>13</sup>C-isotope tracer, cellulolytic bacteria, citrate synthase, glycolytic pathways, isoleucine biosynthesis, metabolic flux analysis, tricarboxylic acid cycle

## INTRODUCTION

Lignocellulosic biomass is the most abundant bio-renewable resource on earth. It has the potential to substitute the current sugar feedstocks for sustainable biofuels and biochemicals production. The use of lignocellulosic biomass has significant merits including improving net carbon and energy balances, reducing production cost, and avoiding the competition between food vs. fuel (Lynd et al., 2005, 2008). However, effective application of lignocellulose in bio-refineries is currently impeded by biomass recalcitrance: its inherent resistance to depolymerization. A potential solution is consolidated bioprocessing (CBP), in which microbes of outstanding cellulolytic ability are recruited for combined cellulose hydrolysis and fermentation without extra addition of cellulases (Lynd et al., 2005). Clostridium thermocellum is among the most attractive CBP host microbes.

C. thermocellum is a Gram-positive, thermophilic, and strictly anaerobic bacterium. By taking advantage of an extracellular cellulase system called the cellulosome (Gold and Martin, 2007), C. thermocellum can degrade cellulose into soluble oligosaccharides. The latter can be further utilized by the cells and fermented to products of industrial interest (e.g., H<sub>2</sub>, formate, lactate, acetate, ethanol) (Ellis et al., 2012; Holwerda et al., 2014). Recent advances in genetic modification tools (Tripathi et al., 2010; Argyros et al., 2011) further enable heterologous expression of new functional pathways (e.g., pentose sugar utilization, Xiong et al., 2018; isobutanol production, Lin et al., 2015) in the bacterium, making this CBP bacterium an attractive and engineerable platform for biofuel or biochemical production.

Applications of C. thermocellum in CBP require not only knowledge of the molecular and genetic details of its cellulolytic system but also a global understanding of its carbon metabolism. Detailing the carbon fluxes through metabolic pathways in C. thermocellum will guide design principles for efficient conversion of lignocellulosic sugars into dedicated products. To date, a few published works have described the fermentation pathways in C. thermocellum (Zhang and Lynd, 2005; Lu et al., 2006; Nataf et al., 2010; Olson et al., 2010; Deng et al., 2013; Zhou et al., 2013), whereas the metabolic characteristics of this species have not been comprehensively and quantitatively understood. Although the availability of genome sequences of C. thermocellum strains (DSM 1313 and ATCC 27405) has provided essential information and blueprints in terms of metabolic pathways, elaborate experimental validation is still required to fully realize its biotechnological applications.

Recently, <sup>13</sup>C-tracer and metabolic flux analysis has emerged as a powerful tool for validating genome annotation, and this technique has enabled us to identify a unique CO<sub>2</sub>-fixing one-carbon metabolic pathway in C. thermocellum (Xiong et al., 2016). Complementing classic biochemical approaches to understanding metabolism, such as enzyme activity assays, gene mutagenesis, and the use of specific inhibitors of key metabolic steps, it provides a uniquely non-invasive approach to delineate cellular metabolism in vivo. Here we use <sup>13</sup>C-labeled substrates as isotopic tracers to follow the operation of C. thermocellum's central carbon metabolism and other primary metabolic pathways directly in living cells. As a verification of genomic information, we find that the Embden–Meyerhof–Parnas (EMP) pathway is the predominant glycolytic route whereas the Entner–Doudoroff (ED) pathway and oxidative pentose phosphate pathway are inactive. We also observe that C. thermocellum's TCA cycle is initiated with both Si- and Re-citrate synthase and that it is broken between 2-oxoglutarate and oxaloacetate in the oxidative direction. C. thermocellum uses a citramalate shunt to synthesize isoleucine, and the one-carbon metabolism and the malate shunt are active in this bacterium.

To gain quantitative insight into the newly identified metabolic activities, we constructed a fluxomic model that enables us to quantify the metabolic fluxes through the EMP pathway, the non-oxidative pentose phosphate pathway, the TCA cycle, the malate shunt, and amino acid biosynthesis pathways. This study represents the first global characterization of the central carbon metabolic pathways in a model cellulolytic species. It also showcases the capability of isotope-assisted metabolite analysis in complementing genome annotation with a more in-depth understanding of microbial metabolism.

## MATERIALS AND METHODS

### Strains, Culture Conditions, and Medium

The C. thermocellum DSM 1313-derived strain Δhpt (Xiong et al., 2016) was grown anaerobically at 55°C on 5 g/L glucose. We used the DSM122-defined medium referred to as CTFUD minimum medium (Olson and Lynd, 2012). The rich medium was supplemented with 1 g/L yeast extract. The growth medium was deoxygenated by gassing with argon for 20 min and autoclaved before use.

#### Quantitative Analysis of Fermentation Products

Cell growth was measured as optical density at 600 nm (OD<sub>600</sub>) by spectrophotometry (DU800; Beckman Coulter). An OD<sub>600</sub> of 1 correlated to 1.04 g/L cell dry weight (R<sup>2</sup> = 0.9918). Hydrogen gas was measured using a gas chromatograph (7890A GC; Agilent Technologies) equipped with a thermal conductivity detector and a stainless-steel Supelco 60/80 Mol Sieve column (6 ft × 1/8 in) with argon as the carrier gas. Peak areas were compared with a standard curve. Glucose, lactate, formate, acetate, and ethanol were measured by HPLC (1200 series; Agilent Technologies) with a mobile phase of 4 mM H<sub>2</sub>SO<sub>4</sub> at 0.6 mL/min flow rate using an Aminex HPX-87H column with a Micro Guard Cation H Cartridge. The column temperature was set to 55°C.

## <sup>13</sup>C-Tracer Experiment

We adopted a steady-state labeling strategy. The <sup>13</sup>C-labeling experiment was performed with four nutrient combinations: (1) 5.56 mM [U-<sup>13</sup>C<sub>6</sub>] glucose (20%) + 22.22 mM unlabeled glucose (80%); (2) 20 mM <sup>13</sup>C-bicarbonate + 27.78 mM (5 g/L) glucose; (3) 20 mM <sup>13</sup>C-formate + 27.78 mM glucose; and (4) 20 mM [U-<sup>13</sup>C<sub>5</sub>] glutamate + 27.78 mM glucose. Each combination was supplemented into the CTFUD defined medium. For cell growth, C. thermocellum strains were inoculated into these media at a starting OD<sub>600</sub> of 0.05. When late log phase was reached (OD<sub>600</sub> above 0.6), 3 mL of culture was sampled.
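As a quick arithmetic check on combination (1), the labeled fraction of the glucose pool follows directly from the two concentrations. The helper below is a hypothetical illustration, not part of the published protocol.

```python
def labeled_fraction(labeled_mm, unlabeled_mm):
    """Return (labeled fraction, total concentration in mM) of a tracer mix."""
    total_mm = labeled_mm + unlabeled_mm
    return labeled_mm / total_mm, total_mm


# Concentrations quoted for combination (1): 5.56 mM labeled + 22.22 mM unlabeled.
fraction, total_mm = labeled_fraction(5.56, 22.22)
print(f"{total_mm:.2f} mM total glucose, {fraction:.1%} labeled")
```

The total matches the 27.78 mM glucose used in the other three combinations, so all four conditions supply the same amount of glucose and the labeled share in combination (1) is about 20%.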

#### GC-MS and Isotopomer Analysis for Proteinogenic Amino Acids

The sample treatment and GC-MS analysis were done as previously reported (5), with a few modifications. Briefly, 3 mL of sampled culture was centrifuged at 10,000 × g for 1 min, and the cell pellets were digested with 500 µL of 6 M HCl at 105°C for 12 h. The hydrolysate was dried under nitrogen gas flow at 65°C and dissolved in 50 µL of water-free N,N-dimethylformamide. For the GC-MS measurement, the proteinogenic amino acids were derivatized before analysis: the dissolved hydrolysate was derivatized with 1% tert-butyl-dimethylchlorosilane (TBDMS) at 85°C for 60 min. One microliter of the sample in the organic phase was loaded onto the GC-MS instrument [Agilent GC-6890 gas chromatograph equipped with an Agilent 19091J-413 column (30 m × 0.32 mm × 0.25 µm) directly connected to an MS-5975C mass spectrometer]. Helium was the carrier gas. The oven temperature was initially held at 50°C for 2 min; it was then raised to 150°C at 5°C/min and held at that value for 2 min. Finally, it was raised to 320°C at 7°C/min and held at that final value for 2 min. Other settings included splitless injection and electron impact ionization at 70 eV. The amino acids, including alanine, aspartate, glutamate, glycine, isoleucine, leucine, methionine, phenylalanine, proline, serine, threonine, tyrosine, and valine, were separated and analyzed. Histidine was not detected due to its low abundance in cell biomass.

To analyze the isotope labeling pattern of amino acids, a mass isotopomer distribution vector, MDV<sub>α</sub>, was assigned according to Nanchen et al. (2007).

$$\text{MDV}_{\alpha} = \begin{bmatrix} m_0 \\ m_1 \\ \vdots \\ m_n \end{bmatrix}, \qquad \sum_{i=0}^{n} m_i = 1 \tag{1}$$

where m<sub>0</sub> is the fractional abundance of molecules with monoisotopic mass and m<sub>i</sub> (i > 0) is the abundance of fragments with heavier masses. The GC-MS data were corrected for the naturally occurring isotopes of oxygen (O), hydrogen (H), and carbon (C) atoms using a correction matrix (Equation 2) as described by Nanchen et al. (2007).

$$\text{MDV}_{\alpha}^{\ast} = C_{\text{corr,COH}}^{-1} \cdot \text{MDV}_{\alpha} \tag{2}$$

where MDV<sub>α</sub><sup>∗</sup> is the corrected mass isotopomer distribution vector and C<sub>corr,COH</sub><sup>−1</sup> is the correction matrix. The resulting MDV<sub>α</sub><sup>∗</sup> values were then used to assess the fractional labeling (FL) enrichment of each amino acid according to Equation 3.
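The idea behind the correction step can be sketched for a simplified case. The example below corrects only for natural <sup>13</sup>C in the carbon backbone; it ignores O, H, and the atoms of the derivatization group, all of which a full correction such as that of Nanchen et al. (2007) accounts for. The matrix construction, the function names, and the natural abundance value (about 1.07% <sup>13</sup>C) are assumptions for illustration and are not the authors' implementation.

```python
from math import comb

NATURAL_13C = 0.0107  # approximate natural abundance of 13C


def carbon_correction_matrix(n_carbons, p=NATURAL_13C):
    """Lower-triangular matrix C for a fragment with n_carbons backbone carbons.

    C[i][j] is the probability that a molecule with j tracer-derived 13C atoms
    is observed at mass shift i because (i - j) of its remaining (n - j)
    carbons carry natural 13C.
    """
    size = n_carbons + 1
    matrix = [[0.0] * size for _ in range(size)]
    for j in range(size):
        free = n_carbons - j
        for i in range(j, size):
            k = i - j
            matrix[i][j] = comb(free, k) * p ** k * (1 - p) ** (free - k)
    return matrix


def correct_mdv(measured, matrix):
    """Solve matrix * corrected = measured by forward substitution, then renormalize."""
    corrected = []
    for i, row in enumerate(matrix):
        known = sum(row[j] * corrected[j] for j in range(i))
        corrected.append((measured[i] - known) / row[i])
    total = sum(corrected)
    return [c / total for c in corrected]


# Round trip on a hypothetical three-carbon fragment: 80% unlabeled, 20% fully labeled.
true_mdv = [0.8, 0.0, 0.0, 0.2]
c_matrix = carbon_correction_matrix(3)
measured_mdv = [sum(c_matrix[i][j] * true_mdv[j] for j in range(4)) for i in range(4)]
print(correct_mdv(measured_mdv, c_matrix))
```

Because the matrix is lower triangular with a nonzero diagonal, forward substitution is equivalent to multiplying by its inverse, which is the operation written in Equation 2.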

$$FL = \frac{\sum\_{i=0}^{n} i.mi}{n.\sum\_{i=0}^{n} mi} \tag{3}$$

where n represents the number of carbon atoms in the amino acid and i is the mass isotopomer.
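Equations 2 and 3 can be illustrated with a short sketch. It is not the authors' implementation: it assumes a simplified lower-triangular correction matrix built from a single, hypothetical natural-abundance mass-shift vector (`natural`), whereas the correction of Nanchen et al. (2007) accounts for C, O, and H atoms of the derivatized fragment. The function names and all numbers are illustrative only.

```python
def correct_natural_abundance(measured, natural):
    """Solve C * MDV_corrected = MDV_measured by forward substitution.

    measured: raw MDV (m0..mn) of a fragment.
    natural:  mass-shift distribution (+0, +1, +2, ...) caused by naturally
              occurring isotopes; a hypothetical input in this sketch.
    """
    corrected = []
    for i, m_i in enumerate(measured):
        # Signal that lighter isotopomers spill into mass position i.
        spill = sum(
            natural[i - j] * corrected[j]
            for j in range(i)
            if i - j < len(natural)
        )
        corrected.append((m_i - spill) / natural[0])
    total = sum(corrected)
    return [x / total for x in corrected]  # renormalize so the MDV sums to 1


def fractional_labeling(mdv):
    """Equation 3: FL = sum(i * m_i) / (n * sum(m_i))."""
    n = len(mdv) - 1
    return sum(i * m for i, m in enumerate(mdv)) / (n * sum(mdv))


# Hypothetical three-carbon fragment; values are illustrative only.
natural = [0.90, 0.08, 0.02]
measured = [0.72, 0.064, 0.016, 0.18]
mdv_star = correct_natural_abundance(measured, natural)
print([round(x, 3) for x in mdv_star])
print(round(fractional_labeling(mdv_star), 3))
```

With these made-up inputs the corrected vector is close to 80% unlabeled and 20% fully labeled, so FL is close to 0.2, the enrichment expected from a 20% fully labeled substrate.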

#### Quantification of Metabolic Fluxes

The central carbon network of C. thermocellum DSM 1313 was constructed from genome annotation and validated by the isotope tracer experiments in this work. Specifically, the network includes the EMP pathway, the non-oxidative pentose phosphate pathway, the malate shunt, the incomplete TCA cycle initiated by both Si- and Re-citrate synthase, and amino acid biosynthesis pathways (for the complete reaction list, see **Supplementary File 1**). The biomass composition (**Supplementary File 1**, R53) was defined according to a previous genome-scale reconstruction of the C. thermocellum metabolic network (Roberts et al., 2010), with two minor modifications: (1) the biomass equation was formulated using the metabolites that appear in the reaction list (**Supplementary File 1**), and (2) experimentally measured fatty acid profiles were used (see **Supplementary File 1**). The <sup>13</sup>C-metabolic fluxes were quantified by minimizing the sum-of-squared residuals (SSR) between computationally simulated and experimentally determined measurements. INCA, a MATLAB-based <sup>13</sup>C-flux software package (Young, 2014), was used for flux estimation.
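The fitting criterion can be written compactly. The variance-weighted form below is the one commonly used in <sup>13</sup>C-MFA and is an assumption here, since the text does not state how the residuals were weighted; the symbols v (flux vector), x<sub>j</sub><sup>meas</sup> and x<sub>j</sub><sup>sim</sup> (measured and simulated values of measurement j), and σ<sub>j</sub> (measurement standard deviation) are introduced only for illustration.

$$\hat{v} = \arg\min_{v} \, \text{SSR}(v), \qquad \text{SSR}(v) = \sum_{j} \left( \frac{x_j^{\text{meas}} - x_j^{\text{sim}}(v)}{\sigma_j} \right)^{2}$$

Here the measurements j would cover both the mass isotopomer abundances and the extracellular rates supplied to the model.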

### RESULTS

#### Validation of the Experimental System for <sup>13</sup>C Tracer Analysis of *C. thermocellum*

The first task for <sup>13</sup>C-tracer analysis is to validate the experimental system by examining the effect of medium nutrients on <sup>13</sup>C-labeling, because complex carbon nutrients (e.g., amino acids in yeast extract) can be incorporated into cell biomass and thus interfere with quantitative flux analysis. To test the medium effect, we used 20% fully <sup>13</sup>C-labeled and 80% unlabeled glucose as the carbon source and cultivated the cells in CTFUD rich medium containing yeast extract (1 g/L) and in CTFUD defined medium without yeast extract. Cultures were sampled at the late exponential phase of growth (OD<sub>600</sub> above 0.6), at which point the labeling steady state is assumed to have been reached, and the fractional labeling of each proteinogenic amino acid (AA) fragment was quantified by GC-MS. As shown in **Figure S1**, the labeling ratio of most proteinogenic AA fragments in rich medium is far below the theoretical value (20%), indicating that unlabeled amino acids in yeast extract were assimilated directly by C. thermocellum and substantially diluted the labeling. In contrast, the labeling ratio of AA fragments in CTFUD defined medium is much closer to the theoretical value (20%), confirming that medium nutrients do not interfere with <sup>13</sup>C-labeling in the defined medium and that a pseudo-steady state for labeling has been reached. Overall, these results show that a tracer experiment using the defined (minimal) CTFUD medium is a valid system for <sup>13</sup>C metabolic flux analysis.
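The dilution argument can be made concrete with a two-source mixing estimate. This is a minimal sketch under stated assumptions: an amino acid pool is taken to be a mix of de novo synthesis at the theoretical 20% enrichment and direct uptake of unlabeled material at 0% enrichment (after natural-abundance correction). The function name and the example enrichments are hypothetical and are not values read from Figure S1.

```python
def unlabeled_uptake_fraction(fl_observed, fl_expected=0.20):
    """Fraction of an amino acid pool drawn from unlabeled medium components.

    Assumes a two-source mix: de novo synthesis at fl_expected and
    direct uptake at zero enrichment.
    """
    if not 0.0 <= fl_observed <= fl_expected:
        raise ValueError("observed labeling must lie between 0 and the expected value")
    return 1.0 - fl_observed / fl_expected


# Hypothetical fragment enrichments for the two media.
for medium, fl in [("rich medium", 0.08), ("defined medium", 0.19)]:
    print(medium, round(unlabeled_uptake_fraction(fl), 2))
```

Under this simple model, an observed enrichment well below 20% translates directly into a large share of the amino acid being taken up from the medium rather than synthesized from glucose.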

#### Key Amino Acid Pathways

To examine amino acid biosynthetic pathways in C. thermocellum, the labeling profiles of 13 proteinogenic amino acids were analyzed. We used <sup>13</sup>C-bicarbonate labeling data as the reference, since <sup>13</sup>C-bicarbonate labeling yields [1-13C] pyruvate through reversed pyruvate ferredoxin oxidoreductase (PFOR) (Xiong et al., 2016), and pyruvate may serve as the precursor of several key amino acids. For example, valine and leucine exhibited a carbon labeling pattern similar to that of pyruvate (see **Figure S2**), with the labeled carbon at the first position (the carboxylic group) of both amino acids. This finding supports pyruvate as the common precursor of valine and leucine in C. thermocellum (see **Figure S2**).

With respect to aspartate biosynthesis, we detected that 15% of aspartate carries two labeled carbons (**Figure S3**). This is consistent with strong activity of the anaplerotic reactions (the malate shunt), which enable the combination of [1-13C] pyruvate and a <sup>13</sup>C-bicarbonate to form a two-carbon labeled malate and then oxaloacetate, the precursor of aspartate. We further observed that threonine has a labeling pattern identical to that of aspartate, confirming that its carbon skeleton is derived directly from oxaloacetate (**Figure S3**).

Next, we examined the biosynthetic pathway for isoleucine, a typical branched-chain amino acid. Genomic information (https://biocyc.org) suggests that two pathways may contribute to isoleucine biosynthesis: the threonine pathway and the citramalate pathway (see **Figure S4** for detailed pathways). These two routes produce different isotopomer labeling patterns, which allows their relative contributions to be evaluated. The threonine pathway can generate two-carbon labeled isoleucine (see **Figure S4**), whereas the citramalate pathway yields only one <sup>13</sup>C-carbon, at the first position of isoleucine. The measured mass distribution vector (MDV) of isoleucine showed that the main label is located at the carboxylic group (see **Figure S4**, Ile M-15 and Ile M-85) and that the labeling probability on another carbon atom is comparatively low (m1: 7% in Ile M-85, see **Figure S4**). These data suggest that the citramalate pathway serves as the dominant pathway for isoleucine biosynthesis.

#### Glycolytic Fluxes Are Dominantly Contributed by the EMP Pathway

The pathway activities in C. thermocellum, including the glycolytic pathways and the one-carbon pathway, were then quantitatively analyzed by <sup>13</sup>C-tracer analysis. Specifically, we used [1-13C] glucose as the carbon tracer to distinguish the glycolytic flux through the EMP pathway from that through the oxidative pentose phosphate pathway and the ED pathway. Proteinogenic serine was used as the readout because it can be synthesized through multiple glycolytic routes. The deduced serine labeling pattern from each pathway is shown in **Figure 1**. The EMP pathway splits the hexose in the middle and results in 50% of serine labeled with one <sup>13</sup>C atom, whereas the other pathways generate only unlabeled serine (labeling ratio of 0%). The flux through the EMP pathway can be quantified accordingly, and the results are shown in **Figure 2**. Our data show that 62% of serine is derived from the EMP pathway, supporting the EMP pathway as the dominant glycolytic route in C. thermocellum. However, the origin of the remaining 38% of serine remains an open question.
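The flux ratio calculation described here and in the Figure 2 caption reduces to a two-source mixing model, which can be sketched as follows. The function name is hypothetical, the model assumes that only m0 and m1 of the corrected serine MDV are populated, and the example m1 of 0.31 is back-calculated from the reported 62% rather than taken from the measured data.

```python
def emp_fraction_from_serine(m0, m1):
    """Fraction f of serine made via the EMP pathway under [1-13C] glucose.

    Mixing model: MDV_ser = f * (0.5, 0.5) + (1 - f) * (1.0, 0.0),
    so m1 = 0.5 * f once m0 + m1 is normalized to 1.
    """
    total = m0 + m1
    if total <= 0:
        raise ValueError("MDV must have positive total abundance")
    f = 2.0 * m1 / total
    if f > 1.0 + 1e-9:
        raise ValueError("m1 above 0.5 cannot be explained by this two-source model")
    return min(f, 1.0)


# m1 = 0.31 is back-calculated from the reported 62% EMP contribution.
print(round(emp_fraction_from_serine(0.69, 0.31), 2))
```

The same relation shows why serine is a convenient readout: every percentage point of singly labeled serine corresponds to two percentage points of EMP-derived serine.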

We further examined which other pathway(s) may contribute to the biosynthesis of the remaining serine. Genes for the ED pathway are not annotated in the C. thermocellum genome but have been identified in other Clostridia species (Bender and Gottschalk, 1973), so a potential contribution from the ED pathway should be taken into account. We analyzed ED pathway activity by checking the labeling pattern of alanine. If flux goes through the ED pathway, the [1-13C] glucose tracer will label the carboxylic group (C1) of pyruvate, the precursor of alanine (the carbon fate of [1-13C] glucose through the ED pathway is shown in **Figure 3**). However, the main labeling of alanine is found at C3 (**Figure 3**), as expected for EMP pathway activity, and labeling at the C1 position is negligible. This clearly indicates very low activity of the ED pathway, consistent with the fact that key genes for the ED pathway (e.g., 2-keto-3-deoxyphosphogluconate aldolase) are absent or not annotated in the genome.

We next considered the activity of the oxidative pentose phosphate pathway. Genes for glucose 6-phosphate dehydrogenase and 6-phosphogluconate dehydrogenase are not found in the C. thermocellum genome either. Additionally, their biochemical activities were absent from cell-free extracts (Patni and Alexander, 1971), making it unlikely that serine is derived from this pathway. With respect to other pathways that may contribute to serine biosynthesis, we evaluated the importance of one-carbon metabolism. Serine can be reversibly synthesized from glycine by addition of a one-carbon unit derived from formate via one-carbon metabolism. To validate this pathway, we labeled the cells with <sup>13</sup>C-formate and detected ∼25% of serine labeled at the C3 position (**Figure 4**). This result suggests that the flux through one-carbon metabolism is high and contributes substantially to serine synthesis. Overall, this analysis provides a quantitative picture of the relative activities of the glycolytic pathways and one-carbon metabolism.

#### *C. thermocellum* Carries Both *Re*- and *Si*-Citrate Synthase

Although the tricarboxylic acid (TCA) cycle is a ubiquitous energetic and biosynthetic pathway, it remains poorly characterized in anaerobic bacteria. The first step of the TCA cycle, biosynthesis of citrate from acetyl-coenzyme A and oxaloacetate, is catalyzed in most organisms by a Si-citrate synthase, which is Si-face stereospecific with respect to C-2 of oxaloacetate. In C. kluyveri and some other clostridia, however, the reaction can be catalyzed by a Re-citrate synthase, a homolog of which was also found in C. thermocellum (Li et al., 2007). <sup>13</sup>C tracer and isotopomer analysis offers an opportunity to visualize the stereospecificity of citrate synthase in vivo (Wu et al., 2010). For C. thermocellum, the stereospecificity of citrate synthase can be deduced from a <sup>13</sup>C-bicarbonate labeling experiment (**Figure 5**), in which <sup>13</sup>C-carbons are propagated to both carboxylic groups of oxaloacetate via reversed PFOR (rPFOR) and the malate shunt, and then to citrate by the stereospecific citrate synthases. The stereospecificity of citrate synthase is reflected in the labeling patterns of downstream glutamate (GC-MS fragments C2-5 and C1-2). As shown in **Figure 5B**, over 25% of the C2-5 fragment has one <sup>13</sup>C-carbon, presumably on the δ-carboxylic group of glutamate. This is consistent with a Re-type citrate synthase. We then checked the labeling pattern of glutamate C1-2. Interestingly, 28% of the glutamate C1-2 fragment also has one <sup>13</sup>C-carbon, suggesting that the first carboxyl group of glutamate was labeled as well. This labeling pattern is consistent with the notion that C. thermocellum also carries an active Si-citrate synthase. To confirm this observation, we examined the genome of C. thermocellum DSM 1313 and found that the gene for Si-citrate synthase (CLO1313\_RS02945) is indeed present. It has 67% sequence identity with a known clostridial Si-citrate synthase (Li et al., 2007). For Re-citrate synthase, we performed a BLAST analysis using the sequence of a biochemically verified gene from C. kluyveri DSM 555 (Li et al., 2007). A candidate gene (CLO1313\_RS03665) shares 61% identity with the C. kluyveri Re-citrate synthase, and a neighboring gene (CLO1313\_RS03670) shares 73% homology with aconitase, strongly indicating their joint function in the TCA cycle in converting acetyl-CoA and oxaloacetate to isocitrate.

FIGURE 2 | Calculation of flux ratio: serine from the EMP pathway (f). Upon [1-13C] glucose labeling, serine molecules originating from the EMP pathway will be half labeled at position 3 (m1: 0.5) and half unlabeled (m0: 0.5) (shown in blue). If the serine molecules originate from other pathways, none of them will be labeled (m0: 1, shown in black). The final MDV of serine (shown in orange) is the combined result of these two possibilities, and the splitting ratio (f) can be determined quantitatively. The algorithm is based on Nanchen et al. (2007).

1-13C-glucose. GAP, Glyceraldehyde 3-phosphate; OPP pathway, Oxidative Pentose Phosphate pathway.

Integrating all the acquired knowledge, our data provide, for the first time, labeling evidence that C. thermocellum carries both Re- and Si-citrate synthases to initiate the TCA cycle.

pyruvate labeling at carboxylic group (C1). If fluxes go through the EMP pathway, pyruvate can be labeled at the C3 position. According to labeling pattern of alanine, the direct product of pyruvate, there is no significant labeling at C1 of pyruvate, indicating negligible activity of the ED pathway. KDPG, 2-keto-3-deoxygluconate-6-phosphate; FBP, Fructose 1,6-bisphosphate; DHAP, Dihydroxyacetone phosphate; GAP, glyceraldehyde 3-phosphate; Pyr, pyruvate.

#### The TCA Cycle of *C. thermocellum* Is Incomplete

It is generally accepted that most obligate anaerobes do not harbor a complete oxidative TCA cycle, as energy can be acquired mostly from anaerobic fermentation. Recent studies, however, showed that a few model anaerobic bacteria such as Proteus mirabilis (Alteri et al., 2012) and C. acetobutylicum (Amador-Noguez et al., 2010) harbor a complete TCA cycle, even though genome annotation suggests the absence of certain TCA cycle enzymes. It is therefore worthwhile to decipher the structure of the TCA cycle in C. thermocellum specifically. First, we adopted flux ratio analysis (Nanchen et al., 2007) to probe the completeness of the TCA cycle. Using [U-13C6] glucose experiments, we quantified the fraction of oxaloacetate originating from 2-oxoglutarate via the TCA cycle (f) vs. the fraction of oxaloacetate derived through anaplerotic reactions, e.g., by phosphoenolpyruvate (PEP) carboxylase (1-f) (for the algorithm, see **Figure 6A** and Nanchen et al., 2007). Our calculation returned f ≈ 0 as the least-squares solution for the relative flux from the oxidative TCA cycle. This result indicates that oxaloacetate is synthesized mainly by the anaplerotic reactions, whereas flux barely reaches oxaloacetate through the TCA cycle in the oxidative direction.
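A least-squares flux ratio of this kind can be sketched for a single mixing fraction. The example assumes the observed oxaloacetate MDV is a linear mix of the MDVs expected from the two routes; the function name and all MDV values are hypothetical, and the actual algorithm (Figure 6A; Nanchen et al., 2007) derives the source MDVs from measured precursor fragments.

```python
def fit_mixing_fraction(observed, source_a, source_b):
    """Least-squares f in observed ≈ f * source_a + (1 - f) * source_b.

    The closed-form solution is clipped to the physically meaningful
    range [0, 1].
    """
    if not (len(observed) == len(source_a) == len(source_b)):
        raise ValueError("all MDVs must have the same length")
    diff = [a - b for a, b in zip(source_a, source_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("identical source MDVs; f is not identifiable")
    numer = sum(d * (y - b) for d, y, b in zip(diff, observed, source_b))
    return min(1.0, max(0.0, numer / denom))


# Hypothetical MDVs (m0..m4) of oxaloacetate expected from each route.
oaa_via_tca = [0.55, 0.10, 0.20, 0.10, 0.05]
oaa_via_anaplerosis = [0.65, 0.05, 0.05, 0.20, 0.05]
observed = [0.65, 0.05, 0.05, 0.20, 0.05]  # matches the anaplerotic pattern
print(fit_mixing_fraction(observed, oaa_via_tca, oaa_via_anaplerosis))  # prints 0.0
```

The estimate is only as good as the contrast between the two source patterns: when both routes would produce the same MDV, the ratio cannot be identified, which the sketch reports as an error.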

To further validate this calculation, we performed an additional pulse labeling experiment using [U-13C5] glutamate as the isotope tracer. After 12 h of labeling, we analyzed proteinogenic glutamate and aspartate, both of which are produced from the TCA cycle (**Figure 6B**). Nearly 4% of the final glutamate pool was fully labeled, suggesting that the <sup>13</sup>C-tracer was incorporated into the metabolic pathway. In comparison, no fully labeled aspartate was detectable during the labeling period. This indicates that the <sup>13</sup>C carbons entering the 2-oxoglutarate branch cannot be propagated to oxaloacetate, the precursor of aspartate. Indeed, this fluxomic evidence is consistent with the genomic data: the absence of a few key enzymes, including 2-oxoglutarate decarboxylase, fumarate reductase, and fumarase, prevents C. thermocellum from forming a complete TCA cycle.

FIGURE 5 | The stereospecificity of citrate synthase revealed by a <sup>13</sup>C-bicarbonate labeling experiment (A) using GC-MS fragments Glu2-5 and Glu1-2 as the readouts (B). Re-citrate synthase results in labeling at C5, consistent with a one-carbon labeled Glu2-5 fragment. Si-citrate synthase activity results in labeling at the C1 position and leads to a one-carbon labeled Glu1-2 fragment.

#### Quantitative Flux Mapping

In vivo biochemical reaction rates are among the most important metabolic phenotypes. To obtain quantitative insight into the fluxes in C. thermocellum, we developed a metabolic model that describes the isotope labeling of metabolites upon the addition of 20% uniformly labeled <sup>13</sup>C-glucose and 80% unlabeled glucose (see section Materials and Methods, **Figure S5**, and **Supplementary File 1** for modeling details). The biochemical reactions in the model are based on genomic information and include the newly validated pathways identified by the isotope tracer experiments described above. Specifically, the principal metabolic network includes the EMP pathway, anaplerotic reactions (malate shunt), the citramalate pathway for isoleucine biosynthesis, Re- and Si-citrate synthases, and the incomplete TCA cycle, all of which have been verified by isotope tracer experiments in this work. In addition, we excluded the oxidative pentose phosphate pathway and the ED pathway from central carbon metabolism, as both were shown above to be absent in C. thermocellum. The balancing of energetic cofactors such as NAD(P)H was also not included in the model, because whether a particular redox reaction involves NADH or NADPH needs to be validated experimentally, and any assumption about NAD(P)H-specific reactions would introduce uncertainty into the model. Experimental inputs to the model include steady-state labeling data from proteinogenic amino acids, specific growth rates, sugar uptake rates, excretion rates for fermentation products (see **Table S1**), and flux ratio data at certain branch points, e.g., the relative contribution of the EMP pathway vs. one-carbon metabolism to serine biosynthesis as described above, and the relative fluxes of the malate shunt and pyruvate phosphate dikinase to pyruvate, which were quantitatively addressed recently (Olson et al., 2017).

**Figure 7** shows representative fitting results and a map of the estimated flux values in central metabolism. The complete results are presented in **Supplementary File 1** and the goodness of fit in **Figure S5**. The metabolic model fits the observed data. Most of the estimated fluxes were tightly constrained, indicating that they reliably reflect the available experimental data. The results show that glycolytic flux predominates and is directed primarily toward the fermentation products lactate, formate, acetate, and ethanol. Other notable fluxes include aspartate production via the malate shunt and pentose phosphate production from glycolytic intermediates. Within the incomplete TCA cycle, the flux through the oxidative branch is limited. The production of glutamate from citrate can occur via Re-citrate synthase but is also expected to occur via Si-citrate synthase. Compared with the flux maps of other Clostridia, e.g., C. acetobutylicum (Amador-Noguez et al., 2010), the computational results presented here reveal a flux distribution that is distinctive to C. thermocellum.
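Because the flux map is reported relative to a glucose uptake of 100, a relative flux can be converted back to an absolute rate with the uptake value given in the Figure 7 caption (3.62 mmol/gDW/h). The sketch below shows that conversion; the function name and the example relative flux of 50 are hypothetical, and the reported uncertainty of the uptake rate is not propagated.

```python
GLUCOSE_UPTAKE = 3.62  # mmol/gDW/h, glucose monomer uptake from the Figure 7 caption


def to_absolute_flux(relative_flux, uptake=GLUCOSE_UPTAKE):
    """Convert a flux normalized to glucose uptake = 100 into mmol/gDW/h."""
    return relative_flux / 100.0 * uptake


# Hypothetical relative flux of 50 (half of the glucose uptake).
print(round(to_absolute_flux(50.0), 2))
```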

### DISCUSSION

Using <sup>13</sup>C-tracer studies, we have developed a quantitative flux model outlining the central carbon metabolism in C. thermocellum and elucidated the key pathways responsible for carbon flux distribution. This model could serve as a platform to probe cellular phenotypes under different growth conditions, to further refine and validate the model, and to guide genetic engineering strategies toward targeted products.

FIGURE 7 | Flux map profiled from <sup>13</sup>C-tracer experiment. Net fluxes are represented by proportionally-scaled arrow thickness and are normalized to a glucose monomer consumption rate of 100 (3.62 ± 0.18 mmol/gDW/h). Dotted arrows indicate fluxes toward biomass synthesis. Complete results including flux values with the 95% flux confidence interval are listed in Supplementary File 1. Abbreviations for metabolites: G6P, Glucose 6-phosphate; Ru5P, Ribulose 5-phosphate; RuBP, Ribulose bisphosphate; F6P, fructose 6-phosphate; R5P, Ribose 5-phosphate; FBP, Fructose bisphosphate; X5P, Xylulose 5-phosphate; E4P, Erythrose 4-phosphate; DHAP, Dihydroxyacetone phosphate; GAP, Glyceraldehyde-3-phosphate; SBP, Sedoheptulose bisphosphate; S7P, Sedoheptulose 7-phosphate; PGA, Phosphoglycerate; PEP, Phosphoenolpyruvate; PYR, pyruvate; AcCoA, Acetyl Coenzyme A; CIT, Citrate; ICT, Isocitrate; 2OG, 2-oxoglutarate; FUM, Fumarate; MAL, Malate; OAA, Oxaloacetate. Abbreviations for reactions: ACO, Aconitase; CS, citrate synthase; ENO, Enolase; FBA, Fructose bisphosphate aldolase; FUS, Fumarase; ICTDH, Isocitrate dehydrogenase; MDH, Malate dehydrogenase; ME, malic enzyme; PDH, pyruvate dehydrogenase; PEPC, (Continued)

Versatile glycolytic pathways, including the EMP pathway (Amador-Noguez et al., 2010), the ED pathway (Bender and Gottschalk, 1973), and the phosphoketolase pathway (Liu et al., 2012), have been found in Clostridia. These glycolytic pathways vary in their reaction schemes and in how much ATP and NAD(P)H they produce per glucose metabolized. In C. thermocellum, however, we identified only the EMP pathway as the dominant one. Despite the varied energetic and carbon requirements of the cell, C. thermocellum uses the EMP pathway exclusively, representing a typical metabolic rigidity. This implies that C. thermocellum may use alternative strategies to modulate anaerobic energy and carbon demand. Indeed, the EMP pathway in C. thermocellum has many atypical features. For instance, the conversion of PEP to pyruvate features the malate shunt, which is believed to catalyze a transhydrogenase reaction converting NADH to NADPH and to generate GTP (Deng et al., 2013). The flux through the malate shunt is regulated by NH<sub>4</sub><sup>+</sup> and pyrophosphate (PPi), which may indicate the metabolic state of the cell (Taillefer et al., 2015). Additionally, a number of phosphorylating enzymes in the EMP pathway, including glucokinase and phosphoglycerate kinase, rely on GTP/GDP/PPi rather than ATP/ADP (Zhou et al., 2013). Outside of glycolysis, C. thermocellum contains numerous enzymes that interconvert reduced ferredoxin and NAD(P)<sup>+</sup> and influence energy metabolism, including ferredoxin:NAD<sup>+</sup> oxidoreductase (Rnf) (Müller et al., 2008), NADH-ferredoxin:NADP<sup>+</sup> oxidoreductase (Nfn) (Wang et al., 2010), and hydrogenases. Understanding the metabolic network as constrained by bioenergetic cofactors could help guide genetic strategies for strain redesign and optimization, ensuring redox and energy balance while maximizing the production of targeted products.

Our labeling results confirmed that C. thermocellum has activities for both Si- and Re-citrate synthase (**Figure 5**). A similar result was also observed in C. kluyveri (Li et al., 2007). Why Clostridia contain two stereotypes of citrate synthase remains an open question. Si- and Re-citrate synthase have no significant sequence homology (Li et al., 2007), and this long phylogenetic distance indicates that they may have originated independently. In addition, it should be noted that Re-citrate synthase is oxygen sensitive while Si-citrate synthase is not (Li et al., 2007). In view of the importance of citrate synthase in initiating the TCA cycle and in glutamate biosynthesis, the presence of two versions of the enzyme could have increased the evolutionary fitness of C. thermocellum, especially under varying O<sub>2</sub> levels.

Our isotope experiment also demonstrates an incomplete TCA cycle in the oxidative direction from 2-oxoglutarate to oxaloacetate. This observation prompts the question of whether the TCA cycle in C. thermocellum can operate in the reductive direction. In fact, the isotope data exclude this possibility. In the <sup>13</sup>C-bicarbonate labeling experiment, we detected two-carbon labeled oxaloacetate. If the TCA cycle operated reductively, three-carbon labeled glutamate should be observed, owing to the incorporation of one more <sup>13</sup>C into 2-oxoglutarate by 2-oxoglutarate synthase. However, this is not observed in the glutamate labeling pattern (**Figure 5**), indicating no activity of the reductive TCA (rTCA) cycle. The TCA cycle of C. thermocellum differs from that of C. acetobutylicum, which showed a complete, albeit bifurcated, structure (Amador-Noguez et al., 2010). Our findings strengthen the notion that the TCA cycle in anaerobes plays a major role in biosynthesis rather than bioenergetics, although the architecture of the cycle is species-specific.

The approach used in this study represents a standard and broadly applicable methodology for metabolic flux analysis using steady-state isotope tracer experiments. This steady-state isotopic approach differs from kinetic flux profiling, another promising fluxomics method (Yuan et al., 2008). One major merit of the steady-state isotopic approach is its ability to provide accurate flux ratios at branch points. Additionally, only a few data points are needed for the analysis, which meets the requirements of high-throughput and large-scale fluxomic analysis (Fischer et al., 2004; Fischer and Sauer, 2005). Other advantages include simple experimental procedures and a simple modeling algorithm. Nevertheless, the steady-state isotopic approach has its limitations: for example, it cannot discriminate the fluxes through multiple parallel pathways if they generate no distinguishable labeling patterns. In C. thermocellum, such a scenario occurs when deciphering the fluxes through pyruvate phosphate dikinase (ppdk) vs. the malate shunt, both of which synthesize pyruvate from PEP without forming a distinguishable carbon-carbon bond. In this case, kinetic flux profiling, which records the labeling trajectories of intermediates as a function of time, can provide a complementary solution (Olson et al., 2017). Joint and appropriate use of steady-state and kinetic fluxomics techniques will enable a comprehensive understanding of the metabolic network through rationally designed isotope tracer experiments.

In conclusion, this work represents the first global investigation of the central carbon metabolism in C. thermocellum and exemplifies the power of isotope tracer experiments and metabolic modeling for understanding microbial metabolism of industrial interest.

#### AUTHOR CONTRIBUTIONS

WX and PM led the research. WX designed the experiments. WX, KC, CW, LM, and TD performed the experiments. WX, JL, KC, and PM wrote the article.

#### ACKNOWLEDGMENTS

This work was supported by the US Department of Energy (DOE) Energy Efficiency and Renewable Energy (EERE) Fuel Cell Technologies Office under Contract DE-AC36-08-GO28308, and the NREL Laboratory Directed Research and Development (LDRD) project. We thank Chris Urban for his technical assistance. The authors declare that they have no conflicts of interest with the contents of this article.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmicb.2018.01947/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Xiong, Lo, Chou, Wu, Magnusson, Dong and Maness. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.