# SYNTHETIC BIOLOGY-GUIDED METABOLIC ENGINEERING

EDITED BY: Rodrigo Ledesma-Amaro, Pablo Ivan Nikel and Francesca Ceroni

PUBLISHED IN: Frontiers in Bioengineering and Biotechnology

#### Frontiers eBook Copyright Statement

The copyright in the text of individual articles in this eBook is the property of their respective authors or their respective institutions or funders. The copyright in graphics and images within each article may be subject to copyright of other parties. In both cases this is subject to a license granted to Frontiers. The compilation of articles constituting this eBook is the property of Frontiers.

Each article within this eBook, and the eBook itself, are published under the most recent version of the Creative Commons CC-BY licence. The version current at the date of publication of this eBook is CC-BY 4.0. If the CC-BY licence is updated, the licence granted by Frontiers is automatically updated to the new version.

When exercising any right under the CC-BY licence, Frontiers must be attributed as the original publisher of the article or eBook, as applicable.

Authors have the responsibility of ensuring that any graphics or other materials which are the property of others may be included in the CC-BY licence, but this should be checked before relying on the CC-BY licence to reproduce those materials. Any copyright notices relating to those materials must be complied with.

Copyright and source acknowledgement notices may not be removed and must be displayed in any copy, derivative work or partial copy which includes the elements in question.

All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For further information please read Frontiers' Conditions for Website Use and Copyright Statement, and the applicable CC-BY licence.

ISSN 1664-8714 ISBN 978-2-88963-708-9 DOI 10.3389/978-2-88963-708-9



Topic Editors:

Rodrigo Ledesma-Amaro, Imperial College London, United Kingdom

Pablo Ivan Nikel, Novo Nordisk Foundation Center for Biosustainability (DTU Biosustain), Denmark

Francesca Ceroni, Imperial College London, United Kingdom

Citation: Ledesma-Amaro, R., Nikel, P. I., Ceroni, F., eds. (2020). Synthetic Biology-Guided Metabolic Engineering. Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88963-708-9

# Table of Contents

*Editorial: Synthetic Biology-Guided Metabolic Engineering*

Rodrigo Ledesma-Amaro, Pablo I. Nikel and Francesca Ceroni

*Glucose-Dependent Promoters for Dynamic Regulation of Metabolic Pathways*

Jérôme Maury, Soumya Kannan, Niels B. Jensen, Fredrik K. Öberg, Kanchana R. Kildegaard, Jochen Forster, Jens Nielsen, Christopher T. Workman and Irina Borodina

*Dynamic Control of* ERG20 *and* ERG9 *Expression for Improved Casbene Production in* Saccharomyces cerevisiae

Roberta Callari, Yvan Meier, Davide Ravasio and Harald Heider

*Models for Cell-Free Synthetic Biology: Make Prototyping Easier, Better, and Faster*

Mathilde Koch, Jean-Loup Faulon and Olivier Borkowski

*How Synthetic Biology and Metabolic Engineering Can Boost the Generation of Artificial Blood Using Microbial Production Hosts*

August T. Frost, Irene H. Jacobsen, Andreas Worberg and José L. Martínez

*CopySwitch—*in vivo *Optimization of Gene Copy Numbers for Heterologous Gene Expression in* Bacillus subtilis

Florian Nadler, Felix Bracharz and Johannes Kabisch

*Gene-Expressing Liposomes as Synthetic Cells for Molecular Communication Studies*

Giordano Rampioni, Francesca D'Angelo, Livia Leoni and Pasquale Stano

*Improving Reproducibility in Synthetic Biology*

Mathew M Jessop-Fabre and Nikolaus Sonnenschein

*New Applications of Synthetic Biology Tools for Cyanobacterial Metabolic Engineering*

María Santos-Merino, Amit K. Singh and Daniel C. Ducat

*Metabolic Engineering and Synthetic Biology: Synergies, Future, and Challenges*

Raúl García-Granados, Jordy Alexis Lerma-Escalera and José R. Morones-Ramírez

*High-Performance Biocomputing in Synthetic Biology–Integrated Transcriptional and Metabolic Circuits*

Angel Goñi-Moreno and Pablo I. Nikel

*Biological Parts for* Kluyveromyces marxianus *Synthetic Biology*

Arun S. Rajkumar, Javier A. Varela, Hannes Juergens, Jean-Marc G. Daran and John P. Morrissey

*Production of 3-Hydroxypropanoic Acid From Glycerol by Metabolically Engineered Bacteria*

Carsten Jers, Aida Kalantari, Abhroop Garg and Ivan Mijakovic

*Build Your Bioprocess on a Solid Strain—β-Carotene Production in Recombinant* Saccharomyces cerevisiae

Javiera López, Vicente F. Cataldo, Manuel Peña, Pedro A. Saa, Francisco Saitua, Maximiliano Ibaceta and Eduardo Agosin

# Editorial: Synthetic Biology-Guided Metabolic Engineering

#### Rodrigo Ledesma-Amaro<sup>1,2</sup>\*, Pablo I. Nikel<sup>3</sup>\* and Francesca Ceroni<sup>2,4</sup>\*

<sup>1</sup> Department of Bioengineering, Imperial College London, London, United Kingdom, <sup>2</sup> Imperial College Centre for Synthetic Biology, Imperial College London, London, United Kingdom, <sup>3</sup> The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, Denmark, <sup>4</sup> Department of Chemical Engineering, Imperial College London, London, United Kingdom

Keywords: synthetic biology, metabolic engineering, genetic constructs, bioproduction, TX-TL

#### **Editorial on the Research Topic**

#### **Synthetic Biology-Guided Metabolic Engineering**

Synthetic biology can now be considered a mature discipline. A growing number of diverse synthetic gene constructs and circuits have been designed in a wide range of organisms, with a profound impact on different fields. This Research Topic features and reviews some of the latest progress in Synthetic Biology applications and improvements within the Metabolic Engineering portfolio, covering different aspects of the domain (e.g., bioinformatics, design of synthetic pathways, and implementation of multi-omics approaches).

#### Edited and reviewed by:

Jean Marie François, Institut Biotechnologique de Toulouse, France

#### \*Correspondence:

Rodrigo Ledesma-Amaro r.ledesma-amaro@imperial.ac.uk Pablo I. Nikel pabnik@biosustain.dtu.dk Francesca Ceroni f.ceroni@imperial.ac.uk

#### Specialty section:

This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology

Received: 21 January 2020 Accepted: 04 March 2020 Published: 20 March 2020

#### Citation:

Ledesma-Amaro R, Nikel PI and Ceroni F (2020) Editorial: Synthetic Biology-Guided Metabolic Engineering. Front. Bioeng. Biotechnol. 8:221. doi: 10.3389/fbioe.2020.00221

Among the successes of Synthetic Biology and Metabolic Engineering are smarter construct design and higher yields of valuable chemicals. As one example, Callari et al. engineered Saccharomyces cerevisiae to produce the diterpene casbene, a precursor of many terpenoids of medical interest. The authors increased casbene titers by expressing heterologous enzymes that boost internalization and conversion of precursors; dynamic control of inducible promoters also helped maximize the pathway flux toward casbene production. Another example is 3-hydroxypropanoic acid (3-HP), a valuable product used in the bioproduction of several other chemicals, including bioplastics. Maury et al. achieved high product titers in batch cultivations using engineered S. cerevisiae. First, the authors characterized a series of 34 native promoters that respond to glucose. By placing the 3-HP pathway under the control of a promoter active in the absence of glucose, they then decoupled biomass formation from compound production, maximizing product formation. Jers et al. reviewed the state of the art in 3-HP production via Metabolic Engineering. The review highlights that major improvements have been achieved but that efforts are still needed to reach a scalable system, relying mainly on better biochemical characterization of the synthesis pathways and bioprocessing conditions. In addition, Frost et al. present an important case of cooperation between Synthetic Biology and Metabolic Engineering approaches in the recombinant production of hemoglobin. The authors summarize the state of the art and the challenges that this technology is facing while suggesting novel routes to improve the promising yeast-based production of artificial blood.

Despite these successes, many organisms that could become valuable hosts for the production of important chemicals still lack properly standardized bio-parts for their easier engineering. A good example in this respect comes from Rajkumar et al. The authors developed a toolkit of standardized genetic parts, namely promoters and terminators, for the genetic engineering of a non-conventional yeast with high potential, Kluyveromyces marxianus. Parts were isolated from the genome, "domesticated" for Golden Gate assembly and then cloned for characterization under different conditions. Another example is presented by Santos-Merino et al., who summarize the latest efforts in the development of novel biological parts for engineering cyanobacteria.

One of the reasons why we still lack parts and tools for more rapid engineering of many organisms is our limited knowledge of them. Together with collecting new information, we need progress on how to handle and use it. García-Granados et al. suggest that in the current era of "-omics" approaches, mathematical modeling and biocomputing could serve as essential tools to coordinate and predict information, providing a deeper and more accessible understanding that we could use for engineering target hosts. In this respect, Goñi-Moreno and Nikel advocate the in silico conceptualization of metabolism and its motifs as the way forward to achieve whole-cell biocomputations. The authors argue that the design of merged transcriptional and metabolic circuits will not only increase the amount and type of information being processed by a given synthetic construct, but will also provide fundamental control mechanisms for increased reliability of synthetic implants.

Handling information is challenging, and so is making sure that this information is accurate and that the data we collect are robust. Jessop-Fabre and Sonnenschein give an overview of the efforts currently in place toward a more reproducible science. Pointing to newly developed bio-foundries and cloud laboratories, together with new efforts toward the standardization of protocols and results, the authors suggest where Synthetic Biology should continue to invest to achieve more reliable data acquisition. One example comes from Nadler et al. Here, the authors develop a novel system, named CopySwitch, that allows rapid transformation and copy number modulation of any genetic construct in Bacillus subtilis more easily than was previously possible.

Again, López et al. considered two different S. cerevisiae strains engineered for β-carotene production, comparing their growth in shake flasks and in bench-scale fed-batch fermentation (as a proxy for industrial bioprocessing conditions). Surprisingly, the two strains showed opposite behavior, highlighting the need for a proper understanding of how engineered systems behave in the final, industrially relevant settings.

An important source of information and characterization is represented by transcription and translation (TX-TL) systems. In their contribution, Koch et al. describe how TX-TL systems offer powerful tools for the prototyping of genetic constructs and for understanding network behavior outside the cellular context. This could be of great importance in Metabolic Engineering for modeling and predicting the behavior of entire pathways so as to optimize them prior to in vivo engineering. However, preparation of TX-TL extracts is still challenging, and automation tools could help improve reproducibility. The combination of TX-TL technology and liposome technology has recently gained momentum, and many successful examples in this emerging field are based on genetic constructs that can be expressed in a "simplified cell" able to interact and communicate with living cells. On this premise, Rampioni et al. present the latest advances in synthetic cell development. The possible applications of such a technology are many, as are the opportunities to use it for better understanding biological systems.

In all, the works collected in this Research Topic show that we are experiencing critical times for Synthetic Biology-guided Metabolic Engineering, where a deeper and better understanding of the complexity of biological systems enables more predictable designs. Walking this pathway will significantly improve the applicability of first-design principles for living organisms toward reliable cell factory design, construction and testing.

## AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Ledesma-Amaro, Nikel and Ceroni. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Glucose-Dependent Promoters for Dynamic Regulation of Metabolic Pathways

Jérôme Maury<sup>1</sup>, Soumya Kannan<sup>2</sup>, Niels B. Jensen<sup>1†</sup>, Fredrik K. Öberg<sup>1†</sup>, Kanchana R. Kildegaard<sup>1</sup>, Jochen Forster<sup>1†</sup>, Jens Nielsen<sup>1</sup>, Christopher T. Workman<sup>2</sup>\* and Irina Borodina<sup>1</sup>\*

<sup>1</sup> Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, Denmark, <sup>2</sup> Department of Biotechnology and Biomedicine, Technical University of Denmark, Lyngby, Denmark

*Edited by:*

Rodrigo Ledesma-Amaro, Imperial College London, United Kingdom

#### *Reviewed by:*

Jian-Zhong Liu, Sun Yat-sen University, China Jean Marie François, UMR5504 Laboratoire d'Ingénierie des Systèmes Biologiques et des Procédés (LISBP), France

#### *\*Correspondence:*

Christopher T. Workman cwor@dtu.dk Irina Borodina irbo@biosustain.dtu.dk

#### *†Present Address:*

Niels B. Jensen, Evolva Biotech A/S, Copenhagen, Denmark Fredrik K. Öberg, Novo Nordisk A/S, Målov, Denmark Jochen Förster, Carlsberg A/S, Copenhagen, Denmark

#### *Specialty section:*

This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology

*Received:* 23 February 2018 *Accepted:* 30 April 2018 *Published:* 22 May 2018

#### *Citation:*

Maury J, Kannan S, Jensen NB, Öberg FK, Kildegaard KR, Forster J, Nielsen J, Workman CT and Borodina I (2018) Glucose-Dependent Promoters for Dynamic Regulation of Metabolic Pathways. Front. Bioeng. Biotechnol. 6:63. doi: 10.3389/fbioe.2018.00063

For an industrial fermentation process, it can be advantageous to decouple cell growth from product formation. This decoupling would allow for the rapid accumulation of biomass without inhibition from product formation, after which the fermentation can be switched to a mode where cells grow minimally and primarily act as catalysts to convert substrate into the desired product. The switch in fermentation mode should preferably be accomplished without the addition of expensive inducers. A common cell factory, Saccharomyces cerevisiae, is a Crabtree-positive yeast and is typically fermented at industrial scale under glucose-limited conditions to avoid the formation of ethanol. In this work, we aimed to identify and characterize promoters that depend on glucose concentration for use as dynamic control elements. Through analysis of mRNA data of S. cerevisiae grown in chemostats under glucose excess or limitation, we identified 34 candidate promoters that strongly responded to glucose presence or absence. These promoters were characterized in small-scale batch and fed-batch cultivations using a quickly maturing, rapidly degrading green fluorescent protein, yEGFP3-Cln2PEST, as a reporter. Expressing the 3-hydroxypropionic acid (3HP) pathway from a set of selected regulated promoters allowed for suppression of 3HP production during the glucose-excess phase of a batch cultivation with subsequent activation in glucose-limiting conditions. Regulating the 3HP pathway by the ICL1 promoter resulted in a 70% improvement of 3HP titer in comparison to the PGK1 promoter.

Keywords: inducible promoters, dynamic regulation, yeast *Saccharomyces cerevisiae*, strain engineering, 3-hydroxypropionic acid

# INTRODUCTION

Budding yeast, Saccharomyces cerevisiae, is widely used for the production of fuels, chemicals, food ingredients, food and beverages, and pharmaceuticals (Nielsen and Jewett, 2008; Hong and Nielsen, 2012). S. cerevisiae is further being exploited for production of many new products, e.g., farnesene, butanol, resveratrol or melatonin (Hong and Nielsen, 2012; Shin et al., 2012; Germann et al., 2016; Meadows et al., 2016) and many more are under development. The enabling technology in the development of yeast cell factories is metabolic engineering, which is the introduction of directed genetic modifications with the objective to improve the performance of the cell factory (Nielsen and Jewett, 2008). Development of cell factories that can meet the high requirements for yield, titer and productivity, generally requires multiple rounds of metabolic engineering. In connection with implementation of cell factories for industrial production, it is necessary to scale-up the process, and the final process will typically involve subsequent rounds of cultivation at different conditions (**Figure 1**). A typical process could comprise a large-scale main cultivation in a so-called continuous or fed-batch mode in which at least one substrate is growth limiting, such as in the resveratrol process (Shin et al., 2012). Usually, a number of serial cultivations in batch mode precede the main cultivation step in order to propagate the inoculum. In principle, serial cultivation steps aim at building up biomass while for the final-stage cultivation it is desirable to re-direct the metabolic fluxes such that the product of interest is formed at high yield and rate (Borodina and Nielsen, 2014) (**Figure 1**). 
Constraints are therefore very different between the seed train, which prioritizes strain stability, efficient biomass production, process convenience and robustness, and the main cultivation step, often referred to as the "production phase," where maximal yield and product formation rate are ultimate goals. The bioprocess is therefore imposing a number of constraints on strain design that need to be taken into account early in process development.

In order to better match expression of metabolic pathways to changing conditions in the cells or in the fermentation process, a number of research groups have recently developed strategies for dynamic regulation. As opposed to static up- or down-regulations, dynamic regulation allows for re-balancing fluxes along with changing conditions. Dynamic gene expression profiles allow trade-offs between growth and production to be better managed and can help avoid build-up of undesired intermediates (Brockman and Prather, 2015b). In their review, Brockman and Prather (2015b) provide experimental demonstration of the benefits arising from dynamic control on production of a number of chemicals. Farmer and Liao improved the yield of lycopene production by 18-fold in E. coli by implementing a sensor responding to the build-up of acetylphosphate (Farmer and Liao, 2000). By modulating protein levels of glucokinase via a genetic inverter, Solomon et al. redirected glucose into gluconate production and improved titers by 30% (Solomon et al., 2012). Using a metabolic toggle switch in E. coli to conditionally downregulate citrate synthase expression, Soma et al. redirected acetyl-CoA into isopropanol production with an improvement on titer and yield of more than 3-fold (Soma et al., 2014). Dahl et al. used whole-genome transcript arrays to identify promoters that respond to the accumulation of a toxic intermediate of the isoprenoid biosynthetic pathway, i.e., farnesyl di-phosphate (FPP), and used the identified promoters to regulate FPP formation and improve amorphadiene production two-fold over constitutive or inducible promoters (Dahl et al., 2013). Lo et al. presented a synthetic gene circuit that decouples E. coli cell growth from metabolite production through a two-layered circuit based on two genetic sensor controller modules: one sensing feedstock substrate, the other one sensing nutrients (Lo et al., 2016). By delaying enzyme expression until cells have depleted key nutrients and attained high cell densities, the system positively impacted conversion of hydroxycinnamic acids and oleic acid into value-added compounds (Lo et al., 2016). Conditional degradation of essential enzymes has also been exploited to stop elongation of fatty acids and improve octanoate production (Torella et al., 2013), or to increase yield and titer of myo-inositol production (Brockman and Prather, 2015a). Conditional degradation of enzymes is an efficient method for rapidly depleting an enzyme of interest even at slow growth rates where removal via dilution is slow (Brockman and Prather, 2015b).

Dynamic regulation has also been employed in a number of cases in budding yeast. On/off regulation systems making use of native repressible promoters, such as the MET3 promoter, which responds to the concentration of methionine, or the HXT1 promoter, which reacts to glucose concentration, were used to redirect the flux in the ergosterol biosynthetic pathway and improve production of isoprenoids of commercial interest (Ro et al., 2006; Asadollahi et al., 2008; Scalcinati et al., 2012). David et al. demonstrated the benefits of hierarchical dynamic control on 3-hydroxypropionic acid (3HP) production (David et al., 2016). Here, a two-layered control with a growth stage control, based on a carbon source-responsive promoter, and an intracellular metabolite concentration control, based on a malonyl-CoA sensor, allowed for a sufficient build-up of biomass in the initial growth phase while gradually redirecting metabolism toward 3HP production in the production phase (David et al., 2016). David et al. report a 10-fold increase in 3HP production (David et al., 2016). More recently, dynamic control was used to establish the production of very long chain fatty acid-derived (VLCFA) compounds in S. cerevisiae. Production suffered from impaired biomass formation caused by deprivation of essential precursors C22-CoA (Yu et al., 2017). Dynamic control, based on carbon source-regulated promoters, was used to relieve the competition between VLCFA product formation and cell growth-associated processes. This strategy divided the process into a cell growth phase and a production phase and resulted in an almost 4-fold increase in docosanol production. After further optimization, a titer of 83.5 mg/L was achieved, which represents an 80-fold improvement compared to the control strain (Yu et al., 2017).

The aim of our study is to increase the number of genetic elements that can be used by metabolic engineers for dynamic control of metabolic pathways. As industrial fermentation processes often comprise different phases (**Figure 1**), we set out to identify and characterize a set of native yeast promoters that can be used to control expression of metabolic pathways in a dynamic manner and that can positively impact production of value-added chemicals in an industrially relevant context. For our tools to be potentially applicable to various kinds of industrial production processes, from bulk to high value-added products, we decided to use elements already present in the cultivation medium to drive transient expression of genes. Glucose, among others, is a nutrient often used as a carbon source. Furthermore, glucose concentrations differ greatly between the seed train, usually operated in batch mode, and the final-stage fermentation, where glucose is often fed in limiting amounts to avoid overflow metabolism. Glucose was thus chosen as the element that triggers the dynamic regulation of gene expression in our system.

Through analysis of genome-wide transcription datasets retrieved from the Gene Expression Omnibus (GEO) database (GEO database), a number of promoters were identified, cloned upstream of a quickly maturing, rapidly degrading green fluorescent protein, yEGFP3-Cln2PEST, and integrated into the S. cerevisiae genome. Promoter activity was then characterized in microscale batch and fed-batch conditions. As a proof of concept, the most promising promoters for transient gene expression were successfully applied to control the production of 3-hydroxypropionic acid.

# MATERIALS AND METHODS

#### Analysis of Transcriptome Data

Transcription data were retrieved from the GEO database, using the R package GEOquery (Davis and Meltzer, 2007), for a microarray study comparing different nutrient-limited chemostat cultivations of CEN.PK113-7D (accession GDS777) (Tai et al., 2005). The summarized expression values provided by the authors were log2-transformed before applying a linear model (LIMMA) to identify differentially expressed genes between the glucose-limited experiments and the other nutrient-limited experiments for the aerobic series of chemostat cultivations. The differential expression profiles (log2 fold change of glucose-limited vs. others) for the HXT1, TEF1, ADH1, and MAL12 genes were used to find other correlated genes using the Pearson correlation coefficient and a test for significance.
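The correlation step described above can be illustrated with a minimal Python sketch. It is not the authors' R pipeline (GEOquery and LIMMA are not reproduced, and the significance test is omitted); the function names, gene labels, and numeric profiles are hypothetical placeholders chosen only to show how genes could be ranked by Pearson correlation with a reference log2 fold-change profile.

```python
import math


def log2_transform(values):
    """Return log2 of each strictly positive expression value."""
    return [math.log2(v) for v in values]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def rank_by_correlation(reference, profiles):
    """Sort (gene, r) pairs by descending correlation with the reference."""
    scores = {gene: pearson(reference, prof) for gene, prof in profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical log2 fold-change profiles (glucose-limited vs. each other
# nutrient limitation). The numbers are invented for illustration only.
reference = [-3.0, -2.5, -2.8]      # stands in for a reference gene profile
profiles = {
    "geneA": [-2.9, -2.4, -2.9],    # follows the reference pattern
    "geneB": [1.0, 0.5, 0.9],       # opposite pattern
}
ranked = rank_by_correlation(reference, profiles)
for gene, r in ranked:
    print(f"{gene}: r = {r:.3f}")
```

In practice, each candidate's correlation would also be tested for significance, as stated in the text, before the corresponding promoter is selected.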

#### Cloning of the Selected Promoters Upstream of a Reporter Gene: yEGFP3-Cln2PEST and Integration Into *S. cerevisiae* Genome

Primers for PCR amplification and uracil excision-mediated cloning of the different promoters into the integrative vector pCFB0125 were designed and are listed in Table S1. As the length of promoter regions is not always well defined, the promoter sequence length was either defined based on previous reports or as starting approximately 1,000 bp upstream of the start codon (ATG) when no reference in the literature could be identified. In order to closely follow promoter activity, promoters were cloned upstream of a gene encoding a quickly maturing and rapidly degrading green fluorescent protein, yEGFP3-Cln2PEST (Mateus and Avery, 2000). The yEGFP3-Cln2PEST reporter gene was amplified from pCFB0058 (pFA6a-yEGFP3-CLN2-PESTnatMX6, EUROSCARF) using the following primers: uGFP\_fw (ATCTGTCAU AAA ACA ATGTCTAAAGGTGAA GAATTATTC) and uGFP\_rv (CACGCGAUTCATATTACTTGG GTATTGCCC). Uracil excision was used as the cloning method to combine the different promoter-yEGFP3-Cln2PEST expression cassettes onto pCFB0125. pCFB0125 is a pESC-URA-ccdB vector devoid of its 2-micron replication origin, which makes it a non-replicative vector that will integrate at the URA3 locus upon linearization. Standard uracil excision cloning conditions were used, as described previously (Jensen et al., 2014). After verification of successful cloning, each vector was linearized by digestion with the restriction enzyme StuI (FastDigest, Thermo Scientific), or with EcoRV (FastDigest, Thermo Scientific) in the case of the vector bearing promoter pMAL32. The linearized vectors were transformed into S. cerevisiae CEN.PK 113-5D. Transformants were selected on agar plates containing SC-Ura medium and were used for cultivation experiments.
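The default promoter definition used above (roughly 1,000 bp upstream of the ATG when no literature reference exists) can be sketched in Python as follows. This is an illustrative assumption, not the authors' procedure: the function names, the 0-based coordinate convention, and the toy sequences are hypothetical, and real promoter boundaries would be taken from an annotated genome.

```python
# Translation table for complementing DNA bases (N is kept as N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def upstream_region(chromosome, codon_left, strand="+", length=1000):
    """Return up to `length` bp immediately upstream of a start codon.

    chromosome : forward-strand sequence of the chromosome
    codon_left : 0-based forward-strand index of the leftmost base of the
                 start codon ('A' of ATG on '+', 'C' of CAT on '-')
    strand     : '+' or '-' (strand on which the gene is encoded)

    The result is written 5'->3' relative to the gene, so it ends right
    before the ATG. It is shorter than `length` near a chromosome end.
    """
    if strand == "+":
        start = max(0, codon_left - length)
        return chromosome[start:codon_left].upper()
    if strand == "-":
        right = codon_left + 3  # first base to the right of the codon
        return reverse_complement(chromosome[right:right + length])
    raise ValueError("strand must be '+' or '-'")


# Toy sequences (hypothetical): ATG at index 6 on the forward strand,
# and a reverse-strand gene whose start codon appears as CAT at index 3.
forward_example = "GGGCCCATGAAA"
reverse_example = "TTTCATGGGAAC"
print(upstream_region(forward_example, 6, "+", length=4))
print(upstream_region(reverse_example, 3, "-", length=4))
```

Clamping at the chromosome start and slicing past the end keep the function safe for genes near a chromosome edge, where fewer than 1,000 bp are available.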

## Construction of 3-hydroxypropionic Acid *S. cerevisiae* Producing Strains

All strains from this study were derived from S. cerevisiae CEN.PK102-5B MATa ura3-52 his3Δ1 leu2-3/112 MAL2-8c SUC2 obtained from Peter Kötter (Johann Wolfgang Goethe-University Frankfurt, Germany). This strain was further modified by integration of single-copy integrative vectors pCFB0380 (PTEF1-SeACS\_L641P; PPGK1-ALD6; KlLEU2) and pCFB0382 (PTEF1-PDC1; SpHIS5). The latter strain is the parent strain for the 3HP-producing strains of this study. Four plasmids were constructed based on pCFB0343 (PTEF1-ACC1\*\*; PPGK1-CaMCR; KlURA3; insertion site Easyclone X-2). More details about the Easyclone plasmid set and integration sites can be found in Jensen et al. (2014). The backbone vector fragment was prepared for USER cloning by PCR using primers vec\_open\_mid\_ptrTEF1-PGK1 and vec\_open\_CaMCR and template pCFB0343. Three promoter-containing fragments were prepared by PCR using S. cerevisiae genomic DNA as template and primers pHXT7\_switch and pHXT7\_rev for promoter pHXT7, pADH2\_switch and pADH2\_rev for promoter pADH2, and pICL1\_switch and pICL1\_rev for promoter pICL1. Primer sequences can be found in **Table 1**.

By combining, under standard uracil excision cloning conditions (Jensen et al., 2014), the backbone vector prepared as described above with insert PHXT7, PADH2 or PICL1, one obtains three single integrative vectors: pCFB0728 (PTEF1-ACC1\*\*; PHXT7-CaMCR; insertion site Easyclone X-2), pCFB0727 (PTEF1-ACC1\*\*; PADH2-CaMCR; insertion site Easyclone X-2) and pCFB0729 (PTEF1-ACC1\*\*; PICL1-CaMCR; insertion site Easyclone X-2), respectively. After transforming strain S. cerevisiae CEN.PK102-5B MATa ura3-52 his3Δ1 leu2-3/112 MAL2-8c SUC2 X-3::PTEF1-SeACS\_L641P; PPGK1-ALD6; KlLEU2 X-4::PTEF1-PDC1; SpHIS5 with integrative vector pCFB0343, pCFB0728, pCFB0727, or pCFB0729, four strains are obtained that bear pPGK1-CaMCR, pHXT7-CaMCR, pADH2-CaMCR or pICL1-CaMCR, respectively, at integration site X-2.

## Cultivation

Strains bearing the different promoter-yEGFP3-Cln2PEST expression cassettes were cultivated in two different conditions: (1) in batch mode with mineral medium containing glucose as sole carbon source (Jensen et al., 2014); (2) in fed-batch mode with synthetic fed-batch medium for S. cerevisiae M-Sc.syn-1000 purchased from m2p-labs GmbH (Baesweiler, Germany). M-Sc.syn-1000 medium was supplemented with the supplied vitamins solution (final 1% v/v) and the enzyme mix (final concentration 0.5% v/v) immediately prior to use.

Mineral medium contained (L−<sup>1</sup>): 7.5 g (NH4)2SO4, 14.4 g KH2PO4, 0.5 g MgSO4·7H2O, 22 g dextrose, 2 mL trace metals solution, and 1 mL vitamin solution. The pH of the medium was adjusted to 6 prior to autoclaving. The vitamin solution and the trace metals solution were added to the medium after autoclaving. The trace metals solution contained (L−<sup>1</sup>): 4.5 g CaCl2·2H2O, 4.5 g ZnSO4·7H2O, 3 g FeSO4·7H2O, 1 g H3BO3, 1 g MnCl2·4H2O, 0.4 g Na2MoO4·2H2O, 0.3 g CoCl2·6H2O, 0.1 g CuSO4·5H2O, 0.1 g KI, 15 g EDTA. It was prepared by dissolving all the components except EDTA in 900 mL ultrapure water at pH 6. The solution was then gently heated and EDTA was added. Finally, the pH was adjusted to 4, the volume was adjusted to 1 L, and the solution was autoclaved (121°C for 20 min) and stored at +4°C. The vitamin solution contained (L−<sup>1</sup>): 50 mg biotin, 200 mg p-aminobenzoic acid, 1 g nicotinic acid, 1 g Ca-pantothenate, 1 g pyridoxine-HCl, 1 g thiamine-HCl, 25 g myo-inositol. Biotin was dissolved in 20 mL 0.1 M NaOH and 900 mL water was added. The pH was adjusted to 6.5 with HCl and the rest of the vitamins were added. The pH was re-adjusted to 6.5 just before and after adding myo-inositol. The final volume was adjusted to 1 L, and the solution was sterile-filtered and stored at +4°C.

The compositions of M-Sc.syn-1000, the vitamin solution, and the enzyme mix are proprietary and were not disclosed to us.

The main difference between the mineral medium and the synthetic M-Sc.syn-1000 medium used here lies in the carbon source that each contains. In the mineral medium used for batch cultivations, glucose is present at a starting concentration of 20 g L−<sup>1</sup>, while the synthetic medium M-Sc.syn-1000, used for fed-batch, contains a solubilized glucose polymer which, due to the activity of glucose-releasing enzymes, allows for the slow release of glucose monomers and limits biomass production.

All cultivations were performed at 30°C with 1,000 rpm in the microbioreactor system BioLector (m2p-labs). Forty-eight cultivations were run in parallel in FlowerPlates (m2p-labs) with a working volume of 1.1 mL per well. Biomass growth (light scattering units, LSU) and fluorescence (relative fluorescence units, RFU) were monitored online approximately every 12 min.

#### Outlier Removal and Data Correction

Some cultivations were removed before further analysis due to data quality or failure of the cultivation to proceed normally.

## Background Correction and Alignment of Measurements

To correct the biomass measurements for background signal, we calculated the growth rates µ via two different methods and compared them for various possible backgrounds to determine the optimal background value. The first method is derived from Poulsen et al. (2003), and is known as the log of slope (LOS) method.

Poulsen et al. (2003) note that experimental data that can be described by an exponential function sometimes contains an offset, giving a function of the form

$$b = b_0 \cdot \exp(\mu t) + c \tag{1}$$

where b is biomass, t is time, µ is the growth rate, c is the background, and b<sub>0</sub> is the initial value of b at t = 0. To remove the background and calculate µ, take the derivative with respect to t and then the natural log to get

$$
\log\left(\frac{db}{dt}\right) = \log(b_0\mu) + \mu t \tag{2}
$$

Thus, the slope of any rectilinear section of the LOS plot will be the specific growth rate µ.

Here, we extend this past what is described by Poulsen et al. (2003) by taking the derivative of Equation (2) with respect to time and further simplifying with the chain rule to get:

$$
\mu = \frac{d}{dt} \left( \log \left( \frac{db}{dt} \right) \right) = \frac{d^2 b}{dt^2} \Big/ \frac{db}{dt} \tag{3}
$$

Thus, we see that if we have a function describing the data, assuming an underlying exponential dependence on t, dividing the second derivative of this function by the first derivative will give us µ. This expression for µ is not dependent on the magnitude of b and is thus independent of background.

The second method of calculating growth rate is simply the definition of a growth rate normalized to population size:

$$
\mu = \frac{db/dt}{b} \tag{4}
$$

This method is commonly applied for calculating growth rate of cell populations (Ronen et al., 2002; De Jong et al., 2010; Rudge et al., 2016). As this depends directly on b, it is thus dependent on background.
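As a short check of the two statements above, the following worked derivation shows why Equation (3) is free of the background c while Equation (4) is not. It is a sketch that assumes the purely exponential model of Equation (1) holds over the whole window.

$$
\frac{db}{dt} = b_0 \mu e^{\mu t}, \qquad \frac{d^2 b}{dt^2} = b_0 \mu^2 e^{\mu t} \quad \Rightarrow \quad \frac{d^2 b / dt^2}{db/dt} = \mu
$$

$$
\frac{db/dt}{b} = \frac{b_0 \mu e^{\mu t}}{b_0 e^{\mu t} + c} = \mu \left(1 - \frac{c}{b}\right)
$$

Under this model, Equation (4) applied to uncorrected data underestimates µ whenever c > 0, and the two expressions agree only once the background has been subtracted from b.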

It was observed that the biomass profiles of batch cultures exhibit a slight peak at the diauxic shift (Altintas et al., 2016; Figure 3B). For each biomass profile, the time of the peak was identified by determining the time at which light scattering began decreasing. If no peak was found, the peak was artificially set to 10 h for batch cultures and 33 h for fed-batch cultures. A window of 7.5 to 0.5 h before the diauxic shift peak for batch cultures and 20.5 to 1.5 h before the diauxic shift peak for fed-batch cultures was then used to calculate growth rates via each of the two methods described above in Equations (3) and (4). These time windows were chosen because the growth curves appeared approximately exponential in these regions, which is an assumption of the LOS method. Derivatives were computed numerically from a smoothing spline fit to the data. The first and last 6 points of the calculated growth rates were trimmed to account for edge effects of the smoothing spline, and the sum of squared errors (SSE) between the remaining growth rate profiles was computed. The growth rate calculation via Equation (4) was repeated for a range of background values from 0 to 13 for batch cultures and 0 to 10 for fed-batch cultures, and each result was compared to the growth rate calculated via Equation (3). The background value giving the minimum SSE for each biomass profile was chosen.
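The background scan described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: central finite differences stand in for the smoothing-spline derivatives, the function names are hypothetical, the step between candidate backgrounds (0.5 LSU) is an assumption, and the data are synthetic and noise-free.

```python
import math

def gradient(y, t):
    """dy/dt by central differences, with one-sided differences at both ends."""
    inner = [(y[i + 1] - y[i - 1]) / (t[i + 1] - t[i - 1]) for i in range(1, len(y) - 1)]
    first = (y[1] - y[0]) / (t[1] - t[0])
    last = (y[-1] - y[-2]) / (t[-1] - t[-2])
    return [first] + inner + [last]

def sse_for_background(t, b, c, trim=6):
    """SSE between growth rates from Equation (3) and Equation (4) for background c."""
    db = gradient(b, t)      # first derivative of the raw signal
    d2b = gradient(db, t)    # second derivative
    sse = 0.0
    for i in range(trim, len(t) - trim):   # drop `trim` points at each edge
        if db[i] == 0 or b[i] == c:
            return math.inf                # growth rate undefined for this candidate
        mu_los = d2b[i] / db[i]            # Equation (3): independent of background
        mu_def = db[i] / (b[i] - c)        # Equation (4) on background-corrected biomass
        sse += (mu_los - mu_def) ** 2
    return sse

def best_background(t, b, candidates, trim=6):
    """Candidate background giving the smallest SSE between the two estimates."""
    return min(candidates, key=lambda c: sse_for_background(t, b, c, trim))

# Synthetic exponential window (hypothetical values): mu = 0.3 per h, background = 4 LSU
t = [0.2 * i for i in range(36)]             # one point every 12 min over 7 h
b = [math.exp(0.3 * ti) + 4.0 for ti in t]
candidates = [0.5 * k for k in range(27)]    # 0 to 13 LSU, assumed step of 0.5
best = best_background(t, b, candidates)
print(best)
```

On this synthetic profile the scan recovers the background of 4 LSU that was added when generating the data. With real, noisy measurements the derivatives would need to come from a smoothed fit, as in the text.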

To correct for background fluorescence, we used fluorescence measurements from a control strain (p0125) cultivated under the same conditions as experimental strains. These measurements were used to define a function giving control fluorescence (autofluorescence) as a function of biomass, which we will call fc(b). For each experimental strain, each biomass measurement was input into fc(b) to obtain the autofluorescence for that amount of biomass. This autofluorescence was then subtracted from the experimental fluorescence associated with that biomass measurement. This process was carried out for all biomass/fluorescence measurements in each experimental strain, resulting in a background-corrected fluorescence profile for each replicate.
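A minimal sketch of this autofluorescence correction is given below. The text does not specify the functional form of fc(b), so piecewise-linear interpolation between control measurements is assumed here. The function names and all numbers are hypothetical.

```python
from bisect import bisect_right

def make_fc(control_biomass, control_fluorescence):
    """Build fc(b) from control-strain data by piecewise-linear interpolation (assumed form)."""
    pairs = sorted(zip(control_biomass, control_fluorescence))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]

    def fc(b):
        if b <= xs[0]:        # clamp below the measured range
            return ys[0]
        if b >= xs[-1]:       # clamp above the measured range
            return ys[-1]
        j = bisect_right(xs, b)   # xs[j - 1] <= b < xs[j]
        x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
        return y0 + (y1 - y0) * (b - x0) / (x1 - x0)

    return fc

def correct_fluorescence(biomass, fluorescence, fc):
    """Subtract the autofluorescence expected at each biomass value."""
    return [f - fc(b) for b, f in zip(biomass, fluorescence)]

# Hypothetical control strain and experimental strain measurements
fc = make_fc([10, 20, 40], [5, 9, 17])
corrected = correct_fluorescence([10, 15, 30], [12, 20, 50], fc)
print(corrected)
```

Indexing the autofluorescence by biomass rather than by time means the correction stays valid even when the control and experimental cultures grow at different rates.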

In order to compare replicates we aligned the growth curves such that landmark events occur at the same time. This is necessary as the lag phase may last different amounts of time for different cultivation experiments even when the pattern of biomass accumulation is the same after entry to exponential phase. In this case we are interested in the diauxic shift so we used this as the landmark with which to align the growth curves. As previously described we identified the diauxic shift peak and determined the difference between this landmark and a reference time, which we set to be at 15 h for batch cultures and 30 h for fed-batch cultures. This difference was subtracted from the entire set of times corresponding to both biomass and fluorescence measurements to determine the corrected times; moving forward, we only use data from corrected times greater than 0. A similar procedure for growth curve alignment is discussed by van Ditmarsch and Xavier (2011).
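The alignment step can be sketched as below. This is an illustration under simplifying assumptions: the peak is taken as the first point after which the raw signal decreases, which only works on smooth data, and the function names and example values are hypothetical.

```python
def find_peak_time(t, biomass, default):
    """Time at which light scattering first begins to decrease, else a default."""
    for i in range(len(biomass) - 1):
        if biomass[i + 1] < biomass[i]:
            return t[i]
    return default

def align(t, values, peak_time, reference):
    """Shift times so the peak lands on the reference time; keep corrected times > 0."""
    shift = peak_time - reference
    return [(ti - shift, v) for ti, v in zip(t, values) if ti - shift > 0]

# Hypothetical batch replicate with a late diauxic shift peak at 20 h
t = [0, 5, 10, 15, 20, 25]
biomass = [1, 2, 5, 12, 30, 28]
peak = find_peak_time(t, biomass, default=10)    # 10 h fallback for batch cultures
aligned = align(t, biomass, peak, reference=15)  # 15 h reference for batch cultures
print(peak, aligned)
```

The same shift would be applied to the fluorescence measurements of the replicate, since both signals share one time axis.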

#### Promoter Activities in Small-Scale Batch Cultivations

Background-corrected biomass and fluorescence profiles were input into the PromAct model (Kannan et al., 2018) using a protein maturation rate of 0.45 h<sup>−1</sup> as described for EGFP (Heim et al., 1995) and a protein degradation rate of 0.5 h<sup>−1</sup> as described for yEGFP3-Cln2PEST (Mateus and Avery, 2000). The mRNA degradation rate was set to 1/3 h<sup>−1</sup> based on genome-wide estimates of mRNA degradation in S. cerevisiae (Munchel et al., 2011; Neymotin et al., 2014).

The promoter activity profiles were cropped to 35 h for batch and 60 h for fed-batch cultures on the corrected time scale for standardization. The end of the lag phase was identified as the time when the rate of change of the average biomass profile over all replicates from a given promoter first surpassed 0.5 LSU per h. The diauxic shift was identified as the interval from the characteristic peak, where the derivative of the average biomass profile over all replicates crosses 0, to the time when the derivative next surpasses 0.5 LSU per h following this peak. For batch cultures, the stationary phase was also identified using the same peak detection method as for the diauxic shift, as a similar characteristic peak is seen. A similar method for growth phase identification was used by Altintas et al. (2016). "Early" and "Late" designations in **Figure 4** indicate the first and second half of each growth phase, respectively.
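The threshold rules for growth phase identification can be sketched as follows. This is a simplified illustration: forward differences replace whatever derivative estimate was actually used, the function names are hypothetical, and the two replicate profiles are invented.

```python
def mean_profile(replicates):
    """Point-wise average of replicate biomass profiles sampled at the same times."""
    return [sum(values) / len(values) for values in zip(*replicates)]

def first_index(values, start, condition):
    """Index of the first element at or after `start` satisfying `condition`, else None."""
    for i in range(start, len(values)):
        if condition(values[i]):
            return i
    return None

def growth_phases(t, biomass, threshold=0.5):
    """Return (end of lag, diauxic shift start, diauxic shift end) as times, or None."""
    rate = [(biomass[i + 1] - biomass[i]) / (t[i + 1] - t[i]) for i in range(len(t) - 1)]
    lag_end = first_index(rate, 0, lambda r: r > threshold)
    if lag_end is None:
        return None
    peak = first_index(rate, lag_end, lambda r: r <= 0)      # derivative crosses 0
    if peak is None:
        return None
    regrowth = first_index(rate, peak, lambda r: r > threshold)
    if regrowth is None:
        return None
    return t[lag_end], t[peak], t[regrowth]

# Two hypothetical replicates sampled hourly (LSU)
t = list(range(10))
rep_a = [1.0, 1.0, 1.2, 2.0, 4.0, 8.0, 7.4, 7.6, 8.4, 10.0]
rep_b = [1.0, 1.2, 1.2, 2.0, 4.0, 8.0, 7.6, 7.6, 8.6, 10.0]
phases = growth_phases(t, mean_profile([rep_a, rep_b]))
print(phases)
```

In this example the lag phase ends at 2 h and the diauxic shift spans 5 h to 7 h. The 0.5 LSU per h threshold is the value given in the text.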

# Measurement of 3-hydroxypropionic Acid Concentration

Strains expressing CaMCR (malonyl-CoA reductase from Chloroflexus aurantiacus) for the production of 3HP were cultivated in FlowerPlates in the BioLector, in the same conditions as described above, in batch and fed-batch modes. Samples for 3HP measurement were taken during the course of the cultivations, and 3HP concentration was measured using a dedicated enzyme assay. The enzyme assay was performed as described in the supplementary materials of Borodina et al. (2015).

#### RESULTS

#### Selection of Promoters Through Analysis of Genome-Wide Transcription Datasets

In order to identify promoters with glucose-dependent expression profiles, we searched the literature for genome-wide expression datasets comparing glucose-limited and glucose-excess conditions. Boer et al. (2003) characterized the specific transcriptional responses of S. cerevisiae cells growing at steady state in chemostats under growth limitation by carbon, nitrogen, phosphorus, or sulfur. We analyzed this dataset in order to directly compare gene expression levels in the glucose-limiting condition to the average of expression levels observed while not glucose limited (i.e., combining the results from the sulfur-, phosphorus-, and nitrogen-limiting conditions), which we describe as "glucose excess" conditions.

The 2D histogram (**Figure 2B**) displays the comparison of gene expression levels between glucose-limited and glucose-excess conditions. The volcano plot (**Figure 2A**) displays the genes found significantly differentially expressed between glucose-limited and glucose-excess conditions. Four archetypes of gene expression were identified for their known roles and are represented in **Figure 2**: (1) TEF1 archetype with constitutive expression in both conditions, (2) HXT1 archetype with high expression in glucose-excess and low expression in glucose-limiting conditions, (3) ADH2, and (4) MAL12 archetypes with low expression in glucose-excess and high expression in glucose-limiting conditions. As some of the archetypes, e.g., TEF1, are not differentially expressed in this comparison, the genes correlated with them are most often not differentially expressed either. By contrast, genes significantly correlated to ADH2 and HXT1 were among the most differentially expressed in glucose-excess vs. glucose-limiting conditions.

From the archetype correlation analysis, 34 genes were identified with diverse expression profiles with respect to dynamic regulation of gene expression. Cloning of the promoter regions of all 34 genes was attempted, as reported in the materials and methods section. Although not all were expected to be significantly differentially expressed, 20 genes were found significantly up-regulated in glucose limiting conditions and 5 genes were significantly down-regulated in glucose excess conditions (p < 0.05 after Benjamini-Hochberg correction).
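For readers unfamiliar with the Benjamini-Hochberg correction mentioned above, the standard step-up procedure can be sketched as follows. This is a generic illustration of the method, not the analysis code used in the study, and the p-values are invented rather than taken from the Boer et al. (2003) dataset.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for five genes
raw = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]
print(adjusted, significant)
```

In this example the third and fourth genes have raw p-values below 0.05 but are no longer significant after adjustment, which is the effect the correction is meant to have when many genes are tested at once.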

# Characterization of Promoter Activities in Microfermentations

#### Microscale Batch Cultivations

Batch cultivations were followed until stationary phase was reached. As presented in **Figure 3**, the two classical growth phases of an S. cerevisiae batch fermentation with glucose as the sole carbon source were clearly observed: a glucose growth phase followed by an ethanol growth phase, separated by a diauxic shift (Monod, 1941). A few typical profiles of promoter activity are presented in **Figure 3**: constitutive expression (TEF1), high expression in glucose-excess conditions (PGK1 and TDH3), activation at low glucose concentration (HXK1, HXT7, and MAL12), and expression when growing on ethanol in the absence of glucose (ADH2, ICL1, and ACS1). None of the promoters tested showed sustained activity in stationary phase, likely due to the lack of metabolic activity in the absence of carbon sources. Promoter activities of other promoters tested can be found in Figure S1.

A heat map and cluster analysis were applied to all promoter activities in the batch condition (**Figure 4**). The cluster analysis reveals a tight correlation between promoter activities experimentally determined here and the gene expression categories defined from the analysis of genome-wide transcription data presented in **Figure 2**, which reflects that the different promoters cloned upstream of yEGFP3-Cln2PEST respond mostly in a similar fashion as when they are in their native context. Globally, two main branches appear: at the top of the figure are promoters active in glucose-excess conditions, and at the bottom are promoters activating at low glucose concentration or when glucose is absent (**Figure 4**). This cluster analysis grouped profiles by archetype group in all but 6 cases, where cluster grouping seemed to contradict the gene expression data. This was observed for the promoters of ISF1, FDH1, FOX2, RGI2, YIG1, and EGO4 (**Figure 4**). In the case of EGO4, RGI2 (YIL057C), and FOX2, this may be explained by their unclear promoter activity patterns (Figure S1). The three others, ISF1, FDH1, and YIG1, clearly behave in opposite fashion compared to what was expected from the gene expression analysis (**Figure 2**).

counts, (B) plot of expression levels in glucose limiting vs. glucose excess conditions with a 2D histogram of gene counts. The four gene expression archetypes identified are represented: (1) TEF1 archetype (red) with constitutive expression in both conditions, (2) HXT1 archetype (green) with high expression in glucose-excess and low expression in glucose-limiting conditions, (3) ADH2 (light blue), and (4) MAL12 (orange) archetypes with low expression in glucose-excess and high expression in glucose limiting conditions.

#### Microscale Fed-Batch Cultivations

In this microscale fed-batch cultivation, realized using the glucose slow-release technology Feed-In-Time (m2p-labs), two main growth phases are observed: (1) an initial phase until approximately 25 h of cultivation where residual glucose is still present at a low level, and (2) a glucose-limiting growth phase after 25 h of cultivation where biomass growth is limited by the slow release of glucose (**Figure 5**).

In terms of promoter activity, two major activity profiles appear: (1) promoters characterized by a higher and sustained activity in the first growth phase that remain active in the second phase, although to a lower extent (e.g., TEF1, PGK1, TDH3, HXT7, MAL12); and (2) promoters that strongly activate at the onset of glucose limiting phase and that quickly become at most weakly active or inactive (ADH2, ACS1, ICL1).

Based on the outcome of promoter activity characterization, it was decided to run a proof of concept study and test a subset of promoters for transient expression and controlled production of a commercially relevant compound, with the objective of restricting its production to the "production" phase in fed-batch and avoiding it during the seed train phases carried out as batch.

## Proof of Concept Study: Dynamic Regulation of 3-hydroxypropionic Acid (3HP) Production

3HP is a platform chemical that can be converted into acrylic acid, 1,3-propanediol, malonic acid, and other valuable chemicals. In 2011, the world annual production of acrylic acid was 5000 kMT and the market size was USD 11.5 billion (Borodina et al., 2015). Acrylic acid-derived products include superabsorbent polymers used in diapers and incontinence products, plastics, coatings, adhesives, elastomers, and paints (Borodina et al., 2015). In biological systems, 3HP can be synthesized via at least four different intermediates: glycerol, lactate, malonyl-CoA or β-alanine (Kumar et al., 2013). Production of 3HP via the β-alanine or the malonyl-CoA route has been reported in S. cerevisiae (Chen et al., 2014; Borodina et al., 2015; Kildegaard et al., 2016). Furthermore, Kildegaard et al. reported a reduced growth rate of the strains expressing the malonyl-CoA route for 3HP production, especially in the case where pathway genes were integrated in multiple copies (Kildegaard et al., 2016).

In an attempt to control production of 3HP via the malonyl-CoA route, with the objective of driving 3HP production only in the "production" phase in fed-batch and not during the seed train phases carried out as batch, three promoters were selected based on their expression profiles: pHXT7, pADH2, and pICL1. They were selected for their weak (if not absent) expression in cultivation with excess of glucose and their induction in glucose limiting conditions (**Figure 3**). Promoters of HXT7, ADH2, and ICL1 were then cloned upstream of the gene encoding bi-functional malonyl-CoA reductase from Chloroflexus aurantiacus (CaMCR), which is the only heterologous enzyme necessary for producing 3HP via the malonyl-CoA route in S. cerevisiae. As a reference, CaMCR was expressed under the control of pPGK1, a promoter that is active in both glucose excess and glucose limiting conditions (**Figures 3**, **5**). When pPGK1 is controlling the expression of CaMCR, a clear accumulation of 3HP is observed already at the early stages of growth on glucose in the batch condition (**Figure 6**). This is clearly not the case when the (glucose-) regulated promoters pHXT7, pADH2 and pICL1 are controlling CaMCR expression, as here production of 3HP remains below the detection limit of the assay (**Figure 6**). In the fed-batch cultivation condition, where all promoters are being activated, 3HP production is restored. Expressing CaMCR from the regulated ICL1 promoter resulted in a 1.7-fold increase in 3HP production when compared to the constitutive expression from the PGK1 promoter (**Figure 6**).

It is clearly demonstrated here that production of a metabolite of interest can be turned on and off by using a set of dynamically activated promoters. Furthermore, turning off 3HP production in growth phases of batch fermentation did not affect the final production titer in the fed-batch condition, and was even beneficial, especially when pICL1 controlled expression of CaMCR.

#### DISCUSSION

In this study, specific genome-wide expression datasets were analyzed in order to identify genes characterized by different expression patterns in glucose excess as compared to glucose limiting conditions. Four archetypes of gene expression were defined. A number of genes with expression patterns qualified as interesting were selected and their promoters cloned upstream of a quickly maturing, rapidly degrading GFP. Destabilized GFP variants are better suited for the study of transient gene expression as native GFP variants have particularly long half-lives. In this study, it was decided to use yEGFP3-Cln2PEST, which is a destabilized variant of GFP that has successfully been used to monitor dynamic changes in gene expression in yeast (Mateus and Avery, 2000). After characterizing promoter activities both in batch with glucose as sole carbon source and in glucose-limiting fed-batch conditions, a handful of promoters were chosen for a proof of concept study where controlled production of the commercially relevant chemical 3HP was demonstrated.

Comparing results from the genome-wide gene expression analysis and from the characterization of promoter activity in the batch cultivation conditions, it was interesting to notice the correlation between the archetypes defined for the genome-wide gene expression analysis (**Figure 2**) and the clusters obtained for the promoter activity characterization (**Figure 4**). This confirmed that most of the promoters cloned upstream of yEGFP3-Cln2PEST respond according to the archetype defined for the gene they relate to. It also indicated that elements responsible for transcriptional regulation of those genes are present in the promoter sequence used in this study. This validated the promoter sequences that were chosen here. Discrepancies were observed, though, for six genes. For three of them, ISF1, FDH1, and YIG1, the promoter that was cloned triggered an expression pattern opposite to what was reported for the genome-wide gene expression analysis. This may indicate that the chosen promoter sequence is lacking regulatory elements otherwise present in the native promoter, or that DNA structure at the integration site affects the activity pattern of these promoters.

Looking at promoter activities measured during the microscale fed-batch cultivation, it appeared that while some promoters were only activated at the onset of glucose limitation (e.g., promoters of ACS1, ADH2, ICL1, SFC1, SPG4), others were characterized by a much wider expression time window, with an early activation (e.g., promoters of HXT7, MAL12, FMP3) (**Figure 5**). This may be explained by the fact that the latter promoters are active already in the phase where residual glucose is present but at a low concentration. This early phase of the microscale fed-batch experiment is an intermediary phase caused by the Feed-In-Time technology employed here: a low amount of free glucose is initially present in the medium and acts as a carbon source until the slow enzymatic release of glucose from the polymer takes over and triggers glucose limitation. The low concentration of glucose in this early phase may still allow for activation of the above-mentioned promoters (**Figure 5**).

Surprisingly, promoters like ADH2, ICL1, or ACS1 were characterized by a burst of expression at the onset of the glucose limiting phase. These promoter activity profiles suggest a rapid increase and decrease in their activity, although they appear to maintain some activity for roughly 10 h after activation. This was not necessarily expected, as the corresponding genes were clearly maintaining expression in the glucose limited continuous cultivation of the genome-wide gene expression analysis (**Figure 2**). It further underlines the necessity for testing genetic elements for controlled expression in conditions as close as possible to the intended bioprocess.

Benefits of dynamic control of metabolic pathways have been demonstrated before, as described in the introduction. Nevertheless, this study further confirms that transient activation of a metabolic pathway does not negatively impact production of a compound of interest, here 3HP, but rather benefits production performance, as shown here in the case where the ICL1 promoter restricts expression of CaMCR to the microscale fed-batch cultivation stage (**Figure 6**). Even though other factors, e.g., timing of MCR expression, substrate availability or pathway intermediate concentrations, may also contribute to the benefits observed on 3HP production, it is surprising that such a sudden and short expression timespan triggered by the ICL1 promoter can result in superior production of the compound of interest as compared to a longer expression timespan as in the case of pHXT7 or pPGK1. This certainly raises the question whether sudden, short but high expression bursts at the onset of changing conditions could be sufficient to sustain efficient production of chemicals of interest. Reducing the timespan of expression of a promoter may be beneficial to the overall production process as it may restrict the energy spent on transcription and translation of the specific gene to a brief time window. Recently Christiano et al. measured 8.8–12 h as median half-lives of proteins in S. cerevisiae (Christiano et al., 2014). Therefore, once synthesized, the functional protein would still remain in the cells even though the promoter of its gene, hence transcription and translation of that gene, would be off during most of the cultivation. Promoters like ICL1, ADH2, ACS1, combined with other synthetic biology tools that could extend mRNA half-lives, e.g., sequences improving mRNA stability (Curran et al., 2013; Yamanishi et al., 2013), may certainly play an active role.

# CONCLUSION

Through analysis of gene expression data, promoters showing specific responses to glucose excess or limitation were identified. DNA sequences of these promoters were cloned upstream of a quickly maturing and rapidly degrading green fluorescent protein to allow the study of promoter activities. Dynamics of promoter activities for a number of promoters in batch with glucose as sole carbon source, or glucose limiting conditions were analyzed. A subset of promoters, pADH2, pICL1, and pHXT7, were demonstrated as suitable for dynamic control of production of a commercially relevant compound, i.e., 3HP, in S. cerevisiae. The set of promoters and the dynamic promoter activity profiling presented here will contribute to the molecular biology toolbox available to metabolic engineers.

pADH2 driving production of CaMCR. Error bars on 3HP levels represent standard error of the mean, n = 3. The gray ribbon on Biomass represents plus/minus one standard deviation. 3HP concentration is in g/L and Biomass is measured in light scattering units (LSU).

#### AUTHOR CONTRIBUTIONS

IB and JM conceived the design of the study. JN contributed with initial ideas about the project and discussions on the possible promoter set for evaluation. JM, NJ, FÖ, and KK performed DNA work. JM generated the strains and performed small scale cultivations. JM and CW performed the transcriptome data analysis. JF participated in the design of the study and interpretation of initial results. SK and CW performed dynamic promoter activity analyses. SK, CW, and JM analyzed the data. SK, JM, IB, and CW drafted the manuscript and all authors contributed to preparing the final version of the manuscript.

### FUNDING

This work was supported by the Novo Nordisk Foundation (Grant Number NNF10CC1016517, http://www.novonordiskfonden.dk) and by the Denmark-America Foundation & Fulbright Commission.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fbioe.2018.00063/full#supplementary-material

#### REFERENCES

by analysis of Aspergillus niger in batch culture. Biotechnol. Lett. 25, 565–571. doi: 10.1023/A:1022836815439


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Maury, Kannan, Jensen, Öberg, Kildegaard, Forster, Nielsen, Workman and Borodina. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Dynamic Control of *ERG20* and *ERG9* Expression for Improved Casbene Production in *Saccharomyces cerevisiae*

Roberta Callari, Yvan Meier, Davide Ravasio and Harald Heider\*

Evolva SA, Reinach, Switzerland

#### *Edited by:*

Rodrigo Ledesma-Amaro, Imperial College London, United Kingdom

#### *Reviewed by:*

John A. Morgan, Purdue University, United States; Dae-Hee Lee, Korea Research Institute of Bioscience and Biotechnology (KRIBB), South Korea; Paola Branduardi, Università degli studi di Milano Bicocca, Italy; Jose L. Revuelta, Universidad de Salamanca, Spain

> *\*Correspondence:* Harald Heider haraldh@evolva.com

#### *Specialty section:*

This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology

*Received:* 18 July 2018 *Accepted:* 16 October 2018 *Published:* 01 November 2018

#### *Citation:*

Callari R, Meier Y, Ravasio D and Heider H (2018) Dynamic Control of ERG20 and ERG9 Expression for Improved Casbene Production in Saccharomyces cerevisiae. Front. Bioeng. Biotechnol. 6:160. doi: 10.3389/fbioe.2018.00160

Production of plant metabolites in microbial hosts represents a promising alternative to traditional chemical-based methods. Diterpenoids are compounds with interesting applications as pharmaceuticals, fragrances and biomaterials. Casbene, in particular, serves as a precursor to many complex diterpenoids found in plants from the Euphorbiaceae family that have shown potential therapeutic effects. Here, we engineered the budding yeast Saccharomyces cerevisiae for improved biosynthesis of the diterpene casbene. We first expressed, in yeast, a geranylgeranyl diphosphate synthase from Phomopsis amygdali in order to boost the geranylgeranyl diphosphate pool inside the cells. The enzyme uses isopentenyl diphosphate and dimethylallyl diphosphate to directly generate geranylgeranyl diphosphate. When co-expressing a casbene synthase from Ricinus communis, the yeast was able to produce casbene in the order of 30 mg/L. Redirecting the flux from FPP and sterols, by means of the ergosterol-sensitive promoter of ERG1, allowed for plasmid-based casbene production of 81.4 mg/L. Integration of the target genes into the yeast genome, together with the replacement of the promoter regions of ERG20 and ERG9 with combinations of ergosterol- and glucose-sensitive promoters, generated a titer of 108.5 mg/L of casbene. We thus succeeded in engineering an improved route for geranylgeranyl diphosphate synthesis in yeast. Furthermore, we showed that the concurrent dynamic control of ERG20 and ERG9 expression, using ergosterol and carbon source regulation mechanisms, could substantially improve diterpene titer. Our approach will pave the way for a more sustainable production of GGPP- and casbene-derived products.

Keywords: casbene, diterpene, *ERG20*, *ERG9*, yeast, metabolic engineering, dynamic control, mevalonate pathway

# INTRODUCTION

Diterpenoids represent one of the largest and most diverse classes of plant metabolites. Although in some cases they carry out important primary functions (e.g., regulation of growth and development by gibberellins), they are usually products of the secondary metabolism with specialized pathways extremely varied across the plant kingdom (Zerbe and Bohlmann, 2015). Diterpenoids are beneficial for plants because of their role in protection from abiotic stress and control of the ecological interactions with other organisms (e.g., defense against herbivores and microbial pathogens) (Cheng et al., 2007; Tholl, 2015). Moreover, they can benefit humanity because of their industrial application as pharmaceuticals (e.g., paclitaxel, forskolin, ingenol-3-angelate, prostratin), fragrances (e.g., sclareol), and other industrial bioproducts (e.g., steviol glycosides as natural sweeteners, diterpene resins for inks and coatings) (Bohlmann and Keeling, 2008; Goyal et al., 2010; Caniard et al., 2012; Doseyici et al., 2014; Fidler and Goldberg, 2014; Howat et al., 2014; Miana et al., 2015).

Current production methods of diterpenoids rely on extraction from natural sources and chemical synthesis. Because of the low abundance of these compounds in the producing organisms and their structural complexity, such methods are inefficient and environmentally costly. An attractive and environmentally friendly alternative is represented by microbial fermentation. Biosynthesis in heterologous hosts such as Escherichia coli or Saccharomyces cerevisiae can (1) reduce costs using sugar-based carbon sources, (2) increase sustainability by avoiding harvesting and extraction from natural sources, (3) increase yield and productivity using genetic manipulation of the heterologous hosts, and (4) provide enantiomerically pure products through enzymatic biocatalysis (Scalcinati et al., 2012). S. cerevisiae in particular is a robust host that not only offers the biosynthetic machinery needed for production of diterpenoids, but also provides the necessary environment for expression of membrane-bound enzymes, such as cytochrome P450 hydroxylases. These P450 enzymes are frequently involved in the biosynthesis of complex plant terpenoids and are usually difficult to express in prokaryotic systems (Hamann and Møller, 2007; Kirby and Keasling, 2009).

The precursors for production of terpenes are present in the native metabolic network of S. cerevisiae (Chambon et al., 1991). The yeast mevalonate (MVA) pathway, through multiple rounds of condensation of isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP), leads to generation of geranyl diphosphate (GPP), farnesyl diphosphate (FPP) and geranylgeranyl diphosphate (GGPP) (**Figure 1**). GPP, FPP, and GGPP represent the universal precursor units of all monoterpenes (C10), sesquiterpenes (C15), and diterpenes (C20), respectively. However, the GGPP content in S. cerevisiae is rather low, because the endogenous geranylgeranyl diphosphate synthase (GGPPS) Bts1p does not compete efficiently with enzymes directly upstream and downstream (farnesyl diphosphate synthase Erg20p and squalene synthase Erg9p) for the pools of IPP and FPP (Jiang et al., 1995). Thus, the endogenous flux of the pathway favors production of FPP and the sterols derived from it. As a consequence, improvement of the GGPP pool is essential for efficient, high-titer diterpene production. To reach this goal, different studies have focused on the use of heterologous GGPP synthases, on Erg20p mutants that are able to synthesize GGPP, or on fusions of BTS1 and ERG20 (Kirby et al., 2010; Zhou et al., 2012; Ignea et al., 2015; Song et al., 2017). Such strategies have been shown to improve production of GGPP-derived compounds, but they have also demonstrated that additional improvements in substrate conversion toward diterpene production are feasible.
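The chain-length arithmetic behind these precursors can be sketched in a few lines. This is only an illustration of the C5 bookkeeping described above (one DMAPP starter unit extended by successive IPP units); the function and variable names are hypothetical and not part of the study.

```python
# Illustrative sketch: carbon count of linear prenyl diphosphates built from C5 units.
C5_UNIT = 5  # both IPP and DMAPP carry five carbons


def prenyl_chain_carbons(n_ipp: int) -> int:
    """Carbons in the product of one DMAPP condensed with n_ipp IPP units."""
    if n_ipp < 1:
        raise ValueError("at least one IPP condensation is required")
    return C5_UNIT * (1 + n_ipp)


# Number of IPP units added to DMAPP for each precursor named in the text.
PRODUCTS = {1: "GPP", 2: "FPP", 3: "GGPP"}

for n_ipp, name in PRODUCTS.items():
    print(f"{name}: DMAPP + {n_ipp} IPP -> C{prenyl_chain_carbons(n_ipp)}")
```

The three IPP additions needed for GGPP are the steps that a single heterologous GGPP synthase would have to catalyze when starting directly from DMAPP.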

In this study, we developed a yeast platform for production of the diterpene hydrocarbon casbene, originally identified from castor bean (Ricinus communis), where it serves as a phytoalexin (Mau and West, 1994). In recent decades, the compound has gained much interest since the conversion of GGPP into casbene, catalyzed by the casbene synthase enzyme, is acknowledged as the first committed step in the biosynthesis of the diterpenoid molecular backbones jatrophane, tigliane, lathyrane, and ingenane (Kirby et al., 2010). Casbene is the precursor of many diterpenoids of medical interest, identified in a number of Euphorbiaceae (**Figure 2**). For example, ingenol-3-angelate, found in the sap of Euphorbia peplus, has been approved by the FDA for the treatment of actinic keratosis (Fidler and Goldberg, 2014), whilst prostratin, derived from Homalanthus nutans, is a protein kinase C activator that inhibits HIV-1 infections and reduces HIV-1 latency (Miana et al., 2015). Jatrophone, isolated from extracts of Jatropha gossypifolia L., has shown significant anti-proliferative effects against human tumor cell lines (Kupchan et al., 1970; Theoduloz et al., 2009), and the tigliane-derived diterpenoid resiniferatoxin, from Euphorbia resinifera, is effective against a broad range of inflammatory and neuropathic pain conditions (Iadarola and Gonnella, 2013). Euphorbia factor L2 from Euphorbia lathyris induces apoptosis in lung cancer cell lines (Lin et al., 2017).

For efficient production of casbene in yeast, we increased the GGPP supply inside the cells by expression of a truncated fungal GGPP synthase from the endophytic fungus Phomopsis amygdali. Expression of the enzyme, which produces GGPP directly from IPP and DMAPP, significantly improved GGPP biosynthesis and allowed, upon co-expression of the casbene synthase from R. communis, for production of casbene in the order of 30 mg/L. Similar titers have been reported previously (Kirby et al., 2010). Dynamic control of the genes ERG20 and ERG9 by means of ergosterol- and glucose-sensitive promoters further redirected the flux to GGPP and diterpene synthesis, raising casbene production to as much as 108.5 mg/L.

We envision that our approach will pave the way toward the sustainable production of various GGPP- and casbene-derived isoprenoids.

# MATERIALS AND METHODS

#### Chemicals and Media

All chemicals were bought from Sigma-Aldrich (St. Louis, Missouri, USA) unless stated otherwise. An authentic standard of cembrene was purchased from CHEMOS GmbH & Co. KG (Regenstauf, Germany). An authentic standard of casbene was received from Professor Birger Møller's laboratory at the University of Copenhagen.

LB medium for growth of Escherichia coli was supplied by Carl Roth GmbH + Co. KG (Karlsruhe, Germany), and was supplemented with 100 µg/L of ampicillin for amplification of plasmids. Yeast Extract Peptone Dextrose (YPD) medium with 20 g/L glucose was used for growth of wildtype strains prior to transformation. For preparation of pre- and main cultures we used synthetic complete (SC) drop-out medium (Formedium LTD, Hunstanton, England), supplied with 6.7 g/L yeast nitrogen base, 20 g/L glucose and all amino acids necessary for the corresponding auxotrophy.

**Abbreviations:** MVA, mevalonate; IPP, isopentenyl diphosphate; DMAPP, dimethylallyl diphosphate; GPP, geranyl diphosphate; FPP, farnesyl diphosphate; GGPP, geranylgeranyl diphosphate; GGPPS, geranylgeranyl diphosphate synthase; PaGGPPS, geranylgeranyl diphosphate synthase from Phomopsis amygdali; RcCBS, casbene synthase from Ricinus communis; CAS, casbene; OD600, optical density at 600 nm; PaFS, fusicoccadiene synthase from Phomopsis amygdali; ORF, open reading frame.

#### Plasmids and Strains Construction

**Table 1** lists all plasmids constructed in this work. Coding sequences (**Table S1**) were synthesized by GeneArt® (Thermo Fisher Scientific, Zug, Switzerland) as yeast codon-optimized versions. Synthetic genes were cloned via HindIII HF, SacII restriction digestion and T4 DNA ligase (New England Biolabs, Ipswich, Massachusetts, USA) according to standard protocols (Green and Sambrook, 2012). E. coli XL10 Gold (Agilent, Santa Clara, California, USA) cells were used for subcloning of genes.

S. cerevisiae strains generated throughout this study are listed in **Table 2**. All constructed strains were derived from strain NCYC 3608 (NCYC, Norwich, United Kingdom), a derivative of strain S288C modified in our labs and carrying a truncated 3-hydroxy-3-methylglutaryl coenzyme A reductase (tHMGR) in the YCT1 locus to increase the flow through the early part of the mevalonate pathway (CAS1- MATalpha hoΔ0 his3Δ0 leu2Δ0 ura3Δ0 CAT5(T91M) MIP1(-661T) SAL1 GAL2 YLL055w::loxPpTDH3-Sc\_tHMGR-tCYC1). All yeast strains were stored in 25% glycerol at −80 °C. Yeast transformation was performed using the classical lithium acetate method (Gietz and Schiestl, 2007).

Transformants were grown on agar plates prepared with selective SC drop-out medium. For integration into the yeast genome, genes were cloned into yeast integration plasmids targeting incorporation into specific sites (**Table 1**). Transformants were verified by PCR on genomic DNA for correct insertion of heterologous genes.

The 805 bp promoter region of ERG1 (**Table S2**) was amplified from genomic DNA of S. cerevisiae S288C after lysis in 30 µL 0.2% SDS at 95 °C for 5 min and clarification at 14,000 g for 5 min. iProof™ High-Fidelity DNA Polymerase was used according to the manufacturer's protocol with primer pair F\_ERG1p\_SpeI/R\_ERG1p\_SacII. The PCR product was cut with SpeI/SacII and inserted into the integration vector pUG72 (Gueldener et al., 2002) close to the URA3 marker flanked by loxP sites. Marker and promoter were amplified with primer pair F\_ERG9int/R\_ERG9int and transformed into yeast for homologous recombination at the ERG9 site (**Table 3**).

Plasmid pCAS4 for replacement of the promoter region of ERG20 (**Table S2**) was kindly provided by Michael Eichenberger. The plasmid already contained up- and down-tags for homologous recombination at the ERG20 site, a URA3 marker and the promoter of CYC1 for exchange with the promoter of ERG20. PCYC1 was released from the plasmid with BglII and HindIII and replaced by promoters of ERG1 and HXT1. PERG1 was amplified from plasmid pCAS3 with primer pair F\_ERG1p\_BglII and R\_ERG1p\_HindIII. PHXT1 was amplified from an expression vector found in our labs with primer pair F\_HXT1p\_BglII and R\_HXT1p\_HindIII (**Table 3**). The PCR products were cut with BglII/HindIII, and inserted into the integration vector pCAS3 to replace PCYC1. Strains with successful replacement of native promoters of ERG20 and ERG9 were verified by diagnostic PCR.

#### Yeast Growth

Yeast batch cultures for production of metabolites were performed with the System Duetz (EnzyScreen, Heemstede, Netherlands) in an ISF-1-W shaker (Kuhner, Birsfelden, Switzerland) at 30 °C, 300 rpm, and 5 cm shaking diameter. Pre-cultures were prepared by inoculation of single colonies into 3 mL of selective liquid SC medium in triplicate. Pre-cultures were grown for 24 h at 30 °C, 160 rpm. Optical density at 600 nm (OD600) of a 1:40 dilution was measured in an Ultrospec 10 table top spectrophotometer (GE Healthcare, Little Chalfont, United Kingdom). Main cultures were inoculated in 2.5 mL of selective SC medium in a 24-deepwell microplate to an OD600 of 0.1 and incubated from 24 up to 120 h as stated.
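The inoculation step above amounts to a simple dilution calculation. The sketch below shows one way to derive the pre-culture volume needed for a 2.5 mL main culture at a starting OD600 of 0.1; the function name is hypothetical and the spectrophotometer reading of 0.25 is an assumed example value, not a measurement from this study.

```python
def inoculum_volume_ml(od_measured: float, dilution_factor: float,
                       target_od: float, final_volume_ml: float) -> float:
    """Volume of pre-culture (mL) that gives target_od in final_volume_ml.

    od_measured is the reading of the diluted sample; multiplying by the
    dilution factor recovers the OD600 of the undiluted pre-culture.
    """
    preculture_od = od_measured * dilution_factor
    if preculture_od <= target_od:
        raise ValueError("pre-culture is not dense enough for the target OD")
    # C1 * V1 = C2 * V2, solved for V1
    return target_od * final_volume_ml / preculture_od


# Assumed example: a 1:40 dilution reads 0.25, i.e. the pre-culture is at OD600 = 10.
volume = inoculum_volume_ml(od_measured=0.25, dilution_factor=40,
                            target_od=0.1, final_volume_ml=2.5)
print(f"Inoculate {volume * 1000:.0f} µL of pre-culture")
```

The calculation treats OD600 as proportional to cell density, which is why the pre-culture is read at a 1:40 dilution rather than undiluted.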

#### Sample Preparation

Cells from a 2 mL volume were harvested by centrifugation at 4,000 rpm for 5 min in 2 mL screw cap tubes. Supernatants were discarded and pellets were mixed with 500 µL of methanol. Pellets were shaken for 10 min at 60 °C and 1,500 rpm in a Thermo-Shaker TS-100 (Axonlab, Reichenbach an der Fils, Germany). Cell debris was removed by centrifugation at 4,000 rpm for 5 min and liquid phases were transferred to glass vials. The methanol extracts were evaporated for 24 h at room temperature in a fume hood, resolubilized in 500 µL hexane and used for analysis.

#### Detection of Diterpenes

Casbene and alcohols derived from FPP and GGPP were detected using gas chromatography–mass spectrometry (GC–MS). Quantification was carried out using the Agilent Technologies 7890A GC system (Agilent Technologies, Santa Clara, USA) equipped with a 5975 Mass Selective Detector (MSD). Sample volumes of 1 µL were injected in splitless mode at 250 °C with the following GC program: 80 °C for 2 min, ramp to 220 °C at 25 °C/min, ramp to 240 °C at 3 °C/min, ramp to 300 °C at 80 °C/min, and hold for 3 min. Helium was used as carrier gas with a constant flow rate of 1.2 mL/min. The MSD was operated at 70 eV in scan mode, with a range between 35 and 500 m/z. Diterpenes were identified by retention time and comparison with a mass spectral database (NIST version 2.0, Gaithersburg, MD, USA) and/or comparison with authentic standards. β-Caryophyllene (10 mg/L) was added to each sample as internal standard and calibration curves were generated to quantify the target compounds. In the case of geranylgeraniol an authentic standard of the compound was used, whereas casbene quantification was approximated by a standard curve generated using an authentic standard of the diterpene cembrene.
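As a quick check on the oven program described above, the run time per injection can be worked out from the stated temperatures, ramp rates and hold times. The sketch below uses only those values; the `Ramp` class and `run_time_min` function are hypothetical helpers written for this illustration, and the calculation ignores any equilibration or cool-down time between runs.

```python
from dataclasses import dataclass


@dataclass
class Ramp:
    rate_c_per_min: float   # heating rate in °C/min
    target_c: float         # temperature reached at the end of the ramp
    hold_min: float = 0.0   # isothermal hold after the ramp


def run_time_min(start_c: float, initial_hold_min: float, ramps: list) -> float:
    """Total oven program time: initial hold plus every ramp and its hold."""
    total = initial_hold_min
    current = start_c
    for ramp in ramps:
        total += (ramp.target_c - current) / ramp.rate_c_per_min + ramp.hold_min
        current = ramp.target_c
    return total


# Oven program as stated in the text: 80 °C for 2 min, then three ramps.
PROGRAM = [
    Ramp(rate_c_per_min=25, target_c=220),
    Ramp(rate_c_per_min=3, target_c=240),
    Ramp(rate_c_per_min=80, target_c=300, hold_min=3),
]

total = run_time_min(start_c=80, initial_hold_min=2, ramps=PROGRAM)
print(f"Oven program length: {total:.2f} min")
```

Under these assumptions the program lasts roughly 18 min, most of it spent in the slow 3 °C/min segment between 220 and 240 °C.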

# RESULTS

#### Establishing Casbene Production in *S. cerevisiae*

In order to assess casbene (CAS) production in yeast, the casbene synthase gene from R. communis (RcCBS) was cloned as a codon-optimized version under control of the promoter of TEF1 in a yeast expression plasmid ("pCAS1"). The N-terminal chloroplast transit peptide, identified with the ChloroP software used at default settings (Emanuelsson et al., 1999) and needed for plastidial localization in plants, was removed (Kirby et al., 2010). Plasmid pCAS1 was introduced into yeast strain H-MEV, engineered for production of isoprenoids. H-MEV contained a chromosomally integrated copy of a truncated 3-hydroxy-3-methylglutaryl coenzyme A reductase (tHMGR) enabling high mevalonate pathway activity (Donald et al., 1997). The resulting strain (CAS1) was analyzed for production of casbene. Small amounts of the compound (∼0.8 mg/L) could be detected after 72 h of growth (data not shown).

TABLE 1 | List of plasmids constructed in this work.

To boost casbene production, we co-expressed a heterologous GGPP synthase from the plant-pathogenic fungus Phomopsis amygdali. Toyomasu et al. (2007) found that synthesis of the diterpene fusicoccadiene in P. amygdali is catalyzed by the unusual chimeric enzyme fusicoccadiene synthase (PaFS). This multifunctional enzyme can synthesize fusicoccadiene starting from IPP and DMAPP, through (1) a prenyltransferase domain, synthesizing GGPP from the C5 isoprene units, and (2) a terpene cyclase domain, catalyzing the cyclization of GGPP into fusicoccadiene.

TABLE 2 | List of strains constructed in this work.

We reasoned that the expression of the prenyltransferase domain from PaFS in yeast, catalyzing all steps from the C5 to the C20 precursor unit, could enhance the efficiency of this conversion, as it requires only one enzyme for the entire biosynthesis of GGPP. A construct encoding the codon-optimized prenyltransferase domain (residues 390–719 of PaFS) (Chen et al., 2016) was synthesized and cloned under control of the promoter of PGK1 into the yeast expression vector pCAS2. pCAS2 was introduced together with pCAS1 (RcCBS) into H-MEV, generating strain CAS2. PaGGPPS+RcCBS co-expression gave rise to a peak with the same retention time and mass spectrum as the authentic casbene standard (**Figure S1A**), while, as previously discussed, only trace amounts of the compound accumulated when expressing solely RcCBS (**Figure S1B**). A titer of 32 mg/L of casbene was reached in strain CAS2 (PaGGPPS+RcCBS) after 72 h of batch culture. The relatively high titer of casbene strongly suggested that PaGGPPS could considerably increase the supply of the GGPP precursor necessary for casbene production by RcCBS. This suggestion was further supported by the appearance of a pronounced peak in the chromatogram corresponding to GGOH (**Figure S1B**), the prenyl alcohol accumulating in engineered yeast cells due to endogenous phosphatase activities acting on the excess amounts of GGPP (Faulkner et al., 1999; Tokuhiro et al., 2009).

#### Casbene Production in Engineered Strains With Engineered Mevalonate Pathway

To further improve casbene titers, we focused on downregulation of competing metabolic pathways. PaGGPPSp uses IPP and DMAPP to catalyze the consecutive condensation steps needed for GGPP synthesis and is in competition with the native farnesyl diphosphate synthase, Erg20p, for the IPP/DMAPP pool. Moreover, FPP, produced by Erg20p, enters the ergosterol biosynthetic pathway by action of the squalene synthase Erg9p. Thus, in order to redirect the flux to diterpene production, we attempted dynamic control of either ERG20, or of ERG20 in combination with ERG9, by replacing their native promoters with the ergosterol-sensitive promoter PERG1. This approach has recently been shown to positively affect amorpha-4,11-diene production in yeast (Yuan and Ching, 2015). The promoter of ERG1 senses intracellular ergosterol levels and adjusts transcription of downstream genes to enable production of optimal levels of this essential metabolite. We constructed three strains containing (1) ERG20 under control of PERG1, (2) ERG9 under control of PERG1 and (3) both ERG20 and ERG9 under control of PERG1 (**Table 2**, CAS3-5). All engineered strains also carried an integrated copy of tHMGR and were transformed with pCAS1 (RcCBS) and pCAS2 (PaGGPPS) to evaluate the effect of dynamic control of expression of ERG20 and/or ERG9 on casbene production. The resulting strains were designated as CAS6 (PERG1-ERG20), CAS7 (PERG1-ERG9) and CAS8 (PERG1-ERG20 and PERG1-ERG9) (**Table 2**). CAS2, containing the native promoters of ERG20 and ERG9, was used as a reference strain. Strains were analyzed for casbene production in time course experiments up to 120 h cultivation time.

TABLE 3 | Oligonucleotides used for plasmid construction (restriction sites are underlined).

Interestingly, strains with engineered ERG20 and ERG9 expression improved the casbene titers substantially but showed different profiles of compound accumulation (**Figure 3A**). The two strains CAS6 (PERG1-ERG20) and CAS8 (PERG1-ERG20, PERG1-ERG9) showed similar profiles of casbene accumulation over time. Strain CAS6 (PERG1-ERG20) steadily built up casbene over 96 h of cultivation, at which point the measured casbene titer reached 61 mg/L. This represented a significant ∼3-fold improvement in production over the reference strain CAS2, which accumulated only 21 mg/L of casbene. At 120 h the casbene titer in CAS6 dropped to ∼40 mg/L, suggesting that degradation mechanisms or modifying enzymes may start to act on the casbene molecule. Strain CAS8 (PERG1-ERG20, PERG1-ERG9), harboring dynamic control of expression of both ERG20 and ERG9, reached the highest titer of 81.4 mg/L casbene after 96 h of cultivation, a ∼4-fold improvement over production in CAS2. Strain CAS7 (PERG1-ERG9) instead showed its highest production after 48 h of cultivation (42 mg/L). Casbene titers steadily dropped to 18 mg/L thereafter. Finally, reference strain CAS2 (tHMGR) showed a gradual accumulation of casbene up to 35 mg/L after 72 h of culture, followed by a decline to 13.3 mg/L at 120 h.
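The fold improvements quoted above follow directly from the 96 h titers. A minimal sketch of that arithmetic, using only the values given in the text (variable names are hypothetical), is:

```python
# Casbene titers (mg/L) at 96 h, as reported in the text for the plasmid-based strains.
TITERS_96H = {
    "CAS2": 21.0,   # reference strain (native ERG20 and ERG9 promoters)
    "CAS6": 61.0,   # PERG1-ERG20
    "CAS8": 81.4,   # PERG1-ERG20, PERG1-ERG9
}

reference = TITERS_96H["CAS2"]

# Fold change of each strain relative to the reference at the same time point.
fold = {strain: titer / reference for strain, titer in TITERS_96H.items()}

for strain, value in fold.items():
    print(f"{strain}: {value:.1f}-fold vs CAS2")
```

The comparison is made at a single time point because the strains peak at different times, so a fold change computed against the 72 h maximum of CAS2 would come out lower.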

GGOH also accumulated considerably in the engineered strains (**Figure 3B**). Strains CAS7 (PERG1-ERG9) and CAS2 (tHMGR) accumulated increasing amounts of GGOH during 72 h of cultivation, reaching up to 39.7 mg/L and 30.3 mg/L, respectively. The titers of GGOH did not drop significantly throughout the entire cultivation period of 120 h. Strains CAS6 (PERG1-ERG20) and CAS8 (PERG1-ERG20, PERG1-ERG9) showed a progressive build-up of the compound, reaching 96.7 mg/L and 82.9 mg/L, respectively, after 120 h of growth.

Strain CAS6 (PERG1-ERG20) in fact not only accumulated high amounts of both casbene and GGOH, but also showed optimal robustness, as revealed by its growth profile, which was comparable to that of reference strain CAS2 (**Figure 3C**). Strains with dynamic control of ERG9 expression instead displayed growth retardation in comparison to reference strain CAS2. At the end of cultivation, strains CAS7 (PERG1-ERG9) and CAS8 (PERG1-ERG20, PERG1-ERG9) reached cell densities which were lower compared to strain CAS2 by 15 and 27%, respectively.

These results demonstrated the potential of dynamically controlling FPP synthesis for improved casbene production via modification of the native ERG20 promoter.

#### Construction of Stable *S. cerevisiae* Strains for Casbene Production

To provide a stable chassis for casbene production, the engineered GGPP synthase PaGGPPS and the truncated casbene synthase RcCBS were stably integrated into the yeast chromosomes. In the previous plasmid-based assays, dynamic control of solely ERG9 expression showed the poorest performance in terms of production and was for this reason not included in the next phase of experiments. As dynamic control of ERG20 expression was shown to be crucial for improved diterpene production, we further investigated the effect of dynamic regulation of ERG20 by replacing its native promoter with the glucose-sensing promoter of HXT1 (PHXT1). It has recently been shown that using the HXT1 promoter to control ERG9 expression could redirect the carbon flux from sterol synthesis toward sesquiterpene production (Scalcinati et al., 2012). The effect of expression of ERG20 under control of PHXT1 was tested either alone, or in combination with control of expression of ERG9 by PERG1 (**Table 2**, strains CAS9 and CAS10).

The expression cassettes for RcCBS and PaGGPPS were integrated into five strains: CAS11 (tHMGR), CAS12 (tHMGR, PERG1-ERG20), CAS13 (tHMGR, PHXT1-ERG20), CAS14 (tHMGR, PERG1-ERG20, PERG1-ERG9) and CAS15 (tHMGR, PHXT1-ERG20, PERG1-ERG9). All strains were analyzed for casbene and GGOH accumulation upon growth in batch culture for 96 h, as this production time yielded the highest titers of casbene in plasmid-based assays.

All engineered strains accumulated higher amounts of casbene and GGOH compared to reference strain CAS11 (24.6 mg/L of casbene and 27.6 mg/L of GGOH) (**Figure 4**). Strain CAS13 accumulated 85.5 mg/L of casbene and 50 mg/L of GGOH, whereas strain CAS12 accumulated 66.5 mg/L of casbene and 63.4 mg/L of GGOH. In strain CAS14 (tHMGR, PERG1-ERG20, PERG1-ERG9) accumulation of casbene did not reach the amounts observed previously (only 49.1 mg/L of casbene and 46.5 mg/L of GGOH accumulated). In strain CAS15, expression of ERG20 controlled by PHXT1 combined with control of ERG9 expression by PERG1 led to the highest titers of casbene (108.5 mg/L) and GGOH (79.9 mg/L). Interestingly, all strains except CAS14 grew better than the reference strain CAS11 (tHMGR only). CAS14 (tHMGR, PERG1-ERG20, PERG1-ERG9) reached, in accordance with the plasmid-based expression results, lower final cell densities.
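To compare the integrated strains side by side, the reported titers can be summarized as fold change over CAS11 and as the share of measured GGPP-derived product recovered as casbene. The sketch below uses only the titers stated above; the "casbene share" is a simple mass-based ratio introduced here for illustration, not a metric used by the authors, and it ignores the small difference in molar mass between casbene and GGOH.

```python
# (casbene, GGOH) titers in mg/L after 96 h, as reported for the integrated strains.
TITERS = {
    "CAS11": (24.6, 27.6),   # tHMGR (reference)
    "CAS12": (66.5, 63.4),   # tHMGR, PERG1-ERG20
    "CAS13": (85.5, 50.0),   # tHMGR, PHXT1-ERG20
    "CAS14": (49.1, 46.5),   # tHMGR, PERG1-ERG20, PERG1-ERG9
    "CAS15": (108.5, 79.9),  # tHMGR, PHXT1-ERG20, PERG1-ERG9
}

reference_casbene = TITERS["CAS11"][0]

summary = {}
for strain, (casbene, ggoh) in TITERS.items():
    summary[strain] = {
        "fold_casbene": casbene / reference_casbene,
        # Hypothetical metric: fraction of casbene + GGOH mass that is casbene.
        "casbene_share": casbene / (casbene + ggoh),
    }

best = max(TITERS, key=lambda s: TITERS[s][0])

for strain, row in summary.items():
    print(f"{strain}: {row['fold_casbene']:.1f}-fold casbene, "
          f"{row['casbene_share']:.0%} of casbene+GGOH as casbene")
print(f"Highest casbene titer: {best}")
```

Looking at both numbers together helps separate strains that make more GGPP overall from strains that convert a larger part of it into casbene.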

# DISCUSSION

In this study we focused on the redirection of carbon flux toward casbene production by (1) introduction of a new biosynthetic branch for production of GGPP starting from IPP and DMAPP and (2) dynamic control of ERG20 and ERG9 expression by means of glucose- and ergosterol-sensitive promoters.

To increase the pool of GGPP in yeast, we expressed the GGPP synthase domain of the fusicoccadiene synthase from P. amygdali. Similarly to other GGPP synthases from higher plants and bacteria (Vandermoten et al., 2009), this enzymatic domain synthesizes GGPP starting from one molecule of DMAPP and three molecules of IPP (Chen et al., 2016). In the yeast native pathway, GGPP is formed from the condensation of IPP and FPP, by action of the endogenous GGPP synthase Bts1p. FPP, in turn, is formed by the successive condensation of DMAPP with IPP units, catalyzed by FPP synthase, Erg20p. This requirement for two enzymatic steps for the production of GGPP represents a limiting factor, partially due to the low native activity of Bts1p (Jiang et al., 1995). Low-titer diterpene production starting from the endogenous GGPP pool has been previously reported (Zhou et al., 2012; Ignea et al., 2015) and was confirmed in this study by the very low levels of casbene produced in yeast expressing RcCBS (**Figure S1B**). Improved efficiency of the conversion of FPP to GGPP was shown to significantly benefit diterpene production (Zhou et al., 2012; Ignea et al., 2015). Expression of PaGGPPS in yeast introduced the direct conversion of IPP and DMAPP to GGPP and improved the efficiency of GGPP formation. Indeed, expression of PaGGPPS+RcCBS led to 32 mg/L casbene accumulation, a ∼40-fold improvement compared to cells expressing solely RcCBS and harnessing the endogenous source of GGPP. The substantial accumulation of GGOH in those cells clearly indicated the improved efficiency of GGPP biosynthesis and suggested that the flux toward diterpene synthesis could be further optimized.

To further increase the supply of IPP and DMAPP, we restricted the flux toward FPP and sterols by limiting the activity of Erg20p and Erg9p. These two enzymes catalyze FPP synthesis and the first step in sterol biosynthesis, respectively. Sterols are essential for yeast growth; it is therefore crucial to ensure the delivery of intermediates and end-products at levels that can guarantee proper cell function. Ergosterol, the major product of sterol biosynthesis, fulfills several vital cellular functions that require balanced sterol concentrations (Parks et al., 1995). Downregulation of ergosterol synthesis by means of weak or inducible and repressible promoters has been shown to be beneficial for isoprenoid production, while negatively affecting yeast viability (Paradise et al., 2008; Fischer et al., 2011; Scalcinati et al., 2012; Ignea et al., 2014). Therefore, we engineered a dynamic control strategy to balance metabolism between diterpene formation and cell growth. PERG1 represents an ergosterol-sensitive promoter, previously shown to efficiently restrict ERG9 expression levels in order to boost amorphadiene production (FPP-derived) (Yuan and Ching, 2015). We applied a similar strategy to improve diterpene synthesis, extending dynamic control to ERG20, given the need to boost the IPP and DMAPP supply. Indeed, in our hands, the dynamic control of ERG9 alone did not yield the remarkable improvement that was reported for sesquiterpene production by Yuan and Ching (2015). The strains harboring such a modification (tHMGR, PERG1-ERG9) produced amounts of casbene comparable to the reference strain (tHMGR), at least in plasmid-based assays. Moreover, they accumulated the lowest titer of GGOH and reached lower cell densities. Enhancement of the mevalonate pathway (by tHMGR overexpression), together with ERG9 down-regulation in the absence of a sesquiterpene synthase or another draining route, possibly led to accumulation of toxic concentrations of pathway intermediates, such as FPP. The toxicity of isoprenoid precursors (including FPP) has previously been reported in E. coli (Martin et al., 2003; Sarria et al., 2014).

FIGURE 4 | Production of casbene and GGOH in engineered strains containing integrated PaGGPPS and RcCBS. Casbene (green bars) and GGOH (blue bars) produced by strains CAS11 (tHMGR), CAS12 (tHMGR, PERG1-ERG20), CAS13 (tHMGR, PHXT1-ERG20), CAS14 (tHMGR, PERG1-ERG20, PERG1-ERG9), and CAS15 (tHMGR, PHXT1-ERG20, PERG1-ERG9). The corresponding OD600 values are represented by filled circles. Engineered strains were incubated in selective SC medium for 96 h (all data: mean ± SD, n = 3).

Dynamic control of ERG20 by PERG1 instead brought a significant improvement, leading to a 3-fold increase in production without negatively affecting growth. Titers were even 4-fold higher than in the control strain when both ERG9 and ERG20 expression were controlled by PERG1. Downregulation of ERG20 alone, similar to the ERG9 downregulation reported by Yuan and Ching, most likely created a metabolic balance in which gene expression was adjusted to the cellular requirement for ergosterol biosynthesis. Once the cell sensed an excess of ergosterol, ERG20 expression was decreased, resulting in redirection of the metabolic flux from IPP/DMAPP to GGPP and non-native diterpenoids. The concerted downregulation of both ERG20 and ERG9, although leading to the highest casbene accumulation, might have resulted in an excessively restricted flux toward ergosterol, which would explain the growth impairment (a 27% reduction in cell density) observed upon dynamic control of both ERG9 and ERG20.

Integration of the expression constructs for PaGGPPS and RcCBS in neutral loci of the yeast genome, together with dynamic control of ERG20 by means of the glucose-sensitive promoter of HXT1, further enhanced production of casbene. PHXT1 allows for moderate expression levels when glucose is present in the medium, but leads to gene repression when glucose concentration is low or absent. This promoter was previously used for dynamic control of ERG20 expression in strains engineered for improved production of geraniol (Zhao et al., 2017). When used for dynamic control of ERG20, PHXT1 had a beneficial effect on both production and growth compared to PERG1. Repression of expression by PHXT1 might be stronger than the one imposed by PERG1, leading to a higher supply of DMAPP and IPP for the heterologous GGPP synthase, thus explaining the higher titers. Moreover, since repression of expression mediated by PHXT1 occurs when glucose concentration is low, its control possibly maximized the carbon flux to ergosterol and biomass, thus explaining the higher cell densities. Combining dynamic control of ERG20 by PHXT1 with dynamic control of ERG9 by PERG1 led to the highest casbene titer of 108.5 mg/L and worked best to guarantee a proper flux distribution between cell growth and diterpene synthesis.
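The qualitative logic of the two promoters can be pictured with a toy model. The sketch below is purely illustrative: it assumes that promoter activity follows simple Hill-type response curves, and every function name, threshold (`k`) and Hill coefficient (`n`) is a hypothetical choice in arbitrary units, not a value measured or fitted in this study.

```python
def hill_repression(signal: float, k: float = 1.0, n: float = 2.0) -> float:
    """Relative promoter activity (0 to 1) that falls as the signal rises."""
    return 1.0 / (1.0 + (signal / k) ** n)


def hill_activation(signal: float, k: float = 1.0, n: float = 2.0) -> float:
    """Relative promoter activity (0 to 1) that rises with the signal."""
    ratio = (signal / k) ** n
    return ratio / (1.0 + ratio)


def perg1_activity(ergosterol: float) -> float:
    """Assumed PERG1 behavior: expression decreases when ergosterol is in excess."""
    return hill_repression(ergosterol)


def phxt1_activity(glucose: float) -> float:
    """Assumed PHXT1 behavior: expression when glucose is present, repression when it is depleted."""
    return hill_activation(glucose)


# Arbitrary-unit scan showing the opposite directions of the two responses.
for level in (0.0, 0.5, 1.0, 2.0, 10.0):
    print(f"signal={level:>4}: PERG1={perg1_activity(level):.2f}  "
          f"PHXT1={phxt1_activity(level):.2f}")
```

In this picture, an ERG20 gene driven by PHXT1 would be expressed while glucose supports growth and throttled once glucose runs out, whereas PERG1-driven expression would track the sterol status of the cell; the real promoters are certainly more complex than these two curves.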

Proper adjustment and usage of carbon source-sensitive promoters, to improve casbene or, more generally, diterpenoid production, opens up new perspectives for an extensive optimization of the culturing conditions during fermentation, which was beyond the scope of this study. For example, substantial improvements in isoprenoid biosynthesis have been reported when ethanol is provided as the carbon source (Westfall et al., 2012; Zhao et al., 2017). In the future, it would therefore be interesting to analyze the effect on casbene production of combining ERG20 expression under control of PHXT1 with feeding of ethanol or of mixtures of different carbon sources.

Overall these results showed remarkable improvements in diterpene production via an engineered heterologous route toward synthesis of GGPP, combined with dynamic regulation of competing pathways. However, a substantial accumulation of GGOH could still be observed in all our engineered strains, indicating that not all of the GGPP produced is channeled toward casbene biosynthesis. This may be due to the low efficiency of the casbene synthase which, similarly to other enzymes of the secondary metabolism of plants, has not evolved to synthesize high titers of products. During preparation of this manuscript, a casbene production of 160 mg/L was reported in a yeast strain expressing multiple copies of casbene synthase constructs, with improved solubility achieved by protein tagging strategies (Wong et al., 2017). Titers in our strains could most likely similarly be improved by increasing the copy number of the casbene synthase gene. Moreover, fusion between the GGPP synthase PaGGPPS and the casbene synthase RcCBS could possibly offer an enzyme with improved solubility and a more efficient substrate channeling that mimics the natural chimera fusicoccadiene synthase. Expression of such a chimera may further increase production levels.

In conclusion, although further optimization is needed, the design developed here provides a valuable strategy for sustainable production of casbene-derived chemicals. Moreover, the approach can easily be adapted to synthesis of other GGPP-derived compounds.

# AUTHOR CONTRIBUTIONS

RC designed experiments, performed strain constructions, extractions, analysis of samples, and drafted the manuscript. DR established the analysis methods. YM helped in construction of the integration strains. RC and HH coordinated the study. HH reviewed and edited the manuscript. All authors read and approved the final manuscript.

# FUNDING

This work was supported by the Danish Innovation Foundation funded project Plant Power: light-driven synthesis of complex terpenoids using cytochrome P450s (12-131834; project lead, Dr. Poul Erik Jensen, University of Copenhagen).

# ACKNOWLEDGMENTS

We thank Christophe Folly for support in initial GC-MS analysis, Michael Eichenberger for providing plasmid pCAS4, Dr Samantha Capewell for proofreading the manuscript and Professor Birger Møller for continuous support and for supplying the casbene standard.

#### REFERENCES


The content of this research was first reported and is adapted from the PhD thesis of RC (Callari, 2018).

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fbioe.2018.00160/full#supplementary-material


**Conflict of Interest Statement:** All authors were or are employees of the company Evolva SA. Evolva SA is listed on the Swiss stock exchange.

Copyright © 2018 Callari, Meier, Ravasio and Heider. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Models for Cell-Free Synthetic Biology: Make Prototyping Easier, Better, and Faster

#### Mathilde Koch<sup>1</sup>, Jean-Loup Faulon<sup>1,2,3</sup> and Olivier Borkowski<sup>1,2</sup>\*

<sup>1</sup> Micalis Institute, INRA, AgroParisTech, University of Paris-Saclay, Jouy-en-Josas, France, <sup>2</sup> Systems and Synthetic Biology Lab, CEA, CNRS, UMR 8030, Genomics Metabolics, University Paris-Saclay, Évry, France, <sup>3</sup> SYNBIOCHEM Center, School of Chemistry, Manchester Institute of Biotechnology, University of Manchester, Manchester, United Kingdom

Cell-free TX-TL is an increasingly mature and useful platform for prototyping, testing, and engineering biological parts and systems. However, to fully accomplish the promises of synthetic biology, mathematical models are required to facilitate the design and predict the behavior of biological components in cell-free extracts. We review here the latest models accounting for transcription, translation, competition, and depletion of resources as well as genome scale models for lysate-based cell-free TX-TL systems, including their current limitations. These models will have to find ways to account for batch-to-batch variability before being quantitatively predictive in cell-free lysate-based platforms.

#### Edited by:

Francesca Ceroni, Imperial College London, United Kingdom

#### Reviewed by:

Yuan Lu, Tsinghua University, China James T. MacDonald, Imperial College London, United Kingdom

\*Correspondence:

Olivier Borkowski olivier.borkowski@gmail.com

#### Specialty section:

This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology

Received: 03 September 2018 Accepted: 12 November 2018 Published: 29 November 2018

#### Citation:

Koch M, Faulon JL and Borkowski O (2018) Models for Cell-Free Synthetic Biology: Make Prototyping Easier, Better, and Faster. Front. Bioeng. Biotechnol. 6:182. doi: 10.3389/fbioe.2018.00182

Keywords: mathematical model, cell-free, prototyping, resource competition, transcription, translation, metabolism

## INTRODUCTION

All the processes required to produce proteins in bacteria can be performed by adding DNA to a cell-free platform. After lysis of living cells, transcription, translation, degradation, and protein folding continue to operate as they do in vivo (Hodgman and Jewett, 2012; Sun et al., 2013; Takahashi et al., 2015a). Metabolic pathways such as glycolysis or the pentose phosphate pathway remain active and are used to regenerate ATP and maximize protein production over time (Kim and Swartz, 2001; Calhoun and Swartz, 2005). Protein production outside of the cell simplifies gene expression, with well-defined parameters, easy-to-control inputs, a faster time scale, and fewer unknown interactions. As a result, many laboratories use cell-free systems as a prototyping platform to characterize the expression of single proteins or complex metabolic pathways (Takahashi et al., 2015b; Wu et al., 2017; Borkowski et al., 2018). Mathematical models dedicated to cell-free systems have emerged to predict protein production and understand the limits of this new platform. Cell-free properties are close to those of living organisms, as the same processes take place in both systems, yet significant differences exist. For example, molecular crowding (Spruijt et al., 2014) and resource distribution (Sun et al., 2013) are significantly altered in cell-free systems, and there is no resource competition with the host. Such differences oblige synthetic biologists to adapt the models already developed for living cells. This short review focuses on the recent deterministic models developed to understand lysate-based cell-free platforms and used to predict the behavior of simple or complex pathways (**Table 1**). These models pave the way for efficient metabolic engineering in the emerging field of cell-free synthetic biology.


\*Simple stands for one protein produced and a limited number of parameters (<10). Complex stands for more than one protein produced and/or a large number of parameters (more than 10).

### TRANSLATION AND TRANSCRIPTION PROCESSES IN CELL-FREE

Lysate-based cell-free systems consist of a crude cell extract supplemented with buffer, amino acids, NTP, NAD, PEG, tRNA, and metabolic intermediates (Sun et al., 2013). A major advantage of cell-free systems is the absence of host regulation (Hodgman and Jewett, 2012), which allows circuits to function in isolation and makes a quantitative description of gene expression straightforward. A constitutively expressed gene in cell-free exhibits specific patterns at the translation and transcription levels. Protein production can be divided into four phases: in phase 1, the production rate increases over time; in phase 2, the production rate is constant for 30 min to 1 h; in phase 3, the production rate decreases slowly; and in phase 4, the production rate is null (Siegal-Gaskins et al., 2014; **Figure 1A**). A similar four-phase pattern is observed for the mRNA concentration (Siegal-Gaskins et al., 2014; **Figure 1B**). ODE models describing transcription, translation, and mRNA and protein degradation at various scales have been successfully used to predict mRNA and protein dynamics in lysate-based systems (Stögbauer et al., 2012; Tuza et al., 2013; Siegal-Gaskins et al., 2014; Nieß et al., 2017; Borkowski et al., 2018; Matsuura et al., 2018; Moore et al., 2018). DNA concentrations are usually considered constant: degradation is neglected because plasmid DNA or protected linear templates are used, and replication is assumed not to occur since no dNTPs are added to the reaction mix.

Numerous models, with varying degrees of complexity, try to reproduce those production phases observed in cell-free reactions.

A simple model based on only 4 reactions and 10 parameters is sufficient to fit the full mRNA and protein dynamics during the first hours of reaction (Karzbrun et al., 2011). The transcription process is reduced to one step in which the RNA polymerase binds to the DNA; the rate of mRNA production depends only on this binding rate and the DNA length. Similarly, the translation process is described as one binding step of the ribosome on the mRNA, with the rate of protein production depending only on the binding rate and the mRNA length. This model is appropriate for the first hours, before the consumption of resources and/or the accumulation of waste (e.g., ATP degradation, toxic metabolites) causes the reaction to stop (Siegal-Gaskins et al., 2014). A simple way to simulate the slow decay in synthesis is to model the consumption of NTP over time, so that the transcription reaction slows down and eventually stops (Stögbauer et al., 2012; Tuza et al., 2013). Decreasing the NTP concentration (Kim and Swartz, 2001; Jewett et al., 2008; Moore et al., 2018) is an efficient way to obtain decreasing transcription over time and to simulate protein and mRNA production in cell-free systems, but no experimental data either confirm or refute this approach. The accumulation of inactive RNA polymerase/ribosome (Failmezger et al., 2016; Moore et al., 2018), the accumulation of toxic metabolites (Kim and Swartz, 2000), or an increase in the RNase concentration relative to the total amount of mRNA (Siegal-Gaskins et al., 2014) are other possible explanations for the arrest of protein production after 8 h. Voyvodic et al. (2018) added terms corresponding to decreasing resources for protein production and accumulation of toxic byproducts, expressed as a reduction in production rates parametrized by a Michaelis-Menten-like ratio, an elegant way to account for the slowing production rate. As with all cell-free models that try to account for the end of production after 8 h, the main issue is identifying the exact cause of the decrease in production.
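As a rough illustration of the coarse-grained models discussed above, the sketch below integrates a minimal mRNA/protein model in which transcription is scaled by a depleting NTP pool through a Michaelis-Menten-like ratio. It is not the model of any of the cited studies: the function name `simulate_txtl`, the rate laws, and all parameter values are assumptions chosen only to reproduce the qualitative rise and arrest of production.

```python
def simulate_txtl(hours=10.0, dt=0.001, dna=1.0, k_tx=2.0, k_tl=5.0,
                  d_m=1.5, ntp0=100.0, k_ntp=20.0, ntp_cost=10.0):
    """Forward-Euler integration of a toy TX-TL model (illustrative values).

    dm/dt   = k_tx * dna * f(ntp) - d_m * m      (mRNA)
    dp/dt   = k_tl * m                           (protein, assumed stable)
    dntp/dt = -ntp_cost * k_tx * dna * f(ntp)    (NTP consumed by transcription)
    f(ntp)  = ntp / (k_ntp + ntp)                (Michaelis-Menten-like ratio)
    """
    m, p, ntp = 0.0, 0.0, ntp0
    steps = int(round(hours / dt))
    trace = []
    for i in range(steps + 1):
        trace.append((i * dt, m, p, ntp))
        f = ntp / (k_ntp + ntp)            # resource availability factor
        tx = k_tx * dna * f                # current transcription rate
        m_next = m + dt * (tx - d_m * m)   # synthesis minus degradation
        p_next = p + dt * (k_tl * m)       # translation from existing mRNA
        ntp_next = max(0.0, ntp - dt * ntp_cost * tx)
        m, p, ntp = m_next, p_next, ntp_next
    return trace


if __name__ == "__main__":
    trace = simulate_txtl()
    for t, m, p, ntp in trace[::2000]:     # print one point every 2 h
        print(f"t={t:4.1f} h  mRNA={m:6.3f}  protein={p:7.3f}  NTP={ntp:7.3f}")
```

In this sketch, protein accumulation slows only because the NTP pool empties; replacing `f` with a term for inactive machinery or toxic byproducts would give a similar curve, which is exactly why the cause of the arrest is hard to identify from expression data alone.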

Models using Michaelis–Menten kinetics also succeed in capturing the protein production pattern in cell-free systems (Stögbauer et al., 2012). Those models precisely capture the observable mRNA and protein dynamics while remaining relatively coarse-grained. The model of Tuza et al. adds extra steps to the transcription (and translation) process, with a reversible binding of the RNA polymerase (ribosome) to the DNA (mRNA) followed by a reversible binding of the first NTP (amino acid) and eventually an irreversible elongation step. Moore et al. also developed a model accounting for reversible binding, unbinding, and elongation steps, sharing the NTP energy source. These more detailed descriptions of the transcription and translation processes lead to accurate predictions of the data obtained in cell-free systems and capture additional properties (Stögbauer et al., 2012; Tuza et al., 2013; Borkowski et al., 2018; Moore et al., 2018). For example, predicting the non-additive cost of protein production when several genes are expressed requires a higher level of complexity (Borkowski et al., 2018).

While all models presented in this section describe transcription and translation processes, the main challenge they face is proper parameter identification, as biochemical parameters can vary widely from batch to batch and between in vivo and cell-free systems. Currently, models often use component concentrations measured in vivo and estimate the corresponding cell-free concentrations from the dilution factor of the E. coli cytoplasm after the lysate extraction protocol, which is not entirely satisfactory.

## RESOURCE COMPETITION IN CELL-FREE

Resource competition is an important phenomenon that impacts circuit behavior in cell-free systems and should be accounted for in modeling approaches.

As a fixed amount of resources is present in the cell-free extract, competition has been measured between synthetic circuits (Siegal-Gaskins et al., 2014; Borkowski et al., 2018; Moore et al., 2018). Some of the previously described models take into account the limitation of each resource and include a fixed amount of transcription and translation machineries to predict the impact of resource competition in cell-free (**Figure 2A**; Siegal-Gaskins et al., 2013; Borkowski et al., 2018; Moore et al., 2018).

Maximal protein production is measured after a few hours, before resource depletion and degradation (**Figure 1**). This upper limit on the production rate is the result of one or several limited resources (RNA polymerase, NTP, ribosome, elongation factors, amino acids, chaperone, tRNA synthetase, or tRNA). DNA, NTP, amino acids, and T7 RNA polymerase are directly added to the mix, so their impact on protein production can be easily measured. Increasing the DNA concentration leads to an increase in protein production until a saturation point is reached (Siegal-Gaskins et al., 2014; Borkowski et al., 2018; Voyvodic et al., 2018), and toxicity can be observed at high DNA concentrations (Borkowski et al., 2018). T7 polymerase (Siegal-Gaskins et al., 2014), amino acids, tRNA, and nucleotides (Shin and Noireaux, 2010) are present in excess in the cell-free mix, causing no noticeable competition for these resources. Finally, a high NTP concentration negatively affects the translation process (Nagaraj et al., 2017). The natural transcription and translation machineries are less controlled, as they are added via the crude extract. Indirect measurements using competition for resources between two plasmids are used to deduce competition for transcription and translation machineries (Underwood et al., 2005; Siegal-Gaskins et al., 2014; Gyorgy and Murray, 2016; Borkowski et al., 2018; Moore et al., 2018). The main source of competition can be at the transcription and/or translation level, depending on the extract and the level of protein produced (Underwood et al., 2005; Li et al., 2014; Siegal-Gaskins et al., 2014; Gyorgy and Murray, 2016; Borkowski et al., 2018; Moore et al., 2018; Voyvodic et al., 2018). Parametrizing the models with the appropriate RNA polymerase and ribosome concentrations and binding/unbinding rates allows an accurate description of resource competition and a proper fit of the production of several proteins expressed concurrently in cell-free (Siegal-Gaskins et al., 2014; Borkowski et al., 2018; Moore et al., 2018). Accounting for the sharing of RNAP and ribosomes between parts can also be leveraged to minimize the number of experiments required to fit parameters and obtain a predictive model (Halter et al., 2018). While not the main focus of this review, parameter estimation or identification is a major hurdle of detailed models, and techniques from systems biology (e.g., Lillacci and Khammash, 2010) can be used to tackle this issue.

The models presented in this section, while able to account for transcription, translation, and resource competition, suffer from a lack of generalizability. This is due both to variability in experimental conditions, as batches can differ greatly, and to the scarcity of biochemical work measuring those parameters in a cell-free setting, so that they have instead been estimated from in vivo measurements.
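The effect of a fixed machinery pool can be sketched with a simple conservation law. The example below assumes that each mRNA species binds ribosomes with a saturating (Michaelis-Menten-like) occupancy and that the total ribosome amount is conserved; it then solves for the free ribosome level by bisection. The function names, the binding law, and the numbers are hypothetical and serve only to show why adding a second gene lowers the output of the first.

```python
def free_ribosomes(r_total, mrnas, k_d, iterations=100):
    """Solve r_total = r_free + sum_i m_i * r_free / (K_i + r_free) by bisection.

    The left-hand side grows monotonically with r_free, so the root in
    [0, r_total] is unique.
    """
    lo, hi = 0.0, r_total
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        bound = sum(m * mid / (k + mid) for m, k in zip(mrnas, k_d))
        if mid + bound > r_total:
            hi = mid   # too many ribosomes accounted for: lower the guess
        else:
            lo = mid
    return 0.5 * (lo + hi)


def translation_rates(r_total, mrnas, k_d, k_cat=1.0):
    """Per-gene translation rate, proportional to ribosome-bound mRNA."""
    r_free = free_ribosomes(r_total, mrnas, k_d)
    return [k_cat * m * r_free / (k + r_free) for m, k in zip(mrnas, k_d)]


if __name__ == "__main__":
    alone = translation_rates(1.0, [2.0], [0.5])
    shared = translation_rates(1.0, [2.0, 2.0], [0.5, 0.5])
    print("gene A alone:      ", alone)
    print("genes A and B both:", shared)
```

With these illustrative values, the second gene raises the total output but by less than a factor of two, and the output of the first gene drops: the load is non-additive. The same construction applies to a shared RNA polymerase pool at the transcription level.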

#### METABOLISM IN CELL-FREE

The models presented in this section are constraint based, so as to take the whole metabolism into account and not the circuit in isolation as done in the previous sections.

In cell-free platforms, translation and transcription are not the only active processes. Glycolysis, the pentose phosphate pathway, the TCA cycle, the Entner-Doudoroff pathway, and amino acid biosynthesis are still producing ATP, reducing equivalents, and amino acids (Kim and Swartz, 2001; Jewett and Swartz, 2004a,b). The previously described models account for resource competition for a fixed amount of transcription and translation resources and usually do not include any metabolite production or consumption. Such an approach is quite limiting for metabolic pathway prototyping, as those circuits also compete for metabolites (Wu et al., 2017; Borkowski et al., 2018). Constraint-based models have been used to simulate metabolite production and consumption when various proteins are produced at different levels (Vilkhovoy et al., 2018; **Figure 2B**). This model coupled transcription and translation processes with the availability of metabolic resources. Flux balance analysis was adapted to cell-free conditions, with the objective function being the maximization of the protein translation rate. Starting from an E. coli metabolic model, growth-associated reactions were removed and cell-free-specific deletions were applied, leading to 264 reactions and 146 species (Vilkhovoy et al., 2018). The stoichiometric network was adjusted to cell-free conditions, and fluxes were constrained by experimental measurements of glucose, nucleotide, amino acid, and organic acid consumption and production rates. The transcription and translation rates were bounded by Michaelis-Menten expressions: the maximum transcription rate depends on the RNA polymerase concentration, the RNA polymerase elongation rate, the gene length, and the promoter strength, while the maximum translation rate depends on the ribosome concentration, a polysome amplification constant, the ribosome elongation rate, the protein length, and the RBS strength. The energy efficiency was calculated using the ATP cost of the transcription and translation processes. Transcription and translation rates are subject to resource constraints encoded by the metabolic network (**Figure 2B**). This model efficiently predicts protein production and simulates the optimal flux distribution in the cell-free metabolic network. It makes predictions possible for metabolic engineering in cell-free, as metabolites produced or consumed by a pathway will be accounted for via its energy efficiency.
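The optimization described above can be written compactly as a linear program. The formulation below is a generic flux balance statement under the assumptions given in the text; the symbols are notation chosen here for illustration and are not taken from Vilkhovoy et al. (2018).

$$
\begin{aligned}
\max_{v} \quad & v_{\mathrm{TL}} \\
\text{subject to} \quad & S\,v = 0, \\
& l_j \le v_j \le u_j \quad \text{for each reaction } j, \\
& 0 \le v_{\mathrm{TX}} \le \hat{v}_{\mathrm{TX}}, \qquad 0 \le v_{\mathrm{TL}} \le \hat{v}_{\mathrm{TL}}
\end{aligned}
$$

Here $S$ is the stoichiometric matrix (146 species by 264 reactions in the model cited above), $v$ is the vector of reaction fluxes, $l_j$ and $u_j$ are the lower and upper bounds derived from measured consumption and production rates, and $v_{\mathrm{TX}}$ and $v_{\mathrm{TL}}$ are the transcription and translation fluxes. The upper bounds $\hat{v}_{\mathrm{TX}}$ and $\hat{v}_{\mathrm{TL}}$ stand for the Michaelis-Menten-type maximum rates; their exact functional form is left unspecified here. The steady-state constraint $S\,v = 0$ is also what prevents this formulation from capturing metabolite exhaustion over time.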

Constraint based modeling for cell-free systems is an interesting field that would need further developments from the research community, both to include cell-free specific constraints and reactions, as well as to account for dynamic behavior such as metabolite exhaustion in cell-free systems.



### CONCLUSION

Cell-free systems have emerged as an ideal platform for circuit prototyping. They accelerate characterization and avoid the impact of the host on circuit behavior. Models can be easily parametrized, and predictions of qualitative behavior are easier and more accurate than in vivo. Parametrization for quantitative behavior can be tackled using techniques from systems biology. Simple models succeed in accurately predicting the simultaneous production levels of multiple proteins and the competition for the limited amount of resources in cell-free systems. A certain level of complexity is necessary to capture competition for metabolites, but it yields a powerful tool for metabolic engineering. The main limitation for lysate-based cell-free systems in metabolic engineering and modeling remains extract preparation: extract efficiency can differ strongly depending on the experimentalist, leading to variability in protein production and the need for robust controls for each new batch, as well as uncertain parameters that vary with each batch for the modeler. Preliminary control of the extract quality and tuning of model parameters on each batch are required to obtain accurate predictions, which precludes generalization. A way forward to increase both reproducibility and predictive modeling in cell-free systems would be a higher degree of automation in extract production, providing robust lysate preparation at an affordable price.

# AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

# FUNDING

J-LF acknowledges support from BBSRC/EPSRC (grant number BB/M017702/1) and from the ANR (grant number ANR-15-CE1-0008). MK is supported by DGA (French Ministry of Defense) and Ecole Polytechnique. OB is supported by Genopole Allocation Recherche 2017 and CRI Paris Short-term Fellows. We thank Dr. Melchior du Lac for his technical help.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Koch, Faulon and Borkowski. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# How Synthetic Biology and Metabolic Engineering Can Boost the Generation of Artificial Blood Using Microbial Production Hosts

#### August T. Frost<sup>1</sup>, Irene H. Jacobsen<sup>1</sup>, Andreas Worberg<sup>2</sup> and José L. Martínez<sup>1</sup>\*

<sup>1</sup> Section of Synthetic Biology, Department of Biotechnology and Biomedicine, Technical University of Denmark (DTU), Lyngby, Denmark, <sup>2</sup> Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark (DTU), Lyngby, Denmark

#### Edited by:

Rodrigo Ledesma-Amaro, Imperial College London, United Kingdom

#### Reviewed by:

Rosario Gil, University of Valencia, Spain Pasquale Stano, University of Salento, Italy

\*Correspondence:

José L. Martínez jlmr@dtu.dk

#### Specialty section:

This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology

Received: 28 September 2018 Accepted: 15 November 2018 Published: 30 November 2018

#### Citation:

Frost AT, Jacobsen IH, Worberg A and Martínez JL (2018) How Synthetic Biology and Metabolic Engineering Can Boost the Generation of Artificial Blood Using Microbial Production Hosts. Front. Bioeng. Biotechnol. 6:186. doi: 10.3389/fbioe.2018.00186

Hemoglobin is an essential protein to the human body as it transports oxygen to organs and tissues through the bloodstream (Looker et al., 1992). In recent years, there has been an increasing concern regarding the global supply of this vital protein, as blood availability cannot currently meet the high demands in many developing countries. There are, in addition, several risks associated with conventional blood transfusions such as the presence of blood-borne viruses like HIV and Hepatitis. These risks along with some limitations are presented in Figure 1 (Kim and Greenburg, 2013; Martínez et al., 2015). As an alternative, producing hemoglobin recombinantly will eliminate the obstacles, since hemoglobin-based oxygen carriers are pathogen-free, have a longer shelf-life, are universally compatible and the supply can be adjusted to meet the demands (Chakane, 2017). A stable, safe, and most importantly affordable production, will lead to high availability of blood to the world population, and hence reduce global inequality, which is a focus point of the World Health Organization for the millennium (WHO, 2018). Synthetic biology and metabolic engineering have created a unique opportunity to construct promising candidates for hemoglobin production (Liu et al., 2014; Martínez et al., 2016). This review sets out to describe the recent advances in recombinant hemoglobin production, the societal and the economic impact along with the challenges that researchers will face in the coming years, such as low productivity, degradation, and difficulties in scale-up. The challenges are diverse and complex but with the powerful tools provided by synthetic biology and metabolic engineering, they are no longer insurmountable. An efficient production of cell-free recombinant hemoglobin poses tremendous challenges while having even greater potential, therefore some possible future directions are suggested in this review.

Keywords: synthetic biology, metabolic engineering, yeast, protein production, blood substitutes, HBOC, recombinant hemoglobin

#### HISTORICAL PERSPECTIVE—SOCIETAL AND MEDICAL NEEDS

Since the first recorded blood transfusion in the ancient Inca civilization (fifteenth century), there have been countless attempts to create alternative blood substitutes to further improve the chance of surviving anemia (Sarkar, 2008). In the dawn of this development, elementary liquids such as sheep blood, urine, and beer were tested for the possibility to be a substitute for human blood. In the late nineteenth century, even cow milk was injected into patients with the belief that it would help regenerate white blood cells (Chen et al., 2009).

The earliest report of a hemoglobin-based blood substitute originates from the beginning of the 1930s, however it was not until 1957 that the first study on an artificial red blood cell composed by microencapsulated hemoglobin was performed (Li et al., 2005). After a stagnant period, the advancement was significantly stimulated by the onset of AIDS, hepatitis, and HIV infections in the 1980s, which imposed a radical risk to conventional blood transfusions (Kim and Greenburg, 2013; Martínez et al., 2015). Consequently, the following period was a golden age for the development of artificial substitutes that would be able to mimic the oxygen-carrying property of erythrocytes (Varnado et al., 2013; Martínez et al., 2015).

Through history, the pursuit of artificial blood substitutes has been heavily funded by the military to eliminate the supply, storage, and portability problems associated with human blood. However, the societal and medical need to find an alternative to conventional blood transfusion has escalated in recent years, especially in developing countries (Kim and Greenburg, 2013; Moradi et al., 2016). This is due to factors such as population growth, natural disasters, decreasing donor number, population aging, terror attacks, and the risk of blood-borne pathogens threatening the supply-demand balance of human blood (Looker et al., 1992; Varnado et al., 2013; Alayash, 2014; Moradi et al., 2016; Chakane, 2017).

The development of blood substitutes furthermore has the potential to remove logistical barriers to pre-hospital use in acute emergency situations and remote civilian locations, by enabling long-term storage, eliminating blood type matching, supplying adequate quantities, being "pathogen-free," and providing immediate availability in catastrophic scenarios (Chakane, 2017).

## RECENT ADVANCES

Recent breakthroughs in the fields of synthetic biology and metabolic engineering have substantially boosted the development of the so-called oxygen carriers, which can be divided into two main categories: perfluorocarbon-based substitutes (PFCs) and hemoglobin-based oxygen carriers (HBOCs) (Mozafari et al., 2015). While research on PFCs as blood substitutes has been partially discontinued due to their limited oxygen transfer capacity to tissues and toxicity problems, the capacity of HBOCs to mimic red blood cells in terms of oxygen transport while being less toxic has put the focus on them. Hemoglobin supply is, therefore, the first key issue for the successful development of HBOCs. The primary sources of hemoglobin for the current development of HBOCs are expired human blood, mammalian hemoglobins formed as a by-product in the meat industry (e.g., bovine hemoglobin), and recombinant hemoglobin produced in transgenic organisms (Motwani et al., 1996; Sanders et al., 1996; Varnado et al., 2013; Alayash, 2014; Moradi et al., 2016). The first two options are not optimal, as their availability is too limited to allow substantial development of HBOC research. For recombinant hemoglobin production, it is therefore paramount that new production hosts be designed with an increased and more efficient production capacity. Here, synthetic biology and metabolic engineering play key roles, both in designing better cell factories and in achieving affordable production costs.

Escherichia coli was the first choice as a production workhorse. In a first attempt, a single β-globin chain with a cleavable linker was expressed and subsequently refolded in vitro with a native α-globin chain and exogenous heme (Nagai and Thøgersen, 1987). In a follow-up experimental setup, α-globin and β-globin chains were co-expressed in vivo with endogenous heme incorporated (Shen et al., 1993). Although recombinant hemoglobin was successfully produced in bacteria, later advances focused on eukaryal hosts, following the discovery that the vital functions of the hemoglobin produced by bacterial hosts were altered, most likely due to the methionine residues at the termini of the globin chains. The alterations were identified as reduced Bohr and 2,3-BPG effects in the recombinant hemoglobin compared to normal human hemoglobin (Hoffman et al., 1990).

Thus, the production shifted to the model yeast Saccharomyces cerevisiae. Since then, synthetic biology and metabolic engineering have been applied to modify the heme biosynthetic pathway by overexpressing the genes coding for enzymes responsible for the rate-limiting steps (e.g., HEM3). Furthermore, the optimal globin expression balance between α and β chains was studied. In combination, the recombinant hemoglobin production was improved by an impressive 87% (Liu et al., 2014) compared to the previous attempts.

Apart from a sustained hemoglobin supply, key issues to consider for the further advancement of HBOCs are safety and cost-effectiveness. Non-clinical and clinical studies of HBOCs have raised safety concerns regarding hemoglobin extravasation across the blood vessel wall, scavenging of endothelial nitric oxide, oversupply of oxygen, and oxidative side reactions (Alayash, 2014). As a result, the regulatory agencies in the United States and the European Union have not yet approved any HBOCs (Chen et al., 2009; Varnado et al., 2013; Mozafari et al., 2015; Meng et al., 2018).

# ECONOMICAL PERSPECTIVE AND SOCIETAL IMPACT

A study performed in the United States has estimated that it is currently possible to produce recombinant human hemoglobin at a cost of approximately \$11/g. However, if operating expenses and equipment investments are included, the cost increases to ≥\$200/g. In comparison, human hemoglobin can be derived from alternative sources for roughly \$2/g to \$4/g (excl. post-related costs) based on fixed reimbursement prices of whole blood packs (Varnado et al., 2013).
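To put the quoted figures side by side, the short calculation below derives how many times cheaper recombinant hemoglobin would have to become to match the alternative sources. The per-gram costs are the ones cited above; the function and variable names are illustrative only.

```python
def fold_reduction_needed(current_cost, target_cost):
    """Factor by which the current cost must fall to reach the target cost."""
    return current_cost / target_cost


# Costs in USD per gram, as quoted in the text (Varnado et al., 2013).
production_only = 11.0       # recombinant hemoglobin, production cost only
fully_loaded = 200.0         # lower bound once operating and equipment costs count
alternatives = (2.0, 4.0)    # hemoglobin from alternative sources

for label, cost in (("production only", production_only),
                    ("fully loaded", fully_loaded)):
    folds = [fold_reduction_needed(cost, target) for target in alternatives]
    print(f"{label}: {min(folds):.2f}- to {max(folds):.2f}-fold reduction needed")
```

On production cost alone the gap is a few-fold, whereas the fully loaded cost is one to two orders of magnitude above the alternatives, which is why yield improvements alone are unlikely to close it.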

There is, consequently, a pressing need to significantly improve cost-effectiveness, either by reducing the production cost or by increasing the expression yield of recombinant hemoglobin more than 3-fold, in order to compete with the alternatives. Synthetic biology can contribute further optimizations to provide affordable recombinant hemoglobin at a price comparable to the alternatives in the future. WHO has created a set of goals to improve global health for the world population by 2020, called "HEALTH 2020" (WHO, 2018). Recombinant hemoglobin is highly relevant to one of these goals, which is to reduce inequality in access to the medicine of the twenty-first century and save lives.

# THE CHALLENGES OF RECOMBINANT HEMOGLOBIN PRODUCTION

The evolution of synthetic biology has given researchers the possibility to synthesize new genetic elements instead of transferring them from a donor organism (Stephanopoulos, 2012). Nowadays, entire genomes can be synthesized and inserted into a host organism (Kuo et al., 2018). Biotechnological research has also benefited from synthetic biology, as it synergizes with metabolic engineering. Synthetic biology provides genetic switches, vectors, characterized enzymes, and minimal hosts, all of which minimize the cost and the time needed for metabolic engineering (Keasling, 2012).

To successfully engineer novel cell factories, a number of frameworks can be applied; one of the most popular is the "Design-Build-Test-Learn" (DBTL) cycle (**Figure 2**). The cycle is an iterative process divided into four phases, which aims at satisfying a set of predetermined specifications. The first phase is the "Design"-phase, where the organism is designed in silico. That design is then constructed in the "Build"-phase. The third phase is the "Test"-phase, where the organism or construct is tested in terms of productivity, -omics, and growth characteristics. The final part is the "Learn"-phase, where the results from the "Test"-phase are evaluated and compared to the predetermined specifications set in the "Design"-phase. Often, multiple iterations of the cycle are needed to achieve the optimal organism and satisfy the specifications (Ando and Martin, 2018; Jensen and Keasling, 2018). Synthetic biology and metabolic engineering contribute mainly to the first two phases (Nielsen and Keasling, 2016).

# Higher Productivity

As previously described, the production of recombinant hemoglobin in a heterologous host is not yet economically feasible. Even without accounting for operating costs and the expense of establishing a production plant, the estimated costs would have to fall at least three-fold before the product could compete on the market. One of the challenges is low productivity. In 2016, Martínez et al. developed a recombinant S. cerevisiae-based cell factory. After a few iterations of the DBTL cycle, the constructed strain was able to synthesize functional human hemoglobin with the α-globin and β-globin genes expressed in optimal ratios; the maximum measured concentration of active human hemoglobin reached 7% of the total cell protein content (Martínez et al., 2016). To put this in perspective, this strain would be able to produce 200 kg of hemoglobin in a 100 m<sup>3</sup> tank, assuming the highest theoretical yield. Considering that one liter of human blood contains 150 grams of hemoglobin, the equivalent of roughly 1,300 liters of blood would be produced in one batch. These are, however, ideal conditions in which upscaling limitations and product recovery are not considered. To reach these theoretical numbers at production scale, a higher productivity needs to be achieved at laboratory scale. Enhancing productivity in a heterologous organism is a task in the "Design" phase of the DBTL-cycle. Classically, a great deal of work and manpower would be required to plan and test hypotheses that might lead to higher productivity. In recent years, both the number and the variety of accurate open-end genome-scale models have increased (Nielsen and Keasling, 2016). Applying these models to rational design can save time when optimizing productivity. It is now possible to find a robust, open-end genome-scale model of S. cerevisiae that can be customized to include the genetic modifications implemented in previous research. Different strategies can be simulated, and the simulation will reveal whether a strategy succeeds in enhancing productivity. Bottlenecks and feedback inhibitions can be identified and hence eliminated (Nielsen and Keasling, 2016). Ultimately, only strategies that are successful in silico will be carried out experimentally.
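The batch estimate in this section is simple arithmetic, and the short sketch below reproduces it. The 200 kg batch yield, the 100 m<sup>3</sup> tank volume and the 150 g of hemoglobin per liter of blood are taken from the text; the function names are illustrative only, and the implied titer is a derived figure rather than a reported measurement.

```python
def blood_equivalent_liters(hemoglobin_kg, hb_per_liter_blood_g=150.0):
    """Liters of blood that contain the same mass of hemoglobin."""
    return hemoglobin_kg * 1000.0 / hb_per_liter_blood_g


def implied_titer_g_per_l(hemoglobin_kg, tank_m3):
    """Hemoglobin concentration implied by a batch yield and a tank volume."""
    return hemoglobin_kg * 1000.0 / (tank_m3 * 1000.0)


batch_kg = 200.0   # theoretical batch yield stated in the text
tank_m3 = 100.0    # tank volume stated in the text

blood_l = blood_equivalent_liters(batch_kg)
titer = implied_titer_g_per_l(batch_kg, tank_m3)
print(f"Blood equivalent: {blood_l:.0f} L, implied titer: {titer:.1f} g/L")
```

The result of about 1,333 L is consistent with the rounded figure quoted above, and the implied titer of 2 g/L gives a sense of the laboratory-scale target that such a batch would require.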

# Intracellular Degradation and Negative Feedback Circuits

In the pursuit of higher productivity, undesired trade-offs often arise. Typically, the main issue is the degradation of precursors through inherent feedback regulation, and recombinant hemoglobin production is unfortunately no exception. Heme is essential to the yeast cell as a signaling molecule and co-factor, but it is also cytotoxic at higher concentrations because it can generate reactive oxygen species (ROS). To prevent accumulation and thereby toxicity, heme biosynthesis is tightly controlled and coordinated with heme degradation in well-understood processes (Martínez et al., 2016; Hanna et al., 2018). When heme is bound in hemoglobin, ROS are no longer formed, and heme is not degraded (Liu et al., 2014).

Rapidly forming hemoglobin would therefore remove unbound heme from the mitochondrion and hence stop degradation. Thus, the key is to have an abundance of α and β chains ready for assembly. In the "Test" phase, proteomics combined with advances in mass spectrometry can assist in quantifying the chains and thereby help identify possible limitations in the following "Learn" phase (Trent, 2012). Other ways to reduce degradation involve gene knock-outs. Candidates include genes involved in (i) the degradation of heme initiated by iron depletion to recover the iron (Philpott and Protchenko, 2008) and (ii) vacuolar missorting and vacuolar degradation of mature hemoglobin (Ammerer et al., 1986; Marcusson et al., 1994; Hecht et al., 2014). Gene editing is often tedious and limiting; however, the introduction and refinement of the CRISPR-Cas9 system can drastically shorten the "Build" phases in which multiple knock-outs or insertions are needed. The method allows for gene disruption at up to five different loci in S. cerevisiae with a high success rate. Moreover, the method is marker-free (Jakočiūnas et al., 2015).

# Oxygen Limitation and Scale-Up

Hemoglobin synthesis in yeast is induced by high levels of extracellular oxygen (Martínez et al., 2016). In production-scale vessels, adequate oxygen supply and transfer pose a problem because of the low solubility of oxygen in submerged cultures. To ensure a linear scale-up of oxygen availability, other parameters need to be modified, e.g., agitation, which can lead to shear stress (Reuss, 2008; Garcia-Ochoa and Gomez, 2009). Switching the point of view from process engineering to genetic and metabolic engineering could solve the problem. In 2015, Martínez et al. deleted the heme activator protein gene (HAP1) in S. cerevisiae and found that heme production increased due to the removal of a feedback inhibition. Another measured effect of the HAP1 deletion is reduced respiration (Martínez et al., 2015). In the ideal scenario, large-scale production could be established with a non-respiring strain in which the presence of oxygen would initiate heme synthesis. The feasibility of this strategy can be predicted from model simulations rather than from real-life cultivations at production scale.

# Ethanol Production

The Crabtree-positive nature of S. cerevisiae (meaning its ability to produce ethanol from glucose under aerobic conditions) is one of the challenges in recombinant hemoglobin production for two reasons: (i) yields are lower because carbon is diverted into ethanol production (Dai et al., 2018) and (ii) yields of hemoglobin are lower on ethanol than on glucose (Liu et al., 2014). The code to engineering a Crabtree-negative S. cerevisiae strain has finally been cracked by Dai et al., who used state-of-the-art synthetic biology and molecular biology tools. The group reverse engineered a strain that had evolved under adaptive laboratory evolution. The resulting strain is the first Crabtree-negative S. cerevisiae reported to be able to grow at high glucose concentrations (Dai et al., 2018).

This particular strain, in combination with the strategies stated above, could become the future S. cerevisiae-based hemoglobin producer. Some challenges could arise, as the Crabtree-negative S. cerevisiae needs mitochondrial respiration to reoxidize cofactors, whereas knocking out HAP1 to avoid respiration is beneficial in recombinant hemoglobin production (Martínez et al., 2015; Dai et al., 2018). Simulations with an open-end genome-scale model can indicate whether further design work is needed on cofactor reoxidation or compartmentalization.

## Recovery and Purification

Recovery and purification of hemoglobin from S. cerevisiae differs from the challenges described above, as it requires a new iteration of the DBTL-cycle. All the recombinant hosts designed to date store the produced hemoglobin intracellularly. To recover the intracellular hemoglobin, numerous unit operations are required (Bulmer et al., 1996; Stocker-Majd et al., 2008). Here we will not discuss how to solve the task by process engineering, but rather the solutions that can be created using synthetic biology and metabolic engineering.

The rational approach sounds simple: secrete the hemoglobin and recover it from the fermentation broth. Although S. cerevisiae is a desirable platform for heterologous protein production, the yeast secretes only a few of its native proteins. To secrete hemoglobin successfully, the protein needs the correct signal peptide, and genes involved in protein translocation should be upregulated (Tang et al., 2015; Bao et al., 2017). The combined powers of synthetic biology and metabolic engineering can decrease the amount of time and the number of experiments needed. Combinations of genes and signal peptides can be modeled using genome-scale models to find the optima. The material for the modifications can be synthetically generated or taken directly from a BioBrick library (Constante et al., 2011), and lastly, CRISPR-Cas9 can serve as an efficient and marker-free method for gene insertions or deletions (Jakočiūnas et al., 2015). However, achieving secretion of multimeric proteins is a complex task. Synthetic biology can contribute through the design of novel protein structures that facilitate secretion, and hence downstream processing, while keeping the functional properties intact. As an example, several strategies are currently being pursued to design fusion hemoglobin molecules as a single peptide that keeps a domain-like structure resembling the multimeric structure.

### Novel Production Hosts

Experimenting with novel organisms is no longer just a wish from the researchers: new technologies have made the wish come true. With BioBricks and minimal hosts, organisms can be constructed and customized like mechanical objects—such as airplanes (Constante et al., 2011; Keasling, 2012). BioBricks are standardized which means that they are modular and can be assembled in any way desired, and can be ordered from the Standard Registry of Biological Parts which is a growing collection (Constante et al., 2011). It is not farfetched to envision that a paradigm shift will occur in the near future of biotechnology. Researchers will be able to design the organism that they want instead of working around native shortcomings.

All of these technologies can widen the array of possible cell factories. For the production of recombinant hemoglobin, this means that the production platform will no longer be restricted to S. cerevisiae. The performance of more organisms can be tested both theoretically (using models) and experimentally (using well-defined modular components) to find the optimal one. With the recent advances in high-throughput micro-fermentation systems, entirely new strains can be designed, constructed and tested rapidly (Kensy et al., 2009).


# CONCLUDING REMARKS

The implementation and continuous development of synthetic biology and metabolic engineering tools have enabled researchers to move the boundaries of what cell factories can do. The tools can be used not only to create new "super-drugs" but also to make well-known treatments more widely available.

Producing hemoglobin recombinantly in yeast has proven feasible. The challenges which still remain are considered solvable with the combined forces of knowledge, synthetic biology and metabolic engineering. This review has described some of the possible strategies to further improve yeast-based cell factories.

Currently, commercial production of recombinant hemoglobin is too expensive to match the price of blood-derived hemoglobin. The cost must still decrease three-fold, but once strains have been optimized and large production plants have been established, hemoglobin can be produced at large scale. Yeasts are easy to cultivate and require only cheap substrates to produce hemoglobin. It is also important to note that recombinant hemoglobin is not associated with the same risks as blood-derived hemoglobin in terms of pathogens and unstable supply.

All in all, the production of recombinant hemoglobin can ensure access to safe and affordable blood substitutes for the entire world population. Ultimately, the production of recombinant hemoglobin via synthetic biology will decrease inequality and help reach the goals set by WHO.

# AUTHOR CONTRIBUTIONS

AF, IJ, and JM wrote the manuscript. AW and JM edited and supervised the final version.

# FUNDING

This work is supported by the Novo Nordisk Fonden within the framework of the Fermentation Based Biomanufacturing Initiative.

# ACKNOWLEDGMENTS

The authors would like to acknowledge funding from the Novo Nordisk Fonden within the framework of the Fermentation Based Biomanufacturing education initiative.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Frost, Jacobsen, Worberg and Martínez. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# CopySwitch—in vivo Optimization of Gene Copy Numbers for Heterologous Gene Expression in Bacillus subtilis

#### Florian Nadler, Felix Bracharz and Johannes Kabisch\*

Computer-Aided Synthetic Biology, Institute of Biology, Technische Universität Darmstadt, Darmstadt, Germany

#### Edited by:

Rodrigo Ledesma-Amaro, Imperial College London, United Kingdom

#### Reviewed by:

Stefan Junne, Technische Universität Berlin, Germany Jochen Schmid, Technische Universität München, Germany Katrin Messerschmidt, Universität Potsdam, Germany

> \*Correspondence: Johannes Kabisch johannes@kabisch-lab.de

#### Specialty section:

This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology

Received: 17 October 2018 Accepted: 13 December 2018 Published: 08 January 2019

#### Citation:

Nadler F, Bracharz F and Kabisch J (2019) CopySwitch—in vivo Optimization of Gene Copy Numbers for Heterologous Gene Expression in Bacillus subtilis. Front. Bioeng. Biotechnol. 6:207. doi: 10.3389/fbioe.2018.00207

The Gram-positive bacterium Bacillus subtilis has long been used as a host for production and secretion of industrially relevant enzymes like amylases and proteases. For optimal efficiency, it is imperative to balance protein yield and correct folding. While there are numerous ways of doing so on the protein or mRNA level, our approach targets the underlying number of coding sequences. Gene copy numbers are an important tuning valve for the optimization of heterologous gene expression. While some genes are best expressed from many gene copies, for other genes, medium or even single copy numbers are the only way to avoid formation of inclusion bodies or toxic gene dosage effects, or to achieve the levels desired for metabolic engineering. In order to provide a simple and robust method to address the above-mentioned issues in the Gram-positive bacterium Bacillus subtilis, we have developed an automatable system for the tuning of heterologous gene expression based on the host's intrinsic natural competence and homologous recombination capabilities. Strains are transformed with a linearized, low copy number plasmid containing an antibiotic resistance marker and homology regions up- and downstream of the gene of interest. Said gene is copied onto the vector, rendering it circular and replicative and thus selectable. We could show an up to 3.6-fold higher gfp (green fluorescent protein) expression and up to 1.3-fold higher mPLC (mature phospholipase C) expression after successful transformation. Furthermore, the plasmid-borne gfp expression seems to be more stable, since over the whole cultivation period the share of fluorescent cells compared to all measured cells is consistently higher. A major benefit of this method is the ability to work with very large regions of interest, since all relevant steps are carried out in vivo and are thus far less prone to mechanical DNA damage.

Keywords: Bacillus subtilis, copy number change, gene expression tuning, heterologous expression, natural competence

# INTRODUCTION

Heterologous gene and pathway expression remains a task mostly approached by trial and error, making use of a broad palette of available optimization strategies (Stevens, 2000). The goal of most approaches is to find the optimal level of gene expression for the maximum amount of active protein without the formation of inclusion bodies. There are special cases, where inclusion body formation is wanted for easier protein purification, but these are only applicable in combination with specific refolding strategies. In prokaryotic gene expression, optimization implies finding a suited balance between transcription and translation speed on the one hand and correct protein folding on the other hand. This can be achieved by reducing the temperature during protein production with the drawback of increased process times (Sørensen and Mortensen, 2005).

Advanced methods use libraries of constructs, transcribed and translated with different strengths by varying the promoter (Kraft et al., 2007), the ribosome binding site (Kohl et al., 2018) or by fusion with peptides aiding in correct folding (Butt et al., 2005; Kraft et al., 2007). Another approach is to vary the copy number of the genetic constructs. As shown before, the number in which a gene is present in a cell is correlated with the expression level of said gene (Lee et al., 2015). In the most prominent host Escherichia coli, plasmids with different copy numbers are available such as the high copy number pUC vectors (Yanisch-Perron et al., 1985) and the low copy number pBR322 based vectors (Bolivar et al., 1977) and assisted integration into the chromosome is possible through recombineering (Sharan et al., 2009). Plasmid replication is a major burden on the cell's metabolism, thus always requiring an active selection mechanism to ensure plasmid stability during production. The higher the copy number of the plasmids, the higher this burden and the less stable the plasmid DNA is propagated. For the yeast S. cerevisiae it was demonstrated, that a set of tunable copy number plasmids (dependent on antibiotic concentration) could balance a pathway for n-butanol production in a way that yielded a 100-fold increase in production of the desired chemical (Lian et al., 2016). In another publication, different copy number plasmids were introduced into Bacillus methanolicus via electroporation to show the effect of gene dosage on the expression of gfpuv, an amylase from Streptomyces griseus and the lysine decarboxylase cadA from E. coli for overproduction of cadaverine. A positive correlation between estimated plasmid copy numbers and observed expression levels could be determined (Irla et al., 2016). 
Since plasmid copy numbers can be affected by media composition as well as growth phase, efforts have been undertaken to uncouple expression levels from gene dosage by introducing so-called incoherent feedforward loops in the form of transcription-activator-like effector (TALE)-regulated promoters (Segall-Shapiro et al., 2018).

In contrast to the Gram-negative organism E. coli, which under laboratory conditions is only able to take up plasmid DNA after various pretreatment steps, Bacillus subtilis becomes naturally competent. Transformation is as easy as cultivating cells for a few hours in a competence-inducing medium in the presence of the DNA to be integrated (Kumpfmüller et al., 2013). To further accelerate this process, B. subtilis strains with inducible competence have been developed (Rahmer et al., 2015). While transforming linear DNA for genomic integration is this easy, the transformation of circular plasmid DNA is more difficult in B. subtilis. One method based on naturally competent cells has the drawback of needing multimeric plasmid forms (Canosi et al., 1978). These can be generated either by PCR and ligation or by transformation of a monomeric form into specialized E. coli strains (e.g., JM105; Voss et al., 2003). Most other methods of transforming B. subtilis with replicative plasmids require the use of special devices (e.g., for electroporation), pretreatment of cells with polyethylene glycol (PEG) or glycine, or the preparation of protoplasts, which is tedious and requires recovery for 2 to 3 days (Vojcic et al., 2012).

The CopySwitch system presented in this work allows the rapid and easy in vivo increase of copy numbers for applications in B. subtilis without the drawbacks of classic plasmid transformation techniques. The underlying process is depicted in **Figure 1**. A construct for heterologous expression is integrated into the chromosome via natural competence. After verification of the correct insertion, a linearized plasmid, containing an origin of replication (ori) for B. subtilis, an antibiotic resistance gene and homologous regions up- and downstream of the integration site, is added to competent cells. Upon intracellular uptake, recombination occurs and all genetic elements between the two recombination sites are copied onto the plasmid, rendering it circular and therefore replicative again (Tomita et al., 2004b). Since most of the steps happen inside the cell, plasmid size becomes less limiting than with in vitro cloning methods. If a certain chromosomal locus is chosen for integration of genes to be expressed heterologously, the same CopySwitch backbone can be reused.
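The copying step described above can be pictured with a deliberately simplified string model. The sketch below is only a toy illustration and not the authors' implementation: sequences are short placeholder strings, the backbone is a text label standing in for the ori and resistance gene, and the function name `copy_switch` is hypothetical. It locates the two homology regions on a mock chromosome and writes the circular product linearly, starting at the upstream arm.

```python
def copy_switch(chromosome, upstream, downstream, backbone):
    """Toy gap repair: copy the chromosomal segment between two homology arms."""
    start = chromosome.find(upstream)
    if start < 0:
        raise ValueError("upstream homology region not found")
    cargo_start = start + len(upstream)
    end = chromosome.find(downstream, cargo_start)
    if end < 0:
        raise ValueError("downstream homology region not found")
    cargo = chromosome[cargo_start:end]
    # Circular plasmid written linearly: arms flank the copied cargo,
    # the backbone (ori + resistance marker) closes the circle.
    return upstream + cargo + downstream + backbone


# Placeholder sequences (hypothetical, far shorter than real homology arms).
UP = "AAAACCCC"
DOWN = "GGGGTTTT"
CARGO = "ATGCATGCAT"          # stands in for the integrated expression construct
BACKBONE = "[ori-kanR]"       # label standing in for the plasmid backbone

chromosome = "TTTT" + UP + CARGO + DOWN + "CCCC"
plasmid = copy_switch(chromosome, UP, DOWN, BACKBONE)
print(plasmid)
```

Because only the homology arms are fixed in the backbone, the same call works for any cargo placed between them, which mirrors the point that one CopySwitch backbone can be reused for a given integration locus.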

We chose two different reporter systems to demonstrate the application of CopySwitch, one being the intracellular production of GFP and the other being the secretion of a recombinant mature phospholipase C (mPLC) from Bacillus cereus SBUG 516. Both were placed under the control of the acetoin-inducible promoter PacoA from B. subtilis. The mPLC is a monomeric, phosphatidylcholine-preferring PLC (E.C.3.1.4.3, UniProt-ID P09598) containing three catalytically relevant zinc ions. It is used in the industrial degumming of oils (Jiang et al., 2015; Elena et al., 2017). Its sequence comprises amino acids 39–283 (mature form) and is preceded by the first 33 amino acids of the B. subtilis amyE gene product, which function as a signal peptide for secretion (Durban et al., 2007). Green fluorescent protein production can be assessed via flow cytometry fluorescence measurements, whereas mPLC activity is evaluated by the increase in absorbance at 410 nm due to accumulation of p-nitrophenol from the cleavage of the chromogenic substrate para-nitrophenylphosphorylcholine (p-NPPC) (Kurioka and Matsuda, 1976; Durban et al., 2007).

With the here described method we aim to address some of the shortcomings and current bottlenecks of traditional replicative plasmid transformation protocols.

### MATERIALS AND METHODS

#### Strains and Plasmids

For propagation and maintenance of plasmids E. coli DH10B cells were used. For all experiments on gfp expression, a derivative of Bacillus subtilis 6051-HGW was used as a starting point, combining six different modifications previously described (Kabisch et al., 2013; Zobel et al., 2015) (see **Table 2**). For all experiments on mPLC (mature phospholipase C) expression, a derivative of Bacillus subtilis PY79 was used, combining seven knock-outs of extracellular proteases. This so-called KO7 strain is available from the Bacillus Genetic Stock Center (Columbus, OH, USA) under the BGSCID 1A1133. In our studies we used a derivative of pBS72 (Titok et al., 2003) for a low copy number ori. Instead of the full length B. subtilis origin region as it is used for example in the commercially available pHT01 system (MoBiTec, Göttingen, Germany), we shortened the region by approximately 600 base pairs (bps) by deleting two putative open reading frames ('orf-4 and orf-3) of unknown function (Titok et al., 2006) in order to reduce replicative burden.

For details on the plasmid sequences including oligonucleotides see **Supplementary Material** section.

#### Media and Cultivation

All standard cultivations were performed in LB containing 10 g/L NaCl (LB10; Carl Roth, Karlsruhe, Germany), except for B. subtilis strains carrying a resistance gene for the salt-sensitive antibiotic zeocin (InvivoGen, San Diego, CA, USA). In these cases, LB containing only 5 g/L NaCl (LB5; Carl Roth, Karlsruhe, Germany) was used. Transformation medium for Bacillus subtilis was prepared according to Kumpfmüller and coworkers (Kumpfmüller et al., 2013). Antibiotics were used in the following concentrations: 100 µg/ml ampicillin, 50 µg/ml (E. coli) or 12.5–50.0 µg/ml (B. subtilis) kanamycin, 100 µg/ml spectinomycin, and 40 µg/ml zeocin.

All incubations for the flow cytometry experiment were performed at 37 °C and 200–250 rpm in a 25 mm amplitude rotary shaker (Infors GmbH, Einsbach, Germany). Thirty milliliters of preculture (15 ml 2x LB5, 1% (v/v) glucose from a 40% (w/v) stock solution, 40 µg/ml zeocin and with or without 25 µg/ml kanamycin, filled to 30 ml with sterile, desalted H<sub>2</sub>O) were inoculated from glycerol stocks of BsFLN045 and BsFLN051, respectively. After incubation overnight, 20 ml of main culture (10 ml 2x LB5, 0.5% (v/v) glucose from a 40% (w/v) stock solution, 0.5% (v/v) acetoin from a 10% (w/v) stock solution, 40 µg/ml zeocin, and with or without 25 µg/ml kanamycin, filled to 20 ml with sterile, desalted H<sub>2</sub>O) were inoculated with preculture to an optical density of 0.1 at 600 nm (OD<sub>600</sub>) in quadruplicates for each strain. Media were prepared as master mixes and distributed evenly into 50 ml Erlenmeyer flasks for cultivation.
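The supplement volumes implied by this recipe follow from the dilution relation C1·V1 = C2·V2. The sketch below works them out for the stated percentages and culture volumes. It treats the stock and final percentages as being in the same units, and the preculture OD<sub>600</sub> of 4.0 is a hypothetical value chosen for illustration, since the measured preculture densities are not given here.

```python
def stock_volume_ml(final_percent, stock_percent, final_volume_ml):
    """Volume of stock needed so that C1 * V1 = C2 * V2 holds."""
    return final_percent * final_volume_ml / stock_percent


def inoculum_volume_ml(target_od, culture_volume_ml, preculture_od):
    """Preculture volume for a target start OD (ignores the added volume)."""
    return target_od * culture_volume_ml / preculture_od


# Preculture: 1% glucose in 30 ml from a 40% stock.
glucose_pre = stock_volume_ml(1.0, 40.0, 30.0)
# Main culture: 0.5% glucose (40% stock) and 0.5% acetoin (10% stock) in 20 ml.
glucose_main = stock_volume_ml(0.5, 40.0, 20.0)
acetoin_main = stock_volume_ml(0.5, 10.0, 20.0)
# Inoculation to OD600 0.1 in 20 ml; preculture OD of 4.0 is an assumption.
inoculum = inoculum_volume_ml(0.1, 20.0, 4.0)

print(f"glucose (preculture): {glucose_pre:.2f} ml")
print(f"glucose (main):       {glucose_main:.2f} ml")
print(f"acetoin (main):       {acetoin_main:.2f} ml")
print(f"inoculum:             {inoculum:.2f} ml")
```

Preparing these volumes as a master mix, as described above, keeps the supplement concentrations identical across the replicate flasks.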

Cultivations for the phospholipase C experiments were performed at 30 °C and 150 rpm in a 50 mm amplitude rotary shaker (Eppendorf New Brunswick, Enfield, CT, USA). Twenty milliliters of preculture (SB medium: 32.0 g/L tryptone, 20.0 g/L yeast extract, 5.0 g/L NaCl, 5.0 ml 1 N NaOH, 42.3 mM Na<sub>2</sub>HPO<sub>4</sub>, 22.0 mM KH<sub>2</sub>PO<sub>4</sub>, 18.7 mM NH<sub>4</sub>Cl, and 0.02 mM CaCl<sub>2</sub>, supplemented with 1% (v/v) glucose from a 40% (w/v) stock solution, 40 µg/ml zeocin, and with or without 25 µg/ml kanamycin) were inoculated from glycerol stocks of BsMKA005, BsFLN064 and BsFLN066, respectively. To ensure sufficient growth overnight, precultures were incubated at 37 °C. Fifty milliliters of main culture (SB containing 0.5% (v/v) of acetoin from a 10% stock solution in SB and the respective antibiotic) were inoculated to reach an OD<sub>600</sub> of 0.2 in triplicates for each strain. Cultivation was carried out in 500 ml Erlenmeyer flasks.

#### Enzymes

All restriction enzymes and Q5 proofreading polymerase (for amplicons >3 kb) were purchased from New England Biolabs (NEB, Ipswich, MA, USA). Colony polymerase chain reactions (cPCRs) were performed with standard Taq polymerase, whereas error-free amplification of small (<3 kb) fragments was performed with OptiTaq, which is a mixture of Taq and Pfu polymerase (Roboklon, Berlin, Germany). Alternatively to Q5, proofreading Polymerase X was also used for amplicons >3 kb (Roboklon, Berlin, Germany).

#### Molecular Biology Methods

All PCRs and restriction digestions were carried out as suggested by the supplier of the enzymes used. Sequencing reactions and oligonucleotides were purchased from Eurofins Genomics (Ebersberg, Germany). If replicative plasmids were used as PCR templates and no antibiotic switch was possible after cloning, a thorough DpnI digestion (1–2 h at 37 °C) was performed to reduce background through template plasmid carry-over, followed by heat inactivation (20 min at 65 °C). PCR fragments were column-purified before being used for cloning (NucleoSpin® Gel and PCR Clean-up Kit, Macherey-Nagel, Düren, Germany or Monarch® PCR & DNA Cleanup Kit, NEB, Ipswich, MA, USA). Plasmids were also column-purified with two different kits (High Pure Plasmid Isolation Kit, F. Hoffmann-La Roche AG, Basel, Switzerland or Monarch® Plasmid Miniprep Kit, NEB, Ipswich, MA, USA). For plasmid preparation from B. subtilis, 5 to 10 mg of lyophilized lysozyme (Sigma-Aldrich, St. Louis, MO, USA) were added to the resuspension buffer and incubated at 37 °C for 10 min. All subsequent steps were done as described in the respective protocol of the plasmid preparation kit used. DNA assemblies were performed using SLiCE as described by Messerschmidt and coworkers (Messerschmidt et al., 2016). In contrast to the described method, DNA was transformed by electroporation (2.5 kV, 2 mm gap). For an assembly containing more than three DNA parts, amplifying the SLiCE assembled DNA via PCR enabled us to reduce the fragment number. Correct assembly of fragments and integration into a B. subtilis genome was detected by colony PCR (cPCR) and confirmed by sequencing of either plasmids or proofreading-amplified PCR products of regions of interest.

#### TABLE 1 | Optimization of transformation conditions. Numbers given correspond to colony forming units.

#### TABLE 2 | Strains and plasmids used in this work.

#### Transformation of Bacillus subtilis

For transformation, 0.05% (w/v) of casamino acids (CAA; Sigma-Aldrich, St. Louis, MO, USA) and either 1% (v/v) glycerol stock or 3–5% (v/v) of an overnight culture of the desired strain (grown in the same medium) were added to 1 ml of transformation medium (TM) containing a suitable antibiotic. Cells were cultivated for 5–6 h, shaking at 37 °C, until the broth was visibly turbid. At this point, plasmid DNA (undigested for chromosomal integration and XhoI-linearized for CopySwitch) was added at a final concentration of >200 fmol/ml and incubated for 30 min while shaking. For each milliliter of transformation sample, 200 µl of expression mix (2.5% (v/v) of yeast extract and CAA each) were added and, depending on the task, shaken for another 1 h (chromosomal integration) to 4 h (CopySwitch) at 37 °C. Afterwards, different amounts (200 to 1,000 µl) were spread onto antibiotic-containing LB agar plates and incubated overnight to ensure growth of distinct single colonies. To recycle the antibiotic marker in the chromosomal integration approaches, single colonies were subjected to xylose-induced expression of cre recombinase by growth in 300 µl of LB5-Zeo (40 µg/ml) containing 1% (v/v) of xylose for >6 h at 37 °C and 1,400 rpm (Thermoshaker, Eppendorf, Hamburg, Germany). Constant aeration during this time was achieved by melting a hole into the lid of the 2 ml Eppendorf tubes used. This procedure results in recombination of lox71 and lox66, thereby looping out the marker gene cassette in between and generating a lox72 scar. After dilution streaking on zeocin-containing LB5 agar plates, single colonies were further verified via cPCR, replica-plating to ensure loss of the antibiotic marker, and finally sequencing. To verify a CopySwitch, we isolated plasmid from B. subtilis and transformed the obtained DNA into E. coli to obtain sufficient amounts of plasmid for sequencing.
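Because the DNA amount is specified in fmol, it can help to convert it into a mass for pipetting. The sketch below uses the common approximation of about 650 g/mol per base pair of double-stranded DNA, which is an assumption brought in here and not a value from this work. The plasmid length of 8,000 bp is likewise hypothetical, since plasmid sizes are not stated in this section.

```python
AVG_BP_MASS_G_PER_MOL = 650.0  # approximate mass of one dsDNA base pair (assumption)


def dsdna_mass_ng(amount_fmol, length_bp):
    """Approximate mass in ng of a given molar amount of double-stranded DNA."""
    mol = amount_fmol * 1e-15
    grams = mol * length_bp * AVG_BP_MASS_G_PER_MOL
    return grams * 1e9


plasmid_bp = 8000          # hypothetical plasmid length
amount_fmol = 200.0        # lower bound per ml of transformation mix from the text

mass_ng = dsdna_mass_ng(amount_fmol, plasmid_bp)
print(f"{amount_fmol:.0f} fmol of a {plasmid_bp} bp plasmid is about {mass_ng:.0f} ng")
```

Under these assumptions, the stated lower bound corresponds to roughly 1 µg of DNA per milliliter of transformation mix; the figure scales linearly with the actual plasmid length.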

#### Enzymatic Assay for mPLC Activity: Sampling

Samples for the enzymatic assay of mPLC activity were collected after 6, 12, 24, and 48 h of cultivation. Two milliliters of culture broth were centrifuged (17,000 rcf, 4 °C, 5 min) and 1.8 ml of supernatant were transferred into new vessels. Sterile-filtering was omitted to ensure full enzymatic activity.

#### Assay Principle and Plate Reader Measurements

The activity assay is based on the ability of mPLC to convert the non-natural substrate p-NPPC (CAS number 21064-69-7, Cayman Chemicals, Ann Arbor, MI, USA) into phosphorylcholine and p-nitrophenol. Absorption of p-nitrophenol is measured at 410 nm (Kurioka and Matsuda, 1976; Durban et al., 2007). Activity measurements were carried out with 70 µl of diluted supernatant (one part supernatant + three parts 100 mM borax-HCl buffer, pH 7.5) and 30 µl of substrate solution (100 mM p-NPPC dissolved in 100 mM borax-HCl buffer, pH 7.5) in 96-well microwell plates (96-well, PS, F-bottom, #655101, Greiner Bio-One GmbH, Frickenhausen, Germany). Measurements were taken at 30 °C every minute for 2 h and every 5 min for another 5 h afterwards. Since substrate depletion caused the reaction to plateau after several hours, only values from the first 4 h were used for all calculations.
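One way to turn such a kinetic read-out into a rate is a least-squares slope over the first 4 h, before the plateau. The sketch below is a minimal illustration under stated assumptions, not the authors' R analysis: the absorbance trace is synthetic, the function names are hypothetical, and the factor of 4 only corrects for the 1:4 predilution of the supernatant. No extinction coefficient is applied, so the result stays in absorbance units per minute.

```python
def slope_per_min(times_min, absorbances):
    """Ordinary least-squares slope of absorbance versus time."""
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_a = sum(absorbances) / n
    sxy = sum((t - mean_t) * (a - mean_a) for t, a in zip(times_min, absorbances))
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    return sxy / sxx


def first_hours(times_min, absorbances, max_min=240):
    """Keep only the readings taken within the first max_min minutes."""
    kept = [(t, a) for t, a in zip(times_min, absorbances) if t <= max_min]
    return [t for t, _ in kept], [a for _, a in kept]


# Reading schedule from the text: every minute for 2 h, then every 5 min for 5 h.
times = list(range(0, 121)) + list(range(125, 421, 5))
# Synthetic A410 trace (hypothetical): linear rise that plateaus after 300 min.
a410 = [0.05 + 0.002 * min(t, 300) for t in times]

t_win, a_win = first_hours(times, a410)
rate = slope_per_min(t_win, a_win)
PREDILUTION_FACTOR = 4     # one part supernatant + three parts buffer
rate_undiluted = rate * PREDILUTION_FACTOR
print(f"slope: {rate:.4f} A410/min, predilution-corrected: {rate_undiluted:.4f} A410/min")
```

Restricting the fit to the first 240 min keeps the plateau out of the regression, which would otherwise bias the slope downward.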

#### Data Analysis

All data conversions, analyses and plotting were performed with R version 3.5.1 (Ihaka and Gentleman, 1996) using Wickham's tidyverse version 1.2.1 (Wickham, 2017).

#### Assaying GFP Fluorescence: Sampling

Samples for flow cytometry analysis were taken after 3, 5, 7, 9, 11, 20, and 24 h of cultivation to capture the transition between the non-expressing and the gene-expressing state. The sample volume was chosen to yield an OD<sub>600</sub> between 1.7 and 2.6 after pelleting the cells, discarding the supernatant and resuspending in 1.5 ml 1x ClearSort Sheath Fluid (Sony Biotechnology, Weybridge, UK). Before resuspension, the cell pellets were stored at −20 °C until all samples were collected and prepared for measurement. Reconstituted samples were diluted again (one part sample plus four parts 1x ClearSort Sheath Fluid) to be in a suitable range of events per second for flow cytometry measurement.

#### Flow Cytometry

Fluorescence in cells was detected in a Sony LE-SH800SZBCPL (Sony Biotechnology, Weybridge, UK) using a 488 nm argon laser. Photomultipliers for backscatter and FL-1 (525/50 nm) were set to 25.5 and 38.0% with a FSC-threshold of 0.20% and a window extension of 50. The FSC-diode was set on an amplification setting of 6/12 with events per second being 50,000.

#### Flow Cytometry Data Analysis

Areas of scattering and fluorescence signals (FSC-A, SSC-A, FL1-A) were brought to a near-normal distribution by applying an inverse hyperbolic sine (asinh) to all values. Cell agglomerates and fragments were excluded by fitting a bivariate normal distribution in the channels FSC-A and FSC-W. For the threshold between fluorescent and non-fluorescent cells, the minimum of the clearly bimodal distribution of all measured FL-1 values was chosen. For every sample, the median of the fluorescent population as well as the ratio of fluorescent events to total events were calculated. The resulting values were summarized as means and standard deviations of biological quadruplicates. All analyses, calculations and plotting were performed with R (Ihaka and Gentleman, 1996). Specifically, for parsing of fcs files and gating, Bioconductor's (Huber et al., 2015) flowCore (Hahne et al., 2009) was used. Packages from the metapackage tidyverse, especially dplyr (Hadley et al., 2017) and ggplot2 (Wickham, 2011), were employed for summarization and presentation.
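The thresholding logic described here can be illustrated with a small standalone sketch. The analysis in this work was done in R with flowCore; the Python below is a stand-in written for illustration, using synthetic events instead of fcs files and omitting the bivariate-normal gating on FSC-A and FSC-W. The helper names and the histogram-based valley search are assumptions, and the search presumes that one mode lies in each half of the transformed value range.

```python
import math
import random
import statistics


def histogram(values, bins):
    """Count values in equal-width bins; return (counts, bin_centers)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    return counts, centers


def valley_threshold(values, bins=40):
    """Center of the emptiest bin between the dim and the bright mode."""
    counts, centers = histogram(values, bins)
    half = bins // 2
    dim_peak = max(range(half), key=counts.__getitem__)
    bright_peak = max(range(half, bins), key=counts.__getitem__)
    valley = min(range(dim_peak, bright_peak + 1), key=counts.__getitem__)
    return centers[valley]


# Synthetic FL1-A areas (hypothetical): 300 dim and 700 bright events.
random.seed(1)
raw = [random.gauss(100, 20) for _ in range(300)]
raw += [random.gauss(10000, 1500) for _ in range(700)]

fl1 = [math.asinh(v) for v in raw]      # asinh transform, as in the text
threshold = valley_threshold(fl1)
bright = [v for v in fl1 if v > threshold]

fraction_fluorescent = len(bright) / len(fl1)
median_bright = statistics.median(bright)
print(f"threshold={threshold:.2f}, fraction={fraction_fluorescent:.2f}, "
      f"median={median_bright:.2f}")
```

The two reported quantities correspond to the per-sample summaries named above: the share of fluorescent events and the median of the fluorescent population on the asinh scale.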

#### Plate Reader Measurements

Directly after all flow cytometry measurements had been done, 200 µl of each diluted sample were transferred from the flow cytometry tubes into 96-well microwell plates (Nunc™ MicroWell™ #152037, ThermoFisher Scientific, Waltham, MA, USA). Optical density was measured at 600 nm in a PHERAstar FSX (BMG Labtech, Ortenberg, Germany). Fluorescence was induced with an excitation wavelength of 485 nm and measured at an emission wavelength of 520 nm. This procedure guarantees comparability between flow cytometry and plate reader data. All analyses, calculations and plotting were performed with R (Ihaka and Gentleman, 1996).

# RESULTS

## Optimizing Parameters For Successful Transformation and Gene Transfer Onto Replicative Plasmid

Transformation via natural competence requires internalization of single-stranded DNA, formation of a heteroduplex between regions of homology and expression of an antibiotic resistance marker. Using our method for replicative plasmid transformation furthermore requires time for expression of additional, plasmid-based replication machinery (e.g., RepA protein) and replication of the newly circularized plasmid. To optimize the yield of plasmid-bearing colonies, three different parameters were tested. For standard chromosomal integration, homology regions of 500 bp and a regeneration time after transformation of 1 h were employed. However, this procedure did not yield any colonies for CopySwitch, regardless of the antibiotic concentration (see **Table 1**). To increase the probability of correctly assembled replicative plasmids, longer regeneration times (1, 2, and 4 h) and longer regions of homology (500, 1,000, and 2,000 bp) were tested. The volume plated corresponds to 48 fmol of linearized plasmid in the transformation mix. All conditions were tested in BsFLN045 with either plasmids containing different homology regions or elution buffer (negative control). From the colony counts of these transformations, we conclude that (i) a regeneration time of 4 h resulted in satisfactory CopySwitch efficiency and (ii) 2,000 bp is a sufficient length for homology regions. Further, a kanamycin concentration of 25 µg/ml was superior to both 12.5 µg/ml, which resulted in false-positive clones after an incubation period of >17 h (data not shown), and 50 µg/ml (too high, no colonies formed at all). Prolonged incubation resulted in higher numbers of colony forming units (cfus). In the case of 2,000 bp homology and 12.5 µg/ml kanamycin, a 4 h incubation resulted in only a doubling of OD<sub>600</sub> but an increase of cfus by a factor of 35. Sequencing of three different, retransformed plasmids showed no mutations of the copied locus.

## Growth Curve Comparison Between One and Multiple Gene Copy Bearing Strains

Growth curves of a strain containing a chromosomal copy of PacoA-gfp (BsFLN045) and a strain containing a chromosomal and a plasmid copy (BsFLN051) were recorded at 37°C. During lag and log phase, there is no visible difference between the two curves. In the transition phase, by contrast, the plasmid-bearing strain grows more slowly and does not reach an OD<sub>600</sub> as high as BsFLN045 (see **Figure 2A**).

All strains of the mPLC experiment were also investigated for differences in growth behavior. At 30°C, no significant differences could be found (see **Figure 3A**). The reason for choosing 30°C instead of 37°C for cultivation lies in higher protein stability and solubility during heterologous expression at lower temperature (Schein and Noteborn, 1988).

FIGURE 2 | (A) Averaged growth curves of strains BsFLN045 (chromosomal copy) and BsFLN051 (chromosomal + plasmid copy); (B) Logarithmic plot of increase in FL-1 fluorescence and assumed intracellular GFP content over cultivation time. For all points except at t = 11 h, the increased fluorescence of BsFLN051 is statistically significant (Student's t-test, alpha = 0.05). (C) Increase in the share of gfp-expressing cells over cultivation time. For all points except at t = 11 h the increased fluorescence of BsFLN051 is statistically significant (Student's t-test, alpha = 0.05). Plotted values are averages of n = 4 biological replicates with error bars showing standard deviation.

FIGURE 3 | (A) Averaged growth curves of strains BsMKA005 (negative control), BsFLN064 (chromosomal copy), and BsFLN066 (chromosomal + plasmid copy); (B) OD<sub>600</sub>-normalized slope values of the phospholipase C assay as a measure of enzymatic activity. Since the amount of substrate and the volume of enzyme solution (= diluted crude supernatant) were constant and differences in OD<sub>600</sub> at sampling time are normalized, the slope is a direct indicator of the amount of active enzyme produced by the respective strain. The data used for calculation of the slope consist of 4 h of kinetic measurement. Plotted values are averages of n = 3 biological replicates with error bars showing standard deviation.

## Determining gfp Expression Levels With Flow Cytometer and Microwell Plate Reader

To test whether our system has an influence on gfp gene expression, two different approaches were taken. Flow cytometry was used to determine whether more copies of the gfp gene result in higher amounts of GFP protein per cell and, additionally, what share of the total cell population was expressing.

In general, observed fluorescence in the FL-1 channel appeared to positively correlate with cultivation time. However, for BsFLN051 with additional, plasmid-based gfp expression, values were consistently higher than for BsFLN045 (see **Figure 2B**).

Similarly, the share of fluorescent cells is significantly higher for the plasmid-carrying strain BsFLN051 at all time points except t = 11 h (see **Figure 2C**). At the same time, the number of non-fluorescent cells appears to be significantly lower if the plasmid is present, especially in early growth phases.
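The significance statements above rest on Student's t-test with alpha = 0.05 and n = 4 biological replicates per strain. As a minimal sketch of that comparison, the following Python code computes the pooled-variance t statistic and compares it to the two-sided critical value for 6 degrees of freedom. Whether the original test was one- or two-sided is not stated, so the two-sided choice is an assumption, and the replicate values shown are invented purely for illustration.

```python
import math
import statistics

# Two-sided critical value of Student's t for alpha = 0.05 and 6 degrees
# of freedom (two groups of four replicates).
T_CRIT_DF6_ALPHA05 = 2.447


def student_t(a, b):
    """Pooled-variance (Student's) two-sample t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / df
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled_var * (1 / na + 1 / nb)
    )
    return t, df


# Hypothetical asinh-scale medians for one time point, four replicates each.
plasmid_strain = [5.1, 5.3, 5.0, 5.2]
chromosomal_strain = [4.1, 4.3, 4.0, 4.2]

t_value, dof = student_t(plasmid_strain, chromosomal_strain)
significant = dof == 6 and abs(t_value) > T_CRIT_DF6_ALPHA05
```

The function assumes equal variances in both groups and fails with a division by zero if all replicates in both groups are identical.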

Our second approach involved the widely used standard procedure of measuring fluorescence of the whole sample in a microwell plate reader (see **Figure S1**). Since the OD<sub>600</sub> was already in a suitable range (0.3–0.5), photometry samples were taken directly from the flow cytometry tubes of section "Sampling."

## Application of CopySwitch to Change mPLC Expression Levels

After showing the principle of CopySwitch with the easy-to-monitor GFP, we applied the system to the expression of a secreted and industrially relevant enzyme. The chosen mPLC from B. cereus SBUG 516 cleaves the substrate p-NPPC and thereby releases p-nitrophenol, which can be measured as an increase of absorption at 410 nm (Durban et al., 2007). In total, four different time points were assayed (t6, t12, t24, and t48; the number corresponds to hours after inoculation). Since every parameter was kept the same for every sample, the amount of enzyme in the sample directly correlates with the slope of the absorption change over time. OD<sub>410</sub> values of each biological replicate were divided by their corresponding OD<sub>600</sub> values before calculations to compensate for slightly different growth between replicates and strains. To make sure the increase of absorption was not due to background activity from the cultivation supernatant, each sample was additionally measured without substrate. Strain BsMKA005 without the mPLC gene was included as a negative control. Buffer containing substrate was measured as an autohydrolysis control. After 4 h of measurement at 30°C, no significant increase of absorption could be detected for (a) the negative control strain with and without substrate, (b) buffer with and without substrate and (c) mPLC-containing strains without substrate. Together, these controls strongly suggest mPLC as the sole source of p-NPPC-cleaving activity. After 6 h of cultivation, almost no enzymatic activity could be detected in the diluted supernatant used for the assay, indicating that expression had not yet started. After 12 and 24 h of cultivation, the strain containing plasmid and chromosomal copies (BsFLN066) yielded an approximately 1.3-fold increase of the slope (= activity) over the strain containing only a chromosomal mPLC copy (BsFLN064). After 48 h the activity difference dropped to about 1.1-fold, still favoring BsFLN066 (see **Figure 3B**).
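The read-out described above (OD<sub>410</sub> divided by OD<sub>600</sub>, followed by the slope over measurement time) can be sketched as an ordinary least-squares fit. The Python code below is an illustration only: the function names are hypothetical, the fitting method is an assumption (the text does not state how the slope was calculated), and the kinetic values are invented so that the fold change comes out at 1.3.

```python
def normalised_slope(times_h, od410, od600_at_sampling):
    """Least-squares slope of OD410/OD600 against measurement time."""
    if len(times_h) != len(od410) or len(times_h) < 2:
        raise ValueError("need at least two matching time/absorbance points")
    y = [a / od600_at_sampling for a in od410]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(y) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (v - mean_y) for t, v in zip(times_h, y))
    return sxy / sxx


def fold_change(slope_plasmid, slope_chromosomal):
    """Activity of the plasmid-bearing strain relative to the chromosomal one."""
    return slope_plasmid / slope_chromosomal


# Hypothetical 4 h kinetics, both cultures sampled at OD600 = 2.0.
times = [0, 1, 2, 3, 4]
chromosomal = normalised_slope(times, [0.10, 0.20, 0.30, 0.40, 0.50], 2.0)
plasmid = normalised_slope(times, [0.10, 0.23, 0.36, 0.49, 0.62], 2.0)
ratio = fold_change(plasmid, chromosomal)
```

A no-substrate measurement of the same sample could be subtracted from `od410` before fitting if background absorption were not negligible; the sketch leaves this out because the controls in the text showed no significant increase.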

# DISCUSSION

For CopySwitch to be applicable as a tool for the B. subtilis community, we first had to find suitable transformation parameters. Our results suggest these to be approximately 2,000 bp for each homology region, a regeneration time of 4 h and a kanamycin concentration of 25 µg/ml. The overall increased values, compared to standard chromosomal integration (500 bp, 1 h, 12.5 µg/ml), may stem from the plasmid nature of the CopySwitch product, which requires separate replication machinery and replication of a circularized plasmid. The ori used in our system has a reported copy number of six units/chromosome (Titok et al., 2003). Since the gene copy number of the kanamycin resistance marker is increased as well, this may explain why higher antibiotic concentrations can be used. In addition, we tried using the backbone of the high copy number plasmid pMSE3 (Silbersack et al., 2006; data not shown). These attempts did not show robust and reproducible results, likely because efficient homologous recombination requires four kb of adjacent chromosomal DNA on the plasmid, which increases the copy numbers of the ORFs in these regions. For the pksX locus used in this work, this implies a copy number increase of ymzD, ymcC (both putative integral inner membrane proteins), pksS (cytochrome P450 of bacillaene metabolism) and ymzB (hypothetical protein) to several hundreds (Silbersack et al., 2006). The pksX locus was chosen because it contains a secondary metabolite cluster for the polyketide bacillaene (Butcher et al., 2007), which has proven to be non-essential under laboratory conditions in our previous experiments (data not shown). Deletion of this cluster furthermore reduces the genome by 76.5 kb, which means less metabolic cost for replication and unnecessary protein production. It has already been shown elsewhere that the efficiency of recombinational transfer may be region-dependent (Tomita et al., 2004a).
The integration of completely synthetic and unique regions into the chromosome could be a way to address this issue. To find out more about the metabolic stress imposed by additional gene copies, growth curves were recorded. There were no visible differences in either lag or log phase, indicating that the additional burden of plasmid replication in BsFLN051 does not interfere with growth (see **Figure 2A**). However, in the transition from log to stationary phase, BsFLN051 seems to have a slight disadvantage and also does not reach the final optical density of BsFLN045. This is likely due to the nature of the acetoin-inducible promoter, which is controlled by catabolite repression, meaning that expression from it only starts once residual glucose in the medium is exhausted. As shown elsewhere, growth is impaired from the onset of protein expression, and this effect becomes more pronounced as the number of promoter-gene constructs rises (Silbersack et al., 2006). With our second test system (mPLC), those effects were not observed, probably due to the lower cultivation temperature of 30°C and the omission of glucose.

Since the system is intended for fast optimization of heterologous gene expression, we performed test expressions of GFPmut2 with (BsFLN051) and without (BsFLN045) CopySwitch. Flow cytometry indicates not only higher fluorescence of the plasmid-bearing strain on a single-cell level, but also an increased share of fluorescent cells. Additional gfp transcription from CopySwitch plasmid p14073 thus appears to increase the total amount of GFP in cells. The reduction of the non-fluorescent cell population in BsFLN051 might be due to stronger deregulation of the utilized acoA promoter. The increased copy number of CcpA (carbon catabolite control protein A) binding sites in BsFLN051, compared to the single chromosomal promoter of BsFLN045, would lead to a titration of CcpA repressor proteins. Since the medium contains glucose to ensure promoter repression and therefore undisturbed growth before the onset of protein production, CcpA plays an important role in our system (Ali et al., 2001). In addition to flow cytometry, a photometric measurement of the whole sample was conducted in microwell plates directly after cytometry. As expected, the fluorescence intensity (normalized by OD<sub>600</sub>) follows the same pattern, rising over time and being higher in the plasmid-bearing BsFLN051 (see **Figure S1**).

To strengthen our hypothesis, another experiment was performed with strains containing either no mPLC gene (BsMKA005), one mPLC gene integrated into the chromosome (BsFLN064) or mPLC both in the chromosome and on a low copy number plasmid (BsFLN066). The slope of the enzymatic activity (= absorption increase at 410 nm) was taken as the read-out for comparison of the strains, after normalization against their respective OD<sub>600</sub> values at the time of sampling. BsMKA005 showed no activity either with or without chromogenic substrate, which suggests that mPLC is the enzyme responsible for the absorption change in the supernatants of the remaining two strains. Since the negative control of buffer plus substrate also did not show any increase of absorption over time, autohydrolysis seems to play an insignificant role. The first samples were taken after a cultivation time of 6 h, corresponding to the end of exponential phase. A very shallow slope indicates almost no activity for all three strains, which is in accordance with previous publications showing maximum induction of acetoin catabolism after exponential phase (Ali et al., 2001). As expected, after 12 and 24 h, both BsFLN064 and BsFLN066 had increased levels of activity compared to BsMKA005, with BsFLN066 outperforming BsFLN064 by about 1.3-fold each time. Since activity values are OD<sub>600</sub>-normalized and every parameter except copy number was kept constant, it is likely that CopySwitch leads to the higher amount of enzyme in the supernatant and is therefore responsible for the higher activity. This effect is still visible after 48 h of cultivation, although it drops to about 1.1-fold in favor of BsFLN066.

# CONCLUSIONS

In comparison to other means of plasmid transformation in B. subtilis, CopySwitch has the great advantage of using naturally occurring competence. Other methods require either special E. coli strains for plasmid propagation, special devices for electroporation or tedious preparation of very fragile protoplasts. Additionally, all of the above-mentioned methods are either not well-suited for high-throughput applications due to lack of robustness or require expensive, specialized equipment. Since a lot of genetic engineering in B. subtilis is done chromosomally, this procedure is readily applicable for many strains. The CopySwitch method is as easy as growing B. subtilis in a competence-inducing medium, adding a linearized plasmid followed by addition of expression mix and plating. All these steps can be automated and parallelized with standard liquid-handling platforms.

In summary, our method provides a very easy and straightforward tool for fast tuning of gene expression, which is scalable and easily automatable. Future applications beyond the scope of this work could include the tuning of metabolic pathways by performing CopySwitch with complete pathways or subsets of these.

#### AVAILABILITY OF DATA AND MATERIALS

The flow cytometry datasets analyzed during the current study are available from the corresponding author on reasonable request.

#### AUTHOR CONTRIBUTIONS

FN conducted the experiment. FN and FB analyzed the data. FN, FB and JK prepared the manuscript. All authors read and approved the final manuscript.

#### FUNDING

FN and FB are funded by an FNR grant (FKZ: 22007413). JK is funded by the CompuGene LOEWE grant.

#### ACKNOWLEDGMENTS

We would like to thank Alex Elsholz for providing pYC121 as a template for GFPmut2, Dr. Zeigler for providing BsKO7 and Dr. Boettcher for providing p14094 as a template for mPLC. We acknowledge support by the German Research Foundation and the Open Access Publishing Fund of Technische Universität Darmstadt.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fbioe.2018.00207/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Nadler, Bracharz and Kabisch. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Gene-Expressing Liposomes as Synthetic Cells for Molecular Communication Studies

#### Giordano Rampioni<sup>1</sup>, Francesca D'Angelo<sup>1</sup>, Livia Leoni<sup>1</sup> and Pasquale Stano<sup>2</sup>\*

<sup>1</sup> Department of Science, University Roma Tre, Rome, Italy, <sup>2</sup> Department of Biological and Environmental Sciences and Technologies (DiSTeBA), University of Salento, Lecce, Italy

The bottom-up branch of synthetic biology includes—among others—innovative studies that combine cell-free protein synthesis with liposome technology to generate cell-like systems of minimal complexity, often referred to as synthetic cells. The functions of this type of synthetic cell derive from gene expression, hence they can be programmed in a modular, progressive and customizable manner by means of ad hoc designed genetic circuits. This experimental scenario is rapidly expanding and synthetic cell research already counts numerous successes. Here, we present a review focused on the exchange of chemical signals between liposome-based synthetic cells (operating by gene expression) and biological cells, as well as between two populations of synthetic cells. The review includes a short presentation of the "molecular communication technologies," briefly discussing their promises and challenges.

#### Edited by:

Francesca Ceroni, Imperial College London, United Kingdom

#### Reviewed by:

Yo Suzuki, J. Craig Venter Institute, United States

Kerstin Göpfrich, Max Planck Institute for Medical Research (MPIMF), Germany

#### \*Correspondence:

Pasquale Stano pasquale.stano@unisalento.it

#### Specialty section:

This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology

Received: 30 September 2018; Accepted: 02 January 2019; Published: 17 January 2019

#### Citation:

Rampioni G, D'Angelo F, Leoni L and Stano P (2019) Gene-Expressing Liposomes as Synthetic Cells for Molecular Communication Studies. Front. Bioeng. Biotechnol. 7:1. doi: 10.3389/fbioe.2019.00001

Keywords: synthetic cells, bottom-up synthetic biology, molecular communications, quorum sensing, lipid vesicles (liposomes), cell-free protein synthesis

## MOLECULAR COMMUNICATIONS AND SYNTHETIC CELLS (SCs)

Natural organisms coordinate their activities through communication. Isolated cells, tissue cells, as well as higher organisms, share their environment with other living forms. Tactile, physical, and especially chemical signals define, in a unique and complex manner, the sensory world of living beings. Communications in the chemical domain are ubiquitous intercellular processes, and play important roles in all organisms.

Inspired by these capabilities of natural organisms, a new branch of biomimetic technology focusing on molecular communications has been proposed (Nakano et al., 2011, 2013). Network engineers have envisioned the exploitation of chemical exchanges as the basis for developing new types of Information and Communication Technologies (the so-called bio-chem-ICTs, **Figure 1A**). This is an exciting new arena for engineers and biologists that aims at the construction of well-characterized biological parts, devices, and systems that will process chemical information in a controlled and programmable manner, as happens with classical electric signals. The challenge here lies in managing communication and information processing through chemical signals with the same mastery that nature has shown for billions of years. Such a broad and innovative territory of research offers several opportunities for various approaches to synthetic biology, which needs adequate theoretical frameworks, numerical modeling strategies, and experimental methodologies. More generally, bio-chem-ICTs refers to radically new forms of computation, communication, and information processing approaches (at the nano- and micro-scale levels) based on chemical and biochemical systems (Amos et al., 2011).

FIGURE 1 | Molecular communications based on synthetic cell (SC) technology. (A) Application areas of molecular communication research. Molecular communication is a (bio)chem-information and communication technology that can be applied to nanomedicine (smart drug delivery systems), smart responsive materials, synthetic biology (construction of biochips), artificial intelligence (AI), hybrid bio-electronic systems and for sensors in environmental monitoring (Nakano et al., 2013). (B) Synthetic cells are cell-like systems, generally built by encapsulating a number of (bio)molecular components into artificial micro-compartments. One of the possible designs focuses on liposome-based SCs operating by gene expression (Luisi, 2002; Luisi et al., 2006). With this aim, TX-TL kits produce the protein(s) of interest starting from the corresponding DNA sequence. The SC membrane can be functionalized with membrane proteins such as pores (Noireaux and Libchaber, 2004) and receptors (Hamada et al., 2014); cytoskeletal proteins have been implemented as well (Maeda et al., 2012). (C) The principles of autopoiesis (self-production) (Varela et al., 1974), which guide the long-term goal of constructing SCs that produce all their components. Autopoiesis provides insights into the spatial and dynamical organization that a chemical system should be endowed with in order to display self-maintenance, organizational closure, homeostasis and reproduction achieved by the internal processes of manufacturing and assembling its components. (D) Schematic representation of a SC which produces and releases a signal molecule into the environment. The signal is perceived by a natural cell (e.g., a bacterium) that consequently activates a response (for example, a reporter protein, an enzyme operating as an actuator to perform a certain operation, including a reply signaling) (Nakano et al., 2011; Stano et al., 2012).
Table 1 reports several cases of unidirectional or bidirectional molecular communications between SCs, or between SCs and natural cells. (E) The vision of using SCs as smart drug delivery systems or for enzyme replacement therapy (Leduc et al., 2007). SCs, intended as a biotechnological evolution of current liposomes for drug delivery, reach and bind to the target cells by a molecular recognition mechanism, activate their internal circuits in response to chemical stimuli and consequently act, in a programmable manner, to perform a certain task [e.g., producing a therapeutic or diagnostic agent (Ding et al., 2018; Krinsky et al., 2018), or a secondary easy-to-detect signal, etc.]. The chemical stimulus can be an endogenous chemical that derives from the target cell itself (as shown in the cartoon) or from other tissues (not shown), as well as purposely-added exogenous chemicals (not shown).

Owing to our direct involvement in the field (Stano et al., 2012; Rampioni et al., 2014, 2018), and considering recent exciting reports, in this review we present and discuss the intersection between the bio-chem-ICT idea of exchanging chemical signals in a programmable way and the bottom-up synthetic biology approach focused on the construction of cell-like systems based on gene expression inside liposomes (Luisi, 2002; Noireaux and Libchaber, 2004; Luisi et al., 2006; Ichihashi et al., 2010; Stano et al., 2011; Nourian and Danelon, 2013; Spencer et al., 2013). For simplicity, we will refer to these systems as "synthetic cells" (SCs, **Figure 1B**), keeping in mind that these are rather simple mimics of biological cells.

In this mini-review, the principles on which liposome-based SCs operate will be summarized, together with an explanation of the reason why they could contribute significantly to molecular communication technologies on account of their inherent possibilities in terms of design, modeling, control, programmability, and modularity. Next, recent experimental reports focused on chemical communication between SCs and natural cells (or with other SCs) will be reviewed (see also Lentini et al., 2016), while the opportunities and challenges facing this novel research arena will be discussed in the final section.

Before advancing in the discussion, two notes of warning are intended for readers unfamiliar with this research field. First, the term "synthetic cell" is also used in synthetic biology to indicate living cells generated either by engineering biological cells (e.g., metabolic engineering, genetic optimization, or reprogramming) or by the transplantation of an entire synthetic genome into a living cell deprived of its own genome. Second, bottom-up synthetic biology approaches aiming at constructing cell-like systems are not restricted to liposome-based SCs. No less interesting are systems based on other types of compartments (Walde et al., 1994; Martino et al., 2012; Huang et al., 2014; Karzbrun et al., 2014; Dora Tang et al., 2015; Rideau et al., 2018) and those based on new artificial molecules (Kurihara et al., 2011; Marguet et al., 2013; Taylor et al., 2015). Interested readers can refer to recent reviews for a broader discussion (Buddingh and van Hest, 2017; Salehi-Reyhani et al., 2017; Göpfrich et al., 2018; Schwille et al., 2018). The current review will focus only on SCs based on gene expression inside liposomes.

## BASIC PRINCIPLES ON LIPOSOME-BASED SCs OPERATING VIA GENE EXPRESSION

SCs based on gene expression inside liposomes find their origin in early studies on cell models aiming at achieving minimal lifelike behaviors (Morowitz et al., 1988; Luisi and Varela, 1989; Schmidli et al., 1991; Oberholzer et al., 1995a,b, 1999; Szostak et al., 2001; Luisi, 2002; Pohorille and Deamer, 2002; Mansy and Szostak, 2009). Born within the origins-of-life community, this research was intended as a means of investigating the emergence of life on Earth, more precisely by demonstrating the emergence of life as a system-level phenomenon due to a particular type of organization (the autopoietic one). Hence, the autopoietic (self-production) (Varela et al., 1974; Luisi and Varela, 1989; Luisi, 2003) (**Figure 1C**) and the chemoton (chemical automaton) (Gánti, 1975) theories are two valuable theoretical frameworks for the construction of SCs which display features of biological organisms. Starting in the early 2000s, SCs and similar constructs became highly relevant also in the context of synthetic biology, either as tools for generating basic knowledge, or as systems designed for applied research, i.e., biotechnology and nanomedicine.

The SCs discussed in this review are liposomes, typically ranging in size from 0.1 to 10–100 µm, that contain DNA and a cell-free gene expression system. They are made by assembling liposomes in an aqueous phase which contains all the molecules that need to be encapsulated to accomplish protein synthesis from a DNA template (e.g., enzymes, ribosomes, tRNAs, nucleotides, amino acids etc.). The protein synthesis machinery can derive from a cell extract or from a reconstituted system [such as the PURE system (Shimizu et al., 2001)]. Accordingly, it can be noted that SC technology is based on liposome technology (including microfluidics) and cell-free systems (including biochemical reconstitution approaches). As a result of the reactions occurring in their aqueous lumen and/or on their boundary surface, SCs can display behavior(s) typical of living cells. For example, SCs produce proteins from a corresponding gene; in turn, the synthesized protein can be an enzyme that converts substrates into products, or it can be a pore-forming protein, creating pores on the liposome membrane, or it can be a receptor that binds a signal molecule, etc. More generally, SCs can be functionalized with any chemical network of biological relevance that is functional in vitro.

Several reactions other than gene expression have been successfully performed inside liposomes, confirming the potential of SCs in terms of scope, programmability, and functionality. Some examples are: PCR and RT-PCR (Oberholzer et al., 1995a; Shohda et al., 2011; Lee et al., 2014; Tsugane and Suzuki, 2018), DNA replication (Sakatani et al., 2018; van Nies et al., 2018), and several enzymatic reactions. Moreover, cytoskeletal elements have been reconstituted inside SCs (Cabré et al., 2013; Furusato et al., 2018; Litschel et al., 2018). Ad hoc designed gene circuits lead to SCs that can perform useful operations in a programmable way, including communication, as discussed below. SCs with the capacity of self-producing all their own constitutive components, and which possibly grow-and-divide as living cells do, are still missing, although interesting reports that show progress in this direction have been published (Kurihara et al., 2011).

This mini-review focuses on SCs capable of communicating with biological cells and with each other. However, other interesting research directions are currently under development, including the construction of SCs with nested design (Deng et al., 2017; York-Duran et al., 2017; Hindley et al., 2018), the production of ATP inside SCs (Feng et al., 2016; Altamura et al., 2017; Lee et al., 2018), attempts at self-producing SC parts (Schmidli et al., 1991; Kuruma et al., 2009; Scott et al., 2016; Li et al., 2017; Exterkate et al., 2018), and the shift from isolated SCs to "SC communities", including tissue-like structures (Carrara et al., 2012; Hadorn et al., 2013; Booth et al., 2016).

## SCs THAT EXCHANGE CHEMICAL SIGNALS: A BOTTOM-UP SYNTHETIC BIOLOGY PLATFORM FOR MOLECULAR COMMUNICATIONS

SCs based on gene expression inside liposomes can be useful tools for developing molecular communication technologies (Stano et al., 2012). Current SC technology allows building simple systems capable of exchanging chemical signals, and therefore performing elementary signal processing. The idea is to design SCs capable of communicating with each other or with biological cells in a programmable manner (**Figure 1D**). This innovative perspective has manifold theoretical and practical consequences. From the theoretical viewpoint, SCs that can regulate their internal mechanisms in response to external perturbations (the chemical signaling) are de facto experimental tools for investigating minimal cognitive systems (Damiano and Stano, 2018a,b). Considering the proposed extension of the Turing imitation game to the SC realm (Cronin et al., 2006), molecular communication can contribute to the determination of life-likeness criteria as referred to SCs, as recently investigated by the Sheref Mansy group (Lentini et al., 2017). From a more practical perspective, an expansion of current drug delivery strategies can be proposed. Inspired by the scenario depicted by Leduc and collaborators (**Figure 1E**) (Leduc et al., 2007), SCs could activate internal mechanisms upon perception of chemical signals, thus acting as "intelligent" drug carriers. As an example, SCs could be targeted to specific cells (e.g., tumoural cells) by exploiting antigen-antibody recognition. Once localized, their internal genetic circuit could be activated by chemical stimuli produced by the target cell itself or by other endogenous or exogenous chemical signals. These "smart" SCs could produce and release therapeutics (or drugs) in situ. Note that a recent study has reported SCs (injected into the tumor) that constitutively produce a toxin against breast cancer cells (Krinsky et al., 2018). The therapeutic (or diagnostic) use of SCs is, today, still a hypothetical scenario.
Nevertheless, continuous improvements in SC design and construction are expected to favor more rapid prototyping, thus accelerating the path toward practical applications.

### Sensors, Actuators, Controllers, and Molecular Diffusion

Like hardware robots or conventional communication devices, SCs are embodied systems composed of molecular elements that perform specific operations. Hardware components, such as sensors, controllers, and actuators (Mataric, 2007; Wang et al., 2013) have their molecular counterparts in SCs.

In the context of SCs operating by gene expression, sensors can be protein receptors or RNA aptamers that bind to a signal molecule and consequently change their conformation. This event directly or indirectly affects the "controller system," which is based on the regulation of gene expression by protein receptors or RNA aptamers (riboswitches) at the transcriptional or translational level, respectively. These mechanisms are well understood (Alberts et al., 2014). Depending on its design, the regulatory circuit can involve a single gene or multiple genes. As a result of this sensing-and-regulation system, the synthesis of an actuator (a protein) is promoted or inhibited. In turn, the actuator operates on some further step (e.g., producing a signal molecule, catalyzing a useful reaction, creating a pore in the SC membrane, acting as a controller/regulator of another circuit, etc.). Key examples of this general mechanism are discussed in section A Survey of Published Reports and listed in **Table 1**.

To provide SCs with communication capability, water-soluble proteins (sensors, regulators, signal-producing elements, or components of the gene expression machinery) should be either encapsulated or synthesized in the SC lumen. This has become standard practice, at least for some prokaryotic proteins (Stano et al., 2011). Dealing with membrane-associated and integral membrane sensors/receptors is instead not trivial, although reports have shown that this is a feasible goal in SC technology, using strategies such as membrane protein reconstitution (Yanagisawa et al., 2011; Altamura et al., 2017; Jørgensen et al., 2017) or synthesis-from-within (Kuruma et al., 2009; Hamada et al., 2014; Soga et al., 2014). Genetic circuits of varying complexity have already been proven to be functional inside liposomes as well (Noireaux et al., 2003; Shin and Noireaux, 2012; Siegal-Gaskins et al., 2014).

In addition to molecular elements, establishing an intercellular communication channel requires taking into account the diffusion of the signal molecule in the outer aqueous environment. The signal molecule cannot be directed toward the communication partner; instead, it spreads in all directions, guided by the concentration gradient. Although the average behavior of many signal molecules can be foreseen, individual molecules follow an erratic path. In addition to free diffusion, for closely packed SCs, communication through gap junctions (reconstituted in liposomes) has been proposed (Ramundo-Orlando et al., 2005; Moritani et al., 2010).
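The contrast between predictable average behavior and erratic individual paths can be illustrated with a minimal random-walk sketch. It assumes free three-dimensional Brownian motion with no boundaries, degradation, or uptake; the diffusion coefficient, time step, and molecule count are illustrative values chosen for this example, not parameters taken from the cited studies. Under these assumptions the mean squared displacement of the population should approach 6Dt, while single molecules end up at widely different distances from the sender.

```python
import math
import random


def simulate_displacements(n_molecules, n_steps, diff_coeff, dt, seed=0):
    """Return the final squared displacement (m^2) of each simulated molecule."""
    rng = random.Random(seed)
    # Per-axis standard deviation of one Brownian step of duration dt
    sigma = math.sqrt(2.0 * diff_coeff * dt)
    squared = []
    for _ in range(n_molecules):
        x = y = z = 0.0
        for _ in range(n_steps):
            x += rng.gauss(0.0, sigma)
            y += rng.gauss(0.0, sigma)
            z += rng.gauss(0.0, sigma)
        squared.append(x * x + y * y + z * z)
    return squared


# Assumed, illustrative parameters (not taken from the cited studies)
D = 5e-10        # diffusion coefficient, m^2/s
DT = 0.01        # time step, s
N_STEPS = 50     # total simulated time = 0.5 s
N_MOLECULES = 2000

sq = simulate_displacements(N_MOLECULES, N_STEPS, D, DT)
msd_sim = sum(sq) / len(sq)
msd_theory = 6.0 * D * N_STEPS * DT  # expected ensemble average in 3D

print(f"simulated MSD:   {msd_sim:.3e} m^2")
print(f"theoretical MSD: {msd_theory:.3e} m^2")
print(f"closest molecule:  {math.sqrt(min(sq)) * 1e6:.2f} um from origin")
print(f"farthest molecule: {math.sqrt(max(sq)) * 1e6:.2f} um from origin")
```

The ensemble average tracks the theoretical value closely, whereas the closest and farthest molecules differ greatly, which is the practical meaning of a signal that cannot be aimed at its receiver.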

### A Survey of Published Reports

The pioneering experimental report on a simple cell-like system sending a signal molecule to biological cells was published by the Ben Davis group (Gardner et al., 2009). The authors encapsulated the precursors of the formose reaction inside liposomes and observed that one class of products of the intra-vesicular reaction escaped the liposomes through a channel formed by α-haemolysin. These products spontaneously reacted with the borate ions present in the external medium to generate furanosyl-boronates structurally similar to the quorum sensing (QS) signal molecule AI-2, which naturally triggers bioluminescence in Vibrio harveyi. Remarkably, the "synthetic" signal released by the liposomes was able to induce natural behavior (i.e., light emission) in this bacterium.

Although this study is of great interest as a proof of concept, the SCs used by Ben Davis and co-workers were not based on gene expression and therefore lacked the programmability and control that are peculiar to synthetic biology. Because this is a novel research area, the literature on liposome-based SCs that operate by gene expression to interface with natural cells (or with other SCs) is, to the best of our knowledge, limited to the six studies summarized in **Table 1**, together with the already cited study by Gardner et al. (2009). Additional cases involving non-liposome compartments are also available (Gupta et al., 2013; Schwarz-Schilling et al., 2016; Sun et al., 2016; Niederholtmeyer et al., 2018), but these will not be discussed in this mini-review.

In 2014, Sheref Mansy and collaborators designed SCs acting as "translators" for the bacterium Escherichia coli, using theophylline as trigger and isopropyl β-D-1-thiogalactopyranoside (IPTG) as signal molecule (Lentini et al., 2014). These SCs are liposomes containing IPTG, the PURE system as the transcription-translation (TX-TL) machinery, and a DNA template coding for a riboswitch that, after binding to the free-diffusible molecule theophylline, activated the expression of the pore-forming protein α-haemolysin. The authors demonstrated that only in the presence of theophylline did IPTG escape the liposomes through α-haemolysin and activate the expression of the green fluorescent protein (GFP) gene in receiver E. coli cells. In this way, SCs acted as chemical translators allowing E. coli to sense theophylline (a molecule that E. coli cannot normally sense).

**TABLE 1** (2009–2018; table body not recovered). The most representative results are shown; more details can be found in the original publications.

3OC6-HSL, N-(3-oxohexanoyl)-L-homoserine lactone; ACB, containing components interiors; AI-2, autoinducer-2; C2-CoA, acetyl-coenzyme A; C4-CoA, butyryl-coenzyme A; C4-HSL, N-butanoyl-L-homoserine lactone; C8-HSL, N-octanoyl-L-homoserine lactone; chol, cholesterol; DPhPC, 1,2-di-O-phytanoyl-sn-glycero-3-O-phosphocholine; EsaR and EsaI, signal receptor and synthase from Erwinia stewartii; GOx, glucose oxidase; HRP, horseradish peroxidase; LBS, lactobacillus selective; PNIPAAm, poly(N-isopropylacrylamide) cross-linked with glucose oxidase; POPC, 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphatidylcholine; SAM, S-adenosylmethionine; Tet, ten-eleven translocation protein.

Adamala et al. (2017) built SCs containing engineered genetic circuits and regulatory cascades. These SCs can be controlled/triggered by external signals, and can be fused together in order to bring together products of incompatible reactions. In particular, the group led by Edward Boyden used cell lysates with transcriptional-translational activity, DNA vectors encoding genes for IPTG (or doxycycline) detection, and permeable chemical inducers such as arabinose or theophylline. They showed that arabinose (or theophylline) activates α-haemolysin production in a first SC population, so that pre-encapsulated, impermeable IPTG (or doxycycline) is released and activates a response in a second SC population.

The group of Sheref Mansy recently reported two-way chemical communication between SCs and bacteria (Lentini et al., 2017). They exploited cell extracts to generate SCs able to synthesize molecules perceived by Vibrio fischeri, V. harveyi, E. coli, and Pseudomonas aeruginosa. In particular, the expression of LuxI-like synthases inside liposomes, in the presence of acetyl coenzyme A and S-adenosylmethionine (SAM), resulted in the production of molecules able to activate E. coli- and V. fischeri-based biosensor strains for the detection of acyl-homoserine lactones (AHLs). Cell extracts served both for TX-TL reactions and for the synthesis of some AHL precursors. Moreover, it was shown that SCs containing ad hoc designed genetic circuits could express QS signal molecule receptors able to trigger the expression of reporter and QS signal synthase genes (e.g., gfp and luxI) upon perception of QS signal molecules produced by bacteria. The extent to which SCs could "imitate" natural cells in terms of their response to the investigated QS signal molecule was estimated by a sort of cellular Turing test (Cronin et al., 2006).

Signaling between liposome-based SCs and proteinosomes (cell-like particles made of proteins), mediated by glucose, has recently been reported in a joint work by the groups of Sheref Mansy and Stephen Mann (Tang et al., 2018). In this study, the unidirectional signaling pathway was based on: (i) liposome transmitters, containing the PURE system, a DNA plasmid carrying a chemically inducible repression switch (EsaR), a gene coding for α-haemolysin, and glucose; (ii) proteinosome receivers, consisting of a cross-linked, enzymatically active glucose oxidase (GOx) poly(N-isopropylacrylamide) (PNIPAAm) membrane and encapsulated horseradish peroxidase (HRP). The addition of the permeable AHL molecule N-(3-oxohexanoyl)-L-homoserine lactone (3OC6-HSL) triggered intravesicular α-haemolysin expression and consequent membrane pore formation in liposome-based SCs, which allowed the release of the glucose contained in the aqueous lumen. Glucose oxidation on the proteinosome membrane produced hydrogen peroxide, which in turn converted a molecule into a fluorescent output by reacting with the HRP encapsulated in the proteinosome. This study provides an example of molecular communication between two different types of artificial cell-like systems.

A recent report comes from our laboratory, and it deals with unidirectional SC–P. aeruginosa communication based on the QS AHL signal molecule C4-HSL (Rampioni et al., 2018). In particular, SCs were prepared by encapsulating the PURE system inside GVs prepared by the droplet transfer method (Pautot et al., 2003; Fujii et al., 2014), together with butyryl coenzyme A and SAM as precursors, and a plasmid encoding RhlI, the synthase for C4-HSL production. SCs produced C4-HSL (a natural QS signal molecule), which was perceived by P. aeruginosa both in liquid medium and in gel. In particular, P. aeruginosa modified its gene expression pattern in response to the C4-HSL produced by SCs, demonstrating that reprogramming of gene expression in the bacterial cell is similar when interacting with other bacteria or with SCs. The entire TX-TL mechanism was assessed by rhlI mRNA and RhlI protein quantification, as well as by chemical identification of the C4-HSL signal produced by the SCs. The experimental results match previously published numerical modeling (Rampioni et al., 2014), confirming the predictive power of in silico simulations in SC research.

Finally, the Tan group reported an interesting study in which SCs and bacteria engaged in unidirectional communication in various ways (SCs to SCs, bacteria to SCs, and SCs to bacteria) (Ding et al., 2018). In this case, the QS signal molecule was produced via the EsaI synthase and was perceived by the cognate EsaR receptor. Gene expression in SCs was triggered when binding of the QS signal molecule to EsaR led to derepression of an EsaR-controlled promoter region. Quite interestingly, the authors designed SCs that produce an antimicrobial peptide (Bac2A) in response to QS signal molecules sent by bacteria, a proof of principle of the use of signal processing and actuation dynamics for the generation of SCs interfacing with natural cells. Moreover, SCs embedded in biofilms were also reported.

## DIRECTIONS AND CHALLENGES FOR FUTURE WORK

The works compared in **Table 1** are pioneering proof-of-concept studies that will likely stimulate further research to expand SC capabilities related to molecular communications. In this context, several challenges and open questions can be envisaged. Some refer to mechanistic, biochemical, and biological aspects, others to the capability of engineering molecular communications.

With respect to the mechanisms of molecular communication, "sender" and "receiver" SCs mainly relied on transmembrane diffusion of signal molecules. This simple approach has been effective because some QS signal molecules, such as short-tail AHLs, can cross the lipid bilayer (Pearson et al., 1999). The generation of α-haemolysin pores is a drastic (yet effective) solution that has been used to bypass the low permeability of SC membranes when non free-diffusible signal molecules have been used (e.g., IPTG or glucose), but this causes the release of all the low-MW compounds contained inside SCs (the cut-off molecular weight value for the α-haemolysin pore is 3 kDa; Song et al., 1996). An alternative could be the use of DNA nanopores, whose properties are tunable by design (Krishnan et al., 2016). The future employment of more sophisticated import/export mechanisms based on membrane proteins will allow expanding the chemical repertoire of signal molecules secreted or perceived by SCs (e.g., peptides), thus increasing the communication capability and specificity. In this respect, ongoing progress on the functionalization of SC envelopes with integral membrane proteins is promising (see section Basic Principles on Liposome-Based SCs Operating via Gene Expression).

Looking at the biological partners of SCs for molecular communications, early studies focused on bacteria, since they are amenable to genetic engineering and their intercellular communication systems have been thoroughly studied at the molecular level, especially in the case of QS systems. From a practical viewpoint, SC/bacteria communication is a technological platform for the long-term goal of interfering with bacterial populations and for therapeutic strategies that could be devised against infections. Indeed, the ability of SCs to drive gene expression in response to external cues suggests the possibility of generating injectable SCs able to produce or release an antimicrobial compound only in response to a signal molecule produced by a bacterial pathogen. The study by the Tan group reported in **Table 1** (Ding et al., 2018) has provided a proof of principle that SCs can be generated that kill bacteria by a mechanism triggered by the bacteria themselves.

Proving that SCs can communicate with eukaryotic cells is one of the next milestones, especially when nanomedicine applications are envisaged. This complex task could require the generation of SCs with internal operations that rely on eukaryotic signal synthesis or more complex signal reception machineries. The relevance of these approaches is that SCs could be employed as intelligent drug-delivery systems that perform a therapeutic action by extracting information from their microenvironment. As mentioned, the generation of SCs constitutively producing a tumor-killing protein (the Pseudomonas exotoxin A) has been recently described (Krinsky et al., 2018). Another task would involve enzyme replacement therapy (Itel et al., 2017). For example, SCs that consume excess phenylalanine could play a therapeutic role in phenylketonuria (Leduc et al., 2007). Notably, Thomas M. S. Chang proposed, in a pre-liposome age, the therapeutic use of enzyme-containing semi-permeable collodion capsules circulating in the bloodstream (Chang, 1964, 1972). The generation of SCs interfacing with neural cells via molecular communication can also be imagined. The resulting hybrid bio/synthetic cell networks could also be exploited for innovative investigations of neural functions (Pinato et al., 2011).

Considering the engineering plan of networking SCs (or SCs and biological cells), the rigorous design of molecular communication channels requires proper modeling of the physical and information levels. At the physical level, stochastic diffusion plays a central role. Because it is essentially a random process, diffusion is the ultimate limit of molecular communication (when compared to traditional electromagnetic systems). Intercellular molecular communications rely on diffusion of chemical signals under a concentration gradient. They are, therefore, slow stochastic processes; their success depends on a number of factors, such as the sender/receiver ratio, their spatial arrangement, the viscosity of the medium, and the temperature. Numerical models can be useful to understand the limiting factors and the constraints operating at this (inescapable) physical level (Nakano et al., 2011, 2013). The stochastic dimension of molecular communications affects their reliability, and coping with it represents an engineering challenge. The second aspect refers to the amount of information transmitted in the molecular communication "channel," and this is a theoretical issue. To apply classical information and communication theory to such a novel scenario, "information" should be defined with respect to the type of signal molecules, the number of sent/received molecules, and the time-dependent concentration profile (switch-like, pulse-like, etc.). Control theory for bottom-up synthetic biology should be delineated (Del Vecchio et al., 2016). Its peculiarity stems from molecular discreteness, random timing of sending/receiving, nature of "noise," etc.
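The dependence on viscosity, temperature, and distance can be made concrete with a back-of-the-envelope estimate. The sketch below is an illustration rather than the modeling approach of the cited works: it assumes a spherical signal molecule obeying the Stokes-Einstein relation, D = k_B T / (6πηr), and uses t ≈ L²/(6D) as the characteristic time to diffuse a distance L in three dimensions. The hydrodynamic radius (0.4 nm) and the water-like viscosity are assumed round values, not measured parameters for any specific signal molecule.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def stokes_einstein(temperature_k, viscosity_pa_s, radius_m):
    """Diffusion coefficient (m^2/s) of a sphere in a viscous medium."""
    return K_B * temperature_k / (6.0 * math.pi * viscosity_pa_s * radius_m)


def diffusion_time(distance_m, diff_coeff):
    """Characteristic time (s) to diffuse a given distance in 3D."""
    return distance_m ** 2 / (6.0 * diff_coeff)


# Assumed, illustrative inputs
TEMPERATURE = 298.15   # K
VISCOSITY = 0.89e-3    # Pa*s, roughly water at 25 degrees C
RADIUS = 0.4e-9        # m, hypothetical small signal molecule

d_coeff = stokes_einstein(TEMPERATURE, VISCOSITY, RADIUS)
t_100um = diffusion_time(100e-6, d_coeff)
t_1mm = diffusion_time(1e-3, d_coeff)
# Same molecule in a medium assumed to be ten times more viscous
d_viscous = stokes_einstein(TEMPERATURE, 10 * VISCOSITY, RADIUS)

print(f"D in water-like medium: {d_coeff:.2e} m^2/s")
print(f"time to cover 100 um:   {t_100um:.1f} s")
print(f"time to cover 1 mm:     {t_1mm:.0f} s")
print(f"D at 10x viscosity:     {d_viscous:.2e} m^2/s")
```

Because the time scales with the square of the distance, a tenfold increase in sender-receiver separation slows the exchange a hundredfold, which is one reason the spatial arrangement of the partners matters so much.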

In conclusion, SCs could significantly contribute to the emergence of a novel research field based on communication with biological cells. Thanks to their modular constructive principle, their biocompatibility, and their programmability, SCs of the type discussed in this review have the unique ability to act as passive carriers of hydrophilic and hydrophobic drugs, and to actively drive gene expression in response to chemical stimuli from other cells and from the environment.

At present, the main challenges in this field concern our capacity to (i) design and build multi-functional SCs based on a proper genetic circuit and auxiliary molecular parts/devices, (ii) build homogeneous populations of SCs that are stable in biological fluids, and (iii) control SC behavior even in a complex and fluctuating environment, such as a human host. All these challenges will probably be solved in the near future thanks to constant improvements in SC technology (in a broad sense, i.e., not necessarily restricted to liposomes). Along this path, there will be room for developing various systems whose in vitro usage will generate opportunities for understanding principles of biological systems and constructing short-term devices (e.g., biosensors).

#### AUTHOR CONTRIBUTIONS

PS conceived the research; all authors wrote the paper.

#### FUNDING

The authors acknowledge the Italian Ministry for Education, University and Research, MIUR-Italy for the FIRB2010 project No. RBFR10LHD1\_002, and for the Grant of Excellence Departments (Art. 1, c. 314-337, Legge 232/2016).

#### REFERENCES


chemical messages that direct E. coli behaviour. Nat. Commun. 5:4012. doi: 10.1038/ncomms5012


dependence of membrane protein integration on vesicle volume. ACS Synth. Biol. 3, 372–379. doi: 10.1021/sb400094c


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Rampioni, D'Angelo, Leoni and Stano. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Improving Reproducibility in Synthetic Biology

#### Mathew M Jessop-Fabre and Nikolaus Sonnenschein\*

*The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, Denmark*

Synthetic biology holds great promise to deliver transformative technologies to the world in the coming years. However, several challenges still remain to be addressed before it can deliver on its promises. One of the most important issues to address is the lack of reproducibility within research of the life sciences. This problem is beginning to be recognised by the community and solutions are being developed to tackle the problem. The recent emergence of automated facilities that are open for use by researchers (such as biofoundries and cloud labs) may be one of the ways that synthetic biologists can improve the quality and reproducibility of their work. In this perspective article, we outline these and some of the other technologies that are currently being developed which we believe may help to transform how synthetic biologists approach their research activities.

#### Edited by:

*Francesca Ceroni, Imperial College London, United Kingdom*

#### Reviewed by:

*Pablo Carbonell, University of Manchester, United Kingdom*

*Manuel Porcar, University of Valencia, Spain*

> \*Correspondence: *Nikolaus Sonnenschein niso@biosustain.dtu.dk*

#### Specialty section:

*This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology*

Received: *15 November 2018* Accepted: *24 January 2019* Published: *11 February 2019*

#### Citation:

*Jessop-Fabre MM and Sonnenschein N (2019) Improving Reproducibility in Synthetic Biology. Front. Bioeng. Biotechnol. 7:18. doi: 10.3389/fbioe.2019.00018*

Keywords: synthetic biology, automation, reproducibility, cloud lab, biofoundry

## INTRODUCTION

Science is reliant on the development of reproducible data. Despite the great importance of reproducibility, there have so far been few studies into the reproducibility of life science publications. One exception is within cancer biology, where it was reported that only 11% of landmark cancer studies could be reproduced (Begley and Ellis, 2012). In a survey of research scientists, 77% of biologists stated that they had tried and failed to reproduce someone else's result in the lab (Baker, 2016). And when asked "what percentage of published results in your field are reproducible?", biologists estimated that only 59% of results are reproducible.

Different measures have been taken to maintain high levels of rigor within the sciences. The process of peer-review is one of these measures and has so far served as a successful strategy. However, with the increasing volume, complexity, and detail of data that is being published, peer-review is arguably unable to ensure that published research is reproducible and other measures are needed prior to this step.

Synthetic biology aims to introduce engineering principles into the life sciences in the hope that it can greatly improve the reliability of the "Design-Build-Test-Learn cycle" (**Figure 1**). Any engineering discipline must be predictable and reproducible. In this perspective article, we discuss some of the ways the synthetic biology community is working towards achieving these goals.

## IMPROVEMENT OF EXISTING LABORATORY PRACTICES

One way to increase reproducibility is to increase the quality of protocol reporting. Experimental protocols are often specific to a laboratory or even to a single researcher, and are frequently taught to new lab members through a combination of practical guidance and a written protocol. This form of knowledge transfer is valuable in passing on tacit information (knowledge that is difficult to pass on in written form), but can lead to variation in the way experiments are performed. Passing on written protocols without practical guidance can lead to even more problems, because written protocols are open to interpretation: they often contain ambiguities or rely on tacit knowledge (Miles and Lee, 2018). Online protocol editors, such as protocols.io, aim to improve the quality of protocols by creating a platform for scientists to easily share and edit protocol documents (Teytelman et al., 2016). Researchers are encouraged to upload detailed step-by-step instructions that can be followed, verified, and improved upon by researchers from other labs. In addition, videos can be uploaded to capture some of the tacit information inherent in performing experiments. These protocols can then be linked to the methods sections of papers, and may help to improve the standardisation of common techniques and thereby improve experimental reproducibility.

Protocol editors combined with protocol management systems can provide a framework for labs to organise their research activities. The Aquarium software developed at the Klavins Lab is an example of such a system<sup>1</sup>. In this system a "designer" can accurately specify a protocol, which is then sent to the lab as a "job" to be performed by students, technicians, or robots. The system can be integrated with a laboratory information management system (LIMS) such as Benchling, enabling the use of materials to be tracked and the items used in each experiment to be traced back.

## AUTOMATION

For many years, laboratory automation has promised to revolutionise biology. However, the uptake of traditional lab automation technologies (such as liquid handling robots) by academic labs has been limited. Many of the techniques used within synthetic biology are nonetheless performed on a routine basis, and as such may be desirable targets for automation. The successful implementation of automation technology can help to improve both the throughput and reproducibility of experiments, although the high cost and lack of flexibility of traditional lab automation have so far hindered its widespread adoption in academia. Recently, though, there have been advances in automation technology that, when combined with new protocol sharing methods and protocol management systems, may allow researchers to gain the benefits of automation.

Automation has found particular favour in the pharmaceutical industry, as assay protocols are relatively simple to automate (Bogue, 2012). The high cash flow within large companies gives them the ability to invest in high-quality and highly specialized equipment. This approach is unlikely to function well in an academic environment, as the high costs and low flexibility of these operations are often prohibitive. Flexible and low-cost solutions have generally not been offered by established vendors of liquid handling robots. Liquid handlers from vendors such as Tecan and Hamilton are state of the art, but expensive, large, and technically complex. In addition, operation of these machines usually requires skilled technicians, and it can still take several months before they are operational. Interestingly, a recent Kickstarter campaign from Opentrons aimed to address the demand for a liquid handler that is both low-cost and flexible. The resulting OT-One robot has proved popular in both academia and industry, enabling Opentrons to establish itself in the field of lab automation. The more recent and advanced OT-2 liquid handler can be purchased from 4,000 USD, making it affordable to a wide range of academic labs. Opentrons has addressed the issue of flexibility through the adoption of open-source hardware and software. The relative ease of set-up and use of the Opentrons robots (programmable through a Python API, or soon through a graphical user interface) enables them to be used in a flexible manner by non-specialized workers. This means that many labs now have the opportunity to automate parts of their workflows and share their efforts throughout the community. The simple sharing of protocols will hopefully mean that labs can easily test and verify the work of others.

Another issue that, until recently, has hindered wider implementation of automation systems is the difficulty of linking up devices from different manufacturers. Synthace is a start-up that aims to eliminate this issue by developing software that can control multiple robots from different vendors. An entire lab could thus be controlled through a unified piece of software, allowing much simpler coordinated control of complex automation equipment.

In contrast to liquid handling robots, microfluidic technologies allow for the control of liquids on a microscopic scale and can also be used to automate and scale down many common laboratory procedures. Commercial microfluidic devices are becoming increasingly commonplace in the lab, with devices popular in the handling of DNA and RNA, such as fragment analysers used in next-generation sequencing workflows. The field of microfluidics is large and well-established, with over 3,000 articles published in 2017 alone (PubMed search for keyword: "microfluidic"). Microfluidics may provide a cheap and powerful alternative to traditional laboratory automation. Devices have been built that are capable of performing strain transformation and culturing, while other developments in the field have focused on DNA assembly (Ben Yehezkel et al., 2016; Gach et al., 2016; Linshiz et al., 2016). Even though microfluidic devices have been shown to be capable of automating protocols within synthetic biology, their adoption has been slow, and traditional liquid handlers remain the dominant method of automation, as the majority of laboratory automation has been designed to handle the array formats (such as 96-well plates) that are standard in the lab. Marrying microfluidic technologies to array-based laboratory automation can prove difficult, and is perhaps one reason for the limited integration of microfluidics into automated workflows.

<sup>1</sup>https://www.aquarium.bio

Tedious and repetitive tasks are highly error-prone for humans to conduct, and automation may be seen as a way to reduce such error (Yeow et al., 2014). However, careful consideration for the requirements of any automation system must be carried out before the purchase of equipment. Counterintuitively, robots may be slower and less accurate than humans under certain conditions. Zielinski et al. (2014) reported that their newly developed pipetting protocol had up to a 3x larger coefficient of variation and took twice as long to perform when carried out by a robotic liquid handler rather than by a human. Traditional liquid handlers are also limited in ways such as requiring "dead volumes" of reagents inside wells, which can increase the cost of experiments. To maintain high levels of accuracy, all automation equipment must undergo strict and frequent quality assessments (Chai et al., 2013). There is an additional investment of time required for the set-up and optimisation of new protocols, which when combined with the aforementioned issues can dissuade academic labs from purchasing such systems. However, as we discuss below, there have recently been developments that aim to give researchers easier access to automation.
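The coefficient of variation (CV) used in such comparisons is the sample standard deviation divided by the mean. As a minimal sketch, the dispensed volumes below are invented for illustration (they are not data from Zielinski et al.) and were chosen so that the hypothetical robot series has a CV three times that of the manual series.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (dimensionless)."""
    return statistics.stdev(values) / statistics.mean(values)


# Invented replicate volumes in microlitres, for illustration only
manual_ul = [100.2, 99.8, 100.1, 99.9, 100.0]
robot_ul = [100.6, 99.4, 100.3, 99.7, 100.0]

cv_manual = coefficient_of_variation(manual_ul)
cv_robot = coefficient_of_variation(robot_ul)

print(f"manual CV: {cv_manual:.3%}")
print(f"robot CV:  {cv_robot:.3%}")
print(f"ratio:     {cv_robot / cv_manual:.2f}")
```

Tracking this single number per instrument over time is one simple way to implement the routine quality assessments mentioned above.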

## BIOFOUNDRIES

The issues mentioned in the previous section can prevent academic labs from investing in extensive automation. However, in the past few years there has been a movement to create fully or semi-automated labs termed "biofoundries." Biofoundries typically consist of a core laboratory that has been extensively automated to carry out a range of functions. These integrated systems are unique in their flexibility compared to standard automated pipelines. A user can submit a job (e.g., create a combinatorial library) to the biofoundry, where it will be scheduled before being carried out by the automated lab. Once a job has been successfully run, the user is sent (if applicable) the biological output of their job along with any experimental or log data that the user requires.

Much of the work conducted within synthetic biology revolves around DNA manipulation, and so does the work of many of the biofoundries. One example is the Edinburgh Genome Foundry, whose focus is on manufacturing long lengths of genetic material via several DNA assembly methods. Its approach differs from that of a standard laboratory by requiring only a minimum of human intervention. A range of standard equipment, such as thermocyclers and incubators, is linked together by a network of robotic arms that can carry plates and samples between instruments. The foundry is able to construct complex sections of DNA, transform them into host organisms, and then conduct a range of experiments on the engineered strains. The minimisation of human intervention and the shared use of reagent resources help to bring down costs, hopefully to a point where it becomes economically viable for many researchers to outsource parts of their construction workflows to facilities like this.

The UK and US are currently leaders in the development of biofoundries, with several found throughout both countries (Chao et al., 2017). Many of these are open to accepting jobs from outside their host university institutes, with the goal of serving the wider scientific community (**Table 1**). Widespread use of these facilities for design construction within synthetic biology may help improve standardisation within the field, if the community of biofoundries agrees upon the design and adoption of operational standards.

*This is not a comprehensive list, but contains those centres and facilities that have accessible webpages.*

#### CLOUD LABORATORIES

A similar concept to biofoundries is the "cloud laboratory." Like a biofoundry, a cloud lab usually comprises a central facility that is heavily automated. However, the cloud lab gives a higher level of freedom to the customer by offering a real lab and its capabilities to biologists sitting at a computer in a remote location (Check Hayden, 2014). Two Californian companies have spearheaded this movement: Transcriptic and the Emerald Cloud Lab. Both companies offer their extensive lab automation as a service, although each in a different way. Transcriptic has constructed a series of sterile work-cells. Within each work-cell is a set of basic lab equipment such as liquid handlers, thermocyclers, fridges, an incubator, and a plate reader, as well as a robotic arm to move materials between machines. Customers can control the actions within a work-cell via an application programming interface (API). A user can create an experimental protocol through the high-level Python API, which then converts the user's protocol into the JSON-formatted Autoprotocol language developed by Transcriptic to control their automation equipment (Miles and Lee, 2018). Users receive detailed results and diagnostics on their experiments, with troubleshooting on failed experiments made easier by the available metrics for all instruments on the executed run. The Emerald Cloud Lab has taken a different approach: rather than breaking operations down into discrete and identical work-cells, it has focused on offering a wider and more flexible range of services to the customer. The Emerald Cloud Lab has a stronger focus on the analytical side of biology through access to high-pressure liquid chromatography and mass spectrometry, among others. Cloud labs offer the chance for experimentalists to keep full control over their process, while also providing detailed and reproducible protocols.
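
To make the idea of a protocol expressed as JSON more concrete, the following minimal sketch builds a small protocol as plain Python data and serialises it. It does not use Transcriptic's Python API. The layout (a `refs` section naming containers and an ordered `instructions` list whose entries carry an `op` field) is only loosely modelled on Autoprotocol, and the helper functions, container type, and parameter fields are assumptions made for illustration.

```python
import json


def build_protocol():
    """Assemble a toy protocol: declare one container, then list the steps."""
    protocol = {"refs": {}, "instructions": []}

    # Container declaration; the name and type string are hypothetical.
    protocol["refs"]["assay_plate"] = {"new": "96-flat", "discard": True}

    # Ordered steps; each carries an "op" plus illustrative parameters.
    protocol["instructions"].append(
        {"op": "incubate", "object": "assay_plate",
         "where": "warm_37", "duration": "2:hour"}
    )
    protocol["instructions"].append(
        {"op": "absorbance", "object": "assay_plate",
         "wavelength": "600:nanometer", "dataref": "od600_read"}
    )
    return protocol


def undeclared_containers(protocol):
    """Return containers used by instructions but missing from refs."""
    declared = set(protocol["refs"])
    used = {step["object"] for step in protocol["instructions"]}
    return sorted(used - declared)


protocol = build_protocol()
serialised = json.dumps(protocol, indent=2)  # machine-readable, shareable text
print(serialised)
print("Undeclared containers:", undeclared_containers(protocol))
```

Because the protocol is ordinary structured data, it can be checked automatically before submission (as `undeclared_containers` does here) and attached to a publication in exactly the form that was executed.
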
The hope is that academics will include these protocols within their publications, and share them online where others can use or modify them. Cloud labs may also enable early-stage biotech start-ups to test their early ideas and designs without the need to invest in setting up their own laboratory beforehand or to enter an incubator too early. However, there are still limitations with these technologies. Perhaps the most important is that it may be impossible to have any one facility that can cater to the needs of all biological researchers, since scientists make use of a wide array of instrumentation in their routine experiments. A recent review of cloud labs found that, within biomedical research, 89% of published papers contained one or more methods that could be carried out at a cloud lab (Groth and Cox, 2017). However, only 3% of papers had all of their methods supported by a cloud lab, suggesting that most complete workflows cannot yet be ported to a cloud lab. While it may not be necessary for a lab to implement complete workflows at a cloud lab, it may also prove disadvantageous to perform only a small part of their protocols there (for example if it requires the frequent shipping of materials). There is therefore an inevitable judgement call that must be made to determine when it becomes beneficial to outsource a protocol.
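
The two coverage figures above measure different things: the share of papers with at least one supported method, and the share with every method supported. The sketch below shows how both can be computed from per-paper method lists. The papers, method names, and supported set are invented for illustration and are not data from Groth and Cox (2017).

```python
def coverage(papers, supported):
    """Return (fraction with >=1 supported method, fraction fully supported)."""
    n = len(papers)
    any_hits = sum(1 for methods in papers.values() if methods & supported)
    all_hits = sum(1 for methods in papers.values()
                   if methods and methods <= supported)
    return any_hits / n, all_hits / n


# Hypothetical methods offered by a cloud lab.
supported = {"pcr", "plate_reader", "hplc"}

# Hypothetical papers, each mapped to the set of methods it uses.
papers = {
    "paper_A": {"pcr", "microscopy"},
    "paper_B": {"pcr", "plate_reader"},
    "paper_C": {"flow_cytometry"},
    "paper_D": {"hplc", "animal_model"},
}

partial, complete = coverage(papers, supported)
print(f"At least one method supported: {partial:.0%}")
print(f"All methods supported:         {complete:.0%}")
```

In this toy data set three of four papers could outsource something, but only one could outsource everything, which mirrors the gap between partial and complete workflow support described in the text.
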

# THE ROBOT SCIENTIST

The robust development of lab automation, whether in-house or through a biofoundry or cloud lab, may lead to biologists automating parts of their hypothesis generation and experimental design cycle too. While both biofoundries and cloud labs implement instructions from a human operator, the robot scientist aims to go one step further by removing the scientist from the hypothesis generation and experimental design steps. An early pioneering effort in this field was the creation of a "Robot Scientist" named Adam (King et al., 2004, 2009). The physical design of Adam resembles that of a biofoundry, with several instruments handling the execution of a set of experimental techniques. Materials are moved between these instruments by three robotic arms, and the entire set-up is enclosed in a sterile plastic enclosure. The main difference from a biofoundry is that Adam has generated its own scientific hypotheses, which it then tests experimentally, resulting in the generation of new scientific knowledge in yeast genetics. Adam is not alone. The more recent Eve Robot Scientist has also generated novel scientific insights, this time revealing new drug candidates for the treatment of Plasmodium vivax (Williams et al., 2015). This marriage of automation and basic hypothesis generation and testing may prove to be valuable within synthetic biology, as a researcher can be left to generate higher level hypotheses, which are then interpreted and executed by an automated facility.
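
The closed loop of hypothesis generation and testing can be illustrated with a deliberately simple sketch. This is not Adam's or Eve's actual software: the candidate genes, the `simulated_assay` function, and the stopping rule are all hypothetical stand-ins for a real hypothesis generator and a real automated experiment.

```python
def run_closed_loop(candidates, experiment):
    """Test candidate hypotheses in turn until one is supported.

    Returns the supported hypothesis (or None) and a log of every test.
    """
    log = []
    for hypothesis in candidates:
        supported = experiment(hypothesis)  # stands in for a robotic assay
        log.append((hypothesis, supported))
        if supported:
            return hypothesis, log
    return None, log


def simulated_assay(gene, true_gene="gene_C"):
    """Pretend experiment: only the 'true' gene gives a positive result."""
    return gene == true_gene


# Hypothetical hypotheses: "this gene encodes the missing enzyme".
candidates = ["gene_A", "gene_B", "gene_C", "gene_D"]

result, log = run_closed_loop(candidates, simulated_assay)
print("Supported hypothesis:", result)
for hypothesis, supported in log:
    print(f"  tested {hypothesis}: {'supported' if supported else 'rejected'}")
```

A real system would also choose which hypothesis to test next based on cost and expected information gain; the sketch only shows the cycle of propose, test, record, and stop.
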

## STANDARDISATION

One of the key enablers of engineering disciplines is the adoption of a comprehensive set of standards (O'Connell, 1993). Such levels of standardisation are much more difficult to achieve in biological than in electrical systems, as there is a higher degree of uncertainty in the systems being engineered (Endy, 2005; Arkin, 2008; Canton et al., 2008). As an example, if one takes a regulatory element from one organism and places it in another, it is likely that the functionality of that element will differ. Even between different strains of the same species there is large variation in how genetic elements are influenced by the host (Balagaddé et al., 2005; Cardinale et al., 2013). Synthetic biology requires standards that can cope with the issue of context dependency. Attempts to introduce standards, such as the BioBrick standard, have so far been unable to fully deal with this issue (Decoene et al., 2018). However, promising work is being done to improve the orthogonality of biological parts (Stanton et al., 2013; Qian et al., 2017). Advances are also being made in how biological parts are characterised and how orthogonality can be measured (Lucks et al., 2008; Mutalik et al., 2013; Ceroni et al., 2015). The creation of such standards may then help biofoundries and cloud labs to run standard building and characterisation protocols that, when combined with a developed set of operational standards, can help to further increase the reliability and reproducibility of synthetic biology workflows. The standardisation of data practices is just as vital, as the greatest of experimental results are worthless without the ability to learn from them, but such a discussion is outside the scope of this article [for a deeper discussion on data standardisation we refer the reader to Decoene et al. (2018)].

# CONCLUSIONS

One important question to address in synthetic biology is, how do we increase the predictability of our designed circuits and strains? Answering this question will have wide-reaching consequences for the field, but will require a shift in how synthetic biology is carried out in academia. Although synthetic biology has already broken the mould in many respects for how science is conducted at universities, deeper changes are needed in the funding and publishing infrastructures to guide the development of standardised practices.

In the meantime, recent developments in laboratory automation may help to improve the quality of experimental protocols, as machines must be programmed with precise and unambiguous instructions, and protocols can be widely distributed and validated between labs around the world. The use of third-party facilities such as cloud labs or biofoundries can reduce some of the experimental variation between researchers while also reducing some of the burden. Just as we now order primers (and genes) instead of making them ourselves, perhaps we will soon routinely order engineered strains too. Robust lab automation holds the potential to bring computer-aided manufacturing (CAM) to synthetic biology by coupling computer-aided design (CAD) software for biological systems to the construction of cell factories, biological computers, and novel enzymes (Raman et al., 2009; Vasilev et al., 2011; Nielsen et al., 2016; Roehner et al., 2016; Cardoso et al., 2018).

There is an ever-growing demand for (high quality) data for machine learning applications, which requires a systematic approach to data generation, management, and sharing. Automation is capable of providing this approach as it can provide detailed logs of experimental runs and of the data acquisition from instruments. The data produced by academic labs need to be made available to the wider scientific community in standardised formats, so that researchers can learn from each other. Data collection and management is an area that requires a lot of attention. Although we have not had the space to offer a detailed discussion here, the synthetic biology community would greatly benefit from more focus on the standardisation of data practices. Traditional funding agencies are perhaps less likely to place emphasis on infrastructure, and so scientists are less incentivised to standardise their data practices.

# AUTHOR CONTRIBUTIONS

MJ-F and NS conceived the study, and contributed jointly to the research and writing of this article.

#### ACKNOWLEDGMENTS

This work was supported by a research grant (17683) from VILLUM FONDEN and the Novo Nordisk Foundation.

# REFERENCES


design to functional analysis. J. Biol. Eng. 10:3. doi: 10.1186/s13036-016-0024-5


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Jessop-Fabre and Sonnenschein. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# New Applications of Synthetic Biology Tools for Cyanobacterial Metabolic Engineering

María Santos-Merino<sup>1†</sup>, Amit K. Singh<sup>1†</sup> and Daniel C. Ducat<sup>1,2</sup>\*

*<sup>1</sup> MSU-DOE Plant Research Laboratory, Michigan State University, East Lansing, MI, United States, <sup>2</sup> Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, United States*

Cyanobacteria are promising microorganisms for sustainable biotechnologies, yet unlocking their potential requires radical re-engineering and application of cutting-edge synthetic biology techniques. In recent years, the available devices and strategies for modifying cyanobacteria have been increasing, including advances in the design of genetic promoters, ribosome binding sites, riboswitches, reporter proteins, modular vector systems, and markerless selection systems. Because of these new toolkits, cyanobacteria have been successfully engineered to express heterologous pathways for the production of a wide variety of valuable compounds. Cyanobacterial strains with the potential to be used in real-world applications will require the refinement of genetic circuits used to express the heterologous pathways and development of accurate models that predict how these pathways can be best integrated into the larger cellular metabolic network. Herein, we review advances that have been made to translate synthetic biology tools into cyanobacterial model organisms and summarize experimental and *in silico* strategies that have been employed to increase their bioproduction potential. Despite the advances in synthetic biology and metabolic engineering during the last years, it is clear that still further improvements are required if cyanobacteria are to be competitive with heterotrophic microorganisms for the bioproduction of added-value compounds.

#### Edited by:

*Francesca Ceroni, Imperial College London, United Kingdom*

#### Reviewed by:

*Vijai Singh, Indian Institute of Advanced Research, India*

*Anne M. Ruffing, Sandia National Laboratories (SNL), United States*

#### \*Correspondence:

*Daniel C. Ducat ducatdan@msu.edu*

*†These authors have contributed equally to this work*

#### Specialty section:

*This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology*

Received: *30 November 2018* Accepted: *05 February 2019* Published: *27 February 2019*

#### Citation:

*Santos-Merino M, Singh AK and Ducat DC (2019) New Applications of Synthetic Biology Tools for Cyanobacterial Metabolic Engineering. Front. Bioeng. Biotechnol. 7:33. doi: 10.3389/fbioe.2019.00033*

Keywords: cyanobacteria, metabolic engineering, synthetic biology, genome scale models, photosynthesis

# INTRODUCTION

Over 1 billion years ago, cyanobacteria began to drive the rise of oxygen in Earth's atmosphere, setting the stage for the evolution of new forms of complex, multicellular organisms (Shih, 2015). More recently, these photosynthetic prokaryotes have come under increasing scrutiny for their potential to sustain the lifestyle of complex eukaryotic lifeforms, this time as potential bioproduction hosts. Cyanobacteria can efficiently harvest CO<sub>2</sub> as a carbon source, powering their metabolic processes by absorption of sunlight, the most abundant form of renewable energy. As fast-growing and relatively simple bacteria, cyanobacteria hold the promise of being an ideal chassis for ambitious metabolic engineering projects. In recent years, cyanobacterial potential has begun to be unlocked through the development of an increasing number of molecular tools (Berla et al., 2013; Carroll et al., 2018; Sun et al., 2018). These advances have been coupled with increasing capacities to manipulate endogenous genetic sequences and transfer exogenous DNA into multiple cyanobacterial strains (Berla et al., 2013; Cassier-Chauvat et al., 2016). Moreover, due to advances in sequencing technology, the genomes of >270 cyanobacterial species have been sequenced (Fujisawa et al., 2017), greatly facilitating the application of systems-level techniques, such as transcriptomics and proteomics. These technological improvements in the manipulation of cyanobacteria are set against the backdrop of increasingly refined tools that have been developed in the fields of metabolic engineering and synthetic biology, potentially setting the stage for cyanobacteria to significantly contribute as crop species in the twenty-first century.

Metabolic engineering can be defined as the practice of optimizing genetic and regulatory processes in cells to increase production of certain metabolic substances (Kumar and Prasad, 2011). This methodology has been used in cyanobacteria, as well as in other bacteria, to expand the list of products that they are able to make and to increase production efficiency (Angermayr et al., 2015; Lai and Lan, 2015; Carroll et al., 2018). Traditional approaches to improve production of a specific compound were based on random mutagenesis and selection or the targeted introduction of individual genes and mutations, which requires considerable time to design and implement (Kumar and Prasad, 2011). Recently, systems metabolic engineering has emerged as a new methodology for solving these issues (Nogales et al., 2013). It is based on the use of mathematical models to simulate and predict behaviors that emerge in complex systems, and has been used extensively for the improvement of microbial production (Lee et al., 2011).

In parallel, synthetic biology principles promote a bottom-up approach to designing biological systems, recombining defined parts or modules to restructure existing systems or build new pathways de novo (Sengupta et al., 2018). It is based on the construction of intricate biological systems using standardized, well-characterized, and interchangeable biological parts, or "modules" (Cheng and Lu, 2012). Collectively, these modules form "toolboxes" of components that can be used to modify an organism, including a catalog of characterized biological parts [e.g., promoters, ribosome binding sites (RBSs), riboswitches, terminator libraries], standardized methods for the assembly and manipulation of genetic components, and predictive models designed to facilitate pathway optimization (Gordon et al., 2016). Ideally, truly modular biological parts would be characterized in a manner that would allow researchers to accurately predict how they will function in the context of an interconnected system (Pasotti et al., 2012). In practice, it remains difficult to characterize parts in a manner that is completely transferable to other organisms (Sengupta et al., 2018), because they are typically characterized by a specific research group in a particular environment (Decoene et al., 2018). Another challenge to the transfer of parts is that biological components are usually non-orthogonal and may interact with genes, proteins, and metabolites of the chassis organism (Fu, 2013), or have their function influenced by host pathway components (Wang et al., 2013). Still, largely by using specific parts and strategies developed in model heterotrophic organisms, useful synthetic biology components have recently been adapted and debugged for use in cyanobacteria.

This review focuses on recent engineering tools and strategies developed for metabolic engineering of cyanobacteria as hosts for generating added-value compounds while using solar energy and CO<sub>2</sub> as inputs. We discuss the current challenges and opportunities for the application of synthetic biology principles in cyanobacteria. Finally, we highlight the potential for genome-scale models as tools to assist cyanobacterial engineering.

# CYANOBACTERIA AS HOST FOR BIOMOLECULE PRODUCTION

Cyanobacteria stand out as one of the most promising candidates as hosts for bioproduction (Knoot et al., 2018). Because cyanobacteria utilize solar energy to fix carbon dioxide, a greenhouse gas, and can convert these reduced carbon products into valuable metabolites (Lau et al., 2015), they are especially attractive in an era where sustainable biotechnological processes are of increasing importance (Ruffing, 2011). Additionally, cyanobacteria possess a number of advantageous features relative to other photosynthetic organisms. In comparison to eukaryotic algae and plants, cyanobacteria are more genetically tractable (Parmar et al., 2011; Lau et al., 2015), grow more rapidly, and can achieve higher efficiencies of solar energy capture and conversion (Dismukes et al., 2008; National Academies of Sciences, 2018). Furthermore, cyanobacteria can be cultivated without the need for arable landmass or potable water supplies (Nozzi et al., 2013) and can potentially even degrade aquatic pollutants, such as aromatic hydrocarbons (Ellis, 1977; Cerniglia et al., 1979, 1980; Narro et al., 1992) and xenobiotics (Megharaj et al., 1987; Kuritz and Wolk, 1995) to remediate contaminated water supplies. Yet, relative to other photosynthetic organisms, especially plants, cyanobacteria are currently not used in many scaled agricultural or biotechnological applications. The underutilization of cyanobacteria stems partially from their relative novelty as crop species. Whereas technologies for cultivating, harvesting, and breeding plants have been under extensive development for many millennia, comparable research to improve the prospects for cyanobacterial cultivation has largely been pursued only since the 1970s (Sheehan et al., 1998).

Cyanobacteria make a number of compounds that are comparable to food, fiber, and fuel products routinely acquired from plants, although cyanobacterial strains have not been as extensively modified to improve their compatibility for scaled cultivation. Like many genera of eubacteria, cyanobacteria can synthesize polyhydroxyalkanoates, a thermoplastic class of biodegradable polyesters that includes polyhydroxybutyrate (Quintana et al., 2011). Many cyanobacterial strains also produce a wide spectrum of secondary metabolites with high-value commercial properties, such as pigments, vitamins, amino acids, macrolides, fatty acids, lipopeptides, and amides (Lau et al., 2015). In total, cyanobacteria are estimated to have the capacity to produce around 1,100 secondary metabolites (specific cyanobacterial bioproducts are beyond the scope of this article, but are reviewed comprehensively in Dittmann et al., 2015; Salvador-Reyes and Luesch, 2015; Xiong et al., 2015, 2017). Beyond natural metabolites, engineering efforts have been used to redirect the metabolism of model cyanobacteria toward biosynthesis of heterologous bioproducts including alcohols, fatty acids, hydrocarbons, fatty alcohols, olefins, organic acids, sugars, and polyols (engineered cyanobacterial metabolites reviewed in Lai and Lan, 2015; Knoot et al., 2018). Finally, biomass derived from cyanobacterial production processes could be used in animal feed supplements or converted into organic fertilizers, especially if cells are engineered for optimal nutritional and nutraceutical content (Singh et al., 2016).

Relative to heterotrophic microbial host species, cyanobacteria possess key limitations that have kept them from being as widely adopted as bioproduction chassis. First, in comparison to many heterotrophic workhorses, such as the bacterium Escherichia coli and the yeast Saccharomyces cerevisiae, cyanobacteria have relatively slow division rates. In part, this is due to the low energy density of sunlight compared with that of a rich medium (Zhang, 2011), since the carbohydrates and other organic carbon sources in rich media have high levels of potential energy stored in their bonds (Utschig et al., 2011). The low energy density of sunlight also impacts cyanobacterial autotrophic productivity, and the specific productivity of target metabolites tends to be lower than that of heterotrophic microbes (Nogales et al., 2013). Nevertheless, some species of cyanobacteria have division rates that compete with those of industrial yeasts (Yu et al., 2015; Jaiswal et al., 2018) (see below). Secondly, the number of genetic tools available for cyanobacterial hosts continues to be limited relative to leading heterotrophic model organisms. Additionally, most cyanobacterial species are polyploid (Griese et al., 2011), which can complicate acquisition of fully segregated strains (Kelly et al., 2018), especially in the presence of restriction-modification systems in cyanobacteria that can limit transformation efficiencies (Stucken et al., 2013). Importantly, the technology for growing cyanobacteria at large scales is underdeveloped, and low-cost bioreactors or other cultivation platforms such as open ponds need to be improved (Knoot et al., 2018). Bioreactor design must contend with the conflicting demands of scaling in two dimensions (to capture sunlight) while minimizing liquid volumes and reactor cost so that operational and capital expenses can be economically viable (Chisti, 2013; Nozzi et al., 2013; Acién et al., 2017).
Genetic instability of heterologous pathways can also decrease bioproduction of cyanobacterial strains (Jones, 2014) and is increased by the abundance of repeated DNA motifs that lead to increased homologous recombination in cyanobacteria (Cassier-Chauvat et al., 2016). As outlined below, a number of research efforts have been directed toward overcoming cyanobacterial host limitations in recent years.

# Model Cyanobacterial Strains

While cyanobacteria are an extremely diverse phylum, a relatively small number of cyanobacterial strains have been selected as models, often because these strains have features that mitigate some of the limitations described above (**Table 1**). Notably, model workhorse cyanobacteria include Synechococcus elongatus PCC 7942 (hereafter S. elongatus PCC 7942), the first reported strain to be transformed through natural DNA uptake pathways (Shestakov and Khyen, 1970), and Synechocystis sp. PCC 6803 (Synechocystis PCC 6803), originally isolated in 1968 (Stanier et al., 1971), the first strain to have complex in silico models built for genome-scale prediction of metabolism (Fu, 2009). S. elongatus has been extensively used as a model of the circadian clock (Ditty et al., 2003), while Synechocystis PCC 6803 has served as a useful species for the investigation of core photosynthetic complexes due to its capacity to be grown under photoautotrophic, mixotrophic, or heterotrophic conditions (Vermaas, 1996). Nostoc sp. PCC 7120 (Nostoc PCC 7120) is a filamentous freshwater cyanobacterial strain which has been used extensively as a model to investigate cellular differentiation (Kumar et al., 2010). Nostoc PCC 7120 is capable of fixing nitrogen by forming heterocysts (Cai and Wolk, 1997), which are differentiated cells that efficiently catalyze the reduction of dinitrogen (Herrero et al., 2001) and can be exploited for hydrogen production (Tamagnini et al., 2002). These three model species are arguably the best studied strains, but all have relatively modest doubling times: Nostoc PCC 7120 doubles every 14–15 h (Callahan and Buikema, 2001), while S. elongatus PCC 7942 and Synechocystis PCC 6803 have doubling times of around 7–12 h (Vermaas et al., 1988; Mori et al., 1996). Furthermore, these established models have relatively limited capacity to withstand the high light intensities and elevated temperatures that are expected to be encountered in outdoor bioreactors (Yu et al., 2013).
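
To put the doubling times quoted above into perspective, the following sketch converts a doubling time into a specific growth rate and into the fold increase in biomass over one day. It assumes ideal, unlimited exponential growth, which real cultures (light-limited and subject to day/night cycles) will not sustain; the function names are illustrative, and the values used are simply the ends of the ranges given in the text.

```python
import math


def specific_growth_rate(doubling_time_h):
    """Specific growth rate (per hour) from doubling time: mu = ln(2) / t_d."""
    if doubling_time_h <= 0:
        raise ValueError("doubling time must be positive")
    return math.log(2) / doubling_time_h


def fold_increase(doubling_time_h, hours):
    """Fold increase in biomass after `hours` of ideal exponential growth."""
    return 2.0 ** (hours / doubling_time_h)


# Doubling times (h) taken from the ranges quoted in the text.
doubling_times_h = {
    "Nostoc PCC 7120 (slow end)": 15.0,
    "Nostoc PCC 7120 (fast end)": 14.0,
    "S. elongatus PCC 7942 / Synechocystis PCC 6803 (slow end)": 12.0,
    "S. elongatus PCC 7942 / Synechocystis PCC 6803 (fast end)": 7.0,
}

for strain, t_d in doubling_times_h.items():
    mu = specific_growth_rate(t_d)
    fold = fold_increase(t_d, hours=24)
    print(f"{strain}: mu = {mu:.3f} per h, {fold:.1f}-fold in 24 h")
```

Because growth is exponential, modest differences in doubling time compound quickly: under these idealised assumptions a 12 h doubling time gives a 4-fold increase per day, and a shorter doubling time gives disproportionately more.
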

Driven by the desire for cyanobacterial models that are more amenable to bioindustrial applications, other cyanobacterial strains with much more rapid division times have emerged as important models. Significant efforts have been focused in recent years on Synechococcus sp. PCC 7002 (Synechococcus PCC 7002), a unicellular cyanobacterium with a faster doubling time of ∼2.6 h (Ludwig and Bryant, 2012). Synechococcus PCC 7002 is also capable of growth in a variety of salt, temperature, and light conditions (Sheng et al., 2011; Ruffing et al., 2016), enabling the possibility of utilizing saltwater resources for growth media. More recently, still faster-growing strains have been reported, including Synechococcus elongatus UTEX 2973 (S. elongatus UTEX 2973) (Yu et al., 2015) and Synechococcus elongatus PCC 11801 (S. elongatus PCC 11801) (Jaiswal et al., 2018). Interestingly, although the growth rates of these strains are substantially faster (1.5–3 h), genome sequence and proteomic approaches have shown that they are exceptionally closely related to the much slower-growing model, S. elongatus PCC 7942 (Mueller et al., 2017). Indeed, the genome of S. elongatus UTEX 2973 is 99.8% identical to that of S. elongatus PCC 7942, and differential regulation of a relatively small subset of common pathways largely accounts for the substantial growth differences (Abernathy et al., 2017; Mueller et al., 2017; Tan et al., 2018). Another related strain, S. elongatus PCC 11801, is ∼83% identical to S. elongatus PCC 7942 and shares with S. elongatus UTEX 2973 some key modifications that are responsible for its higher growth rates and increased tolerance to certain environmental stresses (Jaiswal et al., 2018). In a recent publication, Ungerer et al. identified five single nucleotide polymorphisms (SNPs) in three genes (atpA, ppnK, and rpaA) as responsible for rapid growth in S. elongatus UTEX 2973 (Ungerer et al., 2018), and these SNPs were also present in S. elongatus PCC 11801. The atpA SNP yielded an ATP synthase with higher specific activity, the ppnK SNP encoded a NAD<sup>+</sup> kinase with significantly improved kinetics, and the rpaA SNPs caused broad changes in the transcriptional profile. The filamentous cyanobacterium Leptolyngbya sp. strain BL0902 (Leptolyngbya BL0902) was also recently identified (Taton et al., 2012); it can be transformed by conjugation (Taton et al., 2012), and some molecular tools have been characterized for it (Taton et al., 2014).

*Features of the most frequently-used model organisms for studies on cyanobacterial physiology and metabolic engineering. Size of primary genome, endogenous extrachromosomal plasmids, osmotolerance, and routinely utilized genetic transformation methods are reported.*

Curiously, while S. elongatus UTEX 2973 is closely related to S. elongatus PCC 7942, it can only be transformed by conjugation (Yu et al., 2015), whereas S. elongatus PCC 11801 is naturally competent for genetic transformation. Apart from Nostoc PCC 7120, the other models described above are naturally competent, facilitating the genetic modification of these strains (Porter, 1986; Koksharova and Wolk, 2002). Furthermore, the genomes of the model cyanobacteria described in **Table 1** have been sequenced and the information has been organized in the CyanoBase database (Fujisawa et al., 2017). Access to genome information, together with the construction of metabolic models (see below), makes it possible both to understand the basic metabolism of cyanobacteria and to achieve higher levels of metabolic redirection and control (Lai and Lan, 2015).

#### NEW TOOLS FOR SYNTHETIC BIOLOGY IN CYANOBACTERIA

In comparison to the genetic toolboxes for work in popular heterotrophic chassis like E. coli and S. cerevisiae, relatively limited genetic tools have been developed in cyanobacteria (Sun et al., 2018). The design of standardized gene expression parts (e.g., promoters, terminators) that can be modularly recombined with other genetic elements with predictable outputs has catalyzed a revolution in the complexity of heterotrophic genetic circuits (Popp et al., 2017). Unfortunately, many of the synthetic biology tools and modular parts developed for model heterotrophs do not perform as robustly in the context of cyanobacterial strains (Huang et al., 2010). For example, many characterized E. coli promoters do not display similar traits when used in cyanobacteria (Heidorn et al., 2011). These problems may arise in part because of species-dependent distinctions in key features, such as RBS sequences or promoter recognition by the endogenous transcription machinery, leading to unpredictability in gene expression (Huang et al., 2010; Wang et al., 2012; Camsund and Lindblad, 2014). In recent years, significant efforts have been made in adapting molecular techniques from other organisms in order to extend cyanobacterial toolboxes and facilitate cellular reprogramming for increased production yields. Here, we outline the latest advances in cyanobacterial synthetic biology tools, including promoters, riboswitches, RBSs, reporters, modular vectors, and markerless selection systems.

#### Engineering Promoters to Enhance Protein Expression

Many of the most advanced synthetic biology circuits and pathways today are firmly rooted in an extensive, well-defined library of promoter "parts" that exhibit predictable behaviors when used to drive the expression of a range of target genes (Shetty et al., 2008). By contrast, the number of constitutive and inducible promoters that have been well-characterized in cyanobacteria has been historically small, and considerable variation is often observed in the expression level achieved for distinct heterologous genes. Some limiting factors have compromised the development of cyanobacterial inducible genetic systems, including toxicity of inducers, leaky expression in the absence of inducer, and inducer photolability. Constitutive promoters may also be useful when continuous gene expression is desired, although many endogenous cyanobacterial promoters that are used for this purpose are dynamically regulated by circadian rhythms.

A few recent studies have focused on expanding and characterizing the collection of foreign promoters that can be used to drive gene expression in cyanobacteria. Some inducible promoter elements have been commonly used in selected cyanobacterial models for a number of years, such as the nickel-inducible nrsB promoter (Englund et al., 2016; Santos-Merino et al., 2018) or the IPTG-responsive trc promoter (Ptrc) in S. elongatus PCC 7942 or Synechocystis PCC 6803 (Geerts et al., 1995; Huang et al., 2010). Ptrc has recently been adapted to other model cyanobacteria, such as Synechococcus PCC 7002 (Ruffing, 2014) and Leptolyngbya BL0902 (Ma et al., 2014). The L-arabinose-inducible araBAD promoter (PBAD) that has long been a staple induction system in E. coli has recently been introduced in S. elongatus PCC 7942 (Cao et al., 2017) and Synechocystis PCC 6803 (Immethun et al., 2017). This system is based on the arabinose utilization network, which positively regulates PBAD through the AraC regulator protein (Schleif, 2010) (**Figure 1A**). Most recently, the use of the rhamnose-inducible rhaBAD promoter of E. coli has been implemented in the model freshwater cyanobacterium Synechocystis PCC 6803 (Kelly et al., 2018) (**Figure 1B**). Another orthogonal inducible promoter, Pvan from Corynebacterium glutamicum, that relies upon vanillate-induced suppression of the repressor VanR has been tested in S. elongatus PCC 7942 (Taton et al., 2017) (**Figure 1C**). Some of these promoters have been translated to faster-growing cyanobacterial strains, such as Ptrc in Synechococcus PCC 7002 (Ruffing, 2014) and the IPTG-induced promoter Plac in S. elongatus UTEX 2973 (Song et al., 2016), although the number of inducible promoters remains relatively limited in the cyanobacterial strains that possess the most promising features for bioproduction.

In some model cyanobacteria, a limited number of constitutive promoters have been characterized relative to one another. For example, PA2520 and PA2579, two native promoters of Synechococcus PCC 7002, have been found to drive strong expression of heterologous genes (Ruffing et al., 2016). Most recently, nine native promoters have been characterized for their expression in Synechocystis PCC 6803 to enrich the promoter toolbox (Liu and Pakrasi, 2018). Other publications that characterize endogenous promoters in cyanobacteria include Huang et al. (2010) and Markley et al. (2015). Yet, in cyanobacteria, many endogenous promoters are strongly influenced by the circadian clock machinery, and therefore they may not be truly constitutive throughout a 24 h period (Liu et al., 1995; Markson et al., 2013; Camsund and Lindblad, 2014).

Development of synthetic promoters that have been modified for improved expression specifically in cyanobacteria has emerged as another promising strategy. In S. elongatus PCC 7942, a tandem promoter composed of a truncated native promoter PR from rrnA of S. elongatus PCC 7942 and the consensus-σ70 PS promoter from E. coli has been designed (Chungjatupornchai and Fa-Aroonsawat, 2014). Zhou et al. reported the development of a constitutive promoter, Pcpc560, in Synechocystis PCC 6803 that produced a high level of gene expression (Zhou et al., 2014). It is based on a truncated native promoter (PcpcB) and its strength resides in the presence of multiple transcription factor binding sites. In another study, Psca3−2, a variant of Ptac of E. coli, was found to act as a constitutive promoter with high levels of expression (Albers et al., 2015). Within the same context of synthetic constitutive promoters, a truncated version of the native psbA2 promoter was developed in Synechocystis PCC 6803 (Englund et al., 2016). Expression from this derivative of PpsbA2 increased 4-fold compared to the original. Similarly, inducible promoters can be tuned for better performance in cyanobacteria, such as the anhydrotetracycline-activated variant of PR40 from E. coli, PL03 (Huang and Lindblad, 2013), the IPTG-inducible promoter Psca6−2 (Albers et al., 2015), or the T7 RNA polymerase promoter (Ferreira et al., 2018). Because the promoters described above are modified and/or heterologous, it is possible that they could escape regulatory activities of the circadian clock, yet it is unusual for these promoters to be characterized over more than one circadian period, or at varied light intensities. To address this gap, some promoters have been characterized for their capacity to drive heterologous gene expression under multiple conditions (e.g., light/dark, aerobic/anaerobic), such as the FNR-activated promoter, PO2, described in Synechocystis PCC 6803 (Immethun et al., 2016).

#### Optimizing Ribosome Binding Sites for Biotechnological Applications

The rate of protein production from an mRNA transcript also depends on the strength of the RBS in recruiting ribosomes for translation. The position and sequence of a given RBS significantly influence translational efficiency. Although "RBS calculators" have long been under development for heterotrophic microbes (Salis, 2011), it is only relatively recently that such efforts have extended toward development of RBS libraries for cyanobacteria. RBS sequences from the BioBrick Registry of standard biological parts have been characterized in Synechocystis PCC 6803 (Heidorn et al., 2011; Englund et al., 2016). In one recent example, 20 native RBS elements have been characterized in Synechocystis PCC 6803 (Liu and Pakrasi, 2018), including two previously described by Englund et al. (2016). These efforts are becoming increasingly coordinated with attempts to develop synthetic RBSs that are based on in silico modeling tools for Synechococcus PCC 7002 (Markley et al., 2015), Synechocystis PCC 6803 (Heidorn et al., 2011; Taton et al., 2014; Xiong et al., 2015; Thiel et al., 2018), or S. elongatus PCC 7942 (Taton et al., 2014; Wang et al., 2018).

Efforts to expand the toolbox of characterized promoter elements are foundational contributions that will enable more sophisticated circuit design in cyanobacteria, yet the capacity to fully predict expression output from a given element remains elusive. Context-specific features of a given expression construct can alter the performance of a given promoter-RBS combination, decreasing the modularity of promoter elements. For example, it is well-established that secondary structure can arise between a specific heterologous gene and the upstream promoter-RBS region, which can interfere with transcription or translation and lead to variability when different genes are expressed from an identical promoter-RBS cassette. Promoter elements that are based on a bicistronic design have been developed for E. coli that exhibit much more consistent performance, regardless of the downstream gene sequence that is being expressed (Mutalik et al., 2013). No such system has been described in cyanobacteria and few promoters have been as extensively characterized. Therefore, it often remains difficult to anticipate the likely expression level of a construct during the design phase.
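The context dependence described above can be illustrated with a minimal sketch that flags perfect Watson-Crick complementarity between an upstream region and the start of a coding sequence. This is only an illustration under strong assumptions: the function names, the 30-nt window, and the example sequences are hypothetical, G-U wobble pairs and folding energies are ignored, and dedicated RNA folding software would be needed for any real design decision.

```python
# Toy screen for sequence context effects: does the start of a coding
# sequence contain a stretch that could base-pair with the upstream region?
# Sequences are written in the DNA alphabet; only perfect A-T / G-C pairing
# is considered (an assumption made for simplicity).

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def longest_complementary_stretch(utr: str, cds: str, window: int = 30) -> str:
    """Longest run within the first `window` nt of `cds` that is perfectly
    complementary (antiparallel) to some part of `utr`."""
    target = reverse_complement(utr)
    head = cds.upper()[:window]
    best = ""
    for i in range(len(head)):
        # Only try substrings longer than the current best match.
        for j in range(i + len(best) + 1, len(head) + 1):
            if head[i:j] in target:
                best = head[i:j]
            else:
                break  # a longer substring starting at i cannot match either
    return best


if __name__ == "__main__":
    utr = "AAAGGAGGAAAA"      # hypothetical upstream region with an SD-like motif
    cds = "ATGCCTCCTGAAGCA"   # hypothetical start of a heterologous gene
    print(longest_complementary_stretch(utr, cds))
```

A long complementary stretch would only hint that the RBS might be occluded in that particular gene context; it does not predict expression level.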

FIGURE 1 | Regulatory mechanisms of inducible promoters recently introduced into cyanobacteria. (A) Arabinose-inducible gene expression from PBAD, which is both positively and negatively regulated by the transcriptional regulator AraC. In the absence of arabinose, an AraC dimer contacts the O2 and I1 half sites of the promoter favoring the formation of a DNA loop that is unfavorable for RNA polymerase binding. When arabinose is present, AraC undergoes a structural change that releases the DNA loop, and ideally positions an AraC activation domain that promotes gene transcription. (B) Rhamnose-inducible gene expression from P*rhaBAD*. In the absence of rhamnose RhaS is unable to dimerize, limiting its capacity to bind to promoter elements. In the presence of rhamnose, a dimer of RhaS binds to the I1 and I2 repeat half-sites of the *rhaBAD* promoter, recruiting RNA polymerase and activating the transcription of the target gene. (C) The vanillate-inducible promoter, P*van*, is repressed by a dimer of VanR bound to its operator. When vanillate is present, two molecules bind the VanR dimer, resulting in a conformation change and release of transcriptional repression. I1, binding site 1 in PBAD and P*rhaBAD*; I2, binding site 2 in PBAD and P*rhaBAD*; GOI, gene of interest; mRNA, messenger RNA; O2, operator of PBAD; OvanR, operator in P*van*; RNA pol, RNA polymerase. In all cases, left and right panel represent uninduced and induced promoters, respectively.

#### Riboswitches as Tools for Robust Control of Gene Expression

Riboswitches are versatile tools for genetic engineering and allow control of gene expression by manipulating secondary structure within an mRNA transcript. A riboswitch is composed of an aptamer sequence that imposes a secondary-structural conformation on the mRNA and thereby influences translational efficiency from the transcript. The aptamer sequence has cis-activating or cis-repressing effects on the mRNA on which it is encoded, while a trans-acting factor can bind and encourage the formation of an alternative secondary-structure conformation (Nudler, 2006). Frequently, the aptamer sequence is designed to form a stem-loop structure that prevents the attachment of ribosomes to a 5′ RBS. A trans-acting factor (which can be a small molecule metabolite or a non-coding regulatory RNA) binds to the aptamer in a manner that stimulates a new conformational state and increases accessibility of the RBS, improving translation of the encoded protein. Riboregulators have features that make them powerful tools for controlling gene expression when used in tandem with transcriptional-based approaches. These include the fact that they typically exhibit a high degree of modularity, can drive the expression of proteins over a large, physiologically-relevant range (thus avoiding issues such as toxicity or inclusion body formation), minimize "leaky" expression (protein production in the absence of inducer), have fast response times (time between ligand binding and protein expression), are tunable (a large dynamic range of protein expression with increasing inducer), and can be used to regulate multiple genes simultaneously (Callura et al., 2010).

Relative to other microbial systems, the use of cyanobacterial riboregulators is just beginning to become more widespread and only a few riboswitch designs are in common use for metabolic engineering applications (Connor and Atsumi, 2010). A modified theophylline-dependent synthetic riboswitch was first reported in S. elongatus PCC 7942, allowing strict regulation of protein production in this cyanobacterium (Nakahira et al., 2013). This effective riboswitch has since been implemented in other cyanobacteria such as Synechocystis PCC 6803, Leptolyngbya BL0902, Nostoc PCC 7120, and Synechocystis sp. strain WHSyn (Synechocystis WHSyn) (Ma et al., 2014; Armshaw et al., 2015; Ohbayashi et al., 2016). Similarly, Taton et al. have developed NOT gate molecular circuits using transcriptional repressors controlled by theophylline-dependent synthetic riboswitches to downregulate gene expression in five diverse strains of cyanobacteria, including three model organisms, Nostoc PCC 7120, Synechocystis PCC 6803, and S. elongatus PCC 7942, as well as two recent isolates, Leptolyngbya BL0902 and Synechocystis WHSyn (Taton et al., 2017). A native cobalamin-dependent riboswitch has been reported in Synechococcus PCC 7002 (Pérez et al., 2016), although it remains unclear if this genetic tool can be implemented in other cyanobacteria that are not cobalamin auxotrophs. More recently, another native riboswitch, a glutamine aptamer, has been described in Synechocystis PCC 6803 (Klähn et al., 2018). However, it exhibited poor ligand affinity compared to the aptamers of other riboswitch classes. These newly-described native regulatory sequences open up the possibility of discovering cyanobacterial riboswitches that have not yet been identified. Future approaches that combine in silico structural analysis with in vivo genetic tuning of riboswitches that can be modulated by a wide variety of ligands hold the promise of contributing significantly to metabolic engineering efforts (Berens and Suess, 2015).

#### Reporter Proteins

Reporters allow for easy quantitation of gene expression, visualization of subcellular localization, and interaction of proteins with other cellular components (Berla et al., 2013). Fluorescent reporter proteins do not require additional substrates to be detected in vivo and there is an ample range of colors. However, the use of fluorophores can be complicated in cyanobacteria due to the autofluorescence of photosynthetic pigments. The phycobilins and chlorophyll molecules that are major components of the photosynthetic electron transport chain can cause competitive absorbance of the excitation light, reabsorbance of fluorescent or bioluminescent reporter emission, and signal interference (Yokoo et al., 2015; Ruffing et al., 2016). Because of strong chlorophyll autofluorescence, use of red fluorophores is not recommended, but many other reporters including GFPmut3B (a mutant of green fluorescent protein) and EYFP (enhanced yellow fluorescent protein) have been routinely used (Pédelacq et al., 2006; Huang et al., 2010; Yang et al., 2010; Heidorn et al., 2011; Huang and Lindblad, 2013; Landry et al., 2013; Cohen et al., 2014). Despite the frequent use of fluorescent proteins in cyanobacteria and the continual development of improved variants, further improvements to available reporters could greatly increase their range of applications (Rodriguez et al., 2017). Recently developed fluorophores with superior brightness, photostability, and quantum yield have great promise, including mOrange, mTurquoise, mNeonGreen, or YPet (Chen et al., 2012; Ruffing et al., 2016; Jordan et al., 2017). In addition to fluorescent proteins, luciferase-based bioluminescence assays are routinely used, especially for tracking gene expression patterns throughout the circadian cycle or under different environmental conditions (Fernández-Piñas et al., 2000; Cohen et al., 2015). Luciferase reporters are ideal for monitoring gene expression because of the short half-life of the enzymes, which provides a readout that is close to real-time (Ghim et al., 2010). Alternatively, fluorophores can be modified with protein degradation sequences that greatly reduce their half-life (Wang et al., 2012), increasing their suitability as a transcriptional readout (Noguchi and Golden, 2017).

#### Modular Vector Systems for Engineering Cyanobacteria

Vector systems based on the use of standard biological parts (http://parts.igem.org) and assembly schemes (e.g., BioBrick and BglBrick) were originally designed for organisms such as E. coli, B. subtilis, or yeast in order to increase the modular assembly of many different parts. By contrast, most standard genetic tools and vectors were developed for a specific cyanobacterial strain and generally, these tools have not been designed to be modular. In recent years, a few vector systems have been designed specifically to work in diverse strains and/or to contain a modular organization, which could facilitate standardization and characterization of component parts.

A notable modular vector system was described in 2014, using a range of plasmids designed to be compatible with a broad host range (Taton et al., 2014). These plasmids included both autonomously replicating plasmids and suicide plasmids for gene knockout and knockin, and were characterized in diverse cyanobacterial strains to ensure their proper functioning. As a part of this work, the authors also created a web server, CYANO-VECTOR, that can assist in the in silico design of plasmids and assembly strategies. Similarly, chromosomal integration vectors carrying standard prefix and suffix sequences suitable for BioBrick-based cloning have been designed for Synechococcus PCC 7002 (Vogel et al., 2017) and S. elongatus PCC 7942 (Kim et al., 2017). In a recent article, a versatile system called CyanoGate, based on the Plant Golden Gate MoClo kit and the MoClo kit for the microalga Chlamydomonas reinhardtii, was developed (Vasudevan et al., 2018). Vasudevan et al. demonstrated that the functionality of this system was robust across two different cyanobacterial species, Synechocystis PCC 6803 and S. elongatus UTEX 2973.

#### Markerless Selection as a Tool to Facilitate Cyanobacterial Engineering

In cyanobacteria, as in other bacteria, metabolic engineering involving multiple genetic manipulations requires multiple selective markers. To make deletions, genes are normally replaced by antibiotic resistance markers to evaluate what phenotypic effects take place. However, the generation of a strain with numerous deletions is restricted using this method because the availability of resistance markers is limited. Alternatively, a range of markerless selection strategies has been developed to increase the number of genetic modifications that can be made in these organisms.

The first markerless system described in the cyanobacterium S. elongatus PCC 7942 relies on a dominant streptomycin-sensitive rps12 mutation (Matsuoka et al., 2001). The method is based on a double selection cassette composed of a kanamycin resistance gene and, as a negative selection marker, a wild-type copy of rps12 that confers a dominant streptomycin-sensitive phenotype. Streptomycin-resistant, kanamycin-sensitive markerless mutants can be recovered in a second transformation (Takahama et al., 2004). The main drawback of this method is the need to work in a genetic background that contains the appropriate rps12 mutation. Moreover, this strategy requires two time-consuming transformation events and cloning of two different suicide vectors. Recently, new time-saving alternatives have been developed for markerless gene replacement in cyanobacteria.

Begemann et al. described a counter-selection method for Synechococcus PCC 7002 based on organic acid toxicity (Begemann et al., 2013). The system was based on the use of the product of the acsA gene, an acetyl-CoA ligase. The loss of AcsA function was used to develop an acrylate counter-selection method. Another counter-selection method was developed for Synechocystis PCC 6803, which involves the use of the endogenous nickel-inducible promoter to drive an E. coli-derived toxin gene known as mazF (Cheah et al., 2013). MazF is an endoribonuclease that acts as a global inhibitor of cellular protein synthesis because it cleaves mRNA at the ACA triplet sequence. A different markerless gene deletion system that only requires a single vector has also been described for Synechocystis PCC 6803 and uses a nptI-sacB double selection cassette (Viola et al., 2014). The nptI gene confers resistance to the antibiotic kanamycin, while expression of the sacB gene is toxic to bacteria grown on sucrose-containing media. Counter-selection based on sacB is not functional in Synechococcus PCC 7002 (Zhang and Song, 2018), possibly because sacB selection is sensitive to the salt (Kunst and Rapoport, 1995) required for growth of this marine cyanobacterium. A few markerless gene deletion systems have been shown to work in multiple cyanobacterial strains; for example, Kojima et al. developed an efficient method for generating knockouts in Synechocystis PCC 6803 and Synechococcus PCC 7002 (Kojima et al., 2016). This system is based on knocking out the aas gene, an acyl-acyl carrier protein synthetase, and selecting the mutants by their free fatty acid tolerance.
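Because MazF cleaves at ACA triplets, the number of potential cleavage sites in a transcript can be counted directly from its sequence. The short sketch below is an illustration only: the function name and the example transcript are hypothetical, and it reports sequence matches without modeling whether a given site is accessible to the enzyme in vivo.

```python
# Count potential MazF recognition sites (ACA triplets) in a transcript.
# Input may be given as DNA (T) or RNA (U); it is normalized to RNA.


def mazf_sites(transcript: str) -> list:
    """Return the 0-based start positions of every ACA triplet,
    including overlapping occurrences."""
    rna = transcript.upper().replace("T", "U")
    return [i for i in range(len(rna) - 2) if rna[i:i + 3] == "ACA"]


if __name__ == "__main__":
    example = "AUGACACAUAA"  # hypothetical short transcript
    sites = mazf_sites(example)
    print(f"{len(sites)} potential ACA sites at positions {sites}")
```

Since ACA is a short motif, most transcripts of typical length are expected to contain at least one site, which is consistent with MazF acting as a global inhibitor of protein synthesis.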

The most recent markerless systems are based on CRISPR-based technology (**Figure 2**) that does not require any counter-selection genes (Behler et al., 2018). In general, all CRISPR-Cas [clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein (Cas) system] technology relies on the capacity to target a protein (typically a nuclease) to a very precise genomic locus. This specificity is conferred by a single-guide RNA (sgRNA) that is programmed to be complementary to the target genomic site, and which assembles with Cas9 or Cas12a (formerly known as Cpf1) into an effector complex (**Figure 2A**). Depending upon the CRISPR system, the effector complex may contain other RNA sequences: the CRISPR-Cas9 system requires two separate RNA strands, the CRISPR RNA (crRNA) that encodes the guide sequence, and the trans-activating crRNA (tracrRNA), while CRISPR-Cas12a requires only a single crRNA (**Figure 2A**). Most cyanobacterial genomes naturally contain CRISPR-Cas repeat sequences (Cai et al., 2013), which are useful in defending the cell against foreign genetic material, as the Cas nuclease can be directed to cleave sequences that are specific to an exogenous source (e.g., viral DNA).

The specificity of Cas9 and Cas12a targeting has been repurposed to design markerless genome editing systems and other genetic control elements (Li et al., 2016; Ungerer and Pakrasi, 2016; Wendt et al., 2016; Niu et al., 2018; Ungerer et al., 2018; Xiao et al., 2018). When directed to a DNA target within the cyanobacterial genome, the double-stranded DNA breaks induced by the CRISPR effector are lethal unless they can be repaired in a manner that alters the DNA so that it is no longer recognized by the sgRNA (**Figure 2A**) (Behler et al., 2018). Double-stranded DNA breaks engage cyanobacterial DNA repair machinery, which can resolve the damage by error-prone nonhomologous end joining or by homologous recombination if a suitable template is available. To induce an inactivating mutation in a target gene, it is often sufficient to express Cas9 or Cas12a along with a gene-specific sgRNA and to rely upon errors made by the DNA repair systems to introduce point mutations and frameshifts that will inactivate the gene without the need for a selectable marker. More advanced genome editing (e.g., knockout of specific regions or insertion of new DNA at the target locus) can be accomplished through homologous recombination if a suitable template is also introduced into the cell (see **Figure 2B**). This markerless methodology has been used to introduce point mutations, knock out large genomic regions, and "knock in" genes in a range of cyanobacterial species, including Synechocystis PCC 6803 (Xiao et al., 2018), S. elongatus UTEX 2973 (Ungerer and Pakrasi, 2016; Wendt et al., 2016), S. elongatus PCC 7942 (Li et al., 2016; Ungerer et al., 2018), and Nostoc PCC 7120 (Ungerer and Pakrasi, 2016; Niu et al., 2018).
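As a rough illustration of how candidate target sites might be enumerated before designing a guide, the sketch below scans the forward strand of a sequence for protospacers next to a protospacer-adjacent motif (PAM). The PAM patterns are assumptions that are not taken from the studies cited above: NGG placed 3′ of the protospacer is the motif commonly cited for Streptococcus pyogenes Cas9, and TTTV placed 5′ of the protospacer is commonly cited for several Cas12a orthologs; other orthologs differ. The 20-nt spacer length, the function names, and the example sequences are likewise hypothetical, and no off-target or efficiency scoring is attempted.

```python
import re

# Enumerate candidate CRISPR target sites on the forward strand only.
# PAM patterns and spacer length are adjustable assumptions (see lead-in).


def find_cas9_sites(seq: str, spacer_len: int = 20) -> list:
    """Return (start, protospacer, pam) for protospacers followed by NGG."""
    pattern = r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len
    # The lookahead lets overlapping candidate sites all be reported.
    return [(m.start(), m.group(1), m.group(2))
            for m in re.finditer(pattern, seq.upper())]


def find_cas12a_sites(seq: str, spacer_len: int = 20) -> list:
    """Return (start, pam, protospacer) for protospacers preceded by TTTV."""
    pattern = r"(?=(TTT[ACG])([ACGT]{%d}))" % spacer_len
    return [(m.start(), m.group(1), m.group(2))
            for m in re.finditer(pattern, seq.upper())]


if __name__ == "__main__":
    cas9_example = "A" * 20 + "TGG"       # hypothetical target region
    cas12a_example = "TTTC" + "G" * 20    # hypothetical target region
    print(find_cas9_sites(cas9_example))
    print(find_cas12a_sites(cas12a_example))
```

A complete search would also scan the reverse complement of the sequence and check each candidate against the rest of the genome for near-identical sites.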

The CRISPR-Cas system is still under active development and some limitations may need to be overcome before it becomes a methodology that completely replaces more traditional cyanobacterial genome engineering techniques. One limitation is that colonies recovered from CRISPR-mediated transformations can have a low penetrance of the desired genomic alteration. For example, early reports of CRISPR editing in cyanobacteria have shown that between 20 and 70% of recovered strains are the desired mutant (Li et al., 2016; Ungerer and Pakrasi, 2016; Wendt et al., 2016; Xiao et al., 2018). The low efficiency of some transformations requires screening and validation of a higher number of recovered colonies to obtain the correct strain. It may be possible to overcome this limitation by encoding two spacers that target the genomic region rather than one (see spacer depiction in **Figure 2B**), as shown in a recent report in Nostoc PCC 7120 (Niu et al., 2018). The expression of Cas9 also appears to be toxic in a dose-dependent manner in some cyanobacteria, such as S. elongatus UTEX 2973 (Wendt et al., 2016), although this toxicity is not apparent in others (Xiao et al., 2018), even in some closely-related species (Li et al., 2016). While the mechanism by which Cas9 causes toxicity remains unclear, it is possible to substitute Cas12a for Cas9 to bypass the issue in many species (Ungerer and Pakrasi, 2016; Swarts and Jinek, 2018). Yet, such uncertainties also contribute to the concern that CRISPR-mediated techniques can alter off-target genomic sites, which could lead to misinterpretation of observed phenotypes (Fu et al., 2013). Strategies to minimize off-target CRISPR-Cas activity have been extensively explored in other organisms, but have not been rigorously evaluated in cyanobacteria. Finally, although genomic modifications introduced by CRISPR are themselves markerless, the sgRNA and nuclease are often introduced on plasmids that require selectable markers (**Figure 2B**), and following genome editing, it is often desirable to cure the plasmid through some form of counter-selection (Xiao et al., 2018). This can make the process of genome editing by CRISPR-Cas considerably longer than traditional selection-based approaches.
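The screening burden implied by the 20 to 70% range of correctly edited colonies reported above can be estimated with a simple probability calculation. The sketch below assumes, purely for illustration, that each recovered colony is an independent trial with the same probability of carrying the desired edit; the function name and the 95% confidence target are hypothetical choices.

```python
import math

# Minimum number of colonies to screen so that at least one carries the
# desired edit with a chosen probability, assuming independent colonies:
#   1 - (1 - p)^n >= confidence   =>   n >= log(1 - confidence) / log(1 - p)


def colonies_to_screen(edit_fraction: float, confidence: float = 0.95) -> int:
    """Smallest n such that P(at least one correct colony) >= confidence."""
    if not 0.0 < edit_fraction < 1.0:
        raise ValueError("edit_fraction must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - edit_fraction))


if __name__ == "__main__":
    for fraction in (0.2, 0.7):
        n = colonies_to_screen(fraction)
        print(f"edit fraction {fraction:.0%}: screen at least {n} colonies")
```

Under these assumptions, roughly 14 colonies would need to be screened at the low end of the reported range and 3 at the high end to have a 95% chance of recovering at least one correct mutant.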

FIGURE 2 | (Continued) [...] are selected on BG11 agar plates supplemented with kanamycin. The positive colonies are streaked on BG11 agar plates to cure the plasmid, and plasmid loss is confirmed by the inability to grow on kanamycin-containing media. Depicted system is modeled after the one described by Ungerer and Pakrasi (2016).

CRISPR-based applications reach beyond markerless genome engineering to a wider range of uses. Briefly, nuclease-dead variants of Cas9/Cas12a do not generate DNA breaks, but can still be targeted to specific DNA sequences via sgRNA and can be fused to other functional domains (e.g., transcriptional activator or repressor domains). This creates hybrid Cas proteins that bind to endogenous sequences, but which can perform other functions, such as regulating gene expression. Recently, these approaches have been used to create synthetic transcription factors able to modulate the expression of essential genes and key metabolic pathways in cyanobacteria. For a more comprehensive review of CRISPR-based applications in cyanobacteria, see Behler et al. (2018).

# IMPROVING CARBON CAPTURE AND DIRECTING CARBON FLUX THROUGH RATIONAL ENGINEERING APPROACHES

Engineering cyanobacterial strains for bioproduction requires that not only heterologous genes and pathways be expressed in new hosts, but also that carbon that is captured by cyanobacteria be efficiently directed toward the desired metabolic products (Stephanopoulos, 2012; Woo, 2017). Thus, it is a prerequisite for metabolic engineers to optimize pathways to increase the total carbon flux toward target product generation, and/or to enhance the pool size of rate-limiting metabolites (Kanno et al., 2017; Carroll et al., 2018). Although the regulation of carbon partitioning in cyanobacteria is not fully understood, it can be diverted under certain conditions like nutrient deprivation and irradiance stress, and therefore stress conditions are routinely used in studies that aim to increase product yield (Woo, 2017). New predictive tools that can assist in channeling fixed carbon to desired metabolic pathways have been in active development. These tools include models for metabolic analysis and flux balance analysis and will be described in greater detail below. Beyond channeling carbon flux, another strategy is to improve the total pool of available carbon, and therefore multiple efforts to improve carbon fixation rates of cyanobacteria have been pursued.

#### Improving Carbon Fixation Rates

Many efforts to engineer cyanobacteria for enhanced productivity have emphasized increasing the total rate of carbon fixation, most frequently by improving the activity or efficiency of RuBisCO. Cyanobacteria have highly efficient carbon-concentrating mechanisms (CCMs) relative to other phototrophs such as algae and plants (Price et al., 2011). The cyanobacterial CCM is efficient in part because it uses bicarbonate transporters to actively transport bicarbonate into the cell, which effectively overcomes the slower (10<sup>4</sup>-fold) diffusion rates of CO<sub>2</sub> in water compared to air (Price et al., 2011). Ultimately, the accumulated bicarbonate is transported across the carboxysome shell and converted to CO<sub>2</sub> in the carboxysome lumen, where RuBisCO is concentrated (Badger et al., 2002; Price et al., 2008). Increasing the efficiency of the cyanobacterial CCM could not only increase the concentration of RuBisCO's substrate, but also reduce the production of energetically-costly photorespiratory byproducts. Recently, extra bicarbonate transporters were expressed in Synechocystis PCC 6803, leading to a 2-fold enhancement of the growth rate and a higher amount of biomass accumulation (Kamennaya et al., 2015).

Other efforts have focused upon improving carbon fixation rates by balancing the enzymatic activities in the Calvin-Benson-Bassham (CBB) cycle to improve its total metabolic flux. Liang and Lindblad have demonstrated that overexpression of any of four separate enzymes in the CBB cycle (RuBisCO, sedoheptulose bisphosphatase, fructose bisphosphate aldolase, or transketolase) can improve total carbon fixation rates in Synechocystis PCC 6803 (Liang and Lindblad, 2016) (**Figure 3**). Furthermore, modifying the expression of these CBB enzymes could also enhance the rate of heterologous production of ethanol in Synechocystis PCC 6803 (Liang et al., 2018). Kanno et al. have recently described a novel strategy to improve carbon fixation rates by focusing upon importing a continuous supply of substrate for RuBisCO. By expressing glucose transporters, they successfully used the oxidative pentose phosphate pathway to convert some imported sugars for the synthesis of D-ribulose 1,5-bisphosphate. Using this approach, they demonstrated that an engineered strain was able to produce a targeted biochemical, 2,3-butanediol, under both light and dark conditions (Kanno et al., 2017).

#### Improving Carbon Fixation Through Sink Engineering

While it is frequently assumed that the rate-limiting metabolic step of carbon fixation is directly related to the slow catalytic activity of RuBisCO, photosynthetic activity can be limited by other metabolic steps. Utilization of the primary products of the CBB cycle, rather than the fixation of carbon dioxide itself, can limit photosynthesis. This concept was first explored in plant models, where the mechanisms underlying triose phosphate utilization (TPU) limitation are more deeply characterized. In plants, there is a physical separation between where primary products of the CBB cycle are generated (source tissues, e.g., leaves) and where much of these products will ultimately be utilized to support metabolic activity and growth (sink tissues, e.g., roots). Efficient operation of photosynthetic metabolism requires the balance of light energy ("source") that can be highly dynamic in the environment with an equivalent capacity to utilize/dissipate this energy using anabolic metabolism or quenching mechanisms ("sinks"). Without absorption of adequate photons (i.e., low "source"), a photosynthetic organism will starve; and without sufficient pathways to process or dissipate absorbed energy (low "sink"), end products of photosynthesis can accumulate, leading to feedback inhibition and overreduction of the electron transport chain (ETC) (Gifford et al., 1984; Paul and Foyer, 2001). TPU limitation on photosynthesis can occur either because of experimental manipulation (e.g., exogenously supplied carbohydrates, or chemical inhibition of carbohydrate transport/catabolism), or because the sum of metabolic processes downstream of the CBB cycle is insufficient to remove triose phosphates at the rate at which they are being generated (Sawada et al., 1986; Sharkey et al., 1986; Krapp et al., 1991; Paul and Foyer, 2001; Adams et al., 2013; Demmig-Adams et al., 2014).

While a number of published reports in plants have sought to enhance biomass accumulation by reducing TPU limitation, relatively few studies exist that suggest that cyanobacterial photosynthesis may also be limited by downstream metabolism. Much of our current knowledge on sink limitation in cyanobacteria is indirect, where researchers have found that the activity of a heterologous metabolic pathway expressed in a cyanobacterial model leads to increased photosynthetic activity and/or quantum efficiency (Ducat et al., 2012; Oliver et al., 2013). For example, expression of transporters that allow export of a number of bioproducts in cyanobacteria can lead to increases in photosynthetic activity relative to a parental line that lacks the production pathway. Specifically, some degree of enhanced photosynthetic activity has been observed in diverse cyanobacterial strains engineered to export sucrose (Ducat et al., 2012), isobutyraldehyde (Li et al., 2014), 2,3-butanediol (Oliver et al., 2013), or ethylene (Ungerer et al., 2012) (**Figure 3**; red text and arrows).

Our group has recently reported a more comprehensive analysis of the photosynthetic effects that occur following activation of a heterologous carbon sink (i.e., sucrose production and secretion) in S. elongatus PCC 7942 (Abramson et al., 2016). In this analysis, we found that photosystem II and photosystem I activities were significantly increased within hours of activating the heterologous sucrose secretion pathway. The quantum efficiency of photosystem II transiently increased following sucrose export, photosystem I activity became less constrained by acceptor-side limitations (i.e., the ability of electron carriers such as ferredoxin to remove the excited electrons generated at the reactive chlorophyll pair of photosystem I; Abramson et al., 2016), and total CO<sub>2</sub> fixation rates increased (Ducat et al., 2012). Taken together, these results suggest that overall electron flux through the ETC is enhanced following activation of a heterologous export pathway, indicating that the endogenous metabolism of S. elongatus PCC 7942 can be insufficient to completely utilize the products of the light reactions under standard laboratory conditions. Similar studies have shown that heterologous electron sinks also have the potential to enhance photosynthesis. When a mammalian cytochrome P450, CYP1A1, was expressed in Synechococcus PCC 7002, it was able to utilize reductant from the ETC to catalyze a desired monooxygenation reaction, and introduction of this pathway was also associated with improved photosynthetic efficiency and an increase in electron flow rate of up to ∼30% (Berepiki et al., 2016).

From the above discussion, it appears that at least slower-growing strains of cyanobacteria may be limited by their capacity to utilize products of the CBB cycle. However, it remains unknown whether the fastest-growing strains of cyanobacteria (e.g., S. elongatus UTEX 2973 and S. elongatus PCC 11801) will also exhibit similar increases in photosynthetic flux when engineered to export bioproducts. Current evidence suggests that a greater flux of carbon is allocated away from storage products (e.g., glycogen) and instead invested in cell growth and light-harvesting/carbon-fixation machinery in fast-growing cyanobacteria (Mueller et al., 2017; Zhang et al., 2017; Jaiswal et al., 2018). It is possible that strains with naturally high growth rates will experience fewer photosynthetic limitations due to overaccumulation of CBB end products, and therefore will not exhibit similar enhancements in photosynthesis upon activation of a heterologous metabolic sink.

## PREDICTING AND ENGINEERING CYANOBACTERIAL METABOLISM VIA GENOME-SCALE MODELS

Genome-scale models (GSMs) are large-scale stoichiometric models that describe metabolic pathways as stoichiometric coefficients and mass balances of participating metabolites, and are simulated using numerical optimization (Kim et al., 2016). The ultimate goal of these metabolic reconstructions is to give a comprehensive explanation of all biochemical conversions taking place within a living cell or organism, including transport and non-enzymatic reactions (Steuer et al., 2012). Due to ease of implementation and relatively high predictive power, modeling approaches have been used as tools to assist metabolic engineering and production strain development (O'Brien et al., 2015). Computational modeling methods based on the use of GSMs complement experimental research and give a powerful tool to rapidly generate and prioritize testable hypotheses that can be used to guide subsequent experimentation (Dreyfuss et al., 2013). On the other hand, GSMs also provide potential mechanistic explanations for the results obtained in the laboratory (O'Brien et al., 2015).
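To make the phrase "stoichiometric coefficients and mass balances" concrete, the following is a minimal sketch under stated assumptions: the three-reaction network, its metabolite and reaction names, and the helper functions are all hypothetical and are not taken from any published GSM. It stores a stoichiometric matrix as nested lists and checks whether a flux vector leaves every internal metabolite balanced.

```python
# Toy network (hypothetical): uptake -> A, A -> B, B -> export.
# Rows are metabolites, columns are reactions.
METABOLITES = ["A", "B"]
REACTIONS = ["uptake_A", "A_to_B", "export_B"]
S = [
    [1, -1, 0],   # A: produced by uptake_A, consumed by A_to_B
    [0, 1, -1],   # B: produced by A_to_B, consumed by export_B
]


def mass_balance(S, v):
    """Return the net production rate of each metabolite (the product S . v)."""
    return [sum(coef * flux for coef, flux in zip(row, v)) for row in S]


def is_steady_state(S, v, tol=1e-9):
    """True when no metabolite accumulates or is depleted."""
    return all(abs(net) <= tol for net in mass_balance(S, v))


balanced = [2.0, 2.0, 2.0]     # every reaction carries the same flux
unbalanced = [2.0, 1.0, 1.0]   # uptake exceeds conversion, so A accumulates

print(is_steady_state(S, balanced))
print(dict(zip(METABOLITES, mass_balance(S, unbalanced))))
```

A real GSM has the same structure with thousands of rows and columns, which is why dedicated solvers and sparse representations are normally used instead of nested lists.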

Using the genome sequence of an organism, a draft GSM can be compiled relatively easily (Thiele and Palsson, 2010). Standard procedures for generating high-quality GSM reconstructions have been detailed in the literature (Feist et al., 2009; Thiele and Palsson, 2010). Moreover, many steps of the reconstruction process have been successfully automated by several software programs (Hamilton and Reed, 2014). This progress has allowed the fast reconstruction of draft GSMs of multiple species (Kim et al., 2016) (**Figure 4**, central panel). However, some manual evaluation and curation is required to ensure a high-quality reconstruction (Hamilton and Reed, 2014).

# An Overview of Metabolic Models Developed for Cyanobacteria

A number of GSMs of phototrophic organisms have been published in the last decade, but they are still underrepresented in comparison to heterotrophic microorganisms (Gudmundsson et al., 2017). At the time of this writing, genomes for >270 cyanobacterial species have been sequenced (Fujisawa et al., 2017). However, a limited number of cyanobacterial reconstructions have been made available, and preliminary models have been refined for only a handful of species (**Table 2**). The earliest metabolic reconstructions were based on biochemical data, focusing mainly on central carbon metabolism and photosynthetic pathways (**Figure 4**, left panel). However, the recent advances in genome sequencing have allowed the generation of metabolic models at the genome scale. A general naming convention used to describe in silico models has been proposed with the form "iXXxxx", where "i" refers to an in silico model, "XX" are the initials of the person who developed the model, and "xxx" is the number of genes included in the model (Reed et al., 2003). However, many of the published GSMs do not follow this rule. Moreover, most of them lack universal metabolite and reaction conventions in the network model. This lack of consistency impedes direct information extraction between different models; maintenance of and adherence to a universal standard would greatly improve the updating and curation of GSMs. The inconsistent nomenclature is a key bottleneck in the speed of reconstruction of new high-quality GSMs (Kumar et al., 2012).
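The "iXXxxx" convention can be checked mechanically. The sketch below is illustrative only: it assumes a strict reading of the convention (exactly two uppercase initials followed by digits), and the function name `parse_model_name` is hypothetical. The model names are those mentioned in this review.

```python
import re

# Strict reading of the convention (an assumption): "i", two uppercase
# initials, then the gene count as digits.
MODEL_NAME = re.compile(r"i([A-Z]{2})(\d+)")


def parse_model_name(name):
    """Return (initials, gene_count) if the name follows iXXxxx, else None."""
    match = MODEL_NAME.fullmatch(name)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


for name in ["iJB785", "iHK677", "iSyn811", "imSyn716"]:
    print(name, parse_model_name(name))
```

Under this strict pattern, names such as iSyn811 and imSyn716 are reported as non-conforming, which illustrates why automated comparison across models is difficult when naming is inconsistent.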

Among cyanobacteria, Synechocystis PCC 6803 is the most extensively studied and well-modeled species, with a total of 12 GSMs (**Table 2**). Network reconstruction is an iterative process and the most robust models are generally created by gradually expanding and updating a prior draft when new data and tools are available (Gudmundsson et al., 2017). This is the case for the GSMs iSyn811 (Montagud et al., 2011), iSyn731 (Saha et al., 2012), iHK677 (Knoop et al., 2013), and imSyn716 (Gopalakrishnan et al., 2018). While distinct cyanobacterial species have unique characteristics, there are common pathways and core carbon metabolic processes that tend to be described in most GSMs. These common pathways can be schematically decomposed as: photosynthesis (to produce ATP and NADPH and fix inorganic carbon), glycolysis (to produce ATP, NADH, and generate precursor metabolites), the citric acid cycle (to produce other precursor metabolites), oxidative phosphorylation (to produce ATP), the pentose phosphate pathway (to produce reducing equivalents and precursor metabolites), carbohydrate synthesis and triacylglycerol synthesis (to build cell walls and store carbon), and inorganic nitrogen assimilation (to produce proteins, DNA, RNA, chlorophyll, and other secondary metabolites using the relevant precursor molecules) (Baroukh et al., 2015).

Unlike obligate heterotrophic microorganisms, cyanobacteria can utilize light and inorganic carbon, in addition to organic compounds, for the generation of energy and metabolic precursors. The complex mechanisms of light capture make it difficult to represent this process as a simple biochemical reaction in a metabolic network (Baroukh et al., 2015). Accurate modeling of phototrophic metabolism requires a new level of detail, including modeling the process of light harvesting and electron transport through a variety of possible pathways. Some advances in this area have been achieved in recent years. Nogales et al. proposed a detailed modeling approach for photosynthetic electron flow pathways in Synechocystis PCC 6803, including many cyclic electron flow and accessory pathways, enabling the study of photosynthetic processes at the system level (Nogales et al., 2012). In an updated representation of the GSM of Synechocystis PCC 6803, the role of photorespiration in cellular growth and the peculiarities of photosynthetic reactions such as light-dependent oxidative stress were integrated (Knoop et al., 2013). Most recently, an approach to incorporate light absorption that factors in the effects of cell shading was developed in S. elongatus PCC 7942 by modeling light as a metabolite (Broddrick et al., 2016). In addition, Qian et al. incorporated a light-dependent PSI/PSII electron transport rate algorithm in a GSM of Synechococcus PCC 7002, which allowed simulations of photoautotrophic growth at different light intensities (Qian et al., 2017). In a new GSM of Synechocystis PCC 6803, an unconstrained photo-respiratory reaction and a mechanism to account for changes in energy absorption from light at different wavelengths have been developed (Joshi et al., 2017). To facilitate a better understanding of respiratory and photosynthetic interactions, these authors included features to model known molecular mechanisms of the photosynthetic network around the thylakoid membrane.

**FIGURE 4** | ... non-essential but in reality, it is essential (the fourth case). However, new improvements could further improve GSM predictive ability, including the integration of -omics datasets and the improvement of photosynthesis and photorespiration modeling. A long-term goal would be the development of genome-scale kinetic models, which might be expected to provide more accurate metabolic predictions. Logos for MetaCyc, TransportDB, and CyanoBase are used with permission.

A canonical test useful to benchmark and validate the accuracy of GSMs is to examine their capacity to predict essential genes of the metabolic network (Becker and Palsson, 2008) (**Figure 4**, central panel). Gene essentiality prediction has been successfully used in several bacteria such as E. coli (Suthers et al., 2009; Orth et al., 2011) and Pseudomonas putida (Nogales et al., 2008, 2017). The quality of two recent cyanobacterial GSMs has been assessed using gene essentiality datasets: iJB785 of S. elongatus PCC 7942 (Broddrick et al., 2016) and iSynCJ816 of Synechocystis PCC 6803 (Joshi et al., 2017). In the first case, 78% of genes were correctly assigned as either essential or non-essential based on in vivo essentiality data obtained from previous dense-transposon mutagenesis experiments (Rubin et al., 2015). In the second case, the new GSM of Synechocystis PCC 6803 was able to predict the outcome of gene deletions with 77% accuracy, based on a qualitative growth comparison of 167 gene-deletion mutants with experimental studies obtained from online databases and a detailed literature search (Joshi et al., 2017).
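The accuracy figures quoted above are the fraction of genes for which the model call and the experimental call agree. The sketch below shows one way such a comparison could be tallied; the gene identifiers and calls are invented for illustration and do not come from the cited datasets, and the function name is hypothetical.

```python
def essentiality_accuracy(predicted, observed):
    """Compare predicted and observed essentiality calls (True = essential).

    Only genes present in both dictionaries are scored. Returns the four
    outcome counts and the overall accuracy.
    """
    shared = sorted(set(predicted) & set(observed))
    if not shared:
        raise ValueError("no genes in common between prediction and experiment")
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for gene in shared:
        p, o = predicted[gene], observed[gene]
        if p and o:
            counts["TP"] += 1      # predicted essential, observed essential
        elif not p and not o:
            counts["TN"] += 1      # predicted non-essential, observed non-essential
        elif p and not o:
            counts["FP"] += 1      # predicted essential, observed non-essential
        else:
            counts["FN"] += 1      # predicted non-essential, observed essential
    accuracy = (counts["TP"] + counts["TN"]) / len(shared)
    return counts, accuracy


# Hypothetical calls for five placeholder genes.
predicted = {"g1": True, "g2": False, "g3": True, "g4": False, "g5": False}
observed = {"g1": True, "g2": False, "g3": False, "g4": True, "g5": False}

counts, accuracy = essentiality_accuracy(predicted, observed)
print(counts, accuracy)
```

Reporting the four counts alongside the accuracy is useful because essential genes are usually a minority, so a single percentage can hide poor performance on that class.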

There are several approaches to improve the accuracy of GSMs. First, the use of data derived from high-throughput growth phenotyping experiments (e.g., knockout mutant strains grown in various media conditions) is very useful for validating and refining metabolic network reconstructions (Gawand et al., 2013). Second, integrating transcriptional regulation has proved to be a valuable way to build improved models and to investigate the capabilities of reconstructed metabolic networks (Vivek-Ananth and Samal, 2016). Finally, the integration of multi-omics datasets has been used in a number of instances to improve the accuracy of a GSM (Kim and Reed, 2014).

## Tools to Improve GSM Quality and Reconstruction

#### TABLE 2 | GSMs described for cyanobacterial strains.

Manual curation is one of the most important steps after the generation of the initial draft of a GSM. The process of manually reconstructing GSMs is complex and requires arduous and time-consuming curation, without which the model quality remains low (Machado et al., 2018). The manual review is required to reflect the true metabolic capabilities of the target organism (Gudmundsson et al., 2017) and relies heavily on experimental, organism-specific information (Thiele and Palsson, 2010). This step involves the examination of the mass and charge balance of individual reactions, gene associations of reactions, and reaction directionality (Gudmundsson et al., 2017). This task is typically addressed using the information stored in many public databases such as CyanoBase (Fujisawa et al., 2017), CyanoEXpress (Hernandez-Prieto and Futschik, 2012), KEGG (Kanehisa and Goto, 2000), MetaCyc (Caspi et al., 2006), SEED (Overbeek et al., 2014), BRENDA (Schomburg et al., 2002), and TransportDB (Ren et al., 2004) (**Figure 4**, central panel). To solve this bottleneck, several tools for rapid automated reconstruction of GSMs are currently available, each offering a different trade-off between automation and human intervention. Some of these tools have been previously reviewed (Faria et al., 2018; Machado et al., 2018), but only a small number of them have been used in cyanobacteria.
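The mass and charge balance check mentioned above lends itself to a simple automated test. The following sketch assumes a hypothetical metabolite table (identifiers, formulas, and charges written by hand here) rather than records from any of the databases listed; it sums elements and charge over a reaction and reports anything that does not cancel.

```python
from collections import Counter

# Hypothetical metabolite records: elemental composition and net charge.
METABOLITE_TABLE = {
    "h2o": {"formula": {"H": 2, "O": 1}, "charge": 0},
    "h": {"formula": {"H": 1}, "charge": 1},
    "oh": {"formula": {"H": 1, "O": 1}, "charge": -1},
}


def imbalance(stoichiometry, table=METABOLITE_TABLE):
    """Return (unbalanced_elements, net_charge) for one reaction.

    stoichiometry maps metabolite id -> coefficient (negative = consumed).
    A balanced reaction yields ({}, 0).
    """
    elements = Counter()
    charge = 0
    for met, coef in stoichiometry.items():
        for element, count in table[met]["formula"].items():
            elements[element] += coef * count
        charge += coef * table[met]["charge"]
    return {e: n for e, n in elements.items() if n != 0}, charge


water_dissociation = {"h2o": -1, "h": 1, "oh": 1}   # H2O -> H+ + OH-
missing_product = {"h2o": -1, "h": 1}               # OH- forgotten

print(imbalance(water_dissociation))
print(imbalance(missing_product))
```

In practice the hard part is not the arithmetic but obtaining consistent formulas and protonation states for every metabolite, which is one reason curation depends so heavily on the databases cited above.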

In many cyanobacteria, a high percentage of proteins are annotated as "unknown function" or "hypothetical protein," e.g., up to 60% in Synechocystis PCC 6803 (Lv et al., 2015). Gene-protein-reaction (GPR) relationships define the association between genes, metabolic enzymes, and the biochemical transformations that they perform (Thomas et al., 2014) (**Figure 4**, central panel). GPRs determine the set of metabolic reactions encoded in the genome and provide a mechanistic link between genotype and phenotype (Machado et al., 2016). The inclusion of GPRs within GSMs is essential to improve their quality and, in turn, their phenotypic predictions (Krishnakumar et al., 2013). New methodologies have been applied to automate the process of adding GPR associations to cyanobacterial GSMs, such as SHARP (Systematic, Homology-based Automated Re-annotation for Prokaryotes). SHARP is a novel PSI-BLAST-based methodology for assigning GPR associations to metabolic enzymes in prokaryotes, which has been used in cyanobacteria (Krishnakumar et al., 2013). These authors were able to predict 3,781 new GPR associations for the 10 prokaryotes considered, eight of which were cyanobacterial species. These new GPR associations allowed them to annotate gaps in metabolic networks, and to discover several pathways that may be active, thereby providing new directions for metabolic engineering of cyanobacteria.
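GPR associations are commonly written as Boolean rules, with "and" joining subunits of an enzyme complex and "or" joining isozymes. The sketch below illustrates how such rules translate a gene deletion into a set of blocked reactions. The rule encoding as nested tuples, the function names, and all gene and reaction identifiers are assumptions made for this example.

```python
def gpr_active(rule, deleted):
    """Evaluate a GPR rule given a set of deleted genes.

    A rule is either a gene id (str) or a tuple ("and" | "or", sub_rule, ...).
    """
    if isinstance(rule, str):
        return rule not in deleted
    op, *parts = rule
    results = [gpr_active(part, deleted) for part in parts]
    if op == "and":
        return all(results)   # every subunit of a complex is required
    if op == "or":
        return any(results)   # any one isozyme is sufficient
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical rules; gene and reaction names are placeholders.
GPRS = {
    "R_complex": ("and", "geneA", "geneB"),
    "R_isozymes": ("or", "geneC", "geneD"),
    "R_mixed": ("or", ("and", "geneA", "geneB"), "geneE"),
}


def blocked_reactions(gprs, deleted):
    """List reactions that lose all catalytic capacity after the deletions."""
    return sorted(r for r, rule in gprs.items() if not gpr_active(rule, deleted))


print(blocked_reactions(GPRS, {"geneA"}))
print(blocked_reactions(GPRS, {"geneC"}))
print(blocked_reactions(GPRS, {"geneA", "geneE"}))
```

This mapping is what allows gene-level predictions, such as the essentiality tests described earlier, to be made from a reaction-level model.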

Another bottleneck during GSM reconstruction is the problem of gap-filling. Metabolic reconstruction via functional genomics (MIRAGE) has been developed specifically to address this problem (Vitkin and Shlomi, 2012), and searches for reactions that are missing from reconstructions purely based on enzyme homology, but where functional genomic data suggests the reactions are present. MIRAGE performance was directly tested by applying it to the reconstruction of a network model for Synechocystis PCC 6803, and was then successfully validated against an existing, manually-curated model for this cyanobacterium. Vitkin and Shlomi compared the reconstructed network model for Synechocystis PCC 6803 generated by MIRAGE with the manually curated models of Knoop et al. (2010) and iSyn811 (Montagud et al., 2011), and found a predictive precision of 70% in the first case and 37.5% for iSyn811. MIRAGE was also applied to reconstruct GSMs for 36 sequenced cyanobacteria, including some model cyanobacteria such as S. elongatus PCC 7942 and Synechococcus PCC 7002 (Vitkin and Shlomi, 2012). However, it has been demonstrated that this methodology was not useful to develop a GSM for the cyanobacterium Cyanothece sp. PCC 7424 (Mueller et al., 2013). For example, MIRAGE analysis generated a GSM that contained menaquinone and ubiquinone, compounds shown to not exist within Cyanothece PCC 7424 (Collins and Jones, 1981). Moreover, the biomass composition of this GSM did not contain some metabolites that are known to be important components of this species, including lipids, pigments, and cyanophycin. Automated model development tools are useful when there is enough information in the training set of models that they can extract to develop the new one (Mueller et al., 2013). However, some cyanobacterial metabolites are unique, therefore they cannot be detected using this methodology, and require manual annotation.
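Before gaps can be filled they have to be found. One simple symptom of a gap is a dead-end metabolite, i.e., one that the network can only produce or only consume. The sketch below detects such metabolites in a toy network; it is not the MIRAGE algorithm, it assumes all reactions are irreversible, and every name in it is hypothetical.

```python
def dead_end_metabolites(reactions):
    """Find metabolites that are only produced or only consumed.

    reactions maps reaction id -> {metabolite: coefficient}, with negative
    coefficients for substrates. All reactions are treated as irreversible
    (an assumption of this sketch).
    """
    produced, consumed = set(), set()
    for stoichiometry in reactions.values():
        for met, coef in stoichiometry.items():
            if coef > 0:
                produced.add(met)
            elif coef < 0:
                consumed.add(met)
    return {
        "only_produced": sorted(produced - consumed),
        "only_consumed": sorted(consumed - produced),
    }


# Toy network with two gaps: nothing consumes C and nothing produces D.
TOY_NETWORK = {
    "uptake_A": {"A": 1},
    "A_to_B": {"A": -1, "B": 1},
    "B_to_C": {"B": -1, "C": 1},
    "D_to_E": {"D": -1, "E": 1},
    "export_E": {"E": -1},
}

print(dead_end_metabolites(TOY_NETWORK))
```

A gap-filling method then has to decide which candidate reactions to add so that such metabolites become connected, which is where homology and functional genomic evidence enter.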

# Using Flux Balance Analysis for Systems Metabolic Engineering

The mathematical approach most widely used for studying the characteristics and capabilities of large-scale biochemical networks is flux balance analysis (FBA) (Orth et al., 2010). It depends on an assumption of steady-state growth and mass balance (influx equals efflux) (Orth et al., 2010; Qian et al., 2017). FBA calculates the flow of metabolites through a metabolic network, thereby making it possible to predict the growth rate of an organism or the rate of production of a biotechnologically important metabolite (Orth et al., 2010). In FBA simulations, the biomass function is normally used to simulate cellular growth, because it is composed of all necessary compounds needed to create a new cell including DNA, amino acids, lipids, and polysaccharides (O'Brien et al., 2015). However, when FBA is used to simulate the expected rates of production for a metabolite of interest, instead of maximizing the growth rate, it is necessary to maximize the production of this metabolite by optimizing the output flux of the reaction that produces it (Lewis et al., 2012).
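For readers who prefer a compact statement, FBA is usually written as a linear program. The notation below is our own shorthand and an assumption of this sketch rather than the notation of the cited works: $S$ is the stoichiometric matrix, $v$ the vector of reaction fluxes, $v_{\min}$ and $v_{\max}$ the flux bounds, and $c$ a vector of objective weights.

$$
\begin{aligned}
\max_{v}\quad & c^{\mathsf{T}} v \\
\text{subject to}\quad & S\,v = 0, \\
& v_{\min} \le v \le v_{\max}.
\end{aligned}
$$

Setting the single non-zero entry of $c$ on the biomass reaction gives the growth-maximization problem, whereas placing it on the reaction that produces or exports the metabolite of interest gives the production-maximization problem described above.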

FBA and related constraint-based methods can be used to predict the optimal set of gene knockout and overexpression targets to increase the ability of an organism to produce a chemical of interest. With this aim, FBA has been successfully used in GSMs of cyanobacteria to increase the production of several compounds, as summarized in **Table 3**. This method is a useful alternative when GSMs have little or no kinetic data available for their metabolic enzymes (Orth et al., 2010). Readers may find more details about FBA results obtained in cyanobacteria in Gudmundsson et al. (2017) and in the references detailed in **Table 3**.

# Algorithms Used in silico to Improve Cyanobacterial Productivity

Numerous constraint-based methods of GSMs are available to identify the phenotypic properties of an organism and to validate hypothesis-driven engineering of cellular functions toward specific objectives (Kim et al., 2015). In addition to the increasing refinement of higher-quality GSM reconstructions, other computational algorithms have been developed that have expanded the scope, accuracy, and applications for GSMs. Some of these algorithms have been used in cyanobacteria to predict promising gene deletion targets for increased production of target compounds, such as OptGene, minimization of metabolic adjustment (MOMA), OptKnock, and OptORF (Vu et al., 2013; Shabestary and Hudson, 2016). On the other hand, other algorithms have been developed to identify the possible interventions (e.g., up or downregulation of gene expression) that lead to overproduction of a target metabolite (e.g., OptForce) (Shabestary and Hudson, 2016; Lin et al., 2017).

The OptGene algorithm is based on the random implementation of reaction knockouts with the aim of creating optimal knockout sets. This algorithm provides the advantage of a high computational speed, enabling solutions to be efficiently reached even for large problems. Additionally, OptGene can optimize for non-linear objective functions, such as the productivity of one specific compound (Patil et al., 2005). This algorithm was recently used to identify knockouts in Synechocystis PCC 6803 that improve production of fermentation, fatty-acid, and terpene-derived biofuels (Shabestary and Hudson, 2016). The same authors also used MOMA, an algorithm that predicts the optimal flux distribution of altered metabolism that would require the smallest change from that of wild-type metabolism (Segrè et al., 2002). However, a primary drawback of the MOMA algorithm is that it does not assume optimality of growth or any other metabolic functions (Raman and Chandra, 2009). Despite this drawback, it has been successfully applied in E. coli to predict the metabolic phenotype of gene knockouts and to find solutions not detected with FBA (Segrè et al., 2002). Shabestary and Hudson used MOMA to find knockout strategies that could increase biofuel productivity in Synechocystis PCC 6803 (Shabestary and Hudson, 2016). MOMA has also been used in Synechococcus PCC 7002 to identify knockout mutants to improve chemical production under photoautotrophic and/or dark anoxic conditions (Vu et al., 2013). Shabestary and Hudson used a third algorithm, OptKnock, to find a set of knockouts that led to coupling between biofuel production and growth (Shabestary and Hudson, 2016). OptKnock is a powerful algorithm that identifies and subsequently removes metabolic reactions that are capable of coupling cellular growth with chemical production (Burgard et al., 2003). OptORF is an algorithm very similar to OptKnock, but it identifies gene deletions (instead of reaction deletions) and regulatory changes needed to couple growth and chemical production (Kim and Reed, 2010). Nevertheless, the applicability of OptORF relies heavily on the availability of integrated metabolic and regulatory models, which is not always possible, especially for cyanobacteria (Maia et al., 2016). It was used in Synechococcus PCC 7002 to predict metabolic engineering strategies that improve production of both native and non-native chemicals (Vu et al., 2013).
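The "smallest change from wild-type metabolism" criterion used by MOMA can be stated as a quadratic program. The symbols here are our own assumptions for illustration: $v^{\mathrm{wt}}$ is a reference wild-type flux distribution (for example, an FBA solution), $K$ is the set of reactions removed by the knockout, and $S$, $v_{\min}$, and $v_{\max}$ are the stoichiometric matrix and flux bounds.

$$
\begin{aligned}
\min_{v}\quad & \sum_{i} \left( v_i - v_i^{\mathrm{wt}} \right)^2 \\
\text{subject to}\quad & S\,v = 0, \\
& v_{\min} \le v \le v_{\max}, \\
& v_j = 0 \quad \text{for all } j \in K.
\end{aligned}
$$

Written this way, the contrast with growth-coupling methods is visible in the objective: nothing in it refers to biomass, so the predicted mutant is the feasible state closest to the wild type rather than the fastest-growing one.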

OptForce is an algorithm that identifies all possible engineering interventions by classifying reactions in the metabolic model depending upon whether their flux values must increase, decrease, or become equal to zero to meet a pre-specified overproduction target (Ranganathan et al., 2010). This algorithm has been recently extended to include kinetic descriptions for some of the reaction steps (Chowdhury et al., 2014). The new algorithm, k-OptForce, can only be directly applied when kinetic models of cellular pathways are available, and unfortunately detailed kinetic models are extremely limited in cyanobacteria (Steuer et al., 2012). The OptForce algorithm was applied in Synechocystis PCC 6803 to find a set of interventions to couple growth to 1-octanol and limonene production (Shabestary and Hudson, 2016). In the first case, OptForce predicted several reactions in alternative electron flow and required the upregulation of the ferredoxin:NADPH oxidoreductase reaction. In the case of coupling limonene production to growth, a significant number of reaction knockouts were required, including some components of the cyclic electron flow and alternative electron flow. In a recent report, Lin et al. used this algorithm in Synechocystis PCC 6803 to enhance isoprenoid production (Lin et al., 2017). OptForce predicted several interventions: the upregulation of two pentose phosphate pathway genes, ribose 5-phosphate isomerase and ribulose 5-phosphate 3-epimerase, and the overexpression of a geranyl diphosphate synthase involved in the limonene biosynthetic pathway. The optimized strain with these modifications demonstrated a 2.3-fold improvement in productivity.
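The classification step described above can be pictured as a comparison of flux ranges. The sketch below is a deliberately simplified illustration, not the published OptForce procedure: it assumes that a (min, max) flux range is already available for each reaction in the wild-type network and in a network constrained to meet the overproduction target, and it labels a reaction only when the two ranges do not overlap. All reaction names and numbers are invented.

```python
def classify_reaction(wild_type, overproducer, tol=1e-9):
    """Label one reaction from its (min, max) flux range in each network."""
    wt_min, wt_max = wild_type
    op_min, op_max = overproducer
    disjoint = op_min > wt_max + tol or op_max < wt_min - tol
    if not disjoint:
        return "unchanged"   # ranges overlap: no change is strictly required
    if abs(op_min) <= tol and abs(op_max) <= tol:
        return "zero"        # flux must be eliminated (knockout candidate)
    return "increase" if op_min > wt_max else "decrease"


# Hypothetical flux ranges: (wild type, overproducing network).
FLUX_RANGES = {
    "R1": ((1.0, 2.0), (3.0, 4.0)),
    "R2": ((5.0, 6.0), (1.0, 2.0)),
    "R3": ((0.5, 1.0), (0.0, 0.0)),
    "R4": ((0.0, 3.0), (1.0, 2.0)),
}

labels = {rxn: classify_reaction(wt, op) for rxn, (wt, op) in FLUX_RANGES.items()}
print(labels)
```

The reactions labeled "increase", "decrease", or "zero" form the candidate pool from which a minimal set of interventions would then be selected.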

## CONCLUSIONS AND PERSPECTIVES

Cyanobacteria possess desirable characteristics as a chassis for biotechnological production and have demonstrated capacities to produce high-value bioproducts. However, tools from heterotrophs are not always transferrable to these microorganisms, leading to the delay in their advancement as industrial hosts. Accelerating the development of sophisticated genomic tools is likely to greatly assist in making scaled cyanobacterial bioproduction a commercially-viable endeavor. Many recent efforts have been focused on the characterization of new parts and development of libraries and standardized modular parts in cyanobacteria. However, characterization of such parts could likely be significantly accelerated by the development of robust and modular expression libraries, like those that have been established in E. coli and yeast. Moreover, the development of well-defined genetic libraries such as genomic, expression, and knockout libraries might facilitate a better understanding of complex phenotypes in cyanobacteria (Ramey et al., 2015). On the other hand, the use of markerless modification systems as well as CRISPR/Cas based technologies could expand and improve the efficiency of genome editing in cyanobacteria. The iterative modification of the genome would allow the assembly of long and complex metabolic pathways, a capacity that is essential for many of the more advanced and ambitious metabolic engineering projects (Ungerer and Pakrasi, 2016; Behler et al., 2018). The development and/or optimization of high-throughput approaches that allow the introduction of several genetic modifications in a single step are needed (Ramey et al., 2015). In E. coli, techniques such as Multiplex Automated Genome Engineering (MAGE) (Wang et al., 2009) and trackable multiplex recombineering (Warner et al., 2010) have been successfully used to perform multiple genome engineering modifications in tandem. 
While some technical challenges would complicate the adoption of such techniques in cyanobacteria, it is notable that MAGE has been successfully applied in combination with CRISPR/Cas technology to engineer polyploid hosts, such as industrial yeast (Lian et al., 2018), and other organisms with more complex genomes, including plants (Sakuma et al., 2014; Hashimoto et al., 2018). The establishment of these approaches in higher organisms, together with the recent development of CRISPR/Cas systems in cyanobacteria, opens up the possibility of high-throughput genome engineering of cyanobacterial strains.

The increasing availability of GSMs for different cyanobacterial species has allowed for reiterative design-build-test cycles whereby the predictions from in silico models can be validated and improved using experimental outcomes in vivo. Recent efforts have focused on the establishment of GPRs to improve the quality of GSMs, and thus, their phenotypic predictions. However, there are still some challenges that need to be addressed to ensure the accuracy of GSM predictions. Improving the scope and accuracy of existing GSMs will likely involve the integration of -omics datasets to improve in silico representation of metabolism and identify additional biological unknowns (**Figure 4**, right panel). In addition, it is necessary to improve the modeling of photosynthesis, photorespiration, and photodamage (**Figure 4**, right panel). The correct modeling of photosynthesis and respiratory electron transfer processes might provide some insight into their physiological role in cyanobacteria. Another critical element that is currently absent in GSMs is the incorporation of regulatory pathways that can dynamically alter enzymatic activity and pathway flux (**Figure 4**, right panel), although in many cases regulatory functions are poorly understood in cyanobacteria.


Although existing GSMs have been successfully applied to inform genetic engineering, it is clear that future efforts are being redirected to genome-scale kinetic models, which hold still greater promise. These new models overcome many of the shortcomings of GSMs, such as the lack of representation of metabolite concentrations and enzymatic regulation, which are necessary for a complete, physiologically relevant model. Moreover, they enable dynamic analysis of biological systems for enhanced in silico hypothesis generation (Srinivasan et al., 2015).

Most efforts in the genetic and metabolic engineering of cyanobacteria for the production of compounds with biotechnological applications have been focused on increasing product yield under laboratory conditions. A limited number of studies have been performed with the aim of optimizing cyanobacterial cells to be more compatible with downstream scaled cultivation and processing (Singh et al., 2016; Johnson et al., 2017). The successful utilization of cyanobacterial species for industrial production depends on the development of suitable large-scale cultivation systems. However, most academic researchers lack access to the large-scale production systems that are necessary to evaluate the potential of engineered strains under more realistic environmental conditions (Schoepp et al., 2014), and such information is important for the recursive design-build-test process for strain engineering. More communication between academic researchers and industrial partners could assist in the development of equipment that accurately mimics outdoor production conditions at the laboratory scale (as developed in Lucker et al., 2014). In addition to improving product yields, future research should also be directed toward the development of efficient and cost-effective photosynthetic bioreactors (Lau et al., 2015), as well as technologies to harvest the end products (Knoot et al., 2018) to minimize operation costs. Moreover, the high requirements of water and nutrients are other major challenges for the economic profitability of large-scale cyanobacterial cultures (Pathak et al., 2018). Further investigation of the capacity of industrial strains of cyanobacteria to grow in and remediate wastewater streams might help to alleviate some of these scalability concerns.

# AUTHOR CONTRIBUTIONS

MS-M, AS, and DD contributed to the writing and editing of the manuscript and generation of figures and tables.

#### FUNDING

This work was supported by the Department of Energy (Grant: DE-FG02-91ER20021).




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Santos-Merino, Singh and Ducat. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Metabolic Engineering and Synthetic Biology: Synergies, Future, and Challenges

Raúl García-Granados<sup>1,2</sup>, Jordy Alexis Lerma-Escalera<sup>1,2,3</sup> and José R. Morones-Ramírez<sup>1,2\*</sup>

<sup>1</sup> Facultad de Ciencias Químicas, Universidad Autónoma de Nuevo León, San Nicolás de los Garza, Mexico, <sup>2</sup> Centro de Investigación en Biotecnología y Nanotecnología, Facultad de Ciencias Químicas, Universidad Autónoma de Nuevo León, Apodaca, Mexico, <sup>3</sup> Facultad de Ciencias Biológicas, Universidad Autónoma de Nuevo León, San Nicolás de los Garza, Mexico

#### Edited by:

Francesca Ceroni, Imperial College London, United Kingdom

#### Reviewed by:

Vijai Singh, Indian Institute of Advanced Research, India Borkowski Olivier, University of Évry Val d'Essonne, France

#### \*Correspondence:

José R. Morones-Ramírez jose.moronesrmr@uanl.edu.mx; morones.ruben@gmail.com

#### Specialty section:

This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology

Received: 30 November 2018 Accepted: 13 February 2019 Published: 04 March 2019

#### Citation:

García-Granados R, Lerma-Escalera JA and Morones-Ramírez JR (2019) Metabolic Engineering and Synthetic Biology: Synergies, Future, and Challenges. Front. Bioeng. Biotechnol. 7:36. doi: 10.3389/fbioe.2019.00036

The "-omics" era has brought a new set of tools and methods that have had a significant impact on the development of Metabolic Engineering and Synthetic Biology. These fields, rather than working separately, depend on each other to prosper and achieve their individual goals. Synthetic Biology aims to design libraries of genetic components (promoters, coding sequences, terminators, transcription factors and their binding sequences, and more) and to assemble devices, genetic circuits and even organisms, in addition to obtaining quantitative information for the creation of models that can predict the behavior of biological systems (Cameron et al., 2014). Metabolic engineering seeks the optimization of cellular processes, endemic to a specific organism, to produce a compound of interest from a substrate, preferably a cheap and simple one. It draws on databases, libraries of components and defined conditions to maximize the production rate of a desired chemical compound while avoiding inhibitors and conditions that affect the growth rate and other vital functions of the organism; to achieve these goals, the manipulation of metabolic fluxes represents an important alternative (Stephanopoulos, 2012).

#### Keywords: synthetic biology, metabolic engineering, biotechnology, bioengineering, omics analyses

While synthetic biology provides the components and information about different biological phenomena, metabolic engineering tries to apply all this information toward the optimization of the biological synthesis route of a desired compound. Some examples of synthetic biology could even be classified as examples of genetic engineering. In any case, both areas depend on advances in the current methods, techniques and tools for DNA modification. The reason is that both areas of research must meet the same basic requirements: rational changes in DNA sequences, the generation of specific mutations, the assembly of parts or components into genetic circuits or biosynthetic pathways, the knockout of genes, and the integration of DNA pieces into the genome of an organism of interest or into a plasmid (Boyle and Silver, 2012). Although PCR and its variants are among the best tools to generate some of the necessary modifications (they are especially useful for extracting fragments of a specific region), they are ineffective for other purposes.

The creation of complementary new techniques and methods to achieve other objectives has been vital in the development of both areas; some examples are BioBricks, recombinase technologies (integrases), Gibson Assembly, gap repair, Lambda-red, MAGE and CRISPR-Cas9, among others (Boyle and Silver, 2012; Stephanopoulos, 2012). During the "-omics" era, the sequencing of both known and unknown organisms (metagenomics) has yielded more and better information about them; the catalog of enzymes and processes has been expanded, and certain biological phenomena (for example, viral or phage infections) have been better characterized (Goodwin et al., 2016).

Rational design is the introduction of desired mutations into a specific DNA sequence with the aim of modifying proteins to improve their catalytic activity, stability or some other property (such as binding domain specificity). Subsequently, tools such as Gibson Assembly, BioBricks or Golden Gate can assemble the different genetic components to form genetic circuits, expression cassettes or devices (Casini et al., 2015). Finally, the use of Lambda-red, CRISPR-Cas9 and other recombination techniques allows us to insert the different constructs into expression vectors or into strategic loci of the genome of a given organism (Esvelt and Wang, 2013; Liu et al., 2015; Wang et al., 2015).

Some of these technologies can be applied to insert the genes of partial or complete metabolic pathways for the synthesis of a specific compound; they can also be used to eliminate genes that interfere with the synthesis. In addition, these methodologies make it possible to introduce point mutations that reduce the activity or expression of native proteins in order to modify the metabolic flux. However, to create all these modifications it is necessary to have certain information about the enzymes that participate in the reactions, the metabolic pathway that includes these reactions and the organism in which the modifications will be made. For this reason, genome sequencing, protein characterization and metabolic studies provide extremely useful tools and information (**Figure 1**).

Therefore, one of the main issues is the lack of information: very few organisms have been sequenced and characterized at different levels (genome, transcriptome and metabolome), which makes genetic modification a difficult task. Added to this, the techniques and methods that currently exist are not always effective, or have low success rates that depend on the organism. The creation of plasmids that can be used in different organisms (shuttle vectors) will help to solve this problem, but sequencing remains a necessary step. With this information we could discover new proteins or metabolic pathways that could be useful for reaching higher product titers (enzymes with better production rates or simpler metabolic pathways); we could even describe new biological phenomena that are useful for the modification of DNA sequences.

Another important point related to the available information is the selection of the chassis (the microorganism on which we will be working). The basic information and techniques available, as well as any special qualities (specific metabolic pathways or resistance to certain conditions), are important criteria when choosing the chassis and can facilitate the development of a project (Brophy and Voigt, 2014; Khoury et al., 2014).

A problem related to the choice of the chassis and the modification of both the metabolic pathway and the metabolic flux is the production of toxic intermediates, or the accumulation of an intermediate that could inhibit the route (feedback). The production of certain compounds can be toxic to the microorganism that produces them, especially when handled at large scales (Sopko et al., 2006; Förster and Gescher, 2014). In these cases, a correct bioreactor design is usually helpful: removing the toxic compound before it reaches a damaging threshold concentration significantly improves production titers. Here another challenge appears for both areas: to discover a microorganism that can resist a greater concentration of that compound, or to discover a resistance mechanism and transfer it from one organism to another (Keasling, 2012). To avoid the accumulation of an intermediate at any step of the metabolic pathway, it is necessary to determine the rates of synthesis and consumption of each compound. Once this information is available, tuning the strength of promoters, RBSs and transcription factors or, in some cases, the growth conditions allows us to avoid the problem; the main difficulty still lies in having this information available.
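The point about matching synthesis and consumption rates can be illustrated with a minimal sketch. It assumes a hypothetical two-step pathway in which an intermediate is produced at a constant rate `v1` (standing in for the expression strength of the first enzyme) and consumed by a second enzyme with Michaelis-Menten kinetics (`vmax2`, `km`). The model, the parameter names and the values are illustrative assumptions, not data from the article.

```python
def simulate_intermediate(v1, vmax2, km=1.0, dt=0.01, t_end=200.0):
    """Euler integration of dI/dt = v1 - vmax2 * I / (km + I).

    v1    : constant production rate of the intermediate (assumed)
    vmax2 : maximal consumption rate of the downstream enzyme (assumed)
    Returns the intermediate concentration at t_end (arbitrary units).
    """
    intermediate = 0.0
    for _ in range(int(t_end / dt)):
        consumption = vmax2 * intermediate / (km + intermediate)
        intermediate += dt * (v1 - consumption)
    return intermediate


# Downstream capacity exceeds supply: the intermediate settles at
# the steady state v1 * km / (vmax2 - v1).
balanced = simulate_intermediate(v1=1.0, vmax2=2.0)

# Supply exceeds downstream capacity: the intermediate keeps accumulating.
unbalanced = simulate_intermediate(v1=2.0, vmax2=1.0)

print(f"balanced pathway:   I = {balanced:.2f}")
print(f"unbalanced pathway: I = {unbalanced:.2f}")
```

In this toy model, lowering `v1` (a weaker promoter or RBS for the first enzyme) or raising `vmax2` (stronger expression of the second enzyme) moves the pathway from the accumulating regime to the steady-state regime, which is the kind of tuning described above.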

The most important challenge for both areas is obtaining quantitative data on different biological phenomena (Le Novère, 2015). We need more projects dedicated to characterizing biological components, such as the strength of promoters, RBSs and terminators, as well as the behavior of certain phenomena (transcription and translation levels of genes, half-lives of transcripts and proteins, catalytic activity, repression strength). In addition, the creation of genetic circuits and devices has helped to describe more complex behaviors and to understand the phenomena mentioned above (Canton et al., 2008).

All these data allow the creation of mathematical and computational models that provide us with a better understanding of the behavior of the system and help to visualize expected behaviors, approximate production levels, the selection of conditions, etc. (Hwang et al., 2009). There is a diversity of works focused on biological simulation: Karr et al. (2012) report a whole-cell computational model of Mycoplasma genitalium that includes the molecular components and their interactions by combining different mathematical and computational approaches (ODEs with Boolean, probabilistic and constraint-based submodels). Moreover, O'Brien et al. (2015) describe the different applications that mathematical models and computational simulation offer: simulating gene and reaction knockouts, comparing inputs and outputs, iterative improvement, gap-filling approaches, discovery of regulatory interactions, quantification of predictable phenotypes from optimality principles, and model-based design of environmental and genetic perturbations, to mention some.
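As a rough illustration of what "simulating reaction knockouts" means, the sketch below uses a simple reachability (network expansion) test on a made-up five-reaction network. This is a deliberately simpler technique than the constraint-based models cited above: it only asks whether a target metabolite can still be produced from the nutrients after a reaction is removed. The network, the reaction names and the function names are hypothetical.

```python
def producible(reactions, seeds):
    """Return every metabolite reachable from the seed nutrients.

    reactions maps a reaction name to (substrates, products); a reaction
    can fire once all of its substrates are available.
    """
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available


def essential_reactions(reactions, seeds, target):
    """List the single-reaction knockouts that abolish target production."""
    essential = []
    for name in reactions:
        remaining = {k: v for k, v in reactions.items() if k != name}
        if target not in producible(remaining, seeds):
            essential.append(name)
    return sorted(essential)


# Hypothetical toy network: B can be made directly (r2) or via C (r3, r5).
TOY_NETWORK = {
    "r1": (["glucose"], ["A"]),
    "r2": (["A"], ["B"]),
    "r3": (["A"], ["C"]),
    "r4": (["B"], ["product"]),
    "r5": (["C"], ["B"]),
}

print(essential_reactions(TOY_NETWORK, seeds=["glucose"], target="product"))
```

In this toy case, the redundant routes to B make `r2`, `r3` and `r5` individually dispensable, whereas `r1` and `r4` are essential. Genome-scale models apply the same idea with flux constraints and optimality criteria instead of plain reachability.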

The application of mathematical models could be a powerful tool for understanding biological systems, and computational simulations could give us a better perspective on the processes involved, the changes that occur and the predictions made. Technologies such as machine learning, big data and artificial intelligence are proving very helpful for the analysis of biological data. Different methods (artificial neural networks, Bayesian methods, decision trees and random forests, multi-layer perceptrons (MLP), radial basis function (RBF) networks, support vector machines, K-means, Farthest First, density-based clustering, etc.) are being used for microarray analysis, the study of diseases and epidemiology, generating useful information for different lines of research (Pirooznia et al., 2008; Libbrecht and Noble, 2015). Quantum computers represent a potentially very important tool for the development of these simulations and methods. These computers process information in qubits (which can represent 1, 0 or combinations of both, known as superposition) and therefore have a greater processing scope, something very valuable when working with large amounts of information. A first major milestone has been IBM's public launch of the first commercial quantum computer (the IBM Q System One). It is only a matter of time before researchers discover the analytical power that quantum computers can bring to the biological and health sciences.

Metabolic engineering and synthetic biology are two promising areas that have made great advances in biotechnology and have contributed significantly toward solving problems in the production of drugs, vaccines, chemical compounds, etc. (Khalil and Collins, 2010). In addition, these fields have advanced our knowledge of how life functions. Despite all these advances, it is still necessary to continue collecting information on the functioning of cells and living organisms and to discover new species of microorganisms that could aid in the development of new techniques and methods. Finally, synthesizing long sequences is often problematic because of the error rate of current chemical synthesis techniques (approximately 1 error in every 1,000 base pairs). The search for new nucleotide synthesis techniques (such as TdT-dNTP or enzymatic synthesis) and the improvement of current chemical synthesis remain open tasks that could be very useful for both areas, especially because they open the opportunity for the synthesis of whole genomes, vectors or artificial chromosomes (Kosuri and Church, 2014; Hughes and Ellington, 2017).
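To see why an error rate of about 1 in 1,000 base pairs becomes limiting for long constructs, the following sketch computes the expected fraction of error-free molecules as a function of length. It assumes that errors are independent and uniformly distributed along the sequence, which is a simplification; the function name and the example lengths are illustrative.

```python
def error_free_fraction(length_bp, error_rate=1 / 1000):
    """Probability that a synthesized sequence of length_bp has no errors,
    assuming independent errors at a fixed per-base rate."""
    return (1.0 - error_rate) ** length_bp


for length in (100, 1_000, 10_000):
    fraction = error_free_fraction(length)
    expected_errors = length / 1000
    print(f"{length:>6} bp: {expected_errors:5.1f} expected errors, "
          f"error-free fraction {fraction:.2e}")
```

Under these assumptions, roughly 37% of 1 kb molecules would be error-free, and the fraction falls off exponentially with length, which is why genome-scale synthesis relies on error correction or on more accurate synthesis chemistries.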

With all these tools at our disposal, we will be able to optimize microorganisms as small factories that allow us to obtain higher rates of yield and production of a chemical compound, preferably using simple substrates.

# AUTHOR CONTRIBUTIONS

RG-G, JL-E, and JM-R all contributed toward the writing and editing of this manuscript.

#### FUNDING

Financial support was provided by the Universidad Autónoma de Nuevo León and CONACyT through the Paicyt 2016–2017 Science Grant from the Universidad Autónoma de Nuevo León and through CONACyT grants: Basic Science grant 221332, Fronteras de la Ciencia grant 1502 and Infraestructura grant 279957.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 García-Granados, Lerma-Escalera and Morones-Ramírez. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# High-Performance Biocomputing in Synthetic Biology–Integrated Transcriptional and Metabolic Circuits

#### Angel Goñi-Moreno<sup>1</sup> \* and Pablo I. Nikel <sup>2</sup>

<sup>1</sup> School of Computing, Newcastle University, Newcastle upon Tyne, United Kingdom, <sup>2</sup> The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark

Biocomputing uses molecular biology parts as the hardware to implement computational devices. By following pre-defined rules, often hard-coded into biological systems, these devices are able to process inputs and return outputs—thus computing information. Key to the success of any biocomputing endeavor is the availability of a wealth of molecular tools and biological motifs from which functional devices can be assembled. Synthetic biology is a fabulous playground for such purpose, offering numerous genetic parts that allow for the rational engineering of genetic circuits that mimic the behavior of electronic functions, such as logic gates. A grand challenge, as far as biocomputing is concerned, is to expand the molecular hardware available beyond the realm of genetic parts by tapping into the host metabolism. This objective requires the formalization of the interplay of genetic constructs with the rest of the cellular machinery. Furthermore, the field of metabolic engineering has had little intersection with biocomputing thus far, which has led to a lack of definition of metabolic dynamics as computing basics. In this perspective article, we advocate the conceptualization of metabolism and its motifs as the way forward to achieve whole-cell biocomputations. The design of merged transcriptional and metabolic circuits will not only increase the amount and type of information being processed by a synthetic construct, but will also provide fundamental control mechanisms for increased reliability.

Keywords: biocomputing, synthetic biology, metabolic engineering, boolean logic, genetic circuits, metabolic networks

#### Edited by:

Pablo Carbonell, University of Manchester, United Kingdom

#### Reviewed by:

Mario Andrea Marchisio, Tianjin University, China Jesus Picó, Universitat Politècnica de València, Spain

#### \*Correspondence:

Angel Goñi-Moreno angel.goni-moreno@newcastle.ac.uk

#### Specialty section:

This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology

Received: 14 December 2018 Accepted: 18 February 2019 Published: 11 March 2019

#### Citation:

Goñi-Moreno A and Nikel PI (2019) High-Performance Biocomputing in Synthetic Biology–Integrated Transcriptional and Metabolic Circuits. Front. Bioeng. Biotechnol. 7:40. doi: 10.3389/fbioe.2019.00040

#### BIOCOMPUTING

Computation can be broadly defined as the formal procedure by which input information is processed according to pre-defined rules and turned into output data. Since this definition does not specify the type of information and rules involved in the process, it is applicable to electronic devices as well as to biological systems. In other words, biological systems do perform computations. While the computational ability of biological matter was explicitly described a number of times throughout the twentieth century (Bennett, 1982), it was Leonard Adleman who showed the feasibility of implementing human-defined computations with molecular (i.e., genetic) hardware (Adleman, 1994). Although the discussion on what would be the equivalent of computer hardware and software in biological systems is still largely open (Danchin, 2009), the term hardware in this article identifies any physical, tangible component (e.g., nucleic acids or metabolites) in a cell. In this first example of biocomputation, Adleman physically encoded an instance of the Hamiltonian path problem (a well-known mathematical problem in graph theory) in DNA strands, and solved it in vitro by using routine molecular biology methods. A bacterial computer (i.e., an in vivo computer) would solve an instance of the same problem 15 years later (Baumgardner et al., 2009). Around the turn of the century, Weiss et al. (2002) showed that synthetic regulatory networks could be conceptualized in vivo as a series of Boolean logic gates, the key device of cellular computers. This novel conceptual framework started a frantic wave of electronic-inspired bioengineering in synthetic biology. Additionally, these seminal works shifted the inspiration within the biocomputing community drastically, from mathematics and computer science to electronic engineering.
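For readers unfamiliar with the Hamiltonian path problem mentioned above, the sketch below states it in software: given a directed graph, find a path from a start vertex to an end vertex that visits every vertex exactly once. The brute-force search and the small example graph are illustrative assumptions; they are not a description of the DNA-based or bacterial protocols, although those also work by generating many candidate paths and filtering out the invalid ones.

```python
from itertools import permutations


def hamiltonian_path(n_vertices, edges, start, end):
    """Return a path from start to end that visits each of the vertices
    0..n_vertices-1 exactly once along directed edges, or None if no such
    path exists. Brute force: tries every ordering of the inner vertices.
    Assumes start != end."""
    edge_set = set(edges)
    inner = [v for v in range(n_vertices) if v not in (start, end)]
    for middle in permutations(inner):
        path = (start, *middle, end)
        if all((a, b) in edge_set for a, b in zip(path, path[1:])):
            return list(path)
    return None


# Hypothetical four-vertex directed graph.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 1), (1, 3), (2, 3)]

print(hamiltonian_path(4, EDGES, start=0, end=3))
```

The number of candidate orderings grows factorially with the number of vertices, which is what made the massive parallelism of molecules in a test tube attractive as computing hardware.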

# WHOLE-CELL BIOCOMPUTATIONS

Cells are able to process input information in many different and intricate ways. For the sake of clarity, in this article we propose to group the processing of information into two types of computing (i.e., genetic and metabolic) depending on the nature of the input and components thereof. To date, most of the biocomputing developments in synthetic biology dealt almost exclusively with genetic material and parts. This type of approach limits the scope of the potential synthetic biocomputations that can be executed, since a number of important resources are not being utilized. In a challenging paper entitled "It's the metabolism, stupid!," de Lorenzo (2015) suggested that "the interplay of DNA and metabolism is [. . . ] akin to that of politics and economy. Both realms drive their own autonomous agendas and obviously influence each other." In a similar fashion, the field of heterotic computing (Kendon et al., 2015) advocates the use of various types of computing that merge the strengths of individual types into more powerful, heterotic devices.

# Synthetic Biology as an Active Biocomputing Field

Boolean logic is central to the field of computing. Therefore, the design and implementation of Boolean logic functions in cells, typically encoded into genetic material (**Figure 1A**), is key to the development of synthetic biology approaches rooted in biocomputing (Amos and Goñi-Moreno, 2018). The engineering of a genetic toggle switch (Gardner et al., 2000) and an oscillator (Elowitz and Leibler, 2000) in Escherichia coli at the onset of the twenty-first century set the start of what is a very active field nowadays. Over the last (almost) 20 years, a number of circuits have been successfully engineered in living cells, such as the logic gates mentioned above (Wang et al., 2011), counters (Friedland et al., 2009), multiplexers (Moon et al., 2011), adders (Ausländer et al., 2012), and memories (Bonnet et al., 2013). Inspired by computer science, distributed computations have also been designed and built in multicellular systems by modifying cell-cell communication programmes (Goñi-Moreno et al., 2011, 2019; Regot et al., 2011). From solving relatively simple mathematical problems to computing intricate Boolean logic operations, biological systems have proved to be a powerful platform for tackling applications that are beyond the reach of traditional "silicon-based" computer technologies, such as diagnosis, bioproduction, and bioremediation.
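The memory-like behavior of a genetic toggle switch can be sketched with a commonly used dimensionless model of two mutually repressing genes, where each repressor is synthesized at a rate that decreases with the level of the other and is lost by first-order decay. The equations, the parameter values (`alpha`, `hill`) and the initial conditions below are illustrative assumptions and are not fitted to any published construct.

```python
def toggle_switch(u0, v0, alpha=10.0, hill=2.0, dt=0.01, t_end=50.0):
    """Euler integration of a mutual-repression model:

        du/dt = alpha / (1 + v**hill) - u
        dv/dt = alpha / (1 + u**hill) - v

    u and v are the (dimensionless) levels of the two repressors.
    Returns their values at t_end.
    """
    u, v = u0, v0
    for _ in range(int(t_end / dt)):
        du = alpha / (1.0 + v ** hill) - u
        dv = alpha / (1.0 + u ** hill) - v
        u, v = u + dt * du, v + dt * dv
    return u, v


# Two slightly different starting points end in opposite stable states.
state_a = toggle_switch(u0=2.0, v0=1.0)   # u ends high, v ends low
state_b = toggle_switch(u0=1.0, v0=2.0)   # u ends low, v ends high

print("start (2, 1) ->", tuple(round(x, 2) for x in state_a))
print("start (1, 2) ->", tuple(round(x, 2) for x in state_b))
```

Because the final state depends on which repressor had the initial advantage, the circuit stores one bit of information, which is why such bistable motifs are treated as memory elements in genetic logic.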

Synthetic biocomputing circuits are becoming increasingly complex and accurate, mostly due to continuous efforts in improving the genetic toolkit (Silva-Rocha et al., 2013; Martínez-García et al., 2014; Durante-Rodríguez et al., 2018), mathematical methods (Church et al., 2014; Goñi-Moreno and Amos, 2015) and design procedures (Goñi-Moreno and Amos, 2012; McLaughlin et al., 2018) for the so-called design-build-test-learn synthetic biology cycle (Goñi-Moreno et al., 2016). There are, nevertheless, major challenges on the genetic computing front (Goñi-Moreno, 2014; Manzoni et al., 2016), such as the urgent need for standardization of components, measurements, and information (Myers et al., 2017; Fabre and Sonnenschein, 2019). As long as synthetic biology claims to be a true engineering discipline, this standardization problem must be tackled without delay to enable bona fide modularity and predictability of genetic circuits (Vilanova et al., 2015). Altogether, the implementation of biocomputations using genetic material currently enjoys excellent scientific momentum.

# Metabolic Engineering as a Potential Biocomputing Field

While there is a phenomenal potential for development, the metabolic aspect of computation has not been explored to the same degree as computation implemented via genetic circuits (**Figure 1B**). This fact arises from a still-limited knowledge of the complexity of metabolic networks, even in the so-called "model" organisms (Benedetti et al., 2016; Calero and Nikel, 2019). Nielsen and Keasling (2016) have recently stressed the presence of metabolic networks with hard-wired, tightly regulated lines of communication in virtually all living cells, which are inherently difficult to manipulate but, as the very definition implies, offer a unique opportunity for engineering multi-level computations. In the same way that synthetic biology uses genetic parts and devices to build complex systems with pre-defined behaviors, metabolic networks are characterized by some (more or less conserved) principles that can be used for re-purposing biochemical nodes. The bowtie model of central metabolism indicates that the core biochemistry of the cell includes the biochemical transformations necessary for the synthesis of the 12 known essential biomass precursors (Noor et al., 2010). This architecture requires a high level of regulation, especially at the level of gene transcription (Kochanowski et al., 2017). It is precisely at this intersection between cellular processes that biocomputing could play a role in re-programming the metabolic machinery of cells.

# High-Performance Biocomputing

Natural cellular pathways are rarely based on genetic or metabolic activities alone. Thus, the concept of heterotic computing (i.e., the coordination between different types of computing) is intrinsic to biological systems. However, synthetic circuits do not often exploit the full computational power of the cellular machinery. Although the types of processes are very different, the cooperation between them could pave the way to a new generation of whole-cell circuits with enhanced abilities. This aspect is what we refer to as high-performance biocomputing.

Against this background, **Figure 1C** shows the flows of information in high-performance biocomputations. A first challenge would be to describe what in computing are called primitives, which are the simplest elements from which software programs are built. This will result in a set of well-characterized genetic and metabolic units (e.g., coding sequences and metabolic reactions) and motifs (e.g., oscillations and switches), including types of inputs and outputs for each computing end. Although current efforts are individually tackling this challenge on either the genetic (Nielsen et al., 2016) or the metabolic (Sánchez-Pascuala et al., 2017) front, there is still the issue that genetic and metabolic units must be plugged together to allow information flow in both directions. This connectivity will enable the direct modification of genetic motifs by the action of their metabolic counterparts (and vice versa). Depending on the specific process, and the type of information being computed, either of the two ends could return the desired output.

The increasing focus on the interplay between genetic and metabolic networks (Shlomi et al., 2007; Kumar et al., 2018) is resulting in a revolution of metabolic engineering driven by the core principles of synthetic biology. Not only are molecular tools actively being developed (Keasling, 2012; Nikel and de Lorenzo, 2018), but control strategies to engineer genetic circuits are also being increasingly exploited for the regulation of metabolism in a pre-defined fashion (Oyarzún and Stan, 2013; Chen and Liu, 2018; Moser et al., 2018). The foundations of high-performance biocomputing are therefore established and ready to benefit from the input of the computing community. Yet, a solid representation framework is needed to fully realize this purpose.

# A UNIFIED REPRESENTATION FRAMEWORK

Boolean logic is a way to abstract the underlying mechanistic details of a device into its high-level functional performance. By doing so, gene expression can be abstracted into ON/OFF states (i.e., either the gene is, or is not, being expressed under a given environmental condition) regardless of the particularities the gene of interest might have. Even in the case of radical analog fluctuations in gene expression, the ON/OFF abstraction still provides a useful conceptual framework (García-Betancur et al., 2017; Goñi-Moreno et al., 2017). However, when it comes to implementation, the Boolean abstraction needs to be complemented by a dynamical analysis of the components at stake. For example, the time-scales with which genetic and metabolic interactions occur can potentially be very different. Therefore, the dynamic analysis of individual reactions is as fundamental as the functional representation of the system as a whole.
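The interaction between the ON/OFF abstraction and time-scales can be made concrete with a minimal sketch. It assumes that a signal rises toward its maximum as a first-order process, x(t) = 1 - exp(-t/tau), and is read as ON once it crosses a threshold. The time constants used for the "metabolic" and "genetic" signals are hypothetical placeholders chosen only to differ by two orders of magnitude.

```python
import math


def time_to_switch_on(tau, threshold=0.5):
    """Time at which x(t) = 1 - exp(-t / tau) first reaches the threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return -tau * math.log(1.0 - threshold)


def boolean_state(t, tau, threshold=0.5):
    """Boolean (ON/OFF) reading of the analog signal at time t."""
    level = 1.0 - math.exp(-t / tau)
    return level >= threshold


# Hypothetical time constants (arbitrary time units).
TAU_METABOLIC = 1.0
TAU_GENETIC = 100.0

t_probe = 5.0
print("metabolic signal ON at t=5:", boolean_state(t_probe, TAU_METABOLIC))
print("genetic signal ON at t=5:  ", boolean_state(t_probe, TAU_GENETIC))
print("genetic switch-on time:    ", round(time_to_switch_on(TAU_GENETIC), 1))
```

At the probe time, the fast signal already reads ON while the slow one still reads OFF, so a gate combining both would transiently report an output that differs from its steady-state truth table. This is the kind of effect that a purely Boolean description leaves out.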

To illustrate this point, we discuss a case of merged genetic and metabolic circuitry integration in a platform bacterium. **Figure 2A** shows a logic-gate representation of a simple merged transcriptional and metabolic circuit in the soil bacterium Pseudomonas putida KT2440. This device merges state-of-the-art DNA regulatory circuitry (Nielsen et al., 2016) with dynamics that are far beyond DNA reach: the metabolic ability of the cells to catabolize glycerol (Nikel et al., 2015). In this way, the circuit output depends not only on the upstream computation of typical genetic inputs (generic inputs A and B in the diagram) but also on the metabolic dynamics of glycerol uptake. The link that enables the functioning of the circuit is the transcriptional repressor GlpR, which encodes information about the metabolic state of the cell (the action of GlpR on the cognate glp gene cluster is relieved by the metabolite glycerol-3-phosphate, G3P) and acts on a specific promoter. Note that virtually any other signaling molecule or transcription factor that feeds the final genetic AND logic gate can be inserted downstream of this promoter. Moreover, any regulatory step in the circuit can be connected back to, e.g., the key enzyme GlpK (essential for glycerol processing), thus providing feedback control from the genetic to the metabolic side of the device. As a result, the combinatorial genetic logic circuit is now linked to the physiological state of the cell concerning the dynamics of carbon source uptake, which can be both read and controlled.
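At the Boolean level of abstraction, the merged circuit described above can be sketched as a truth table. The sketch assumes that the final AND gate receives one signal from the upstream genetic computation on inputs A and B and one signal from the GlpR-controlled promoter, which is active only when G3P relieves repression. The exact wiring of Figure 2A is not reproduced here; the function names and the default upstream gate are hypothetical.

```python
from itertools import product


def glp_promoter_active(g3p_present):
    """GlpR represses its target promoter unless G3P relieves the repression."""
    glpr_repressing = not g3p_present
    return not glpr_repressing


def widget_output(a, b, g3p_present, upstream_gate=lambda a, b: a and b):
    """Final AND gate combining a genetic signal with a metabolic signal.

    upstream_gate is a placeholder for whatever genetic logic processes
    inputs A and B (an AND gate is assumed by default).
    """
    genetic_signal = upstream_gate(a, b)
    metabolic_signal = glp_promoter_active(g3p_present)
    return genetic_signal and metabolic_signal


truth_table = {
    (a, b, g3p): widget_output(a, b, g3p)
    for a, b, g3p in product([False, True], repeat=3)
}

for (a, b, g3p), out in truth_table.items():
    print(f"A={a!s:<5} B={b!s:<5} G3P={g3p!s:<5} -> output={out}")
```

With the default upstream gate, the output is ON only when both genetic inputs are present and the cell is processing glycerol. Swapping `upstream_gate` for another function (for example, an OR gate) changes the genetic side of the computation while the metabolic link through GlpR stays the same.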

Using a lower, more specific layer of representation, the Systems Biology Graphical Notation (SBGN) (Le Novère et al., 2009) and the Synthetic Biology Open Language (SBOL) (Galdzicki et al., 2014) were used to formalize the metabolic and genetic parts of the circuit, respectively (**Figure 2B**). This helps to identify the link, which in this case is formed by the interaction between a metabolite (G3P) and a transcriptional repressor (GlpR), thereby merging the metabolic and genetic layers of regulation in the bacterial cell (**Figure 2B**).

We recently coined the term metabolic widget to refer to such merged circuits (Chavarría et al., 2016). The metabolic machinery of the cell, often referred to as the context when focusing on genetic logic, offers powerful resources that can greatly improve current biocomputations. Far from trying to avoid the context, the framework proposed herein takes full advantage of it, which can lead to widgets that support more complex and accurate pre-defined information processing. The adoption of such a configuration will have a double impact by providing essential information about both the metabolic and genetic wiring of the cell while taking full advantage of these interactions for re-programming core cellular functions.

From a broader perspective, evolution has shaped intricate cellular processes that merge both genetic and metabolic networks; yet, human-defined biocomputations rarely make use of both computing types. We advocate for taking this path into account in order to access and exploit the high-performance biocomputing power intrinsic to natural systems, taking the design-build-test-learn cycle in entirely new directions.

#### AUTHOR CONTRIBUTIONS

Both authors have made a substantial, direct and intellectual contribution to the work, and approved it for publication.


#### FUNDING

This work was supported by the SynBio3D (UK-EPSRC-EP/R019002/1) project of the UK Engineering and Physical Sciences Research Council, and the BioRoboost (EU-H2020-BIOTEC-820699) Contract of the European Union to AG-M. This work was also supported by The Novo Nordisk Foundation (Grant NNF10CC1016517) and the Danish Council for Independent Research (SWEET, DFF-Research Project 8021-00039B) to PIN.

#### ACKNOWLEDGMENTS

The authors wish to thank Prof. Víctor de Lorenzo (CNB-CSIC, Madrid, Spain) for inspiring discussions.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Goñi-Moreno and Nikel. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Biological Parts for Kluyveromyces marxianus Synthetic Biology

Arun S. Rajkumar <sup>1</sup> , Javier A. Varela<sup>1</sup> , Hannes Juergens <sup>2</sup> , Jean-Marc G. Daran<sup>2</sup> and John P. Morrissey <sup>1</sup> \*

*<sup>1</sup> School of Microbiology, Centre for Synthetic Biology and Biotechnology, Environmental Research Institute, APC Microbiome Institute, University College Cork, Cork, Ireland, <sup>2</sup> Department of Biotechnology, Delft University of Technology, Delft, Netherlands*

*Kluyveromyces marxianus* is a non-conventional yeast whose physiology and metabolism lend themselves to diverse biotechnological applications. While the wild-type yeast is already in use for producing fragrances and fermented products, the lack of standardised tools for its genetic and metabolic engineering prevents it from being used as a next-generation cell factory for bio-based chemicals. In this paper, we bring together and characterise a set of native *K. marxianus* parts for the expression of multiple genes for metabolic engineering and synthetic biology. All parts are cloned and stored according to the MoClo/Yeast Tool Kit standard for quick sharing and rapid construction. Using available genomic and transcriptomic data, we have selected promoters and terminators to fine-tune constitutive and inducible gene expression. The collection includes a number of known centromeres and autonomously replicating sequences (ARS). We also provide a number of chromosomal integration sites selected for efficiency or visible phenotypes for rapid screening. Finally, we provide a single-plasmid CRISPR/Cas9 platform for genome engineering and facilitated gene targeting, and rationally create auxotrophic strains to expand the common range of selection markers available to *K. marxianus*. The curated and characterised tools we have provided in this kit will serve as a base to efficiently build next-generation cell factories from this alternative yeast. Plasmids containing all parts are available at Addgene for public distribution.

Keywords: Kluyveromyces, synthetic biology, metabolic engineering, genome engineering, yeast

# INTRODUCTION

Cell factories can serve as the basis of a new bio-based economy built on the sustainable production of fine chemicals, pharmaceuticals, nutraceuticals, and biofuels from engineered or native microbes. At present, the most commonly used eukaryotic cell factory is baker's yeast, Saccharomyces cerevisiae. Its ease of genetic manipulation, along with a wealth of genomic, genetic and biochemical knowledge, has made it possible to produce a diverse range of compounds with S. cerevisiae from simple or economical feedstocks. However, non-Saccharomyces yeasts can provide several advantages over S. cerevisiae for building cell factories, as they often possess desirable tolerances or metabolic traits that would otherwise need to be extensively engineered into baker's yeast (Wagner and Alper, 2016). In general, such yeasts have had niche applications in biotechnology but are being developed to be next-generation cell factories. Kluyveromyces marxianus is one such alternative yeast. While thermotolerant, fast-growing and able to use various carbon and nitrogen sources, it broadly has the same nutritional requirements and culture techniques as S. cerevisiae. Its natural strain diversity allows a wide range of phenotypes and gene variants to be exploited and combined to create an optimal cell factory chassis, and as a broadly Crabtree-negative species, it does not need to have its metabolism routed away from ethanol production. While it has applications in the production of aroma compounds, fermented foods, and secreted enzymes (Morrissey et al., 2015; Gombert et al., 2016), K. marxianus is yet to be established as a metabolic engineering platform. Obstacles to such development include inefficient and random native gene targeting (hindering the stable expression of integrated heterologous genes), limited knowledge of its biochemistry and genetics, and a lack of standardised regulatory parts and expression systems on the level of those available for baker's yeast. While such tools are starting to be developed, K. marxianus still lacks the well-defined sets of regulatory elements that are used to precisely control gene expression in S. cerevisiae. At present, individual parts are either selected from the native genome as orthologues of genes that supply common parts in S. cerevisiae (Lee et al., 2013), or taken from other yeasts altogether (Chang et al., 2013), forgoing the advantages of using K. marxianus' environmental triggers to fine-tune gene expression. Taken together, fewer than 20 native regulatory parts are currently in use for metabolic engineering (Bergkamp et al., 1993; Ball et al., 1999; Yang et al., 2015a; Gombert et al., 2016).

#### Edited by:

*Rodrigo Ledesma-Amaro, Imperial College London, United Kingdom*

#### Reviewed by:

*Dae-Hee Lee, Korea Research Institute of Bioscience and Biotechnology (KRIBB), South Korea; Hyun Ah Kang, Chung-Ang University, South Korea*

#### \*Correspondence:

*John P. Morrissey, j.morrissey@ucc.ie*

#### Specialty section:

*This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology*

Received: *25 November 2018* Accepted: *16 April 2019* Published: *07 May 2019*

#### Citation:

*Rajkumar AS, Varela JA, Juergens H, Daran JM and Morrissey JP (2019) Biological Parts for Kluyveromyces marxianus Synthetic Biology. Front. Bioeng. Biotechnol. 7:97. doi: 10.3389/fbioe.2019.00097*

Kluyveromyces marxianus-specific techniques exist for the efficient in vivo assembly of large multigene constructs (Chang et al., 2012), and in conjunction with CRISPR/Cas9 (Löbs et al., 2017; Nambu-Nishida et al., 2017; Cernak et al., 2018) can allow us to specifically edit a genome, or efficiently target chromosomal integrations. Nonetheless, in vivo assembly as it stands does not eliminate non-specific integrations of incomplete parts of the assembly. To sidestep this problem, the MoClo standard, based on Golden Gate assembly, allows the efficient hierarchical in vitro assembly of multigene constructs on either episomal or integrative vectors (Weber et al., 2011). It has been adapted for synthetic biology in diverse organisms, and is efficient enough to circumvent in vivo assembly. One variant of MoClo, the Yeast Toolkit (YTK), collects a number of well-characterised parts for S. cerevisiae (Lee et al., 2015). The YTK has 8 general classes of parts, defined by the 5′ and 3′ overhangs used for Golden Gate assembly, which allow directional cloning. Taken together, the parts of the original YTK and the system itself allow for the versatile construction of vectors for S. cerevisiae with several selection markers and integration sites of choice if needed.

The YTK's MoClo approach also sets up three tiers of plasmids for storage or use (**Figure 1B**). Level I plasmids correspond to part plasmids. Each functional part is flanked by a BsmBI site and by a BsaI site that, after digestion, generates overhangs specific to that part type. The BsmBI sites, in turn, are used to clone new parts into the entry vector YTK001 by Golden Gate assembly with that enzyme. At level II, level I plasmids are assembled together with BsaI to create gene expression cassettes or transcriptional units (TUs). Assembling TUs includes flanking them with synthetic and directional connector sequences, which allow the construction of level III plasmids. Here, multiple TUs are assembled together into a multigene expression or integrative vector, again with BsmBI, based on unique overhangs present in the connectors.
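The directional logic of such an assembly can be illustrated with a short sketch. It assumes that each part is described only by the left and right 4-bp overhangs exposed after type IIS digestion, and it checks that a set of parts chains into a single closed circle. The part names and overhang sequences are illustrative placeholders, not the sequences defined by the YTK standard, and `assembly_order` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Hypothetical parts: name -> (left overhang, right overhang).
# These overhangs are placeholders chosen for illustration only.
PARTS: Dict[str, Tuple[str, str]] = {
    "backbone":    ("CGAT", "AGGT"),
    "connector_L": ("AGGT", "CTTA"),
    "promoter":    ("CTTA", "GCAA"),
    "cds":         ("GCAA", "TGCC"),
    "terminator":  ("TGCC", "ACTG"),
    "connector_R": ("ACTG", "CGAT"),
}


def assembly_order(parts: Dict[str, Tuple[str, str]], start: str) -> List[str]:
    """Return the parts in ligation order, starting from `start`.

    Raises ValueError if the overhangs do not define one closed circle
    that incorporates every part exactly once.
    """
    by_left: Dict[str, str] = {}
    for name, (left, _right) in parts.items():
        if left in by_left:
            raise ValueError(f"left overhang {left} is used by two parts")
        by_left[left] = name

    order = [start]
    current = start
    while True:
        right = parts[current][1]
        nxt = by_left.get(right)
        if nxt is None:
            raise ValueError(f"no part can ligate to overhang {right}")
        if nxt == start:
            break  # the circle is closed
        if nxt in order:
            raise ValueError("overhangs form a loop that excludes the start part")
        order.append(nxt)
        current = nxt

    if len(order) != len(parts):
        raise ValueError("some parts are not incorporated into the circle")
    return order


print(assembly_order(PARTS, "backbone"))
```

Because each overhang is matched exactly once, a missing or duplicated part type is reported as an error instead of producing an ambiguous product, which mirrors why unique overhangs make the assembly directional.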

The YTK, either with its standard or simply by using its parts collection, has since been used in diverse metabolic engineering and synthetic biology applications (Feng et al., 2015; Awan et al., 2017; Denby et al., 2018), and has already been adopted when cloning parts for Pichia pastoris (Obst et al., 2017). In this paper we present a collection of K. marxianus-specific parts and vectors, cloned to the YTK standard, for the genetic and metabolic engineering of this yeast. Over 30 constitutive and inducible promoters, terminators, centromeres and autonomously replicating sequence (ARS) elements have been selected from existing gene expression data and characterised as part of this collection. We have also identified a number of integration sites for single- and multi-gene constructs and have tested the best means to eliminate random integration. Taken together with existing YTK parts, our parts collection is a valuable set of tools for researchers working with K. marxianus as a cell factory.

# MATERIALS AND METHODS

# Strains and Cultivation

Kluyveromyces marxianus strain CBS6556 (Westerdijk Fungal Biodiversity Institute, Utrecht, The Netherlands) was used for parts, unless specified otherwise. For part characterization and mutant creation, we used NBRC1777 [Biological Resource Centre, NITE (NBRC), Tokyo, Japan]. All strains used in this study are listed in **Table 1**. Yeast cultures were routinely grown in YPD broth (1% yeast extract, 2% peptone, 2% glucose) at 30 °C. For promoter and terminator characterization, yeast was grown in synthetic drop-out medium without uracil [SD-ura; 0.17% yeast nitrogen base without amino acids or ammonium sulphate (Formedium, Hunstanton, UK), 0.5% ammonium sulphate and 0.19% SC-ura (Formedium)] containing 2% glucose or an alternate carbon source. For transformation of auxotrophic strains, yeast was grown in synthetic drop-out (SD) medium containing 2% glucose and lacking the appropriate nutrients (Formedium). For fermentation experiments, strains were cultured in minimal medium (MM) with 2% glucose (Fonseca et al., 2007). When needed, G418 or hygromycin (Fisher Scientific, Dublin, Ireland) was added to a concentration of 200 mg L<sup>−1</sup> for selection or 150 mg L<sup>−1</sup> for maintenance. Bacterial transformations used E. coli DH5α grown in LB medium (1% NaCl, 1% peptone, 0.5% yeast extract) supplemented with the appropriate antibiotics (100 mg L<sup>−1</sup> ampicillin, 50 mg L<sup>−1</sup> chloramphenicol or 50 mg L<sup>−1</sup> kanamycin).

# Selection and Amplification of Parts

FIGURE 1 | … Yeast Toolkit (YTK) standard to express any gene of interest (GoI) under various conditions and expression platforms. Several constitutive and inducible promoters allow precise expression of a GoI under different conditions specific to *K. marxianus*, and terminator choice further fine-tunes gene expression. A number of metabolic and antibiotic markers can be used in wild-type yeast, or in auxotrophs generated by the CRISPR/Cas9 system provided. Finally, a number of species-specific origins and integration homology arms allow the expression of the GoI on a stable plasmid or as an integration cassette. (B) Hierarchy of YTK assemblies. Alternating the type IIS enzymes between assemblies allows construction of different plasmid "levels." Starting from amplified PCR products, parts are cloned into level I plasmids for storage. At the time of cloning, they are given overhangs corresponding to the numbered parts used to build expression systems. From there, individual transcriptional units are built from level I plasmids either for use or for storage; these are level II plasmids. Finally, multiple TU-bearing level II plasmids can be combined to create multi-TU level III plasmids that are either episomal or integrative vectors. The use of different bacterial markers at each level allows us to use the previous level's plasmids directly for assembly.

Native promoters, terminators, and replicating sequences were amplified from the genomic DNA of K. marxianus CBS6556 (Jeong et al., 2012) unless specified otherwise. The sequences in question were identified from the genome sequence data (assembly GCA_000299195.2) following in-house gene prediction (Varela et al., 2017). In general, a promoter was defined as the first 1,000 bases upstream of a gene's start codon, unless this overlapped with a neighbouring gene or other features. Terminators were similarly defined as the first 250 bp downstream of a gene's stop codon (Curran et al., 2013). Parts were amplified from genomic DNA using Q5 High-Fidelity Polymerase (New England Biolabs, Ipswich, UK), purified using a GeneJet PCR clean-up kit (Fisher Scientific) and eluted in sterile water prior to use. Yeast ARS elements and centromeres were selected from the literature, and amplified either from genomic DNA or from an appropriate plasmid. All assembled part plasmids with their Addgene plasmid IDs are listed in **Table S1**, and primers to amplify the parts are listed in **Table S2**.
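The promoter and terminator definitions given here can be expressed as coordinate arithmetic. The sketch below is a minimal illustration, assuming 0-based, half-open coordinates on the forward strand and assuming that a promoter overlapping a neighbouring feature is simply truncated at that feature's edge (the text does not state how overlaps were handled). The function names and example coordinates are hypothetical.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end) on the forward strand


def promoter_interval(gene_start: int, gene_end: int, strand: str,
                      neighbour_edge: Optional[int] = None,
                      length: int = 1000) -> Interval:
    """Region of up to `length` bases upstream of the start codon.

    `neighbour_edge` is the boundary of the nearest upstream feature;
    truncating there is an assumption made for this sketch.
    """
    if strand == "+":
        lo = max(0, gene_start - length)
        if neighbour_edge is not None:
            lo = max(lo, neighbour_edge)
        return (lo, gene_start)
    if strand == "-":
        # On the reverse strand, "upstream" lies at higher coordinates.
        hi = gene_end + length
        if neighbour_edge is not None:
            hi = min(hi, neighbour_edge)
        return (gene_end, hi)
    raise ValueError("strand must be '+' or '-'")


def terminator_interval(gene_start: int, gene_end: int, strand: str,
                        length: int = 250) -> Interval:
    """Region of `length` bases downstream of the stop codon."""
    if strand == "+":
        return (gene_end, gene_end + length)
    if strand == "-":
        return (max(0, gene_start - length), gene_start)
    raise ValueError("strand must be '+' or '-'")


# Hypothetical gene on the forward strand spanning [5000, 6500).
print(promoter_interval(5000, 6500, "+"))
print(promoter_interval(5000, 6500, "+", neighbour_edge=4400))
print(terminator_interval(5000, 6500, "+"))
```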

#### Golden Gate Assembly

Golden Gate assemblies were carried out as recommended (Lee et al., 2015) with minor modifications. In a typical reaction, 40 fmol of each insert (either as an existing level I plasmid or PCR product) were combined with 20 fmol of the plasmid containing the backbone for the final product, along with 1 µL T4 DNA ligase, 0.5 µL each of T7 DNA ligase (3,000 U µL<sup>−1</sup>) and BsmBI or BsaI (10 U µL<sup>−1</sup>, NEB), and water to a total volume of 10 µL.
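Because these amounts are given in moles, they must be converted to mass for each DNA molecule before pipetting. The sketch below shows the conversion, assuming an average mass of 650 g mol<sup>−1</sup> per base pair of double-stranded DNA, which is a common approximation and not a value stated in this paper. The function names and the example lengths are hypothetical.

```python
# Average molar mass of one base pair of double-stranded DNA (g/mol).
# 650 is a widely used approximation and an assumption of this sketch.
AVG_BP_MASS = 650.0


def fmol_to_ng(fmol: float, length_bp: int) -> float:
    """Mass in ng corresponding to `fmol` of a dsDNA molecule of `length_bp`."""
    return fmol * length_bp * AVG_BP_MASS / 1e6


def ng_to_fmol(ng: float, length_bp: int) -> float:
    """Amount in fmol corresponding to `ng` of a dsDNA molecule of `length_bp`."""
    return ng * 1e6 / (length_bp * AVG_BP_MASS)


# Example with hypothetical lengths: a 1,000 bp insert and a 5,000 bp backbone.
print(round(fmol_to_ng(40, 1000), 1), "ng of insert for 40 fmol")
print(round(fmol_to_ng(20, 5000), 1), "ng of backbone for 20 fmol")
```

Under this approximation, 40 fmol of a 1 kb insert is about 26 ng and 20 fmol of a 5 kb backbone is about 65 ng.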

For Golden Gate assembly the protocol was as follows: 25 cycles of digestion and ligation (2 min at 42 °C followed by 5 min at 16 °C), followed by 10 min digestion at 60 °C and 10 min inactivation at 80 °C. For assemblies involving BsaI, the digestion step was carried out at 37 °C for 3 min. When plasmids contained type IIS restriction sites that could not be removed, the final digestion and heat inactivation steps were omitted.
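The cycling conditions can be written out as an explicit list of steps, which makes the total run time easy to check. The sketch below encodes only what is stated above; it assumes that for BsaI only the cycled digestion step changes and that the final 60 °C and 80 °C steps are otherwise kept. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Step = Tuple[int, int]  # (temperature in °C, duration in minutes)

# Cycled digestion step for each enzyme, as given in the text.
DIGESTION: Dict[str, Step] = {"BsmBI": (42, 2), "BsaI": (37, 3)}


def golden_gate_program(enzyme: str, final_steps: bool = True) -> List[Step]:
    """Build the thermocycler program as an ordered list of steps.

    Set `final_steps=False` when the product keeps type IIS sites that
    could not be removed, so the final digestion and inactivation are skipped.
    """
    if enzyme not in DIGESTION:
        raise ValueError(f"unknown enzyme: {enzyme}")
    program: List[Step] = []
    for _ in range(25):
        program.append(DIGESTION[enzyme])  # digestion
        program.append((16, 5))            # ligation
    if final_steps:
        program.append((60, 10))           # final digestion
        program.append((80, 10))           # heat inactivation
    return program


def total_minutes(program: List[Step]) -> int:
    """Sum of step durations, ignoring ramp times."""
    return sum(minutes for _temp, minutes in program)


for name in ("BsmBI", "BsaI"):
    print(name, total_minutes(golden_gate_program(name)), "min")
```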

Typically, half of a Golden Gate assembly was used to transform chemically competent E. coli DH5α. Transformants were screened by colony PCR with OneTaq Quick-Load DNA polymerase (NEB), using primers in the backbone vector flanking the insert if size permitted, or with one primer binding in the backbone and one in the insert. In the latter case, this was usually a primer used to amplify one of the component parts. For level I plasmids, colonies with the correct PCR product size had their plasmids extracted and inserts sequenced. For level II and III plasmids, PCR-positive colonies had their plasmids extracted and digested with NotI to verify the insert size. All primers used to genotype strains are listed in **Table S3**, and maps of all the plasmids in **Table S1** are included as **Supplementary Material**.

#### Construction of Reporter Plasmids

20 fmol of a suitable "vector" level I plasmid from the Yeast Toolkit [pYTK083; (Lee et al., 2015)] were combined with 40 fmol each of the following "insert" level I plasmids: left and right connectors from the Yeast Toolkit (pYTK002 and pYTK067), mVenus (pYTK034), a kanMX expression marker for G418 resistance, and the appropriate K. marxianus promoters (P1-19), terminators (T1-5), and centromere (typically C1; **Table 2**). A Golden Gate assembly for a level II plasmid was then carried out as described above. In total, 8 parts were assembled and transformed into E. coli per plasmid. Assembly of the construct was verified by colony PCR with ASR\_K001F and ASR\_K005R as primers and subsequent restriction mapping of plasmids from PCR-positive colonies with NotI. All original YTK plasmids used for assemblies are listed in **Table S4**.

#### Construction of Integrative Reporter Constructs

For the construction of integrative reporter vectors to evaluate insertion sites, a similar 8-part Golden Gate assembly was carried out as above, except that pYTK002 was replaced with plasmids containing 850–900 bp left homology arms targeting an insertion site (I1L-I5L), and C1 was replaced with plasmids containing 850–900 bp right homology arms targeting the same insertion site (I1R-I5R). To evaluate integration sites in the experiments carried out here, mVenus was always under the control of the PDC1 promoter (P2) and the INU1 terminator (T1). Following assembly and transformation, transformants were screened by colony PCR with primers ASR\_K001F and ASR\_P002R, followed by NotI restriction digestion as above.

#### Promoter and Terminator Characterization

We constructed mVenus (YFP) reporter plasmids to characterise promoter and terminator strength. To minimise variations in expression due to copy number, we used centromeric plasmids with a kanMX cassette. When evaluating promoter strengths alone, the reporter plasmids shared the inulinase terminator (INU1t). In a similar manner, we used the histone B promoter (HHF1pr) in common to regulate YFP expression when evaluating terminator strengths. Three hundred nanograms of each reporter plasmid were transformed into K. marxianus by the LiOAc/PEG method (Gietz, 2014). After 48 h growth on selective medium, three transformant colonies were inoculated into 2 mL YPD with G418 and grown overnight at 30 °C with 200 rpm agitation. The following day, the cultures were diluted 100-fold into 2 mL SD medium with 150 mg L<sup>−1</sup> G418 (approximately corresponding to a starting optical density of 0.1) and grown at 30 °C for 24 h with 200 rpm shaking. For promoter inductions at high temperatures or with xylose, overnight cultures were typically inoculated to a starting OD of 0.2–0.3 so that a comparable cell number would be present after 24 h growth. The cultures were then diluted 5- to 20-fold in identical SD medium on a 96-well microtitre plate, and YFP fluorescence was measured on a Sirius HT platereader (MWG/BioTek, Winooski, USA) with excitation and emission set to 485 and 525 nm, respectively (bandwidth 20 nm). After correcting for the autofluorescence of wild-type K. marxianus, the signal was normalised to cell number by dividing by the OD at 600 nm. Differences between normalised YFP values under different conditions were tested for statistical significance by a paired t-test, with p < 0.05 taken to be significant.
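The normalisation and comparison described in this paragraph can be sketched as follows. The sketch assumes that autofluorescence is subtracted as a raw wild-type reading before dividing by OD<sub>600</sub>, which is one reading of the procedure and not a detail confirmed by the text. All function names and numbers are hypothetical, and only the paired t statistic is computed; obtaining a p-value needs the t distribution (for example `scipy.stats.ttest_rel`).

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def normalised_yfp(fluorescence: float, od600: float,
                   wt_fluorescence: float) -> float:
    """Autofluorescence-corrected YFP signal per unit OD600."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return (fluorescence - wt_fluorescence) / od600


def fold_induction(induced: Sequence[float], uninduced: Sequence[float]) -> float:
    """Ratio of mean normalised signals under inducing and baseline conditions."""
    return mean(induced) / mean(uninduced)


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """t statistic of a paired t-test on matched replicates (n - 1 degrees of freedom).

    Raises ZeroDivisionError if all paired differences are identical.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical readings for three replicate cultures.
induced = [normalised_yfp(f, od, 200.0) for f, od in [(1400, 0.5), (1440, 0.5), (1360, 0.5)]]
baseline = [normalised_yfp(f, od, 200.0) for f, od in [(400, 0.5), (390, 0.5), (410, 0.5)]]
print(fold_induction(induced, baseline), paired_t_statistic(induced, baseline))
```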

For the characterization of integration sites, 2 µg of integrative plasmid containing a YFP expression cassette and kanMX marker was digested with SgsI/AscI (Fisher Scientific) and transformed into yeast. The amount corresponds to approximately 400 fmol of insertion cassette. G418-resistant colonies were screened for correct insertion at the intended locus by colony PCR and these alone were selected for YFP measurements. When the LAC4 locus was targeted, transformation plates were replica-plated onto YPGal (2% galactose, 2% peptone, 1% yeast extract) containing 200 mg L<sup>−1</sup> G418 and 40 mg L<sup>−1</sup> X-Gal (Melford Laboratories, Ipswich, UK). Cells with a disrupted LAC4 cannot metabolise X-Gal and therefore do not produce a blue dye; this was used to pre-screen transformant colonies before genotyping them.

#### Genome Editing Using CRISPR/Cas9

The cross-yeast CRISPR/Cas9 plasmid pUDP002 (Juergens et al., 2018) was modified to pUCC001 to allow easy cloning of new guide RNA (gRNA) targets by Golden Gate assembly. The original gRNA expression cassette of pUDP002 consisted of a target gRNA and structural element flanked by self-cleaving hammerhead and HDV ribozymes at the 5′ and 3′ ends respectively (Ng and Dean, 2017). This was modified to contain a BsaI cloning site between the hammerhead ribozyme and gRNA structural element (Vyas et al., 2015). The new cassette was assembled from long oligonucleotides (Integrated DNA Technologies, USA) by annealing them in a thermocycler.

#### TABLE 2 | List of parts provided in the collection.


*<sup>a</sup>Based on the CBS6556 genome annotations generated in Varela et al. (2017); <sup>b</sup>Based on the NBRC annotations in (Inokuma et al., 2015); <sup>c</sup>Based on the genome sequence assembled in Ortiz-Merino et al. (2018).*

A plasmid backbone was then amplified from the original pUDP002 using primers pUDP002-F and pUDP002-R, and Gibson assembly was used to create pUCC001 from the two parts.

When using CRISPR/Cas9 to edit a site in the genome, a gRNA sequence targeting the gene of interest is first predicted using the sgRNA software (Xie et al., 2014). Complementary oligonucleotides comprising the gRNA sequence are designed with 5′ and 3′ overhangs (5′-CGTC-3′ and 5′-AAAC-3′) on the sense and antisense oligonucleotides respectively, creating sticky ends to be ligated into pUCC001 cut with BsaI. One hundred picomoles of oligos are phosphorylated with 1 µL of T4 polynucleotide kinase (10 U µL<sup>−1</sup>, NEB) in a total volume of 10 µL and then denatured and annealed. Fifty femtomoles of the annealed gRNA insert are then used with 100 ng of pUCC001 in a Golden Gate reaction with BsaI. Following transformation, the correct insertion of the gRNA is verified by PCR using the primer Bsa-R and the sense oligo containing the gRNA target, and by sequencing of the plasmid. gRNA targets used in this study are listed in **Table S5**.
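The oligonucleotide design step can be sketched in a few lines. The sketch assumes that each 4-nt overhang is added to the 5′ end of its oligo (CGTC on the sense oligo, AAAC on the antisense oligo), so that the annealed duplex carries two 5′ sticky ends; this orientation is an interpretation of the description above. The function names and the target sequence are hypothetical.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper- or lower-case DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def grna_oligos(target: str,
                sense_overhang: str = "CGTC",
                antisense_overhang: str = "AAAC") -> Tuple[str, str]:
    """Return (sense, antisense) oligos, both written 5' to 3'.

    `target` is the gRNA target sequence without the PAM.
    """
    target = target.upper()
    if not target or set(target) - set("ACGT"):
        raise ValueError("target must contain only A, C, G and T")
    sense = sense_overhang + target
    antisense = antisense_overhang + reverse_complement(target)
    return sense, antisense


# Hypothetical 20-nt target, used only to show the layout of the oligos.
sense, antisense = grna_oligos("ACGTACGTACGTACGTACGG")
print("sense     5'-" + sense + "-3'")
print("antisense 5'-" + antisense + "-3'")
```

After annealing, the target portions of the two oligos base-pair and the two 4-nt extensions remain single-stranded, ready for ligation into the BsaI-cut vector.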

Three hundred nanograms of a gRNA expression plasmid are used in a typical genome engineering experiment. As K. marxianus predominantly repairs double-strand breaks by non-homologous end-joining (Daley et al., 2005), mutations are created around the gRNA target site without providing a repair fragment (Cernak et al., 2018). After 48–72 h growth following a transformation, hygromycin-resistant colonies are screened for mutations at the targeted locus by colony PCR and sequencing (Jakociunas et al., 2015). If the intended mutations create an observable phenotype (e.g., an auxotrophy), the transformation plate is replica-plated to appropriate medium to pre-screen colonies based on the phenotype. Colonies with frameshift mutations are then grown overnight in YPD and passaged twice to fresh YPD cultures to ensure loss of the gRNA plasmid before being preserved. When multiple genes were to be edited, the mutations were introduced sequentially; following each mutation, the gRNA plasmid was cured and the mutation confirmed before proceeding with transformation of the next gRNA expression plasmid.

#### RESULTS

#### A Collection of Parts and Assembly Pipeline for Kluyveromyces marxianus Using the Yeast Toolkit Standard

The biological parts collected and characterized here are intended to provide the same functions for K. marxianus in all the relevant categories of YTK parts (**Figure 1A**). A number of the parts have been described and identified elsewhere (Bergkamp et al., 1993; Yang et al., 2015a), but we also provide a more substantial number of characterised promoters, terminators and integration sites for expression cassettes. A number of high- and low-copy number origins have also been identified from K. marxianus genomes, or created from minimal elements. We have included three centromeric elements, one minimal (Yarimizu et al., 2013) and two genomic (Iborra and Ball, 1994; Ball et al., 1999), and a minimal ARS element (Cernak et al., 2018) for the construction of expression vectors (**Table 2**). While the parts can be used to construct K. marxianus expression systems in general, they are optimally compatible with selection markers, bacterial vectors and synthetic connectors in the original YTK available from Addgene (#1000000061). They allow the in vitro construction of cloning and expression systems for K. marxianus with the same flexibility that is available for S. cerevisiae. Golden Gate assembly is an established in vitro assembly technique, and with it we were able to assemble up to 8 part-containing plasmids into reporter plasmids to characterize our parts.

# A Set of Native Promoters to Fine-Tune Gene Expression

When selecting native yeast promoters to use in strain engineering, an important source is gene expression studies under conditions of interest. While a number of gene expression studies have been carried out on K. marxianus strains, few of them have focused on strains with the potential to be synthetic biology chassis, or they have instead focused on explicitly industrial conditions (Gao et al., 2015; Schabort et al., 2016; Diniz et al., 2017). To select promoters, we turned to two studies based on the strain DMKU3-1042, which also has the best-quality publicly available genome. One was TSS-seq transcriptome data published alongside its genome (Lertwattanasakul et al., 2015); the other was a small-scale gene expression study using fluorescent reporters of genes involved in carbon metabolism (Suzuki et al., 2015). The former gave us the opportunity to select not only promoters with distinct strengths, but also those induced by high temperature and xylose. The corresponding promoters were then identified from the CBS6556 genome. We excluded promoters with internal BsaI or BsmBI sites, as we lacked information on regulatory elements to justify removing the sites by mutagenesis.
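The exclusion of candidates with internal type IIS sites amounts to a simple sequence screen. The sketch below searches both strands for the BsaI (GGTCTC) and BsmBI (CGTCTC) recognition sequences; the function names and test sequences are hypothetical and are not taken from the parts collection.

```python
from typing import Dict, List

# Recognition sequences of the two type IIS enzymes used in the assemblies.
SITES: Dict[str, str] = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_all(seq: str, motif: str) -> List[int]:
    """0-based start positions of every (possibly overlapping) match."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def internal_type_iis_sites(seq: str) -> Dict[str, List[int]]:
    """Map each enzyme to the positions of its sites on either strand.

    An empty dictionary means the sequence is free of BsaI and BsmBI sites.
    """
    seq = seq.upper()
    hits: Dict[str, List[int]] = {}
    for enzyme, site in SITES.items():
        found = set(find_all(seq, site)) | set(find_all(seq, reverse_complement(site)))
        if found:
            hits[enzyme] = sorted(found)
    return hits


print(internal_type_iis_sites("AAAGGTCTCAAATTTGAGACGTTT"))
```

A candidate promoter or terminator would be retained only when the returned dictionary is empty.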

This analysis allowed us to select 19 promoters: 10 constitutive and 9 inducible by heat, xylose, lactose and inulin, whose strengths we characterised using YFP reporter assays (**Table 2**, **Figure 2A**; **Figure S1**). While the promoter sequences came from CBS6556, we tested them in strain NBRC1777, due to its faster growth rate and superior thermotolerance compared to the former. Under standard conditions (30 °C, glucose-rich medium), we provide a broad selection of promoter strengths for gene expression. From the weakest (REV1pr) to the strongest (PDC1pr) promoters, we can achieve a 40-fold range of gene expression in NBRC1777. The strongest promoters were those of genes involved in central carbon metabolism (PDC1pr, FBA1pr, TDH3pr), as well as that of the orthologue of the translation elongation factor EF-1α (TEF1) in S. cerevisiae. Interestingly, several of the same orthologous genes in S. cerevisiae also have strong promoters. However, the latter do not always achieve strong expression when used in K. marxianus (**Figure S2**), a trade-off against the advantages of using an orthogonal yeast promoter.

Kluyveromyces marxianus's thermotolerance provides a unique induction signal for this yeast's promoters. With this in mind, we picked three heat-inducible promoters (HSP104pr, SSA2pr, TSA1pr) with different fold-induction at high temperatures, and one intended to be stable at ambient and high temperatures (HSP150pr), based on expression data for the strain DMKU3-1042 as earlier (Lertwattanasakul et al., 2015). All of our inducible promoters selected from expression data exhibited induction under the relevant conditions, though not all to the same extent (**Table S6**). When tested at 37 and 42 °C, the TSA1pr, SSA2pr, and HSP104pr promoters were induced to different levels. Expression of YFP by HSP150pr, as predicted, remained relatively stable at higher temperatures, being weakly induced at 37 °C but no further at 42 °C (**Figure 2B**). The other three promoters offer a range of induction varying from 2- to 6.5-fold depending on the temperature. TSA1pr has the strongest fold-induction at 42 °C and the highest measured fluorescence at high temperature, whereas expression from SSA2pr rises from medium to high levels (relative to the constitutive promoters) as temperature increases (**Figure 2B**). In comparison, two promoters that are strong at 30 °C, PDC1pr and TEF1pr, had their YFP expression relatively unchanged by increased temperature.

Another feature of K. marxianus's physiology that makes it an attractive cell factory chassis is its ability to utilize a wider range of carbon sources than S. cerevisiae, thus circumventing the need to engineer this yeast to consume the carbon sources in question. Wild-type K. marxianus can consume sugars found in plant (xylose and inulin) and dairy (lactose) waste, thus allowing reduced costs if these are used as feedstocks for fermentations. For xylose induction, we cloned and tested three xylose-inducible promoters: XYL1pr, XYL2pr, and ALD4pr. The xylose reductase XYL1 reduces xylose to xylitol, which is in turn converted to xylulose by the xylitol dehydrogenase XYL2. XKS1 then phosphorylates xylulose to xylulose-5-phosphate, which can then enter the non-oxidative branch of the pentose phosphate pathway. ALD4pr was chosen from a genome-wide expression study for its low background in glucose and high fold-induction by xylose (Lertwattanasakul et al., 2015). These promoters exhibited a 6- to 10-fold increase in YFP using xylose as a carbon source relative to glucose, with XYL2pr having the strongest fold-induction (**Figure 2C**). For a lactose-inducible promoter, we chose the promoter for the beta-galactosidase LAC4. Unlike the other promoters in this set, we chose LAC4pr from strain CBS397, known to grow well on lactose as a carbon source (Varela et al., 2017). It exhibited a 10-fold induction by lactose relative to glucose alone when used to express YFP in NBRC1777. In comparison, the GAL1 promoter from S. cerevisiae was inducible by both galactose and lactose, but more strongly by the former (15-fold vs. 6-fold, **Figure 2C**).

# Terminators Provide a Second Level of Control Over Gene Expression

Besides promoters, terminators can also control gene expression by affecting the lifetime of mRNA, and provide us with an extra means to fine-tune gene expression (Curran et al., 2013). While the terminators included with the original YTK aimed to keep gene expression output roughly constant (Lee et al., 2015), we have provided five native terminators to broaden the range of gene expression our parts can achieve (**Figure 3A**). Interestingly, two terminators from the YTK, those for ScADH1 and ScPGK1, can change gene expression between them (as measured by YFP fluorescence) by nearly a factor of two in K. marxianus. The full ability of terminators to further optimise gene expression can be seen when a set of four promoters and three terminators are used combinatorially to express YFP. While the promoter of choice is still the dominant factor in determining the level of gene expression, choosing a "weak" or "strong" terminator can significantly affect expression as well (TSA1pr vs. TDH1pr, **Figure 3B**). In the case of inducible promoters, terminator effects are more pronounced under non-inducing conditions. Nonetheless, the increase in basal expression seen for the inulinase promoter after a change to a "stronger" terminator is enough to halve fold-induction by inulin (**Figure 3C**). In summary, depending on the promoter and the means of its induction, the terminators we provide could be used to minimise background, maintain a level of basal expression, or smooth out changes in expression between different conditions when expressing a heterologous gene.

# Efficient CRISPR/Cas9 Editing With pUCC001

Our CRISPR/Cas9 editing plasmid pUCC001 takes advantage of the cross-species pUDP002 system and makes it a more flexible and economical tool by introducing a BsaI cloning site for new gRNA targets (**Figure 4A**). In this way, annealed oligos containing overhangs matching those generated by the BsaI sites in pUCC001 can be easily cloned into a cut plasmid, or by Golden Gate assembly. As a proof of principle, we transformed K. marxianus with pUCC001 containing a gRNA targeting the LAC4 locus. lac4 mutants are unable to convert X-Gal to a blue dye. We observed that over 50% of the hygromycin-resistant colonies did not turn blue when grown on medium containing galactose and X-Gal, as opposed to ∼10–20% when a deletion cassette is used without CRISPR/Cas9 (**Figure 4B**, **Table 3**). As a demonstration of more practical applications, we used pUCC001 to rapidly generate defined single, double and triple auxotrophs for uracil, leucine, and histidine (**Figures 4C,D**; **Table 1**). Using gRNA plasmids targeting the K. marxianus orthologues of URA3, HIS3, and LEU2, we created frameshift mutations in these genes leading to loss of function. Given that no repair fragment was used, the efficiency of mutations was good (≥50%); in general, we were able to retrieve the frameshift mutants in the figure by screening fewer than 8 colonies per transformation. The auxotrophic mutants so created allow us to use the orthogonal metabolic markers from the original YTK (ScURA3, ScLEU2, and ScHIS3), and show insignificant background when used in transformations with metabolic markers. Separately, the pUCC001 system was used to construct mutants in several other K. marxianus genes (Varela et al., 2019).

# Inactivating Different Genes Involved in Non-homologous End-Joining Has Different Effects on Gene Targeting Efficiency

When metabolically engineering strains, it is advantageous to integrate heterologous gene cassettes for stronger and more stable expression. In yeasts other than S. cerevisiae, the ability to efficiently target gene integration is hindered by their usage of non-homologous end-joining (NHEJ) as the dominant DNA repair mechanism (Daley et al., 2005). As a result, random and incomplete integrations are frequent, and much larger targeting homology sequences are also required (up to 1 kb) compared to S. cerevisiae (50 bp) for a successful integration (Baudin et al., 1993; Choo et al., 2014). While random integrations of multiple copies of a gene can be advantageous in some biotechnology applications (Lin et al., 2017), it is equally important to integrate single copies of heterologous genes or pathways to rationally construct a cell factory or evaluate different metabolic engineering strategies. Inactivating any of the key genes involved in NHEJ, namely YKU70/80, NEJ1 or DNL4 (Abdel-Banat et al., 2010; Choo et al., 2014; Nambu-Nishida et al., 2017), has been shown to increase targeted integration in K. marxianus by forcing it to use homologous recombination (HR) alone for DNA repair. It remains unclear whether the different mutations suppress random integration to the same extent. To decide which NHEJ mutant gave us the highest rate of targeted integration, we tested the integration of a YFP expression cassette at the LAC4 locus in backgrounds that were either wild-type or had YKU80, DNL4, or NEJ1 previously inactivated by CRISPR/Cas9 (YBL001, KmASR.005, and YBL003, respectively; **Table 1**). The cassette was flanked by 880 bp on either end targeting LAC4 (**Figure 5A**). As with inactivation by CRISPR/Cas9, correct integration would disrupt the gene and render the yeast unable to break down X-Gal, allowing us to pre-screen colonies by blue/white selection for genotyping (**Figure 4B**). The YFP cassette would also allow us to determine in a semi-quantitative manner whether transformants contained multiple integrants. Sequencing around the insertion site revealed that integration of the cassette was "seamless" in all backgrounds; the sequenced region immediately downstream of the insertion sites showed no mutations surrounding the insertion point (**Figure 5A**).

Our experiments revealed that inactivating NHEJ resulted in disruption of the LAC4 locus in nearly all transformants, as opposed to only ∼10% of transformants in wild-type K. marxianus. Furthermore, all white colonies of the NHEJ-deficient strains had the YFP cassette correctly integrated, as opposed to <20% for the wild-type strain (**Table 3**). In spite of correct integration, wild-type transformants still had a wide spread of measured fluorescent intensities (**Figure 5B**). This led us to hypothesise that correct integration in a wild-type strain does not preclude random integration. The contrasting low variation in YFP fluorescence seen in DNL4 and YKU80 mutants suggests that these backgrounds incorporate a single copy of the YFP cassette. On the other hand, NEJ1 mutants seemed to incorporate multiple copies of YFP. In the case of the latter, it is admittedly unclear if these extra copies are integrated sequentially at LAC4, or elsewhere in the genome. In summary, our data indicate that DNL4 and YKU80 are the best NHEJ components to inactivate to ensure targeted, single chromosomal integrations of DNA. Extrapolating from research in S. cerevisiae, our findings are given weight by the different roles in NHEJ of the proteins we targeted. YKU80, as part of the Ku complex, is the first protein to bind to and stabilise double-stranded breaks, followed by DNL4 (Emerson and Bertuch, 2016). Therefore, the role of these "first responders" in DNA repair might make them better targets to thoroughly inactivate NHEJ. Furthermore, as other research has found that the binding of DNL4 to DNA may not require NEJ1 (Wu et al., 2008), and that the latter enhances but is not entirely essential for NHEJ (Yang et al., 2015b), this may explain why the inactivation of NEJ1 in YBL003 was insufficient to ensure single-copy gene integration. In general, expression from a single integrated cassette was higher than when the same cassette was expressed on a low-copy-number plasmid using the same antibiotic marker (**Figure 5B**); the same trend was seen across several promoters (**Figure S3**).

non-inducing conditions when used to express YFP with different terminators. Here the strong induction of the inulinase (*INU1*) promoter by inulin is not significantly affected by terminator choice, while it can affect leaky expression under non-inducing conditions. All data are plotted as the mean ± s.d. of at least three replicates. YFP values significantly different from those under baseline conditions (expression using *INU1t*) are marked with an asterisk (*p* < 0.05) or a hash (*p* < 0.001).

both as (C) single and (D) multiple mutants. Frameshift mutations in each gene created the desired auxotrophies, and the strains so generated could be targeted with other plasmids to create defined double and triple mutants.

TABLE 3 | Gene targeting and integration efficiencies at the *LAC4* locus in different backgrounds.

*The data are presented as the mean* ± *s.d. of three replicates.*

*<sup>a</sup>The percentage of white colonies with respect to all blue and white colonies; <sup>b</sup>Colony PCR was performed on eight white colonies per replicate to check for correct integration of the YFP cassette at LAC4.*
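The two efficiencies reported in Table 3 follow from simple colony counts: the fraction of white colonies among all colonies, and the fraction of PCR-tested white colonies that carry the cassette at LAC4, each summarised as mean ± s.d. over replicates. The sketch below illustrates that arithmetic only; the function names and the colony counts are hypothetical and are not data from this study.

```python
from statistics import mean, stdev

def percent(part, whole):
    """Return part/whole as a percentage (0.0 if whole is zero)."""
    return 100.0 * part / whole if whole else 0.0

def summarise(replicates):
    """Summarise targeting and integration efficiency over replicates.

    Each replicate is a dict of colony counts from one transformation.
    """
    # Targeting efficiency: white colonies over all blue and white colonies.
    targeting = [percent(r["white"], r["white"] + r["blue"]) for r in replicates]
    # Integration efficiency: PCR-positive colonies over white colonies tested.
    integration = [percent(r["pcr_positive"], r["pcr_tested"]) for r in replicates]
    return {
        "targeting_mean": mean(targeting),
        "targeting_sd": stdev(targeting),      # sample standard deviation
        "integration_mean": mean(integration),
        "integration_sd": stdev(integration),
    }

# Hypothetical counts for illustration only; eight colonies tested per replicate.
example = [
    {"white": 95, "blue": 5, "pcr_positive": 8, "pcr_tested": 8},
    {"white": 98, "blue": 2, "pcr_positive": 8, "pcr_tested": 8},
    {"white": 92, "blue": 8, "pcr_positive": 8, "pcr_tested": 8},
]
result = summarise(example)
print(f"targeting: {result['targeting_mean']:.1f} ± {result['targeting_sd']:.1f} %")
print(f"integration: {result['integration_mean']:.1f} ± {result['integration_sd']:.1f} %")
```

With only eight colonies tested per replicate, the integration percentage can only move in steps of 12.5%, which is worth keeping in mind when comparing backgrounds.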

# Evaluation of Sites for Chromosomal Integration

While selecting chromosomal integration sites in the K. marxianus genome, we considered three criteria: (i) a visible or easily detectable phenotype on integration, simplifying screening if necessary, (ii) proximity to clusters of actively transcribed genes, and (iii) proximity to essential genes, to ensure that only cells with correctly inserted DNA would survive (Mikkelsen et al., 2012). For insertion sites of the first type, we chose the LAC4 locus (I1); insertion there has a visible phenotype that does not affect core metabolism (as opposed to ADE2, for example). For selecting sites of the second type, we examined existing TSS-seq data (Lertwattanasakul et al., 2015) and found two such sites where the gene clusters were on the same coding strand: one each on chromosome IV and V (I2 and I4) (**Figure 5C**; **Table 2**). We also selected an insertion site upstream of ARO1, which encodes the pentafunctional protein involved in aromatic amino acid biosynthesis (I3); an incorrect insertion would result in auxotrophy for the aromatic amino acids phenylalanine, tyrosine, and tryptophan. While cloning the 5' (left) homology arms into level I plasmids, we added the overhangs matching the 5' multigene connector ConLS' by PCR, turning each arm into a type 1 part. This included the nested connector sequence containing the BsmBI site with the appropriate overhang for multigene constructs. This way the part with the homology arm can also be used to construct integrative vectors for a single expression cassette, as it still possesses the appropriate BsaI overhang at its 3' end to clone a TU in, as described below. The 3' (right) homology arms, as in the original YTK, contain BsaI overhangs corresponding to a yeast replicative origin, or type 7 part (Lee et al., 2015). Finally, the homology arms have AscI restriction sites directly flanking them to linearise the insertion vector prior to transformation, to improve integration efficiency.
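Because the arms are assembled with BsaI and BsmBI and released with AscI, a candidate homology arm is normally checked for internal copies of these recognition sites before it is cloned. The following is a minimal sketch of such a check, assuming plain A/C/G/T sequences; the function names and the example sequence are hypothetical, and only the three recognition sequences are taken as given.

```python
# Recognition sequences of the enzymes named in the text.
SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC", "AscI": "GGCGCGCC"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def find_sites(seq):
    """Map enzyme name to sorted 0-based top-strand positions of its site.

    Both orientations are searched, since a site on the bottom strand
    is cut just as readily as one on the top strand.
    """
    seq = seq.upper()
    hits = {}
    for enzyme, site in SITES.items():
        positions = set()
        for pattern in {site, reverse_complement(site)}:
            start = seq.find(pattern)
            while start != -1:
                positions.add(start)
                start = seq.find(pattern, start + 1)
        if positions:
            hits[enzyme] = sorted(positions)
    return hits

# Hypothetical arm fragment, for illustration only.
arm = "ATGCGGTCTCAAATTTGAGACGCCC"
print(find_sites(arm))
```

An arm that returns an empty dictionary can be used as is; otherwise the offending site would need to be removed or a different region chosen.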

Using these arms in place of connectors provided for TUs in the original YTK, we constructed novel integrative reporter vectors, each expressing YFP under the control of the strongest promoter we identified (PDC1pr; P2) and targeting I1 to I4. We then evaluated the integration efficiency at each site in KmASR.005, in each case using the same YFP expression cassette. Insertion efficiency was 100% at all loci, and while variations in YFP expression were observed at all loci, none varied by more than 25% (**Figure 5D**). However, this difference in expression may be more significant when expressing multiple genes in a heterologous biosynthetic pathway as opposed to a single fluorescent protein. Integrating the gene at I3, upstream of ARO1, did not cause the auxotrophies for Phe, Trp, and Tyr expected if the gene was disrupted, suggesting that the integration there was seamless as well. However, insertion sites near essential genes may not always be this accessible; another site we tested near an orthologue of the translation initiation factor TIF1 yielded <10% correct integrations in the wild-type strain and no colonies in an NHEJ-deficient background (data not shown).

# DISCUSSION

Standardised biological parts for the expression and maintenance of genes, and means to assemble them, are a cornerstone of synthetic biology to build new biological systems. On a more practical level, they allow us to accelerate the design-build-test cycle which forms the core of applied synthetic biology and its sister discipline, metabolic engineering. A versatile collection of these parts is the Yeast Toolkit collected 3 years ago. Grouped into eight categories of parts, the standard set forth by the YTK allows the hierarchical assembly of expression or storage vectors for expressing recombinant genes in S. cerevisiae with a wide range of selection markers, regulatory elements, and functional protein tags to fine-tune gene expression and engineer the genome as needed. Clearly, to move beyond being niche organisms in biotechnology, alternative yeasts require such part collections to make rapid metabolic engineering feasible. It is also advantageous to maintain a common standard for assembly to facilitate the exchange of parts between researchers and yeasts as needed. It was with this goal in mind that we selected and characterized the parts presented in this kit, while maintaining the YTK standard. Large part collections have been developed for other yeasts (Celinska et al., 2017; Prielhofer et al., 2017), and it is with such collections in mind that we have created ours. Using the established Golden Gate assembly protocols, we were able to assemble episomal and integrative reporter constructs from up to 8 component plasmids to characterize our parts.

Our extension of the YTK also includes the first collection of homology arms for insertion vectors targeting four loci in the K. marxianus genome, each with different characteristics. They are considered "full-length," but for use in NHEJ-deficient strains they can easily be shortened by PCR or re-cloning to an optimal length that minimizes construct size without compromising gene targeting efficiency. As better genomic and transcriptomic knowledge of K. marxianus is acquired, more insertion sites and parts will be identified to be added to the modest set we provide. We also foresee the set being expanded by synthetic promoters, engineered promoters (as has been done with INU1pr) and secretion tags (Zhou et al., 2018).

While the promoters we have characterized are largely selected from strain CBS6556, all but one of them had >90% sequence similarity with strain DMKU3-1042 and with strain NBRC1777, where the characterization was carried out. Among the observed differences, only a few promoters exhibit sequence differences that could affect gene expression between strains, based on orthologous transcription factor binding sites from S. cerevisiae (**Table S7**). Sequence differences between promoters for the same gene in different yeast strains can have implications for gene expression, and therefore should be taken into consideration for experimental and industrial applications (Liu et al., 2015; de Paiva et al., 2018); however, at this stage not enough is known about K. marxianus' native transcription factors and regulatory network to functionally dissect our promoter sequences. Their measured activity under the selected expression conditions, and in two different strains, demonstrates their practical usability.

Alongside parts collections, the existence of genome editing tools for allelic replacement and deletion speeds up the creation of strains with defined mutant genotypes and mating, similar to standard S. cerevisiae lab strains. This also opens up the possibility of creating the best K. marxianus strains for the laboratory and industry using classical genetics methods and synthetic biology side-by-side (Cernak et al., 2018; Lee et al., 2018). It is with this end in mind that our collection also provides pUCC001, a Cas9/gRNA genome editing plasmid derived from the broad-host platform pUDP002 (Juergens et al., 2018), into which gRNA targets can be rapidly cloned by Golden Gate assembly simply as annealed and phosphorylated oligos. This saves the cost and time of cloning the entire gRNA expression cassette for every target, as required for the original plasmid. We have also cloned the ScTRP1 expression cassette for use in future strains auxotrophic for tryptophan. The YTK standard makes it possible to multiplex gRNA expression using pUCC001. In theory, Cas9 and multiple gRNA cassettes could be separately cloned as level II plasmids and then reassembled into a level III "multi-TU" plasmid. The ease of assembly of both individual gRNA plasmids and the Golden Gate assembly would provide a credible, if not more efficient, alternative to existing multiplexing systems (Löbs et al., 2018). However, further optimization of the gRNA expression system is necessary for an optimal K. marxianus-specific multiplexing system.
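Cloning a gRNA target as annealed oligos amounts to ordering two complementary oligonucleotides whose 5' ends carry the overhangs left in the digested vector. The sketch below shows the general idea only. The 20-nt spacer length, the function names, and in particular the overhang sequences are assumptions made for illustration; the actual overhangs required by pUCC001 are not given in the text and must be taken from the plasmid map.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def design_spacer_oligos(spacer, top_overhang, bottom_overhang):
    """Return (top, bottom) oligos that anneal into a duplex with 5' overhangs.

    top_overhang and bottom_overhang are the 5' extensions of each oligo.
    They are placeholders here and must match the digested vector.
    """
    spacer = spacer.upper()
    if len(spacer) != 20 or set(spacer) - set("ACGT"):
        raise ValueError("expected a 20-nt spacer of A/C/G/T")
    top = top_overhang.upper() + spacer
    bottom = bottom_overhang.upper() + reverse_complement(spacer)
    return top, bottom

# Hypothetical spacer and overhangs, for illustration only.
top, bottom = design_spacer_oligos("GATTACAGATTACAGATTAC", "CAAA", "GTTT")
print("top    5'-" + top + "-3'")
print("bottom 5'-" + bottom + "-3'")
```

Only the two short oligos change between targets, which is what makes this route cheaper than re-cloning the whole gRNA cassette.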

While investigating different NHEJ-deficient backgrounds, we found that inactivating YKU80 or DNL4, but not NEJ1, was the best way to eliminate multiple or random integration. Identifying such a background is beneficial to improve the efficiency and specificity of K. marxianus-based in vivo assembly techniques such as PGASO (Chang et al., 2012), and to further define a genotype for a potential future "lab strain" for K. marxianus. As much as the versatility of the YTK standard is of relevance to synthetic biologists and metabolic engineers, the parts we have gathered may be of broader interest in the long term. Kluyveromyces marxianus is slowly emerging from its niche applications to become an alternative cell factory to S. cerevisiae. Several efforts have been, and are being made, to make it produce bio-based compounds of value (Cheon et al., 2014; Kim et al., 2014; Lin et al., 2017). The lack of standardised parts, and efficient synthetic biology tools and strategies has limited the scope or sophistication of these efforts. We believe this collection can enrich the existing synthetic biology landscape of K. marxianus and allow researchers to make more informed choices for the more efficient, predictable and practical design and testing of future cell factories for a bio-based economy.

#### AUTHOR CONTRIBUTIONS

AR and JV carried out the experimental work, interpreted the data and wrote the manuscript. HJ carried out the experimental work. J-MD and JM conceived the study, supervised the research, interpreted the data and contributed to writing the manuscript.

#### FUNDING

AR, JV, J-MD, and JM were supported by the CHASSY project which received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 720824. JV was a fellow in the YEASTCELL training network, which received funding from the People Programme (Marie Curie Actions) of the European Union's Seventh Framework Programme FP7/2007–2013/ under REA grant agreement n° 606795. HJ was supported by the BE-Basic R&D Program, which was granted an FES subsidy from the Dutch Ministry of Economic Affairs, Agriculture and Innovation (EL&I).

#### ACKNOWLEDGMENTS

We thank Jasmijn Hassing and Macarena Larroudé for advice and discussions regarding parts assembly. We acknowledge the help of Amy Bergin and Beth Mulcahy in cloning the parts plasmids.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fbioe.2019.00097/full#supplementary-material

#### REFERENCES

pathway manipulation in Yarrowia lipolytica. Microb. Biotechnol. 10, 450–455. doi: 10.1111/1751-7915.12605

Kluyveromyces marxianus by non-homologous end joining-mediated integrative transformation with genes from Saccharomyces cerevisiae. Yeast 30, 485–500. doi: 10.1002/yea.2985

Zhou, J., Zhu, P., Hu, X., Lu, H., and Yu, Y. (2018). Improved secretory expression of lignocellulolytic enzymes in Kluyveromyces marxianus by promoter and signal sequence engineering. Biotechnol. Biofuels 11:235. doi: 10.1186/s13068-018-1232-7

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Rajkumar, Varela, Juergens, Daran and Morrissey. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Production of 3-Hydroxypropanoic Acid From Glycerol by Metabolically Engineered Bacteria

#### Carsten Jers<sup>1</sup>\*, Aida Kalantari<sup>2</sup>, Abhroop Garg<sup>1</sup> and Ivan Mijakovic<sup>1,3</sup>

*<sup>1</sup> Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, Denmark, <sup>2</sup> Department of Biomedical Engineering, Duke University, Durham, NC, United States, <sup>3</sup> Systems and Synthetic Biology Division, Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden*

#### Edited by:

*Rodrigo Ledesma-Amaro, Imperial College London, United Kingdom*

#### Reviewed by:

*Dae-Hee Lee, Korea Research Institute of Bioscience and Biotechnology (KRIBB), South Korea Vinod Kumar, Cranfield University, United Kingdom*

> \*Correspondence: *Carsten Jers cjer@biosustain.dtu.dk*

#### Specialty section:

*This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology*

Received: *30 September 2018* Accepted: *07 May 2019* Published: *24 May 2019*

#### Citation:

*Jers C, Kalantari A, Garg A and Mijakovic I (2019) Production of 3-Hydroxypropanoic Acid From Glycerol by Metabolically Engineered Bacteria. Front. Bioeng. Biotechnol. 7:124. doi: 10.3389/fbioe.2019.00124*

3-hydroxypropanoic acid (3-HP) is a valuable platform chemical with a high demand in the global market. 3-HP can be produced from various renewable resources. It is used as a precursor in industrial production of a number of chemicals, such as acrylic acid and its many derivatives. In its polymerized form, 3-HP can be used in bioplastic production. Several microbes naturally possess the biosynthetic pathways for production of 3-HP, and a number of these pathways have been introduced in some widely used cell factories, such as *Escherichia coli* and *Saccharomyces cerevisiae*. Latest advances in the field of metabolic engineering and synthetic biology have led to more efficient methods for bio-production of 3-HP. These include new approaches for introducing heterologous pathways, precise control of gene expression, rational enzyme engineering, redirecting the carbon flux based on *in silico* predictions using genome scale metabolic models, as well as optimizing fermentation conditions. Despite the fact that the production of 3-HP has been extensively explored in established industrially relevant cell factories, the current production processes have not yet reached the levels required for industrial exploitation. In this review, we explore the state of the art in 3-HP bio-production, comparing the yields and titers achieved in different microbial cell factories and we discuss possible methodologies that could make the final step toward industrially relevant cell factories.

Keywords: 3-hydroxypropanoic acid, glycerol, biosynthesis, cell factory, synthetic biology, metabolic engineering

# INTRODUCTION

The development of microbial cell factories is fueled by aspirations to develop sustainable processes based on renewable resources. The goal is mitigation of the negative environmental consequences of production of fuels, chemicals and other materials.

The development of microbial cell factories for production of the platform chemical 3-hydroxypropanoic acid (3-HP) has attracted much attention in the last decade. 3-HP is a non-chiral (optically inactive), small three-carbon molecule and a structural isomer of lactic acid. Specifically, 3-HP is a precursor for the production of a number of valuable chemicals including acrylic acid and bioplastics (**Figure 1**). Bio-based production of 3-HP also has the potential to "turn waste into a resource" since several metabolic pathways exist for converting glycerol, a by-product of biodiesel production, into 3-HP.

Here, we will briefly introduce the metabolic pathways that can be used to produce 3-HP from glycerol. Based on work accumulated over the last decade, we will present current knowledge pertaining to different production hosts, enzymes, and strategies for optimization of production pathway and host metabolism as well as process engineering for attaining high-level production of 3-HP. Finally, we will discuss how recent developments in synthetic biology and metabolic engineering might form the basis for further improvements of production strains and the eventual goal of industrially viable and sustainable 3-HP production.

#### METABOLIC PATHWAYS FOR SYNTHESIS OF 3-HP STARTING FROM GLYCEROL

#### Metabolic Pathways Found in Nature

A number of microorganisms have been reported to naturally produce 3-HP using various pathways and diverse substrates such as glycerol, glucose, CO2, and uracil. Several reviews have described these in detail (Kumar et al., 2013a; de Fouchécour et al., 2018), and in this review we will focus only on the pathways for which glycerol is the substrate. Two pathways are known for conversion of glycerol into 3-HP: the CoA-dependent pathway and the CoA-independent pathway (**Figure 2**).

The CoA-dependent pathway has been most extensively studied in Lactobacillus reuteri and proceeds with conversion of glycerol to 3-hydroxypropanal (3-HPA) catalyzed by coenzyme B12-dependent glycerol dehydratase (PduCDE). 3-HPA is subsequently converted to 3-HP via 3-hydroxypropanoyl-CoA and 3-hydroxypropanoyl-phosphate, catalyzed by the enzymes propionaldehyde dehydrogenase (PduP), phosphotransacylase (PduL), and propionate kinase (PduW), respectively (Dishisha et al., 2014).

In the CoA-independent pathway, glycerol is converted to 3-HP in two steps. As in the case of the CoA-dependent pathway, glycerol is first converted to 3-HPA, while in the second step, 3-HPA is converted directly to 3-HP in a reaction catalyzed by aldehyde dehydrogenase (Kumar et al., 2013a). While the CoA-independent pathway is by far the most employed for production of 3-HP using engineered bacteria, it appears to have little relevance in nature. This is possibly due to low activity of aldehyde dehydrogenase in wild type bacteria (Zhu et al., 2009).
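Both pathways convert one molecule of glycerol (C3H8O3) into one molecule of 3-HP (C3H6O3) without loss of carbon, so the upper bound on yield is 1 mol/mol if no glycerol is diverted to biomass or by-products. The short sketch below works through the corresponding mass yield and shows how a molar yield could be computed from a measured titer. The function names are hypothetical, and the simplifying assumption of zero carbon diversion is ours.

```python
# Molar masses in g/mol, computed from the molecular formulas.
M_GLYCEROL = 92.09  # C3H8O3
M_3HP = 90.08       # C3H6O3

def max_mass_yield(mol_product_per_mol_substrate=1.0):
    """Theoretical mass yield (g 3-HP per g glycerol) for a given molar yield."""
    return mol_product_per_mol_substrate * M_3HP / M_GLYCEROL

def molar_yield(titer_3hp_g_per_l, glycerol_consumed_g_per_l):
    """Observed molar yield (mol 3-HP per mol glycerol consumed)."""
    mol_product = titer_3hp_g_per_l / M_3HP
    mol_substrate = glycerol_consumed_g_per_l / M_GLYCEROL
    return mol_product / mol_substrate

print(f"maximum mass yield: {max_mass_yield():.3f} g/g")
```

In practice part of the glycerol is needed for growth and cofactor regeneration, so reported yields fall below this bound.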

#### PRODUCTION HOSTS

A number of microorganisms have been used for the production of 3-HP, both natural isolates as well as engineered microorganisms (**Table 1**). For the selection of an appropriate production host, there are several parameters to consider. The microorganism should demonstrate tolerance to organic acids, specifically 3-HP, as well as potentially toxic impurities in crude glycerol. As one of the widely used glycerol dehydratases is coenzyme B12-dependent, the production host should preferably be capable of synthesizing coenzyme B12, since supplementing it in the production medium instead of relying on in situ synthesis would significantly increase the overall production cost.

Lactobacillus reuteri is capable of naturally producing 3-HP from glycerol via the CoA-dependent pathway (Luo et al., 2011). In one study, three L. reuteri strains were evaluated, and all were found to produce 3-HP (Burgé et al., 2015). It was also shown that 3-HPA is toxic at a concentration of 5 g/L while 2.5 g/L 3-HP is not (as long as pH is maintained above 5). Notably, L. reuteri is capable of synthesizing coenzyme B12 (Burgé et al., 2015) and has been used for the bioconversion of biodiesel-derived glycerol into 3-HP (14 g/L) and 1,3-propanediol (1,3-PDO) (Dishisha et al., 2015). To our knowledge, there have been no attempts to engineer L. reuteri strains to improve production. Among lactic acid bacteria, production of 3-HP is not restricted to L. reuteri. Of 67 lactic acid bacteria isolates tested, 22 isolates belonging to Lactobacillus diolivorans and Lactobacillus collinoides were positive for 3-HP production from glycerol (Garai-Ibabe et al., 2008). Additionally, L. diolivorans is capable of using biodiesel-derived glycerol for the production of 3-HPA (Lindlbauer et al., 2017).

Klebsiella pneumoniae is a pathogenic bacterium that has been widely used for production of 3-HP. Along with E. coli, K. pneumoniae is one of the most frequently used hosts for strain development by genetic engineering for improved 3-HP production. In fact, the highest 3-HP titer reported, 83.8 g/L, was obtained in K. pneumoniae by combining optimized expression of aldehyde dehydrogenase (K. pneumoniae PuuC), blocking of lactic acid synthesis (ldh1, ldh2, and pta mutant), and optimization of the fermentation conditions (Li et al., 2016). An important asset of K. pneumoniae is its capability of producing coenzyme B12 (Luo et al., 2011).

E. coli is commonly used for metabolic engineering to produce a wide variety of compounds. It has also been widely used as a chassis for the production of 3-HP. The highest 3-HP titer reported in E. coli so far is 71.9 g/L, using a strain in which, besides introduction of glycerol dehydratase and aldehyde dehydrogenase, the central metabolism was modified to reduce by-product formation (Chu et al., 2015). It has been indicated that careful choice of the strain could be of importance. A comparison of nine E. coli strains in which a heterologous pathway for 3-HP synthesis was introduced demonstrated differences in 3-HP production, as well as in enzyme level and activity (Sankaranarayanan et al., 2014). A drawback of using E. coli is the fact that it is not naturally capable of producing coenzyme B12. However, insertion of the Pseudomonas denitrificans genes for coenzyme B12 synthesis (more than 25 genes in 6 operons) on three plasmids led to the production of coenzyme B12 under both anaerobic and aerobic conditions (Ko et al., 2014).

While K. pneumoniae and E. coli are by far the most widely used production hosts, a number of other bacteria have been engineered for 3-HP production as well. P. denitrificans synthesizes coenzyme B12 in aerobic conditions where NAD(P)<sup>+</sup> is efficiently regenerated (Zhou et al., 2013). Introduction of glycerol dehydratase and glycerol dehydratase reactivase from K. pneumoniae allowed for the production of 3.4 g/L 3-HP. By further introducing the K. pneumoniae aldehyde dehydrogenase, the titer increased to 4.9 g/L (Zhou et al., 2013).


Another bacterium able to synthesize coenzyme B12, Shimwellia blattae, is a 1,3-PDO producer using a native coenzyme B12-dependent glycerol dehydratase. Introduction of various genes including aldehyde dehydrogenase from Pseudomonas putida KT2442 enabled the production of 0.26 g/L Poly-3HP using crude glycerol from biodiesel production as the substrate (Heinrich et al., 2013).

Corynebacterium glutamicum was engineered to produce 3-HP from glucose and xylose. By a combination of efforts, an impressive titer of 62.6 g/L 3-HP was obtained in a fed-batch fermentation (Chen et al., 2017). The steps taken to achieve this titer included enabling efficient production of glycerol from glucose, introduction of the K. pneumoniae pduCDEGH genes encoding diol dehydratase and its reactivase and Cupriavidus necator aldehyde dehydrogenase, and modification of sugar uptake and glycolytic flux. In Bacillus subtilis, introduction of glycerol dehydratase along with its reactivases, and aldehyde dehydrogenase from K. pneumoniae, enabled the production of 3-HP from glycerol. Upon inactivation of glycerol kinase and optimization of growth conditions, a 3-HP titer of 10 g/L was obtained in a shake flask culture (Kalantari et al., 2017). A drawback of both of these bacteria is their inability to synthesize coenzyme B12, which thus has to be supplemented to the production medium.

In an engineered cyanobacterium, Synechococcus elongatus, CO2 was converted via glycerol to 3-HP, albeit at very low titer (31.7 mg/L). Synechococcus elongatus is unable to synthesize coenzyme B12, but the problem was alleviated by producing 3-HP under anaerobic conditions using the oxygen-sensitive B12-independent glycerol dehydratase from Clostridium butyricum (Wang et al., 2015). The use of glycerol as an additional carbon source in a strain engineered to assimilate glycerol has been suggested for increasing productivity (Kanno and Atsumi, 2017).
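For a quick side-by-side view, the 3-HP titers quoted in this section can be collected and ranked. The sketch below uses only the values and citations given in the text above; the tuple layout and variable names are our own, and the values are not directly comparable since substrates and cultivation modes differ between studies.

```python
# (host, carbon source, 3-HP titer in g/L, source) as quoted in the text.
reported_titers = [
    ("Klebsiella pneumoniae", "glycerol", 83.8, "Li et al., 2016"),
    ("Escherichia coli", "glycerol", 71.9, "Chu et al., 2015"),
    ("Corynebacterium glutamicum", "glucose and xylose", 62.6, "Chen et al., 2017"),
    ("Lactobacillus reuteri", "biodiesel-derived glycerol", 14.0, "Dishisha et al., 2015"),
    ("Bacillus subtilis", "glycerol", 10.0, "Kalantari et al., 2017"),
    ("Pseudomonas denitrificans", "glycerol", 4.9, "Zhou et al., 2013"),
    ("Synechococcus elongatus", "CO2", 0.0317, "Wang et al., 2015"),
]

# Rank hosts from highest to lowest reported titer.
ranked = sorted(reported_titers, key=lambda row: row[2], reverse=True)

for host, substrate, titer, source in ranked:
    print(f"{host:<28} {titer:>8.4g} g/L  from {substrate} ({source})")
```

The poly(3-HP) result for Shimwellia blattae is left out because it reports polymer rather than free 3-HP.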

# THE SUITE OF ENZYMES USED IN HETEROLOGOUS 3-HP SYNTHESIS PATHWAYS

As mentioned above, there are two principal pathways for production of 3-HP from glycerol. In both CoA-dependent and -independent pathways, the initial step is the conversion of glycerol to 3-HPA. In the CoA-dependent pathway, 3-HPA is converted via 3-hydroxypropanoyl-coenzyme A (3-HP-CoA) and 3-hydroxypropanoyl-phosphate (3-HP-P) to 3-HP, catalyzed by propionaldehyde dehydrogenase, phosphotransacylase, and propionate kinase, respectively (Dishisha et al., 2014). In the CoA-independent pathway, 3-HPA is converted directly to 3-HP by the action of aldehyde dehydrogenase (Kumar et al., 2013a). Both of the pathways have characteristics that are of importance for the design of cell factory and process conditions, for example, oxygen-sensitivity of enzymes and the need for cofactors such as coenzyme B12 and NAD<sup>+</sup>. In the following, we will describe the enzymes that make up these two pathways. Where available, kinetic data for the enzymes are presented in **Table 2**, to facilitate comparison.

# Glycerol and Diol Dehydratase

#### Coenzyme B12-Dependent Dehydratases

Coenzyme B12-dependent glycerol dehydratase (EC 4.2.1.30) and diol dehydratase (EC 4.2.1.28) are isofunctional enzymes that catalyze the dehydration of 1,2-diols to the corresponding aldehyde (e.g., glycerol to 3-HPA). Both glycerol dehydratase and diol dehydratase are composed of three subunits that form a dimer of heterotrimers (α2β2γ2; Shibata et al., 1999; Yamanishi et al., 2002). In mycobacteria, the α and β subunits are fused in a single polypeptide (Liu et al., 2010). Both of the enzymes catalyze a radical process in which an adenosyl radical is formed by homolytic cleavage of the Co-C bond in coenzyme B12 (Daniel et al., 1998). The adenosyl radical abstracts a hydrogen from the substrate, glycerol, to generate a substrate radical. The substrate radical, upon rearrangement, re-abstracts a hydrogen to form the final product and regenerate coenzyme B12 (Daniel et al., 1998). If a radical side reaction takes place, coenzyme B12 is not regenerated, and instead, a catalytically inactive cobalamin species is formed, which binds tightly to the enzyme and thereby inactivates it (Toraya, 2003). Radical side reactions can be induced by both glycerol and oxygen, making the enzyme sensitive to oxygen (Wei et al., 2014). The inactive dehydratase can be reactivated by the action of glycerol/diol dehydratase reactivase in the presence of ATP and intact coenzyme B12 (Mori and Toraya, 1999). Glycerol/diol dehydratase reactivase consists of two subunits that form a heterotetramer (α2β2; Liao et al., 2003). The reactivase binds ATP and hydrolyzes it to ADP. The ADP-bound reactivase forms a complex with the inactivated glycerol dehydratase. This leads to release of the damaged coenzyme B12. Subsequently, reactivase binds ATP and is released from the dehydratase, which in turn binds coenzyme B12 to regenerate the active form of the enzyme (Toraya, 2003).

TABLE 1 | Overview of bacterial species applied for production of 3-HP.

*<sup>a</sup>Naturally produces 3-HPA.*

From the above, it can be deduced that the expression of five proteins is necessary for the efficient conversion of glycerol to 3-HPA. In the context of engineering microorganisms for 3-HP production, the most widely used glycerol dehydratase is that of K. pneumoniae (dhaB123, and gdrAB), which has been used in E. coli (Rathnasingh et al., 2009), K. pneumoniae (Wang et al., 2013), B. subtilis (Kalantari et al., 2017), and S. elongatus (Wang et al., 2015). The use of other dehydratases has also been reported. In E. coli, the glycerol dehydratase from Lactobacillus brevis encoded by dhaB123 and its reactivase dhaR12 were used (Kwak et al., 2013). The diol dehydratase (pduCDE) and activator (pduGH) from K. pneumoniae were used in C. glutamicum (Chen et al., 2017).

Considering the problem of enzyme instability, it is somewhat surprising that relatively few glycerol dehydratases have been tested. These enzymes have, by now, been discovered in many bacteria and this should provide a resource for enzyme discovery endeavors. In fact, for three selected glycerol dehydratases, the α subunit was systematically swapped, which led to identification of several combinations with improved stability and activity (Qi et al., 2006). Interestingly, fusion of the α and β subunit of K. pneumoniae glycerol dehydratase led to an increase in the kcat (albeit with concomitant increase in KM; Wang et al., 2009). As mentioned, mycobacteria contain a glycerol dehydratase variant in which the α and β subunits are fused, but to our knowledge they have not been characterized or used for 3-HP production. Enzyme engineering could also prove beneficial for generation of enzyme variants with improved properties such as increased stability and activity. As an example, by rational engineering, improved resistance to mechanism-based inactivation was conferred to a glycerol dehydratase from Klebsiella oxytoca (Yamanishi et al., 2012).
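Whether a simultaneous increase in kcat and KM is beneficial depends on the substrate concentration the enzyme experiences, which can be seen directly from the Michaelis-Menten rate law v = kcat·[E]·[S]/(KM + [S]). The sketch below illustrates this trade-off. The parameter values are hypothetical and chosen only to show the crossover; they are not the measured values for the fused K. pneumoniae enzyme.

```python
def mm_rate(kcat, km, substrate, enzyme=1.0):
    """Michaelis-Menten rate: v = kcat * [E] * [S] / (KM + [S])."""
    return kcat * enzyme * substrate / (km + substrate)

def catalytic_efficiency(kcat, km):
    """Specificity constant kcat/KM, which governs the rate when [S] << KM."""
    return kcat / km

# Hypothetical parameters (arbitrary but consistent units), for illustration only.
wild_type = {"kcat": 100.0, "km": 1.0}
fused = {"kcat": 150.0, "km": 3.0}

for s in (0.1, 1.0, 10.0, 100.0):
    v_wt = mm_rate(substrate=s, **wild_type)
    v_fused = mm_rate(substrate=s, **fused)
    faster = "fused" if v_fused > v_wt else "wild type"
    print(f"[S] = {s:>6}: wild type {v_wt:7.2f}, fused {v_fused:7.2f} -> {faster}")
```

With these example numbers the variant with higher kcat only wins once the substrate is well above KM, so the value of such a variant depends on the intracellular glycerol level.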

#### Coenzyme B12-Independent Glycerol Dehydratase

This class of glycerol dehydratases is of particular interest since it negates the need for the rather costly coenzyme B12. This enzyme also performs a radical catalysis, but instead of coenzyme B12, it uses S-adenosylmethionine as a co-factor (Raynaud et al., 2003). This glycerol dehydratase is a homodimer and it is thus structurally simpler than its coenzyme B12-dependent counterpart that is composed of three subunits (Raynaud et al., 2003). While it would seem the obvious enzyme of choice, it is important to note that it is extremely sensitive to oxygen and its use requires production under strict anaerobic conditions (Raynaud et al., 2003).

Considering the fact that coenzyme B12 is not produced in most of the microorganisms that have been used as potential 3-HP production hosts, the use of a coenzyme B12-independent dehydratase would be desirable. So far, its use has only been reported in a few studies. The C. butyricum glycerol dehydratase was used in the cyanobacterium S. elongatus for production of 3-HP under anaerobic conditions. As expected, it was not functional under aerobic conditions (Wang et al., 2015). In the case of E. coli, the coenzyme B12-independent dehydratase was used in the context of 1,3-PDO production. Although the reported 1,3-PDO production was relatively low, an accumulation of 3-HPA was observed, thus demonstrating functionality of the glycerol dehydratase (Dabrowski et al., 2012). Furthermore, it was used for engineering E. coli to produce poly(3-hydroxypropanoate) (Andreeßen et al., 2010). The scarcity of reported use of this enzyme could be due to difficulties in engineering 3-HP producing strains using this dehydratase. Alternatively, and perhaps more likely, reports are scarce because the enzyme requires strict anaerobic conditions to be functional, which is not compatible with growth of most production microorganisms.

### Enzymes for Converting 3-HPA to 3-HP

#### Enzymes in the CoA-Dependent Pathway

In the CoA-dependent pathway, 3-HPA is converted via 3-HP-CoA and 3-HP-P into 3-HP. This requires three enzymes: propionaldehyde dehydrogenase, phosphotransacylase, and propionate kinase, respectively (Dishisha et al., 2014). The propionaldehyde dehydrogenase requires the cofactor NAD+, while one ATP is generated in the conversion of 3-HP-P to 3-HP. This pathway has been described in several organisms including L. reuteri and K. pneumoniae (Luo et al., 2012; Dishisha et al., 2014). The propionaldehyde dehydrogenase PduP from L. reuteri has been purified and characterized in vitro and shown to exhibit activity toward a broad spectrum of aldehydes, including 3-HPA, using either NAD<sup>+</sup> (preferred) or NADP<sup>+</sup> as a cofactor (Luo et al., 2011). Lactobacillus reuteri PduP was also reported to display substrate inhibition for 3-HPA (at 7 mM; Sabet-Azad et al., 2013).
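The three steps described above can be written out as a reaction scheme. The equations below are a sketch based on the general reaction types of the three enzyme classes named in the text; proton and water balances are simplified, and the scheme is not taken from the cited studies.

$$
\begin{aligned}
\text{3-HPA} + \text{CoA} + \text{NAD}^{+} &\rightarrow \text{3-HP-CoA} + \text{NADH} + \text{H}^{+} && \text{(propionaldehyde dehydrogenase)} \\
\text{3-HP-CoA} + \text{P}_i &\rightarrow \text{3-HP-P} + \text{CoA} && \text{(phosphotransacylase)} \\
\text{3-HP-P} + \text{ADP} &\rightarrow \text{3-HP} + \text{ATP} && \text{(propionate kinase)} \\
\text{Net:}\quad \text{3-HPA} + \text{NAD}^{+} + \text{ADP} + \text{P}_i &\rightarrow \text{3-HP} + \text{NADH} + \text{H}^{+} + \text{ATP}
\end{aligned}
$$

CoA is recycled between the first and second steps, so the net conversion consumes one NAD<sup>+</sup> and yields one ATP per 3-HP formed.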

The CoA-dependent pathway has been used in several engineered bacteria. The K. pneumoniae pduPLW genes were introduced in an E. coli strain harboring the K. pneumoniae glycerol dehydratase. Here they were shown to be functional (although the obtained titer was lower than that obtained using an aldehyde dehydrogenase from Azospirillum brasilense; Honjo et al., 2015). In K. pneumoniae, overexpression of the first gene in the pathway, pduP, led to a 4-fold (0.72 g/L) increase in the 3-HP titer (Luo et al., 2011). It could be speculated that the native K. pneumoniae CoA-dependent pathway enzymes were responsible for the conversion of 3-HP-CoA to 3-HP, and that over-expression of all three genes in the pathway would further increase the titer.

It remains to be seen whether the CoA-dependent pathway could be an alternative to the more widely used aldehyde dehydrogenases for the conversion of 3-HPA to 3-HP. Although both pathways use NAD(P)<sup>+</sup>, ATP is generated only in the CoA-dependent pathway. However, this should be balanced against the metabolic burden of over-expressing three enzymes instead of one. Further, it is unclear if the substrate inhibition exhibited by L. reuteri PduP at a relatively low concentration of 3-HPA is a general trait of propionaldehyde dehydrogenases, or whether a more robust enzyme can be discovered or engineered.

#### Aldehyde Dehydrogenase

Aldehyde dehydrogenase (EC 1.2.1.3) has decidedly been the enzyme of choice for conversion of 3-HPA to 3-HP in engineered cells. It catalyses the conversion of an aldehyde to its corresponding carboxylic acid using NAD(P)<sup>+</sup> as a cofactor. In the context of the 3-HP production pathway, it is of utmost importance that enzyme activity is sufficient to ensure that no accumulation of the toxic intermediate 3-HPA takes place. Aldehyde dehydrogenases are present in most organisms, but generally their activity is not high enough to sustain a high production of 3-HP (Raj et al., 2008; Zhu et al., 2009). Consequently, many studies have focused on discovering suitable aldehyde dehydrogenases. Earlier studies mainly employed the aldehyde dehydrogenase AldH from E. coli K-12 in both E. coli and K. pneumoniae (Raj et al., 2008; Zhu et al., 2009). In a study by Chu and co-workers, an indirect comparison (based on 3-HP production in E. coli) was made amongst 17 aldehyde dehydrogenases from various organisms, benchmarked against E. coli AldH. One aldehyde dehydrogenase performing better than AldH, GabD4 from C. necator, was identified (Chu et al., 2015). The enzyme was further improved by enzyme engineering, substantially increasing its Vmax and kcat/KM, and used for high-titer production of 3-HP in both E. coli (71.9 g/L) and C. glutamicum (62.6 g/L) (Chu et al., 2015; Chen et al., 2017).

The α-ketoglutaric semialdehyde dehydrogenase (KGSADH) from A. brasilense has also been used in several studies. KGSADH was evaluated against E. coli AldH and PuuC from K. pneumoniae (Ko et al., 2012). This study indicated higher activity (Vmax) of KGSADH in extracts of the K. pneumoniae strain, while in contrast it had the lowest Vmax and affinity for 3-HPA when enzymes purified from E. coli were evaluated (Ko et al., 2012). Whether this discrepancy is due to e.g., stability in vivo is unclear. The structure of KGSADH has been solved and has provided a basis for rational enzyme engineering. This has yielded enzyme variants with improved activity toward 3-HPA (Park et al., 2017; Son et al., 2017; Seok et al., 2018).

PuuC from K. pneumoniae has also been used in several production hosts, including K. pneumoniae and B. subtilis (Li et al., 2016; Kalantari et al., 2017). Substrate specificity was evaluated against 3-HPA and three other aldehydes indicating broad substrate specificity (and lowest activity when 3-HPA was used; Raj et al., 2010). Comparative studies indicated that E. coli AldH and C. necator GabD4 E209Q/E269Q were better alternatives for converting 3-HPA to 3-HP (Huang et al., 2012; Chen et al., 2017). Nevertheless, PuuC was successfully used in an engineered K. pneumoniae strain that produced 3-HP with a titer of 83.8 g/L (highest reported) in a bioreactor (Li et al., 2016).

Several other aldehyde dehydrogenases have been tested in the context of 3-HP production. E. coli YneI was shown to selectively target 3-HPA over other aldehydes (Luo et al., 2013), although it was not tested against succinic semialdehyde which was previously reported as its primary substrate (Kurihara et al., 2010). The aldehyde dehydrogenase DhaS from B. subtilis has also been suggested as an aldehyde dehydrogenase that is specific for 3-HPA (Su et al., 2015).

Due to the oxygen-sensitivity of glycerol dehydratase, 3-HP production is often performed under anaerobic or microaerophilic conditions, where the regeneration of NAD<sup>+</sup> is diminished. Thus, efficient 3-HP production becomes a compromise between assuring optimal glycerol dehydratase activity (low oxygen level) and NAD<sup>+</sup>-dependent aldehyde dehydrogenase activity (high oxygen level). An interesting strategy was suggested to overcome the limitation imposed by NAD<sup>+</sup>, namely, the use of an aldehyde oxidase in place of a dehydrogenase. Li and co-workers characterized the NAD<sup>+</sup>-independent aldehyde oxidase from Pseudomonas sp. AIU 362 and found it to exhibit a broad substrate specificity but relatively low affinity for 3-HPA. When expressed in K. pneumoniae, a relatively low 3-HP titer was obtained (Li et al., 2014). As can be seen in **Table 2**, the affinity of aldehyde oxidase to 3-HPA is several-fold lower than that of the described aldehyde dehydrogenases, with a K<sub>M</sub> in the range where 3-HPA becomes toxic to the cell. Considering the importance of maintaining a low concentration of 3-HPA, better enzymes would likely be needed to successfully exploit aldehyde oxidase for 3-HP production. To the best of our knowledge, no other attempts have been made at identifying a more suitable aldehyde oxidase for the conversion of 3-HPA to 3-HP.
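The consequence of a K<sub>M</sub> that lies in the toxic concentration range can be illustrated with the Michaelis-Menten rate law, v/Vmax = [S]/(K<sub>M</sub> + [S]). The following is a minimal sketch, assuming simple Michaelis-Menten kinetics; the function name and all numerical values are hypothetical and are not taken from Table 2.

```python
def mm_rate_fraction(s_mm: float, km_mm: float) -> float:
    """Fraction of Vmax reached at substrate concentration s_mm (Michaelis-Menten)."""
    if s_mm < 0 or km_mm <= 0:
        raise ValueError("concentration must be >= 0 and K_M must be > 0")
    return s_mm / (km_mm + s_mm)


# Hypothetical values, chosen only for illustration (not measured data).
tolerable_3hpa_mm = 1.0   # assumed upper limit for 3-HPA in the cell
km_high_affinity = 0.5    # assumed K_M of a high-affinity enzyme (mM)
km_low_affinity = 10.0    # assumed K_M of a low-affinity enzyme (mM)

for label, km in (("high affinity", km_high_affinity),
                  ("low affinity", km_low_affinity)):
    fraction = mm_rate_fraction(tolerable_3hpa_mm, km)
    print(f"{label}: K_M = {km} mM -> {fraction:.0%} of Vmax "
          f"at {tolerable_3hpa_mm} mM 3-HPA")
```

With these assumed numbers, the high-affinity enzyme runs at about two-thirds of its Vmax at the tolerated 3-HPA level, whereas the low-affinity enzyme reaches less than a tenth, so far more enzyme would be needed to carry the same flux.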

As outlined in this section, many enzyme variants have been used in the design of 3-HP-producing strains. While some of the enzymes have been characterized biochemically, it is obvious from **Table 2** that this is not the norm in the field. Nevertheless, a better understanding of the catalytic properties of these enzymes could provide a platform for more optimal selection of enzymes for constructing 3-HP pathways.

#### STRAIN ENGINEERING

There are several parameters to consider when engineering bacteria to produce 3-HP. As covered in sections Production Hosts and The Suite of Enzymes Used in Heterologous 3-HP Synthesis Pathways, a suitable host should be established and in most cases some or all of the enzymes needed for converting glycerol to 3-HP should be introduced. This provides a basis for 3-HP production, but is not enough to ensure the desired high titers. There are several hurdles that need to be overcome, and these include, but are not limited to, proper balancing of enzyme activities to prevent buildup of the toxic intermediate 3-HPA, efficient channeling of substrate into product, cofactor regeneration, prevention of by-product formation and countering of stress. In the following, we will explore challenges and strategies that have been applied in the engineering of strains for 3-HP production.

### Improving 3-HP Production by Optimizing Expression of the Production Pathway

As illustrated above, a number of different enzymes have been used for the construction of synthetic operons to confer the ability to produce 3-HP to various bacteria. In a number of these studies, it was shown that further optimization of the expression of the pathway genes could enhance 3-HP production.

A critical parameter is to ensure that the intermediate product 3-HPA is kept at a low concentration. 3-HPA (also known as reuterin) is a broad range antimicrobial compound. Its minimal inhibitory concentrations for various bacteria are in the range of <1.9–50 mM, and it exerts its effect via modification of thiol groups of proteins and small molecules (Cleusix et al., 2007; Schaefer et al., 2010). Considering the fact that 3-HPA is toxic to bacteria even at minute concentrations, it is evident that balancing enzyme activities to prevent 3-HPA accumulation is critical. To this end, different approaches have been attempted to fine-tune the expression levels of glycerol dehydratase and aldehyde dehydrogenase. The order in which the genes encoding aldehyde dehydrogenase and glycerol dehydratase are arranged was shown to impact 3-HP production. The rationale here is that the gene adjacent to the promoter in an operon is normally expressed at a higher level. Specifically, the arrangement favoring aldehyde dehydrogenase expression led to a higher 3-HP production due to the diminished build-up of 3-HPA (Li et al., 2013a). Different promoters for driving the expression of the aldehyde dehydrogenase gene have also been successfully evaluated (Li et al., 2016).
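Why the balance between the two enzyme activities matters can be sketched with a toy model: 3-HPA is formed at a constant rate by glycerol dehydratase and removed by an aldehyde dehydrogenase that follows Michaelis-Menten kinetics. This is a deliberately simplified illustration, not a model from any of the cited studies; the function name and all parameter values are hypothetical.

```python
import math


def steady_state_3hpa(v_dehydratase: float, vmax_aldh: float, km_aldh: float) -> float:
    """Steady-state 3-HPA level (same units as km_aldh) for a constant influx.

    Solves v_dehydratase = vmax_aldh * S / (km_aldh + S) for S.
    Returns math.inf when the dehydrogenase cannot keep up with the influx,
    i.e. 3-HPA would accumulate without bound in this toy model.
    """
    if v_dehydratase < 0 or vmax_aldh <= 0 or km_aldh <= 0:
        raise ValueError("rates must be non-negative and Vmax, K_M positive")
    if v_dehydratase >= vmax_aldh:
        return math.inf
    return km_aldh * v_dehydratase / (vmax_aldh - v_dehydratase)


# Hypothetical numbers: influx fixed at 1.0 (arbitrary rate units), K_M = 0.5 mM.
for vmax in (0.8, 1.1, 2.0, 5.0):
    level = steady_state_3hpa(v_dehydratase=1.0, vmax_aldh=vmax, km_aldh=0.5)
    print(f"ALDH Vmax = {vmax}: steady-state 3-HPA = {level:.3g} mM")
```

In this sketch, a dehydrogenase capacity only slightly above the dehydratase flux still leaves a steady-state 3-HPA level ten times K<sub>M</sub>, while a several-fold excess keeps it well below K<sub>M</sub>. This is consistent with the observation that arrangements favoring aldehyde dehydrogenase expression reduce 3-HPA build-up.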

While these approaches are simple, they do not offer a high degree of tunability, and a number of more advanced approaches have been applied for optimization of expression levels. Optimization of expression level can be done by applying various 5′ untranslated regions (5′ UTRs) in the genetic constructs, which leads to differences in the translation rate. Using this approach, a better balance of the enzyme activities was obtained by fine-tuning expression of dhaB1 encoding a glycerol dehydratase subunit. Testing only four different UTRs led to construction of a strain that enabled a 2.4-fold improvement of 3-HP titer in shake flask experiments (Lim et al., 2016). Several tools are available for the prediction of the translation rate of 5′ UTRs, including the UTR Designer (Seo et al., 2013), and the RBS Calculator v2.0 (Espah Borujeni et al., 2014). To the same end, a study showed potential in tuning translation by modulation of the Shine-Dalgarno sequence, for which an online tool, EMOPEC, is available (Bonde et al., 2016). For a selection of 106 Shine-Dalgarno sequences, the measured protein level was within 2-fold of the predicted level in 91% of cases (Bonde et al., 2016). While in silico prediction of expression level is a very attractive avenue, it should be noted that even for the good predictors, the number of false predictions appears to be significant (Bonde et al., 2016). Nevertheless, these computational approaches should allow for design and evaluation of smaller, more focused libraries, compared to random mutagenesis libraries.

In most studies, researchers have taken advantage of plasmids for the expression of relevant genes in the production host. While these studies provide proof of principle, plasmid-based systems might not be optimal for industrial production due to the well-known problem of plasmid instability in bioreactors (Gao et al., 2014). Consequently, the generation of genetically stable strains where expression cassettes are integrated into the genome will be a necessity. Additionally, the elimination of plasmids might improve the production via a reduced metabolic burden associated with plasmid maintenance and replication (Silva et al., 2012). To construct a plasmid-free strain, the E. coli aldH was inserted in the genome of K. pneumoniae. While 3-HP production was improved (over the parent strain), the 3-HP titer upon production in shake flasks was low (Wang and Tian, 2017). In B. subtilis, a plasmid-free strain allowed the production of 3-HP in shake flasks with a titer of 10 g/L (Kalantari et al., 2017).

Considering that both multicopy vectors and strong promoters have been used for the expression of relevant genes, one foreseeable difficulty in shifting to a single (or a few) genome-integrated expression cassette will be to ensure sufficient protein synthesis in the cell. A recent study described an approach based on translational coupling between the gene of interest and an antibiotic resistance cassette, thus making a high rate of translation selectable via an increased resistance to an antibiotic (Rennig et al., 2018). With limited screening, the production of two selected proteins (a nanobody and an affibody) in E. coli was improved 2- and 10-fold, respectively (Rennig et al., 2018). The methodology was shown to also work in Gram-positive bacteria B. subtilis and Lactococcus lactis where production of a sialidase and a tyrosine ammonia lyase was improved 2- and 8-fold, respectively (Ferro et al., 2018).

#### Engineering Host Metabolism for Improving 3-HP Production

Besides integrating an efficient pathway for the conversion of glycerol to 3-HP, it is often necessary to further modulate the host metabolism, in order to direct the substrate into the production pathway more efficiently, eliminate unwanted byproducts and/or reduce stress.

One of the more common approaches is to ensure that glycerol is converted more efficiently into 3-HP. To this end, targeting the first step in the glycerol utilization pathways is common. In E. coli, two pathways for glycerol utilization exist. Glycerol is converted to the glycolytic metabolite dihydroxyacetone phosphate either via glycerol-3-phosphate catalyzed by glycerol kinase under aerobic conditions or via dihydroxyacetone catalyzed by glycerol dehydrogenase under anaerobic conditions (Durnin et al., 2009; **Figure 3**). Inactivating the gene glpK (which encodes glycerol kinase in E. coli) increased the titer of 3-HP 1.6-fold in an aerobic fed-batch fermentation when co-fed with glucose and glycerol. Under the conditions used (aerobic), further inactivating the gene encoding glycerol dehydrogenase had no effect (Kim et al., 2014). Similarly, the 3-HP titer increased in B. subtilis co-fed on glycerol and glucose upon inactivation of glpK (Kalantari et al., 2017). Inactivation of glpK is possible in process setups where glycerol is not the sole carbon source. When cells grow only on glycerol, targeting glpK becomes less trivial as the fluxes toward biomass accumulation and 3-HP production need to be balanced. In E. coli, the glpK was placed under control of an inducible promoter and fine-tuning of its expression led to an increase in both 3-HP titer and yield on glycerol (Jung et al., 2014). In that study, the gene glpF encoding glycerol facilitator was also overexpressed to increase the influx of glycerol in the cell, which led to a modest increase in 3-HP production (Jung et al., 2014). Interestingly, the opposite approach was evaluated by Su and co-workers. They overexpressed glycerol dehydrogenase in K. pneumoniae to stimulate growth, and found that this strain not only exhibited faster initial growth and similar final biomass yield while using less glycerol, but also exhibited increased 3-HP production (Su et al., 2014).

In natural 3-HP producers such as K. pneumoniae and L. reuteri, 3-HPA can be oxidized to 3-HP and reduced to 1,3-PDO (Zhu et al., 2009; Dishisha et al., 2014). The conversion of 3-HPA to 1,3-PDO is catalyzed by an oxidoreductase. In K. pneumoniae, the most important oxidoreductase appears to be the 1,3-propanediol reductase encoded by dhaT. However, significant amounts of 1,3-PDO are still produced in the dhaT mutant, indicating that other oxidoreductases in the cell can also catalyse this reaction (Ko et al., 2012). In a subsequent attempt to eliminate 1,3-PDO formation, four oxidoreductases were inactivated, but the quadruple mutant still retained the ability to produce 1,3-PDO (Ko et al., 2015). With respect to cofactor balance, 3-HP synthesis requires NAD<sup>+</sup> while 1,3-PDO formation requires NADH and thus regenerates NAD<sup>+</sup>. Thus, the elimination of 1,3-PDO formation adversely affects the NAD<sup>+</sup>/NADH balance, and in fact, it was reported that the dhaT mutation under 3-HP production conditions impeded cell growth (Ko et al., 2012). In E. coli, 3-HPA is also converted to 1,3-PDO by the action of a broad-range aldehyde oxidoreductase encoded by yqhD. While the deletion of yqhD dramatically reduces 1,3-PDO formation, here too it is not fully abolished, likely due to the presence of alternate oxidoreductases (Tokuyama et al., 2014).

Other important by-products in 3-HP production are lactate and acetate. The deletion of K. pneumoniae ldhA encoding lactate dehydrogenase was reported to eliminate lactate formation (Kumar et al., 2013b). Another study reported a reduction in lactate formation upon inactivation of ldh1 and ldh2 in K. pneumoniae (Li et al., 2016). Likewise, to reduce acetate formation, the synthesis gene pta was inactivated, leading to a reduction but not elimination of acetate formation in K. pneumoniae (Li et al., 2016). To further optimize 3-HP producing cell factories, it is essential to ensure that by-product formation is reduced to a minimum. Besides the obvious loss of carbon that could otherwise have been used for biomass and product formation, by-product formation also leads to a more complex fermentation broth, thus potentially increasing costs associated with downstream purification.

The NAD<sup>+</sup>/NADH balance is of importance for 3-HP production due to the NAD<sup>+</sup>-dependence of the employed aldehyde dehydrogenases. The synthesis of the by-products 1,3-PDO and lactate also generates NAD<sup>+</sup>. Elimination of the respective pathways thus further skews the NAD<sup>+</sup>/NADH balance (**Figure 3**). Under aerobic conditions, NAD<sup>+</sup> is also regenerated by the electron transport chain. However, due to the oxygen-sensitivity of glycerol dehydratase, 3-HP production is often performed under microaerophilic or anaerobic conditions where NAD<sup>+</sup> regeneration is reduced. To counter these effects, various enzymes can be used to modulate the NAD<sup>+</sup>/NADH balance. In the context of 3-HP production, over-expression of the genes encoding either NADH oxidase, NADH dehydrogenase, or glycerol-3-phosphate dehydrogenase in K. pneumoniae showed potential in stimulating 3-HP production via regenerating the NAD<sup>+</sup> pool in all cases (Li et al., 2013b).
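The redox coupling between 3-HP and 1,3-PDO formation can be made explicit by writing out the reactions. The scheme below is a simplified sketch that considers only the NAD(H)-dependent steps named in the text and ignores biomass formation and other by-products.

$$
\begin{aligned}
\text{glycerol} &\rightarrow \text{3-HPA} + \text{H}_2\text{O} && \text{(glycerol dehydratase)} \\
\text{3-HPA} + \text{NAD}^{+} + \text{H}_2\text{O} &\rightarrow \text{3-HP} + \text{NADH} + \text{H}^{+} && \text{(aldehyde dehydrogenase)} \\
\text{3-HPA} + \text{NADH} + \text{H}^{+} &\rightarrow \text{1,3-PDO} + \text{NAD}^{+} && \text{(1,3-propanediol oxidoreductase)} \\
\text{Net (equimolar):}\quad 2\ \text{glycerol} &\rightarrow \text{3-HP} + \text{1,3-PDO} + \text{H}_2\text{O}
\end{aligned}
$$

Only when the two products are formed in equimolar amounts does the NAD<sup>+</sup> consumed in 3-HP formation equal the NAD<sup>+</sup> regenerated in 1,3-PDO formation. Any shift toward 3-HP alone leaves a surplus of NADH that has to be reoxidized by another route, such as respiration or the NADH-consuming enzymes mentioned above.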

From an economic point of view, it would be beneficial to use a production strain capable of synthesizing coenzyme B12. However, even in K. pneumoniae, which is capable thereof, it has been reported that addition of coenzyme B12 increases the glycerol dehydratase activity (Ashok et al., 2013). This could indicate that upregulation of coenzyme B12 synthesis is another parameter for increasing the glycerol dehydratase activity. Studies on Bacillus megaterium and more recently on P. denitrificans show good potential for increasing coenzyme B12 synthesis, e.g., by removing the riboswitch-based feedback inhibition system (Biedendieck et al., 2010; Nguyen-Vo et al., 2018).

The application of genome-scale metabolic models is a common approach in the field of metabolic engineering and has been used successfully in multiple studies. Using a genome-scale metabolic model for K. pneumoniae, no single mutants that would improve 3-HP production were predicted. Instead, a double knockout of tpi (triose phosphate isomerase) and zwf (glucose-6-phosphate-1-dehydrogenase) involved in central metabolism was suggested. The introduction of these two mutations led to a 4.4-fold increase in the 3-HP yield compared to the parent strain (Tokuyama et al., 2014). Predictions were made of beneficial mutations in B. subtilis, but here only the more obvious candidate, glycerol kinase, was suggested as a beneficial mutation (Kalantari et al., 2017).

Another important aspect of optimizing the production strain is to improve the robustness of the strain, specifically by alleviating stress. Probably the most significant stressor is that imposed by the intermediate product 3-HPA and, to a lesser extent, the product 3-HP. Warnecke et al. developed a method to screen for regions and genes that could be involved in the tolerance to 3-HP. Starting with a genomic library of E. coli clones, each carrying a plasmid with a genomic DNA insert, they selected for increased tolerance to 3-HP and subsequently quantified the enriched plasmid DNA by microarray analysis (Warnecke et al., 2010). Using this method, they concluded that 3-HP inhibition is due to limitations in the chorismate and threonine pathways. Surprisingly, they found that over-expression of almost any of the genes in these pathways dramatically alleviated the stress imposed by 3-HP (Warnecke et al., 2010). In a follow-up study, the authors identified a 21-amino acid peptide that, when expressed, increased the 3-HP tolerance about 2.3-fold (Warnecke et al., 2012).

The potential of using omics data to identify targets for improving 3-HP tolerance has been demonstrated. Using 2D gel-based proteomics, Liu and co-workers identified 46 up- and 23 down-regulated proteins upon challenging E. coli with 5 g/L of 3-HP. Over-expression of several of these proteins alleviated the stress imposed by 3-HP (Liu et al., 2016).

In recent years, there have been key improvements in the technology for generating targeted libraries of mutants, which is exemplified in a study by Liu and co-workers where a method termed iCREATE for iterative genome editing was used to generate 162,000 mutations in 115 genes (Liu et al., 2018). Using this approach, the production of 3-HP via the malonyl-CoA pathway was increased 60-fold (Liu et al., 2018). However, for the successful employment of such libraries, appropriate screening and/or selection techniques need to be in place. Small molecule biosensors based on transcription factors or riboswitches can couple metabolite concentration to a measurable output such as fluorescence for screening, or to a selectable output such as antibiotic resistance (Liu et al., 2017). It is, thus, of particular importance that a 3-HP-inducible system was recently identified in P. putida and demonstrated to be functional when expressed in E. coli (Hanko et al., 2017). This biosensor has already found application for screening in the directed evolution of the aldehyde dehydrogenase KGSADH, which improved the catalytic efficiency of the enzyme 2.8-fold (Seok et al., 2018).

Combining approaches such as iCREATE for generating diversity with biosensors for detection or selection of improved variants should hold great promise for moving bacterial 3-HP production strains to industrially relevant levels of production.

### PROCESS ENGINEERING FOR OPTIMIZING 3-HP PRODUCTION

In a majority of the studies reviewed for this paper, the ability of the recombinant strains to produce 3-HP was tested in 1.5–5 L bioreactors operated in fed-batch mode (**Table 3**; **Supplementary Table 1**). A fed-batch operation typically entails growing the cells until the end of the exponential phase. This is followed by the addition of feed medium containing the substrates, thus keeping the growth rate of the microorganism at a desired level (Liu, 2017). A fed-batch mode of operation has certain advantages over batch fermentation. For instance, a fed-batch operation has the ability to prevent overflow metabolism, that is, the phenomenon where fast-growing cells, in the presence of oxygen, utilize fermentation instead of respiration to produce energy (Basan et al., 2015). Another advantage of a fed-batch operation is its ability to maintain the substrate concentration at a low level (de Fouchécour et al., 2018). This is particularly important when the substrate is glycerol. It was reported that no growth of K. pneumoniae cells could be seen above 110 and 133 g/L glycerol (extrapolated values) in aerobic and anaerobic conditions, respectively, with glycerol concentrations higher than 40 g/L showing significant growth inhibition (Cheng et al., 2005). Consequently, high titers of 3-HP cannot be obtained from glycerol in a batch fermentation. In L. reuteri, the conversion of glycerol to 3-HPA was reported to be about 10 times faster than the subsequent conversion of 3-HPA to 3-HP and 1,3-PDO, resulting in the accumulation of the toxic intermediate 3-HPA. To circumvent this problem, a fed-batch fermentation process was established. It produced 3-HP and 1,3-PDO without any accumulation of 3-HPA. This was achieved by identifying a maximum value of specific glycerol consumption rate at which only 1,3-PDO and 3-HP are produced, and no accumulation of 3-HPA takes place. 
By maintaining the glycerol amount in the bioreactor such that its specific consumption rate stayed under this maximum value, a 3-HP titer of 14 g/L with a productivity of 0.25 g/L.h was obtained (Dishisha et al., 2014, 2015). This underlines how the knowledge of cell metabolism can be used to solve production problems by means of process engineering.

*<sup>a</sup>Unless otherwise mentioned, the units of titer, yield, and productivity are g/L, mol<sub>3-HP</sub>/mol<sub>glycerol</sub>, and g/L.h, respectively.*

*<sup>b</sup>Calculated based on the published data.*

*<sup>c</sup>Mutated catabolite repression element (CRE) in the upstream region of the pdu operon.*

*<sup>d</sup>L. reuteri converted glycerol to 3-HP and 1,3-propanediol followed by G. oxydans converting 1,3-propanediol to 3-HP.*

*<sup>e</sup>K. pneumoniae converted glycerol to 1,3-propanediol followed by G. oxydans converting 1,3-propanediol to 3-HP.*

*adhE, alcohol dehydrogenase; aldA, aldehyde dehydrogenase A (E. coli); aldH, γ-glutamyl-γ-aminobutyraldehyde dehydrogenase (E. coli); aldHk, NAD<sup>+</sup>-dependent homolog of E. coli aldH; CRE, catabolite repression element; dhaB, glycerol dehydratase; dhaR, dhaB reactivating factor in L. brevis; dhaS, putative aldehyde dehydrogenase from B. subtilis; dhaT, 1,3-propanediol oxidoreductase; DO, dissolved oxygen; frdA, succinate dehydrogenase; gabD4, aldehyde dehydrogenase (Cupriavidus necator); gdrA, gdrB, glycerol dehydratase reactivase; glpF, glycerol uptake facilitator protein; glpK, glycerol kinase; KGSADH, α-ketoglutaric semialdehyde dehydrogenase (A. brasilense); ldhA, lactate dehydrogenase; n.a., not available; pduP, propionaldehyde dehydrogenase; puuC, γ-glutamyl-γ-aminobutyraldehyde dehydrogenase (K. pneumoniae); yqhD, NADPH-dependent aldehyde reductase/alcohol dehydrogenase.*
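The feeding rule used in the L. reuteri fed-batch process, keeping the specific glycerol consumption rate below a maximum value, can be turned into an upper bound on the feed flow. The sketch below assumes glycerol-limited operation, in which all fed glycerol is consumed immediately so that the uptake rate equals the feed rate; the function name and every number are hypothetical and are not taken from the cited studies.

```python
def max_glycerol_feed_rate(q_max: float, biomass_g_per_l: float,
                           volume_l: float, feed_g_per_l: float) -> float:
    """Largest feed flow (L/h) that keeps specific glycerol uptake <= q_max.

    q_max            maximum specific glycerol consumption rate, g/(g CDW * h)
    biomass_g_per_l  cell dry weight concentration in the reactor, g CDW/L
    volume_l         current liquid volume, L
    feed_g_per_l     glycerol concentration of the feed solution, g/L

    Assumes glycerol-limited operation (uptake rate == feed rate).
    """
    if min(q_max, biomass_g_per_l, volume_l) < 0 or feed_g_per_l <= 0:
        raise ValueError("inputs must be non-negative and feed concentration positive")
    return q_max * biomass_g_per_l * volume_l / feed_g_per_l


# Hypothetical example: q_max = 0.5 g/(g*h), 4 g CDW/L, 2 L broth, 500 g/L feed.
flow = max_glycerol_feed_rate(0.5, 4.0, 2.0, 500.0)
print(f"feed flow must stay at or below {flow * 1000:.1f} mL/h")
```

Because the bound scales with the amount of biomass in the reactor, the permissible feed flow changes during the run and would have to be recalculated as the culture grows or the volume increases.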

The challenge concerning the requirement of anaerobic conditions for coenzyme B12 generation and aerobic conditions for efficient NAD<sup>+</sup> regeneration for production of 3-HP from glycerol was tackled by electro-fermentation in a recently published study (Kim et al., 2017). Certain microorganisms possess the ability to deliver electrons to solid electrodes when grown under anaerobic conditions. These exoelectrogens, as these microorganisms are called, can be used to control the redox state in the cells independent of the electron transport chain or fermentation. Electro-fermentation utilizes this property of exoelectrogens to drive unbalanced fermentations. Anodic electro-fermentation has been used for bioconversion of glycerol to 3-HP using recombinant K. pneumoniae over-expressing KGSADH from A. brasilense. An electrical potential of +0.5 V vs. Ag/AgCl was applied to the anode, with 2-hydroxy-1,4-naphthoquinone used for shuttling the electrons between the bacteria and the anode. The transfer of electrons from the bacteria to the anode during anaerobic fermentation led to a decrease in the NADH/NAD<sup>+</sup> ratio in the cells, and hence to an enhanced 3-HP production as compared to the fermentative control (Kim et al., 2017). Although the 3-HP titer reached was low, the study nevertheless demonstrated that it is possible to improve NAD<sup>+</sup> regeneration under anaerobic conditions. With further optimization, this novel strategy might provide new opportunities to enhance 3-HP production from glycerol. It might similarly enhance other bioconversions where there is a need to control the intracellular redox state of a cell factory (Kim et al., 2017; de Fouchécour et al., 2018).

In a fermentation operation, process parameters such as aeration rate and medium composition play an important role in determining the final outcome. While there are ample reports on strain development, generally less focus has been put on the optimization of the production conditions. Since the conversion of glycerol to 3-HP is an oxidative reaction, the aeration rate affects the final yield of the product (de Fouchécour et al., 2018). In one study, aerobic, micro-aerobic and anaerobic conditions were tested in batch fermentations performed to convert glycerol to 3-HP using recombinant K. pneumoniae. The authors reported that amongst the three conditions tested, the micro-aerobic condition yielded the highest 3-HP titer (2.2 g/L; Zhu et al., 2009). In a more recent study, the transcript levels of the genes encoding the enzymes of the CoA-independent pathway of glycerol conversion to 3-HP and the genes coding for the enzymes of the formate hydrogen lyase pathway were analyzed to understand the intracellular response of K. pneumoniae under aerobic, microaerobic and anaerobic growth conditions (Huang et al., 2016). The transcription of the glycerol dehydratase operon was downregulated in the presence of oxygen, while aldehyde dehydrogenase-, hydrogenase- and formate dehydrogenase-coding genes were found to be upregulated. The authors suggested that in the presence of oxygen, the formate hydrogen lyase pathway consumed the excess NADH generated due to the over-expression of aldH (Huang et al., 2016). In another study aimed at enhancing the coproduction of 3-HP and 1,3-PDO in K. pneumoniae, the production of by-products such as lactate, ethanol, succinate, and acetate was reduced by disrupting their synthesis (Ko et al., 2017). However, the disruption of the pta-ackA gene of the acetate pathway led to a reduction in cell growth, glycerol uptake, and 3-HP and 1,3-PDO production. To reduce acetate production without disrupting this gene, the authors instead tested various agitation speeds between 200 and 600 rpm. They reported an increase in 3-HP yield from 0.18 to 0.38 mol<sub>3-HP</sub>/mol<sub>glycerol</sub>, respectively, which could be attributed to an enhanced cell growth rate because of better oxygen transfer at 600 rpm. Although acetate production was not completely abolished, they did manage to reduce it from 163 mM (at 400 rpm) to only 80 mM (at 600 rpm; Ko et al., 2017). Recently, a production of 61.9 g/L 3-HP with a yield of 0.58 mol<sub>3-HP</sub>/mol<sub>glycerol</sub> in 38 h in a 5 L bioreactor operated in fed-batch mode by using an engineered K. pneumoniae strain was reported (Jiang et al., 2018). The aeration rate was reduced to half of the initial rate once the cell biomass OD reached close to the maximum value. Furthermore, using the same strategy, they were able to scale up the process in a 300 L bioreactor and reported a titer of 54.5 g/L for 3-HP with a yield of 0.58 mol<sub>3-HP</sub>/mol<sub>glycerol</sub> in 51 h.

The growth medium used for cultivating a microorganism plays an important role in the production process. The study that currently holds the record for the highest 3-HP titer addressed this issue by testing different media for the production of 3-HP in K. pneumoniae in shake flasks (Li et al., 2016). A concentration gradient of each medium component (except CaCl2) was analyzed individually. The optimized medium enabled an increase of 80.5% in the production of 3-HP. The authors also tested the effect of pH and found pH 7.0 to be the most appropriate for 3-HP production (Li et al., 2016). To optimize the growth medium for 3-HP production using L. reuteri, 30 different media with variable amounts of sugar beet and wheat processing coproducts (used as the carbon source), yeast extract, Tween 80 and vitamin B<sub>12</sub> were tested. The authors reported a 70% increase in 3-HP production yield, accompanied by a decreased 3-HPA titer (Couvreur et al., 2017).

Another way to solve the earlier-mentioned challenge of conflicting requirements for coenzyme B<sub>12</sub> generation and NAD<sup>+</sup> regeneration is to divide the conversion of glycerol to 3-HP into two steps. First, glycerol is converted into either 3-HPA or a mixture of 3-HP and 1,3-PDO. The second step involves the conversion of 3-HPA and 1,3-PDO to 3-HP. This two-step strategy was employed in a study where wild-type L. reuteri resting cells first converted glycerol to 1,3-PDO and 3-HP in equimolar amounts. In the subsequent step, the supernatant was fed to resting cells of Gluconobacter oxydans, which oxidized 1,3-PDO to 3-HP, yielding a 3-HP titer of 23.6 g/L (Dishisha et al., 2015). In a similar study, Zhao et al. employed K. pneumoniae in the first step to convert glycerol into 1,3-PDO. In the second step, resting cells of G. oxydans were introduced into the same bioreactor after heat-inactivating K. pneumoniae. This led to the conversion of 1,3-PDO to 3-HP with a yield of 0.52 mol/mol on glycerol and 0.94 mol/mol on 1,3-PDO, and a 3-HP titer of 60.5 g/L (Zhao et al., 2015). The advantage of using two bioconversion steps is that each uncoupled step can be optimized separately. Attractive as it may appear because of the high yields, the industrial implementation of this two-step strategy may not be as cost-effective as a single-step fermentation because of the higher associated costs. Nevertheless, this strategy could well be a first step toward an integrated continuous process for the production of 3-HP, which would in addition be highly desirable because of its low operating costs (de Fouchécour et al., 2018).

# CONCLUSIONS AND PERSPECTIVES

As outlined in this review, which considers the case of 3-HP production in bacterial cell factories, there are many elements to the successful design of a suitable production strain, ranging from the choice of organism, pathway, and enzymes to engineering of the cell for improved flux through the production pathway and the ability to tolerate the stress imposed by intermediate, by-, and final products (**Figure 4**). Most of the studies described in this review have targeted some of these elements. The highest titer recorded so far is 83.8 g/L, which was achieved in K. pneumoniae by optimizing the expression of aldehyde dehydrogenase in the production pathway and increasing flux through the production pathway by blocking synthesis of the by-products lactic acid and acetic acid (Li et al., 2016). This impressive feat was based on rational design and accomplished by combining some of the successful approaches outlined above in very few engineering steps.

For bacterial 3-HP production to reach economic viability, further improvements in titer and productivity will be necessary. By now, many different enzyme variants have been used for constructing 3-HP production pathways, but in many cases the enzymes have not been characterized biochemically. A better understanding of the enzymes with respect to catalytic properties, pH range, and stability could prove instrumental in identifying the optimal enzymes and would also help to identify enzyme qualities to improve by enzyme engineering.

The impressive body of research on strain engineering has identified numerous challenges and solutions such as balancing enzyme activities, preventing by-product formation and alleviating stresses. With this accumulating knowledge base pertaining to 3-HP cell factory design, it seems likely that production can be improved further by combining the successful approaches described herein. To facilitate this work, powerful engineering techniques are continually emerging that allow the design of large combinatorial libraries of targeted mutations. Notably, the recently developed 3-HP biosensor can be used in combination with adaptive laboratory evolution to identify as-yet unknown mechanisms for further fine-tuning of the production strains. While it seems reasonable that further improvements in titer will be achieved in the coming years, there should also be an increased focus on developing industrially relevant process designs and on subsequent upscaling in order to provide proof of concept.

In conclusion, the massive developments in synthetic biology and metabolic engineering could be expected to soon deliver the long-sought goal of converting the abundant by-product glycerol into the much desired value-added chemical 3-HP.

# AUTHOR CONTRIBUTIONS

CJ, AG, and AK performed literature search and drafted the manuscript. All authors revised the manuscript and approved the final version.

#### FUNDING

This work was supported by grants from the Novo Nordisk Foundation (Grant no. NNF16OC0021474 and NNF10CC1016517) to IM.

#### ACKNOWLEDGMENTS

We would like to thank Kirsten Leistner for proofreading and revising the manuscript.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fbioe.2019.00124/full#supplementary-material

#### REFERENCES


acid from glycerol by recombinant Klebsiella pneumoniae. Process Biochem. 47, 1135–1143 doi: 10.1016/j.procbio.2012.04.007


of Lactobacillus reuteri for 3-hydroxypropionic acid production from glycerol. Appl. Microbiol. Biotechnol. 89, 697–703. doi: 10.1007/s00253-010-2887-6


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Jers, Kalantari, Garg and Mijakovic. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Build Your Bioprocess on a Solid Strain—β-Carotene Production in Recombinant Saccharomyces cerevisiae

#### Edited by:

Rodrigo Ledesma-Amaro, Imperial College London, United Kingdom

#### Reviewed by:

Mattheos Koffas, Rensselaer Polytechnic Institute, United States Mingfeng Cao, University of Illinois at Urbana-Champaign, United States

> \*Correspondence: Eduardo Agosin agosin@ing.puc.cl

†These authors have contributed equally to this work

‡Present Address:

Maximiliano Ibaceta, Advanced Biotechnology Inc., Totowa, NJ, United States

#### Specialty section:

This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology

Received: 24 January 2019 Accepted: 03 July 2019 Published: 18 July 2019

#### Citation:

López J, Cataldo VF, Peña M, Saa PA, Saitua F, Ibaceta M and Agosin E (2019) Build Your Bioprocess on a Solid Strain—β-Carotene Production in Recombinant Saccharomyces cerevisiae. Front. Bioeng. Biotechnol. 7:171. doi: 10.3389/fbioe.2019.00171

Javiera López<sup>1,2†</sup>, Vicente F. Cataldo<sup>2†</sup>, Manuel Peña<sup>2</sup>, Pedro A. Saa<sup>2</sup>, Francisco Saitua<sup>1</sup>, Maximiliano Ibaceta<sup>2‡</sup> and Eduardo Agosin<sup>1,2</sup>\*

<sup>1</sup> Centro de Aromas and Sabores, DICTUC S.A., Santiago, Chile, <sup>2</sup> Department of Chemical and Bioprocess Engineering, School of Engineering, Pontificia Universidad Católica de Chile, Santiago, Chile

Robust fermentation performance of microbial cell factories is critical for successful scaling of a biotechnological process. From shake flask cultivations to industrial-scale bioreactors, consistent strain behavior is fundamental to achieving production targets. To assess the importance of this feature, we evaluated the impact of the yeast strain design and construction method on process scalability (from shake flasks to bench-scale fed-batch fermentations) using two recombinant Saccharomyces cerevisiae strains capable of producing β-carotene: SM14 and βcar1.2. The SM14 strain, obtained previously from adaptive evolution experiments, was able to accumulate up to 21 mg/gDCW of β-carotene in 72 h shake flask cultures, while the βcar1.2 strain, constructed by overexpression of carotenogenic genes, accumulated only 5.8 mg/gDCW of β-carotene. Surprisingly, fed-batch cultivation of these strains in 1 L bioreactors resulted in the opposite performance. The βcar1.2 strain reached much higher biomass and β-carotene productivities (1.57 g/L/h and 10.9 mg/L/h, respectively) than the SM14 strain (0.48 g/L/h and 3.1 mg/L/h, respectively). Final β-carotene titers were 210 and 750 mg/L after 80 h cultivation for the SM14 and βcar1.2 strains, respectively. Our results indicate that these substantial differences in fermentation parameters are mainly a consequence of the exacerbated Crabtree effect of the SM14 strain. We also found that the strategy used to integrate the carotenogenic genes into the chromosomes affected the genetic stability of the strains, although the impact was comparatively minor. Overall, our results indicate that shake flask fermentation parameters are poor predictors of fermentation performance under industrial-like conditions, and that appropriate construction designs and performance tests must be conducted to properly assess the scalability of the strain and the bioprocess.

Keywords: bioprocess, scale up, fermentation, Saccharomyces cerevisiae, β-carotene

# INTRODUCTION

Improved experimental and analytical techniques, together with emerging synthetic biology tools, have enabled rapid and precise genetic manipulation of microorganisms for the industrial production of diverse compounds (Ajikumar et al., 2010; Paddon and Keasling, 2014; Meadows et al., 2016; Lian et al., 2018). These advances have greatly accelerated the construction and evaluation of promising microbial cell factories to replace traditional chemical synthesis processes. However, many obstacles still hinder the scaling of lab-scale experiments up to economically attractive industrial bioprocesses (Yadav et al., 2012). In addition to low production titers, yields, and productivities, the often unpredictable physiology of microbial cell factories renders the scale-up process an expensive, time-consuming, labor-intensive task (Woolston et al., 2013; Wu et al., 2016). While much attention has been paid to the development of upstream operations (i.e., strain construction), the resulting strains are seldom assessed under production conditions (i.e., fed-batch cultivations). Instead, production performance is typically evaluated only in laboratory-scale batch cultures, which may differ greatly from the actual process behavior (Lee and Kim, 2015; Petzold et al., 2015; Gustavsson and Lee, 2016).

Carotenoids, particularly β-carotene, are an interesting family of compounds with seemingly attractive scalability potential in yeast cell factories (Mata-Gómez et al., 2014; Larroude et al., 2018). This C<sub>40</sub> isoprenoid is widely used in the food and health industries as a feed additive and/or nutraceutical (Mata-Gómez et al., 2014; Niu et al., 2017). A number of studies have reported high heterologous β-carotene production in shake flasks (Yamano et al., 1994; Verwaal et al., 2007; Li et al., 2013, 2017; Zhao et al., 2015) and bench-scale batch fermentations (Reyes et al., 2014; Olson et al., 2016). However, data on the scalability of the bioprocess under more realistic production conditions (i.e., fed-batch mode) are scarce. To the best of our knowledge, only one study has reported high β-carotene production in a recombinant yeast strain during fed-batch fermentations (achieving up to 20.8 mg/gDCW total carotenoid content and 9.6 mg/L/h volumetric productivity) (Xie et al., 2015). However, an analysis of the overall scalability of the generated strain and proposed bioprocess is still lacking.

In this work, we evaluated the impact of the design and construction method on the production performance and scalability potential of two engineered yeast strains. As a case study, we chose the heterologous production of β-carotene in S. cerevisiae. The evaluated strains were built following two radically different approaches. The first (SM14) was transformed and then subjected to adaptive evolution, using oxidative stress as the selective pressure for accumulating high amounts of β-carotene; it is to date the highest carotenoid-accumulating yeast strain in test tubes (18 mg/gDCW; Reyes et al., 2014) and batch fermenters (25 mg/gDCW; Olson et al., 2016). The second (βcar1.2) was constructed using the industrial CEN.PK2-1c strain as metabolic chassis and transformed employing state-of-the-art molecular biology tools. In addition to the usual fermentation characterization in shake flasks, genomic stability assays and fed-batch culture experiments in bench-scale bioreactors were performed to assess the suitability of the strains for high and robust β-carotene production. Our results show that conclusions drawn from preliminary characterizations performed under settings that differ from the actual production conditions can be misleading, and that rigorous evaluation of the producer strains should be conducted to properly assess the scalability of both the strain and the bioprocess.

# MATERIALS AND METHODS

#### Plasmids Construction

Integrative plasmids needed for heterologous β-carotene production in the S. cerevisiae CEN.PK2-1c strains were constructed using Gibson assembly (Gibson, 2009). The carotenogenic enzyme genes (crtE, crtYB, and crtI) from Xanthophyllomyces dendrorhous were amplified by PCR, using genomic DNA from the S. cerevisiae SM14 strain as template. The truncated HMG-CoA reductase gene encoding the catalytic domain (tHMG1) was amplified by PCR using genomic DNA from the CEN.PK2-1c strain of S. cerevisiae as template. Backbone vectors were amplified by PCR from the plasmid library developed by Mikkelsen et al. (2012), using primers with homology to either the TEF1 or PGK1 promoters, and to either the ADH1 or tCYC1 terminators. All PCR products were gel-extracted to eliminate original vector residues. Purified PCR-amplified vectors (100 ng) were mixed in a molar ratio depending on their length, following the manufacturer's instructions. DNA fragments were mixed with home-made Gibson master mix (5X isothermal mix buffer, T5 exonuclease 1 U/µL, Phusion DNA polymerase 2 U/µL, Taq DNA ligase 40 U/µL and Milli-Q purified water) to a working volume of 10 µL. The mixture was incubated for 60 min at 50 °C. Finally, the reaction mix was used to transform chemically competent E. coli TOP10 cells (ThermoFisher, USA). All vectors contained one marker gene (URA3, TRP1 or LEU2) depending on the parental strain auxotrophy.
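The length-dependent molar ratio mentioned above can be worked out with the common approximation of about 650 g/mol per base pair of double-stranded DNA. The following is only an illustrative sketch: the fragment lengths and the 3:1 insert-to-vector ratio are hypothetical values chosen for the example, not values reported in this study.

```python
# Approximate average mass of one base pair of double-stranded DNA (g/mol).
AVG_BP_MASS = 650.0

def ng_to_pmol(mass_ng: float, length_bp: int) -> float:
    """Convert a dsDNA mass (ng) to an amount (pmol) for a given length."""
    return mass_ng * 1000.0 / (length_bp * AVG_BP_MASS)

def insert_mass_ng(vector_ng: float, vector_bp: int,
                   insert_bp: int, molar_ratio: float) -> float:
    """Mass of insert (ng) that gives `molar_ratio` mol insert per mol vector."""
    return vector_ng * (insert_bp / vector_bp) * molar_ratio

# Hypothetical example: 100 ng of a 5,000-bp backbone, 2,000-bp insert, 3:1 ratio.
vector_pmol = ng_to_pmol(100.0, 5000)
needed_ng = insert_mass_ng(100.0, 5000, 2000, 3.0)
print(f"vector: {vector_pmol:.4f} pmol, insert needed: {needed_ng:.1f} ng")
```

Because the conversion scales inversely with length, a shorter fragment needs proportionally less mass to reach the same molar amount.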

#### Strains

Two S. cerevisiae strains, SM14 and βcar1.2, were employed to compare their production performance and evaluate their genetic stability. SM14 is a β-carotene hyper-producer strain derived from adaptive evolution experiments (Reyes et al., 2014). For the genomic stability evaluation, SM14 was transformed with a 120-bp PCR product containing 60-bp homology with the flanking regions of the URA3 gene, yielding a yeast strain with uracil auxotrophy (SM14-ΔURA3). The βcar1.2 strain was constructed incrementally from the CEN.PK2-1c chassis strain through the βcar1 and βcar1.1 strains. A summary of the strains employed in this study and their genotypes is shown in **Table 1**.

Transformations were performed using the lithium acetate/single-stranded DNA carrier/PEG procedure (Gietz and Woods, 2002), and transformants were selected on the appropriate SC plates. Finally, correct cassette integration into the specific loci was tested by colony PCR, and carotenoid production was evaluated in YPD medium at 30 °C after 72 h (see Carotenoid extraction and analysis).

TABLE 1 | Strains used in this study.


#### Shake Flask Cultures and Genetic Stability Evaluation

For quantification of ethanol, glucose, acetate, biomass and total carotenoids, a single colony was picked from YPD or CSM (with or without auxotrophy) agar plates and subcultured overnight in tubes containing 3 mL of YPD medium at 30 °C and 160 rpm in a rotary shaker incubator. On the next day, the optical density at 600 nm (OD600) of each culture tube was measured (see Biomass determination). The content of the tubes was then transferred to a 250 mL baffled shake flask with 50 mL final culture volume at an initial OD600 of 0.1 in YPD medium. Culture samples were periodically collected for biomass, extracellular metabolite and carotenoid quantification.

Genetic stability was evaluated in 72-h batch cultures, following the same shake-flask cultivation protocol. After 72 h, an aliquot of each culture (previously diluted to 1 mL at OD600 = 10) was diluted 10,000-fold and 100 µL were plated on YPD. Furthermore, the kinetics of carotenogenic gene loss over several generations was determined in exponential-phase cultures. For this purpose, the cultures were started at an OD600 of 0.1 in YPD, grown until late exponential phase (12 h) and then diluted again to an OD600 of 0.1 (twice per day). Samples at different cultivation times were diluted and plated on YPD. After 4 days at 30 °C, orange and white colonies were counted on all plates. The number of generations was calculated according to the doubling time of the cultures.
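The generation count described above can be sketched as follows, assuming purely exponential growth at a constant specific growth rate, so that the doubling time is ln 2/µ. The function names are hypothetical, and the example uses the maximum specific growth rate of 0.43 h<sup>−1</sup> reported for SM14 in the Results only as an illustrative input.

```python
import math

def doubling_time_h(mu_per_h: float) -> float:
    """Doubling time (h) for a specific growth rate mu (1/h)."""
    return math.log(2) / mu_per_h

def generations(elapsed_h: float, mu_per_h: float) -> float:
    """Number of generations elapsed, assuming constant exponential growth."""
    return elapsed_h / doubling_time_h(mu_per_h)

# Illustrative input: one 12-h growth cycle at mu = 0.43 1/h.
print(f"doubling time: {doubling_time_h(0.43):.2f} h")
print(f"generations in 12 h: {generations(12.0, 0.43):.1f}")
```

In practice the growth rate falls as the culture approaches late exponential phase, so this estimate is an upper bound for each cycle.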

#### Fermentation Conditions and Culture Media

Fed-batch cultures were performed in 1-L in-house bioreactors equipped with a condenser, a stirrer and two Rushton turbines operated with brushless DC motors (Oriental Motor, Japan). A SIMATIC PCS7 control system (Siemens, Germany) was used to monitor and control the cultivations at 30 °C, pH 5.0 and dissolved oxygen above 2.8 mg/L. These culture conditions were employed throughout the entire study for all fermentations. Aerobiosis was maintained with a modified split-range control scheme varying the agitation, air and pure oxygen gas flows (Cárcamo et al., 2014). Briefly, as the oxygen demand increases, the control scheme first increases the agitation from 200 to 500 rpm, then the air flow from 0.3 to 1 L/min, and finally, if needed, the pure oxygen gas flow from 0.05 to 1 L/min with a concomitant decrease in air flow, thereby maintaining the total gas inflow constant.
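The sequential logic of the split-range scheme can be illustrated with a minimal sketch that maps a normalized controller output onto the three actuators. This is not the authors' controller: the division of the controller output into three equal bands, the linear ramps, and the fixed total gas flow of 1 L/min during oxygen enrichment are all assumptions made for the example; only the actuator ranges come from the text.

```python
RPM_MIN, RPM_MAX = 200.0, 500.0   # agitation range (rpm), from the text
AIR_MIN, AIR_MAX = 0.3, 1.0       # air flow range (L/min), from the text
O2_MIN, O2_MAX = 0.05, 1.0        # pure oxygen range (L/min), from the text
TOTAL_GAS = 1.0                   # assumed constant total inflow (L/min)

def split_range(u: float) -> tuple[float, float, float]:
    """Map controller output u in [0, 1] to (rpm, air L/min, O2 L/min).

    Assumed bands: 0-1/3 agitation, 1/3-2/3 air flow, 2/3-1 oxygen enrichment.
    """
    u = min(max(u, 0.0), 1.0)
    if u <= 1 / 3:
        frac = u * 3
        return RPM_MIN + (RPM_MAX - RPM_MIN) * frac, AIR_MIN, 0.0
    if u <= 2 / 3:
        frac = (u - 1 / 3) * 3
        return RPM_MAX, AIR_MIN + (AIR_MAX - AIR_MIN) * frac, 0.0
    frac = min((u - 2 / 3) * 3, 1.0)
    o2 = O2_MIN + (O2_MAX - O2_MIN) * frac
    # Air is reduced as oxygen rises so that the total inflow stays constant.
    return RPM_MAX, TOTAL_GAS - o2, o2

for u in (0.0, 0.25, 0.5, 0.75, 1.0):
    rpm, air, o2 = split_range(u)
    print(f"u={u:.2f}: {rpm:.0f} rpm, air {air:.2f} L/min, O2 {o2:.2f} L/min")
```

Engaging the cheapest actuator first (agitation) and pure oxygen last is the usual motivation for a split-range layout of this kind.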

The batch medium used in the bioreactor cultivations of both strains contained (per liter): 20 g glucose, 15 g ammonium sulfate, 500 mg leucine, 160 mg histidine, 100 mg tryptophan, 4 g KH2PO4, 1.2 g MgSO4·7H2O, and 150 mg NaCl. The medium was supplemented with 15 mL/L of a vitamin solution, 3 mL/L of a trace solution, 0.75 mL/L of a CaCl2·2H2O solution at 40 g/L, and 0.75 mL/L of a FeSO4·7H2O solution at 4.2 g/L. The trace solution is composed of 3.3 g/L of zinc sulfate heptahydrate, 2 g/L of cobalt chloride hexahydrate, 3.3 g/L of manganese sulfate monohydrate, 4.67 g/L of copper sulfate pentahydrate, 2 g/L of boric acid, 0.2 g/L of potassium iodide, 0.46 g/L of molybdic acid sodium salt dihydrate and 8 g/L of EDTA. In addition, the vitamin solution consists of 0.05 g/L of D-biotin, 5 g/L of calcium pantothenate, 3.75 g/L of nicotinic acid, 40 g/L of myo-inositol, 1 g/L of thiamine-HCl, 2.5 g/L of pyridoxine-HCl, 0.02 g/L of p-aminobenzoic acid, 1 g/L of riboflavin and 0.02 g/L of folic acid. Finally, in the case of the fed-batch cultures, the fed-batch feeding contained (per liter): 450 g glucose, 15 g KH2PO4, 5.5 g MgSO4·7H2O, and 15 mL/L casamino acids solution at 75 g/L. The fed-batch feeding was supplemented with 15 mL/L and 9 mL/L of the previous vitamin and trace solutions, respectively, 1.35 mL/L of a CaCl2·2H2O solution at 400 g/L, and 1.35 mL/L of a FeSO4·7H2O solution at 84 g/L. Both vitamin and trace solutions were filter-sterilized before their use in all the mentioned media.

#### Batch Cultures

A single colony from a working plate was cultured in 3 mL YPD medium in a pre-inoculum tube at 30 °C for 10–14 h (overnight). The next morning, 1 mL was cultured in a shake flask with 20 mL YPD medium for 8 h. Batch cultures in bioreactors were inoculated to a final OD600 of 0.1. Fermentation conditions were the same as described in the previous section. Culture samples were collected every 2–3 h for biomass, extracellular metabolite and carotenoid quantification. Batch fermentations were stopped when all major carbon sources (i.e., glucose, ethanol and acetic acid) were exhausted.

#### Fed-Batch Cultures

Fed-batch cultures were fed following an exponential feeding profile (Equations 1, 2) with an exponentially decreasing specific growth rate set point (µ<sub>set</sub>) (Equation 3). Both S. cerevisiae strains (SM14 and βcar1.2) followed the same feeding profile, albeit with different initial parameters, depending on the fermentation profile observed in the shake flask cultivations. As the two strains behaved differently in these cultures (refer to ethanol and β-carotene production, **Figures 1C,D**), and in order to provide a fair evaluation, we employed slightly different feeding parameters so that the strains could be more easily compared. For both strains, the feeding started after both glucose and ethanol were depleted, using an initial fixed specific growth rate. The exponentially decreasing specific growth rate feeding strategy was started once the biomass concentration reached an OD600 of approximately 60. The general feeding profile is described by the conventional exponential formula (Villadsen and Patil, 2007),

$$F(t) = F_{\text{in}} \cdot \exp\{\mu_{\text{set}}(t) \cdot t\} \tag{1}$$

$$F_{\text{in}} = \mu_{\text{set}} (XV)_{\text{in}} / (Y_{\text{sx}} \cdot S_{\text{in}}) \tag{2}$$

where F<sub>in</sub> describes the initial feeding rate. The parameters of the latter formula correspond to the total amount of biomass in the reactor at the beginning of the feeding, (XV)<sub>in</sub>, the biomass yield on glucose, Y<sub>sx</sub>, and the glucose concentration in the feed, S<sub>in</sub>. The parameter µ<sub>set</sub>(t) is strain-dependent and was defined by Equation 3, with values between 0.1 and 0.13 h<sup>−1</sup> for µ<sub>init</sub> and 0.03 h<sup>−1</sup> for µ<sub>end</sub>.

$$
\mu_{\text{set}}(t) = (\mu_{\text{init}} - \mu_{\text{end}}) \cdot \exp\{-\mu_{\text{end}} \cdot t\} + \mu_{\text{end}} \tag{3}
$$

Finally, culture samples were periodically collected for biomass, extracellular metabolites and carotenoids quantification.
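The feeding profile of Equations 1–3 can be sketched numerically as below. The glucose concentration in the feed (450 g/L) and the growth rate set points (0.13 and 0.03 h<sup>−1</sup>) are taken from the text; the biomass yield and the amount of biomass at the start of feeding are hypothetical placeholders, µ<sub>end</sub> is treated as a constant, and t is assumed to be the time elapsed since the start of the decreasing-rate phase.

```python
import math

def mu_set(t_h: float, mu_init: float, mu_end: float) -> float:
    """Equation 3: set point decaying from mu_init toward mu_end (1/h)."""
    return (mu_init - mu_end) * math.exp(-mu_end * t_h) + mu_end

def initial_feed_rate(mu: float, biomass_g: float,
                      y_sx: float, s_in: float) -> float:
    """Equation 2: initial feed rate (L/h) from total biomass (g),
    biomass yield on glucose (g/g) and feed glucose concentration (g/L)."""
    return mu * biomass_g / (y_sx * s_in)

def feed_rate(t_h: float, f_init: float, mu: float) -> float:
    """Equation 1: exponential feed rate (L/h) at time t."""
    return f_init * math.exp(mu * t_h)

MU_INIT, MU_END = 0.13, 0.03   # 1/h, values given for the βcar1.2 strain
S_IN = 450.0                   # g/L glucose in the feed (from the medium recipe)
Y_SX = 0.5                     # g/g, hypothetical biomass yield on glucose
XV_START = 12.0                # g, hypothetical biomass at the start of feeding

f_init = initial_feed_rate(MU_INIT, XV_START, Y_SX, S_IN)
for t in (0, 6, 12, 24):
    mu = mu_set(t, MU_INIT, MU_END)
    print(f"t={t:2d} h  mu_set={mu:.3f} 1/h  F={feed_rate(t, f_init, mu) * 1000:.1f} mL/h")
```

The set point starts at µ<sub>init</sub> and approaches µ<sub>end</sub> asymptotically, which keeps the demanded growth rate below the level at which overflow metabolism would be expected as the cell density rises.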

#### Biomass Determination

Biomass concentration was determined by optical density (OD600) using a UV-160 UV-visible spectrophotometer (Shimadzu, Japan). Biomass concentration was estimated using the experimentally determined linear relationship 1 OD600 = 0.4 g/L.

#### Extracellular Metabolite Quantification

Culture samples were centrifuged at 10,000 rpm for 3 min and the supernatant was stored at −80 °C for metabolite analysis. Extracellular glucose, ethanol and acetic acid concentrations were quantified in duplicate by high-performance liquid chromatography (HPLC) as detailed in Sánchez et al. (2014).

#### Carotenoid Extraction and Analysis

For each sample, 8 mg of biomass were pelleted in 2 mL Eppendorf tubes and the supernatant was discarded. Four hundred microliters of acid-washed glass beads (Sigma Aldrich, USA) and 1 mL of hexane were then added for cell disruption and carotenoid extraction from the cell membranes. Cells were disrupted at room temperature in a BeadBug6 cell homogenizer (Benchmark Scientific, USA) using a program consisting of 4 cycles of 90 s of disruption at 3,700 rpm, each followed by a 10-s rest. The cell lysate was then centrifuged and the supernatant (hexane) was stored at −80 °C until further analysis. Carotenoids were quantified by measuring the absorbance of the hexane extracts at 453 nm, which was then converted into concentration using a standard curve of β-carotene ranging from 0.5 to 10 mg/L.
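The conversion from absorbance to carotenoid content can be sketched as a linear standard curve followed by normalization to the extracted biomass. The calibration absorbances below are invented for illustration, as is the dilution factor parameter, and the 8 mg of biomass is assumed to be dry cell weight; only the 1 mL hexane volume, the 8 mg biomass and the 0.5–10 mg/L standard range come from the text.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def carotenoid_content(a453: float, slope: float, intercept: float,
                       dilution: float = 1.0, hexane_ml: float = 1.0,
                       biomass_mg: float = 8.0) -> float:
    """Carotenoid content (mg/gDCW) from the absorbance of a hexane extract."""
    conc_mg_per_l = (a453 - intercept) / slope * dilution
    mass_mg = conc_mg_per_l * hexane_ml / 1000.0
    return mass_mg / (biomass_mg / 1000.0)

# Hypothetical standard curve: concentrations (mg/L) and absorbances at 453 nm.
std_conc = [0.5, 2.0, 5.0, 10.0]
std_abs = [0.125, 0.5, 1.25, 2.5]
slope, intercept = fit_line(std_conc, std_abs)

# Hypothetical sample: extract diluted 10-fold before reading A453 = 1.0.
print(f"{carotenoid_content(1.0, slope, intercept, dilution=10.0):.2f} mg/gDCW")
```

A dilution factor is included because extracts from high-producing cells would fall outside a 0.5–10 mg/L calibration range if read undiluted.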

# RESULTS

#### Construction of βCar Yeast Strains

A series of β-carotene-producing yeast strains (βcar) were constructed using a CEN.PK strain as host cell. Integration of carotenogenic genes in stable constructs was achieved using two different promoters (in a bidirectional arrangement) and two different terminators. The integration of the three carotenogenic genes (crtE, crtYB and crtI), together with the tHMG1 gene, resulted in a strain (βcar1) that generated faint orange colonies. This strain was capable of accumulating 0.034 mg/gDCW β-carotene in shake flasks after 72 h of incubation. This initial βcar strain was further optimized by integrating two extra copies of the tHMG1 gene (βcar1.1 strain) and then by adding an extra copy of the crtYB and crtI genes (final βcar1.2 strain) (**Figures 1A,B**). The βcar1.1 and βcar1.2 strains accumulated 0.3 mg/gDCW and 5.8 mg/gDCW β-carotene, respectively, in shake flasks after 72 h of incubation (**Figure 1A**).

#### Strain Performance in Shake Flask Cultures

Shake flask batch cultures of the βcar1.2 and SM14 strains displayed similar fermentation profiles during the first 24 h of incubation (**Figures 1C,D**). Both glucose consumption and ethanol production behaved similarly during the first 12 h, and there were no substantial differences in β-carotene levels before 24 h. In addition, the maximum specific growth rate reached similar values during this time (0.43 h<sup>−1</sup> for SM14 and 0.40 h<sup>−1</sup> for βcar1.2), consistent with the profiles of the main fermentation substrates and products. However, this trend was not maintained throughout the cultivation. While βcar1.2 showed an almost constant β-carotene content after the first 24 h of cultivation (∼5 mg/gDCW), SM14 reached about four times this value, rising from 7.6 mg/gDCW at 24 h to 21 mg/gDCW after 72 h of cultivation (**Figures 1C,D**). Interestingly, the SM14 strain displayed a marked increase in the specific β-carotene production rate upon reaching the stationary phase, which was not replicated by the βcar1.2 strain. Altogether, the final β-carotene titer of the SM14 strain reached 159.6 mg/L by the end of the shake flask fermentation, roughly four times higher than the βcar1.2 titer.

#### Strain Performance in Fed-Batch Cultures

The SM14 strain was grown in fed-batch mode to evaluate its performance under production conditions (**Figure 2A**). Once the glucose from the batch phase was depleted (after approx. 26 h of cultivation), the feeding was started according to Equation 1 with a constant specific growth rate set point of 0.1 h<sup>−1</sup>. After 65 h of cultivation and upon reaching 60 OD600, the exponentially decreasing specific growth rate protocol was initiated using an initial µ<sub>init</sub> of 0.1 h<sup>−1</sup> and a final µ<sub>end</sub> of 0.03 h<sup>−1</sup>, to be reached within the next 24 h. This change in the feeding policy was made to enable higher cell densities to be reached using a more conservative feeding strategy.

After 68 h of cultivation, biomass growth stopped, which was consistent with the accumulation of glucose (2.4 g/L) (**Figure 2A**). At this point, the biomass concentration reached 32.3 g/L (72.6 OD600, biomass volumetric productivity q<sub>X</sub> = 0.475 gDCW/L/h) and the carotenoid content of the cells was 6.48 mg/gDCW, yielding a total carotenoid titer of 209 mg/L (carotenoid volumetric productivity q<sub>C</sub> = 3.07 mg/L/h) (**Table 2**).

Both biomass and carotenoid concentrations plateaued thereafter. For instance, after 77 h of cultivation, 30.5 g/L of biomass (76.2 OD600, q<sub>X</sub> = 0.396 gDCW/L/h) with a carotenoid content of 7.18 mg/gDCW were achieved, yielding a total carotenoid titer of 218 mg/L (carotenoid volumetric productivity q<sub>C</sub> = 2.83 mg/L/h). Finally, ethanol accumulation started as soon as the constant exponential feed was initiated, reaching concentrations of 12.3 g/L at 68 h of cultivation and >18 g/L after 77 h.

As with the SM14 strain, the engineered βcar1.2 strain was evaluated in fed-batch fermentations under production conditions (**Figure 2B**). Once the glucose and ethanol from the batch phase were depleted, the feeding was started according to Equation 1 with a constant specific growth rate set point of 0.13 h<sup>−1</sup>. A slightly higher µ<sub>set</sub> was employed in this case because this strain showed a higher µ<sub>critical</sub>, as determined in preliminary fermentations. Again, upon reaching 60 OD600 (after approx. 33 h of cultivation), the feeding policy was changed to the exponentially decreasing policy with an initial µ<sub>init</sub> of 0.13 h<sup>−1</sup> and a final µ<sub>end</sub> of 0.03 h<sup>−1</sup>.

After 68 h of fed-batch cultivation, the βcar1.2 strain reached a biomass concentration of 107.1 gDCW/L (267.9 OD600, biomass volumetric productivity q<sub>X</sub> = 1.576 gDCW/L/h) with a carotenoid content of 6.91 mg/gDCW, yielding a total carotenoid titer of 739.6 mg/L (carotenoid volumetric productivity q<sub>C</sub> = 10.88 mg/L/h). Later fermentation results were consistent with these data. After 77 h of fermentation, 103.8 g/L of biomass had been produced, corresponding to a volumetric productivity of 1.34 gDCW/L/h. Importantly, the biomass productivity of the βcar1.2 strain was markedly higher (∼3.5-fold) than that obtained with the SM14 strain (**Table 2**). Likewise, the carotene productivity of the βcar1.2 strain was 3.3-fold higher than that reached by the SM14 strain. After 77 h of cultivation, a total carotenoid titer of 729 mg/L (7 mg/gDCW carotene yield) was achieved with this strain (**Table 2**).
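The derived quantities quoted in this section follow from simple arithmetic: titer is the biomass concentration multiplied by the carotenoid content, and volumetric productivity is the concentration divided by the total cultivation time. The short sketch below reproduces the 68-h figures for the βcar1.2 strain from the values in the text; the function names are hypothetical, and the OD600-to-biomass factor of 0.4 g/L is the one given in the Biomass Determination section.

```python
def od_to_biomass(od600: float, g_per_l_per_od: float = 0.4) -> float:
    """Biomass concentration (gDCW/L) from OD600."""
    return od600 * g_per_l_per_od

def carotenoid_titer(biomass_g_per_l: float, content_mg_per_g: float) -> float:
    """Total carotenoid titer (mg/L)."""
    return biomass_g_per_l * content_mg_per_g

def volumetric_productivity(concentration: float, hours: float) -> float:
    """Average productivity over the whole cultivation (per L per h)."""
    return concentration / hours

# Values reported for βcar1.2 after 68 h of cultivation.
biomass = 107.1   # gDCW/L
content = 6.91    # mg/gDCW
hours = 68.0

titer = carotenoid_titer(biomass, content)
print(f"biomass from OD600 267.9: {od_to_biomass(267.9):.1f} gDCW/L")
print(f"titer: {titer:.1f} mg/L")
print(f"q_X: {volumetric_productivity(biomass, hours):.3f} gDCW/L/h")
print(f"q_C: {volumetric_productivity(titer, hours):.2f} mg/L/h")
```

Because these productivities are averaged over the full run, including the batch phase, they understate the rates achieved during the feeding phase alone.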

#### Strain Stability Analysis

To determine whether genetic instability had a significant impact on β-carotene productivity, we evaluated the rate of generation of white cells in the yeast population, which is indicative of the loss of carotenogenic genes. The SM14 strain showed 3.9% white colonies when 72-h shake flask cultures were plated, indicating an intrinsic genetic instability of the carotenogenic construct (**Figure 3C**). Since this strain contained repeated URA3 loci (URA3 and ura3-52) flanking the carotenogenic genes, as well as the same TDH3 promoters and CYC1 terminators for these genes, homologous recombination between direct repeat sequences, e.g., URA3 and ura3-52, had likely occurred (**Figure 3A**). To avoid possible recombination of the construct and improve the genetic stability of this strain, the URA3 marker was deleted and a new strain, SM14-ΔURA3, was generated. Targeted deletion of the marker decreased the proportion of white colonies on YPD plates from 3.9% to 1.3% (**Figure 3C**). However, SM14-ΔURA3 still showed carotenogenic gene loss, suggesting inter-promoter or inter-terminator homologous recombination. Genomic PCR results further supported the loss of all carotenogenic genes in the white phenotype of SM14 and the loss of two genes (crtI and crtYB) in the white phenotype of SM14-ΔURA3 (**Figure 3D**). These results agree with the proposed recombination scheme, i.e., URA3 loci recombination in SM14 and inter-promoter recombination in SM14-ΔURA3, although other recombination events cannot be ruled out (e.g., inter-promoter recombination in SM14).

**Figure 3E** shows a more detailed analysis of the genetic stability of the SM14 strain. We observed an increase in the proportion of white cells as the number of generations progressed, consistent with a rate of appearance of 0.06% white cells/generation. Notably, after 15 generations (close to the duration of a fed-batch cultivation), the proportion of white cells reached ∼1.7%.
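As a rough illustration of how a per-generation rate translates into a population fraction, the sketch below assumes that white cells accumulate linearly with the number of generations. Both the linear model and the `start_percent` parameter are assumptions made here for illustration; the text reports only the rate and the observed proportion.

```python
def white_cell_percent(generations, rate_percent_per_gen, start_percent=0.0):
    """Percentage of white cells under an assumed linear accumulation model.

    start_percent is a hypothetical starting fraction (not given in the text).
    """
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return start_percent + rate_percent_per_gen * generations


# Rate reported in the text: 0.06% white cells per generation
increase = white_cell_percent(15, 0.06)
print(f"Increase over 15 generations: {increase:.2f} percentage points")
```

Under this simple model the rate alone accounts for about 0.9 percentage points over 15 generations, so the observed ∼1.7% would also reflect white cells already present at the start of the time course, a value not stated in this section.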


TABLE 2 | Fermentation parameters in fed-batch cultivations at 68 h cultivation for


#### DISCUSSION

#### Design and Construction Strategy of the Engineered Yeast Strains Determines Fermentation Performance in Fed-Batch Cultures

Reyes et al. (2014) reported that the β-carotene to biomass yield of the SM14 strain in batch cultures depended on the fermentation phase: 6 mg/gDCW during the glucose consumption phase, and 15 mg/gDCW during the ethanol consumption phase. These results were in line with our findings for the SM14 strain in shake flasks (5-7 mg/gDCW during the glucose consumption phase, and 21 mg/gDCW after the ethanol consumption phase, **Figure 1C**). However, these results did not scale to fed-batch cultures for this strain (**Figure 2A**). Despite the use of a conservative feeding strategy with decreasing µ, the SM14 strain was unable to assimilate glucose without producing ethanol, even at low specific growth rates. In fact, ethanol accumulated to such high levels that growth was completely arrested (**Figure 2A**).

Previous DNA microarray analysis of the SM14 strain showed that several genes involved in mitochondrial respiration and electron transport were downregulated relative to its parental strain (e.g., SDH1, COX4, QCR9 and SDH3; Reyes et al., 2014). Apparently, β-carotene accumulation was not sufficient to overcome the oxidative stress from the hydrogen peroxide shocks used to evolve the SM14 strain; thus, SM14 cells might reduce their mitochondrial oxidative capacity to lower the generation of reactive oxygen species (ROS). These transcriptional changes are consistent with the enhanced Crabtree effect observed in our fed-batch fermentations. This suggests that impaired mitochondrial respiration is the main cause of the poor fermentation performance of the SM14 strain.

Since the SM14 strain has a poor oxidative capacity, other feeding policies could be explored to increase its biomass productivity. Nevertheless, the incomplete knowledge of the genetic background of SM14 renders this strain unsuitable for transfer to larger scales. In contrast to SM14, the βcar1.2 strain exhibited a more robust and satisfactory performance in the fed-batch cultivations (**Figure 2B**). Despite accumulating only a quarter of the β-carotene concentration of SM14 in shake flask cultures after 72 h (**Figures 1C,D**), the βcar1.2 strain greatly surpassed the production performance of SM14 in bioreactors (**Table 2**).

In fed-batch cultures, βcar1.2 exhibited a high oxidative rate, consistent with a fully oxidative metabolism on glucose and the high biomass and β-carotene productivities achieved. These results clearly illustrate that the strain performance must be evaluated in a proper setting, such that initially modest but more robust producers are not discarded early.

#### Gene Integration Architecture Affects Strain Stability

The SM14 strain was constructed using the YIPlac211 YBIE plasmid reported by Verwaal et al. (2007), which has the classic features of an integrating plasmid for yeast, i.e., it possesses a URA3 marker that also serves as the recombination site for integration into the ura3-52 locus of the auxotrophic strain. As a consequence of a single recombination event, the vector containing the carotenogenic construct is integrated and flanked by the URA3 loci (URA3 and ura3-52). Since these genes share sufficient sequence identity, direct repeat recombination and loss of the whole construct may occur. This can happen even in a selective medium (drop-out without uracil), as the recombination event can leave the URA3 allele instead of ura3-52 (**Figures 3A,B**). Based on the time-course stability analysis up to 24 generations (**Figure 3E**), we note, however, that the genetic instability of the SM14 strain does not heavily impact fed-batch culture performance on the relevant time scale (ca. 15 generations). This is also supported by the β-carotene production profile in the fed-batch cultivation, which shows a proportional increase of carotene with biomass (**Figure 2A**). However, the genetic instability is still a disadvantage in pre-bioreactor stages, as it introduces practical difficulties (e.g., selection of pigmented colonies) when handling the strain before bioreactor cultures. Lange and Steinbüchel (2011) reported that episomal expression of the same carotenogenic construct led to loss of the entire plasmid due to segregational and structural instabilities, even when grown in selective medium. In this sense, integrating plasmids with one recombination site or repeated sequences are better choices than their episomal counterparts. Even so, the stability of classical integrating plasmids like YIp proved insufficient to obtain a robust strain for bioproduction (**Figure 3**).

To avoid the above stability issues, we built a new β-carotene producer strain using the plasmid set developed by Mikkelsen et al. (2012). In these plasmids, the target genes are flanked by two different sequences for integration, preventing excision of the construct by homologous recombination. Moreover, the use of two different promoters and two different terminators further decreases the probability of gene loss. Consistent with these features, white colonies were observed neither in βcar1.2 shake flask cultivations nor in stability studies, even in the absence of selective pressure (medium with uracil), confirming the high stability of the genomic construct (**Figures 3B,C**).
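The design reasoning above (repeated elements act as substrates for homologous recombination) can be sketched as a simple check that flags parts occurring more than once in a linear construct. This is only an illustration: the part lists below are hypothetical layouts based on the description in the text (gene order and the placeholder names `geneA`, `UP`, `P1`, etc. are assumptions), and the check ignores orientation and partial sequence homology.

```python
from collections import defaultdict

# Parts assumed to share enough sequence identity to recombine with each other
HOMOLOGY_FAMILY = {"ura3-52": "URA3"}


def repeated_parts(parts):
    """Return {family: [positions]} for parts appearing more than once."""
    positions = defaultdict(list)
    for index, part in enumerate(parts):
        family = HOMOLOGY_FAMILY.get(part, part)
        positions[family].append(index)
    return {family: idx for family, idx in positions.items() if len(idx) > 1}


# Hypothetical SM14-like layout: same promoter/terminator for every gene,
# whole construct flanked by the two URA3 alleles.
sm14_like = [
    "ura3-52",
    "TDH3p", "geneA", "CYC1t",
    "TDH3p", "geneB", "CYC1t",
    "TDH3p", "geneC", "CYC1t",
    "URA3",
]

# Hypothetical two-gene cassette with distinct flanks, promoters and terminators.
distinct_cassette = ["UP", "T1", "geneA", "P1", "P2", "geneB", "T2", "DOWN"]

print(repeated_parts(sm14_like))
print(repeated_parts(distinct_cassette))
```

In the first layout the flanking URA3 alleles, the promoters and the terminators are all flagged, whereas the second layout yields no repeats, in line with the stability difference described above.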

#### CONCLUSION

Robust fermentation performance is critical for the development and successful scale-up of biotechnological processes, and strain evaluation under realistic production conditions is essential to ensure appropriate process behavior. Here, we have compared the β-carotene production of two yeast strains built following radically different strategies. Our results indicated that the initially most promising evolved strain performed poorly under fed-batch production conditions compared to the conventionally built strain. These results highlight the impact of the methodology employed for constructing and screening superior strains on their subsequent scale-up. In particular, adaptive laboratory evolution affected the general physiology of the evolved strain, while the construct architecture affected its genetic stability, rendering it a poorly scalable producer under production conditions. In contrast, the conventionally built strain performed robustly, achieving higher biomass and β-carotene productivity in fed-batch cultivations. Overall, this work underscores the importance of carefully choosing the strain construction strategy and its optimization method with the end goal in mind. While some strains may perform well in batch cultures at small scales, and could be very useful for gene screening purposes, they may not necessarily perform adequately under production conditions. Production performance must therefore be evaluated directly and cannot be extrapolated from small-scale results.

#### AUTHOR CONTRIBUTIONS

JL and VC constructed the strains. JL and MP performed shake flask cultures of the strains. VC designed and carried out the stability experiments. MP, FS, and MI carried out the fed-batch fermentation experiments in the bioreactors. JL, VC, MP, and PS analyzed the data. JL, VC, MP, FS, and PS participated in the design and coordination of the study and drafted the manuscript. EA supervised the whole research and revised the manuscript. All authors read and approved the final manuscript.

#### FUNDING

This work was supported by FONDECYT grant number 1170745 from CONICYT and a joint seed fund between Texas A&M University and Pontificia Universidad Católica de Chile.

#### ACKNOWLEDGMENTS

We gratefully acknowledge the technical assistance of Gabriela Diaz, Bastián Pérez, Conrado Camilo, and Diego Bustos during the execution of this work. We also thank Dr. Katy Kao from Texas A&M University for kindly providing the SM14 yeast strain.

#### REFERENCES


**Conflict of Interest Statement:** JL and FS were employed by company DICTUC S.A. EA is an advisor for DICTUC S.A.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 López, Cataldo, Peña, Saa, Saitua, Ibaceta and Agosin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.