Synthetic Biology Tools for Genome and Transcriptome Engineering of Solventogenic Clostridium

Strains of the genus Clostridium are used to produce various value-added products, including fuels and chemicals. Developing any commercially viable production process requires a combination of strain development and fermentation process development. Strain development in Clostridium sp. can be achieved by random mutagenesis or by targeted gene alteration. However, strain improvement by targeted gene alteration has been challenging because of the lack of efficient tools for genome and transcriptome engineering in this organism. Recently, various synthetic biology tools have been developed to facilitate strain engineering of solventogenic Clostridium. In this review, we consolidate recent advances in toolbox development for genome and transcriptome engineering in solventogenic Clostridium. We review genome-engineering tools employing mobile group II introns, pyrE allele exchange, and CRISPR/Cas9, together with their application to strain development of Clostridium sp. Transcriptome engineering tools such as untranslated region (UTR) engineering and synthetic sRNA techniques are also discussed in the context of Clostridium strain engineering. Application of these techniques will facilitate the metabolic engineering of clostridia to develop improved strains with the requisite functional attributes. This might lead to an economically viable butanol production process with improved titer, yield, and productivity.


INTRODUCTION
Strain improvement for the production of fuels or any bio-based industrial product can be achieved by either of two strategies: (i) heterologous expression of metabolic pathway genes in non-native producers, or (ii) improvement of native producers (Arora et al., 2019; Banerjee et al., 2019; Choi et al., 2019). However, achieving titers in a heterologous host that match those of native organisms requires significant effort and carries a high chance of failure. Therefore, improving native strains with the necessary genes of the desired pathway and cofactor regeneration capability is the preferred strategy (Park et al., 2018; Rhie et al., 2019).
However, this strategy of strain improvement in Clostridium sp. has been limited by the lack of appropriate genome engineering tools.
The genus Clostridium comprises many industrially important strains for biorefinery applications such as cellulosic and hemicellulosic biomass degradation, carbon fixation, and advanced biofuel and platform chemical production, as well as strains used as anti-cancer therapeutics (Malaviya et al., 2012; Jones et al., 2016; Staedtke et al., 2016; Noh et al., 2018; Woo et al., 2018; Xin et al., 2018; Strecker et al., 2019a). The full potential of the genus for biorefinery applications can only be realized through advances in synthetic biology toolkits for strain improvement. During the last decade, tremendous progress has been made in developing genome engineering toolkits for strain engineering of Clostridium species. The development of genetic tools in Clostridium has been well reviewed by various research groups (Pyne et al., 2014; Liu Y. J. et al., 2015; Minton et al., 2016; Moon et al., 2016; Joseph et al., 2018; Kuehne et al., 2019; McAllister and Sorg, 2019; Wen et al., 2019b,c). Most of these reports focus on a couple of tools and explain them in depth.
In this work, we review the overall recent toolbox for genome and transcriptome engineering in solventogenic Clostridium, which could be used to develop improved clostridial strains for sustainable and commercially viable production at industrial scale. Brief features of the synthetic toolbox are summarized in Table 1. The consolidated information in this review will help the scientific and industrial sectors select appropriate tools for strain improvement in Clostridium.

MOBILE GROUP II INTRON BASED GENE-KNOCKOUT
Mobile group II intron technology is also known as "ClosTron" when applied to the genus Clostridium. In this method, a gene is disrupted by inserting the mobile intron into a target locus in the chromosome by a process termed retrohoming, which makes this technology a convenient, efficient, and specific method of gene disruption (Heap et al., 2007, 2010; Shao et al., 2007; Jang et al., 2012, 2014; Mohr et al., 2013; Liu Y. J. et al., 2015). Among the various mobile group II introns, Ll.LtrB and TeI3c/4c have been used extensively for gene knockout in solventogenic Clostridium. The Ll.LtrB intron includes an intron RNA domain and an open reading frame (ORF) domain. The intron RNA domain contains splicing sites consisting of exon binding site (EBS) 1, EBS 2, and δ (Figure 1A). The ORF domain contains genes encoding reverse transcriptase (RTase), maturase, and endonuclease (Figure 1A). The TeI3c/4c intron has been employed to develop a genome engineering tool for the thermophilic Clostridium thermocellum, since the intron could be melted down at high temperatures (Mohr et al., 2013).
Moreover, the Ll.LtrB intron has been further modified to include a retrotransposition-activated selection marker (RAM) (Zhong et al., 2003). The RAM consists of a selection marker and is inserted into the intron. A group I intron is inserted into the marker to inactivate the marker itself. The inserted group I intron is self-catalytically spliced out of the mRNA in an orientation-dependent manner, so that a functional marker gene can be expressed only after successful chromosomal insertion has occurred (Joseph et al., 2018).
In the first stage of clostridial gene knockout using the Ll.LtrB intron, single-gene knockout mutants, such as spo0A, pta, ack, ptb, buk, hbd, hydA, and argA variants, were constructed across the genus Clostridium, including C. acetobutylicum, C. beijerinckii, C. botulinum, and C. difficile (Heap et al., 2010; Dingle et al., 2011; Jang et al., 2012; Baban et al., 2013; Honicke et al., 2014; Lawson and Rainey, 2016; Liu et al., 2016). In 2012, a new method for second gene deletion was reported that removed the need to cure the plasmid used for the first gene deletion, and it resulted in the construction of various C. acetobutylicum strains, including pta/buk, pta/ctfB, ptb/buk, and the triple mutant pta/buk/ctfB. In this technique, two genes encoding erythromycin and chloramphenicol resistance enzymes were used as mutant selection markers, and the concept of plasmid incompatibility was employed. In 2014, the same group reported the fourth and fifth gene deletion processes for the construction of the C. acetobutylicum mutants pta/buk/ctfB/adhE1 and pta/buk/ctfB/adhE1/hydA (Jang et al., 2014).
Plasmid curing and off-target manipulation remain two of the major limitations of mobile group II intron technology (Wen et al., 2019c). The curing efficiency of the plasmid containing the mobile intron was enhanced by cloning pyrF (orotidine 5′-phosphate decarboxylase) into the ClosTron plasmid. The pyrF gene encodes an essential enzyme of pyrimidine biosynthesis that can use 5-fluoroorotic acid (FOA) as a substrate and convert it to a toxic compound, and it is widely used as a counter-selection marker (Sato et al., 2005; Tripathi et al., 2010; Heap et al., 2012). Because FOA is converted to a toxic compound by the pyrF carried on the ClosTron plasmid, only cured strains can survive in FOA-supplemented media. Cured strains can therefore be selected rapidly by the pyrF-based screening system, even on a single plate (Cui et al., 2014).
Another problem with ClosTron is that it can accidentally modify off-target sites in the genome and cause unexpected genotypes and phenotypes (Heap et al., 2012). To overcome this, a tightly regulated ClosTron system has been developed by introducing an L-arabinose inducible (ARAi) system to reduce the possibility of off-target insertion. To verify the impact of the inducible ClosTron, pSY6-mspI (Cui et al., 2012) and pGZ-pyrF-cipC (Cui et al., 2014) were modified by introducing the ARAi system in the C. cellulolyticum H10 pyrF strain. Notably, the off-target manipulation frequency decreased to 0 with the inducible ClosTron ARAi system.

Table 1 (partial)
(row heading not recovered)
References: Chen et al., 2005, 2007; Shao et al., 2007; Heap et al., 2007; Baban et al., 2013; Jang et al., 2012, 2014; Pyne et al., 2014; Liu Y. J. et al., 2015; Liu et al., 2016; Xu et al., 2015; Meaney et al., 2015, 2016
pyrE allele exchange
• Works on the principle of deactivating an easily screenable gene (pyrE)
• Complementing the mutant strain with a heterologous version of the pyrE gene as a counter-selective marker
References: Tripathi et al., 2010; Heap et al., 2012; Ng et al., 2013; Bankar et al., 2015; Zhang N. et al., 2015; Croux et al., 2016; Ehsaan et al., 2016a
CRISPR/Cas
• RNA-guided target-specific DNA cleavage system
• Originated from bacterial adaptive immune system

... Zhang N. et al., 2015; Ehsaan et al., 2016a; Minton et al., 2016). In allele-coupled exchange (ACE), a counter-selection marker is coupled to a desired double-crossover event (Figure 1B). The counter-selection marker enables the isolation of double crossovers formed through homologous recombination. The pyrE and codA genes are the most frequently used selectable markers in ACE technology. The gene codA encodes the enzyme cytosine deaminase, while pyrE encodes orotate phosphoribosyltransferase, a key enzyme in the de novo pathway for pyrimidine biosynthesis.
In clostridial genome editing, the pyrE allele has been primarily employed. Mutant and wild-type pyrE alleles confer resistance and sensitivity to FOA, respectively. The advantages of pyrE allele-based recombination include: (i) rapid insertion of heterologous DNA, (ii) double crossovers that yield stable integration, (iii) tolerance of large insert sizes, and (iv) higher efficiency than simple ClosTron and random mutagenesis (Ng et al., 2013; Ehsaan et al., 2016b; Minton et al., 2016).
The pyrE cassette consists of two arms, a right homology arm (RHA) and a left homology arm (LHA), with an internal region comprising the pyrE gene (Figure 1B). A plasmid is constructed with a selectable marker (an antibiotic resistance gene), an origin of replication, a sequence of ∼300 bp homologous to the pyrE gene, and a longer sequence of ∼1,200 bp homologous to the region adjacent to the 3′ end of pyrE. Once the pyrE-based pseudo-suicide plasmid is delivered into Clostridium cells, a single crossover is formed through homologous recombination. Subsequently, the single-crossover mutant is inoculated into media containing FOA and uracil (Heap et al., 2012). Metabolism of FOA kills the single-crossover cells carrying the active pyrE gene. Inactivation of pyrE happens only if recombination has occurred at both the 1,200-bp long sequence and the 300-bp short sequence, and FOA does not affect the cells obtained by such double crossovers (Ng et al., 2013). The final double crossovers are formed by ACE at the shorter 300-bp left homology arm through the second single crossover, which also leads to excision of the plasmid. This technology has been found to be applicable to many species of the genus Clostridium (Heap et al., 2012).
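The selection logic described above can be sketched as a small truth table. This is a minimal illustration, not a model taken from the cited studies: the `Cell` class and `grows` function are hypothetical names, and the rule that cells lacking functional pyrE need uracil is an assumption inferred from the uracil supplementation mentioned in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Hypothetical genotype record used only for this illustration."""
    name: str
    functional_pyrE: bool


def grows(cell: Cell, foa: bool, uracil: bool) -> bool:
    """Return True if the cell is expected to grow on the given medium."""
    # FOA is converted to a toxic product only when pyrE is active.
    if foa and cell.functional_pyrE:
        return False
    # Assumption: without functional pyrE the cell cannot make pyrimidines
    # de novo, so it needs uracil in the medium.
    if not cell.functional_pyrE and not uracil:
        return False
    return True


single = Cell("single crossover", functional_pyrE=True)
double = Cell("double crossover", functional_pyrE=False)

for cell in (single, double):
    outcome = "grows" if grows(cell, foa=True, uracil=True) else "is killed"
    print(f"{cell.name} on FOA + uracil: {outcome}")
```

Under these assumptions, only the double-crossover genotype survives on FOA plus uracil, which is why that medium isolates the desired recombinants.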
Butanol yield in C. pasteurianum has been reported to improve through application of the pyrE-based genome editing toolkit. For this, deletion mutations were created in three genes of C. pasteurianum, encoding hydrogenase (hydA), the redox response regulator (rex), and glycerol dehydratase (dhaBCE), using plasmid pMTL-KS01. This resulted in increased availability of NADPH in the cell due to depletion of 1,3-propanediol synthesis, which eventually contributed to improved butanol production (Schwarz et al., 2017). Similarly, successful expression of cellulosomal subunits in C. acetobutylicum has been achieved using this method (Kovacs et al., 2013). Other Clostridium species modified using ACE technology include C. acetobutylicum, C. sporogenes, and C. difficile (Heap et al., 2012; Ng et al., 2013; Bankar et al., 2015; Ehsaan et al., 2016b; Minton et al., 2016; Willson et al., 2016).

CRISPR/CAS BASED CLOSTRIDIA GENOME ENGINEERING
Clustered regularly interspaced short palindromic repeats (CRISPR), together with CRISPR-associated (Cas) proteins, have been developed into one of the most advanced genetic engineering tools (Doudna and Charpentier, 2014). As a bacterial genome manipulation tool, the CRISPR/Cas system needs a single guide RNA (sgRNA), a Cas endonuclease, and homology arms for recombination (Jiang et al., 2013). The Streptococcus pyogenes type II CRISPR system was the first to be exploited for genome engineering applications. The Cas9 endonuclease is the basis of CRISPR-based genome editing. Cas9 recognizes the protospacer adjacent motif (PAM) site (5′-NGG-3′ in S. pyogenes) and cleaves at the 3′ end of the target (Mojica et al., 2009; Garneau et al., 2010; Jinek et al., 2012) (Figure 1C).
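As an illustration of the PAM requirement, the following minimal sketch scans a DNA string for 5′-NGG-3′ motifs and reports the sequence immediately upstream of each one as a candidate protospacer. The function names, the 20-nt protospacer length, and the toy sequence are assumptions made for this example; a practical design tool would also score off-target matches, which is not attempted here.

```python
import re


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_spcas9_targets(seq: str, protospacer_len: int = 20):
    """Return (start, protospacer, pam) for every NGG PAM on the given strand.

    `start` is the 0-based index of the protospacer, which lies directly
    5' of the PAM. PAMs too close to the 5' end are skipped.
    """
    seq = seq.upper()
    hits = []
    # A lookahead is used so that overlapping PAMs (e.g. in "GGG") are all found.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        start = pam_start - protospacer_len
        if start < 0:
            continue
        hits.append((start, seq[start:pam_start], match.group(1)))
    return hits


# Toy sequence, not taken from any Clostridium genome.
toy = "ACGTACGTACGTACGTACGTAGGG"
for start, protospacer, pam in find_spcas9_targets(toy):
    print(start, protospacer, pam)

# Targets on the opposite strand can be found by scanning the reverse complement.
opposite_strand_hits = find_spcas9_targets(reverse_complement(toy))
```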
Moreover, modified CRISPR systems such as CRISPR interference (CRISPRi) with dCas9 have also been developed to knock down essential genes required for host survival (Jinek et al., 2012; Qi et al., 2013; Peters et al., 2016; Zheng et al., 2019). dCas9 has two silenced catalytic domains (D10A and H840A, in the RuvC-like and HNH domains, respectively), so it remains bound to the target DNA and blocks it instead of cleaving it. The CRISPRi/dCas9 system has also been applied to develop several mutant strains of Clostridium sp. (Wang et al., 2016a,b; Wen et al., 2017; Woolston et al., 2018; Muh et al., 2019).
Additionally, endogenous CRISPR systems have been developed in C. pasteurianum and C. tyrobutyricum to overcome the toxic effects associated with the Cas9 and Cpf1 endonucleases (Zhang et al., 2018b). An endogenous CRISPR system uses the endonuclease encoded by the host genome and can carry multiple pre-crRNAs under one promoter, which facilitates multiple genome modifications using a single plasmid (Luo et al., 2014; Makarova et al., 2015; Pyne et al., 2016; Zhang et al., 2018b).

SYNTHETIC SRNA AND UNTRANSLATED REGION ENGINEERING AS POTENTIAL DOMAINS FOR CLOSTRIDIUM STRAIN IMPROVEMENT
Prokaryotic small RNAs (sRNAs) are short strands of ribonucleotides (about 50-500 nucleotides) that play a regulatory role in maintaining cellular processes (Gottesman, 2004). Modeled on natural sRNAs, synthetic small RNAs are produced to alter gene expression in an organism. Many naturally occurring sRNAs have been detected and analyzed in Clostridium sp. (Chen et al., 2011), which has led to the development of synthetic sRNAs.
sRNA-mediated regulation usually results in repression of the gene whose sequence is complementary to the sRNA, and it is mediated by a protein called Hfq (De Lay et al., 2013). Hfq is an RNA chaperone protein that stabilizes sRNA-mRNA binding. Translation is prevented when the sRNA binds to the ribosome binding site (RBS) or masks access to the start codon (Yoo et al., 2013). Recently, Cho and Lee (2017) reported the development of a synthetic small regulatory RNA (sRNA) system for controlled gene expression in C. acetobutylicum, consisting of a target recognition site, the MicC scaffold, and the RNA chaperone Hfq (Figure 1D). In this study, C. acetobutylicum Hfq was found to be ineffective in binding the Escherichia coli MicC scaffold-based synthetic sRNA, whereas Hfq from E. coli itself resulted in much higher knockdown efficiency. This E. coli MicC-Hfq sRNA system was used to knock down adhE1 gene expression, resulting in a 40% reduction in butanol production. Further, this synthetic sRNA system was used to knock down pta gene expression in the PJC4BK strain, resulting in the PJC4BK (pPta-HfqEco) strain with an improvement in butanol titer from 14.9 to 16.9 g/l (Cho and Lee, 2017).
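The target recognition site of such a synthetic sRNA is simply the reverse complement of the mRNA region to be masked. The sketch below shows that step only, under stated assumptions: the function names, the 12-nt binding length used in the example, the placement of the recognition site 5′ of the scaffold, and the toy mRNA are all hypothetical, and the MicC scaffold is represented by a placeholder string rather than its real sequence.

```python
def reverse_complement_rna(rna: str) -> str:
    """Reverse complement of an RNA string containing only A, C, G, U."""
    return rna.upper().translate(str.maketrans("ACGU", "UGCA"))[::-1]


def design_antisense(mrna: str, start_codon_index: int, length: int = 24) -> str:
    """Antisense sequence pairing with `length` nt of mRNA from the start codon."""
    target = mrna.upper()[start_codon_index:start_codon_index + length]
    if len(target) < length:
        raise ValueError("mRNA is too short for the requested binding length")
    return reverse_complement_rna(target)


def assemble_srna(antisense: str, scaffold: str) -> str:
    """Join the recognition site and scaffold (order is an assumption here)."""
    return antisense + scaffold


# Placeholder only: the real MicC scaffold sequence is not reproduced here.
SCAFFOLD_PLACEHOLDER = "<MicC-scaffold>"

# Toy transcript, not a real C. acetobutylicum mRNA. In practice the start
# codon position should come from the genome annotation, not from find().
toy_mrna = "AGGAGGUUAUAUGAAACCCGGGUUU"
start = toy_mrna.find("AUG")
antisense = design_antisense(toy_mrna, start, length=12)
print(assemble_srna(antisense, SCAFFOLD_PLACEHOLDER))
```

Starting the binding region at the start codon follows the mechanism described in the text, where translation is blocked by masking the RBS or the start codon.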
Untranslated regions (UTRs) are non-coding regions of the mRNA that help regulate gene expression. UTRs are present at both ends of the mRNA (5′-UTR and 3′-UTR) (Figure 1E). There are sufficient reports to confirm that the 5′-UTR in C. acetobutylicum has a regulatory effect on the secondary structure of adhE1, which is involved in solvent production (Thormann et al., 2002; Scotcher et al., 2003). Lee et al. (2016) recently found that the presence of a single-stranded short 5′-UTR in the solventogenic C. acetobutylicum leads to decreased gene expression (Figure 1E). The insertion of a small stem-loop structure in the 5′-UTR was found to increase mRNA stability and gene expression by 4.6-fold, without any modification of the promoter or RBS. The 3′-UTR, on the other hand, mostly harbors the terminator sequence for transcription (Richard and Manley, 2009). sRNA sequences containing the codons that regulate the post-transcriptional and translational machinery are also attached to the 3′-UTR. Most importantly, the 3′-UTR confers stability on the mRNA (Zhao et al., 2018). Although studies of 3′-UTR regions in Clostridium are very limited, the presence of transcripts with long 3′-UTRs has been confirmed in Clostridium (Ralston and Papoutsakis, 2018). Although several RNAseq studies have been reported in Clostridium, only a few present data on mRNA regulation based on 5′- and 3′-UTRs, leaderless transcripts, and non-coding RNA (Soutourina et al., 2013; Wilson et al., 2013; Sedlar et al., 2018). Further research in RNAseq and proteomics will explore the complex regulation that controls mRNA stability and degradation, which will be useful for constructing synthetic toolkits.
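To make the distinction between a single-stranded 5′-UTR and one carrying a small stem loop concrete, the sketch below searches an RNA string for perfect Watson-Crick hairpins. It is a deliberately simple illustration and not the method used in the cited work: the function names, the stem and loop length limits, and the toy sequences are assumptions, and G-U wobble pairs and folding energies are ignored.

```python
def reverse_complement_rna(rna: str) -> str:
    """Reverse complement of an RNA string containing only A, C, G, U."""
    return rna.upper().translate(str.maketrans("ACGU", "UGCA"))[::-1]


def find_stem_loops(rna: str, stem_len: int = 4, min_loop: int = 3, max_loop: int = 8):
    """Return (stem5_start, loop_len, stem3_start) for perfect hairpins.

    A hairpin is reported when a stretch of `stem_len` nt is followed,
    after a loop of min_loop..max_loop nt, by its exact reverse complement.
    """
    rna = rna.upper()
    n = len(rna)
    hits = []
    for i in range(n - 2 * stem_len - min_loop + 1):
        partner = reverse_complement_rna(rna[i:i + stem_len])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem_len + loop
            if j + stem_len > n:
                break
            if rna[j:j + stem_len] == partner:
                hits.append((i, loop, j))
    return hits


# Toy 5'-UTR fragments, not taken from C. acetobutylicum.
print(find_stem_loops("GGGCAAAAGCCC"))   # contains one 4-bp stem with a 4-nt loop
print(find_stem_loops("AAAAAAAAAAAA"))   # no self-complementary stem
```

A dedicated RNA folding program would be needed to judge whether a candidate stem is actually stable in a real transcript.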
In conclusion, many Clostridium sp. have the potential to be used at industrial scale to produce value-added chemicals, including butanol as a fossil fuel substitute. To date, their true potential has been underexploited because of challenges in strain improvement and the unavailability of genome and transcriptome editing tools for this genus. Nevertheless, during the last decade, synthetic biology toolkits for Clostridium sp. have expanded rapidly (Figure 1F). Furthermore, recent advances, such as the phage serine integrase-mediated site-specific genome engineering technique for C. ljungdahlii, could be extended to other Clostridium species (Huang et al., 2019). Synthetic biology techniques that have been applied in other microorganisms may also be adapted to solventogenic clostridia in the near future, including CRISPR-associated site-specific insertion of transposons and base editing (Ronda et al., 2015; Zhang et al., 2016; Lim and Choi, 2019; Strecker et al., 2019b). Improved clostridial strains could be a starting point for developing industrial-scale, commercially viable bio-based fuel and chemical production with Clostridium sp. under a consolidated bioprocessing concept (Wen et al., 2019a). Furthermore, these synthetic biology tools could be applied to other biotechnology fields, such as the degradation of plastics, including polyethylene terephthalate and polyethylene.

AUTHOR CONTRIBUTIONS
Y-SJ and AM conceived the project. All authors analyzed the literature, compiled data, planned content, wrote the manuscript, read, and approved the final manuscript.