A TetR-Family Protein (CAETHG_0459) Activates Transcription From a New Promoter Motif Associated With Essential Genes for Autotrophic Growth in Acetogens

Acetogens can fix carbon (CO or CO2) into acetyl-CoA via the Wood–Ljungdahl pathway (WLP) that also makes them attractive cell factories for the production of fuels and chemicals from waste feedstocks. Although most biochemical details of the WLP are well understood and systems-level characterization of acetogen metabolism has recently improved, key transcriptional features such as promoter motifs and transcriptional regulators are still unknown in acetogens. Here, we use differential RNA-sequencing to identify a previously undescribed promoter motif associated with essential genes for autotrophic growth of the model-acetogen Clostridium autoethanogenum. RNA polymerase was shown to bind to the new promoter motif using a DNA-binding protein assay and proteomics enabled the discovery of four candidates to potentially function directly in control of transcription of the WLP and other key genes of C1 fixation metabolism. Next, in vivo experiments showed that a TetR-family transcriptional regulator (CAETHG_0459) and the housekeeping sigma factor (σA) activate expression of a reporter protein (GFP) in-frame with the new promoter motif from a fusion vector in Escherichia coli. Lastly, a protein–protein interaction assay with the RNA polymerase (RNAP) shows that CAETHG_0459 directly binds to the RNAP. Together, the data presented here advance the fundamental understanding of transcriptional regulation of C1 fixation in acetogens and provide a strategy for improving the performance of gas-fermenting bacteria by genetic engineering.

INTRODUCTION
The Wood-Ljungdahl pathway (WLP) of acetogens is speculated to be the first biochemical pathway on Earth, having emerged when the atmosphere was still highly reduced and rich in CO, CO2, and H2 (Russell and Martin, 2004; Fuchs, 2011; Weiss et al., 2016). These C1 gases can be converted into acetyl-CoA through the WLP (Wood, 1991; Ragsdale and Pierce, 2008), and acetogens are the only known organisms using the WLP as a terminal electron-accepting, energy-conserving process to fix CO2 into biomass (Drake et al., 2006; Fuchs, 2011). This pathway is responsible for the production of acetic acid in quantities surpassing the billion ton mark annually. It is estimated that the pathway contributes to fixing ∼20% of the CO2 on Earth (Drake et al., 2006; Ljungdahl, 2009). All this takes place with the WLP operating at the edge of thermodynamic feasibility (Schuchmann and Müller, 2014) and requires the use of the third mode of energy conservation, electron bifurcation, which likely contributed to the emergence of life on Earth (Herrmann et al., 2008; Li et al., 2008; Nitschke and Russell, 2011). Acetogens are also attractive cell factories for the sustainable production of fuels and chemicals from gaseous waste feedstocks (e.g., syngas from gasified municipal solid waste and industrial waste gases) (Dürre and Eikmanns, 2015; Claassens et al., 2016; Liew et al., 2016; Molitor et al., 2016). While the field has advanced enormously in the last decade (Liew et al., 2016; Molitor et al., 2016), a better fundamental understanding of acetogen metabolism is needed to guide rational metabolic engineering, for example, to increase substrate uptake or product yields.
Recent quantitative studies of acetogen physiology have expanded understanding of their metabolism considerably (reviewed in Schuchmann and Müller, 2014; Molitor et al., 2017). Although most biochemical details of the WLP are well established (Ragsdale, 1991, 1997, 2008) and systems-level understanding of acetogen metabolism has recently improved (Valgepea et al., 2017a, 2018), key transcriptional features such as promoter motifs and transcriptional regulators controlling the expression of genes needed for autotrophic growth are still unknown. This information could benefit acetogen metabolic engineering and improve our understanding of their complex transcriptional regulation (Nagarajan et al., 2013; Tan et al., 2013; Marcellin et al., 2016; Aklujkar et al., 2017). Prediction of promoter motifs based solely on computational analysis of the organism's genome sequence has the drawback of detecting promoter-like sequences across the genome, a problem that is particularly pronounced for non-conserved DNA motifs (Patrik, 2006). An instrumental step toward more accurate promoter motif identification was the development of the differential RNA-sequencing (dRNA-Seq) technology, first described in 2010 by Sharma and colleagues (Sharma et al., 2010) for the human pathogen Helicobacter pylori.
dRNA-Seq enables the experimental determination of transcription start sites (TSSs), and correct mapping of TSSs enables genome-wide identification of promoters and gene expression regulatory sequences, besides providing experimental data for a more accurate genome annotation. Once a TSS has been experimentally determined, promoter sequences can be mapped from there. Thus, characterization of the transcriptional architecture (i.e., TSSs and promoter motifs) and a more accurate annotation of acetogen genomes have the potential to yield valuable insights into the complex transcriptional regulation of acetogens. To date, only one study has determined TSSs in acetogens, using Eubacterium limosum (Song et al., 2017). Here, we used dRNA-Seq as a tool to identify the TSSs in the model-acetogen Clostridium autoethanogenum grown under autotrophic and heterotrophic conditions. The subsequent search for promoter motifs detected a previously undescribed motif associated with essential genes in acetogens. We then provide experimental evidence for the relevance of this new promoter motif (termed hereafter Pcauto) by identifying a TetR-family protein that activates gene expression from this motif by directly binding to the RNA polymerase.

MATERIALS AND METHODS
Bacterial Strains and Growth Conditions
Clostridium autoethanogenum strain DSM 10061 was obtained from The German Collection of Microorganisms and Cell Cultures (DSMZ). Cells were grown as described before (Marcellin et al., 2016) to acquire samples for differential RNA-sequencing (dRNA-Seq). Briefly, heterotrophic and autotrophic growth were investigated in serum bottles on fructose (5 g/L) and on steel mill off-gas (35% CO, 10% CO2, 2% H2, and 53% N2), respectively. Cells were grown at 37 °C on a shaker (100 RPM, revolutions per minute) and sampled for dRNA-Seq analysis from the exponential growth phase (OD600 = 0.5-0.6).

Differential RNA-Sequencing (dRNA-Seq)
Extraction and preparation of RNA for cDNA library construction were performed as described elsewhere (Marcellin et al., 2016). Briefly, RNA was extracted using TRIzol followed by column purification with RNeasy (Qiagen). The resulting total RNA pools were sent to Vertis Biotechnologie AG (Freising, Germany) for sequencing. The cDNA libraries were prepared using the 5′ tagRACE method (Fouquier et al., 2011). Firstly, the 5′ Illumina TruSeq sequencing adapter carrying sequence tag TCGACA was ligated to the 5′-monophosphate groups (5′P) of processed transcripts (TAP- in Figure 1A). Samples were then treated with Tobacco Acid Pyrophosphatase (TAP) to convert 5′-triphosphate (5′PPP) structures of primary transcripts into 5′P ends, to which the 5′ Illumina TruSeq sequencing adapter carrying sequence tag GATCGA was ligated (TAP+ in Figure 1A). Next, first-strand cDNA was synthesized using an N6 randomized primer to which the 3′ Illumina TruSeq sequencing adapter was ligated after fragmentation.
The 5′ cDNA fragments were amplified with PCR using a proofreading enzyme and primers designed for TruSeq sequencing according to the manufacturer's instructions. The main advantage of using the 5′ tagRACE method (Fouquier et al., 2011) for dRNA-Seq comes from amplifying the 5′ ends of processed and primary transcripts in a single PCR reaction, which preserves their quantitative representation in an RNA pool. Finally, 5′ cDNAs were purified using the Agencourt AMPure XP Kit (Beckman Coulter Genomics) and analyzed by capillary electrophoresis before sequencing the single-end libraries using the Illumina NextSeq 500 system and a MID 150 Kit with 75 bp read length.
[Figure legend fragment: TSSs were determined de novo from dRNA-Seq data with TSSAR (Amman et al., 2014) using p-value 1e-3, noise threshold 10, merge range 5. The Shine-Dalgarno sequence was searched 30 nt upstream of annotated genes (CP006763.1 and NC_022592.1) using the MEME software (Bailey et al., 2009) and the same parameters as for the promoter motif search, except for -nmotifs 10, -maxw 30. See section Materials and Methods for details.]

Determination of Transcription Start Sites (TSSs)
Sequencing reads were aligned and mapped to the genome of C. autoethanogenum DSM 10061 (CP006763.1) using the software TopHat2 (Kim et al., 2013) without trimming or removal of any reads. Reads were processed with the TSSAR (TSS Annotation Regime) software (Amman et al., 2014) for automated de novo determination of TSSs from dRNA-Seq data using the following parameters: p-value 1e-3, noise threshold 10, merge range 5. The identified TSSs were classified as primary (within 250 nt upstream of an annotated gene), internal (within an annotated gene), antisense (on the opposite strand of an annotated gene), or orphan (not assigned to any of the previous classes) (Figure 1B). Since our main aim was the identification of the TSSs of essential genes for autotrophic growth in acetogens (e.g., WLP), we focused on the primary TSSs.
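The four TSS classes can be illustrated with a minimal sketch. It is not the TSSAR implementation: the `Gene` tuple, the function name `classify_tss`, the 1-based inclusive coordinates, and the precedence order (primary, then internal, then antisense) are assumptions made for illustration, and antisense is read here as falling within the span of a gene annotated on the opposite strand.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    start: int   # 1-based inclusive, start <= end on either strand
    end: int
    strand: str  # "+" or "-"

def classify_tss(pos, strand, genes, upstream=250):
    """Return 'primary', 'internal', 'antisense', or 'orphan' for one TSS."""
    found = set()
    for g in genes:
        inside = g.start <= pos <= g.end
        if g.strand != strand:
            if inside:
                found.add("antisense")
        elif inside:
            found.add("internal")
        elif strand == "+" and g.start - upstream <= pos < g.start:
            found.add("primary")   # upstream of a forward-strand gene
        elif strand == "-" and g.end < pos <= g.end + upstream:
            found.add("primary")   # upstream of a reverse-strand gene
    # Assumed precedence when a TSS qualifies for several classes.
    for label in ("primary", "internal", "antisense"):
        if label in found:
            return label
    return "orphan"

# Hypothetical annotation: one gene per strand.
genes = [Gene(1000, 2000, "+"), Gene(3000, 4000, "-")]
for pos, strand in [(900, "+"), (1500, "+"), (1500, "-"), (4100, "-"), (2500, "+")]:
    print(pos, strand, classify_tss(pos, strand, genes))
```

For a reverse-strand gene, "upstream" means higher genome coordinates, which is why the window is taken beyond `g.end`.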

Search for Promoter Motifs and the Shine-Dalgarno Sequence
To determine promoter motifs, we searched for consensus sequence motifs 50, 100, and 150 nt upstream of primary TSSs using the MEME software (Bailey et al., 2009) with the following parameters: -dna, -maxsize 10000000, -mod zoops, -nmotifs 50, -minw 4, -maxw 50, -revcomp, -oc. Only motifs with an E-value ≤ 0.05 and at least 13 associated TSSs were considered (i.e., motifs associated with at least two genes, since a single gene showed at most 12 TSSs; Figure 1C), and these were ranked based on the number of assigned TSSs (Supplementary File 1).
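The selection and ranking rule above can be sketched as follows. The `Motif` tuple, the function name `select_motifs`, and the example motif names and values are hypothetical; parsing of actual MEME output is not shown.

```python
from typing import NamedTuple

class Motif(NamedTuple):
    name: str
    e_value: float
    n_tss: int  # number of primary TSSs assigned to the motif

def select_motifs(motifs, max_e_value=0.05, min_tss=13):
    """Keep motifs passing both thresholds, ranked by assigned TSSs (descending)."""
    kept = [m for m in motifs if m.e_value <= max_e_value and m.n_tss >= min_tss]
    return sorted(kept, key=lambda m: m.n_tss, reverse=True)

# Hypothetical MEME results, for illustration only.
candidates = [
    Motif("motif_A", 1e-20, 450),
    Motif("motif_B", 0.03, 12),    # rejected: could stem from a single gene
    Motif("motif_C", 0.20, 80),    # rejected: E-value too high
    Motif("motif_D", 0.001, 60),
]
ranked = select_motifs(candidates)
print([m.name for m in ranked])
```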
To search for the Shine-Dalgarno sequence, 30 nt upstream of annotated genes (CP006763.1 and NC_022592.1) were searched with the MEME software (Bailey et al., 2009) using the same parameters as in the promoter motif search, except for -nmotifs 10, -maxw 30.
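Both searches rely on strand-aware upstream windows (50, 100, or 150 nt before a TSS; 30 nt before an annotated gene). A minimal sketch of such an extraction is given below, assuming 1-based positions and a linear sequence (windows are truncated at the sequence ends and circularity is ignored). The function names and the toy sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def upstream_window(genome, pos, strand, length):
    """Return up to `length` nt immediately upstream of 1-based `pos`,
    written 5'->3' on the strand of the feature."""
    if strand == "+":
        start = max(0, pos - 1 - length)
        return genome[start:pos - 1]
    # On the reverse strand, upstream lies at higher coordinates.
    end = min(len(genome), pos + length)
    return reverse_complement(genome[pos:end])

toy = "AAACCCGGGTTT"  # hypothetical sequence
for length in (3, 6):
    print(length, upstream_window(toy, 7, "+", length), upstream_window(toy, 9, "-", length))
```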

Search for the New Promoter Motif in Acetogens
Occurrence of the new promoter motif (see Results) in C. autoethanogenum, C. ljungdahlii, C. ragsdalei, C. coskatii, M. thermoacetica, and E. limosum was determined using the FIMO tool (Grant et al., 2011) within the MEME software by searching for the sequence up to 300 nt upstream of annotated genes (since no TSS data is available) with default FIMO parameters. Occurrence in each acetogen relative to C. autoethanogenum was normalized with the number of annotated genes.
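One plausible reading of this normalization is a ratio of per-gene hit frequencies, sketched below. The exact formula is an assumption, as are the function name and the example counts, which are not data from this study.

```python
def relative_occurrence(hits, n_genes, ref_hits, ref_n_genes):
    """Motif hits per annotated gene, relative to the reference organism.

    Assumed formula: (hits / n_genes) / (ref_hits / ref_n_genes).
    """
    return (hits / n_genes) / (ref_hits / ref_n_genes)

# Hypothetical counts for illustration only.
print(relative_occurrence(hits=10, n_genes=2000, ref_hits=40, ref_n_genes=4000))
```

Under this reading, a value of 1 means the motif occurs as often per gene as in C. autoethanogenum.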

DNA-Binding Protein Assay
Firstly, C. autoethanogenum DSM 19630 cells were acquired from autotrophic bioreactor chemostat cultures (CO or CO + H2) described in a separate work (Valgepea et al., 2018). Briefly, cells were grown in bioreactor chemostat cultures in the chemically defined medium on either CO or CO + H2 at 37 °C, pH 5, a dilution rate of ∼1 day−1 (µ ∼0.04 h−1), and a biomass concentration of ∼1.4 gDCW/L. Cells were pelleted by immediate centrifugation (20,000 × g for 2 min at 4 °C) and stored at −80 °C until analysis.
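The two rates quoted above are the same quantity in different units: at chemostat steady state the specific growth rate equals the dilution rate. The short calculation below shows the conversion; the doubling time is derived here for illustration and is not reported in the source.

```python
import math

def per_day_to_per_hour(rate_per_day):
    """Convert a rate from day^-1 to h^-1."""
    return rate_per_day / 24.0

dilution_rate_per_day = 1.0                      # ~1 day^-1, as stated in the text
mu = per_day_to_per_hour(dilution_rate_per_day)  # steady state: mu = D
doubling_time_h = math.log(2) / mu               # derived value, for illustration

print(f"mu = {mu:.3f} h^-1, doubling time = {doubling_time_h:.1f} h")
```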
Frozen pellets were thawed, resuspended in the BS/THES buffer described in Jutras et al. (2012) with pH adjusted to 7.0, and passed five times through the EmulsiFlex-C5 High Pressure Homogenizer (Avestin Inc.) according to the manufacturer's instructions, with the final sample volume adjusted to 35 mL with the BS/THES buffer. Samples were then centrifuged (35,000 × g for 15 min at 4 °C) and the supernatant was filtered using a 0.22 µm filter (Merck).
The DNA-binding protein assay was based on a pulldown/DNA affinity chromatography method described by Jutras et al. (2012) with the following modifications. The DNA sequences were 125 bp long and contained the respective promoter sequence in the middle with flanking regions downstream and upstream. The pH of the buffers was adjusted to 7.
The bait-target/ligand binding step was performed with 1 mL of cell extract without the addition of non-specific competitor DNA.
Next, either salmon sperm DNA (Thermo) or Poly dI-dC (Sigma) was used as non-specific competitor DNA in the subsequent washing steps. Briefly, Dynabeads™ M-280 Streptavidin (Thermo Fisher Scientific) were mixed with DNA containing the promoter sequence of either CAETHG_1615 or CAETHG_1617 (WLP genes assigned the new promoter motif) or CAETHG_3224 (a glycolytic gene used as a control for our assay since it was assigned the well-known TATAAT motif, which should yield binding of the RNAP and the housekeeping σ factor σA). Next, the cell extract was added and samples were incubated for 30 min at room temperature. This was followed by two washing steps with the BS/THES buffer (Jutras et al., 2012) to remove proteins not bound to the target DNA. Finally, protein elution was performed in Tris-HCl (pH 7) with a successively increasing concentration of NaCl (200, 300, 500, and 750 mM, 1 M, and 2 M). The eluted protein solutions were analyzed by gel electrophoresis on NuPAGE® Novex® Bis-Tris gels (Invitrogen) and visualized using SYPRO® Ruby (Molecular Probes) according to the manufacturer's instructions. The 500 mM NaCl eluate yielded the most prominent bands and therefore this eluate was used for further analysis. No bands were observed in the negative control when water was used instead of DNA (data not shown), confirming that the identified proteins were pulled down by the DNA sequences.

Protein Digestion for Mass Spectrometry-Based Proteomics
For the digestion of proteins from excised gel bands, the gel bands of interest were cut and de-stained for 1 h with a buffer of 50 mM ammonium bicarbonate (ABC) in 50% acetonitrile (ACN). Following buffer removal, 50 µL of 10 mM DTT was added and samples were incubated for 30 min at 60 °C to reduce disulfide bonds. Next, the DTT solution was removed, 50 µL of 55 mM iodoacetamide (IAA) was added, and samples were incubated for 30 min in the dark at room temperature to alkylate sulfhydryl groups. After removal of the IAA solution, gel pieces were washed twice with 100 µL of 50 mM ABC and dehydrated with 100% ACN. Protein digestion was performed overnight at 37 °C by rehydrating gel pieces with 50 µL of Trypsin/Lys-C mix (10 ng/µL in 25 mM ABC) and 100 µL of ABC.
Extraction of peptides from gel pieces was performed by repeating the following steps five times: addition of 100 µL of 0.1% formic acid (FA) in 50% ACN and sonication of samples in a water bath for 10 min. Samples were then concentrated to near dryness using a centrifugal vacuum concentrator (Eppendorf) and resuspended in 50 µL of 0.1% FA. Next, samples were desalted using C18 ZipTips (Merck Millipore) as follows: the column was wetted using 0.1% FA in 100% ACN, equilibrated with 0.1% FA in 70% ACN, and washed with 0.1% FA before loading the sample and washing again with 0.1% FA. Finally, peptides were eluted with 0.1% FA in 70% ACN and then diluted 10-fold with 0.1% FA for mass spectrometry analysis.
For the digestion of proteins from the whole purified DNA-bound material, the material from the DNA-binding protein assay was incubated for 30 min at 95 °C. Next, 30 µL of 10 mM DTT was added and samples were incubated for 45 min at 55 °C to reduce disulfide bonds. Then, 40 µL of 55 mM IAA was added and samples were incubated for 30 min in the dark at room temperature to alkylate sulfhydryl groups. Protein digestion was performed overnight at 37 °C using 50 µL of Trypsin/Lys-C mix (10 ng/µL in 25 mM ABC) and stopped by lowering the pH to 3 using FA. Finally, the samples were desalted and prepared for mass spectrometry analysis as described above.

Protein Identification Using Mass Spectrometry
Detection of proteins in both the digestion products of gel band excision and the whole captured material was performed using a QTOF Sciex 5600 or a Thermo Orbitrap Elite mass spectrometer (depending on instrument availability) with details described elsewhere (Kappler and Nouwens, 2013; Yang et al., 2016) with a modified liquid chromatographic (LC) gradient. Briefly, a Shimadzu Prominence nanoLC system was used for desalting (on an Agilent C18 trap) and separating peptides (on a Vydac Everest C18 column) using a gradient consisting of 10-60% buffer B over 30 min followed by 60-97% buffer B over 8 min, where buffer A was 1% ACN in 0.1% FA and buffer B was 80% ACN in 0.1% FA. Protein identification was performed using the software ProteinPilot v5.0 (ABSciex) with the Paragon Algorithm against the NC_022592.1 and CP006763 genome annotations with the following search parameters: Trypsin + LysC digestion; IAA as cysteine alkylation; thorough search effort; FDR analysis. Only proteins below 1% false discovery rate (FDR; estimated global) and with at least two peptides with more than 95% confidence were considered as identified.
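The acceptance criteria in the last sentence can be written as a simple filter. This is a sketch only: the function name, the argument layout, and the example values are hypothetical and do not reflect the ProteinPilot export format.

```python
def is_identified(global_fdr, peptide_confidences,
                  max_fdr=0.01, min_peptides=2, min_confidence=95.0):
    """Apply the two criteria: estimated global FDR below 1% and at least
    two peptides with more than 95% confidence."""
    confident = [c for c in peptide_confidences if c > min_confidence]
    return global_fdr < max_fdr and len(confident) >= min_peptides

# Hypothetical proteins: (estimated global FDR, peptide confidences in %).
examples = {
    "protein_1": (0.002, [99.0, 97.5, 80.0]),  # passes both criteria
    "protein_2": (0.002, [99.0, 95.0]),        # only one peptide above 95%
    "protein_3": (0.020, [99.0, 99.0, 99.0]),  # FDR too high
}
accepted = [name for name, (fdr, conf) in examples.items() if is_identified(fdr, conf)]
print(accepted)
```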
Proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD014421.

Molecular Biology Techniques
The full list of bacterial strains, plasmids, and primers used in this work for the in vivo transcription assay and protein overexpression step is shown in Supplementary File 2. Luria-Bertani (LB) broth or agar with antibiotics was used for growth.
Escherichia coli DH5α was used as the cloning strain, and transformations were performed according to the manufacturer's instructions (BIOLINE). E. coli BL21 was used in the in vivo transcription assay and protein overexpression step. E. coli BL21 chemically competent cells were prepared using the RuCl2 method (Green and Rogers, 2013).
PCR amplification of targeted sequences was performed using the Phusion polymerase (NEB) and the OneTaq polymerase (NEB). Plasmids were assembled using standard ligation with the T4 DNA ligase or using Gibson assembly (Gibson et al., 2009).

Construction of a σ-Factor Candidate Expression System in E. coli
Candidates for potential σ factors were selected based on protein identification using mass spectrometry (see above) from proteins annotated as transcriptional regulators (Table 1). Additionally, we built plasmids for the L-seryl-tRNA(Sec) selenium transferase (CAETHG_2839) (identified as a stronger band in the pull-down assay, Figure 3B) and the housekeeping σ factor in Clostridia (σA) (CAETHG_2917) (Figure 3B).
The potential σ factor candidates were cloned into plasmid pET28a+ to be expressed under the control of a T7 promoter. DNA sequences were PCR amplified using the primers shown in Supplementary File 2 and purified using a QIAGEN kit. Next, the plasmid pET28a+ was linearized using restriction enzymes NdeI and HindIII and purified using a QIAGEN kit. Codon optimization was required to express the σ factor candidates TetR-family protein (CAETHG_0459) and σA (CAETHG_2917), whose DNA sequences were therefore synthesized as gene block (gBlock®) fragments.
Plasmids with the σ factor candidates were then assembled by Gibson assembly using equimolar concentrations of the linearized backbone plasmid and the PCR fragment in a 20 µL reaction. After incubation at 50 °C, 5 µL of the Gibson mix was used to transform E. coli DH5α by heat shock. After recovery in SOC media at 37 °C for 60 min, 100 µL of cells were spread on LB agar plates containing kanamycin (50 µg/mL). Plates were then incubated at 37 °C for 16 h and kanamycin-resistant colonies were tested by colony PCR for proper assembly using pET_conf(FWD)/pET_conf(REV) primers (Supplementary File 2). A colony that tested positive for assembly was then picked and grown overnight in LB media containing kanamycin. Plasmids were recovered from 5 mL of overnight culture using a QIAGEN miniprep kit, and the assembly was verified by digestion profile. Plasmids were then used to transform E. coli BL21 chemically competent cells (described above).
Escherichia coli BL21 strains harboring σ factor candidate-expressing plasmids were then grown overnight in LB media containing kanamycin and 1 mM IPTG (isopropyl β-D-1-thiogalactopyranoside). Next, 2 mL of overnight culture were spun down and the supernatant was removed. The cell pellet was then resuspended in the BugBuster master mix solution (Novagen) for protein extraction following the manufacturer's instructions. The insoluble and soluble fractions were loaded onto an SDS-PAGE gel to confirm the overexpression of the σ factor candidates (data not shown).

Construction of a Pcauto_GFP-UV Reporter Fusion System in E. coli
To determine whether the σ factor candidates could activate transcription, we assembled a GFP-based reporter expression system under the control of the new promoter motif (termed here Pcauto). Firstly, plasmid pBR322 was digested with HindIII, purified, and used as the backbone. Then, the DNA sequence containing Pcauto was PCR amplified from the C. autoethanogenum genome (500 bp upstream of the start codon of the gene CAETHG_1617) and purified using a QIAGEN kit.
Frontiers in Microbiology | www.frontiersin.org
Next, the GFP-UV gene was PCR amplified from plasmid pBR_PprpR-GFPUV and purified, after which the three DNA fragments were added at an equimolar concentration to a Gibson assembly mix that was subsequently incubated at 50 °C. Five µL of the Gibson mix was used to transform chemically competent E. coli DH5α cells by heat shock and, after incubation at 37 °C, 100 µL of cells were spread on LB agar plates containing ampicillin (100 µg/mL) and incubated at 37 °C for 16 h. Ampicillin-resistant colonies were then tested by colony PCR using the primer sets Pcauto-GFP_conf(FWD-1)/Pcauto-GFP_conf(REV-1) and Pcauto-GFP_conf(FWD-2)/Pcauto-GFP_conf(REV-2) (Supplementary File 2). Confirmed colonies were picked and grown overnight in LB containing antibiotic for plasmid recovery. The digestion profile confirmed the assembly of plasmid pBR_Pcauto_GFP.
The Pcauto-GFP-UV region was excised from pBR_Pcauto_GFP using restriction enzyme HindIII. The digestion mix was loaded on a 1% agarose gel and the Pcauto-GFP-UV region was recovered using a QIAGEN gel extraction kit. Then, the recovered DNA sequence was cloned into plasmid pACYC184, which had previously been digested with HindIII and purified using a QIAGEN kit.
Ligation was performed according to the manufacturer's instructions and 5 µL of the mix was used to transform E. coli DH5α competent cells. After heat shock and incubation, 100 µL of cells were spread on LB agar containing chloramphenicol (30 µg/mL) and incubated at 37 °C for 16 h. Chloramphenicol-resistant colonies were tested by colony PCR for proper assembly. Positive colonies were then grown overnight in LB media and the plasmid was recovered. Assembly of plasmid pACYC_Pcauto_GFP was confirmed by digestion profile and Sanger sequencing (AGRF, Australia) (data not shown).

Construction of Variants for the P cauto Promoter Motif Region
Later, a new reporter system including the Pcauto and the GFPuv sequences was built to remove the 500 bp upstream region in pBR_Pcauto_gfp. The idea was to keep only the sequence used for the pull-down assay plus the ribosomal binding site (Shine-Dalgarno sequence), to be tested in vivo with the TetR-family protein (CAETHG_0459) and σA (CAETHG_2917) (see next section), the two proteins that responded positively in the in vivo assay (see section Results). This new plasmid, pBR_Pcauto130_gfp, was built by cloning the PCR product of primers WLP130F and WLP130R, using pBR_Pcauto_gfp as template, at the HindIII site of pBR322 by Gibson assembly (Supplementary File 2). Then, the Pcauto130_gfp region was excised from pBR_Pcauto130_gfp using HindIII and ClaI and cloned by ligation into pACYC184 to build plasmid pAC_Pcauto130_gfp. A variation of the promoter region (pAC_Pcauto30C_gfp) was also built to introduce single-nucleotide changes in the WLP promoter motif. The changes were as follows: ctggagcaggttttgtagttgcagtaactggttcaata was changed to ccatcaaaggtcttaaagttgcagtaactggttcaata. This promoter was again tested with the TetR-family protein (CAETHG_0459) and σA (CAETHG_2917). Plasmid maps are available upon request.
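The positions that differ between the two promoter sequences quoted above can be listed with a short comparison. The sequences are copied from the text; the function name `substitutions` and the 1-based position convention (relative to the quoted fragment, not the genome) are choices made for this sketch.

```python
def substitutions(reference, variant):
    """List (1-based position, reference base, variant base) for each mismatch."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    return [(i, r, v)
            for i, (r, v) in enumerate(zip(reference, variant), start=1)
            if r != v]

wild_type = "ctggagcaggttttgtagttgcagtaactggttcaata"
variant = "ccatcaaaggtcttaaagttgcagtaactggttcaata"

changes = substitutions(wild_type, variant)
for position, ref_base, var_base in changes:
    print(f"{position}: {ref_base} -> {var_base}")
print(len(changes), "substitutions in", len(wild_type), "nt")
```

All differences fall within the first 16 nt of the fragment; the remaining 22 nt are identical in both sequences.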

In vivo Transcription Activation of Pcauto-GFP(UV) Fusion by the Candidate Genes in E. coli
Escherichia coli BL21 was used for the in vivo assay. Firstly, six biological replicate cultures of cells carrying the pACYC plasmid with or without (to correct for the autofluorescence of the cells) the promoter-GFPuv fusion reporter in trans with a pET plasmid carrying each of the σ factor candidates were grown in a 96-well plate (Corning Costar catalog number #3799). Additionally, a system with cells carrying either the pACYC promoter-GFPuv fusion reporter or its backbone plasmid plus the pET plasmid with no candidate was used as the control.
Cells were grown in 150 µL of LB media containing kanamycin and chloramphenicol at 30 °C with agitation at 200 RPM. At mid-exponential phase, cells were sub-cultured to a black 96-well plate (Greiner #655090) to an initial OD of 0.05-0.1 in LB media containing kanamycin and chloramphenicol supplemented with either 0.0 mM IPTG (No IPTG) or 1.0 mM IPTG. The in vivo experiment was performed at 30 °C with agitation at 200 RPM.
Growth was followed by measuring the optical density (OD) at 600 nm, while fluorescence intensity (FI; for GFP expression) was measured using an excitation filter of 355 nm and an emission filter of 520 nm. The experiment was conducted using the FLUOstar Omega microplate reader and the Omega software v.1.20 (BMG LabTech). Fluorescence intensity was normalized per OD (FI/OD) and the signal resulting from the cells harboring the backbone plasmid only was subtracted from that of the cells carrying the promoter-GFP fusion reporter (Normalized FI/OD).
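The normalization can be expressed as a one-line calculation. The sketch below assumes paired readings from a reporter well and a backbone-only well at the same time point; the function name and the example readings are hypothetical.

```python
def normalized_fi_od(fi_reporter, od_reporter, fi_backbone, od_backbone):
    """FI/OD of reporter cells minus FI/OD of backbone-only cells."""
    return fi_reporter / od_reporter - fi_backbone / od_backbone

# Hypothetical plate-reader values (arbitrary fluorescence units, OD at 600 nm).
value = normalized_fi_od(fi_reporter=5000.0, od_reporter=0.5,
                         fi_backbone=1200.0, od_backbone=0.6)
print(round(value, 1))
```

Dividing by OD before subtracting keeps the autofluorescence correction independent of differences in cell density between the two wells.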
For the WLP promoter motif variants (described in the previous section), four biological replicates were used.
Student's t-test (two-tailed) was performed between each candidate's normalized FI/OD values without and with IPTG, and between each candidate and the control system. A candidate gene was considered to activate gene expression from Pcauto if it met both of the following conditions: (1) there was a statistically significant difference (p-value < 0.01) in FI/OD between the candidate without and with IPTG; (2) there was a statistically significant difference (p-value < 0.01) between the FI/OD signal of the candidate and the control vector (PET_) with IPTG.
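The two-condition rule can be sketched with the standard library alone. The pooled-variance (equal variance) form of the test is an assumption, since the text does not specify it. Rather than computing p-values, the sketch compares |t| with the two-tailed 1% critical value from a standard t table (3.169 for 10 degrees of freedom, i.e., six replicates per group; 3.707 for 6 degrees of freedom, i.e., four replicates per group), which is equivalent to testing p < 0.01. Function names and replicate values are hypothetical.

```python
import math
from statistics import mean, variance

# Two-tailed critical t values at alpha = 0.01 (standard t table).
T_CRIT_001 = {6: 3.707, 10: 3.169}

def student_t(a, b):
    """Pooled-variance Student's t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df

def significant(a, b):
    t, df = student_t(a, b)
    return abs(t) > T_CRIT_001[df]

def activates_promoter(candidate_no_iptg, candidate_iptg, control_iptg):
    """Both conditions must hold: induced vs uninduced, and induced vs control."""
    return (significant(candidate_iptg, candidate_no_iptg)
            and significant(candidate_iptg, control_iptg))

# Hypothetical normalized FI/OD values, six replicates each.
no_iptg = [100, 110, 95, 105, 98, 102]
with_iptg = [900, 950, 880, 920, 910, 940]
control = [120, 115, 130, 110, 125, 118]
print(activates_promoter(no_iptg, with_iptg, control))
```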

Overproduction and Purification of TetR-Family Protein (CAETHG_0459)
To test whether the TetR-family protein CAETHG_0459 activates transcription from Pcauto by interacting directly with the RNAP, the target protein had to be heterologously expressed and purified for the protein-protein interaction (PPI) assay.
The E. coli strain harboring the plasmid pET_TetR1 (CAETHG_0459) was grown at 30 °C and 200 RPM until mid-exponential phase in LB media containing kanamycin. Cells were sub-cultured to 1 L LB media containing kanamycin to an initial OD of 0.05-0.1 and subsequently grown until OD ∼1 at 30 °C and 200 RPM. Then, 1.0 mM IPTG was added and cells were left growing until OD ∼3. Cells were pelleted from the 1 L culture by centrifugation at 5,000 × g for 20 min at 4 °C, the pellet was resuspended in 5 mL of the BugBuster Master Mix (Merck Millipore #71456) per gram of wet cell weight with EDTA-free protease inhibitor cocktail (Sigma #11836170001), and then incubated in a rotating mixer for 20 min at room temperature. Next, cell debris was removed by centrifugation at 16,000 × g for 20 min at 4 °C and the supernatant (supplemented with 20 mM imidazole) was loaded on a 1 mL Ni2+ HisTrap HP column (GE Healthcare #71-5027-68 AK) and washed with a buffer containing 100 mM Tris-HCl (pH 7), 100 mM NaCl, and 20 mM imidazole.
The TetR-family protein CAETHG_0459 was eluted in the same wash buffer containing a stepwise imidazole gradient (50-500 mM), followed by a buffer exchange performed using a HiTrap Desalting column (GE Healthcare #17-1408-01). Finally, the purified protein was stored in 50 mM Na2HPO4, 300 mM NaCl, pH 7, 50% glycerol. Protein purity was analyzed by gel electrophoresis using NuPAGE® Novex® Bis-Tris gels (Invitrogen) stained with SimplyBlue™ SafeStain (Novex). Protein concentration was measured with the Direct Detect Spectrometer (Merck Millipore).

Electrophoretic Mobility Shift Assay (EMSA)
The EMSA experiment was performed using the purified protein extract of the TetR-family protein (CAETHG_0459) and the 130 bp oligonucleotide containing the new promoter motif used for the pull-down assay, using both agarose and polyacrylamide gels. The binding reaction contained 2 µL of 5x binding buffer (50 mM Tris-HCl pH 8.0, 720 mM KCl, 2.6 mM EDTA, 0.5% Triton X-100, 62.5% glycerol, 1 mM DTT), 5 µL of the extracted protein, 3 µL of a 20 µM oligonucleotide, and 0.5 µL of 100x BSA.
The binding reaction was incubated for 1 h at room temperature. For the reaction on the polyacrylamide gel including the E. coli RNA polymerase core enzyme (New England BioLabs #M0550S), the reaction was further incubated for 30 min after adding 3 µL (3 units) of RNAP. Gel electrophoresis was performed in either 1% agarose or 7.5% polyacrylamide gels. Controls with only the extracted protein or DNA were also loaded. For agarose, the entire sample was loaded and run at 90 V for 120 min. For polyacrylamide, the entire sample was loaded and run at 150 V for 50 min. The gels were stained with SYBR Safe and visualized using a BIO-RAD ChemiDoc.
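The final concentrations in the binding reaction follow from simple dilution arithmetic. The calculation below assumes that the listed components make up the entire reaction volume (no additional water), which the text does not state explicitly; the variable and function names are illustrative.

```python
def final_concentration(stock, volume_ul, total_ul):
    """C_final = C_stock * V_added / V_total."""
    return stock * volume_ul / total_ul

# Component volumes in µL, as listed for the binding reaction.
volumes = {"binding_buffer_5x": 2.0, "protein": 5.0, "oligo_20uM": 3.0, "bsa_100x": 0.5}
total = sum(volumes.values())   # assumed total reaction volume
total_with_rnap = total + 3.0   # after adding 3 µL of RNAP

oligo_um = final_concentration(20.0, volumes["oligo_20uM"], total)
buffer_x = final_concentration(5.0, volumes["binding_buffer_5x"], total)
oligo_um_rnap = final_concentration(20.0, volumes["oligo_20uM"], total_with_rnap)

print(f"total = {total} µL, oligo = {oligo_um:.2f} µM, buffer = {buffer_x:.2f}x")
print(f"with RNAP: total = {total_with_rnap} µL, oligo = {oligo_um_rnap:.2f} µM")
```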

Visualization of Cells Harboring the Pcauto-GFP(UV) Fusion and the TetR-Family Protein (CAETHG_0459) Plasmids by Microscopy
Cells carrying the Pcauto-GFP(UV) fusion reporter and the TetR-family protein (CAETHG_0459) plasmids were analyzed by microscopy to visualize the expression of GFP. For this, cells were plated on an LB agar plate (LB media containing 6 g/L of agar) containing 1.0 mM IPTG, kanamycin, and chloramphenicol. After overnight incubation at 37 °C, colonies were visualized using the ZOE™ Fluorescent Cell Imager (Bio-Rad) according to the manufacturer's instructions with the following parameters: Gain: 40; Exposure (ms): 340; LED intensity: 22; Contrast: 59.

RESULTS
Differential RNA-Sequencing (dRNA-Seq)
In this work, we aimed to determine the TSSs of essential genes for autotrophic growth of the model-acetogen C. autoethanogenum (e.g., genes in the WLP and of hydrogenases). We thus performed dRNA-Seq analysis (Sharma et al., 2010) of autotrophic (CO, CO2, and H2; referred to as 'syngas') and heterotrophic (fructose) cultures of C. autoethanogenum to experimentally determine TSSs and promoter motif(s) associated with essential genes for autotrophic growth in acetogens.
Previously described batch cultures (Marcellin et al., 2016) were sampled during exponential growth and subjected to dRNA-Seq cDNA library preparation and sequencing. The cDNA libraries were prepared using the 5′ tagRACE method (Fouquier et al., 2011), an improved library preparation method compared to TEX (5′-phosphate-dependent Terminator RNA exonuclease) that has the advantage of preserving the quantitative representation of 5′ ends between processed (5′-P end) and primary (5′-PPP end) transcripts (see Materials and Methods). TSSs were determined by comparing the libraries enriched for processed (TAP-) and primary (TAP+) transcripts (Figure 1A) using the TSSAR tool (Amman et al., 2014).

Overall dRNA-Seq Features of C. autoethanogenum
We classified TSSs as primary, internal, antisense, and orphan (Figure 1B and Supplementary Table S1) and found primary TSSs only for around half of the annotated genes (3,983) in C. autoethanogenum (Brown et al., 2014) (Supplementary Table S1). More than 60% of the genes contain only one primary TSS, while the rest show up to 12 TSSs (Figure 1C and Supplementary Table S1). Focusing on the 14 main metabolic groups of C. autoethanogenum genes as described in Brown et al. (2014), we detected primary TSSs for all genes except for the Nfn transhydrogenase complex (CAETHG_1580) (Supplementary Table S2). While primary TSSs were detected for seven of the 11 genes of the WLP biosynthetic gene cluster (CAETHG_1606-21), only half of the WLP TSSs were shared between syngas and fructose. For example, genes of the WLP methyl branch (CAETHG_1614-17) contained 20 primary TSSs on syngas compared to only nine on fructose. On the other hand, the TSSs associated with Hydrogenases and ATPase genes were found in similar numbers between syngas and fructose.
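The four TSS classes can be illustrated with a minimal sketch based on the definitions given in the Figure 1 legend (primary: within 250 nt upstream of an annotated gene; internal: within an annotated gene; antisense: on the opposite strand of an annotated gene; orphan: none of these). This is not the TSSAR implementation. The `Gene` record, the function name, and the priority order (primary, then internal, then antisense) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # 1-based leftmost coordinate (start < end on either strand)
    end: int     # 1-based rightmost coordinate, inclusive
    strand: str  # "+" or "-"


def classify_tss(pos, strand, genes, window=250):
    """Assign a TSS to one class; the priority order is an assumption."""
    primary = internal = antisense = False
    for g in genes:
        inside = g.start <= pos <= g.end
        if g.strand == strand:
            if inside:
                internal = True
            elif strand == "+" and g.start - window <= pos < g.start:
                primary = True  # upstream of a forward-strand gene
            elif strand == "-" and g.end < pos <= g.end + window:
                primary = True  # upstream of a reverse-strand gene
        elif inside:
            antisense = True    # overlaps a gene on the opposite strand
    if primary:
        return "primary"
    if internal:
        return "internal"
    if antisense:
        return "antisense"
    return "orphan"


# Hypothetical annotation used only to exercise the function.
genes = [Gene("geneA", 1000, 2000, "+"), Gene("geneB", 3000, 4000, "-")]
for pos, strand in [(900, "+"), (1500, "+"), (1500, "-"), (4100, "-"), (5000, "+")]:
    print(pos, strand, classify_tss(pos, strand, genes))
```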
Determination of nucleotide base preferences for transcription initiation within five nucleotides downstream and upstream of the primary TSSs showed a clear enrichment of adenine (A) and guanine (G) at +1 (∼90%) and thymine (T) at −1 for both syngas (Figure 1D) and fructose (data not shown). Overall, adenine and cytosine were the most and least preferred nucleotide bases, respectively.
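A position-wise base preference of this kind can be computed as sketched below. The sketch assumes the input is a list of equal-length sequence windows, already extracted on the transcribed strand, in which the base at index `upstream` is the +1 position. The function name and the toy windows are illustrative and are not data from this study.

```python
from collections import Counter


def base_frequencies(windows, upstream=5):
    """Return {position: {base: fraction}} with positions -5..-1, +1..+5.

    There is no position 0: index `upstream` in each window is the TSS (+1).
    """
    table = {}
    for i in range(len(windows[0])):
        offset = i - upstream
        position = offset + 1 if offset >= 0 else offset
        counts = Counter(w[i] for w in windows)
        table[position] = {b: counts.get(b, 0) / len(windows) for b in "ACGT"}
    return table


# Toy windows (hypothetical): 5 nt upstream followed by 5 nt from the TSS on.
windows = ["TTTTTAAAAA", "CCCCTGCCCC"]
freqs = base_frequencies(windows)
print(freqs[1])   # base fractions at +1
print(freqs[-1])  # base fractions at -1
```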
Analysis of 5′ untranslated regions (5′ UTRs), i.e., the sequence between the TSS and the annotated start codon, indicates transcripts potentially associated with post-transcriptional regulation and thus with control of mRNA stability and translational efficiency (Cho et al., 2014). Calculation of 5′ UTR lengths for primary TSSs showed a median length of 63 nt with 65% of TSSs < 100 nt for both growth conditions (Figure 1E and Supplementary Table S1). Genes with longer UTR lengths tend to be regulated more at the post-transcriptional level (David et al., 2006; Cho et al., 2009). On the other hand, leaderless mRNAs (mRNAs with no or <10 nt 5′ UTR) are translated in the absence of upstream signals (typically the Shine-Dalgarno sequence) (Shine and Dalgarno, 1974; Zheng et al., 2011) used for regulating translational efficiency through ribosome binding. We found ∼70 (∼2%) leaderless mRNAs with <10 nt 5′ UTRs, none of which were in the WLP, Hydrogenases, Acetate or Ethanol groups (Figure 1E and Supplementary Table S3).
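The 5′ UTR statistics reported above follow from simple coordinate arithmetic, sketched here. The coordinate convention (the start codon position is its first transcribed base, so on the reverse strand it is the highest coordinate of the codon) and the function names are assumptions. The example values are hypothetical and do not reproduce the study's data.

```python
from statistics import median


def utr_length(tss, cds_start, strand):
    """5' UTR length in nt: distance from the TSS to the first start-codon base."""
    return cds_start - tss if strand == "+" else tss - cds_start


def summarize(lengths, leaderless_cutoff=10, short_cutoff=100):
    """Median length, fraction shorter than 100 nt, and leaderless count (<10 nt)."""
    n = len(lengths)
    return {
        "median": median(lengths),
        "fraction_short": sum(length < short_cutoff for length in lengths) / n,
        "leaderless": sum(length < leaderless_cutoff for length in lengths),
    }


# Hypothetical (TSS, start codon, strand) triples.
examples = [(100, 163, "+"), (500, 440, "-"), (1000, 1004, "+"), (2000, 2150, "+")]
lengths = [utr_length(*e) for e in examples]
print(lengths)
print(summarize(lengths))
```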
In addition to the ability to determine TSSs, dRNA-Seq analysis also facilitates a more accurate annotation of the genome. Based on the TSSs and the Shine-Dalgarno (AGGAGG) position that was found to be highly conserved within 9-14 nt upstream of the first start codon (ATG/CTG/GTG/TTG) (Figure 1F), we re-annotated the start codon for 38 genes and confirmed the changes in one gene by peptide identification using mass spectrometry (Supplementary Table S4). Moreover, either the start or stop codon of an additional 99 genes, which had previously been annotated in different frames, was manually corrected. The corrections have been deposited into NCBI under the accession number BK010482 and the complete manually corrected genbank file of C. autoethanogenum is available in Supplementary Table S5.
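The idea of using the Shine-Dalgarno position to support a start codon can be sketched as a simple scan. This is an illustration rather than the re-annotation procedure used in this work. It assumes an exact AGGAGG match and measures distance from the first base of the motif to the first base of the candidate start codon; the text does not specify the distance convention. The function name is hypothetical.

```python
START_CODONS = {"ATG", "CTG", "GTG", "TTG"}


def sd_supported_starts(seq, motif="AGGAGG", min_dist=9, max_dist=14):
    """Return (start_index, distance) for start codons preceded by the motif.

    Indices are 0-based. `distance` is measured from the first base of the
    motif to the first base of the start codon (an assumed convention).
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] not in START_CODONS:
            continue
        for d in range(min_dist, max_dist + 1):
            j = i - d
            if j >= 0 and seq[j:j + len(motif)] == motif:
                hits.append((i, d))
                break
    return hits


# Toy sequence: motif at index 2, four spacer bases, then ATG at index 12.
print(sd_supported_starts("CCAGGAGGAAAAATGCCC"))
```

Reading frame is ignored here, so in a real genome every candidate would still need to be checked against the downstream open reading frame and, ideally, peptide evidence.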

Discovery of a New Promoter Motif
The RNA polymerase (RNAP) needs to form a holoenzyme with a σ factor in bacteria to recognize a specific promoter motif (sequence) and initiate transcription (Gruber and Gross, 2003; Feklistov et al., 2014). Experimentally determined TSS data from dRNA-Seq analysis is ideal for in silico determination of promoter motifs, which is important for understanding transcriptional regulation, especially in less-studied bacteria such as acetogens.
We searched for consensus sequence motifs 50 nt upstream of primary TSSs using the MEME software (Bailey et al., 2009) and were able to determine seven promoter motifs in C. autoethanogenum (E-value ≤ 0.05) (Supplementary Tables S6, S7 for syngas and fructose growth, respectively). Of those identified, only three motifs were assigned to more than 100 TSSs and shared between the two datasets, likely representing the most conserved motifs in C. autoethanogenum (Figure 2A).
The third most abundant promoter motif has, to the best of our knowledge, not previously been reported in the literature (Figure 2A). The new promoter motif (termed here P cauto) is highly conserved both during growth on syngas (Motif 02 in Supplementary Table S5; 392 TSSs; E-value < 10^−174) and fructose (Motif 03 in Supplementary Table S6; 224 TSSs; E-value < 10^−77). Importantly, P cauto seems to be involved in the transcriptional regulation of essential genes for acetogens and was assigned to genes of the WLP cluster (CAETHG_1606-21) and the metabolic groups, as described in Brown et al. (2014), of Hydrogenases, Acetate, ATPase, and Pyruvate (Figure 2B and Supplementary Tables S2, S6, S7). We confirmed the unique presence of the "new promoter motif" upstream of the TSSs. Investigation of its upstream regions up to 100 or 150 nt showed no other motif apart from the one conserved within 50 nt upstream of TSSs. This new promoter is well characterized by an evenly interspaced (A/T)G repetition with an almost central A/T position (Supplementary Figure S1). These observations potentially indicate the presence of a new σ factor or transcriptional regulator of critical importance in acetogens.

RNA Polymerase and Proteins Annotated as Transcriptional Regulators Specifically Bind P cauto
We performed DNA-protein binding assays to determine if the RNAP and/or other protein(s) bind to P cauto. The promoter sequences of two WLP genes (CAETHG_1615 and 1617, Methylene-tetrahydrofolate reductase domain-containing protein and Methenyl-tetrahydrofolate cyclohydrolase, respectively) annotated with P cauto were used for the DNA-protein binding assay using the promoter pull down/DNA affinity chromatography method (Figure 3A; Jutras et al., 2012).
The promoter sequence of a glycolytic gene (CAETHG_3424, glyceraldehyde-3-phosphate dehydrogenase, type I) was included as a control for the assay since it was assigned the well-known TATAAT motif, which should yield binding of the RNAP and the housekeeping σ factor, σA. DNA-bound proteins captured using streptavidin-coupled magnetic Dynabeads™ were identified using mass spectrometry of the digestion products of the whole captured material and of gel band excisions. Since this DNA-protein binding assay requires significant amounts of cellular protein material, especially for efforts to identify low abundance proteins such as σ factors or transcriptional regulators, autotrophic bioreactor chemostat cultures (CO or CO + H2) of C. autoethanogenum described in a separate work (Valgepea et al., 2018) were sampled for this analysis.
The promoter pull down/DNA affinity chromatography method (Figure 3A; Jutras et al., 2012) was fine-tuned for C. autoethanogenum. Eluting the proteins with 500 mM NaCl yielded the most prominent bands while no bands were observed in the negative control when water was used instead of DNA (data not shown), which confirms that the identified proteins were pulled down by the DNA sequences (see section Materials and Methods). The alpha and beta subunits of the RNAP (CAETHG_1920 and 1954-55) were successfully identified for both P cauto (CAETHG_1615 and CAETHG_1617) and the TATAAT motif control (Figure 3B). Additionally, the RNAP omega subunit was identified in the whole purified DNA-bound material for both motifs (Supplementary Table S8). The housekeeping σA (CAETHG_2917) was detected for the TATAAT motif control as expected (Figure 3B). A stronger band was observed in the P cauto gels around 50 kDa and identified as a protein annotated as L-seryl-tRNA(Sec) selenium transferase (CAETHG_2839; 51.5 kDa) (Figure 3B). Finally, mass spectrometry analysis of the whole purified DNA-bound material identified three proteins annotated as transcriptional regulators (based on NC_022592.1) that were unique for P cauto (Table 1) and found for both CO and CO + H2 cultures across technical replicates of the DNA-protein binding assay (Supplementary Table S8). These proteins were likely not visible on the DNA-protein binding assay gels as both their respective mRNA and protein abundances in C. autoethanogenum are very low (Valgepea et al., 2017a, 2018).

TetR-Family Transcriptional Regulator (CAETHG_0459) Activates Transcription From P cauto in vivo
To determine whether any of the three identified protein candidates annotated as transcriptional regulators that uniquely bind to P cauto (Table 1) could activate transcription from this promoter, we created a transcriptional fusion reporter vector harboring the sequence of P cauto in-frame with a green fluorescent protein (GFPuV). We also tested transcriptional activation using the L-seryl-tRNA(Sec) selenium transferase (CAETHG_2839) (identified as a stronger band in the pull-down assay, Figure 3B), and using the housekeeping σ factor in clostridia (σA) (CAETHG_2917), since it has been reported that promoter binding sites of different σ factors can overlap (Cho et al., 2014). Transcriptional activation of P cauto with concomitant GFP production was investigated in E. coli by inducing the expression of the candidate activator proteins from a second T7 protein over-expression vector cloned into plasmid pET28e+ by the addition of IPTG (see section Materials and Methods). Fluorescence was measured at early exponential growth (OD ∼0.26) as FI/OD.
After subtracting the signal from cells harboring the two plasmids but lacking the fusion reporter (promoter + GFP, see section Materials and Methods), only induction of the TetR-family transcriptional regulator protein (CAETHG_0459) (out of the three transcriptional regulator candidates) led to significantly higher levels of GFP expression (p < 0.01) compared to the control vector with no candidate (Figure 4A and Supplementary Table S9). Interestingly, induction of σA also led to transcription activation (p < 0.01). We then confirmed expression of GFP in the strain expressing CAETHG_0459 grown on a plate with IPTG using fluorescence microscopy (Figure 4B). This shows that both CAETHG_0459 and σA independently activate transcription from P cauto. Importantly, the motif is associated with the expression of essential genes in gas-fermenting acetogens including genes in the WLP and hydrogenases (Supplementary Tables S2, S6, S7) that show higher transcript expression during growth on gas compared to sugar (Supplementary Table S10).
The 130 bp variant (which includes the sequence used for the pull-down assay plus the ribosomal binding site) also showed a statistically significant increase in fluorescence (p-value < 0.01) when the TetR-family transcriptional regulator protein (CAETHG_0459) was present. Similarly, σA could also activate transcription, although only at the level of p-value < 0.05. When mutations were included in the promoter motif, TetR (CAETHG_0459) could no longer activate expression of GFP, as expected (Figure 4A). Additionally, electrophoretic mobility shift assay (EMSA) experiments confirmed binding of the TetR-family protein (CAETHG_0459) together with the RNAP to the 130 bp promoter sequence used for the pull-down assay (Supplementary Figure S2). We tried to test the effect of TetR (CAETHG_0459) expression knock-down on the phenotype of C. autoethanogenum but were unsuccessful in obtaining reproducible results.
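The reporter readout described above (fluorescence intensity normalized to optical density, with the signal of reporter-free cells subtracted) can be sketched as follows. The sketch assumes that the background is the mean FI/OD of the reporter-free replicates. The function name and all numbers are hypothetical, and the statistical test behind the reported p-values is not reproduced here.

```python
from statistics import mean


def background_corrected(sample, background):
    """Return FI/OD per sample replicate minus the mean FI/OD of the background.

    sample, background: lists of (fluorescence_intensity, optical_density) pairs.
    Assumption: background is averaged across replicates before subtraction.
    """
    bg = mean(fi / od for fi, od in background)
    return [fi / od - bg for fi, od in sample]


# Hypothetical readings (FI, OD), not measured values from this study.
no_reporter = [(25.0, 0.25), (50.0, 0.25)]    # cells lacking promoter + GFP
with_candidate = [(100.0, 0.25), (120.0, 0.25)]
print(background_corrected(with_candidate, no_reporter))
```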

CAETHG_0459 Directly Binds to the RNA Polymerase Core Enzyme
As TetR-family proteins often act as transcriptional regulators (Cuthbertson and Nodwell, 2013), we next investigated whether TetR-family protein CAETHG_0459 activates transcription from P cauto by interacting directly with the RNAP. Transcriptional regulators can reversibly interact with the RNAP Core enzyme independently of a DNA sequence to help activate transcription from a range of promoters (Burgess and Anthony, 2001; Feklistov et al., 2014). We thus performed an in vitro PPI assay to test whether protein CAETHG_0459 directly interacts with RNA polymerase Core in the absence of DNA. The purified His-tagged CAETHG_0459 protein linked to Ni2+-beads was incubated with the RNAP Core enzyme (see section Materials and Methods). SDS-PAGE analysis clearly demonstrated an interaction between the core RNA polymerase and CAETHG_0459 (Figure 4C lane 6) and shows that CAETHG_0459 acts as a positive transcriptional regulator that activates transcription from P cauto by directly binding to the RNAP.

P cauto Is Represented in Other Acetogens
We next investigated if P cauto was represented in other industrially relevant acetogens with available genomes: Clostridium ljungdahlii, C. ragsdalei, C. coskatii, Moorella thermoacetica, and Eubacterium limosum (Bengelsdorf et al., 2016; Shin et al., 2016; Redl et al., 2017; Song et al., 2017). We performed the reverse of the methodology previously used to search for consensus sequence motifs by looking for the occurrence of P cauto 300 nt upstream of annotated genes (since no TSS data was available) using the FIMO tool (Grant et al., 2011) within MEME. As expected based on their phylogenetic proximity (Bengelsdorf et al., 2013; Brown et al., 2014; Shin et al., 2016), C. ljungdahlii, C. ragsdalei, and C. coskatii showed similar occurrences of P cauto (Figure 2C). Interestingly, while the representation in M. thermoacetica was very low, P cauto seems to be present also in E. limosum. This result highlights the need for experimental determination of TSSs in more acetogens.
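The Figure 2 legend states that the occurrence in each acetogen is expressed relative to C. autoethanogenum and normalized with the number of annotated genes. One plausible reading of that normalization is a ratio of per-gene hit rates, sketched below. The exact formula is an assumption, and the function name and counts are hypothetical rather than FIMO output from this study.

```python
def relative_occurrence(hits, n_genes, ref_hits, ref_genes):
    """Per-gene motif occurrence in one genome relative to a reference genome.

    Computes (hits / n_genes) / (ref_hits / ref_genes); 1.0 means the motif is
    as frequent per annotated gene as in the reference.
    """
    if n_genes <= 0 or ref_genes <= 0 or ref_hits <= 0:
        raise ValueError("gene counts and reference hits must be positive")
    return (hits / n_genes) / (ref_hits / ref_genes)


# Hypothetical counts for illustration only.
print(relative_occurrence(hits=50, n_genes=2000, ref_hits=100, ref_genes=4000))
print(relative_occurrence(hits=10, n_genes=4000, ref_hits=100, ref_genes=4000))
```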

DISCUSSION
Acetogens offer an enormous potential for the production of fuels and chemicals from gaseous waste feedstocks (Dürre and Eikmanns, 2015; Claassens et al., 2016; Liew et al., 2016; Molitor et al., 2016), with ethanol already being produced at industrial scale by LanzaTech. Acetogens have two major carbon fixation pathways: the WLP for autotrophic growth and glycolysis for heterotrophic growth. Although both the WLP and glycolysis/gluconeogenesis pathways operate during autotrophic and heterotrophic growth, the WLP carries a substantially higher metabolic flux during autotrophy (Valgepea et al., 2017a, 2018) and vice versa (Valgepea et al., 2017b). Transcriptomic studies have shown that transcriptional regulation between autotrophic and heterotrophic growth in acetogens is complex and includes many non-obvious expression changes (Nagarajan et al., 2013; Tan et al., 2013; Marcellin et al., 2016; Aklujkar et al., 2017). We thus aimed to determine TSSs and transcriptional features of promoter motifs and transcriptional regulators associated with essential genes (including genes of the WLP) in the model-acetogen C. autoethanogenum.
Our study revealed a new promoter motif and identified two proteins activating gene expression from the new motif [the TetR-family protein (CAETHG_0459) and the housekeeping σA (CAETHG_2917)]. An alternative TetR transcriptional regulator has been previously found to be a σ factor in Clostridium tetani, and its homologs, TcdR in C. difficile, BotR in C. botulinum, and UviA in C. perfringens have also been found to regulate toxin production (Raffestin et al., 2005; Dupuy and Matamouros, 2006; Dupuy et al., 2006). In combination, these results suggest that TetR proteins can play an important role in transcriptional regulation in clostridia. These studies support our PPI assay potentially suggesting that the TetR-family protein might function as a σ factor in C. autoethanogenum, but further studies (in vitro transcription assay) are needed to confirm this. In fact, unequivocal demonstration of σ factor activity requires that a protein is necessary and sufficient for activation of promoter recognition and transcription initiation by RNAP, independent of any other σ factor subunit. Thus our results do not exclude the possibility that a native σ factor of the in vivo expression host (E. coli), e.g., σ70, could have induced the TetR-family protein to drive transcription from P cauto. Additional studies should also be performed to study whether both the σA and the TetR-family protein show an overlap in the promoter motif for transcriptional activation (Cho et al., 2014).
Notably, there are several TetR-family proteins, commonly regarded as transcriptional regulators (Cuthbertson and Nodwell, 2013), annotated in the C. autoethanogenum genome. In pathogenic clostridia these TetR-family proteins are often described as alternative σ factors, belonging to a class of σ factors called extracytoplasmic function (ECF) σ factors (Feklistov et al., 2014; Sineva et al., 2017). Their discovery led to a novel class of σ factors (group 5), which show a −35 and/or −10 conserved region in their target promoters (Dupuy et al., 2005, 2006; Dupuy and Matamouros, 2006; Staroń et al., 2009). It will be interesting to see whether transcription from P cauto described here, with an interspaced repetition of (A/T)G notably distinct from the canonical −35/−10 conserved regions, is also activated by a novel σ factor.
Our work also shows that the housekeeping σ factor (σA) in clostridia can activate transcription from P cauto associated with essential genes for autotrophic growth in acetogens. Interestingly, in another acetogen, E. limosum, the promoter regions of genes of the WLP, hydrogenases, and ATPase contain the well-known −35 TTGACA and −10 TATAAT motifs for the housekeeping σ factor (σA) (Burgess and Anthony, 2001; Song et al., 2017). This potentially indicates that the housekeeping σA in acetogens can initiate transcription from different promoter motifs and illustrates well the great extent of genetic diversity among the non-taxonomic group of acetogens. While the WLP itself is highly conserved, it is not surprising that transcriptional regulation is diverse (Drake et al., 2006; Shin et al., 2016). The work presented here also highlights the importance of P cauto in other industrially relevant acetogens (Figure 2C). We believe, however, that more studies are needed for the experimental determination of TSSs and transcriptional features to facilitate a broader understanding of transcriptional regulation in acetogens.
Our findings have the potential to significantly advance the understanding of transcriptional regulation and metabolic engineering of the ancient metabolism of acetogens. Firstly, acetogen metabolism, which operates at the thermodynamic edge of feasibility (Schuchmann and Müller, 2014), seems to be wired for utilizing less energy-consuming mechanisms (i.e., transcriptional vs. translational regulation) for operating under different conditions, as evidenced by the complexity of the condition-specific transcriptional architecture (Valgepea et al., 2018). More importantly, the discovery of P cauto and a key positive transcription factor (TetR-family protein) in acetogens can lead to the mechanistic description of transcriptional regulation of arguably the first biochemical pathway on Earth (Russell and Martin, 2004; Fuchs, 2011; Weiss et al., 2016).
In addition to expanding the fundamental understanding of a model acetogen, knowledge of the features controlling the expression of essential genes in acetogens could also contribute to the improvement of commercial gas fermentation for the sustainable production of fuels and chemicals. Increasing or modulating the activity of the described TetR transcription factor (either through over-expression and/or protein engineering or by deleting transcriptional repressor genes) could enhance the uptake of C1 substrates through the WLP and thus improve growth and/or product formation (possibly by introducing P cauto in front of key genes). It could also be used as an orthologous system in other organisms, as, for instance, the TcdR system has been used in other Clostridium species (Zhang et al., 2015; Minton et al., 2016). Importantly, the newly discovered promoter P cauto could be harnessed to couple expression of heterologous pathways to mimic those of key central metabolism enzymes, potentially alleviating the common problem of imbalanced flux throughput between heterologous and native metabolic pathways.

FIGURE 1 | Characteristics of transcriptional and translational architecture in C. autoethanogenum. (A) Our dRNA-Seq approach generated genome-wide TSS maps through the comparison of libraries enriched for processed (TAP-) and primary (TAP+) transcripts. (B) Classification of TSSs for syngas and fructose as: primary, within 250 nt upstream of an annotated gene; internal, within an annotated gene; antisense, on the opposite strand of an annotated gene; orphan, not assigned to any of the previous classes. (C) Distribution of primary TSSs per gene for syngas and fructose. (D) Nucleotide base preference for transcription initiation from primary TSSs on syngas. +1 denotes the position of the TSS. (E) Distribution of 5′ UTR lengths for primary TSSs for syngas and fructose. (F) The Shine-Dalgarno sequence AGGAGG is highly conserved within 9-14 nt upstream of the first start codon. Sequencing reads were processed with the TSSAR software (Amman et al., 2014) for automated de novo determination of TSSs from dRNA-Seq data using the following parameters: p-Value 1e-3, Noise threshold 10, Merge range 5. The Shine-Dalgarno sequence was searched 30 nt upstream of annotated genes (CP006763.1 and NC_022592.1) using the MEME software (Bailey et al., 2009) and the same parameters as for promoter motif search, except for -nmotifs 10, -maxw 30. See section Materials and Methods for details.

FIGURE 2 | In silico determination of genome-wide promoter motifs in C. autoethanogenum. (A) The top-3 promoter motifs for primary TSSs are shared among syngas and fructose. The height of the letter indicates its relative frequency at the given position within the motif. Refer to Supplementary Tables S5-S8 for all the determined motifs and their assigned TSSs. The mutated nucleotides used in the in vivo assay for P cauto motif are also shown. We show the nucleotide position relative to the TSS in all top-3 motifs. (B) The new promoter motif (P cauto) is assigned with TSSs of essential genes in acetogens. Motifs with the lowest p-value for syngas are shown. Refer to Supplementary Tables S2, S5-S8 for all TSSs and genes associated with P cauto. (C) The P cauto motif is represented in other industrially relevant acetogens. Occurrence in each acetogen relative to C. autoethanogenum is normalized with the number of annotated genes. To determine promoter motifs in C. autoethanogenum, we searched for consensus sequence motifs 50 nt upstream of primary TSSs using the MEME software (Bailey et al., 2009) with the following parameters: -dna, -maxsize 10000000, -mod zoops, -nmotifs 50, -minw 4, -maxw 50, -revcomp, -oc.

FIGURE 3 | DNA-protein binding assay shows specific binding of C. autoethanogenum RNAP subunits and a selenium transferase to P cauto. (A) Overview of the DNA-protein binding assay (i.e., the promoter pull down/DNA affinity chromatography method, Jutras et al., 2012). (B) Separation of proteins specifically bound to the TATAAT motif (for gene CAETHG_3424) or P cauto (for gene CAETHG_1617) with gel electrophoresis and identification using mass spectrometry. The alpha and beta subunits of the RNAP (CAETHG_1920 and 1954-55) were successfully identified for both P cauto (CAETHG_1615 and CAETHG_1617) and the TATAAT motif control. Technical replicate denotes replicate of the DNA-protein binding assay (A) (data not shown for CAETHG_1615).

TABLE 1 | Clostridium autoethanogenum proteins annotated as transcriptional regulators uniquely binding to the new promoter motif P cauto.