Original Research ARTICLE
A TetR-Family Protein (CAETHG_0459) Activates Transcription From a New Promoter Motif Associated With Essential Genes for Autotrophic Growth in Acetogens
- 1Australian Institute for Bioengineering and Nanotechnology (AIBN), The University of Queensland, Brisbane, QLD, Australia
- 2ERA Chair in Gas Fermentation Technologies, Institute of Technology, University of Tartu, Tartu, Estonia
- 3Queensland Node of Metabolomics Australia, The University of Queensland, Brisbane, QLD, Australia
- 4LanzaTech Inc., Skokie, IL, United States
Acetogens can fix carbon (CO or CO2) into acetyl-CoA via the Wood–Ljungdahl pathway (WLP) that also makes them attractive cell factories for the production of fuels and chemicals from waste feedstocks. Although most biochemical details of the WLP are well understood and systems-level characterization of acetogen metabolism has recently improved, key transcriptional features such as promoter motifs and transcriptional regulators are still unknown in acetogens. Here, we use differential RNA-sequencing to identify a previously undescribed promoter motif associated with essential genes for autotrophic growth of the model-acetogen Clostridium autoethanogenum. RNA polymerase was shown to bind to the new promoter motif using a DNA-binding protein assay and proteomics enabled the discovery of four candidates to potentially function directly in control of transcription of the WLP and other key genes of C1 fixation metabolism. Next, in vivo experiments showed that a TetR-family transcriptional regulator (CAETHG_0459) and the housekeeping sigma factor (σA) activate expression of a reporter protein (GFP) in-frame with the new promoter motif from a fusion vector in Escherichia coli. Lastly, a protein–protein interaction assay with the RNA polymerase (RNAP) shows that CAETHG_0459 directly binds to the RNAP. Together, the data presented here advance the fundamental understanding of transcriptional regulation of C1 fixation in acetogens and provide a strategy for improving the performance of gas-fermenting bacteria by genetic engineering.
The Wood–Ljungdahl pathway (WLP) of acetogens is speculated to be the first biochemical pathway on Earth that emerged when the atmosphere was still highly reduced and rich in CO, CO2, and H2 (Russell and Martin, 2004; Fuchs, 2011; Weiss et al., 2016). These C1 gases can be converted into acetyl-CoA through the WLP (Wood, 1991; Ragsdale and Pierce, 2008) and acetogens are the only known organisms using the WLP as a terminal electron-accepting, energy-conserving process to fix CO2 into biomass (Drake et al., 2006; Fuchs, 2011). This pathway is responsible for the production of acetic acid in quantities surpassing the billion ton mark annually. It is estimated that the pathway contributes to fixing ∼20% of the CO2 on Earth (Drake et al., 2006; Ljungdahl, 2009). All this takes place with the WLP operating at the edge of thermodynamic feasibility (Schuchmann and Müller, 2014) and requires the use of the third mode of energy conservation, electron bifurcation, which likely contributed to the emergence of life on Earth (Herrmann et al., 2008; Li et al., 2008; Nitschke and Russell, 2011). Acetogens are also attractive cell factories for the sustainable production of fuels and chemicals from gaseous waste feedstocks (e.g., syngas from gasified municipal solid waste and industrial waste gases) (Dürre and Eikmanns, 2015; Claassens et al., 2016; Liew et al., 2016; Molitor et al., 2016). While the field has advanced enormously in the last decade (Liew et al., 2016; Molitor et al., 2016), better fundamental understanding of acetogen metabolism is needed to guide rational metabolic engineering, for example, to increase their substrate uptake or product yields.
Recent quantitative studies of acetogen physiology have expanded understanding of their metabolism considerably (reviewed in Schuchmann and Müller, 2014; Molitor et al., 2017). Although most biochemical details of the WLP are well established (Ragsdale, 1991, 1997, 2008) and systems-level understanding of acetogen metabolism has recently improved (Valgepea et al., 2017a, 2018), key transcriptional features such as promoter motifs and transcriptional regulators controlling the expression of genes needed for autotrophic growth are yet unknown. This information could benefit acetogen metabolic engineering and improve our understanding of their complex transcriptional regulation (Nagarajan et al., 2013; Tan et al., 2013; Marcellin et al., 2016; Aklujkar et al., 2017). Prediction of promoter motifs strictly based on computational analysis (based solely on the organism’s genome sequence) has the drawback of detection of promoter-like sequences across the genome, which is particularly pronounced in non-conserved DNA motifs (Patrik, 2006). An instrumental step toward more accurate promoter motif identification was the development of the differential RNA-sequencing (dRNA-Seq) technology, first described in 2010 by Sharma and colleagues (Sharma et al., 2010) for the human pathogen Helicobacter pylori.
dRNA-Seq enables the experimental determination of transcription start sites (TSSs) and correct mapping of TSSs enables genome-wide identification of promoters and gene expression regulatory sequences, besides providing experimental data for a more accurate genome annotation. Once a TSS has been experimentally determined, promoter sequences can be mapped from there. Thus, characterization of the transcriptional architecture (i.e., TSSs and promoter motifs) and a more accurate annotation of acetogen genomes have the potential to yield valuable insights into the complex transcriptional regulation of acetogens. To date, only one study has determined TSSs in acetogens, using Eubacterium limosum (Song et al., 2017). Here, we used dRNA-Seq as a tool to identify the TSSs in the model-acetogen Clostridium autoethanogenum grown under autotrophic and heterotrophic conditions. The subsequent search for promoter motifs detected a previously undescribed motif associated with essential genes in acetogens. We then provide experimental evidence for the relevance of this new promoter motif (termed hereafter Pcauto) by identifying a TetR-family protein that activates gene expression from this motif by directly binding to the RNA polymerase.
Materials and Methods
Bacterial Strains and Growth Conditions
Clostridium autoethanogenum strain DSM 10061 was obtained from The German Collection of Microorganisms and Cell Cultures (DSMZ). Cells were grown as described before (Marcellin et al., 2016) for acquiring samples for differential RNA-sequencing (dRNA-seq). Briefly, heterotrophic and autotrophic growth were investigated in serum bottles on fructose (5 g/L) and on steel mill off-gas (35% CO, 10% CO2, 2% H2 and 53% N2), respectively. Cells were grown at 37°C on a shaker (100 RPM, revolutions per minute) and sampled for dRNA-Seq analysis from the exponential growth phase (OD600 nm = 0.5–0.6).
Differential RNA-Sequencing (dRNA-Seq)
Extraction and preparation of RNA for cDNA library construction were performed as described elsewhere (Marcellin et al., 2016). Briefly, RNA was extracted using TRIzol followed by column purification with RNeasy (Qiagen). The resulting total RNA pools were sent to Vertis Biotechnologie AG (Freising, Germany) for sequencing. The cDNA libraries were prepared using the 5′tagRACE method (Fouquier et al., 2011). Firstly, the 5′ Illumina TruSeq sequencing adapter carrying sequence tag TCGACA was ligated to the 5′-monophosphate groups (5′P) of processed transcripts (TAP- on Figure 1A). Samples were then treated with Tobacco Acid Pyrophosphatase (TAP) to convert 5′-triphosphate (5′PPP) structures of primary transcripts into 5′P ends to which the 5′ Illumina TruSeq sequencing adapter carrying sequence tag GATCGA was ligated (TAP+ on Figure 1A). Next, first-strand cDNA was synthesized using an N6 randomized primer to which the 3′ Illumina TruSeq sequencing adapter was ligated after fragmentation.
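As a rough illustration of how the two sequence tags could separate the libraries after sequencing, the sketch below assigns reads to the TAP- or TAP+ pool by their leading tag. It assumes the tag occupies the first six bases of each read, which the text does not state; the function name and the `unassigned` pool are hypothetical.

```python
# Sequence tags carried by the two 5' adapters described above.
TAGS = {"TCGACA": "TAP-", "GATCGA": "TAP+"}


def split_by_tag(reads, tag_len=6):
    """Assign each read to a library by its leading tag and strip the tag.

    Assumption (not stated in the text): the tag is the first `tag_len`
    bases of the read. Reads with an unrecognized tag go to 'unassigned'.
    """
    pools = {"TAP-": [], "TAP+": [], "unassigned": []}
    for read in reads:
        library = TAGS.get(read[:tag_len].upper())
        if library is None:
            pools["unassigned"].append(read)
        else:
            pools[library].append(read[tag_len:])
    return pools
```

Keeping both pools from the same run is what allows the processed and primary 5′ ends to be compared position by position later on.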
Figure 1. Characteristics of transcriptional and translational architecture in C. autoethanogenum. (A) Our dRNA-Seq approach generated genome-wide TSS maps through the comparison of libraries enriched for processed (TAP–) and primary (TAP+) transcripts. (B) Classification of TSSs for syngas and fructose as: primary, within 250 nt upstream of an annotated gene; internal, within an annotated gene; antisense, on the opposite strand of an annotated gene; orphan, not assigned to any of the previous classes. (C) Distribution of primary TSSs per gene for syngas and fructose. (D) Nucleotide base preference for transcription initiation from primary TSSs on syngas. +1 denotes the position of the TSS. (E) Distribution of 5′UTR lengths for primary TSSs for syngas and fructose. (F) The Shine-Dalgarno sequence AGGAGG is highly conserved within 9–14 nt upstream of the first start codon. Sequencing reads were processed with the TSSAR software (Amman et al., 2014) for automated de novo determination of TSSs from dRNA-Seq data using the following parameters: p-Value 1e-3, Noise threshold 10, Merge range 5. The Shine-Dalgarno sequence was searched 30 nt upstream of annotated genes (CP006763.1 and NC_022592.1) using the MEME software (Bailey et al., 2009) and the same parameters as for promoter motif search, except for -nmotifs 10, -maxw 30. See section Materials and Methods for details.
The 5′ cDNA fragments were amplified with PCR using a proofreading enzyme and primers designed for TruSeq sequencing according to the manufacturer’s instructions. The main advantage of using the 5′tagRACE method (Fouquier et al., 2011) for dRNA-Seq comes from amplifying the 5′ ends of processed and primary transcripts in a single PCR reaction, which preserves their quantitative representation in an RNA pool. Finally, 5′ cDNAs were purified using the Agencourt AMPure XP Kit (Beckman Coulter Genomics) and analyzed by capillary electrophoresis before sequencing the single-end libraries using the Illumina NextSeq 500 system and a MID 150 Kit with 75 bp read length.
Determination of Transcription Start Sites (TSSs)
Sequencing reads were aligned and mapped to the genome of C. autoethanogenum DSM 10061 (CP006763.1) using the software TopHat2 (Kim et al., 2013) without trimming or removal of any reads. Reads were processed with the TSSAR (TSS Annotation Regime) software (Amman et al., 2014) for automated de novo determination of TSSs from dRNA-Seq data using the following parameters: p-value 1e-3, Noise threshold 10, Merge range 5. The identified TSSs were classified as primary (within 250 nt upstream of an annotated gene), internal (within an annotated gene), antisense (on the opposite strand of an annotated gene), or orphan (not assigned to any of the previous classes) (Figure 1B). Since our main aim was the identification of the TSSs of essential genes for autotrophic growth in acetogens (e.g., WLP), we focused on the primary TSSs.
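The four TSS classes can be sketched in a few lines of Python. This is a simplified reading of the definitions above, not the TSSAR implementation: it assumes that "upstream" is strand-aware, that an antisense TSS lies inside a gene on the opposite strand, and that one TSS may carry several labels. The `Gene` type and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    start: int   # leftmost coordinate (1-based, inclusive)
    end: int     # rightmost coordinate (inclusive)
    strand: str  # "+" or "-"


def classify_tss(pos, strand, genes, upstream=250):
    """Return the set of class labels for a TSS at `pos` on `strand`."""
    labels = set()
    for g in genes:
        inside = g.start <= pos <= g.end
        if g.strand == strand:
            if inside:
                labels.add("internal")
            elif strand == "+" and 0 < g.start - pos <= upstream:
                labels.add("primary")
            elif strand == "-" and 0 < pos - g.end <= upstream:
                labels.add("primary")
        elif inside:
            labels.add("antisense")
    # A TSS matching none of the classes is an orphan.
    return labels or {"orphan"}
```

For a forward-strand gene spanning 1000 to 2000, a forward-strand TSS at position 900 would be labelled primary, one at 1500 internal, and a reverse-strand TSS at 1500 antisense.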
Search for Promoter Motifs and the Shine-Dalgarno Sequence
To determine promoter motifs, we searched for consensus sequence motifs 50, 100, and 150 nt upstream of primary TSSs using the MEME software (Bailey et al., 2009) with the following parameters: -dna, -maxsize 10000000, -mod zoops, -nmotifs 50, -minw 4, -maxw 50, -revcomp, -oc. Only motifs with an E-value ≤ 0.05 and at least 13 associated TSSs (i.e., at least two associated genes, Figure 1C) were considered, and these were ranked by the number of assigned TSSs (Supplementary File 1).
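The post-MEME filtering and ranking step can be expressed compactly. The sketch below assumes each motif has already been parsed into a dictionary with hypothetical keys `evalue` and `n_tss`; parsing of the MEME output itself is not shown.

```python
def select_motifs(motifs, max_evalue=0.05, min_tss=13):
    """Keep motifs passing both thresholds, ranked by TSS count (descending).

    `motifs` is a list of dicts with the hypothetical keys 'evalue'
    (MEME E-value) and 'n_tss' (number of TSSs assigned to the motif).
    """
    kept = [
        m for m in motifs
        if m["evalue"] <= max_evalue and m["n_tss"] >= min_tss
    ]
    return sorted(kept, key=lambda m: m["n_tss"], reverse=True)
```

Because a single gene shows at most 12 primary TSSs (Figure 1C), the threshold of 13 guarantees that any retained motif is linked to at least two genes.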
To search for the Shine-Dalgarno sequence, 30 nt upstream of annotated genes (CP006763.1 and NC_022592.1) were searched with the MEME software (Bailey et al., 2009) using the same parameters as in the promoter motif search, except for -nmotifs 10, -maxw 30.
Search for the New Promoter Motif in Acetogens
Occurrence of the new promoter motif (see results) in C. autoethanogenum, C. ljungdahlii, C. ragsdalei, C. coskatii, M. thermoacetica, and E. limosum was determined using the FIMO tool (Grant et al., 2011) within the MEME software by searching for the sequence up to 300 nt upstream of annotated genes (since no TSS data is available) with default FIMO parameters. Occurrence in each acetogen relative to C. autoethanogenum was normalized with the number of annotated genes.
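The text does not give the exact normalization formula. One plausible reading, sketched here under that assumption, is to express motif hits per annotated gene in each genome as a ratio to the same quantity in C. autoethanogenum. The function and argument names are hypothetical.

```python
def relative_occurrence(hits, n_genes, ref_hits, ref_genes):
    """Motif hits per annotated gene, relative to the reference genome.

    Assumed formula: (hits / n_genes) / (ref_hits / ref_genes), where the
    reference is C. autoethanogenum. A value of 1.0 means the motif is as
    frequent per gene as in the reference.
    """
    return (hits / n_genes) / (ref_hits / ref_genes)
```

Dividing by the gene count keeps a larger genome from appearing motif-rich simply because more upstream regions were scanned.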
DNA-Binding Protein Assay
Firstly, C. autoethanogenum (DSM 19630) cells were acquired from autotrophic bioreactor chemostat cultures (CO or CO + H2) described in a separate work (Valgepea et al., 2018). Briefly, cells were grown in bioreactor chemostat cultures in the chemically defined medium on either CO or CO + H2 at 37°C, pH = 5, dilution rate of ∼1 day⁻¹ (μ ∼ 0.04 h⁻¹), and at a biomass concentration ∼1.4 gDCW/L. Cells were pelleted by immediate centrifugation (20,000 × g for 2 min at 4°C), and stored at −80°C until analysis.
Frozen pellets were thawed, resuspended in BS/THES buffer described in Jutras et al. (2012) with pH adjusted to 7.0, and passed five times through the EmulsiFlex-C5 High Pressure Homogenizer (Avestin Inc.) according to the manufacturer’s instructions, with the final sample volume adjusted to 35 mL with the BS/THES buffer. Samples were then centrifuged (35,000 × g for 15 min at 4°C) and the supernatant filtered using a 0.22 μm filter (Merck).
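The correspondence between the dilution rate and the specific growth rate quoted above follows from the chemostat steady-state condition, where the specific growth rate equals the dilution rate. A small worked sketch (the function name is hypothetical, and the doubling time is derived here rather than reported in the text):

```python
import math


def growth_rate_per_hour(dilution_rate_per_day):
    """At chemostat steady state, specific growth rate equals dilution rate."""
    return dilution_rate_per_day / 24.0


mu = growth_rate_per_hour(1.0)       # about 0.042 per hour, i.e. roughly 0.04
doubling_time_h = math.log(2) / mu   # about 16.6 hours
```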
The DNA-binding protein assay was based on a pull-down/DNA affinity chromatography method described by Jutras et al. (2012) with the following modifications. The DNA sequences were of 125 bp length containing the respective promoter sequence in the middle with flanking regions downstream and upstream. pH of the buffers was adjusted to 7. The bait-target/ligand binding step was performed with 1 mL of cell extract without the addition of non-specific competitor DNA.
Next, either salmon sperm DNA (Thermo) or Poly dI-dC (Sigma) was used as non-specific competitor DNA in the subsequent washing steps. Briefly, Dynabeads™ M-280 Streptavidin (Thermo Fisher Scientific) were mixed with DNA containing either the promoter sequence of CAETHG_1615, 1617 (WLP genes assigned with the new promoter motif), or 3224 (a glycolytic gene as a control for our assay since it was assigned the well-known TATAAT motif, which should yield binding of the RNAP and the housekeeping σ factor σA). Next, the cell extract was added and samples were incubated for 30 min at room temperature. This was followed by two washing steps with the BS/THES buffer (Jutras et al., 2012) to remove proteins not bound to the target DNA. Finally, protein elution was performed in Tris-HCl (pH 7) with a successively increasing concentration of NaCl (200, 300, 500, and 750 mM, then 1 M and 2 M). The eluted protein solutions were analyzed by gel electrophoresis using NuPAGE® Novex® Bis-Tris gels (Invitrogen) and visualized using Sypro® Ruby (Molecular Probes) according to the manufacturer’s instructions. The 500 mM NaCl eluate yielded the most prominent bands and therefore this eluate was used for further analysis. No bands were observed in the negative control when water was used instead of DNA (data not shown), confirming that the identified proteins were pulled down by the DNA sequences.
Protein Digestion for Mass Spectrometry-Based Proteomics
For the digestion of proteins from gel band excision, the gel bands of interest were cut and de-stained for 1 h with a buffer of 50 mM ammonium bicarbonate (ABC) in 50% acetonitrile (ACN). Following buffer removal, 50 μL of 10 mM DTT was added and samples were incubated for 30 min at 60°C to reduce disulfide bonds. Next, the DTT solution was removed, and 50 μL of 55 mM iodoacetamide (IAA) was added and samples were incubated for 30 min in the dark at room temperature to alkylate sulfhydryl groups. After removal of the IAA solution, gel pieces were washed twice with 100 μL of 50 mM ABC, and dehydrated with 100% ACN. Protein digestion was performed overnight at 37°C by rehydrating gel pieces with 50 μL of Trypsin/Lys-C mix (10 ng/μL in 25 mM ABC) and 100 μL of ABC.
Extraction of peptides from gel pieces was performed by repeating the following steps five times: addition of 100 μL of 0.1% formic acid (FA) in 50% ACN and sonication of samples in a water bath for 10 min. Samples were then concentrated to near dryness using a centrifugal vacuum concentrator (Eppendorf) and resuspended in 50 μL of 0.1% FA. Finally, samples were desalted using C18 ZipTips (Merck Millipore) as follows: the column was wetted using 0.1% FA in 100% ACN, equilibrated with 0.1% FA in 70% ACN, and washed with 0.1% FA before loading the sample and washing again with 0.1% FA. Finally, peptides were eluted with 0.1% FA in 70% ACN, and then diluted 10-fold with 0.1% FA for mass spectrometry analysis.
For the digestion of proteins from the whole purified DNA bound material, the whole purified DNA-bound material from the DNA-protein binding assay was incubated for 30 min at 95°C. Next, 30 μL of 10 mM DTT was added and samples were incubated for 45 min at 55°C to reduce disulfide bonds. Then, 40 μL of 55 mM IAA was added and samples were incubated for 30 min in the dark at room temperature to alkylate sulfhydryl groups. Protein digestion was performed overnight at 37°C using 50 μL of Trypsin/Lys-C mix (10 ng/μL in 25 mM ABC) and stopped by lowering the pH to 3 using FA. Finally, the samples were desalted and prepared for mass spectrometry analysis as described above.
Protein Identification Using Mass Spectrometry
Detection of proteins in both the digestion products of gel band excision and the whole captured material was performed using a QTOF Sciex 5600 or a Thermo Orbitrap Elite mass spectrometer (depending on instrument availability) with details described elsewhere (Kappler and Nouwens, 2013; Yang et al., 2016) with a modified liquid chromatographic (LC) gradient. Briefly, a Shimadzu Prominence nanoLC system was used for desalting (on an Agilent C18 trap) and separating peptides (on a Vydac Everest C18 column) using a gradient consisting of 10–60% buffer B over 30 min followed by 60–97% buffer B over 8 min, where buffer A was 1% ACN in 0.1% FA and buffer B was 80% ACN in 0.1% FA. Protein identification was performed using the software ProteinPilot v5.0 (ABSciex) with the Paragon Algorithm against the NC_022592.1 and CP006763 genome annotations with the following search parameters: Trypsin + LysC digestion; IAA as cysteine alkylation; thorough search effort; FDR analysis. Only proteins below 1% false discovery rate (FDR; estimated global) and with at least two peptides with more than 95% confidence were considered as identified.
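The acceptance rule for protein identifications can be written as a simple predicate. The sketch below is only an illustration of the two stated criteria; how ProteinPilot exposes the global FDR and per-peptide confidence is not described in the text, so the argument names and data layout are hypothetical.

```python
def is_identified(global_fdr, peptide_confidences,
                  max_fdr=0.01, min_peptides=2, min_conf=95.0):
    """Apply the two identification criteria stated above.

    global_fdr: estimated global FDR at the protein's rank (fraction).
    peptide_confidences: per-peptide confidence values in percent.
    """
    confident = [c for c in peptide_confidences if c > min_conf]
    return global_fdr < max_fdr and len(confident) >= min_peptides
```

Requiring two confident peptides in addition to the FDR cut-off guards against identifications that rest on a single spectrum.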
Proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD014421.
Molecular Biology Techniques
The full list of bacterial strains, plasmids, and primers used in this work for the in vivo transcription assay and protein overexpression step are shown in Supplementary File 2. Luria-Bertani (LB) broth or agar with antibiotics were used for growth.
Escherichia coli DH5α was used as the cloning strain, and transformations were performed according to the manufacturer’s instructions (BIOLINE). E. coli BL21 was used in the in vivo transcription assay and protein overexpression step. E. coli BL21 chemically competent cells were prepared using the RuCl2 method (Green and Rogers, 2013).
PCR amplification of targeted sequences was performed using the Phusion polymerase (NEB) and the OneTaq polymerase (NEB). Plasmids were assembled using standard ligation with the T4 DNA ligase or using Gibson assembly (Gibson et al., 2009).
Construction of a σ-Factor Candidate Expression System in E. coli
Candidates for potential σ factors were selected based on protein identification using mass spectrometry (see above) from proteins annotated as transcriptional regulators (Table 1). Additionally, we also built a plasmid for the L-seryl-tRNA(Sec) selenium transferase (CAETHG_2839) (identified as a stronger band in the pull-down assay, Figure 3B), and the housekeeping σ in Clostridia (σA) (CAETHG_2917) (Figure 3B).
Table 1. Clostridium autoethanogenum proteins annotated as transcriptional regulators uniquely binding to the new promoter motif Pcauto.
The potential σ factor candidates were cloned into plasmid pET28a+ to be expressed under the control of a T7 promoter. DNA sequences were PCR amplified using the primers shown in Supplementary File 2 and purified using a QIAGEN kit. Next, the plasmid pET28a+ was linearized using restriction enzymes NdeI and HindIII and purified using a QIAGEN kit. Codon optimization was required to express the σ factor candidates of TetR-family protein (CAETHG_0459) and σA (CAETHG_2917) before DNA sequences were synthesized as gene block (gBlock®) fragments.
Plasmids with the σ factor candidates were then assembled by Gibson assembly using equimolar concentrations of the linearized backbone plasmid and the PCR fragment in a 20 μL reaction. After incubation at 50°C, 5 μL of the Gibson mix was then used to transform E. coli DH5α by heat shock. After recovery on SOC media at 37°C for 60 min, 100 μL of cells were spread on LB agar plates containing kanamycin (50 μg/mL). Plates were then incubated at 37°C for 16 h and kanamycin resistant colonies were tested by colony PCR for proper assembly using pET_conf(FWD)/pET_conf(REV) primers (Supplementary File 2). A colony that tested positive for assembly was then picked and grown overnight on LB media containing kanamycin. Plasmids were recovered from 5 mL of overnight culture using a QIAGEN miniprep kit and the restriction digestion profile was checked against the expected assembly. Plasmids were then used to transform E. coli BL21 chemically competent cells (described above).
Escherichia coli BL21 strains harboring σ factor candidate-expressing plasmids were then grown overnight on LB media containing kanamycin and 1 mM IPTG (Isopropyl β-D-1-thiogalactopyranoside). Next, 2 mL of overnight culture were spun down and the supernatant was removed. The cell pellet was then resuspended in the BugBuster master mix solution (Novagen) for protein extraction following the manufacturer’s instructions. The insoluble and soluble fractions were loaded into an SDS-PAGE gel to confirm the overexpression of the σ factor candidates (data not shown).
Construction of a Pcauto_GFP-UV Reporter Fusion System in E. coli
To determine whether the σ factor candidates could activate transcription, we assembled a GFP-based reporter expression system under the control of the new promoter motif (termed here Pcauto). Firstly, plasmid pBR322 was digested with HindIII, purified, and used as the backbone followed by PCR amplification of the DNA sequence containing Pcauto from the C. autoethanogenum genome (500 bp upstream of the start codon of the gene CAETHG_1617) and purification using a QIAGEN kit.
Next, the GFP-UV gene was PCR amplified from plasmid pBR_PprpR-GFPUV and purified after which the three DNA fragments were added at an equimolar concentration to a Gibson assembly mix subsequently incubated at 50°C. Five μL of the Gibson mix was used to transform chemically competent E. coli DH5α cells by heat shock and after incubation at 37°C, 100 μL of cells were spread on LB agar plates containing ampicillin (100 μg/mL) and incubated at 37°C for 16 h. Ampicillin resistant colonies were then tested by colony PCR using the primer sets of Pcauto-GFP_conf(FWD-1)/Pcauto-GFP_conf(REV-1) and Pcauto-GFP_conf(FWD-2)/Pcauto-GFP_conf(REV-2) (Supplementary File 2). Confirmed colonies were picked and grown overnight on LB containing antibiotic for plasmid recovery. The digestion profile confirmed the assembly of plasmid pBR_Pcauto_GFP.
The Pcauto-GFP-UV was excised from pBR_Pcauto_GFP using restriction enzyme HindIII. Digestion mix was loaded on a 1% agarose gel and the Pcauto-GFP-UV region recovered using a QIAGEN gel extraction kit. Then, the recovered DNA sequence was cloned into plasmid pACYC184, which was previously digested with HindIII and purified using a QIAGEN kit.
Ligation was performed according to the manufacturer’s instruction and 5 μL of the mix was used to transform E. coli DH5α competent cells. After heat shock and incubation, 100 μL of cells were spread on LB agar containing chloramphenicol (30 μg/mL) and incubated at 37°C for 16 h. Chloramphenicol-resistant colonies were tested by colony PCR for proper assembly. Positive colonies were then grown overnight on LB media and the plasmid was recovered. Assembly of plasmid pACYC_Pcauto_GFP was confirmed by digestion profile and Sanger sequencing (AGRF, Australia) (data not shown).
Construction of Variants for the Pcauto Promoter Motif Region
Later, a new reporter system including the Pcauto and the GFPuv sequences was built to remove the 500 bp upstream region in pBR_Pcauto_gfp. The idea was to keep only the sequence used for the pull-down assay plus the ribosomal binding site (Shine-Dalgarno sequence) to be tested in vivo with TetR-family protein (CAETHG_0459) and σA (CAETHG_2917) (see next section), the two proteins that responded positively in the in vivo assay (see section Results). This new plasmid, pBR_Pcauto130_gfp, was built by cloning the PCR product of primers WLP130F and WLP130R, using pBR_Pcauto_gfp as template, at the HindIII site of pBR322 by Gibson assembly (Supplementary File 2). Then, the Pcauto130_gfp region was excised from pBR_Pcauto130_gfp using HindIII and ClaI and cloned by ligation in pACYC184 to build plasmid pAC_Pcauto130_gfp. A variation of the promoter region (pAC_Pcauto30C_gfp) was also built to introduce single nucleotide changes in the WLP promoter motif. Changes were as follows: ctggagcaggttttgtagttgcagtaactggttcaata, changed to ccatcaaaggtcttaaagttgcagtaactggttcaata. This promoter was again tested with the TetR-family protein (CAETHG_0459) and σA (CAETHG_2917). Plasmid maps are available upon request.
In vivo Transcription Activation of Pcauto-GFP(UV) Fusion by the Candidate Genes in E. coli
Escherichia coli BL21 was used for the in vivo assay. Firstly, six biological replicate cultures of cells were grown in a 96-well plate (Corning Costar catalog number #3799) carrying the pACYC plasmid with or without (to correct for the autofluorescence of the cells) the promoter-GFPuV fusion reporter in trans with a pET plasmid carrying each of the σ factor candidates. Additionally, a system with cells carrying either the pACYC promoter-GFPuV fusion reporter or its backbone plasmid plus the pET plasmid with no candidate was used as the control.
Cells were grown in 150 μL of LB media containing kanamycin and chloramphenicol at 30°C and agitation of 200 RPM. At mid-exponential phase, cells were sub-cultured to a black 96-well plate (Greiner #655090) to an initial OD of 0.05–0.1 in LB media containing kanamycin and chloramphenicol supplemented with either 0.0 mM IPTG (No IPTG) or 1.0 mM IPTG. The in vivo experiment was performed at 30°C and agitation of 200 RPM.
Growth was followed by measuring the optical density (OD) at 600 nm while fluorescence intensity (FI; for GFP expression) was measured using the excitation filter of 355 nm and an emission filter of 520 nm. The experiment was conducted using the FLUOstar Omega microplate reader and the Omega software v.1.20 (BMG LabTech). Fluorescence intensity was normalized per OD (FI/OD) and the signal resulting from the cells harboring the backbone plasmid only was subtracted from the cells carrying the promoter-GFP fusion reporter (Normalized FI/OD).
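The normalization described above amounts to a per-OD fluorescence value with the backbone-only signal subtracted. A minimal sketch, with hypothetical function and argument names:

```python
def normalized_fi_per_od(fi_reporter, od_reporter, fi_backbone, od_backbone):
    """FI/OD of reporter cells minus FI/OD of backbone-only cells.

    The subtraction corrects for the autofluorescence of cells that carry
    the backbone plasmid without the promoter-GFP fusion.
    """
    return fi_reporter / od_reporter - fi_backbone / od_backbone
```

Dividing by OD before subtracting means that differences in culture density between the reporter and backbone wells do not distort the corrected signal.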
For the WLP promoter motif variants (described in the previous section), four biological replicates were used.
Student’s t-test (two-tailed) was performed between each of the candidate’s normalized FI/OD value without and with IPTG and between the control system. A candidate gene was considered to activate gene expression from Pcauto if it met both of the following two conditions: (1) there was a statistically significant difference (p-value < 0.01) in FI/OD between the candidate without and with IPTG; (2) there was a statistically significant difference (p-value < 0.01) between the FI/OD signal of the candidate and the control vector (PET_) with IPTG.
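The two-condition decision rule can be stated as a short predicate over the two p-values. The sketch takes the p-values as inputs and does not compute them; they could come from any two-tailed Student's t-test implementation (for example `scipy.stats.ttest_ind`), which is an assumption about tooling rather than something the text specifies. The function and argument names are hypothetical.

```python
def activates_pcauto(p_iptg_vs_no_iptg, p_candidate_vs_control, alpha=0.01):
    """Return True only if both comparisons are significant at `alpha`.

    p_iptg_vs_no_iptg: candidate FI/OD with IPTG versus without IPTG.
    p_candidate_vs_control: candidate versus control vector, both with IPTG.
    """
    return p_iptg_vs_no_iptg < alpha and p_candidate_vs_control < alpha
```

Requiring both comparisons excludes candidates whose signal rises with IPTG no more than the empty control vector does.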
Overproduction and Purification of TetR-Family Protein (CAETHG_0459)
To enable the test whether the TetR-family protein CAETHG_0459 activates transcription from Pcauto by interacting directly with the RNAP, the target protein had to be heterologously expressed and purified for the protein-protein interaction (PPI) assay.
The E. coli strain harboring the plasmid pET_TetR1 (CAETHG_0459) was grown at 30°C and 200 RPM until mid-exponential phase in LB media containing kanamycin. Cells were sub-cultured to 1 L LB media containing kanamycin to an initial OD of 0.05–0.1 and subsequently grown until OD ∼1 at 30°C and 200 RPM. Then, 1.0 mM IPTG was added and cells were left growing until OD ∼3. Cells were pelleted from 1 L culture by centrifugation at 5,000 × g for 20 min at 4°C, the pellet was resuspended in 5 mL of the BugBuster Master Mix (Merck Millipore #71456) per gram of wet cell weight with EDTA-free protease inhibitor cocktail (Sigma #11836170001), and then incubated in a rotating mixer for 20 min at room temperature. Next, cell debris was removed by centrifugation at 16,000 × g for 20 min at 4°C and the supernatant (supplemented with 20 mM imidazole) was loaded on a 1 mL Ni2+-HisTrapHP column (GE Healthcare #71-5027-68 AK) and washed with a buffer containing 100 mM Tris-HCl (pH 7), 100 mM NaCl, and 20 mM imidazole.
The TetR-family protein CAETHG_0459 was eluted in the same wash buffer containing a stepwise imidazole gradient (50–500 mM), followed by a buffer exchange performed using a HiTrap Desalting column (GE Healthcare #17-1408-01). Finally, the purified protein was stored in 50 mM Na2HPO4, 300 mM NaCl, pH 7, 50% glycerol. Protein purity was analyzed by gel electrophoresis using NuPAGE® Novex® Bis-Tris gels (Invitrogen) and stained with SimplyBlue™ SafeStain (Novex). Protein concentration was measured by the Direct Detect Spectrometer (Merck Millipore).
TetR-Family Protein (CAETHG_0459)-RNA Polymerase Core Enzyme Interaction Experiment
The PPI experiment was performed as described previously (Raffestin et al., 2005) with some modifications. The purified TetR-family protein (CAETHG_0459) with 6-His-tag (2 μg) was coupled to Ni2+-NTA agarose beads (Thermo #88831) in 800 μL of buffer A (50 mM Na2HPO4, 300 mM NaCl, 50 mM imidazole, pH 7). The beads coupled with the target protein were then washed three times in buffer B (50 mM Na2HPO4, 300 mM NaCl, 0.1% Tween 20, 50 mM imidazole, pH 7). Next, the beads-protein complex was incubated with E. coli RNA polymerase Core enzyme (2.5 μg) (BioLabs #M0550S) at 37°C for 2 h. After two washes in buffer A, the beads-protein complex was suspended in 15 μL of Laemmli Buffer [32.9 mM Tris-HCl, pH 6.8, 13.15% (w/v) glycerol, 1.05% SDS, 0.005% bromophenol blue, 355 mM 2-mercaptoethanol], heated at 100°C for 5 min, and analyzed by gel electrophoresis using NuPAGE® Novex® Bis-Tris gels (Invitrogen) and stained with SimplyBlue™ SafeStain (Novex). The negative control was performed by incubating the RNA polymerase Core enzyme with Ni2+-NTA agarose beads following the same procedure.
Electrophoretic Mobility Shift Assay (EMSA)
The EMSA experiment was performed using the purified protein extract of the TetR-family protein (CAETHG_0459) and the 130 bp oligonucleotide containing the new promoter motif used for the pull-down assay using both agarose and polyacrylamide gels. The binding reaction contained 2 μL of 5x binding buffer (50 mM Tris HCl pH 8.0, 720 mM KCl, 2.6 mM EDTA, 0.5% Triton-X-100, 62.5% glycerol, 1 mM DTT), 5 μL of the extracted protein, 3 μL of a 20 μM oligonucleotide and 0.5 μL of 100x BSA. The binding reaction was incubated for 1 h at room temperature. For the reaction on the polyacrylamide gel including the E. coli RNA polymerase Core enzyme (BioLabs #M0550S), the reaction was further incubated for 30 min after adding 3 μL (3 units) of RNAP. Gel electrophoresis was performed in either 1% agarose or 7.5% polyacrylamide gels. Controls with only the extracted protein or DNA were also loaded. For agarose, the entire sample was loaded and run at 90 V for 120 min. For polyacrylamide, the entire sample was loaded and run at 150 V for 50 min. The gels were stained with SYBR Safe and visualized using BIO-RAD ChemiDoc.
Visualization of Cells Harboring the Pcauto-GFP(UV) Fusion and the TetR-Family Protein (CAETHG_0459) Plasmids by Microscopy
Cells carrying the Pcauto-GFP(UV) fusion reporter and the TetR-family protein (CAETHG_0459) plasmids were analyzed by microscopy to visualize the expression of GFP. For this, cells were plated on an LB agar plate (LB media containing 6 g/L of agar) containing 1.0 mM IPTG, kanamycin, and chloramphenicol. After overnight incubation at 37°C, colonies were visualized using the ZOE™ Fluorescent Cell Imager (Bio-Rad) following the manufacturer’s instructions with the following parameters: Gain: 40; Exposure (ms): 340; LED intensity: 22; Contrast: 59.
Differential RNA-Sequencing (dRNA-Seq)
In this work, we aimed to determine the TSSs of essential genes for autotrophic growth of the model-acetogen C. autoethanogenum (e.g., genes in the WLP and of hydrogenases). We thus performed dRNA-Seq analysis (Sharma et al., 2010) of autotrophic (CO, CO2, and H2; referred to as ‘syngas’) and heterotrophic (fructose) cultures of C. autoethanogenum to experimentally determine TSSs and promoter motif(s) associated with essential genes for autotrophic growth in acetogens.
Previously described batch cultures (Marcellin et al., 2016) were sampled during exponential growth and subjected to dRNA-Seq cDNA library preparation and sequencing. The cDNA libraries were prepared using the 5′tagRACE method (Fouquier et al., 2011), an improved library preparation method compared to TEX (5′-phosphate-dependent Terminator RNA exonuclease) that has the advantage of preserving the quantitative representation of 5′ ends between processed (5′-P end) and primary (5′-PPP end) transcripts (see Materials and Methods). TSSs were determined by comparing the libraries enriched for processed (TAP-) and primary (TAP+) transcripts (Figure 1A) using the TSSAR tool (Amman et al., 2014).
Overall dRNA-Seq Features of C. autoethanogenum
We classified TSSs as primary, internal, antisense, and orphan (Figure 1B and Supplementary Table S1) and found primary TSSs only for around half of the annotated genes (3,983) in C. autoethanogenum (Brown et al., 2014) (Supplementary Table S1). More than 60% of the genes contain only one primary TSS, while the rest show up to 12 TSSs (Figure 1C and Supplementary Table S1). Focusing on the 14 main metabolic groups of C. autoethanogenum genes as described in Brown et al. (2014), we detected primary TSSs for all genes except for the Nfn transhydrogenase complex (CAETHG_1580) (Supplementary Table S2). While primary TSSs were detected for seven of the 11 genes of the WLP biosynthetic gene cluster (CAETHG_1606-21), only half of the WLP TSSs were shared between syngas and fructose. For example, genes of the WLP methyl branch (CAETHG_1614-17) contained 20 primary TSSs on syngas compared to only nine on fructose. On the other hand, the TSSs associated with Hydrogenases and ATPase genes were found in similar numbers between syngas and fructose.
Determination of nucleotide base preferences for transcription initiation within five nucleotides downstream and upstream of the primary TSSs showed a clear enrichment of adenine (A) and guanine (G) at +1 (∼90%) and thymine (T) at −1 for both syngas (Figure 1D) and fructose (data not shown). Overall, adenine and cytosine were the most and least preferred nucleotide bases, respectively.
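The positional base preferences described above amount to counting bases column by column across windows aligned on the TSS. A minimal sketch, assuming each window has already been extracted as five bases upstream (−5 to −1) followed by five bases starting at +1; the toy windows and the function name `base_frequencies` are hypothetical and are not data from this study.

```python
from collections import Counter


def base_frequencies(windows, flank=5):
    """Per-position base frequencies for windows aligned on the TSS.

    Each window holds `flank` upstream bases (-flank..-1) followed by
    `flank` bases starting at the TSS (+1..+flank); there is no position 0.
    """
    positions = [p for p in range(-flank, flank + 1) if p != 0]
    for w in windows:
        if len(w) != 2 * flank:
            raise ValueError(f"window {w!r} is not {2 * flank} nt long")
    freqs = {}
    for idx, pos in enumerate(positions):
        counts = Counter(w[idx].upper() for w in windows)
        total = sum(counts.values())
        freqs[pos] = {base: counts.get(base, 0) / total for base in "ACGT"}
    return freqs


# Hypothetical toy windows (not real C. autoethanogenum sequences).
toy_windows = ["TTAGTATGCA", "CATATGAAGT", "GGTTTAACGA", "ACGATGTTAC"]
freqs = base_frequencies(toy_windows)

for pos in (-1, 1):
    print(pos, freqs[pos])
```

In the toy input every window has T at −1 and either A or G at +1, mirroring the kind of enrichment reported in Figure 1D.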
Analysis of 5′ untranslated regions (5′UTRs), i.e., the sequence between the TSS and the annotated start codon, can indicate transcripts that are potentially subject to post-transcriptional regulation affecting mRNA stability and translational efficiency (Cho et al., 2014). Calculation of 5′UTR lengths for primary TSSs showed a median length of 63 nt, with 65% of 5′UTRs shorter than 100 nt for both growth conditions (Figure 1E and Supplementary Table S1). Genes with longer 5′UTRs tend to be regulated more at the post-transcriptional level (David et al., 2006; Cho et al., 2009). On the other hand, leaderless mRNAs (mRNAs with no 5′UTR or one shorter than 10 nt) are translated in the absence of the upstream signals (typically the Shine-Dalgarno sequence) (Shine and Dalgarno, 1974; Zheng et al., 2011) that regulate translational efficiency through ribosome binding. We found ∼70 (∼2%) leaderless mRNAs with <10 nt 5′UTRs, none of which were in the WLP, Hydrogenases, Acetate or Ethanol groups (Figure 1E and Supplementary Table S3).
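The 5′UTR statistics above reduce to a strand-aware subtraction followed by a threshold. The sketch below assumes 1-based coordinates for the TSS and for the first base of the start codon; the records are hypothetical toy values and the function names are ours, not those of the pipeline used in this study.

```python
from statistics import median


def utr_length(tss, start_codon, strand):
    """5'UTR length in nt: bases from the TSS up to, but excluding, the start codon."""
    if strand == "+":
        return start_codon - tss
    if strand == "-":
        return tss - start_codon
    raise ValueError("strand must be '+' or '-'")


def is_leaderless(length, cutoff=10):
    """Leaderless if the 5'UTR is absent or shorter than `cutoff` nt."""
    return 0 <= length < cutoff


# Hypothetical (TSS, start codon, strand) records, not data from the study.
records = [(100, 163, "+"), (500, 437, "-"), (1000, 1004, "+"), (2000, 1850, "-")]

lengths = [utr_length(*rec) for rec in records]
leaderless = [is_leaderless(n) for n in lengths]
share_under_100 = sum(n < 100 for n in lengths) / len(lengths)

print("lengths:", lengths)
print("median:", median(lengths))
print("leaderless:", leaderless)
print("share < 100 nt:", share_under_100)
```

The 10 nt cutoff matches the definition of leaderless mRNAs given in the text; changing `cutoff` shows how sensitive the count is to that choice.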
In addition to the ability to determine TSSs, dRNA-Seq analysis also facilitates a more accurate annotation of the genome. Based on the TSSs and the position of the Shine-Dalgarno sequence (AGGAGG), which was found to be highly conserved within 9–14 nt upstream of the first start codon (ATG/CTG/GTG/TTG) (Figure 1F), we re-annotated the start codon for 38 genes and confirmed the change in one gene by peptide identification using mass spectrometry (Supplementary Table S4). Moreover, either the start or stop codon of an additional 99 genes, which had previously been annotated in different frames, was manually corrected. The corrections have been deposited into NCBI under the accession number BK010482 and the complete manually corrected GenBank file of C. autoethanogenum is available in Supplementary Table S5.
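The Shine-Dalgarno criterion used for re-annotation can be illustrated with a simple exact-match search. This is a sketch under stated assumptions, not the procedure used in the study: the distance is measured here from the first base of the motif to the first base of the start codon, only exact AGGAGG matches are considered, and the toy sequence and function name `sd_distances` are hypothetical.

```python
START_CODONS = {"ATG", "CTG", "GTG", "TTG"}


def sd_distances(seq, start_idx, motif="AGGAGG", min_dist=9, max_dist=14):
    """Distances (nt) at which `motif` starts upstream of the start codon.

    `start_idx` is the 0-based index of the first base of the start codon.
    Only distances within [min_dist, max_dist] are examined.
    """
    seq = seq.upper()
    if seq[start_idx:start_idx + 3] not in START_CODONS:
        raise ValueError("no recognised start codon at start_idx")
    hits = []
    for dist in range(min_dist, max_dist + 1):
        pos = start_idx - dist
        if pos >= 0 and seq[pos:pos + len(motif)] == motif:
            hits.append(dist)
    return hits


# Hypothetical toy sequence with AGGAGG placed 12 nt upstream of ATG.
toy_seq = "TTTAAGGAGGTTAACCATGAAACGT"
toy_hits = sd_distances(toy_seq, toy_seq.index("ATG"))
print(toy_hits)
```

A candidate start codon with no motif in the window returns an empty list, which is the kind of case that would prompt a closer look at alternative start codons nearby.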
Discovery of a New Promoter Motif
The RNA polymerase (RNAP) needs to form a holoenzyme with a σ factor in bacteria to recognize a specific promoter motif (sequence) and initiate transcription (Gruber and Gross, 2003; Feklistov et al., 2014). Experimentally determined TSS data from dRNA-Seq analysis is ideal for in silico determination of promoter motifs, which is important for understanding transcriptional regulation, especially in less-studied bacteria such as acetogens.
We searched for consensus sequence motifs within 50 nt upstream of primary TSSs using the MEME software (Bailey et al., 2009) and were able to determine seven promoter motifs in C. autoethanogenum (E-value ≤ 0.05) (Supplementary Tables S6, S7 for syngas and fructose growth, respectively). Of those identified, only three motifs were each assigned to more than 100 TSSs and shared between the two datasets, likely representing the most conserved motifs in C. autoethanogenum (Figure 2A).
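Preparing input for such a motif search requires extracting the 50 nt upstream of each TSS in a strand-aware way. The following is a minimal sketch assuming 1-based TSS coordinates and a linear genome string (windows near the sequence ends are simply truncated); the toy genome and the helper names are hypothetical and do not reproduce the authors' scripts.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_window(genome, tss, strand, length=50):
    """Return the `length` nt immediately upstream of a TSS, read 5'->3'
    on the transcribed strand. `tss` is a 1-based genomic coordinate."""
    if strand == "+":
        start = max(0, tss - 1 - length)
        return genome[start:tss - 1]
    if strand == "-":
        # Upstream of a minus-strand TSS lies at higher genomic coordinates.
        return reverse_complement(genome[tss:tss + length])
    raise ValueError("strand must be '+' or '-'")


def to_fasta(windows):
    """Format (name, sequence) pairs as FASTA text for a motif-discovery tool."""
    return "".join(f">{name}\n{seq}\n" for name, seq in windows)


# Hypothetical toy genome; a short window is used so the result is easy to read.
toy_genome = "AAAACCCCGGGGTTTT"
plus_win = upstream_window(toy_genome, 9, "+", length=4)
minus_win = upstream_window(toy_genome, 8, "-", length=4)
print(to_fasta([("tss_plus", plus_win), ("tss_minus", minus_win)]))
```

Writing both strands in the 5′ to 3′ orientation of the transcript keeps motif positions comparable across TSSs, whichever strand a gene is on.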
Figure 2. In silico determination of genome-wide promoter motifs in C. autoethanogenum. (A) The top-3 promoter motifs for primary TSSs are shared among syngas and fructose. The height of the letter indicates its relative frequency at the given position within the motif. Refer to Supplementary Tables S6, S7 for all the determined motifs and their assigned TSSs. The mutated nucleotides used in the in vivo assay for the Pcauto motif are also shown. We show the nucleotide position relative to the TSS in all top-3 motifs. (B) The new promoter motif (Pcauto) is assigned with TSSs of essential genes in acetogens. Motifs with the lowest p-value for syngas are shown. Refer to Supplementary Tables S2, S6, S7 for all TSSs and genes associated with Pcauto. (C) The Pcauto motif is represented in other industrially relevant acetogens. Occurrence in each acetogen relative to C. autoethanogenum is normalized with the number of annotated genes. To determine promoter motifs in C. autoethanogenum, we searched for consensus sequence motifs 50 nt upstream of primary TSSs using the MEME software (Bailey et al., 2009) with the following parameters: -dna, -maxsize 10000000, -mod zoops, -nmotifs 50, -minw 4, -maxw 50, -revcomp, -oc.
The top motif was found 10 nt upstream of primary TSSs (447 and 543 TSSs for syngas and fructose, respectively; E-value < 10⁻¹¹¹) and resembles the Pribnow box (TATGnTATAAT), which is associated with the housekeeping σ factors of Escherichia coli (σ70; Walker and Osuna, 2002), Helicobacter pylori (σ80; Sharma et al., 2010) and Clostridium acetobutylicum (σA; Sauer et al., 1994, 1995). Expectedly, the well-known −35 TTGACA and −10 TATAAT motifs (TATA box in eukaryotes and archaea) for housekeeping σ factors (Burgess and Anthony, 2001) were also among the top-3 promoter consensus sequences (392 and 262 TSSs for syngas and fructose, respectively; E-value < 10⁻⁴⁶). These two motifs were assigned for most of the genes of glycolysis/gluconeogenesis and the TCA cycle (Supplementary Table S2).
The third most abundant promoter motif has, to the best of our knowledge, not previously been reported in the literature (Figure 2A). The new promoter motif (termed here Pcauto) is highly conserved during growth on both syngas (Motif 02 in Supplementary Table S6; 392 TSSs; E-value < 10⁻¹⁷⁴) and fructose (Motif 03 in Supplementary Table S7; 224 TSSs; E-value < 10⁻⁷⁷). Importantly, Pcauto seems to be involved in the transcriptional regulation of essential genes for acetogens and was assigned to genes of the WLP cluster (CAETHG_1606-21) and the metabolic groups, as described in Brown et al. (2014), of Hydrogenases, Acetate, ATPase, and Pyruvate (Figure 2B and Supplementary Tables S2, S6, S7). We confirmed that the new promoter motif is the only motif present upstream of these TSSs: investigation of the upstream regions up to 100 or 150 nt showed no other motif apart from the one conserved within 50 nt upstream of TSSs. The new promoter motif is characterized by an evenly interspaced (A/T)G repetition with an almost central A/T position (Supplementary Figure S1). These observations potentially indicate the presence of a new σ factor or transcriptional regulator of critical importance in acetogens.
RNA Polymerase and Proteins Annotated as Transcriptional Regulators Specifically Bind Pcauto
We performed DNA-protein binding assays to determine if the RNAP and/or other protein(s) bind to Pcauto. The promoter sequences of two WLP genes (CAETHG_1615 and 1617, Methylene-tetrahydrofolate reductase domain-containing protein and Methenyl-tetrahydrofolate cyclohydrolase, respectively) annotated with Pcauto were used for the DNA-protein binding assay using the promoter pull down/DNA affinity chromatography method (Figure 3A; Jutras et al., 2012). The promoter sequence of a glycolytic gene (CAETHG_3424, glyceraldehyde-3-phosphate dehydrogenase, type I) was included as a control for the assay since it was assigned the well-known TATAAT motif, which should yield binding of the RNAP and the housekeeping σ factor, σA. DNA-bound proteins captured using streptavidin-coupled magnetic Dynabeads™ were identified using mass spectrometry of the digestion products of the whole captured material and of gel band excisions. Since this DNA-protein binding assay requires significant amounts of cellular protein material, especially for efforts to identify low abundance proteins such as σ factors or transcriptional regulators, autotrophic bioreactor chemostat cultures (CO or CO + H2) of C. autoethanogenum described in a separate work (Valgepea et al., 2018) were sampled for this analysis.
The promoter pull down/DNA affinity chromatography method (Figure 3A; Jutras et al., 2012) was fine-tuned for C. autoethanogenum. Eluting the proteins with 500 mM NaCl yielded the most prominent bands while no bands were observed in the negative control when water was used instead of DNA (data not shown), which confirms that the identified proteins were pulled down by the DNA sequences (see section Materials and Methods). The alpha and beta subunits of the RNAP (CAETHG_1920 and 1954-55) were successfully identified for both Pcauto (CAETHG_1615 and CAETHG_1617) and the TATAAT motif control (Figure 3B). Additionally, the RNAP omega subunit was identified in the whole purified DNA-bound material for both motifs (Supplementary Table S8). The housekeeping σA (CAETHG_2917) was detected for the TATAAT motif control as expected (Figure 3B). A stronger band was identified in the Pcauto gels around 50 kDa and identified as a protein annotated as L-seryl-tRNA(Sec) selenium transferase (CAETHG_2839; 51.5 kDa) (Figure 3B). Finally, mass spectrometry analysis of the whole purified DNA-bound material identified three proteins annotated as transcriptional regulators (based on NC_022592.1) that were unique for the Pcauto (Table 1) and found for both CO and CO + H2 cultures across technical replicates of the DNA-protein binding assay (Supplementary Table S8). These proteins were likely not visible on the DNA-protein binding assay gels as both their respective mRNA and protein abundances in C. autoethanogenum are very low (Valgepea et al., 2017a, 2018).
Figure 3. DNA-protein binding assay shows specific binding of C. autoethanogenum RNAP subunits and a selenium transferase to the Pcauto. (A) Overview of the DNA-protein binding assay (i.e., the promoter pull down/DNA affinity chromatography method, Jutras et al., 2012). (B) Separation of proteins specifically bound to the TATAAT motif (for gene CAETHG_3424) or the Pcauto (for gene CAETHG_1617) with gel electrophoresis and identification using mass spectrometry. The alpha and beta subunits of the RNAP (CAETHG_1920 and 1954-55) were successfully identified for both the Pcauto (CAETHG_1615 and CAETHG_1617) and the TATAAT motif control. Technical replicate denotes replicate of the DNA-protein binding assay (A) (data not shown for CAETHG_1615).
TetR-Family Transcriptional Regulator (CAETHG_0459) Activates Transcription From Pcauto in vivo
To determine whether any of the three identified protein candidates annotated as transcriptional regulators that uniquely bind to Pcauto (Table 1) could activate transcription from this promoter, we created a transcriptional fusion reporter vector harboring the sequence of Pcauto in-frame with a green fluorescent protein (GFPuV). We also tested transcriptional activation using the L-seryl-tRNA(Sec) selenium transferase (CAETHG_2839) (identified as a stronger band in the pull-down assay, Figure 3B), and using the housekeeping σ factor in clostridia (σA) (CAETHG_2917), since it has been reported that promoter binding sites of different σ factors can overlap (Cho et al., 2014). Transcriptional activation of Pcauto with concomitant GFP production was investigated in E. coli by inducing the expression of the candidate activator proteins from a second T7 protein over-expression vector cloned into plasmid pET28e+ by the addition of IPTG (see section Materials and Methods). Fluorescence was measured at early exponential growth (OD ∼0.26) and expressed as fluorescence intensity normalized per OD (FI/OD).
After subtracting the signal from cells harboring the two plasmids but lacking the fusion reporter (promoter + GFP, see section Materials and Methods), only induction of the TetR-family transcriptional regulator protein (CAETHG_0459) (out of the three transcriptional regulator candidates) led to statistically significantly higher levels of GFP expression (p < 0.01) compared to the control vector with no candidate (Figure 4A and Supplementary Table S9). Interestingly, induction of σA also led to transcriptional activation (p < 0.01). We then confirmed expression of GFP in the strain expressing CAETHG_0459 grown on a plate with IPTG using fluorescence microscopy (Figure 4B). This shows that both CAETHG_0459 and σA independently activate transcription from Pcauto. Importantly, the motif is associated with the expression of essential genes in gas-fermenting acetogens including genes in the WLP and hydrogenases (Supplementary Tables S2, S6, S7) that show higher transcript expression during growth on gas compared to sugar (Supplementary Table S10).
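The reporter readout described above combines two steps: normalizing fluorescence by optical density and subtracting the signal of cells that lack the fusion reporter. A minimal sketch of that arithmetic is shown below; all readings are hypothetical, the function names are ours, and the statistical test applied to the corrected values is not reproduced here.

```python
from statistics import mean


def fi_per_od(fluorescence, od):
    """Fluorescence intensity normalized per optical density (FI/OD)."""
    if od <= 0:
        raise ValueError("OD must be positive")
    return fluorescence / od


def background_corrected(samples, background):
    """Subtract the mean FI/OD of reporter-free cells from each sample's FI/OD.

    Both arguments are lists of (fluorescence, OD) pairs.
    """
    bg = mean(fi_per_od(f, od) for f, od in background)
    return [fi_per_od(f, od) - bg for f, od in samples]


# Hypothetical replicate readings (fluorescence, OD); not data from the study.
no_reporter = [(520.0, 0.26), (494.0, 0.26)]
with_candidate = [(1300.0, 0.26), (1352.0, 0.26)]

corrected = background_corrected(with_candidate, no_reporter)
print([round(v) for v in corrected])
```

Normalizing before subtracting keeps cultures sampled at slightly different densities comparable, which matters when fluorescence is read at low OD.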
Figure 4. TetR-family transcriptional regulator (CAETHG_0459) and σA (CAETHG_2917) activate expression from the new promoter motif. (A) In vivo experiment using E. coli cells carrying the pACYC plasmid with the new promoter-GFPuV fusion reporter in trans with a pET plasmid carrying each of the candidates. The experiment was conducted with either 0.0 or 1.0 mM IPTG. Only in the presence of TetR-family protein (CAETHG_0459) and σA (CAETHG_2917) is the fluorescence intensity normalized per OD (FI/OD) statistically significantly different (p-value < 0.01) compared to the control system (with no candidate protein). (1) Cells harboring the PET_ (Negative control with no candidate gene); (2) Selenium transferase (CAETHG_2839); (3) TetR-family protein (CAETHG_0459); (4) TetR-family protein (CAETHG_0936); (5) GntR (CAETHG_3915); (6) σA (CAETHG_2917); (7) Short version (130 bp) of pAC_Pcauto30C_gfp and TetR-family protein (CAETHG_0459); (8) Mutated version of the promoter region (pAC_Pcauto30C_gfp) by introducing nucleotide changes as follows: ctggagcaggttttgtagttgcagtaactggttcaata, changed to ccatcaaaggtcttaaagttgcagtaactggttcaata and TetR-family protein (CAETHG_0459); (9) Short version (130 bp) of pAC_Pcauto30C_gfp and σA. (B) Cells carrying the TetR-family protein (CAETHG_0459) grown on an LB-agar plate with 1 mM IPTG were visualized by fluorescence microscopy for GFP. (C) Protein–protein interaction assay. TetR-family protein (CAETHG_0459) was incubated with E. coli RNA polymerase Core enzyme. Lane 1: Marker (Thermo #26614); Lane 2: E. coli RNA polymerase Core Enzyme; Lane 3: E. coli RNA polymerase Core incubated with Ni2+ agarose beads and washed; Lane 4: Purified TetR-family protein (CAETHG_0459); Lane 5: Ni2+ agarose beads coupled with TetR-family protein (CAETHG_0459); Lane 6: Ni2+ agarose beads coupled with TetR-family protein (CAETHG_0459) incubated with RNA polymerase Core and washed; Lane 7: Marker.
The 130 bp variant (which includes the sequence used for the pull-down assay plus the ribosomal binding site) also showed a statistically significant increase in fluorescence (p-value < 0.01) when the TetR-family transcriptional regulator protein (CAETHG_0459) was present. Similarly, σA could also activate transcription, although only at a significance level of p-value < 0.05. When mutations were introduced into the promoter motif, TetR (CAETHG_0459) could no longer activate expression of GFP, as expected (Figure 4A). Additionally, electrophoretic mobility shift assay (EMSA) experiments confirmed binding of the TetR-family protein (CAETHG_0459) together with the RNAP to the 130 bp promoter sequence used for the pull-down assay (Supplementary Figure S2). We tried to test the effect of TetR (CAETHG_0459) expression knock-down on the phenotype of C. autoethanogenum but were unsuccessful in obtaining reproducible results.
CAETHG_0459 Directly Binds to the RNA Polymerase Core Enzyme
As TetR-family proteins often act as transcriptional regulators (Cuthbertson and Nodwell, 2013), we next investigated whether TetR-family protein CAETHG_0459 activates transcription from Pcauto by interacting directly with the RNAP. Transcriptional regulators can reversibly interact with the RNAP Core enzyme independently of a DNA sequence to help activate transcription from a range of promoters (Burgess and Anthony, 2001; Feklistov et al., 2014). We thus performed an in vitro PPI assay to test whether protein CAETHG_0459 directly interacts with RNA polymerase Core in the absence of DNA. The purified His-tagged CAETHG_0459 protein linked to Ni2+-beads was incubated with the RNAP Core enzyme (see section Materials and Methods). SDS-PAGE analysis clearly demonstrated an interaction between the core RNA polymerase and CAETHG_0459 (Figure 4C lane 6) and shows that CAETHG_0459 acts as a positive transcriptional regulator that activates transcription from Pcauto by directly binding to the RNAP.
Pcauto Is Represented in Other Acetogens
We next investigated if Pcauto was represented in other industrially relevant acetogens with available genomes: Clostridium ljungdahlii, C. ragsdalei, C. coskatii, Moorella thermoacetica, and Eubacterium limosum (Bengelsdorf et al., 2016; Shin et al., 2016; Redl et al., 2017; Song et al., 2017). We performed the reverse of the methodology previously used to search for consensus sequence motifs by looking for the occurrence of Pcauto 300 nt upstream of annotated genes (since no TSS data was available) using the FIMO tool (Grant et al., 2011) within MEME. As expected based on their phylogenetic proximity (Bengelsdorf et al., 2013; Brown et al., 2014; Shin et al., 2016), C. ljungdahlii, C. ragsdalei, and C. coskatii showed similar occurrences of Pcauto (Figure 2C). Interestingly, while the representation in M. thermoacetica was very low, Pcauto seems to be present also in E. limosum. This result highlights the need for experimental determination of TSSs in more acetogens.
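The cross-species comparison in Figure 2C normalizes motif occurrence by the number of annotated genes and expresses it relative to C. autoethanogenum. A minimal sketch of that normalization follows; the hit counts and the second genome are hypothetical placeholders, only the 3,983 annotated genes come from the text, and the function name `relative_occurrence` is ours.

```python
def relative_occurrence(hits, genes, ref_hits, ref_genes):
    """Motif hits per annotated gene, relative to the reference organism."""
    if genes <= 0 or ref_genes <= 0 or ref_hits <= 0:
        raise ValueError("gene counts and reference hits must be positive")
    return (hits / genes) / (ref_hits / ref_genes)


# Reference: C. autoethanogenum, 3,983 annotated genes (from the text).
# The hit counts below are hypothetical and only illustrate the calculation.
REF_GENES = 3983
REF_HITS = 400

same_density = relative_occurrence(400, 3983, REF_HITS, REF_GENES)
half_density = relative_occurrence(200, 3983, REF_HITS, REF_GENES)
print(same_density, half_density)
```

Dividing by gene number first prevents a larger genome from appearing enriched simply because it offers more upstream regions to scan.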
Acetogens offer an enormous potential for the production of fuels and chemicals from gaseous waste feedstocks (Dürre and Eikmanns, 2015; Claassens et al., 2016; Liew et al., 2016; Molitor et al., 2016), with ethanol already being produced at industrial scale by LanzaTech. Acetogens have two major carbon fixation pathways: the WLP for autotrophic growth and glycolysis for heterotrophic growth. Although both the WLP and glycolysis/gluconeogenesis pathways operate during autotrophic and heterotrophic growth, the WLP carries a substantially higher metabolic flux during autotrophy (Valgepea et al., 2017a, 2018) and vice versa (Valgepea et al., 2017b). Transcriptomic studies have shown that transcriptional regulation between autotrophic and heterotrophic growth in acetogens is complex and includes many non-obvious expression changes (Nagarajan et al., 2013; Tan et al., 2013; Marcellin et al., 2016; Aklujkar et al., 2017). We thus aimed to determine TSSs and transcriptional features of promoter motifs and transcriptional regulators associated with essential genes (including genes of the WLP) in the model-acetogen C. autoethanogenum.
Our study revealed a new promoter motif and identified two proteins that activate gene expression from it [the TetR-family protein (CAETHG_0459) and the housekeeping σA (CAETHG_2917)]. An alternative TetR transcriptional regulator has been previously found to be a σ factor in Clostridium tetani, and its homologs, TcdR in C. difficile, BotR in C. botulinum, and UviA in C. perfringens have also been found to regulate toxin production (Raffestin et al., 2005; Dupuy and Matamouros, 2006; Dupuy et al., 2006). In combination, these results suggest that TetR proteins can play an important role in transcriptional regulation in clostridia. These studies are consistent with our PPI assay, which suggests that the TetR-family protein might function as a σ factor in C. autoethanogenum, but further studies (e.g., an in vitro transcription assay) are needed to confirm this. In fact, unequivocal demonstration of σ factor activity requires that a protein is necessary and sufficient for activation of promoter recognition and transcription initiation by RNAP, independent of any other σ factor subunit. Thus our results do not exclude the possibility that a native σ factor of the in vivo expression host (E. coli), e.g., σ70, could have induced the TetR-family protein to drive transcription from Pcauto. Additional studies should also be performed to determine whether the σA and the TetR-family protein overlap in the promoter motif for transcriptional activation (Cho et al., 2014).
Notably, there are several TetR-family proteins, commonly regarded as transcriptional regulators (Cuthbertson and Nodwell, 2013), annotated in the C. autoethanogenum genome. In pathogenic clostridia these TetR-family proteins are often described as alternative σ factors, belonging to a class of σ factors called extracytoplasmic function (ECF) σ factors (Feklistov et al., 2014; Sineva et al., 2017). Their discovery led to a novel class of σ factors (group 5), which show a −35 and/or −10 conserved region in their target promoters (Dupuy et al., 2005, 2006; Dupuy and Matamouros, 2006; Staroń et al., 2009). It will be interesting to see whether transcription from Pcauto described here with an interspaced repetition of (A/T)G notably distinct from the canonical −35/−10 conserved regions is also activated by a novel σ factor.
Our work also shows that the housekeeping σ factor (σA) in clostridia can activate transcription from Pcauto associated with essential genes for autotrophic growth in acetogens. Interestingly, in another acetogen, E. limosum, the promoter regions of genes of the WLP, hydrogenases, and ATPase contain the well-known –35 TTGACA and –10 TATAAT motifs for the housekeeping σ factor (σA) (Burgess and Anthony, 2001; Song et al., 2017). This potentially indicates that the housekeeping σA in acetogens can initiate transcription from different promoter motifs and illustrates well the great extent of genetic diversity among the non-taxonomic group of acetogens. While the WLP itself is highly conserved, it is not surprising that transcriptional regulation is diverse (Drake et al., 2006; Shin et al., 2016). The work presented here also highlights the importance of Pcauto in other industrially relevant acetogens (Figure 2C). We believe, however, that more studies are needed for the experimental determination of TSSs and transcriptional features to facilitate a broader understanding of transcriptional regulation in acetogens.
Our findings have the potential to significantly advance the understanding of transcriptional regulation and metabolic engineering of the ancient metabolism of acetogens. Firstly, acetogen metabolism, which operates at the thermodynamic edge of feasibility (Schuchmann and Müller, 2014), seems to be wired for utilizing less energy-consuming mechanisms (i.e., transcriptional vs. translational regulation) for operating under different conditions evidenced by the complexity of the condition-specific transcriptional architecture (Valgepea et al., 2018). More importantly, the discovery of Pcauto and a key positive transcription factor (TetR-family protein) in acetogens can lead to the mechanistic description of transcriptional regulation of arguably the first biochemical pathway on Earth (Russell and Martin, 2004; Fuchs, 2011; Weiss et al., 2016).
In addition to expanding the fundamental understanding of a model acetogen, knowledge of the features controlling the expression of essential genes in acetogens could also contribute to the improvement of commercial gas fermentation for the sustainable production of fuels and chemicals. Increasing or modulating the activity of the described TetR transcription factor (either through over-expression and/or protein engineering or by deleting transcriptional repressor genes) could enhance the uptake of C1 substrates through the WLP and thus improve growth and/or product formation (possibly by introducing Pcauto in front of key genes). It could also be used as an orthogonal system in other organisms, as, for instance, the TcdR system has been used in other Clostridium species (Zhang et al., 2015; Minton et al., 2016). Importantly, the newly discovered promoter Pcauto could be harnessed to couple the expression of heterologous pathways to that of key central metabolism enzymes, potentially alleviating the common problem of imbalanced flux throughput between heterologous and native metabolic pathways.
Data Availability Statement
dRNA-Seq data have been deposited in the NCBI Gene Expression Omnibus depository under accession number GSE108700. Re-annotation of C. autoethanogenum genome was deposited in the NCBI GenBank Third Party Annotation database under accession number BK010482. Proteomics data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository with the dataset identifier PXD0014421.
RS, KV, MK, RT, LN, and EM designed the study and experiments. RS, RG, RP, KV, and CB performed the experiments. RS, KV, RT, RP, MK, SS, LN, and EM analyzed and interpreted the data. RS, KV, RT, and EM wrote the manuscript. All authors reviewed the manuscript. RT, MK, and SS were involved in the experimental design, data analysis and interpretation, and writing of the manuscript.
This work was funded by the Australian Research Council (ARC LP140100213) in collaboration with LanzaTech. The ARC had no role in study design, data collection and interpretation, or the decision to submit the work for publication. There was no funding support from the European Union for the experimental part of the study. However, KV acknowledges support also from the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement N810755.
Conflict of Interest
MK, RT, and SS are employed by LanzaTech.
The authors declare that this study received funding from the Australian Research Council (ARC LP140100213) in collaboration with LanzaTech. The ARC was not involved in the study design, collection, analysis, interpretation of data, and the writing of this article or the decision to submit it for publication. LanzaTech was involved in the study design, collection, analysis, interpretation of data, and the writing of this article and the decision to submit it for publication. LanzaTech has interest in commercializing technology using C. autoethanogenum.
We acknowledge the support from the Queensland node of Metabolomics Australia (MA) at The University of Queensland, an NCRIS initiative under Bioplatforms Australia Pty Ltd. We thank Dr. Christopher Howard for his helpful advice in the pull down/affinity chromatography assay, EMSA, and protein purification. We also thank the following investors in LanzaTech’s technology: BASF, CICC Growth Capital Fund I, CITIC Capital, Indian Oil Company, K1W1, Khosla Ventures, the Malaysian Life Sciences Capital Fund, L. P., Mitsui, the New Zealand Superannuation Fund, Petronas Technology Ventures, Primetals, Qiming Venture Partners, Softbank China, and Suncor.
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmicb.2019.02549/full#supplementary-material
Aklujkar, M., Leang, C., Shrestha, P. M., Shrestha, M., and Lovley, D. R. (2017). Transcriptomic profiles of Clostridium ljungdahlii during lithotrophic growth with syngas or H2 and CO2 compared to organotrophic growth with fructose. Sci. Rep. 7:13135. doi: 10.1038/s41598-017-12712-w
Amman, F., Wolfinger, M. T., Lorenz, R., Hofacker, I. L., Stadler, P. F., and Findeiß, S. (2014). TSSAR: TSS annotation regime for dRNA-seq data. BMC Bioinformatics 15:89. doi: 10.1186/1471-2105-15-89
Bailey, T. L., Boden, M., Buske, F. A., Frith, M., Grant, C. E., Clementi, L., et al. (2009). MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 37, 202–208. doi: 10.1093/nar/gkp335
Bengelsdorf, F. R., Poehlein, A., Linder, S., Erz, C., Hummel, T., Hoffmeister, S., et al. (2016). Industrial acetogenic biocatalysts: a comparative metabolic and genomic analysis. Front. Microbiol. 7:1036. doi: 10.3389/fmicb.2016.01036
Brown, S. D., Nagaraju, S., Utturkar, S., De Tissera, S., Segovia, S., Mitchell, W., et al. (2014). Comparison of single-molecule sequencing and hybrid approaches for finishing the genome of Clostridium autoethanogenum and analysis of CRISPR systems in industrial relevant clostridia. Biotechnol. Biofuels 7:40. doi: 10.1186/1754-6834-7-40
Cho, B.-K., Kim, D., Knight, E. M., Zengler, K., and Palsson, B. O. (2014). Genome-scale reconstruction of the sigma factor network in Escherichia coli: topology and functional states. BMC Biol. 12:4. doi: 10.1186/1741-7007-12-4
Cho, B.-K., Zengler, K., Qiu, Y., Park, Y. S., Knight, E. M., Barrett, C. L., et al. (2009). The transcription unit architecture of the Escherichia coli K-12 MG1655 genome. Nat. Biotechnol. 27, 1043–1049. doi: 10.1038/nbt.1582
Claassens, N. J., Sousa, D. Z., dos Santos, V. A. P. M., de Vos, W. M., and van der Oost, J. (2016). Harnessing the power of microbial autotrophy. Nat. Rev. Microbiol. 14, 692–706. doi: 10.1038/nrmicro.2016.130
David, L., Huber, W., Granovskaia, M., Toedling, J., Palm, C. J., Bofkin, L., et al. (2006). A high-resolution map of transcription in the yeast genome. Proc. Natl. Acad. Sci. U.S.A. 103, 5320–5325. doi: 10.1073/pnas.0601091103
Drake, H. L., Küsel, K., and Matthies, C. (2006). “Acetogenic prokaryotes,” in Prokaryotes (Ecophysiology and Biochemistry), eds M. Dworkin, E. Rosenberg, K. H. Schleifer, and E. Stackebrandt (New York, NY: Springer), 354–420. doi: 10.1007/0-387-30742-7_13
Dupuy, B., Mani, N., Katayama, S., and Sonenshein, A. L. (2005). Transcription activation of a UV-inducible Clostridium perfringens bacteriocin gene by a novel sigma factor. Mol. Microbiol. 55, 1196–1206. doi: 10.1111/j.1365-2958.2004.04456.x
Dupuy, B., and Matamouros, S. (2006). Regulation of toxin and bacteriocin synthesis in Clostridium species by a new subgroup of RNA polymerase σ-factors. Res. Microbiol. 157, 201–205. doi: 10.1016/j.resmic.2005.11.004
Dupuy, B., Raffestin, S., Matamouros, S., Mani, N., Popoff, M. R., and Sonenshein, A. L. (2006). Regulation of toxin and bacteriocin gene expression in Clostridium by interchangeable RNA polymerase sigma factors. Mol. Microbiol. 60, 1044–1057. doi: 10.1111/j.1365-2958.2006.05159.x
Feklistov, A., Sharon, B. D., Darst, S. A., and Gross, C. A. (2014). Bacterial sigma factors: a historical, structural, and genomic perspective. Annu. Rev. Microbiol. 68, 357–376. doi: 10.1146/annurev-micro-092412-155737
Fouquier d'Hérouel, A., Wessner, F., Halpern, D., Ly-Vu, J., Kennedy, S. P., et al. (2011). A simple and efficient method to search for selected primary transcripts: non-coding and antisense RNAs in the human pathogen Enterococcus faecalis. Nucleic Acids Res. 39:e46. doi: 10.1093/nar/gkr012
Gibson, D. G., Young, L., Chuang, R. Y., Venter, J. C., Hutchison, C. A., and Smith, H. O. (2009). Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–345. doi: 10.1038/nmeth.1318
Jutras, B. L., Verma, A., and Stevenson, B. (2012). Identification of novel DNA-binding proteins using DNA-affinity chromatography/pull down. Curr. Protoc. Microbiol. 24, 1F.1.1–1F.1.13. doi: 10.1002/9780471729259.mc01f01s24
Kappler, U., and Nouwens, A. S. (2013). The molybdoproteome of Starkeya novella – insights into the diversity and functions of molybdenum containing proteins in response to changing growth conditions. Metallomics 5:325. doi: 10.1039/c2mt20230a
Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., and Salzberg, S. L. (2013). TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14:R36. doi: 10.1186/gb-2013-14-4-r36
Li, F., Hinderberger, J., Seedorf, H., Zhang, J., Buckel, W., and Thauer, R. K. (2008). Coupled ferredoxin and crotonyl coenzyme A (CoA) reduction with NADH catalyzed by the butyryl-CoA dehydrogenase/Etf complex from Clostridium kluyveri. J. Bacteriol. 190, 843–850. doi: 10.1128/JB.01417-07
Liew, F., Martin, E., Tappel, R., Heijstra, B., Mihalcea, C., and Köpke, M. (2016). Gas fermentation – a flexible platform for commercial scale production of low carbon fuels and chemicals from waste and renewable feedstocks. Front. Microbiol. 7:694. doi: 10.3389/fmicb.2016.00694
Marcellin, E., Behrendorff, J. B., Nagaraju, S., DeTissera, S., Segovia, S., Palfreyman, R., et al. (2016). Low carbon fuels and commodity chemicals from waste gases – systematic approach to understand energy metabolism in a model acetogen. Green Chem. 18, 3020–3028. doi: 10.1039/C5GC02708J
Minton, N. P., Ehsaan, M., Humphreys, C. M., Little, G. T., Baker, J., Henstra, A. M., et al. (2016). A roadmap for gene system development in Clostridium. Anaerobe 41, 104–112. doi: 10.1016/j.anaerobe.2016.05.011
Molitor, B., Richter, H., Martin, M. E., Jensen, R. O., Juminaga, A., Mihalcea, C., et al. (2016). Carbon recovery by fermentation of CO-rich off gases – turning steel mills into biorefineries. Bioresour. Technol. 215, 386–396. doi: 10.1016/j.biortech.2016.03.094
Nagarajan, H., Sahin, M., Nogales, J., Latif, H., Lovley, D. R., Ebrahim, A., et al. (2013). Characterizing acetogenic metabolism using a genome-scale metabolic reconstruction of Clostridium ljungdahlii. Microb. Cell Fact. 12:118. doi: 10.1186/1475-2859-12-118
Raffestin, S., Dupuy, B., Marvaud, J. C., and Popoff, M. R. (2005). BotR/A and TetR are alternative RNA polymerase sigma factors controlling the expression of the neurotoxin and associated protein genes in Clostridium botulinum type A and Clostridium tetani. Mol. Microbiol. 55, 235–249. doi: 10.1111/j.1365-2958.2004.04377.x
Redl, S., Sukumara, S., Ploeger, T., Wu, L., Jensen, T. O., Nielsen, A. T., et al. (2017). Thermodynamics and economic feasibility of acetone production from syngas using the thermophilic production host Moorella thermoacetica. Biotechnol. Biofuels 10:150. doi: 10.1186/s13068-017-0827-8
Sauer, U., Treuner, A., Buchholz, M., Santangelo, J. D., and Dürre, P. (1994). Sporulation and primary sigma factor homologous genes in Clostridium acetobutylicum. J. Bacteriol. 176, 6572–6582. doi: 10.1128/jb.176.21.6572-6582.1994
Schuchmann, K., and Müller, V. (2014). Autotrophy at the thermodynamic limit of life: a model for energy conservation in acetogenic bacteria. Nat. Rev. Microbiol. 12, 809–821. doi: 10.1038/nrmicro3365
Sharma, C. M., Hoffmann, S., Darfeuille, F., Reignier, J., Findeiß, S., Sittka, A., et al. (2010). The primary transcriptome of the major human pathogen Helicobacter pylori. Nature 464, 250–255. doi: 10.1038/nature08756
Shine, J., and Dalgarno, L. (1974). The 3′-terminal sequence of Escherichia coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosome binding sites. Proc. Natl. Acad. Sci. U.S.A. 71, 1342–1346. doi: 10.1073/pnas.71.4.1342
Sineva, E., Savkina, M., and Ades, S. E. (2017). Themes and variations in gene regulation by extracytoplasmic function (ECF) sigma factors. Curr. Opin. Microbiol. 36, 128–137. doi: 10.1016/j.mib.2017.05.004
Song, Y., Shin, J., Jeong, Y., Jin, S., Lee, J.-K., Kim, D. R., et al. (2017). Determination of the genome and primary transcriptome of syngas fermenting Eubacterium limosum ATCC 8486. Sci. Rep. 7:13694. doi: 10.1038/s41598-017-1412-3
Staroń, A., Sofia, H. J., Dietrich, S., Ulrich, L. E., Liesegang, H., and Mascher, T. (2009). The third pillar of bacterial signal transduction: classification of the extracytoplasmic function (ECF) σ factor protein family. Mol. Microbiol. 74, 557–581. doi: 10.1111/j.1365-2958.2009.06870.x
Tan, Y., Liu, J., Chen, X., Zheng, H., and Li, F. (2013). RNA-seq-based comparative transcriptome analysis of the syngas-utilizing bacterium Clostridium ljungdahlii DSM 13528 grown autotrophically and heterotrophically. Mol. Biosyst. 9, 2775–2784. doi: 10.1039/c3mb70232d
Valgepea, K., de Souza Pinto Lemgruber, R., Abdalla, T., Binos, S., Takemori, N., Takemori, A., et al. (2018). H2 drives metabolic rearrangements in gas-fermenting Clostridium autoethanogenum. Biotechnol. Biofuels 11:55. doi: 10.1186/s13068-018-1052-9
Valgepea, K., de Souza Pinto Lemgruber, R., Meaghan, K., Palfreyman, R. W., Abdalla, T., Heijstra, B. D., et al. (2017a). Maintenance of ATP homeostasis triggers metabolic shifts in gas-fermenting acetogens. Cell Syst. 4, 505–515. doi: 10.1016/j.cels.2017.04.008
Valgepea, K., Loi, K. Q., Behrendorff, J. B., de Souza Pinto Lemgruber, R., Plan, M., Hodson, M. P., et al. (2017b). Arginine deiminase pathway provides ATP and boosts growth of the gas-fermenting acetogen Clostridium autoethanogenum. Metab. Eng. 41, 202–211. doi: 10.1016/j.ymben.2017.04.007
Weiss, M. C., Sousa, F. L., Mrnjavac, N., Neukirchen, S., Roettger, M., Nelson-Sathi, S., et al. (2016). The physiology and habitat of the last universal common ancestor. Nat. Microbiol. 1:16116. doi: 10.1038/nmicrobiol.2016.116
Yang, D. C., Deuis, J. R., Dashevsky, D., Dobson, J., Jackson, T. N. W., Brust, A., et al. (2016). The snake with the scorpion’s sting: novel three-finger toxin sodium channel activators from the venom of the long-glanded blue coral snake (Calliophis bivirgatus). Toxins 8:E303. doi: 10.3390/toxins8100303
Zhang, Y., Grosse-Honebrink, A., and Minton, N. P. (2015). A universal mariner transposon system for forward genetic studies in the genus Clostridium. PLoS One 10:e0122411. doi: 10.1371/journal.pone.0122411
Keywords: Clostridium autoethanogenum, Wood–Ljungdahl pathway, transcriptional regulation, gas fermentation, autotrophy
Citation: de Souza Pinto Lemgruber R, Valgepea K, Gonzalez Garcia RA, de Bakker C, Palfreyman RW, Tappel R, Köpke M, Simpson SD, Nielsen LK and Marcellin E (2019) A TetR-Family Protein (CAETHG_0459) Activates Transcription From a New Promoter Motif Associated With Essential Genes for Autotrophic Growth in Acetogens. Front. Microbiol. 10:2549. doi: 10.3389/fmicb.2019.02549
Received: 12 October 2018; Accepted: 22 October 2019; Published: 15 November 2019.
Edited by: Kathleen Scott, University of South Florida, Tampa, United States
Reviewed by: Mitsuo Ogura, Tokai University, Japan
Amy Michele Grunden, North Carolina State University, United States
Weihong Jiang, Shanghai Institutes for Biological Sciences (CAS), China
Copyright © 2019 de Souza Pinto Lemgruber, Valgepea, Gonzalez Garcia, de Bakker, Palfreyman, Tappel, Köpke, Simpson, Nielsen and Marcellin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Esteban Marcellin, email@example.com
†Present address: Renato de Souza Pinto Lemgruber and Christopher de Bakker, Servatus Ltd., Sippy Downs, QLD, Australia