Base editing enables duplex point mutagenesis in Clostridium autoethanogenum at the price of numerous off-target mutations

Base editors are recent multiplex gene editing tools derived from the Cas9 nuclease of Streptococcus pyogenes. They can target and modify a single nucleotide in the genome without inducing double-strand breaks (DSB) of the DNA helix. As such, they hold great potential for the engineering of microbes that lack effective DSB repair pathways such as homologous recombination (HR) or non-homologous end-joining (NHEJ). However, few applications of base editors have been reported in prokaryotes to date, and their advantages and drawbacks have not been systematically reported. Here, we used the base editors Target-AID and Target-AID-NG to introduce nonsense mutations into four different coding sequences of the industrially relevant Gram-positive bacterium Clostridium autoethanogenum. While up to two loci could be edited simultaneously using a variety of multiplexing strategies, most colonies exhibited mixed genotypes and most available protospacers led to undesired mutations within the targeted editing window. Additionally, fifteen off-target mutations were detected by sequencing the genome of the resulting strain, among them seven single-nucleotide polymorphisms (SNP) in or near loci bearing some similarity with the targeted protospacers, one 15 nt duplication, and one 12 kb deletion which removed uracil DNA glycosylase (UDG), a key DNA repair enzyme thought to be an obstacle to base editing mutagenesis. A strategy to process prokaryotic single-guide RNA arrays by exploiting tRNA maturation mechanisms is also illustrated.


Introduction
Base editors are recent gene editing tools derived from the Cas9 nuclease of Streptococcus pyogenes (Komor et al., 2016; Nishida et al., 2016). They can target and modify a single nucleotide in the genome without inducing double-strand breaks (DSB) of the DNA helix. This can be exploited to change a single amino acid for in vivo protein engineering, or, more commonly, to disrupt protein expression with nonsense mutations. Consequently, base editors are promising tools for the engineering of microbes that lack effective DSB repair pathways such as homologous recombination (HR) or non-homologous end-joining (NHEJ) (Luo et al., 2020). Additionally, base editors are simple to customize to a new target gene, requiring only that their 20 nt protospacer be swapped. This makes them readily compatible with high-throughput automated workflows. These combined features make them ideal candidates for multiplex genome editing tools, allowing the engineering of several loci in a single step and thereby significantly reducing the duration, cost and effort involved in mutagenesis procedures. In view of these advantages, the current study tested the potential of base editors as multiplex genome editing tools in Clostridium autoethanogenum. This Gram-positive acetogen, characterised as slow-growing and challenging to engineer (Bourgade et al., 2021), is the process chassis used in the large-scale, commercial manufacture of ethanol from industrial off-gas (Fackler et al., 2021; Liew et al., 2022).
Several base editors have been developed to date, enabling C-to-T (cytosine base editors, CBE) or A-to-G (adenine base editors, ABE) (Gaudelli et al., 2017) targeted point mutations within various editing windows around the protospacer-adjacent motifs (PAMs) of their respective CRISPR-associated (Cas) nucleases (Banno et al., 2018; Eid et al., 2018; Jiang et al., 2018; Li et al., 2018; Marx, 2018; Molla & Yang, 2019; Chatterjee et al., 2020). Target-AID, a CBE, exploits the targeting capabilities of single guide RNAs (sgRNAs) and the activity of a cytidine deaminase (CDA) to deaminate a cytosine on a single strand of the DNA helix (Figure 1) in between the positions −20 and −16 from a PAM with the sequence NGG (Nishida et al., 2016). Then, the mismatch repair pathway (MMR) is hijacked by nicking the unedited strand with Cas9 D10A nickase (nCas9 D10A), which is thought to trick the MMR pathway into using the edited strand as template to repair the unedited strand (Modrich, 2006; Li, 2008; Spampinato et al., 2009; Fukui, 2010; Williams & Kunkel, 2014). However, the base excision repair pathway (BER), initiated by the specialized enzyme uracil DNA glycosylase (UDG), can sometimes remove the deaminated cytosine before the MMR pathway has had a chance to mutate the unedited strand. For this reason, many CBEs are fused to another enzyme called uracil glycosylase inhibitor (UGI) to protect the targeted locus from UDG and the BER pathway during the early stages of mutagenesis (Zhigang et al., 1991; Komor et al., 2016; Nishida et al., 2016; Wang et al., 2017). Unfortunately, UGI has also been associated with extra toxicity and off-target mutagenesis in prokaryotes, even after the addition of a GLVA protein degradation tag (Banno et al., 2018). Consequently, we opted to avoid using a UGI fusion in the initial stages of our study. It has also been reported that altering the size of the spacer region of the sgRNA could change the base editing profile of Target-AID in Escherichia coli: instead of preferentially mutating the base in position −18 from the PAM with a standard 20 nt spacer, an 18 nt spacer would preferentially edit the base in position −17, and a 22 nt spacer would preferentially edit the base in position −19 (Banno et al., 2018). We opted to test if this flexibility could also be exploited in Clostridium autoethanogenum.

FIGURE 1
Overview of the proposed mutagenesis mechanism of CBEs, in clockwise order starting from (I.) the WT chromosomal DNA. (II.) The CBE-sgRNA duplex unwinds the DNA double helix around the protospacer, creating an R-loop, nicking the non-edited strand, and exposing a cytosine (C) on the edited strand to the deaminase activity of CDA, which (III.) changes it into uracil (U). (IV.) Without repair of the nick on the non-edited strand, the cell is unable to replicate its DNA and dies. (V.) If the DNA helix undergoes MMR, the nicked non-edited strand is repaired to match the edited strand, replacing the guanine (G) of the non-edited strand with an adenine (A). (VI.) Alternatively, the nick can be ligated, and the DNA replicated semiconservatively into two double helices: one mutated and the other one WT. (VII.) In the last step, the uracil of the edited strand is finally removed and replaced by a thymine (T) through DNA replication (not shown) or BER. Finally, immediate ligation of the nicked strand and subsequent repair of the edited strand through (VIII.) BER (or (VI.) DNA replication) would result in a stable WT chromosome which would be exposed to another cycle of mutagenesis for as long as the base editor and its sgRNA cassette are being expressed.
Frontiers in Bioengineering and Biotechnology frontiersin.org

To take full advantage of Target-AID's simplicity, several multiplexing strategies (Adiego-Pérez et al., 2019) that could target several protospacers at once with a single plasmid were also tested. Alongside more traditional strategies such as using multiple sgRNA transcriptional units (msgRNA) and reproducing the native SpCas9 CRISPR array (mCRISPR), the exploitation of a polycistronic array of sgRNA and prokaryotic tRNA fusions (mtRNA) was also explored, with the expectation that individual sgRNAs would be released as a consequence of tRNA maturation. This is a common strategy in eukaryotic systems (Xie et al., 2015; Dong et al., 2017; Zhang et al., 2019), but, to the best of our knowledge, it has not yet been used in prokaryotes except for a single recent report in a non-mainstream journal (Lu et al., 2022).
Finally, and perhaps most importantly, the mutant strains were validated by whole-genome sequencing to look for potential off-target mutations.

Strains and media
Vector assembly and cloning were conducted in E. coli strain DH5α, cultivated in Luria-Bertani (LB) broth. E. coli strain sExpress (Woods et al., 2019) was used as the conjugal DNA donor strain to transfer plasmids into C. autoethanogenum strain DSM10061. C. autoethanogenum was recovered from cryostocks and cultivated in pre-reduced yeast tryptone fructose (YTF) medium in an anaerobic cabinet (Don Whitley Scientific Ltd., Bingley, United Kingdom) at 37°C (Humphreys et al., 2015). Antibiotics and other additives to LB and YTF are summarized in Table 1.
The mutant C. autoethanogenum strains generated over the course of this study are summarised in Table 2.

Cloning and assembly
In silico design of constructs was achieved with A plasmid Editor (ApE) (RRID:SCR_014266) (Davis & Jorgensen, 2022). Unless otherwise specified, all kits, enzymes and buffers were purchased from New England Biolabs Ltd. (NEB, Hitchin, United Kingdom) and used following the manufacturer's instructions. DNA oligos were synthesised by Integrated DNA Technologies, Inc. (Coralville, United States) and were designed to have an annealing region with a melting temperature of 65°C using NEB Tm calculator (RRID:SCR_017969, tmcalculator.neb.com) and no secondary structure with a melting temperature higher than 57°C (modelled with Mfold, RRID:SCR_008543 (Zuker, 2003)). All oligos and vectors used in this study

Codon optimization
The sequences coding for UGI-GLVA and the activation-induced cytidine deaminase 1 from Petromyzon marinus (PmCDA1, or AID) with its long protein linker (Nishida et al., 2016; Banno et al., 2018) were codon optimized by Genscript (Piscataway, United States) to match the codon usage of C. autoethanogenum. The codon utilization table of C. autoethanogenum was obtained by extracting all the coding sequences (CDS) from its published genome (GenBank: CP012395.1) (Humphreys et al., 2015) and processing them with the CUSP algorithm (Rice et al., 2000).
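As an illustration of what a codon utilization table contains, the following minimal Python sketch counts codons over a list of CDS strings and reports their frequency per 1,000 codons. It is not the CUSP implementation used in this study; the function name `codon_usage` and the two toy sequences are assumptions made for the example.

```python
from collections import Counter

def codon_usage(cds_list):
    """Return {codon: frequency per 1,000 codons} over in-frame CDS strings."""
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper()
        usable = len(seq) - len(seq) % 3      # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            if set(codon) <= set("ACGT"):     # skip codons with ambiguous bases
                counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: 1000 * n / total for codon, n in counts.items()}

# Toy input (hypothetical sequences, not taken from CP012395.1)
usage = codon_usage(["ATGAAATAA", "ATGAAAGGGTAA"])
```

With real data, the input list would hold every CDS extracted from the genome annotation, and the resulting table would guide the choice of synonymous codons.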

Design of a Clostridium Target-AID plasmid
The Target-AID CDS was generated by replacing the STOP codon of the Cas9 D10A nickase (nCas9) CDS with the CDS of the codon-optimized AID and its long protein linker (Nishida et al., 2016).
Target-AID was then cloned inside the pMTL83151 backbone of our RiboCas system (Cañadas et al., 2019). Consequently, expression of Target-AID was subordinated to the tight control of a theophylline-inducible riboswitch, and expression of the custom sgRNA was driven by the strong ParaE constitutive promoter (Huang et al., 2016).

Protospacer design and modelling of total protospacers and genomic coverage
Potential Target-AID protospacer targets for each gene were initially identified with the Benchling (RRID:SCR_013955) CRISPR design tool, which lists all the 20 nt sequences directly upstream of an NGG PAM in a given DNA sequence. This list was then pasted into a custom Microsoft Office Excel™ spreadsheet, which flagged the protospacers with cytosines in positions −19 to −16 from the PAM that could lead to a TAA, TAG or TGA codon if converted to thymines. Target-AID-NG protospacers were pulled from our whole-genome analysis of potential Target-AID targets for various PAMs in C. autoethanogenum using MATLAB™ (RRID:SCR_001622). For future work on individual genes, we recommend the use of BE-designer (RRID:SCR_023389, rgenome.net/be-designer/), an online tool with a user-friendly interface.
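The flagging logic described above can be sketched in a few lines of Python. This is an illustrative reimplementation under stated assumptions, not the spreadsheet or MATLAB code used in the study: the function name `stop_protospacers` is hypothetical, only single C-to-T edits are considered, the PAM is fixed to NGG, and the input is assumed to be a coding strand starting at the first base of the start codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]

def stop_protospacers(cds, window=(-19, -16)):
    """List single C-to-T edits in the window that create an in-frame STOP codon.

    Returns tuples (strand, protospacer, position_from_PAM, new_codon).
    """
    cds = cds.upper()
    n = len(cds)
    hits = []
    for strand, seq in (("+", cds), ("-", reverse_complement(cds))):
        for start in range(n - 22):                  # 20 nt protospacer + 3 nt PAM
            if seq[start + 21:start + 23] != "GG":   # NGG PAM check
                continue
            for pos in range(window[0], window[1] + 1):
                j = start + 20 + pos                 # index of this position on seq
                if seq[j] != "C":
                    continue
                # A C-to-T edit on the antisense strand reads as G-to-A
                # on the coding strand.
                k, new_base = (j, "T") if strand == "+" else (n - 1 - j, "A")
                codon_start = k - k % 3
                if cds[codon_start:codon_start + 3] in STOP_CODONS:
                    continue                         # codon is already a STOP
                edited = cds[:k] + new_base + cds[k + 1:]
                codon = edited[codon_start:codon_start + 3]
                if codon in STOP_CODONS:
                    hits.append((strand, seq[start:start + 20], pos, codon))
    return hits

# Toy CDS (hypothetical): a CAA codon whose C sits at position -17 from a TGG PAM
toy_cds = "ATGCAA" + "A" * 14 + "TGG" + "A"
hits = stop_protospacers(toy_cds)
```

In the toy sequence the only hit is the CAA codon converted to TAA. Restricting hits to the first part of a gene, or supporting other PAMs, would be straightforward extensions of the same loop.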

Design of multiplex sgRNA-tRNA array
Monocistronic tRNA sequences from Clostridium pasteurianum were complemented with 20 nt of their pre-tRNA sequence at their 3′ end using the GtRNAdb-Genomic tRNA Database (RRID:SCR_006939) (Chan & Lowe, 2009). Individual tRNA structures were modelled with the online tool RNAfold (Gruber et al., 2008), and manually examined using a variety of parameters. The pre-tRNAs Thr-TGT-1-1 and fMet-CAT-1-1 were selected for (in decreasing order of importance) having a weak A-U rich stem-loop approximately 16 nt from the CCA-3′ end of the mature tRNA (Sekiya et al., 1979), for lacking a 3′ poly-U tail which could induce the termination of transcription, for not forming a strong DNA secondary structure with the binding sequence of the oligonucleotide primers used to synthesize the tRNA array during PCR amplification (Zuker, 2003), and for being associated with a codon in relatively high usage within C. autoethanogenum to avoid excessively upsetting the balance of its tRNA pool.

Conjugation in Clostridium autoethanogenum
Two days before mating, one 0.2 mL C. autoethanogenum cryostock in 10% DMSO was thawed and inoculated into 2 mL of pre-reduced liquid YTF. Early the next day, each vector was transformed into 20 μL of E. coli sExpress and the C. autoethanogenum inoculum was subcultured into 4 mL of pre-reduced liquid YTF at starting OD 0.05. On the day of the mating, a single sExpress transformant colony of each construct was inoculated into 5 mL of room-temperature liquid LB + Cm + Kan and grown until OD reached 0.2. 1 mL of culture was then centrifuged at 3,000 rcf for 3 min and washed in 0.5 mL PBS before being centrifuged again. The resulting pellet was finally resuspended into 0.2 mL of C. autoethanogenum culture and gently spread onto a YTF plate without antibiotics. 20 h later, 0.6 mL of PBS was vigorously spread over the mating plate to resuspend the cells and constitute the mating slurry. The mating slurry was normalized to 0.6 mL with PBS in a microcentrifuge tube, and 0.2 mL of it was finally transferred onto YTF + D-cyc + Tm + Th plates for selection of transconjugants and induction of Target-AID constructs. For the preliminary characterisation of Target-AID, the transconjugant colonies were subsequently transferred onto YTF + D-cyc + Tm + Th + 5-FOA plates in order to select for the ΔpyrE genotype.

Plasmid loss
After mutagenesis and colony PCR, plasmids were cured by restreaking mutants on YTF plates without antibiotic, then patching 10 single colonies on YTF plates with and without Tm. Patches which failed to grow on Tm were considered to have lost their plasmid. They were then inoculated in liquid YTF with and without Tm. If there was still no growth in the liquid cultures supplemented with thiamphenicol, the cultures without antibiotic were cryopreserved in 10% pre-reduced DMSO once they reached OD600 ~0.4.

Estimation of mutagenesis efficiency
The efficiency of Target-AID mutagenesis was initially estimated by calculating the proportion of colonies which survived exposure to 5 mM 5-FOA after induction on 5 mM theophylline. Subsequent mutagenesis efficiencies were estimated only from the Sanger sequencing results of five transconjugant colonies per construct.

Sanger sequencing
The targeted genes of fifteen C. autoethanogenum vFS36_TA_pyrE colonies and five C. autoethanogenum colonies conjugated with each of the other plasmids were screened by Sanger sequencing after a colony PCR with Q5® DNA polymerase and purification with the QIAquick PCR Purification Kit from Qiagen (Venlo, Netherlands). Each colony aliquot was first boiled in 40 μL sterile ddH2O for 10 min at 98°C and centrifuged at 2,500 RCF for 1 min before using 1 μL of supernatant as DNA template for the PCR. Sanger sequencing was performed by Eurofins Genomics (Ebersberg, Germany) and sequencing results were aligned with their respective WT sequences using Benchling (RRID:SCR_013955). Sequencing primers are summarized in Supplementary Table S1 and raw reads are available in the Supplementary Material.

Determination of non-essentiality
Prior to plasmid design, the three genes CLAU_532, CLAU_534 and CLAU_1794 were cross-checked against the published list of essential genes determined by transposon insertion sequencing (Woods et al., 2022) and confirmed to be non-essential under heterotrophic or autotrophic conditions.

Whole genome sequencing
Whole genome sequencing was performed at the Deepseq Next-Generation Sequencing Facility of the University of Nottingham. Sequencing produced 1,919,451 raw reads that were trimmed of Illumina adapters and low quality (Q < 30) nucleotides using TrimGalore (v 0.6.6) (Andrews, 2010; Martin, 2011; Kreuger, 2012). A nucleotide clip was performed at the 3′ end of reads, and reads shorter than 20 bp were discarded. After quality filtering, 98.1% of reads remained.

Annotation of variants
Genome annotation associated with NZ_CP012395.1 was downloaded from the NCBI database and appropriate annotations added to variants.

Identification of putative off-target mutations
All protospacer sequences targeted in cFS05 were submitted to Cas-Offinder (RRID:SCR_023390) to generate a list of putative off-target sites with up to 9 mismatches, 2 nt DNA gaps and 2 nt RNA gaps. This generated 14,376 putative off-target sites. Sites within 50 nt of an undesired SNP detected by whole genome sequencing were considered potential off-targets of their associated protospacer. Additionally, using Geneious (RRID:SCR_010519), the sequences of all protospacers targeted in cFS05 were aligned with the 41 nt WT sequence surrounding each undesired SNP (the SNP position plus the 20 nt immediately upstream and downstream) in order to find potential matches between target protospacers and the sequence around each undesired SNP.
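The 50 nt proximity criterion can be expressed as a short Python sketch. It only illustrates the matching rule and is not the analysis pipeline used here: the function name `assign_off_targets`, the `(protospacer, start, end)` tuple layout and all coordinates in the example are hypothetical, and a single chromosome is assumed.

```python
def assign_off_targets(snps, sites, max_distance=50):
    """Map each SNP coordinate to putative off-target sites within max_distance nt.

    snps:  iterable of genome coordinates (int)
    sites: iterable of (protospacer_name, start, end) with inclusive coordinates
    """
    result = {}
    for snp in snps:
        nearby = []
        for name, start, end in sites:
            # Distance is 0 when the SNP falls inside the site.
            distance = max(start - snp, snp - end, 0)
            if distance <= max_distance:
                nearby.append((name, distance))
        result[snp] = sorted(nearby, key=lambda item: item[1])
    return result

# Hypothetical coordinates, for illustration only
example_sites = [
    ("CLAU1794G", 1030, 1052),
    ("CLAU532A", 900, 922),
    ("CLAU534A", 7000, 7022),
]
matches = assign_off_targets([1000, 5000], example_sites)
```

An SNP with an empty list has no putative site nearby and would be a candidate for the complementary manual alignment step.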

Target-AID proof-of-concept: pyrE knockout in Clostridium autoethanogenum
The Target-AID design was validated with the plasmid vFS36_TA_pyrE (Figure 2), customised to introduce a premature STOP codon in the pyrE gene of C. autoethanogenum, which is necessary for pyrimidine synthesis and confers resistance to 5-FOA when knocked out (Ng et al., 2013; Minton et al., 2016). Only 70% of the 816 transconjugant colonies induced with theophylline survived when transferred onto 5-FOA, but all fifteen 5-FOA-resistant colonies screened exhibited the expected C388T mutation, which results in a premature STOP codon (TAG) (Figure 2). This preliminary experiment confirmed that our basic Target-AID construct and mutagenesis protocol were functional in C. autoethanogenum.

Multiplexing designs
The msgRNA, mCRISPR and mtRNA multiplexing strategies were tested with the plasmids vFS50_TA_msgRNA, vFS51_TA_mCRISPR and vFS48_TA_mtRNA, respectively (Figure 3). Because only the last 20 nt at the 3′-end of the CRISPR spacer were shown to be necessary for gRNA targeting, but the native S. pyogenes CRISPR array is composed of ~30 nt spacers (Jinek et al., 2012), each 20 nt protospacer sequence of vFS51_TA_mCRISPR was complemented with a 6 nt restriction site and 4 random nt at its 5′-end. Three non-selectable and non-essential genes (CLAU_532, CLAU_534 and CLAU_1794) coding for alcohol dehydrogenases were picked as arbitrary targets in C. autoethanogenum, and protospacers which could produce a STOP codon after a C-to-T mutation were identified for each gene (labelled CLAU532A, CLAU534A, and CLAU1794A, respectively). In addition to these three multiplex constructs, three "monoplex" controls were also assembled to estimate the effectiveness of each individual sgRNA independently from its multiplexing system (Table 3). These are the vectors vFS52_TA_CA532A, vFS53_TA_CA534A, and vFS54_TA_CA1794A, all directly derived from vFS36_TA_pyrE with the pyrE protospacer sequence replaced by a protospacer sequence of their respective target gene.

FIGURE 3
Schematic of the gRNA expression cassettes of different multiplex Target-AID plasmids. vFS50_TA_msgRNA has three separate sgRNA transcriptional units; vFS51_TA_mCRISPR has two transcriptional units expressing a CRISPR array and a tracrRNA, respectively; vFS48_TA_mtRNA has a single transcriptional unit expressing an array of sgRNA-tRNA fusions. All are derived from vFS36_TA_pyrE. vFS57_TA_msgRNA_1794B is derived from vFS50_TA_msgRNA and only differs in the protospacer sequence targeting CLAU_1794 (CLAU1794B instead of CLAU1794A). vFS56_TA_mtRNAthr_1794B is derived from vFS48_TA_mtRNA but uses the same tRNA twice (Thr-TGT-1-1 tRNA) and targets the CLAU1794B protospacer instead of CLAU1794A. DR = Direct repeat; tracrRNA = trans-activating CRISPR RNA. Symbols inspired from SBOL visual (Baig et al., 2021).

Most Target-AID mutants exhibit mixed genotypes
No mutants could be obtained from any of the gRNAs targeting the protospacer CLAU1794A (Table 3). However, CLAU532A and CLAU534A were successfully mutated by their respective monoplex control and vFS50_TA_msgRNA. Unfortunately, all but two of the twenty-three mutated loci showed some level of mixed trace, revealing the presence of WT cells in the colony. In most cases, the WT trace was dominant over the mutated one. Only one colony transformed with a multiplex construct (vFS50_TA_msgRNA) showed a pure trace with the desired mutation for a protospacer (CLAU532A.A1). Fortunately, it also showed a mixed peak for the CLAU534A protospacer (CLAU534A.A4). That colony was thus re-streaked on a YTF + D-cyc + Tm + Th plate to isolate pure colonies. A single pure colony with the desired CLAU532A.A1 and CLAU534A.A1 C-to-T mutations could be identified after screening five of these re-streaked colonies (cFS15). Albeit with poor efficiency, this is evidence that at least two loci can be mutated with premature STOP codons in a single mutagenesis step using Target-AID.
A subsequent multiplex mutagenesis attempt was made with a different CLAU_1794 protospacer (CLAU1794B) with the plasmid vFS57_TA_msgRNA_1794B (Figure 3) and its monoplex control vFS58_TA_CA1794B. This time, the last protospacer could be targeted, although only two out of five colonies showed a mutated CLAU1794B protospacer (four for the monoplex control), both showed a mixed trace, and none had the desired C-to-T mutation which would have produced a STOP codon. This confirmed that the last sgRNA cassette was functional and that the previous failure to mutate CLAU1794A was not due to any multiplex system, but to a defective protospacer sequence. Unfortunately, neither the CLAU534A.A1 nor the CLAU534A.A4 allele could be found among the five screened vFS57_TA_msgRNA_1794B colonies.

Arrays of tRNA-sgRNA fusions can be used to express multiple sgRNAs
The initial mtRNA and mCRISPR multiplexing strategies both failed to mutate the CLAU534A protospacer, even though this could be achieved by the msgRNA strategy and the monoplex control targeting CLAU534A (vFS53_TA_CA534A). The mCRISPR strategy only managed to mutate one of the five colonies screened for CLAU532A mutations (Table 3). Interestingly, the mtRNA strategy managed to mutate all five of them. This indicated that the Thr-TGT-1-1 tRNA had not interfered with the function of the CLAU532A sgRNA. The mCRISPR strategy was abandoned at this stage. However, we hypothesized that replacing the fMet-CAT-1-1 tRNA with another copy of the Thr-TGT-1-1 tRNA might rescue the mtRNA strategy.
While multiplex mutagenesis with vFS56_TA_mtRNAthr_1794B (Figure 3), which only uses the Thr-TGT-1-1 tRNA, did not yield the expected C-to-T mutation in either CLAU534A or CLAU1794B, it did successfully target all three protospacers and even yielded two pure CLAU532A.A4 alleles with the desired C-to-T mutation (Table 3).

UGI-LVA and truncated sgRNAs do not improve mutagenesis efficiency
Next, we investigated whether fusing a UGI (with GLVA degradation tag) to Target-AID would improve mutagenesis efficiency in all protospacers, or if truncating the protospacer region of the sgRNAs targeting CLAU534A and CLAU1794B would shift their editing window to edit the cytosine in position −16 from the PAM more favourably. Unfortunately, while the initial vFS57_TA-msgRNA_CA1794B construct once again managed to edit all three loci with almost 100% efficiency (although without pure mutant and only one correctly edited base out of ten in position −16 for CLAU534A and CLAU1794B), the addition of UGI-LVA downstream at the C-terminus of Target-AID in the plasmid vFS75_mTA-UGILVA seemed only to harm editing efficiency (only 2 pure mutations for CLAU534A and no other mutation across the 12 remaining reads), and truncating the sgRNAs targeting CLAU534A and CLAU1794B from 20 nt to 18 nt in the plasmid vFS94_mTA-trsgRNA seemed to completely abolish mutagenesis of their respective protospacers (Table 4).

Target-AID-NG has a vastly superior targeting space
Besides CLAU534A, CLAU1794A and CLAU1794B, there were no other Target-AID-compatible protospacers that could potentially result in a premature STOP codon in CLAU_534 or CLAU_1794. This revealed an inherent weakness of Target-AID mutagenesis: very few protospacers are available for each gene, and many genes cannot be targeted at all. To quantify the problem, the total number of Target-AID protospacers which could produce STOP codons in the first 75% of any gene in C. autoethanogenum was modelled and compared across several published SpCas9 variants (Supplementary Table S6) which exploit non-canonical PAMs (Figure 4). The proportion of C. autoethanogenum genes which could theoretically be disrupted by Target-AID using these protospacers for each PAM (Figure 4) was also measured. We named this parameter the genomic coverage of Target-AID. The bioinformatic analysis revealed that the conventional Target-AID, with its NGG PAM, could potentially introduce STOP codons through only 3,895 protospacers, for a genomic coverage of merely 51.64% in C. autoethanogenum. NG PAMs (Nishimasu et al., 2018), on the other hand, would give access to 15,602 protospacers which would theoretically allow 85.32% of C. autoethanogenum's CDS to be inactivated. Also promising is the NAA PAM of Cas9-iSpymac (Chatterjee et al., 2020), which can target 13,293 protospacers for a genomic coverage of 81.81%. Together, Target-AID-NG and Target-AID-iSpymac would be able to knock out 91.78% of C. autoethanogenum's CDS. However, since both CLAU_532 and CLAU_534 happen to be among the 18.19% of CDS that Cas9-iSpymac could not knock out as part of a Target-AID base editor, Target-AID-NG was chosen for a last multiplex mutagenesis attempt of CLAU_532, CLAU_534 and CLAU_1794 with the msgRNA strategy.

TABLE 4
Sequencing of five Clostridium autoethanogenum colonies conjugated with different multiplex Target-AID constructs and induced on theophylline. vFS75_mTA-UGI-GLVA is a msgRNA Target-AID construct fused with UGI and GLVA domains at its C-terminus. vFS94_mTA-trsgRNA is a msgRNA Target-AID construct with 18 nt spacers instead of 20 nt. vFS57_TA-msgRNA_CA1794B is a standard msgRNA Target-AID construct (without UGI-GLVA tag and with 20 nt spacers) that targets the same protospacers as vFS75_mTA-UGI-GLVA and vFS94_mTA-trsgRNA. The targeted codon is bolded and mutated bases are capitalized. Fractions indicate mixed reads for this base. The sequence of CLAU532A is oriented in the antisense direction, meaning that TTA, TCA, and CTA alleles would all result in a STOP codon (TAA, TGA and TAG, respectively). (*) Poor quality reads were removed from analysis.
[Table body not recovered; column headers: Protospacer, Allele, Sequence (20 nt).]

UGI without LVA tag does not improve mutagenesis efficiency of Target-AID-NG in Clostridium autoethanogenum
The vectors vFS72_mTA-NG and vFS103_mTA-NG-UGI_NoLVA were assembled to introduce nonsense mutations in CLAU_532, CLAU_534 and CLAU_1794 using Target-AID-NG and Target-AID-NG-UGI without GLVA tag (Table 5). Our model showed that Target-AID-NG could target five protospacers in CLAU_532, three in CLAU_534, and seven in CLAU_1794. This allowed the selection of protospacers where all the cytosines in the editing window would result in a nonsense mutation if mutated to thymines, with a preference for protospacers with cytosines in position −18 rather than −16 from the PAM. This time, with a UGI fusion but without GLVA tag, mutants were obtained at roughly the same rate as with the Target-AID-NG construct without UGI (80% and 100% for CLAU532A, 0% and 20% for CLAU534B, and 80% and 60% for CLAU1794C, respectively). Only two traces out of 28 reflected pure mutants for one locus; all the other mutated loci were mixed with WT genotype (often predominantly WT). After one round of re-streaking, no CLAU532A mutant could be isolated from the two colonies exhibiting the pure CLAU1794C.A.1 genotype.

Target-AID-NG is also suitable for duplex mutagenesis in Clostridium autoethanogenum
In a last attempt to mutate all three targeted genes in the same strain, we assembled the vector vFS74_TA_NG_msgRNA_1794DFG to target three of the four remaining CLAU_1794 protospacers targetable by Target-AID-NG to produce a nonsense mutation. This vector did not use a UGI fusion and relied on the msgRNA strategy. It was conjugated into the plasmid-free double-mutant C. autoethanogenum strain (cFS04) that had been previously engineered with vFS50_TA_msgRNA. As described in Table 6, the protospacer CLAU1794D failed to deliver any mutations in the five colonies screened, but CLAU1794F and CLAU1794G showed 100% mutagenesis efficiency. This time, only three out of ten traces showed mixed peaks, but all five pure CLAU1794F mutations affected a cytosine which did not result in the introduction of a STOP codon. Out of the three protospacers, only CLAU1794G resulted in nonsense mutations in CLAU_1794.

FIGURE 4
Targeting space and genomic coverage of Target-AID in C. autoethanogenum as a function of different PAM sequences. The targeting space is defined here as the total number of protospacers in the first 75% of their respective CDS which can lead to a STOP codon when the cytosines within the nucleotides −19 to −16 from the PAM are mutated into thymines. We define genomic coverage as the proportion of CDSs in the genome which have at least one of these protospacers.
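The two metrics defined in this caption can be computed from a per-CDS count of STOP-producing protospacers, as in the minimal Python sketch below. The function names, the dictionary layout and the four-gene toy data are assumptions made for illustration; they do not reproduce the MATLAB model or the values reported for C. autoethanogenum.

```python
def genomic_coverage(protospacers_per_cds):
    """Return (targeting space, genomic coverage in %) for one PAM.

    protospacers_per_cds maps each CDS identifier to its number of
    STOP-producing protospacers (0 if the CDS cannot be targeted).
    """
    targeting_space = sum(protospacers_per_cds.values())
    covered = sum(1 for n in protospacers_per_cds.values() if n > 0)
    return targeting_space, 100 * covered / len(protospacers_per_cds)

def combined_coverage(*per_pam):
    """Coverage (%) when a CDS counts as targetable by any of several PAMs."""
    cds_ids = list(per_pam[0])
    covered = sum(
        1 for cds in cds_ids if any(table.get(cds, 0) > 0 for table in per_pam)
    )
    return 100 * covered / len(cds_ids)

# Toy data (hypothetical gene names and counts)
ngg = {"geneA": 2, "geneB": 0, "geneC": 0, "geneD": 1}
naa = {"geneA": 0, "geneB": 3, "geneC": 0, "geneD": 0}
space_ngg, coverage_ngg = genomic_coverage(ngg)
coverage_both = combined_coverage(ngg, naa)
```

The combined figure is not the sum of the individual coverages, because a CDS targetable by both PAMs is counted only once.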

Fifteen off-target mutations were identified after two rounds of Target-AID and Target-AID-NG multiplex mutagenesis
The resulting triple-mutant strain was finally validated by whole genome sequencing (NCBI accession PRJNA956560). It revealed twelve off-target single-nucleotide polymorphisms (SNP) (Supplementary Table S7): six exhibiting the canonical Target-AID C-to-T or G-to-A mutation, six within 50 nt of a putative off-target protospacer identified with Cas-Offinder (RRID:SCR_023390) (Zhao et al., 2017), and one which could be manually aligned with the protospacer CLAU1794G using the Benchling (RRID:SCR_013955) sequence alignment tool (Supplementary Table S8).
One 15 nt duplication event was also found, in addition to two regions of the genome where the sequencing coverage abruptly dropped to 0 and which thus seem to have been lost by the cell (Table 8). One of these deleted regions was only 23 nt long (but inside one of the nine 16S RNA loci), while the other region was 12 kb long and comprised a putative off-target protospacer relatively similar to its on-target, with only four mismatches and a 2 nt gap. Altogether, fifteen off-target mutations were thus identified in our strain that underwent two consecutive rounds of multiplex Target-AID (then Target-AID-NG) mutagenesis. Importantly, the gene coding for UDG was among the thirteen genes lost within the 12 kb deleted region (Table 8; see Supplementary Table S9 for a detailed list of deleted genes).

TABLE 5
Sequencing of five Clostridium autoethanogenum colonies conjugated with different Target-AID-NG constructs and induced on theophylline. vFS72 = vFS72_mTA-NG; vFS103 = vFS103_mTA-NG-UGI_NoLVA. The targeted codon is bolded and mutated bases are capitalized. Fractions indicate mixed reads for this base. The sequences of CLAU532A and CLAU1794C are oriented in the antisense direction, meaning that TTA, TCA, and CTA alleles would all result in a STOP codon (TAA, TGA and TAG, respectively). (*) Poor quality reads were removed from analysis.
[Table body not recovered; column headers: Protospacer, Allele, Sequence (20 nt), Mutagenesis efficiency (over 5 colonies) for vFS72_mTA-NG and vFS103_mTA-NG-UGI_NoLVA.]

Our results show that, although Target-AID and Target-AID-NG can be used for multiplexed targeted point mutagenesis in C. autoethanogenum, they have serious drawbacks. While mutagenesis efficiency is high enough to reliably isolate mutants without selection markers, the resulting colonies are often mixed, which requires additional re-streaking steps to isolate pure mutants. Mixed colonies have been reported previously with several base editors in different organisms (Banno et al., 2018; Li et al., 2019; Liu et al., 2019). We hypothesize that they result from single cells that initially survived Target-AID mutagenesis by undergoing BER or by replicating instead of mutating their non-edited strand (Figure 1.VI, 1.VIII). From there, one of the cell lineages from the colony can mutate and produce the expected mutation, while the second lineage of cells continues to avoid mutagenesis by a variety of mechanisms, for example, by mutating key components of the Target-AID plasmid. The same process could produce mixed colonies when more than one cytosine is present in the editing window: each cell lineage in the same colony would acquire immunity from Target-AID by mutating a different cytosine. Protospacers with multiple cytosines in the editing window should thus be avoided to minimize the risk of obtaining mixed colonies.
Targeting cytosines in position −16 or −17 from the PAM, while possible, was rarely successful, especially when other cytosines were present in positions −18 or −19. Accordingly, only protospacers with a single cytosine in position −18 or −19 from the PAM should be used. This is consistent with the literature (Banno et al., 2018;Li et al., 2019;Nishida et al., 2016), but it severely restricts the already limited range of available protospacers in C. autoethanogenum. As illustrated in this work, the use of base editors with alternative PAM recognition domains such as Target-AID-NG or Target-AID-iSpymac can, however, greatly facilitate the identification of optimal protospacer targets.
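The selection rule above can be expressed as a short filter. The following is only a minimal sketch: the function names are hypothetical, the editing window is assumed to span exactly the four positions named in the text (−19 to −16), and the protospacer is assumed to be given 5'→3' with its last base adjacent to the PAM, so that position −20 is the first base.

```python
# Positions are counted from the PAM: -1 is the PAM-proximal base, -20 the distal one.
EDIT_WINDOW = (-19, -18, -17, -16)  # assumption: only the positions named in the text
PREFERRED = (-19, -18)              # positions recommended in the text


def cytosines_in_window(protospacer: str) -> list[int]:
    """Return the PAM-relative positions of cytosines inside the assumed editing window."""
    seq = protospacer.upper()
    if len(seq) != 20:
        raise ValueError("expected a 20 nt protospacer")
    # Position -k from the PAM is index 20 - k in the 5'->3' string.
    return [pos for pos in EDIT_WINDOW if seq[20 + pos] == "C"]


def is_recommended(protospacer: str) -> bool:
    """True if the window holds exactly one cytosine, located at -19 or -18."""
    hits = cytosines_in_window(protospacer)
    return len(hits) == 1 and hits[0] in PREFERRED
```

With made-up sequences, `is_recommended("ACTAAGGTTAGGATTAGGAT")` accepts a protospacer whose only window cytosine sits at −19, while a protospacer with cytosines at both −19 and −18, or with a single cytosine at −17, is rejected.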
This feature of Target-AID-NG was exploited to test three protospacers at once for the same target gene, in the hope that at least one would work. Indeed, out of the ten targeted protospacers (pyrE protospacer, CLAU532A, CLAU534A-B, and CLAU1794A-B-C-D-F-G), two did not work at all (CLAU1794A and CLAU1794D) and only four reliably introduced nonsense mutations (pyrE protospacer, CLAU532A, CLAU1794C, and CLAU1794G).

TABLE 7 Summary of off-target SNP and short duplication events identified in Clostridium autoethanogenum after two rounds of multiplex Target-AID (then Target-AID-NG) mutagenesis, and their putative association with targeted protospacers. (*) Found by alignment with the protospacers targeted in this strain (CLAU532A, CLAU534A, CLAU1794A, CLAU1794D, CLAU1794F, CLAU1794G) using a standard alignment tool (Geneious (RRID:SCR_010519)), instead of Cas-Offinder (RRID: SCR_023390) (Zhao et al., 2017).
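To illustrate what searching a genome for loci "bearing some similarity" to a protospacer involves, the sketch below counts mismatches in every ungapped window on both strands. It is a deliberately simplified stand-in, not the Geneious alignment or the Cas-Offinder search used in this study: it ignores PAM requirements and cannot detect gapped matches such as the one with a 2 nt gap described above. The function names and the default threshold are assumptions made for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def near_matches(genome: str, protospacer: str, max_mismatches: int = 4):
    """Return sorted (start, strand, mismatches) for ungapped windows within the threshold.

    'start' is the 0-based index on the given (forward) genome string; for the
    '-' strand, the window is compared with the protospacer's reverse complement.
    """
    genome = genome.upper()
    query_fwd = protospacer.upper()
    n = len(query_fwd)
    hits = []
    for strand, query in (("+", query_fwd), ("-", reverse_complement(query_fwd))):
        for start in range(len(genome) - n + 1):
            window = genome[start:start + n]
            mismatches = sum(a != b for a, b in zip(window, query))
            if mismatches <= max_mismatches:
                hits.append((start, strand, mismatches))
    return sorted(hits)
```

A brute-force scan like this is quadratic in practice and only suitable for toy sequences; dedicated tools remain the appropriate choice for whole genomes.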
Targeting the same gene with several protospacers at once might thus still have been a good use of multiplexing and Target-AID, if it had not also been associated with such a high rate of off-target mutagenesis (including a major 12 kb deletion). Consequently, even though we showed that multiplex Target-AID mutagenesis is achievable in C. autoethanogenum, we cannot recommend its use as a standard practice, even just to screen several protospacers for the same target gene. Using parallel monoplex constructs is indeed less likely to result in off-target mutagenesis for the same number of targeted genes. It is, however, difficult to assert with confidence which off-target mutations are a direct consequence of Target-AID mutagenesis. Some probably occurred randomly and were only selected through the many rounds of re-streaking; others might have been the indirect consequence of the loss of UDG, a key enzyme of the BER DNA repair pathway. The loss of UDG itself, nonetheless, was likely selected for by Target-AID, as UDG is a putative inhibitor of CBE mutagenesis and thus a liability for cells that express Target-AID. We hypothesize that UDG was lost in a random recombination event (potentially triggered by an off-target nick from Target-AID). Δudg cells should lose the BER pathway, which might have made them more susceptible to Target-AID mutagenesis. Accordingly, the Δudg mutants must have been over-represented in the colonies that survived mutagenesis. After the MMR pathway mutated the non-edited strand, in the absence of UDG, the repair of the edited strand from a "U" to a "T" might have been achieved solely through DNA replication.
The loss of UDG indirectly validates the strategy of using a UGI fusion to improve the effectiveness of CBEs, but it also exposes the gene coding for UDG as a mutational hotspot to look out for during Target-AID (or CBE) mutagenesis. This result highlights the importance of whole genome sequencing and rigorous complementation studies in any genome editing experiment, including base editing. Interestingly, in our hands, truncated sgRNA spacers and Target-AID-UGI and Target-AID-UGI-GLVA protein fusions did not result in higher mutagenesis efficiencies in C. autoethanogenum (there was even a marked decrease in mutagenesis efficiency when the GLVA-tag was present or truncated sgRNA spacers were used).
Here, multiplexing gRNA transcription using the native CRISPR array of S. pyogenes was not successful. In hindsight, this failure might be due to the addition of a restriction site and four extra nt upstream of each 20 nt spacer to facilitate cloning. Future iterations of this multiplexing strategy should comprise 30 nt spacers fully homologous to their target. The tracrRNA could also be flanked with self-cleaving ribozymes to exclude any interference from its promoter and terminator in its RNA structure.
While we did not investigate whether the tRNA-sgRNA array had been successfully processed into individual tRNA and sgRNA molecules, the array succeeded in targeting multiple protospacers almost as effectively as three separate transcriptional units. Given that the monoplex controls also failed to produce the desired nonsense mutation, the absence of correct CLAU534A and CLAU1794B mutants in either multiplexing strategy can be attributed to the protospacers themselves, and not to the multiplexing method. An array of tRNA-sgRNA fusions has several advantages over an array of separate sgRNA transcriptional units, notably the reduced size of the sgRNA expression cassette (821 nt for mtRNA versus 1,146 nt for msgRNA) and its simplicity and scalability: only one promoter and one terminator are needed to express an arbitrary number of sgRNA-tRNA fusions. Beyond Cas9 mutagenesis, tRNA fusions which exploit prokaryotic tRNA maturation mechanisms to process polycistronic RNA into individual molecules could be used to express any RNA-based synthetic biology tool (Altman, 1975;Apirion & Miczak, 1993;Gegenheimer & Apirion, 1981;Green et al., 2014;Lee et al., 2018;Li et al., 2005;Li & Deutscher, 2002;Mackie, 2013;Minagawa et al., 2004;Mörl & Marchfelder, 2001;Ow & Kushner, 2002;Sekiya et al., 1979). This strategy could be improved by identifying more prokaryotic tRNAs that are compatible with sgRNA arrays, to avoid repeating the same tRNA sequence within one array.
Finally, Target-AID is an exceedingly easy system to build and does not require any PCR amplification step: swapping the 20 nt protospacer in a single two-part Gibson or Golden-Gate assembly between a cut vector and a synthesised DNA oligo is sufficient to create a vector capable of editing a different target. The absence of a homology cassette makes it uniquely straightforward to multiplex, especially with polycistronic RNA systems such as the mtRNA or mCRISPR strategies illustrated in this study. However, the time gained during design and assembly can quickly be lost again if mixed colonies need to be restreaked (especially for slow-growing organisms like C. autoethanogenum). Initial screening of mutants is also more difficult than with standard homology-directed knockouts (Seys et al., 2020), as a colony PCR cannot readily identify the desired SNPs through simple gel electrophoresis.

Conclusion
This study is valuable because it illustrates the strengths and weaknesses of the base editor Target-AID-NG in the Clostridium genus and exemplifies the use of prokaryotic tRNAs to process a synthetic polycistronic RNA.
Like other base editors, Target-AID-NG offers several undeniable advantages besides bypassing the need for a functional HR or NHEJ DNA repair pathway. It is straightforward to assemble and use in a standard mutagenesis workflow and can easily be multiplexed. However, it suffers from a restricted pool of optimal and/or functional protospacers, as well as a propensity to generate mixed colonies of WT and mutated cells. Critically, like any other genome editing tool to date, Target-AID-NG is still not precise or innocuous enough to ignore the possibility of off-target mutations and the necessity of complementation studies. As illustrated in this study, particular attention should be paid to the udg homolog of the targeted organism, as its loss might be selected for by Target-AID-NG. With whole-genome sequencing technology becoming more affordable every year, it should become a standard step in any mutant characterisation.
Fortunately, base editing is a quickly evolving field (Anzalone et al., 2020;Kantor et al., 2020). While Target-AID or Target-AID-NG in their current form might find some niche applications in contexts where conventional HR- or NHEJ-mediated mutagenesis methodologies are impossible, the obstacles encountered in the present study make it difficult to recommend them as tools for mainstream knock-out experiments in C. autoethanogenum when other methods are available. Other base editors such as BE4 (Gaudelli et al., 2017;Gehrke et al., 2018;Li et al., 2019) or Target-AID-iSpymac (Chatterjee et al., 2020) might offer marginally improved performance while retaining the simplicity of design and potential for multiplexing of Target-AID. Alternatively, the recently developed prime editor (Anzalone et al., 2019), which uses a reverse transcriptase fused to nCas9 to introduce custom small mutations independently of the target sequence, might hold the greatest promise as a multiplex and scarless point mutagenesis tool in Clostridium.

Data availability statement
The datasets presented in this study can be found in online repositories. This data can be found here: [https://www.ncbi.nlm.nih.gov/bioproject/PRJNA956560]

Author contributions
Conceptualisation: FS, NM, LQ, CH, and SY; choice of CLAU_532, 534, and 1794 targets: TM; investigation, methodology: FS; modelling of total protospacers and genomic coverage: CT-A; writing-original draft preparation: FS; writing-review and editing: CH, CT-A, NM, QL, and SY. All authors contributed to the article and approved the submitted version.

Funding
This research was funded by the Biotechnology and Biological Sciences Research Council (grant numbers BB/L013940/1, BB/W01453X/1).