Identification of a Type IV-A CRISPR-Cas System Located Exclusively on IncHI1B/IncFIB Plasmids in Enterobacteriaceae

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are diverse immune systems found in many prokaryotic genomes that target invading foreign DNA such as bacteriophages and plasmids. There are multiple types of CRISPR, with arguably the most enigmatic being Type IV. During an investigation of CRISPR carriage in clinical, multi-drug resistant Klebsiella pneumoniae, a Type IV-A3 CRISPR-Cas system was detected on plasmids from two K. pneumoniae isolates from Egypt (isolated in 2002–2003) and a single K. pneumoniae isolate from the United Kingdom (isolated in 2017). Sequence analysis of all other genomes available in GenBank revealed that this CRISPR-Cas system was present on 28 other plasmids from various Enterobacteriaceae hosts and was never found on a bacterial chromosome. This system is exclusively located on IncHI1B/IncFIB plasmids and is associated with multiple putative transposable elements. Expression of the cas loci was confirmed in the available clinical isolates by RT-PCR. In all cases, the CRISPR-Cas system has a single CRISPR array (CRISPR1) upstream of the cas loci, containing several conserved spacers that, amongst other things, match regions within conjugal transfer genes of IncFIIK/IncFIB(K) plasmids. Our results reveal a Type IV-A3 CRISPR-Cas system exclusively located on IncHI1B/IncFIB plasmids in Enterobacteriaceae that is likely to be able to target IncFIIK/IncFIB(K) plasmids, presumably facilitating intracellular, inter-plasmid competition.


INTRODUCTION
Clustered Regularly Interspaced Short Palindromic Repeats and their associated cas genes (CRISPR-Cas) are widespread, adaptive, RNA-mediated immune systems found in the genomes of prokaryotic organisms (bacteria and archaea) that target invading foreign DNA such as bacteriophages and conjugative plasmids (Barrangou et al., 2007; Marraffini and Sontheimer, 2010a). CRISPR functions through a three-stage process: adaptation, involving the acquisition of foreign DNA molecules as spacers; expression and maturation of the short CRISPR RNAs (crRNAs); and interference with a cognate invading foreign DNA molecule (Rath et al., 2015). The classification of CRISPR-Cas systems is continuously updated to include newly identified subtypes. To date, CRISPR-Cas systems are classified into two classes, six Types (I-VI), and ∼33 subtypes (Makarova et al., 2018, 2020). There is ongoing discovery of multiple, novel class 2 CRISPR-Cas systems (Makarova et al., 2020). The two classes differ according to the effector module: class 1 utilizes multi-protein effector Cas complexes, while class 2 utilizes a single-protein effector [Type II contains Cas9; Type V contains Cas12a (previously known as Cpf1), Cas12b (previously known as C2c1), Cas12c (previously known as C2c3), Cas12d (previously known as CasY), and Cas12e (previously known as CasX); and Type VI contains Cas13a (previously known as C2c2), Cas13b, and Cas13c] (Makarova and Koonin, 2015; Makarova et al., 2017; Pyzocha and Chen, 2018). CRISPR-Cas systems are confirmed, or expected, to provide immunity against viruses and other mobile genetic elements (MGEs), except for transposon-encoded CRISPR-Cas systems, which lack the interference module and are therefore predicted to perform functions distinct from adaptive immunity (Makarova et al., 2020).
Most CRISPR types target DNA and some, such as Type VI, specifically target RNA, while Type III CRISPR systems are unique because they exhibit both RNA interference and DNA interference in vivo to protect their microbial hosts (Marraffini and Sontheimer, 2008, 2010b; Hale et al., 2009; Deng et al., 2013; Manica et al., 2013; Goldberg et al., 2014; Tamulaitis et al., 2014; Zebec et al., 2014; Peng et al., 2015; Samai et al., 2015; Elmore et al., 2016; Estrella et al., 2016; Kazlauskiene et al., 2016; Zhang and Ye, 2017; Ozcan et al., 2019; Lin et al., 2020).
Type IV was previously called the Unknown Type (Type U), due to its rare occurrence and lack of the adaptation module, until an updated classification in 2015 (Makarova et al., 2013; Zhang and Ye, 2017). It was then named Type IV (putative) after its identification in Acidithiobacillus ferrooxidans, which presented a different genetic arrangement of the Type U cas genes. In 2017, the Type IV classification was updated, after its identification in Thioalkalivibrio sp. K90mix (TK90_2699-TK90_2703), to show an associated repeat-spacer array for a cas locus with the gene order csf4 (dinG), csf5 (cas6-like), csf1 (cas8-like), csf2 (cas7), and csf3 (cas5); this arrangement was then assigned as Type IV-A. In 2018, a variant of Type IV from Rhodococcus jostii RHA1 (RHA1_ro10069-RHA1_ro10072) that lacks a repeat-spacer array was assigned as Type IV-B (Figure 1; Makarova et al., 2018). In 2019, the Type IV-C CRISPR-Cas system was formally classified as a distinct subtype after its identification in nine contigs, mostly from thermophilic microorganisms (Makarova et al., 2020). Other papers have also proposed the classification of Type IV-D, Type IV-E, and subgroups of Type IV-A (1-4) (Pinilla-Redondo et al., 2019); however, the suggested subgroups did not have a unified genetic arrangement corresponding to each of the named Type IV-A (1-4) variants. Type IV CRISPR-Cas systems were shown to employ crRNA-guided effector complexes (Ozcan et al., 2019). Type IV is the only type to possess csf4 (dinG) in its CRISPR-Cas loci (Dwarakanath et al., 2015), and DinG was recognized initially as the signature protein for Type IV systems (Makarova and Koonin, 2015; Crawley et al., 2018), although recently subtype IV-D has been shown to carry a helicase of the RecD family in place of the archetypal DinG (Pinilla-Redondo et al., 2019).
To date, Type IV variants (IV-A, IV-B, and IV-C) described above show different genetic arrangements and orientation of cas loci, and they all lack the adaptation module. Also, all Type IV CRISPR-Cas systems are encoded by bacterial plasmids, prophages or other, uncharacterized integrated elements (Faure et al., 2019b). Thus, it has been hypothesized that Type IV is similar to an ancestral innate immune system that gained adaptive ability by associating with a transposon-like element containing cas1 and cas2 (Rath et al., 2015).

Clinical Isolates Sequencing
Three clinical isolates were investigated: Klebsiella pneumoniae-53 and K. pneumoniae-65 were isolated from Egyptian university teaching hospitals (2002–2003), and K. pneumoniae-CR5 was isolated from University College London Hospital in the United Kingdom (2017). Bacterial genomic DNA sequencing was conducted at MicrobesNG (Birmingham, United Kingdom). To boost coverage, isolates were sequenced on both Illumina HiSeq 2500 and Illumina MiSeq instruments, using 2 × 250 bp paired-end sequencing with Nextera XT library preparation.

CRISPR-Cas System Identification and Characterization
DNA sequences were analyzed using CRISPRFinder, CRISPRCasFinder, CRISPRTarget, and SnapGene (GSL Biotech) (Grissa et al., 2007; Lundgren et al., 2015; Couvin et al., 2018; GSL, 2020). The Cas domain analyses were performed with HHpred (sensitive protein homology detection, function, and structure prediction based on HMM-HMM comparison) at the MPI Bioinformatics Toolkit (Zimmermann et al., 2018). HHpred was run using the NCBI_Conserved_Domains(CD)_v3.16 and TIGRFAMs_v15.1 databases and the Bac_Escherichia_coli_K12_07_Mar_2017 proteome settings. Multi-locus sequence types, resistance genes, and plasmids were identified using MLST, ResFinder, and PlasmidFinder, respectively (Carattoli et al., 2014). Spacer analysis was performed with BLAST and Geneious (Kearse et al., 2012). A phylogenetic UPGMA-based tree was constructed for CRISPR arrays and Cas proteins using MEGA X 10.1 (Kumar et al., 2016; Shen et al., 2017). Alignments of the regions containing protospacers (including 10 bp flanking the protospacer) associated with CRISPR1 repeats were investigated to identify a putative protospacer adjacent motif (PAM) signature, as described in Pinilla-Redondo et al. (2019). PAMs were identified by comparing alignments of the nucleotides immediately preceding each detected protospacer, in all or up to ten unique protospacers of all the investigated sequences. The leader sequence was identified by sequence alignment. Direct repeats and PAM conservation were assessed using
FIGURE 1 | ... (Makarova et al., 2013). (B) Type IV (putative) as identified in Acidithiobacillus ferrooxidans in 2015; however, the two associated repeat-spacer arrays were identified in this study. (C) Type IV-A identified in Thioalkalivibrio sp. K90mix (TK90_2699-TK90_2703) in 2017 (Makarova et al., 2018). (D) Type IV-B identified in Rhodococcus jostii RHA1 (RHA1_ro10069-RHA1_ro10072) in 2017 (Makarova et al., 2018). (E) Type IV-C identified in Thermoflexia bacterium D6793_05715-D6793_05700 (Makarova et al., 2020). (F) Type IV-A3 as detected in Enterobacteriaceae isolates and genomes in this study. Arrows in different colors represent genes: red represents cas6; bright blue represents dinG; light green represents other essential genes of the system (cas8-like/LS, cas7, and cas5); white represents cas11; the blue-yellow pattern represents the direct repeat-spacer loci.
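The PAM search described in the methods (comparing the nucleotides immediately preceding each protospacer) can be illustrated with a minimal Python sketch. It assumes the upstream flanks have already been extracted and trimmed to equal length; the function names, the 50% threshold, and the toy flank sequences are hypothetical and are not data or tools from this study.

```python
from collections import Counter


def position_frequencies(flanks):
    """Return one dict per alignment column: nucleotide -> relative frequency."""
    if not flanks:
        raise ValueError("no flanking sequences supplied")
    length = len(flanks[0])
    if any(len(f) != length for f in flanks):
        raise ValueError("all flanks must have the same length")
    freqs = []
    for i in range(length):
        counts = Counter(f[i].upper() for f in flanks)
        freqs.append({nt: n / len(flanks) for nt, n in counts.items()})
    return freqs


def consensus(flanks, min_fraction=0.5):
    """Most frequent base per column, or 'N' if it falls below min_fraction."""
    out = []
    for column in position_frequencies(flanks):
        nt, frac = max(column.items(), key=lambda kv: kv[1])
        out.append(nt if frac >= min_fraction else "N")
    return "".join(out)


# Hypothetical 6-nt flanks immediately preceding four protospacers.
toy_flanks = ["TCGAAG", "GATAAG", "CTCAAG", "AGAAAG"]
print(consensus(toy_flanks))  # variable positions collapse to N
```

In this toy input only the last three columns are conserved, which is the kind of signal a sequence logo would display as a tall stack at positions -3 to -1.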

Identification of Type IV-A3 CRISPR-Cas in Clinical and GenBank Isolate Sequences
Type IV-A3 CRISPR-Cas, defined by the gene composition and genetic architecture of the IV-A variants detected in K. pneumoniae as described in Pinilla-Redondo et al. (2019), was detected on a total of thirty-one IncHI1B/IncFIB(Mar) plasmid sequences within Enterobacteriaceae (three clinical isolates and twenty-eight sequences from GenBank; Figure 1 and Supplementary Table S1). The IncHI1B/IncFIB plasmids are large, low copy number, conjugative plasmids with narrow host ranges, which are found in multiple genera of Enterobacteriaceae (Zhong et al., 2005; Suzuki et al., 2010; Faure et al., 2019a). An important feature of IncHI1B/IncFIB plasmid biology is entry exclusion, by which cells that contain an IncF/IncH plasmid become poor recipients in additional conjugation rounds (Garcillan-Barcia and de la Cruz, 2008; Ravenhall et al., 2015). This frees a resident plasmid from competition with related plasmids at segregation during bacterial division, but may contribute to limiting plasmid dissemination among potential hosts.
This Type IV-A3 CRISPR-Cas is characterized by a cas locus containing dinG, a distinct feature of Type IV-A CRISPR-Cas systems that was shown to be required for functional activity in Pseudomonas aeruginosa (Pinilla-Redondo et al., 2019; Makarova et al., 2020), together with a conserved leader sequence and a CRISPR array. These features were present in all the detected sequences, which all show homology to each other. The Type IV-A3 CRISPR-Cas systems were initially detected by BLAST, which confirmed the presence of three genes (cas7, dinG, and cas6); further HHpred analysis of other associated ORFs revealed the presence of two more genes (cas5 and cas8-like).

Association Between Type IV-A3 CRISPR-Cas Sequences and IncHI1B/IncFIB(Mar) Plasmids
We also identified partial, related Type IV-A3 systems, comprising either cas8-like, cas6, and dinG or cas7 and a CRISPR array, on other IncHI1B/IncFIB(Mar) plasmids (Supplementary Table S1). Characterization of the partial and complete Type IV-A3 systems showed the occurrence of a range of different IS elements and retrotransposons (group II introns) (Supplementary Table S1). The average GC content of the Type IV-A3 CRISPR-Cas loci (47.7 ± 0.01%) was closer to that of the IncHI1B/IncFIB(Mar) plasmids on which they reside (46.2 ± 0.01%) than to that of the chromosomal sequences of the bacterial hosts (57 ± 0.02%) (Table 1).
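As an illustration of the GC-content comparison, the following Python sketch computes the GC percentage of a sequence and a mean ± standard deviation over a set of sequences. The function names and the choice to ignore ambiguous bases (such as N) are assumptions made for this example; the source does not state which tool or convention was used.

```python
from statistics import mean, stdev


def gc_percent(seq):
    """GC content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def summarize_gc(seqs):
    """Return (mean, sample standard deviation) of GC% over several sequences."""
    values = [gc_percent(s) for s in seqs]
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread
```

Applied separately to the CRISPR-Cas loci, the host plasmids, and the chromosomes, `summarize_gc` would give the three values being compared in the paragraph above.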

Characterization of the Type IV-A3 CRISPR-Cas System Found in Enterobacteriaceae
A single CRISPR array (CRISPR1) was identified upstream of all cas loci. The repeats have a predicted stem-loop secondary structure (Figures 2A,B) and are likely involved in Cas6-mediated pre-crRNA processing. The alignment of the regions around and containing the protospacer, particularly the last six positions preceding the protospacer, associated with CRISPR1 repeats revealed conservation of the putative PAM signature (AAG) adjacent to the end of the protospacers (Figure 2C). A highly conserved 65 bp leader sequence occurring between the CRISPR-Cas loci and the CRISPR array was observed in all the sequences (Figure 2D). Minor variations in the leader sequence occurred in only two sequences (C in position −63 is A in CP014776.1 Pluralibacter gergoviae; G in position −41 is A, and C in position −39 is T, in K. pneumoniae-CR5 ST-392). The high conservation of the leader sequence is unlike that presented in Pinilla-Redondo et al. (2019). RT-PCR of the confirmed cas loci (cas7, dinG, and cas6) demonstrated that they are expressed in all three of the available clinical isolates (Table 2).
We have detected a total of 467 spacers in the 31 CRISPR1 arrays analyzed, out of which 9% (42/467) match to bacteriophages and 25.5% (119/467) match to plasmid sequences. The majority of spacer sequences are present in more than one spacer array and some are present more than once within the same array (Figure 3). Plasmid targeting spacers appeared in every example of this Type IV-A3 associated CRISPR array analyzed. Sequence analysis revealed that spacers correspond to IncFIIK conjugal transfer genes; traN and traL (Figure 3). Limited conservation within the order of the spacer arrays showed that the arrays cluster into two distinct groups which share geographical associations and suggest persistence within the plasmid pool in isolates from certain countries over time (Figure 4).
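The spacer bookkeeping in this paragraph (spacers shared between arrays, spacers repeated within one array, and the reported percentages) can be sketched in Python as follows. The data layout (a dictionary from array name to an ordered list of spacer sequences) and the function names are assumptions for illustration; only the counts 42, 119, and 467 come from the text.

```python
from collections import Counter


def spacer_sharing(arrays):
    """arrays: dict mapping array name -> ordered list of spacer sequences.

    Returns (spacers found in more than one array,
             per-array list of spacers occurring more than once).
    """
    arrays_per_spacer = Counter()
    for spacers in arrays.values():
        arrays_per_spacer.update(set(spacers))  # count each array once
    shared = {s for s, n in arrays_per_spacer.items() if n > 1}
    repeated_within = {
        name: sorted(s for s, n in Counter(spacers).items() if n > 1)
        for name, spacers in arrays.items()
    }
    return shared, repeated_within


def percent(hits, total):
    """Percentage rounded to one decimal place."""
    return round(100 * hits / total, 1)


# Counts reported in the text: 42 phage and 119 plasmid matches out of 467.
phage_pct = percent(42, 467)
plasmid_pct = percent(119, 467)
```

The two percentages reproduce the rounded values quoted above (9% and 25.5%).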
The CRISPR system described here is always found associated with IncHI1B/IncFIB plasmids in Enterobacteriaceae and has dinG and cas7 (involved in interference) as well as cas6, cas5, and cas8-like (involved in expression and maturation of short crRNAs) (Cass et al., 2015; Dwarakanath et al., 2015; Koonin and Krupovic, 2015; Lundgren et al., 2015; Crowley et al., 2019; Faure et al., 2019b). The detection of a csf1/cas8-like gene in the Type IV-A variant described here updates the initial (Newire et al., 2019) and subsequent (Kamruzzaman and Iredell, 2019) reports of this system. Additionally, our results agree with other reports suggesting a need for Type IV-A variant classification (Pinilla-Redondo et al., 2019).
Notably, some of the previously described Type IV systems do not possess a dinG or cas8-like (e.g., Type IV-C); however, cas7 genes are consistently found in all the previously and presently described Type IV sequences. Also, Cas7 is the most conserved protein among members of the Type IV CRISPR family (Pinilla-Redondo et al., 2019). This highlights the role of cas7 in Type IV identification.
The Type IV-A3 system described here has a variable CRISPR array and a conserved leader sequence. The occurrence of the conserved leader sequence in a wide variety of K. pneumoniae sequence types may reflect its narrow association with IncHI1B/IncFIB plasmids. Conserved leader sequences in other types (Type I-E) were shown to increase acquisition efficiency, presumably by stabilizing the Cas1-2-leader-repeat interaction (Kieper et al., 2019). The order of spacers demonstrated conservation with some polymorphism, and the arrays cluster into two main groups (Figure 4) matching DNA from a variety of geographical sources. Expression of these Type IV-A cas genes suggests immunity to incoming DNA matching the spacers. Crowley et al. (2019) posit that interference is mediated, as in type I and type III systems, through multi-subunit complexes composed of Csf proteins that use crRNA as a guide to bind complementary nucleic acid, forming R-loops. In this case, it was hypothesized that DinG is then recruited to these R-loops, where it either acts directly to destroy the foreign DNA (e.g., a plasmid) or recruits an endogenous nuclease to mediate RNA-guided interference; however, this needs to be tested. We also note that the adaptation module is missing, so adding new spacers will require cas1 and cas2 from other CRISPR-Cas systems. Like other Type IV systems that cannot function as independent adaptive immune systems (Koonin and Krupovic, 2015), the Type IV-A3 CRISPR-Cas described here is, we suspect, likely to co-operate with other cas loci, whenever they exist within the Enterobacteriaceae host genomes, for spacer acquisition. Those CRISPR-Cas loci could belong to CRISPR systems that are known to be frequently associated with an Enterobacteriaceae host, such as Type I-E/I-E* or Type I-F (Aydin et al., 2017).
The association between Type IV-A and cas6e and cas6f (cas6 sequences observed in subtypes I-E and I-F, respectively) was previously reported in other bacterial families, suggesting functional links. Further evidence supporting possible functional co-operation is that the Cas6 of the Type IV-A3 system described here shows 99%+ identity to Type I-E Cas6 in Enterobacteriaceae. Furthermore, unlike the other Cas proteins associated with the Type IV-A3 system described here, Cas6 sequences were highly conserved and showed no particular association with interrupting IS elements, which may further support the recruitment of cas6 from Type I-E. These possible functional links in adaptation appear to be a feature that can be switched on or off, and that requires the presence of the IncHI1B/IncFIB(Mar) plasmids (that carry this Type IV-A variant) inside a bacterial host that has a functional CRISPR-Cas system in its genome. Although a previous report suggested that Type IV CRISPR-Cas system-positive plasmids were only found in Enterobacteriaceae with chromosomal Type I-E/I-E* CRISPR-Cas (Kamruzzaman and Iredell, 2019), we could not identify Type I-E/I-E* in all the isolate genomes that have Type IV-A3. For example, K. pneumoniae-53, CP011314.1, HG918041.1, JN420336.1, MF150122.1, CP014776.1, CP018339.1, CP026398.1, CP020848.1, CP024507.1, CP026172.1, and CP025462.1 did not have Type I-E/I-E* CRISPR-Cas systems. Thus, we assume there is no conditional connection between the presence of Type IV-A3 and Type I-E/I-E* CRISPR-Cas systems in Enterobacteriaceae.
FIGURE 2 | The predicted secondary structure of direct repeats, with the associated Minimum Free Energy (MFE, in kcal/mol) shown underneath the structure. This structure is predicted to be involved in the mechanism of pre-crRNA processing. (C) Type IV-A variant conserved putative protospacer adjacent motifs (PAMs). The alignment of the regions containing protospacers, visualized using WebLogo, shows the conservation of the putative PAM signature (AAG), positions -3 to -1, adjacent to the end of the protospacers. The analysis was performed on the 467 spacers detected in the 31 CRISPR1 arrays analyzed, specifically on the spacers matching bacteriophages (9%, 42/467) and plasmid sequences (25.5%, 119/467). Searches for other subtypes/variants were unsuccessful, likely due to the low number of spacer-protospacer matches. (D) Conserved Type IV-A variant CRISPR leader. The WebLogo shows a highly conserved 65 bp sequence occurring between the CRISPR-Cas loci and the CRISPR array among the sequences investigated in this study.
FIGURE 3 | Newly described Type IV-A3 CRISPR spacer polymorphism. The spacer map. Only spacers are represented by boxes, and no repeats are included. Identical spacers are represented by the same number and color, while unique spacers are shown in white with no associated number. Self-targeting spacers are indicated by the letter (S) and show 100% identity to host DNA, plasmid-targeting spacers are indicated by the letter (P), phage-targeting spacers are indicated by the letters (Ph), other Enterobacteriaceae-targeting spacers (100% identity) are indicated by the letter (O), cryptic spacers with similarity to other bacterial DNA are indicated by the letters (CO), and those with similarity to eukaryotic DNA are indicated by the letters (CE); the letters are positioned underneath the relevant spacer. CE spacers showed at least 57% identity to eukaryotic DNA and were confirmed by multiple sequence alignments. * KY913897.1 is the isolate that only has a CRISPR array and a cas7 (csf2) (not a complete Type IV system) and is therefore not included in the total analysis in Figure 4.
FIGURE 4 | Evolutionary relationships of Type IV-A3 CRISPR spacers. The phylogenetic tree illustrates the evolutionary relationships of Type IV-A3 CRISPR array nucleotide sequences. The phylogenetic UPGMA tree was constructed using the MUSCLE algorithm of MEGA7. The evolutionary distances were computed using the Maximum Likelihood method and the Tamura-Nei model with a bootstrap test (1000 replicates), and the rate variation among sites was modeled with a gamma distribution (shape parameter = 2). The percentage of trees in which the associated taxa clustered together is shown next to the branches. The year and geographical origin of the isolate are listed to the right of the branch ends.
The Type IV-A3 CRISPR system reported here is exclusively located on IncHI1B/IncFIB plasmids. We have also spotted an imperfect spacer target in traN of an IncFIIK plasmid in K. pneumoniae-53, which suggests that this plasmid may be able to evade plasmid-mediated CRISPR interaction within this strain (Jiang et al., 2013). Therefore, these spacers are likely to be involved in plasmid competition, protecting the resident Type IV-A CRISPR-Cas carrying plasmid in Enterobacteriaceae as previously suggested (Newire et al., 2019; Pinilla-Redondo et al., 2019). Recently, some Type IV-A system variants that are associated with P. aeruginosa were shown to target invasive plasmids, which strengthens the case for involvement of Type IV-A CRISPR-Cas systems in plasmid competition.
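To make the notion of an imperfect spacer target concrete, the following Python sketch lists mismatch positions and percent identity between a spacer and a candidate protospacer of the same length. It is an ungapped, position-by-position comparison chosen for simplicity; the function names are hypothetical, and the source does not specify how the imperfect match in traN was scored.

```python
def mismatch_positions(spacer, protospacer):
    """Zero-based positions where two equal-length sequences differ."""
    if len(spacer) != len(protospacer):
        raise ValueError("sequences must have the same length")
    pairs = zip(spacer.upper(), protospacer.upper())
    return [i for i, (a, b) in enumerate(pairs) if a != b]


def percent_identity(spacer, protospacer):
    """Ungapped percent identity between a spacer and its target."""
    if not spacer:
        raise ValueError("empty sequence")
    mismatches = len(mismatch_positions(spacer, protospacer))
    return 100.0 * (len(spacer) - mismatches) / len(spacer)
```

A perfect target returns an empty mismatch list and 100% identity; any non-empty list marks the kind of imperfect target discussed above.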
Type IV CRISPR-Cas systems demonstrate a notable diversity of molecular organization (Figure 1), and some appear to have taken on roles in addition to adaptive cellular immunity (Faure et al., 2019a). For example, some of the Type IV CRISPR-Cas loci were previously predicted to encode bacterial toxins that, together with the Cas proteins of the Type IV systems, may contribute to plasmid stabilization (Faure et al., 2019b). The Type IV-A3 system described here demonstrates a complex evolutionary connection with MGEs in terms of parasitism and immunity. The association between this Type IV-A3 system and multiple MGEs, the identification of partial cas loci with and without the CRISPR array on other IncHI1B/IncFIB(Mar) plasmids (Figure 1 and Supplementary Table S1), and the identification of similar arrays in different plasmids in the same host from the same country (Figures 3, 4) together indicate that dynamic, MGE-mediated movement and rearrangement of this Type IV-A CRISPR-Cas system is ongoing. The similarity in GC content between this Type IV-A system and the IncHI1B/IncFIB plasmids, in contrast with the higher chromosomal GC content, supports the observation that the system is exclusively plasmid associated, both in this study and in others (Ravenhall et al., 2015). Reporting standard deviations from comparisons of element-wide GC contents across different genomes could be misleading, because the strains are so closely related that the statistical observations are not independent. We therefore also investigated the GC skew in a reasonably sized sliding window (1000 bp) across the length of the element (Type IV-A3 system and plasmid DNA sequences) in a single genome (K. pneumoniae-65), which also confirmed that the system is exclusively plasmid associated. This demonstrates unique, evolutionarily juxtaposed connections between CRISPR-Cas and MGEs, which are worthy of further investigation.
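The sliding-window GC skew analysis mentioned above can be sketched in Python as follows. The sketch assumes the conventional definition of GC skew, (G − C)/(G + C), and non-overlapping 1000 bp windows by default; the source gives the window size but not the step size or the software used, so those details and the function name are assumptions.

```python
def gc_skew_windows(seq, window=1000, step=1000):
    """Return (window start, GC skew) pairs across a sequence.

    GC skew is computed as (G - C) / (G + C); windows with no G or C
    are reported as 0.0. Trailing bases that do not fill a whole
    window are ignored.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, skew))
    return results
```

Setting `step` smaller than `window` gives overlapping windows, which smooths the profile at the cost of correlated neighbouring values.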
To our knowledge, this is the first identification of a CRISPR-Cas system exclusively associated with IncHI1B/IncFIB plasmids that demonstrates an evolutionary association with MGEs and is likely to be involved in plasmid competition.

AUTHOR'S NOTE
This manuscript has been released as a pre-print at bioRxiv (Newire et al., 2019).

AUTHOR CONTRIBUTIONS
EN discovered the CRISPR system within the genomes of her Egyptian isolate collection, analyzed the sequence data, and wrote the first draft of the manuscript. SJ, AA, and VE designed and carried out the experiments to test cas loci expression. AR analyzed the data and wrote the manuscript. All authors critically reviewed and approved the manuscript.