The CRISPR-Cas Mechanism for Adaptive Immunity and Alternate Bacterial Functions Fuels Diverse Biotechnologies

Bacterial and archaeal CRISPR-Cas systems offer adaptive immune protection against foreign mobile genetic elements (MGEs). This function is regulated by sequence-specific binding of CRISPR RNA (crRNA) to target DNA/RNA, with an additional requirement of a flanking DNA motif called the protospacer adjacent motif (PAM) in certain CRISPR systems. In this review, we discuss how the same fundamental mechanism of RNA-DNA and/or RNA-RNA complementarity is utilized by bacteria to regulate two distinct functions: to ward off intruding genetic materials and to modulate diverse physiological functions. The best documented examples of alternate functions are bacterial virulence, biofilm formation, adherence, programmed cell death, and quorum sensing. While extensive complementarity between the crRNA and the targeted DNA and/or RNA seems to constitute an efficient phage protection system, partial complementarity seems to be the key for several of the characterized alternate functions. Cas proteins are also involved in sequence-specific and non-specific RNA cleavage and control of transcriptional regulator expression, the mechanisms of which are still elusive. Over the past decade, the mechanisms of RNA-guided targeting and auxiliary functions of several Cas proteins have been transformed into powerful gene editing and biotechnological tools. We provide a synopsis of CRISPR technologies in this review. Even with the abundant mechanistic insights and biotechnology tools that are currently available, the discovery of new and diverse CRISPR types holds promise for future technological innovations, which will pave the way for precision genome medicine.


INTRODUCTION
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and CRISPR-associated (Cas) proteins constitute an RNA-guided adaptive immune system found in several bacteria and most archaea. Over the past decade, detailed molecular mechanisms were discovered that established CRISPR-Cas not just as a phage defense system, but also as a regulator of bacterial physiology, including virulence and group behaviors (Murugan et al., 2017;Faure et al., 2019;Liu and Doudna, 2020). In this review, we cover the roles of CRISPR-Cas systems within and outside the realm of adaptive immunity and detail the repertoire of biotechnological tools developed from them.

Discovery of CRISPR-Cas
CRISPR was discovered over 30 years ago in Escherichia coli and later in archaea (Ishino et al., 1987;Mojica et al., 1993). Cas proteins were originally thought to perform DNA repair (Makarova et al., 2002), but were later found to be associated with CRISPR (Jansen et al., 2002). The discovery that spacer sequences match foreign genetic elements led to the hypothesis that CRISPR-Cas is an immune system to protect against invading mobile genetic elements (Bolotin et al., 2005;Mojica et al., 2005;Makarova et al., 2006). Experimental validation of RNA-guided DNA cleavage protecting against intruding phages established CRISPR-Cas as an adaptive immune system (Barrangou et al., 2007).

Classification of CRISPR-Cas Systems
In CRISPR-Cas, the RNA-guided DNA/RNA cleavage occurs through an "effector complex" composed of an RNA guide called CRISPR RNA (crRNA) and a set of Cas proteins (Class 1) or a single multi-domain Cas protein (Class 2) (Figure 1A). The two classes are each divided into three types, based on the identity of the signature protein that cleaves the target nucleic acid, and further into several subtypes based on the CRISPR-Cas locus architecture. The most recent classification has 2 classes, 6 types and 33 subtypes (Makarova et al., 2020).
Class 1 CRISPR-Cas systems are the most abundant and include types I, III, and IV (Figure 1A; Makarova et al., 2015;Makarova et al., 2020). The type I effector complex is called CASCADE (CRISPR-associated complex for antiviral defense), which comprises crRNA and multiple Cas proteins (Brouns et al., 2008;Jore et al., 2011). Once CASCADE locates the target DNA, Cas3 is recruited for DNA cleavage (Sinkunas et al., 2011). The nuclease and helicase activities of Cas3 bring about long-range degradation of intruder DNA (Huo et al., 2014). Type III is unique since it targets and cleaves transcriptionally active RNA using crRNA complementarity, which in turn activates cleavage of the ssDNA associated with the transcription bubble. While Cas7 cleaves the RNA, Cas10 cleaves the ssDNA (Taylor et al., 2015;Mogila et al., 2019;You et al., 2019). Both these activities are essential for full immune protection, with certain systems collaborating with degradosome-nucleases for efficient clearing of intruder nucleic acids (Staals et al., 2013;Staals et al., 2014;Samai et al., 2015;Estrella et al., 2016;Kazlauskiene et al., 2016;Chou-Zheng and Hatoum-Aslan, 2019). Type IV is a minimalistic CRISPR-Cas system devoid of adaptation proteins; it is mostly found in plasmids or plasmid-like regions and may be involved in plasmid maintenance (Makarova et al., 2020). DinG helicase is essential for plasmid interference in type IV-A, but the nuclease identity is currently unknown (Özcan et al., 2019;Pinilla-Redondo et al., 2020).
Class 2 CRISPR-Cas systems consist of types II, V and VI (Figure 1A). The type II effector protein Cas9 introduces dsDNA breaks in target DNA using HNH and RuvC endonucleases. Type II systems need an accessory non-coding RNA, transactivating crRNA (tracrRNA), along with crRNA for DNA cleavage (Jinek et al., 2012). Type V systems have Cas12 as the signature effector protein, which causes staggered, sequence-specific DNA cleavage using its RuvC domain. Subtype-specific variation of the target (DNA vs. RNA) and guide RNA requirements [crRNA or crRNA-tracrRNA or crRNA-scout (short-complementarity untranslated)] have also been observed in type V systems (Zetsche et al., 2015;Yan et al., 2019;Harrington et al., 2020). The type VI signature nuclease, Cas13, binds crRNA and locates a complementary RNA target, followed by processive RNase activity using HEPN domains (Abudayyeh et al., 2016;O'Connell, 2019).
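The classification described above can be summarized as a small lookup table. The following Python sketch is illustrative only: the names CrisprType, CRISPR_TYPES and types_in_class are hypothetical, and the entries record only what is stated in this section. For type III, Cas10 is listed although Cas7 cleaves the RNA target, and the type IV nuclease is left unset because its identity is unknown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CrisprType:
    crispr_class: int          # 1 = multi-protein effector, 2 = single-protein effector
    nuclease: Optional[str]    # target-cleaving protein named in the text (None = unknown)
    target: str                # nucleic acid targeted, as described in the text


# Illustrative summary of the six types; not an exhaustive or official table.
CRISPR_TYPES = {
    "I": CrisprType(1, "Cas3", "DNA"),
    "III": CrisprType(1, "Cas10", "RNA, plus ssDNA of the transcription bubble"),
    "IV": CrisprType(1, None, "plasmids (interference shown for type IV-A)"),
    "II": CrisprType(2, "Cas9", "DNA"),
    "V": CrisprType(2, "Cas12", "DNA (RNA in some subtypes)"),
    "VI": CrisprType(2, "Cas13", "RNA"),
}


def types_in_class(crispr_class: int) -> list[str]:
    """Return the type names belonging to the given class, in table order."""
    return [name for name, t in CRISPR_TYPES.items() if t.crispr_class == crispr_class]


if __name__ == "__main__":
    print(types_in_class(1))
    print(types_in_class(2))
```

Keeping the nuclease field optional makes the gap in current knowledge about type IV explicit instead of hiding it behind a placeholder name.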

Stages in CRISPR-Cas Defense
There are three distinct stages for CRISPR defense.
(1) Adaptation: During adaptation, a short DNA is excised from intruding DNA and inserted site-specifically into the CRISPR array, creating a new spacer. Known adaptation mechanisms require Cas1 and Cas2 to capture and catalytically insert the spacer. Auxiliary proteins like Cas4, Csn2, Cas9, or non-Cas proteins such as integration host factor are essential for fidelity in certain subtypes. Type III systems with reverse transcriptase-Cas1 fusion proteins can acquire spacers from RNA (Silas et al., 2016). Details of adaptation have been recently reviewed (Sternberg et al., 2016;Mosterd et al., 2020).
FIGURE 1 | Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated (CRISPR-Cas) classification, interference complexes, and alternate functions.
(A) Interference complexes, crRNA maturation, and nucleic acid targeting. Type I-E CASCADE from Escherichia coli is shown. The interference complex comprises six Cas7s, two Cas11s, and one each of Cas5, Cas8, and Cas6. This complex recruits Cas3, which uses a histidine aspartate (HD) nuclease domain to nick the target DNA, followed by unwinding of the DNA by the helicase domain and long-range degradation of DNA by the HD domain. Type III is an RNA-targeting system historically divided into two groups of effector complexes, "Csm" (Cas subtype Mtube, includes subtypes A, D, E and F) and "Cmr" (Cas module RAMP, includes subtypes B and C), differentiated by the sequence of the Cas11 subunit, which is called Csm2 in Csm complexes and Cmr5 in Cmr complexes. Type III-B from Pyrococcus furiosus is shown. The interference complex comprises six Cas7s, coded by three distinct genes (cmr4, cmr1 and cmr6), three Cas11s, and one each of Cas5 and Cas10. While Cas7 cleaves target RNA with a 6-nucleotide periodicity, Cas10 cleaves ssDNA in the transcription bubble using its HD domain. The Palm domain of Cas10 produces cyclic oligo-adenylate (cOA) that activates Csx1 for non-specific RNA cleavage. Type IV is represented by subtype A1 from Aromatoleum aromaticum. The interference complex comprises Cas5, Cas6, Cas8, and multiple Cas7s, although the exact subunit stoichiometry is unknown. DinG is a helicase necessary for interference. Question marks and arrows in type IV-A1 indicate that subunit organization and nuclease identity are unknown. Cas6, the protein required for crRNA processing, may or may not be part of the effector complex in Class 1 systems. Each Class 2 system is represented by its respective subtype A. Type II-A Cas9 uses its HNH endonuclease domain to cleave the DNA strand hybridized to crRNA and its RuvC endonuclease domain to cleave the other strand. RNase III is required for crRNA processing in type II systems. Type V-A Cas12a (previously called Cpf1) uses its RuvC domain to cleave both strands of the target DNA, although assistance of the Nuc domain is needed to cleave the strand hybridized to crRNA. Cas12a also possesses non-specific DNA cleavage activity, called trans-cleavage. Type VI-A Cas13a uses two higher eukaryotes and prokaryotes nucleotide-binding (HEPN) domains to elicit sequence-specific and collateral (non-specific) cleavage of target RNA. Type V-A and type VI systems possess specific sites within their effector protein for crRNA processing. Secondary trimming for crRNA maturation is not shown. Protein legacy names are given in parentheses. PAM stands for protospacer adjacent motif, PFS stands for protospacer flanking site, and 3' flank stands for the 3' flanking sequence of the protospacer that is non-complementary to the 5'-tag of crRNA; each of these is used for self vs. non-self recognition. Nucleic acids are not to scale with proteins. Target or non-specifically cleaved RNA is in purple, crRNA is in red, tracrRNA is in black, and DNA is in blue.
(B) Left: Pseudomonas aeruginosa (type I-F) Cas3-mediated RNA and DNA targeting affect host immune evasion and biofilm formation, respectively. Right: Comparison between Streptococcus agalactiae (type II-A) and Francisella novicida (type II-B) mechanisms of Cas9-mediated down-regulation of endogenous genes. The differences include the type of guide RNA (crRNA vs. scaRNA), nucleic acid target (mRNA vs. genomic DNA), mode of repression (transcript degradation vs. steric hindrance to RNA-polymerase), and endogenous gene target (master regulatory protein vs. regulon affecting virulence). Created with BioRender.com.
(2) crRNA Processing: Typically, the CRISPR array is transcribed into a long pre-crRNA, which associates with Cas proteins for further processing into mature crRNAs. Class 1 systems require Cas6 to process pre-crRNA into individual crRNA molecules (Figure 1A; Staals et al., 2013;Staals et al., 2014;Taylor et al., 2019). Class 2 systems do not possess a signature Cas protein for crRNA processing, but rather use distinct mechanisms: type II systems depend on RNase III (Deltcheva et al., 2011;Charpentier et al., 2015), type V-A and type VI signature nucleases possess a distinct active site for crRNA processing, and certain type V subtypes depend on host nucleases (East-Seletsky et al., 2016;Fonfara et al., 2016;Liu et al., 2017). Certain type II systems directly transcribe mature crRNAs using individual promoters within the CRISPR array (Zhang et al., 2013).
(3) Interference: This stage involves the sequence-specific targeting and cleavage of foreign DNA and/or RNA. The classification section above details relevant proteins and cleavage types. Structural and mechanistic details of CRISPR interference have been recently reviewed (Murugan et al., 2017;Liu and Doudna, 2020). Interference involves either R-loop formation as the crRNA guide region hybridizes to target DNA, or base-pairing between the crRNA guide region and target RNA. This is followed by cleavage/degradation of the target.
CRISPR-Cas systems must distinguish between self and foreign DNA to avoid self-targeting. The characteristics that warrant intruder cleavage are (i) presence of PAM (protospacer adjacent motif), a DNA motif flanking the RNA-DNA complementary region in types I, II, and V; (ii) absence of RNA complementarity between the 5′-tag of crRNA and 3′ flank of the target RNA in type III; and (iii) presence of a protospacer flanking sequence (PFS), an RNA motif in the target RNA, in type VI (Gleditzsch et al., 2019).
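The PAM requirement in point (i) can be illustrated with a short sequence scan. The sketch below is a simplified illustration, not a guide-design tool: the function name find_pam_sites is hypothetical, and the default 5'-NGG-3' PAM located immediately 3' of a 20-nucleotide protospacer is an assumption based on the commonly used S. pyogenes Cas9, not something specified in the text above. Only the strand that is passed in is scanned.

```python
import re


def find_pam_sites(dna: str, spacer_len: int = 20, pam_regex: str = "[ACGT]GG"):
    """Return (start, protospacer, pam) for every protospacer followed by a PAM.

    ASSUMPTION: defaults model an NGG PAM directly 3' of a 20-nt protospacer,
    as for S. pyogenes Cas9; other systems use different motifs and positions.
    """
    dna = dna.upper()
    # A zero-width lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})({pam_regex}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


if __name__ == "__main__":
    # Toy sequence: a 20-nt stretch followed by TGG qualifies as a target.
    example = "A" * 20 + "TGG" + "CCC"
    for start, protospacer, pam in find_pam_sites(example):
        print(start, protospacer, pam)
```

A sequence that matches a spacer but lacks the flanking motif returns no sites, which mirrors how PAM-dependent systems avoid targeting the PAM-less spacers stored in their own CRISPR array.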

PHYSIOLOGICAL ROLES OF CRISPR-CAS OBSERVED IN BACTERIA
Some bacteria regulate physiological processes using RNA-guided Cas proteins that target the self-genome. While partial complementarity between the guide RNA and the self-genome or endogenous transcripts seems to drive gene regulation, complete complementarity with the genome has been shown to regulate prophage and temperate phage life-cycles and trigger bacterial evolution through recombination to avoid lethal self-targeting (Aklujkar and Lovley, 2010;Vercoe et al., 2013;Goldberg et al., 2014). We review here the roles of CRISPR systems in bacterial pathogenicity and survival.
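The distinction between partial and complete complementarity can be made concrete with a simple base-pairing count. The sketch below is a minimal illustration under stated assumptions: the function name complementarity is hypothetical, only canonical Watson-Crick pairs are counted (G-U wobble pairs and position-specific effects are ignored), and no threshold separating "regulation" from "cleavage" is implied, since the text gives none.

```python
# Canonical RNA-to-DNA Watson-Crick partners (RNA base -> paired DNA base).
RNA_TO_DNA_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}


def complementarity(guide_rna: str, target_dna: str) -> float:
    """Fraction of guide positions that pair with an antiparallel DNA strand.

    Both sequences are written 5'->3'. Because paired strands are antiparallel,
    the DNA strand is reversed before position-by-position comparison.
    """
    if not guide_rna or len(guide_rna) != len(target_dna):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = zip(guide_rna.upper(), reversed(target_dna.upper()))
    matches = sum(RNA_TO_DNA_PAIR.get(rna_base) == dna_base for rna_base, dna_base in pairs)
    return matches / len(guide_rna)


if __name__ == "__main__":
    print(complementarity("GAUC", "GATC"))  # fully complementary toy pair
    print(complementarity("GAUC", "GATT"))  # one mismatched position
```

In this toy model a score of 1.0 corresponds to complete complementarity, while lower scores represent the partial matches discussed above.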

Virulence
Host Immune Evasion
CRISPR-Cas can regulate gene expression to conceal bacteria from host toll-like receptors (TLR). The best studied example is the Francisella novicida U112 type II-B system, which downregulates a bacterial lipoprotein (BLP) on the cell envelope to evade host immune responses (Ratner et al., 2019). This mechanism requires the association of Cas9 with tracrRNA and small, CRISPR-Cas-associated RNA (scaRNA), which specifically represses transcription of a regulon containing blp (FTN_1104-1101) (Figure 1B). Transcriptomic studies have revealed similar trends where blps are upregulated in cas9 deletion mutants (Δcas9) of Streptococcus agalactiae GD201008-001 (type II-A) and Riemerella anatipestifer (type II-C) (Ma et al., 2018;Wang et al., 2019). In Streptococcus pyogenes GAS-M1T1-5448 (type II-A), Δcas9 produces less of the master regulator protein Mga, which in turn downregulates ScpA and SIC proteins, which are essential to inactivate the host complement immune defense (Gao et al., 2019). S. pyogenes Cas9 also downregulates the CovR/S two-component system (TCS), which regulates capsule genes that confer antiphagocytic properties (Sarkar and Sumby, 2017;Gao et al., 2019). Involvement of a type I-F CRISPR system in virulence was demonstrated in Pseudomonas aeruginosa UCBPP-PA14, where Cas3 degrades the mRNA of the master regulator protein LasR to evade the TLR-initiated host immune response (Figure 1B; Li et al., 2016). The bacterial molecule regulated to avoid recognition is currently unknown.

Relation to Genome Repair and Involvement in Non-Virulent Gene Regulation and Programmed Cell Death
Current research also shows CRISPR-Cas systems regulating genes not directly involved in bacterial virulence. Experimental evidence has demonstrated Cas1's ability to cleave branched DNA substrates and revealed enhanced sensitivity of Δcas1 E. coli to DNA damage (Babu et al., 2011). A recent study has shown that non-homologous end-joining (NHEJ) does not co-exist with type II-A systems due to the competition between Csn2 (type II-A) and NHEJ proteins for DNA substrates. Another DNA-repair system, RecBCD, is essential for spacer acquisition in E. coli (type I-E) (Levy et al., 2015). These studies suggest complex interdependence of CRISPR and DNA repair systems, the mechanisms of which are still elusive. Transcriptomic and proteomic studies of Δcas9 S. pyogenes (type II-A) (Gao et al., 2019), Group B Streptococcus (type II-A) (Spencer et al., 2019), and Riemerella anatipestifer (type II-C) have indicated both up- and down-regulation of several endogenous genes, the implications of which are not clear. Several studies in Myxococcus xanthus have shown that CRISPR systems are involved in cell-stress dependent sporulation (type I-C) and fruiting body development (type III-B) (Viswanathan et al., 2007;Wallace et al., 2014).
Type III and type VI CRISPR systems promote indiscriminate RNA cleavage leading to programmed cell death (PCD), a strategy employed when immune protection has failed (Faure et al., 2019). In type III systems, crRNA binding to target RNA activates production of cyclic oligoadenylate (cOA) by Cas10. The cOA molecules initiate the indiscriminate RNase activity of Csm6/Csx1, resulting in cell death (Kazlauskiene et al., 2017;Niewoehner et al., 2017). In type VI, crRNA binding to target RNA activates Cas13 for promiscuous RNase activity (Abudayyeh et al., 2016). Interestingly, the resemblance of Cas2 to the toxin component of toxin-antitoxin systems and the genomic co-localization of CRISPR and PCD elements have led to the idea of their codependence in bacterial physiology, which awaits experimental confirmation (reviewed in Faure et al., 2019).

Diversity of CRISPR-Cas Mechanisms
Current studies indicate that CRISPR has functions beyond adaptive immunity, primarily in regulating genome content and gene expression. Horizontal Gene Transfer (HGT) can be negatively impacted in CRISPR-containing bacteria since the acquired DNA can be targeted by CRISPR. Our understanding of the effects of CRISPR on HGT is still evolving. Some bacteria compensate for limitations on HGT by maintaining a defective CRISPR locus or establishing mechanisms for CRISPR tolerance (Wimmer and Beisel, 2020). A different way of offsetting this disadvantage has been demonstrated by CRISPR-mediated enhancement of transduction by phages, which then enables transfer of genetic materials between bacteria (Watson et al., 2018).
Based on studies of streptococcal physiology, we speculate that CRISPR-mediated gene regulation promotes strain differentiation via genome remodeling and regulatory changes. CRISPR changes expression of S. pyogenes master regulators Mga and VicR and the TCS CovR/S, which then regulate immunomodulatory virulence factors that drive development of strains with different host tissue preferences and physiologies ranging from hypervirulent to carrier status (Vega et al., 2016;Sarkar and Sumby, 2017). Similarly, natural mutation in P. aeruginosa master regulator LasR increases bacterial fitness in the cystic fibrosis lung (Smith et al., 2006). A common theme in these diverse mechanisms is that Cas proteins target genes that bring about differential phenotypes in bacteria. Legionella pneumophila's requirement of Cas2 to infect amoebas indicates the involvement of other Cas proteins in endogenous gene regulation, requiring future studies to unravel more of such mechanisms.

CRISPR-CAS BIOTECHNOLOGY TOOLS
The RNA-guided nucleic acid targeting of Cas proteins offers several biotechnology tools that depend on the diverse CRISPR mechanisms (Table 1). We provide selected aspects of currently available applications (Table 1), with more details in Table S1 and several recent reviews (Pickar-Oliver and Gersbach, 2019; Xu and Qi, 2019).

Gene Editing
The revolutionary characteristic of CRISPR-Cas-based gene editing is the ability to introduce heritable genome modifications by programming Cas proteins using guide RNAs, instead of modifying the editing protein, which was the limiting aspect of zinc fingers and TALENs (Gaj et al., 2013).
The most widely used CRISPR-based gene editing system is S. pyogenes Cas9 (SpyCas9) with a single guide RNA (sgRNA) (Table 1.A). The most prominent application is modification of eukaryotic genes, utilizing the efficient DNA-repair pathways of eukaryotes to repair Cas9-induced DNA breakage by NHEJ (creates gene knock-outs) or homology directed repair (HDR, creates gene knock-ins). An approach to fix genetic mutations while eliminating unwanted effects from off-target DNA cleavage is fusing Cas9 with base editors (Table 1.B; Komor et al., 2016). Cas9-based gene editing in prokaryotes is limited due to inefficient DNA repair mechanisms; however, Cas9 cleavage-induced killing of unedited cells can be used to increase efficiency of other bacterial gene-editing methods (Wilson et al., 2003).
Cas12a and CASCADE gene editing systems complement those of Cas9 since their DNA cleavage mechanisms produce HDR-enhancing staggered ends and introduce long-range deletions, respectively (Dolan et al., 2019;Paul and Montoya, 2020).

Biomedical Tools and Future Prospects in Genome Medicine
The site-specific genetic modulation facilitated by Cas proteins has been monumental in quickly advancing gene therapy, with several ongoing clinical trials for conditions including cancer immunotherapies, blood disorders, and blindness (Table S1.A.6; clinicaltrials.gov). To facilitate progress, SpyCas9 off-targeting concerns are being addressed using highly stringent Cas proteins, temporal control of protein expression, and gene editing reaction activators/quenchers (Table S1.A). As new CRISPR systems with diverse mechanisms are being characterized, new biomedical applications will follow.

OUTLOOK
As described here, CRISPR-Cas is a two-in-one mechanism for protection against intruding nucleic acids as well as for regulating bacterial physiology, including pathogenicity. Following the current discovery trends, genomic analyses will keep unearthing new CRISPR-Cas systems, sometimes even rewriting the existing rules (e.g., discovery of the first known Cas9 in nanoarchaea; Burstein et al., 2017). Our understanding of the mechanistic basis of CRISPR-based physiology is still evolving. Interestingly, the majority of spacer targets are yet to be identified (Shmakov et al., 2017), and these target locations may hold the key to mechanisms of more CRISPR-mediated alternate functions.
Currently, Cas9, Cas1, Cas2, and Cas3 have been directly implicated in several physiological functions, suggesting potential comparable functions in other Cas proteins. So far, Cas9 has been the most divergent in function, ranging from phage defense to gene regulation to directly acting as a virulence factor in Campylobacter jejuni (Faure et al., 2019). Future research is crucial in understanding the structure-function relations of these diverse Cas9 mechanisms. The recent finding that secretion of guide-free Cas9 by C. jejuni can inflict host cell DNA damage points to Cas9 mechanisms that are independent of guide RNA (Saha et al., 2020b).
The arms race between bacteria and viruses has delivered an arsenal of anti-CRISPR proteins (Marino et al., 2020). In an interesting twist, the use of CasΦ by huge phages showed adaptation of CRISPR-Cas mechanisms to ward off virophages (Pausch et al., 2020). The fitness costs and benefits of an active CRISPR system and the maintenance of degenerated CRISPR-Cas systems in several bacteria provide interesting future research avenues. CRISPR-Cas systems have proven unique because of their repurposing into powerful diagnostic, therapeutic and experimental tools. While several aspects, such as off-target effects and promiscuous DNA damage, need to be addressed by future research (Haapaniemi et al., 2018;Wang et al., 2020), the days of personal precision genome medicine are a step closer with this powerful technology.

AUTHOR CONTRIBUTIONS
SN, HPP, LM, and RR wrote, reviewed, and edited the manuscript. All authors contributed to the article and approved the submitted version.

ACKNOWLEDGMENTS
We thank Dr. Kesavan Babu for critical discussions during the preparation of this review. We acknowledge the financial support provided from the Office of the Vice President for Research and Partnerships and the Office of the Provost, University of Oklahoma towards the publication cost.