Occurrence and Diversity of CRISPR Loci in Lactobacillus casei Group.

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and their associated (Cas) proteins form an adaptive immune system that protects bacteria and archaea against foreign genetic elements through nuclease targeting. In this study, we used bioinformatic tools to investigate the occurrence and diversity of CRISPR systems in 68 Lactobacillus casei group strains from the NCBI GenBank database. A total of 30 CRISPR loci were identified in 27 strains. Most strains contained only one CRISPR locus; three strains contained two loci located at distinct sites. Analysis of the direct repeat (DR) sequences showed that all DRs could form stable RNA secondary structures. The CRISPR spacers were diverse, and their origin and evolution were traced through analysis of the spacer sequences. In addition, a large number of CRISPR spacers showed perfect homology to phage and plasmid sequences. Collectively, our results contribute to research on resistance to invasive DNA in the L. casei group and provide new insight into the diversity and evolution of CRISPR/Cas systems.


INTRODUCTION
Lactic acid bacteria are recognized as food-grade microorganisms that are safe for use in food (Saad et al., 2013). They improve the nutritional value and flavor of foods. They also have various probiotic functions, such as regulation of the intestinal flora (Corinna et al., 2016) and enhancement of immunity (Akoglu et al., 2015). The L. casei group, a group of lactic acid bacteria, can survive the strongly acidic environment of the stomach and colonize the intestinal mucosa, and thus plays a major role in the prevention and treatment of gastrointestinal diseases. L. casei group strains have been widely commercialized: several related products hold a large market share and are well accepted by consumers, and their applications have expanded from conventional yogurt to health products, medicine, vaccines and several other fields. However, phage contamination remains a very serious problem for the lactic acid bacteria industry (Garneau and Moineau, 2011). Phages lyse bacterial cells, which decreases viable counts, slows fermentation and can even cause production failure. These effects in turn reduce acid production and impair flavor and taste. Because phages can survive pasteurization, they are difficult to eliminate completely. They can spread rapidly and even disrupt an entire production chain, leading to huge economic losses. Thus, the anti-phage ability of lactic acid bacteria needs to be explored to support their development for industrial applications.
Bacteria have evolved a variety of strategies against phages, including the CRISPR/Cas system (Rodolphe et al., 2007; Garneau et al., 2010). CRISPR/Cas systems prevent infection by integrating fragments of foreign genetic elements into the host genome and cutting invading DNA that matches them, so bacteria also become immune to other invaders carrying homologous sequences (John et al., 2009; Makarova et al., 2011; Stern et al., 2012). CRISPR/Cas systems usually consist of clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated (Cas) genes (Alexander et al., 2005; Rodolphe et al., 2007). CRISPR arrays contain RNA-coding sequences that target foreign elements. Cas proteins function as nucleases and helicases that can unwind and cut double-stranded DNA, causing double-strand breaks in certain cases (Poorna et al., 2007; Tautvydas et al., 2013). Over the past decade, CRISPR systems have been classified into 2 classes, 6 types, and more than 20 subtypes (Makarova et al., 2017a,b). Researchers have also developed them into gene-editing tools (Martin et al., 2012; Burgess, 2013; Wenyan et al., 2013), and CRISPR-Cas9 gene editing is now widely used in animals and plants. Beyond gene editing, CRISPR/Cas systems have enormous potential for other applications. Recently, CRISPR-based strain typing has been widely used in microorganisms such as Pseudomonas aeruginosa (Alex et al., 2015), Salmonella enterica (Pettengill et al., 2014), and Helicobacter cinaedi (Tomida et al., 2017). Among lactic acid bacteria, closely related strains are usually difficult to distinguish based on 16S rRNA sequences, a problem that CRISPR genotyping could solve. Cas proteins capture fragments of foreign genetic elements, process them, and insert them into the CRISPR array as a new repeat-spacer unit.
These insertions usually occur at the leader end of the CRISPR locus, while the earliest acquired spacers, also called ancestral spacers, lie at the opposite end. Because new spacers are added sequentially, the position of each spacer records when it was acquired, so the array forms an evolutionary timeline. Bacteria rarely share exactly the same CRISPR array. Thus, CRISPR genotyping can provide more useful background information than other genotyping methods.
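Because spacers are added at the leader end, spacer order can be read as a timeline. As a minimal sketch, assuming each array is already given as a list of spacer identifiers ordered from the leader end to the ancestral end (the identifiers below are hypothetical), two strains can be compared by walking inward from the ancestral end: the shared run gives their common ancestral spacers, and whatever remains at the leader end was acquired after the lineages diverged. The sketch assumes no internal spacer deletions, which real arrays can have.

```python
def compare_arrays(a, b):
    """Compare two CRISPR arrays ordered leader end -> ancestral end.

    Returns (shared ancestral spacers, spacers unique to a, spacers unique to b).
    Assumes the arrays differ only at the leader end (no internal deletions).
    """
    shared = []
    # Walk both arrays from the ancestral (oldest) end toward the leader end.
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        shared.append(x)
    shared.reverse()  # restore leader -> ancestral order
    n = len(shared)
    # Everything before the shared block was acquired after divergence.
    return shared, a[: len(a) - n], b[: len(b) - n]


# Hypothetical spacer IDs; s1 is the oldest spacer in both arrays.
strain_1 = ["s9", "s8", "s3", "s2", "s1"]
strain_2 = ["s7", "s3", "s2", "s1"]
shared, only_1, only_2 = compare_arrays(strain_1, strain_2)
print(shared)  # ['s3', 's2', 's1']
print(only_1)  # ['s9', 's8']
print(only_2)  # ['s7']
```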
Bioinformatic analysis of CRISPR systems in probiotics is crucial for assessing their potential evolution and predicting their immunity, and is also of great importance to the food industry and other applications. Our study should provide useful information about the molecular mechanisms by which the L. casei group resists phage infection. In addition, it lays a foundation for the subsequent screening and breeding of commercial phage-resistant Lactobacillus strains.

CRISPR/Cas System Identification
The 68 L. casei group genomes obtained from the NCBI GenBank database (https://www.ncbi.nlm.nih.gov/genbank/) were used to characterize CRISPR/Cas systems. The CRISPR loci of these genomes were identified with the CRISPR-Cas++ webserver 2 using the default output options. The webserver reports loci as "Questionable" (containing only a few repeat-spacer units) or "Confirmed" (accompanied by Cas genes). Only "Confirmed" CRISPR loci were analyzed further for CRISPR system diversity.
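The filtering step can be sketched as follows, assuming the webserver results have already been parsed into one record per locus. The record fields (`strain`, `locus`, `status`) and the strain names are hypothetical placeholders, not the CRISPR-Cas++ output schema. The last line applies the same arithmetic to the counts reported in the Results (27 of 68 genomes).

```python
from collections import defaultdict

# Hypothetical parsed records; field names are placeholders, not the
# CRISPR-Cas++ output schema.
loci = [
    {"strain": "strain_A", "locus": "A_1", "status": "Confirmed"},
    {"strain": "strain_A", "locus": "A_2", "status": "Questionable"},
    {"strain": "strain_B", "locus": "B_1", "status": "Questionable"},
    {"strain": "strain_C", "locus": "C_1", "status": "Confirmed"},
    {"strain": "strain_C", "locus": "C_2", "status": "Confirmed"},
]


def confirmed_by_strain(records):
    """Keep only 'Confirmed' loci and group their IDs by strain."""
    by_strain = defaultdict(list)
    for rec in records:
        if rec["status"] == "Confirmed":
            by_strain[rec["strain"]].append(rec["locus"])
    return dict(by_strain)


def occurrence_percent(n_positive, n_total):
    """Percentage of genomes carrying at least one confirmed locus."""
    return 100.0 * n_positive / n_total


kept = confirmed_by_strain(loci)
print(kept)  # {'strain_A': ['A_1'], 'strain_C': ['C_1', 'C_2']}
print(sum(len(v) for v in kept.values()))  # 3 confirmed loci in 2 strains
print(round(occurrence_percent(27, 68), 1))  # 39.7
```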

Repeat Structure Prediction and Spacer Analyses
Similar repeat sequences were clustered and aligned with MEGA 7.0 and DNAMAN 6.0 software (MEGA Inc., Memphis, TN, United States). RNA secondary structures were predicted with the RNAfold web server 3 using the minimum free energy (MFE) algorithm with default parameters. CRISPR spacers were visualized with CRISPRviz 4, in which each unique color combination represents one distinct spacer sequence.
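The repeat grouping above relied on MEGA and DNAMAN alignments. As a simplified stand-in, assuming repeats of equal length can be compared position by position without gaps, single-linkage grouping at an identity threshold could look like the following. The 90% threshold and the toy sequences are assumptions for illustration only; repeats of different lengths simply fall into separate groups.

```python
from itertools import combinations


def identity(a, b):
    """Fraction of identical positions between two repeats.

    Repeats of different lengths are treated as unrelated (0.0); a real
    analysis would align them first, as MEGA or DNAMAN do.
    """
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_repeats(repeats, min_identity=0.9):
    """Single-linkage grouping of named repeats (name -> sequence)."""
    parent = {name: name for name in repeats}

    def find(x):
        # Follow parent links to the group representative.
        while parent[x] != x:
            x = parent[x]
        return x

    for (n1, s1), (n2, s2) in combinations(repeats.items(), 2):
        if identity(s1, s2) >= min_identity:
            parent[find(n1)] = find(n2)  # merge the two groups

    groups = {}
    for name in repeats:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Toy repeats (hypothetical): r1 and r2 differ at one of ten positions.
repeats = {
    "r1": "ACGTACGTAC",
    "r2": "ACGTACGTAA",
    "r3": "TTGCATTGCA",
    "r4": "ACGTAC",
}
print(cluster_repeats(repeats))  # [['r1', 'r2'], ['r3'], ['r4']]
```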

Protospacer Target
Protospacers were identified by searching the spacers against public plasmid and phage genome sequences with the CRISPRTarget web server 5. A match was accepted as a protospacer when the alignment had fewer than three mismatches across the whole length of the spacer, and the best match was retained. The spacer matches were then hierarchically clustered and visualized with the pheatmap package in R 3.6.0.
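The web server used above performs its own alignment and scoring. The brute-force, ungapped search below is only a sketch of the acceptance rule, assuming a hit is any window on either strand with at most two mismatches (that is, fewer than three) over the full spacer length; it ignores PAMs. The function names and toy sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_protospacers(spacer, target, max_mismatches=2):
    """Ungapped search of a spacer against both strands of a target sequence.

    Returns (strand, start, mismatches) for every window with at most
    max_mismatches differences; start is given in (+) strand coordinates.
    """
    k = len(spacer)
    hits = []
    for strand, query in (("+", spacer), ("-", reverse_complement(spacer))):
        for start in range(len(target) - k + 1):
            window = target[start:start + k]
            mismatches = sum(a != b for a, b in zip(query, window))
            if mismatches <= max_mismatches:
                hits.append((strand, start, mismatches))
    return hits


# Hypothetical spacer and target; the target carries one mismatch at offset 4.
print(find_protospacers("ATGCAAGTCC", "TTTTATGCTAGTCCTTTT"))  # [('+', 4, 1)]
```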

Identification of CRISPR-Cas Systems in Lactobacillus casei Group
A total of 68 L. casei group strains from the NCBI GenBank database, comprising 11 Lactobacillus casei and 57 Lactobacillus paracasei strains, were analyzed for the occurrence of CRISPR systems (Table 1). A total of 30 confirmed CRISPR loci with Cas genes were identified in the two investigated species. The strains carrying confirmed CRISPR systems accounted for 39.7% (27 of 68) of the L. casei group, close to the estimated occurrence rate of 40% in bacteria. CRISPR/Cas subtypes were determined from the Cas genes flanking the CRISPR arrays (Figures 1A,B). Among the strains with CRISPR loci, 24 contained a type II-A CRISPR locus, 5 contained a type I-E locus, and only one contained a type I-C locus. Interestingly, the type I-C locus had separate CRISPR arrays on both sides of the Cas genes; because their repeat sequences were similar, they were treated as a single locus. These 30 confirmed CRISPR loci were analyzed further.
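Subtype assignment from the flanking Cas genes can be illustrated with a rule-based sketch. It assumes the genes around each array are already annotated with standard names and uses a few signature genes from the Makarova et al. classification (cas3 for type I, cas8c for I-C, cse1 and cse2 for I-E, cas9 and csn2 for II-A). It is a simplification for illustration, not the CRISPR-Cas++ classifier.

```python
# Simplified signature-gene rules (a subset of the Makarova et al. scheme):
# cas3 marks type I and cas9 marks type II; cas8c marks I-C, cse1/cse2 mark
# I-E, and csn2 marks II-A. Real classifiers use many more genes and profiles.
SUBTYPE_RULES = [
    ("I-C", {"cas3", "cas8c"}),
    ("I-E", {"cas3", "cse1", "cse2"}),
    ("II-A", {"cas9", "csn2"}),
]


def assign_subtype(cas_genes):
    """Return the first subtype whose signature genes are all present."""
    genes = {g.lower() for g in cas_genes}
    for subtype, required in SUBTYPE_RULES:
        if required <= genes:  # set inclusion: all signature genes found
            return subtype
    return "unclassified"


# Hypothetical gene lists for three loci.
print(assign_subtype(["cas9", "cas1", "cas2", "csn2"]))                   # II-A
print(assign_subtype(["Cas3", "Cse1", "Cse2", "Cas7", "Cas5", "Cas6e"]))  # I-E
print(assign_subtype(["cas1", "cas2"]))                                   # unclassified
```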
Repeat sequences were conserved in each CRISPR/Cas subtype. The length of the repeat sequences was 36 nucleotides for subtype II-A, 28 nucleotides for subtype I-E, and 32 nucleotides for I-C ( Figure 1C). However, for subtype II-A

Diversity of Repeat and Spacer Sequences
Based on alignment of the repeat sequences, the 30 CRISPR loci from the 68 L. casei group strains were divided into five groups: one for subtype I-C, two for subtype I-E, and two for subtype II-A. Repeat sequences were conserved within each subtype (Figure 2). The subtype II-A DR1 was the most common sequence, found in 23 strains. In addition, the predicted DR structures all formed stems in the middle of the RNA. The predicted stem lengths were 7 bp for the subtype I-C DR (Figure 3A), 7 and 10 bp for the subtype I-E DRs (Figures 3B,C), and 6 bp for the subtype II-A DRs (Figures 3D,E).
The conservation and stability of the DR secondary structures can be assessed from the structure diagrams and their MFE values. In the structure diagrams, red marks regions with a high probability of formation, while green marks regions with a relatively low probability. Overall, stable RNA secondary structures tended to form long stems. The stability of secondary structures can also be affected by GC content: at the same stem length, repeats with higher GC content were more stable. Among all groups, the DR secondary structures of subtype II systems had the smallest MFE values and the shortest stems. By contrast, the DR secondary structures of subtype I systems had larger MFE values than those of subtype II systems, although their stem lengths were similar. This indicates that the GC content and the number of mismatched bases in the stem are consistent with the stability of the RNA secondary structure.
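Stem length and stem GC content can be read off a predicted structure in dot-bracket notation, the format RNAfold reports. The sketch below assumes a valid, balanced dot-bracket string; the GC fraction of paired bases is only a rough proxy for the trend described above and does not replace the MFE computed by the energy model. The toy hairpin is hypothetical, not one of the DRs from this study.

```python
def base_pairs(structure):
    """Return sorted (i, j) base-pair indices from a balanced dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))  # close the most recent open base
    return sorted(pairs)


def longest_stem(pairs):
    """Length of the longest run of stacked pairs (i, j), (i+1, j-1), ..."""
    pair_set = set(pairs)
    best = 0
    for i, j in pairs:
        if (i - 1, j + 1) in pair_set:
            continue  # not the outermost pair of its helix
        length = 0
        while (i + length, j - length) in pair_set:
            length += 1
        best = max(best, length)
    return best


def stem_gc_fraction(sequence, pairs):
    """Fraction of paired positions occupied by G or C (a rough stability proxy)."""
    paired = [sequence[k] for pair in pairs for k in pair]
    return sum(b in "GC" for b in paired) / len(paired) if paired else 0.0


# Toy hairpin (hypothetical): a 4-bp G-C stem closing a 4-nt loop.
seq, ss = "GGGCAAAAGCCC", "((((....))))"
pairs = base_pairs(ss)
print(longest_stem(pairs))           # 4
print(stem_gc_fraction(seq, pairs))  # 1.0
```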
The CRISPR spacers were analyzed to clarify the similarity and divergence of strain evolution under selective pressure from invasive DNA. As shown in Figures 4, 5, strains could be grouped by the composition and arrangement of their spacers. Spacers are arranged from the ancestral end (right) toward the most recently acquired end (left), and each color combination represents a unique spacer sequence. Strains with similar spacers were considered one group, because they were likely exposed initially to the same environment. Based on the spacer alignment, 9 type II CRISPR genotypes comprising 19 unique patterns were found (Figure 4), along with 5 type I CRISPR genotypes comprising 7 unique patterns (Figure 5). While BL23, LC2W, 7112-2, LcA, BD-II, and LcY shared completely identical spacers, other strains in the same group differed in their later-acquired spacers. A shared ancestral spacer may indicate a common origin, whereas differences in subsequently acquired spacers reflect divergent evolution. Consequently, bacteria can be classified according to their spacers.
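The grouping of strains by spacer content can be sketched as below. It assumes each array is a list of spacer IDs ordered leader end -> ancestral end, treats strains with identical arrays as one pattern, and, as a simplifying assumption, approximates a genotype as the set of strains sharing the same ancestral (last) spacer. The strain names and spacer IDs are hypothetical.

```python
from collections import defaultdict


def group_strains(arrays):
    """Group strains by spacer content.

    arrays maps strain -> spacer IDs ordered leader end -> ancestral end.
    Returns (patterns, genotypes): strains with identical arrays, and strains
    sharing the same ancestral (last) spacer, each as sorted lists of names.
    """
    patterns = defaultdict(list)
    genotypes = defaultdict(list)
    for strain, spacers in arrays.items():
        patterns[tuple(spacers)].append(strain)
        if spacers:  # strains without spacers get no genotype
            genotypes[spacers[-1]].append(strain)

    def as_groups(groups):
        return sorted(sorted(members) for members in groups.values())

    return as_groups(patterns), as_groups(genotypes)


# Hypothetical arrays: A and B are identical; C shares only the ancestral s1.
arrays = {
    "strain_A": ["s4", "s2", "s1"],
    "strain_B": ["s4", "s2", "s1"],
    "strain_C": ["s5", "s1"],
    "strain_D": ["s7", "s6"],
}
patterns, genotypes = group_strains(arrays)
print(patterns)   # [['strain_A', 'strain_B'], ['strain_C'], ['strain_D']]
print(genotypes)  # [['strain_A', 'strain_B', 'strain_C'], ['strain_D']]
```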

Spacers Homology to Phage and Plasmid
Investigating the similarity between CRISPR spacers and foreign DNA elements can reveal the immune history of strains, including records of the threats they have encountered and the routes of invasive DNA. Among the 27 L. casei group strains harboring CRISPR/Cas systems, 9 strains harbored at least one spacer targeting phages, while 18 strains harbored at least one spacer targeting plasmids (Figure 6). Interestingly, the L. casei group strains with type I CRISPR systems had more spacers targeting foreign DNA. The CRISPR/Cas system of L. casei TMV1.1434 harbored up to 11 spacers targeting plasmids from L. casei. The most frequent matches were with plasmid sequences from L. plantarum, consistent with the overall distribution of plasmid targets. This may be because L. plantarum has by far the largest number of sequenced plasmids in the database. Among the 9 strains with phage-targeting spacers, the spacers of L. casei TMV1.1434 matched the largest number of phage sequences, while half of the spacers of L. casei CECT9104 matched phages. Regarding the diversity of matched species, L. paracasei LC335, L. paracasei 525 LPAR, and L. casei TMV1.1434 targeted up to 44, 36, and 24 different phage species, respectively.
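Per-strain summaries such as the number of strains with at least one phage- or plasmid-targeting spacer, or the fraction of matched spacers in an array, reduce to simple counting once each spacer has been annotated with the categories it matched. The input layout and strain names below are hypothetical.

```python
def summarize_matches(spacer_hits):
    """Summarize spacer matches per strain.

    spacer_hits maps strain -> one set per spacer holding the categories that
    spacer matched ("phage", "plasmid"); an empty set means no match.
    Returns strain -> (n phage-matching, n plasmid-matching, fraction matched).
    """
    summary = {}
    for strain, hits in spacer_hits.items():
        n_phage = sum("phage" in h for h in hits)
        n_plasmid = sum("plasmid" in h for h in hits)
        n_any = sum(bool(h) for h in hits)
        summary[strain] = (n_phage, n_plasmid, n_any / len(hits) if hits else 0.0)
    return summary


def strains_targeting(summary, category):
    """Strains with at least one spacer matching 'phage' or 'plasmid'."""
    index = {"phage": 0, "plasmid": 1}[category]
    return sorted(s for s, counts in summary.items() if counts[index] > 0)


# Hypothetical annotations for two strains.
hits = {
    "strain_X": [{"phage"}, set(), {"plasmid"}, {"phage", "plasmid"}],
    "strain_Y": [set(), {"plasmid"}],
}
summary = summarize_matches(hits)
print(summary["strain_X"])                    # (2, 2, 0.75)
print(strains_targeting(summary, "phage"))    # ['strain_X']
print(strains_targeting(summary, "plasmid"))  # ['strain_X', 'strain_Y']
```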
PL-1 was the most frequently targeted phage, followed by L. casei phages A2, phiAT3 and J-1. We further analyzed the targeted genes in phage PL-1: some spacers shared a region of homology with the gene encoding a tail component, which plays a vital role in phage replication. Similarly, multiple spacers matched gene regions encoding the major capsid protein or the DNA packaging machinery. Thus, CRISPR/Cas immunity could prevent phage replication by destroying these critical components and enhance bacterial viability in phage-rich environments.
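Locating the phage genes that protospacers fall in is an interval-overlap lookup once protospacer coordinates and gene annotations are available. The sketch below assumes 0-based, half-open coordinates; the gene names and coordinates are hypothetical and are not the PL-1 annotation.

```python
def genes_hit(protospacers, genes):
    """Map each protospacer to the annotated genes it overlaps.

    protospacers: list of (name, start, end); genes: list of (gene, start, end).
    Coordinates are 0-based and half-open, i.e. [start, end).
    """
    result = {}
    for name, p_start, p_end in protospacers:
        # Two half-open intervals overlap when each starts before the other ends.
        result[name] = [gene for gene, g_start, g_end in genes
                        if p_start < g_end and g_start < p_end]
    return result


# Hypothetical phage annotation and protospacer positions.
genes = [
    ("terminase_large_subunit", 100, 1500),
    ("major_capsid_protein", 1600, 2800),
    ("tail_tape_measure_protein", 3000, 6000),
]
protospacers = [("sp1", 1700, 1730), ("sp2", 5990, 6020), ("sp3", 2850, 2880)]
print(genes_hit(protospacers, genes))
# {'sp1': ['major_capsid_protein'], 'sp2': ['tail_tape_measure_protein'], 'sp3': []}
```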

DISCUSSION
The CRISPR system resists foreign phages and plasmids through target interference mediated by specific proteins and guide RNAs, thereby enabling bacteria to adapt to complex environments (Makarova et al., 2013). About 40% of bacterial genomes have been found to contain CRISPR loci (Lillestøl et al., 2006), and most of these loci are located on chromosomes and rarely on plasmids. The main reason is likely that bacteria would lose heritable immunity if CRISPR-containing plasmids were lost.
Figure caption: Identical spacers are shown as squares with the same color combination; gray squares containing an "X" represent no spacers. Strains are listed by CRISPR genotype, CRISPR array pattern, and strain name. The newly acquired spacers are on the left and the earliest acquired spacers on the right.
In the present study, we analyzed 68 strains of the L. casei group and found extensive diversity in their CRISPR/Cas systems, which belonged to subtypes I-C, I-E and II-A. CRISPR loci clearly differed among species, which suggests a novel method for bacterial genotyping.
FIGURE 5 | Comparison of CRISPR subtype I spacers in the L. casei group. Each unique spacer sequence is shown as a unique color combination. Gray squares containing an "X" represent no spacers. Spacers are displayed in order from the ancestral end (right) toward the most recently acquired spacers (left).
In general, direct repeat sequences may differ between CRISPR loci, but they are conserved within the same subtype. Because the direct repeats contain short palindromic sequences, the RNA transcribed from the CRISPR array can fold into double-stranded secondary structures that bind Cas proteins and guide them to target sites (Alexander et al., 2005; Tautvydas et al., 2013). The stem-loop structure of direct repeats may contribute to the interaction between RNA and Cas proteins.
As a consequence, the function of CRISPR loci may be affected by the stability of RNA secondary structures. Interestingly, in our study, secondary structures within the same subtype were conserved, with similar structural composition and free energy despite differences in their repeat sequences. This suggests that repeat sequences diversified during evolution while their function may have been conserved. In addition, according to MFE theory, stems with more matched base pairs and higher GC content tend to form stable secondary structures, and RNA secondary structures with lower minimum free energies are more stable.
Genotyping analysis is crucial for understanding bacterial evolution, although extensive data analysis and the high cost of sequencing have hindered its widespread use. Another commonly used genotyping method is multilocus sequence typing (MLST), which is based on the nucleotide sequences of seven housekeeping genes. Tomida et al. (2017) developed a genotyping method based on CRISPR spacers and compared it with MLST using 42 H. cinaedi strains: MLST showed little variability, whereas the CRISPR spacer sequences showed remarkable diversity. Morovic et al. (2016) showed that 42% of commercial dietary supplements contained taxonomically mislabeled microorganisms, and Lewis et al. (2016) showed that 15 of the 16 commercial probiotic products they studied had bacterial compositions that differed from their listed ingredients. Complementary genotyping and accurate identification methods are therefore urgently needed alongside traditional tools. CRISPR/Cas systems have been used to identify various pathogens, but few reports have applied CRISPR-based genotyping to probiotics. Riedel et al. analyzed CRISPR systems and genotyped strains by spacer sequence in Bifidobacterium (Riedel et al., 2015; Hidalgo-Cantabrana et al., 2017), drawing attention to the potential of CRISPR systems for probiotic genotyping. In this study, we used CRISPR spacers as a genotyping tool in the L. casei group to distinguish closely related strains. Different strains were clearly distinguishable: later-acquired spacers were diverse even among strains sharing the same ancestral spacer, and only a few strains shared exactly the same spacers. Like other methods, CRISPR/Cas genotyping has limitations, mainly because some strains lack CRISPR systems. Combining multiple genotyping methods may therefore be a future trend.
CRISPR spacers represent the immunity records of strains exposed to invasive DNA. In our study, fewer than half of the spacers matched phages or plasmids. Only in CECT9104 did matched spacers reach half of the array, while most other strains had only one matching spacer or none. The limited number of matches may reflect the many plasmids and phages that have not yet been sequenced, or the constant evolution of these elements, which allows them to escape targeting. By recording immune information, CRISPR-Cas systems give strains an evolutionary advantage and prevent repeated DNA invasion. L. casei group strains harboring CRISPR/Cas immune systems would therefore be suitable as industrial probiotics resistant to viral challenges, and could also have the potential to withstand the abundant phages in the gut.

CONCLUSION
In conclusion, our findings confirm that L. casei group strains harbor diverse CRISPR/Cas systems. The results of this bioinformatic analysis provide a data basis for broader CRISPR studies in the L. casei group. The polymorphism of CRISPR systems shows their potential both for strain genotyping and for immunity against invasive DNA. This analysis of CRISPR/Cas systems in the L. casei group provides new insights into the diverse roles of CRISPR/Cas systems in probiotics.