The Diversity of the CRISPR-Cas System and Prophages Present in the Genome Reveals the Co-evolution of Bifidobacterium pseudocatenulatum and Phages

Diverse CRISPR-Cas systems constitute an indispensable part of the bacterial adaptive immune system against viral infections. However, to escape from this immune system, bacteriophages have also evolved corresponding anti-defense measures. We investigated the diversity of CRISPR-Cas systems and the presence of prophages in the genomes of 66 Bifidobacterium pseudocatenulatum strains. Our findings revealed a high occurrence of complete CRISPR-Cas systems (62%, 41/66) in the B. pseudocatenulatum genomes. Subtypes I-C, I-U and II-A, were found to be widespread in this species. No significant association was found between the number of bacterial CRISPR spacers and its host’s age. This study on prophages within B. pseudocatenulatum genomes revealed that prophage genes related to distinct functional modules became degraded at different levels, indicating that these prophages were not likely to enter lytic cycle spontaneously. Further, the evolutionary analysis of prophages in this study revealed that they might be derived from different phage ancestors. Notably, self-targeting phenomenon within B. pseudocatenulatum and Anti-CRISPR (Acr) coding genes in prophages was observed. Overall, our results indicate that the competition between B. pseudocatenulatum and phages is a major driving factor for the genomic diversity of both partners.


INTRODUCTION
Bifidobacteria are one of the earliest colonizers of the human gut and the predominant microbial group in infants and healthy adults (Arboleya et al., 2016). The abundance of bifidobacteria in the gut is often considered as an indicator of the human health status and has been proven to be correlated with various intestinal and immunological disorders, such as inflammatory bowel disease, irritable bowel syndrome, obesity and diabetes (Turroni et al., 2014). As an important member of the Bifidobacterium genus, Bifidobacterium pseudocatenulatum is commonly found in the fecal samples of human across all ages (Turroni et al., 2012) and is especially abundant in breastfeeding infants (Gore et al., 2008). Compared with other Bifidobacterium species, B. pseudocatenulatum has been shown to be significantly associated with metabolic diseases in both animal experiments (Agusti et al., 2018) and clinical trials (Wu et al., 2017;Zhao et al., 2018). In addition, some B. pseudocatenulatum strains have been noted for their beneficial properties, such as the production of enterolignan, urolithin and conjugated linoleic acid (Yang et al., 2017;Gaya et al., 2018;Peiroten et al., 2019). Therefore, B. pseudocatenulatum is considered as the next-generation probiotic species for its potential beneficial effects.
One of the main challenges for probiotics is to overcome the harsh conditions in the gastrointestinal tract. The human gut is a natural reservoir of bacteriophages and it is expected that > 10 12 phage particles reside in the human gut (Shkoporov and Hill, 2019). Even though temperate phages are widespread (Kim and Bae, 2018), the presence of phage particles still provide a challenge for the survival of probiotics in the intestine. A major strategy for bacteria to resist bacteriophage infection is via an immune mechanism known as Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR), together with CRISPR-associated Cas enzymes (Rodolphe et al., 2007). As a heritable adaptive immune system in bacteria and archaea, the CRISPR-Cas system selects foreign nucleic acids and integrates it into the CRISPR array in the form of spacer sequences to provide a memory of infection. Upon reinfection, CRISPR-Cas comes into action, deploying RNAguided nucleases for silencing specific sequences of the foreign genetic materials. Cas proteins encoded by cas genes adjacent to the CRISPR array are necessary for the three phases of CRISPR-Cas immunity: adaptation, CRISPR RNA (crRNA) biogenesis and interference. During adaptation, foreign nucleic acids are captured, processed and then integrated into the CRISPR array. For retrieving the memory, this CRIPSR-spacer array is transcribed to generate a precursor crRNA (pre-crRNA) that is further processed to generate mature crRNAs. Upon subsequent infection, the interference machinery, is guided by mature crRNAs to identify the foreign invader and cleave its nucleic acid sequences, thus protecting the bacteria from infection (Hille et al., 2018).
Due to the antagonistic coevolution between bacteria and bacteriophages over billions of years, bacteriophages evolved an alternative form of infection, namely lysogeny. Under such circumstances, prophage refers to the temperate phage genome that is integrated into the host bacterial chromosome, replicating with its host without producing virion progeny (Howard-Varona et al., 2017). Comparative genomic analyses in early studies have shown that more than 50% of bacteria possess prophages (Canchaya et al., 2003) whilst recent study showed that the prevalence of prophages within murine gut microbiota is much higher (Kim and Bae, 2018). However, prophages can be activated under certain conditions, such as UV light (Guo et al., 2016) or chemicals (Goerke et al., 2006;Mavrich et al., 2018). A recent study also showed that fructose and short-chain fatty acids could promote prophage induction in Lactobacillus reuteri (Oh et al., 2019a), suggesting the effect of sugar metabolism on phage production in human gastrointestinal tract. Meanwhile, bacterial hosts can acquire some novel functions and become more competitive in the community. For example, a prophage in Escherichia coli increases its host's resistance to antibiotics and oxidative stress (Wang et al., 2010); prophages within Enterococcus faecalis V583 encode platelet-binding-like proteins which were found to function in adhesion to human platelets (Matos et al., 2013); prophages in L. reuteri can be induced and released to kill competitor strains so as to be advantageous in intestinal niche (Oh et al., 2019b).
In this study, to investigate the competition between B.
pseudocatenulatum and its bacteriophages, 66 B. pseudocatenulatum strains newly isolated from human and animal feces were subjected to de novo sequencing. The obtained genomic datasets were used to explore the diversity of CRISPR-Cas systems and the presence of prophages within the bacterial genomes. Our findings demonstrated the coevolution of both the host and the temperate phage, which may provide insights for classification of this species as a probiotics.

Isolation of B. pseudocatenulatum Strains
Fecal samples collected from human and animal in China were immersed in 60% glycerol and stored at −80 • C. One volume of this feces-glycerol complex was then mixed with nine volumes of phosphate-buffered saline (pH = 6.5) and diluted serially. The fecal dilutions were plated on De Man, Rogosa and Sharpe (MRS) agar supplemented with 0.05% (wt/col) L-cysteine hydrochloride and 50 µg/ml mupirocin and incubated in an anaerobic chamber at 37 • C for 48 h. Colonies with different morphology were selected for second plating and then sub-cultivated in MRS broth. The bacterial 16S rDNA fragment was amplified using 16S rDNA common primers to identify the species. Finally, 66 strains of B. pseudocatenulatum were detected and selected for further analyses (Supplementary Table S1).

Genome Sequencing and Assemblies
The draft genomes of the 66 strains were sequenced at Majorbio Bio-pharm Technology Co. Ltd. (Shanghai, China) using Illumina Hiseq × 10 platform to achieve a sequencing coverage of at least 100-fold. Raw data were assembled with the SOAPdenovo2 software using K-mer size trials to obtain the best result. Partial assembly and optimization were performed according to the paired-end and overlap relationships to form scaffolds. The length of the average scaffold N50 is 321,717 bp, accounting for 14% total genome length. The genome sequences were submitted to NCBI database and the accession numbers can be seen in Supplementary Table S1.

CRISPR-Cas Detection and Identification
The genomes of the 66 B. pseudocatenulatum strains were then subjected to CRISPR detection using the Mince software (Bland et al., 2007) and cas gene detection using the CRISPR-Cas++ website 1 , which was also used for subtype prediction (Couvin et al., 2018). Subsequently, manual curation was performed to check whether the strains had the important cas genes that play a key role in the bacterial immune function, e.g., the spacer integrase genes involved in the adaptation period; the Cas1, Cas2 (Nunez et al., 2014), and Cas3 protein genes involved in the cleavage of targeted sequence; and the Cas9 protein gene for the type II CRISPR system (Hille et al., 2018).

Prophage Identification
The prophages within the B. pseudocatenulatum genomes were identified as follows: (1) The prophages were first screened by the Prophage Hunter server (Song et al., 2019). (2) The prophages in the active status with scores greater than 0.8 were selected for the next step. (3) The selected prophages were extracted, and their open reading frames (ORFs) were predicted using the Glimmer and GeneMarkS software and annotated using BLASTP v2.2.28 analysis (e-value cut-off of 1e −5 ) against several reference databases [NR, Swiss-Prot, String and Cluster of Orthologous Group (COG) databases]. (4) Sequence alignment was performed between the prophages and spacer sequences in the CRISPR array. The inclusion criteria for the prophages for further analyses were as follows: prophages should have in silico predicted scores (using the Prophage Hunter server) greater than 0.8; genome sequence length longer than 10 kb; at least 20 ORFs (Lugli et al., 2016) and one phage-associated important gene; and complete alignment with at least one spacer sequence.

High Occurrence of CRISPR-Cas Systems in B. pseudocatenulatum
The occurrence of CRISPR-Cas systems in B. pseudocatenulatum was found to be approximately 62% (41/66, Table 1). All of the 41 strains with complete CRISPR-Cas systems encoded only one cas1 gene. Overall, 28 type I-C systems, 8 type I-U systems, 1 type I-E system and 4 type II-A systems were identified. Cas1 protein is the most conserved Cas protein and can be found in almost all CRISPR-Cas systems. The reliability of subtype classification was further confirmed by phylogenetic tree construction based on the amino acid sequences of Cas1 protein ( Figure 1A). The nuclease Cas3 is the hallmark protein of type I systems (Hille et al., 2018). 1 https://crisprcas.i2bc.paris-saclay.fr/ The subtypes within type I systems could be well distinguished in the phylogenetic tree constructed based on Cas3 amino acid sequences ( Figure 1B).
Type I CRISPR-Cas systems were relatively common in B. pseudocatenulatum. Among the subtypes of type I systems, subtype I-C showed the highest prevalence, with a coverage of 42.4%, which is much higher than its prevalence in Bifidobacterium longum and across the Bifidobacterium genus (13.6 and 23.0%, respectively) (Figure 2). Subtype I-U was found to be the second most prevalent CRIPSR-Cas system in B. pseudocatenulatum, although its prevalence has been reported to be only at 1.5% in B. longum (Hidalgo-Cantabrana et al., 2017). Type II systems were less common in B. pseudocatenulatum. Subtype II-C was absent in this species, whereas it is frequently found in B. longum. In contrast, subtype II-A was the only subtype present in B. pseudocatenulatum, which is absent in B. longum strains (Figure 2).

CRISPR Loci Characterization in B. pseudocatenulatum
The same CRISPR-Cas subtypes showed similar cas gene arrangement, size and direction across the B. pseudocatenulatum strains, but the length of the CRISPR arrays varied. The representative strains for each subtype were selected to map their CRISPR-Cas systems ( Figure 3A). Compared with other reported Bifidobacterium CRISPR-Cas locus architectures (Briner et al., 2015;Hidalgo-Cantabrana et al., 2017), the cas gene composition and arrangement of subtypes I-C and II-A were found to be more stable in B. pseudocatenulatum. Subtype I-U in this species showed csb1 and csb2 genes adjacent to each other, whereas that in B. longum subsp. longum 17-1B has been shown to possess csb2 and csb3 genes separated from each other (Hidalgo-Cantabrana et al., 2017). In contrast to subtype I-E in B. longum, that in B. pseudocatenulatum in our study showed an additional element, i.e., the cse1 gene.
Notably, some incomplete CRISPR-Cas systems have been found. They contained the subtype-specific signature Cas proteins but lacked the Cas proteins essential for immune function ( Figure 3B). Strains with incomplete CRISPR-Cas systems of subtype I-C and I-U generally lack the effector complex essential for crRNA maturation (Makarova et al., 2013;Hille et al., 2018). B. pseudocatenulatum FGSYC91M2 strain is in lack of Cas2 protein which cooperate with Cas1 protein to capture foreign genetic material, leading to its disability to update enemy blacklist (Wang et al., 2015). B. pseudocatenulatum FXJKS15M4 strain possessed neither Cas2 protein nor Cas3 protein, indicating that it cannot recognize infective virus and protect itself. Although these CRISPR-Cas systems were defective, they could be perfectly classified in the phylogenetic trees based on Cas1 and Cas3 amino acid sequences.
The size of CRISPR-Cas systems of each subtype is related to its cas gene composition and the number of repeats. Type I systems are generally larger than type II systems because type I systems use multiple Cas protein complexes for interference, whereas type II systems use only Cas9 protein. The sequence length of type I systems, including subtypes I-C, I-U, and I-E, Frontiers in Microbiology | www.frontiersin.org ranges from 9 to 20 kb. Type II-A systems contain fewer cas genes and repeats than type I systems, so their sequence length is relatively short (6-9 kb). The location and size of each CRISPR-Cas system are provided in Supplementary Table S2.
The number of repeats in each subtype was found to be different between B. pseudocatenulatum strains (Supplementary Figure S1). Subtype I-C presented high variability in the number of repeats, from 2 repeats in the FGSYC76M7 strain to 116 repeats in the FQHXN83M4 strain. The distribution of repeats in subtype I-U was relatively stable at approximately 40 repeats across all strains. Subtype II-A contained the lowest number of repeats, ranging from 13 to 47. The unique I-E subtype, which was present in only the FZJHZ1M1 strain, showed the highest number of repeats at 168.
Pervasive spacer deletion coupled with spacer acquisition have been observed in natural as well as laboratorial conditions Lopez-Sanchez et al., 2012). Such changes in CRIPSR arrays may be explained by environmental selection pressure that drives bacteria to delete the less valuable spacers whilst acquire the more valuable spacers (Horvath et al., 2008). A previous study of B. longum found that strains from infant feces possessed lower number of spacers whilst strains from adult feces contained a high number of spacers (Hidalgo-Cantabrana et al., 2017). From this perspective, we speculated that the number of spacers in complete CRISPR-Cas systems (equivalent to the number of repeats) is related to the duration of existence of the strain in the human intestine, i.e., the longer a strain persisted in the human intestine, the more repeats its CRISPR loci contained because of more saved foreign gene fragments. Therefore, we performed a scatter plot to explore the possible correlation between the number of repeats in each B. pseudocatenulatum strain and its host's age. Unexpectedly, we found no correlation between the two variables (Data not shown).
The repeat sequences within each CRISPR-Cas subtype in B. pseudocatenulatum were found to be conserved. The length of the repeat sequence was 33 nucleotides in subtype I-C, 29 in subtype I-E and 36 in subtypes I-U and II-A. Notably, the repeat sequences within most subtypes were identical in nucleotide arrangement, except those within subtype I-C, which displayed five nucleotide polymorphisms ( Table 1).

Association Between CRISPR-Cas Systems and Prophages
Spacers in the CRISPR-Cas loci originate from foreign invaders and bacteriophages are the most common threats for bacteria. If a bacterial strain has ever been invaded by a phage, the spacer sequences of the strain may contain a fragment corresponding to the phage genome. Based on this knowledge, we attempted to identify the prophages present in B. pseudocatenulatum strains to determine the interaction between this Bifidobacterium species and its prophages. In total, 3652 spacer sequences were extracted from the genomes of 41 strains with complete CRISPR-Cas systems. The sequence length ranged from 29 to 42 bp, with 2383 unique base sequences. To further investigate the origins of these foreign DNA fragments, we performed a BLAST search of the extracted spacer sequences against the NCBI virus Refseq database (updated on 2020.3.3). Only Bifidobacterium phage PMBT6 was targeted by B. pseudocatenulatum spacers, with a total of 10 matches from 5 unique spacers, belonging to 10 different strains, indicating the lack of studies on bifidophages.
Through a series of screening methods, we investigated 59 prophages in 35 B. pseudocatenulatum strains ( Table 2). To analyze the association between CRISPR-Cas systems and lysogeny, the strains containing either prophages or CRISPR-Cas systems were evaluated. The results showed that the presence of CRISPR arrays or prophages was not associated with the number of prophage fragments or CRISPR spacers, respectively ( Figure 4A). Approximately 15.6% of the bacterial spacer sequences (371/2383, Figure 4B) were completely mapped to the prophages present in previously reported bifidobacterial strains (Ventura et al., 2010;Hidalgo-Cantabrana et al., 2017) or identified in this study. The more spacers a strain containing a CRISPR-Cas system harbored, the greater the number of prophages it would match (Figure 4C), suggesting that such strains possessed a strong immunity. Importantly, in the strains containing both CRISPR-Cas systems and prophages, the number of spacers was found to be irrelevant to whether there was a spacer targeting its own prophage.
In order to explore the homology between spacers in B. pseudocatenulatum and prophages identified in other Bifidobacterium strains, a BLAST search of spacers against the prophages in 76 genomes reported before were performed (Lugli et al., 2016;Hidalgo-Cantabrana et al., 2017). Spacers presenting in the 41 complete CRISPR-Cas systems within B. pseudocatenulatum showed homology to prophages in other 12 bifidobacterial strains (Figure 5A), indicating B. pseudocatenulatum strains acquired immunity against temperate phages which could infect other Bifidobacterium species, including Bifidobacterium boum, Bifidobacterium adolescentis, Bifidobacterium ruminantium, Bifidobacterium breve, Bifidobacterium bifidum, Bifidobacterium merycicum, and B. longum.
Among the 41 B. pseudocatenulatum strains harboring complete CRISPR-Cas systems, 27 strains presented spacers targeting prophages in other Bifidobacterium spp. genomes ( Figure 5A). The B. pseudocatenulatum strains FGSZY20M1 and FSDWF3M4 contained a higher number of spacers matching the prophages in the genomes of other Bifidobacterium spp., such as B. boum, B. adolescentis, which were most frequently targeted by CRISPR spacers of B. pseudocatenulatum ( Figure 5A). Besides, B. pseudocatenulatum strains FHuNMY37M1, FZJHZ1M1, and FGSYC18M1 presented the highest number of spacers targeting prophages detected in this study while FFJND17M1 presented the least (Figure 5B).
Among the prophages detected in B. pseudocatenulatum species, Bpseuc_3 and Bpseuc_6 prophages were detected in the genomes of A13 and FAHBZ9L5, corresponding to spacer sequences in 38 and 34 strains, respectively ( Figure 5B). In addition, prophages of Bpseuc_26, Bpseuc_35 and Bpseuc_48 within FJLHD2M3, FQHXN6M4, and FYNLJ23M6, presented the most uncommon prophage within B. pseudocatunulatum because the spacers matching these prophages can only be found in few strains ( Figure 5B). Notably, only B. pseudocatenulatum FXJWS24M3 strain displayed a spacer sequence targeting its own prophage Bpseuc_44 (Figure 5B), indicating a potential to prevent prophage induction and lysis as mentioned in previous studies (Edgar and Qimron, 2010;Goldberg et al., 2014).

Prophages Within B. pseudocatenulatum Generally Lack the Genes Essential for the Phage Life Cycle
To determine the contribution of prophages to the B. pseudocatenulatum genomes, we annotated all genes of both prophages and the bacterial hosts' genomes using NR and COG databases. In total, 22,944 genes were detected in 35 B. pseudocatenulatum strains, 508 of which were derived from prophages, accounting for 2.2% of the total bacterial genes ( Figure 6A). The proteins that constitute each COG are assumed to be derived from an ancestral protein with similar or identical functions. The COG database 2 is a popular tool for functional annotation because of its reliable assignment of orthologs and paralogs and careful manual curation (Tatusov et al., 1997;Galperin et al., 2015). We analyzed the gene clusters between B. pseudocatenulatum strain genomes and prophages to clarify the effects of prophages on the gene compositions of those strains. In total, 1598 gene clusters were found in 35 strains (BifCOGs), 107 of which originated from prophages (ProGOGs), thus accounting for 6.7% of the total gene clusters across the studied B. pseudocatenulatum strains (Figure 6B).
In total, 390 genes were annotated with clear COG classification in the 59 prophages (Figure 6C), 40.8% (159) of which found to be involved in DNA replication, recombination and repair and 7.7% (30 genes) in cell cycle control, cell division and chromosome segmentation, making the third largest COG. However, the functions of a large number of genes contributed by the prophages could not be identified, indicating a vast scope for further investigation.
Database matches allowed a tentative subdivision of the 59 prophages in B. pseudocatenulatum genomes into functional modules for a better understanding of their dynamics within their hosts. Based on a previous study (Botstein, 1980), we divided their function into five modules, namely lysogeny, DNA replication, DNA packaging, head and tail morphogenesis, and host lysis (Figure 7A).
Among the identified functional modules, DNA packaging module was well-preserved, with the occurrence of terminase, portal protein and capsid protein genes on 49, 54, and 31% of the prophages, respectively. In contrast, DNA replication was the most variable module, containing fewer preserved genes between the analyzed prophages, most of which lack the protein-encoding genes belong to this module except for the gene encoding the single-stranded DNA-binding protein. Further analysis of individual genes revealed that the most preserved gene was the integrase-encoding gene belonging to the lysogeny module, which was present on 58% of the prophages. The next most preserved genes were the single-stranded DNA-binding proteinencoding gene (54%) belonging to the DNA replication module and the portal protein-encoding gene (54%) belonging to the DNA assembly module ( Figure 7B).

Phylogenetic Analysis of Prophages Within B. pseudocatenulatum
To understand whether the identified prophages were derived from the same origin and homology, we constructed a phylogenetic tree based on its predicted whole genome sequences (Figure 8). In addition to the 59 identified prophages, two bifidophages obtained from the NCBI database, namely Bbif-1 (GCA_002633625.1) and PMBT6 (GCA_006529735.1), were subject to this evolutionary analysis. The homology analysis of these prophages divided them into six groups. The average guanine and cytosine (GC) content of Group1 was 55.3%, of Group2 and Group3 was approximately 59%, and of Group4 and Group6 was variable at around 60%. Group 5 was the most stable group with an average GC content of 57.9%. Comparison of the GC content of the prophages between the groups revealed a significant difference with varying degrees between Group1 and Group3, Group4 and Group6, indicating that the prophages in Group1 were distinctly different from those in the other groups. The average GC content was also significantly different between Group4 and Group5.
A systematic dot plot analysis (Supplementary Figure S2) of the sequences of these 61 Bifidobacterium prophages was performed to verify the accuracy of the prophage grouping and highlighted the possible collinearity between these groups. Group1 prophages were the most conserved and shared few homologous genes with those in other groups. Group2 prophages represented an independent prophage population in B. pseudocatenulatum as it showed no gene homology within or between the groups. Group3 prophages were relatively conserved and showed partial genome matches with Group5 prophages. Many similar genes were identified between Group4 as well as Group6 prophages. Notably, Group5 represented the most complex group of prophages within B. pseudocatenulatum as they showed gene fragment similarities with prophages in each group but the degree of similarity within Group5 was not high. This situation can be explained by pervasive genetic   degradation of prophages, characterized by orthologous gene loss after the temperate phage integration into bacterial chromosome (Bobay et al., 2014).
Further, among the genes most shared between these prophages, single-stranded DNA-binding protein encoding genes showed homology between most of the prophage sequences. A homologous gene encoding an unknown protein was also widely distributed in these prophages, suggesting that this protein plays an important function for the prophages. In addition, the genes encoding tape measure protein and integrase were also found to contribute significantly to the complex collinearity between the prophage sequences in B. pseudocatenulatum.

DISCUSSION
B. pseudocatenulatum is ubiquitous in the human gut across all ages (Turroni et al., 2012) and has been proven to possess multiple probiotic properties (Yang et al., 2017;Agusti et al., 2018;Peiroten et al., 2019). In this study, we analyzed the CRISPR-Cas systems and prophages within 66 B. pseudocatenulatum strains isolated from human and animal feces to clarify the defense and counter-defense struggle between B. pseudocatenulatum and its temperate phages.
The genomes of B. pseudocatenulatum strains showed broad diversity in their CRISPR-Cas systems (Table 1, 62%), which is higher than that in B. longum (38%) (Hidalgo-Cantabrana et al., 2017) and the estimated occurrence of 46% within bacteria in general (Grissa et al., 2007), but slightly lower than that in most other Bifidobacteria species (77%) (Briner et al., 2015). Most strains harbored cas1 genes displayed complete CRISPR-Cas systems that can recognize exogenous DNA fragments and exert immunological effects. Four subtypes, namely I-C, I-U, I-E, and II-A, were detected in our B. pseudocatenulatum strains. Subtype I-E, which has been detected as the most prevalent subtype in the previous CRISPR-Cas study on bifidobacteria (Briner et al., 2015), was only found in B. pseudocatenulatum FZJHZ1M1 strain in this study ( Table 1), suggesting that this subtype is rare in B. pseudocatenulatum species. Type II systems are the least common systems in nature  and so are in Bifidobacterium species (Briner et al., 2015). Moreover, in a previous study on Bifidobacterium CRISPR-Cas systems, subtype II-A was found only in B. bifidum and B. merycicum. Our study showed that subtype II-A is present in B. pseudocatenulatum, indicating that this species has a potential for genome editing. Meanwhile, subtype II-C is likely to be absent in this Bifidobacterium species (Figure 2). The difference between II-A and II-C systems is that II-A contains an additional Csn2 protein that interacts with other Cas proteins during the integration of spacer sequences (Garneau et al., 2010); the alternative of the Csn2 protein present in II-C is still unknown. Evolutionary research of type II CRISPR-Cas systems suggested that subtype II-A has evolved from subtype II-C, indicating that the csn2 gene was possibly acquired by the II-A ancestor during evolution (Chylinski et al., 2014). However, the phylogenetic tree based on the core genes of Bifidobacterium in a previous study suggested that B. pseudocatenulatum appeared earlier than B. longum (Sun et al., 2015). Therefore, it is very likely that B. pseudocatenulatum obtained the csn2 gene through horizontal gene transfer.
Some incomplete type I systems, mostly I-E systems (Figure 3B), were found during the detection of CRISPR-Cas subtypes in B. pseudocatenulatum. Most CRISPR-Cas systems with the signature cas genes of subtype I-E were incomplete. It is reported that the presence of the anti-CRISPR (Acr) protein and Acr-associated protein (Aca) encoded by the prophages could inhibit the normal function of CRISPR-Cas systems (Marino et al., 2018). Notably, possible Acr protein has been found by mapping protein-coding sequences in prophages against the latest Acr protein database (Marino et al., 2018;FIGURE 7 | Preservation of genes within the prophages identified based on the genomic functional modules. (A) Prophage genes were subdivided in five functional modules supported by a heatmap of the identified genes for each prophage. The prophage names are indicated on the right-hand margin of the heatmap, and the gene names are displayed at the bottom. (B) Abundance of individual functions identified within the prophages. The first column shows the number of prophages that encode a particular function listed in the second column, whereas the third column shows the relative percentages. Yin et al., 2019), whilst one Acr protein and one Aca protein capable of causing an incomplete CRISPR-Cas system were confirmed in the prophage within A13 whose CRISPR-Cas system was defective (Supplementary Table S3). FQHXN112M3 and FZJHZD11M4 strains possessing incomplete CRISPR-Cas systems showed the self-targeting phenomenon, i.e., the presence of other DNA fragments identical to the spacer sequences in the same bacterial genome. Thus, self-targeting phenomenon could also be a reason for the presence of incomplete CRISPR-Cas systems in B. pseudocatenulatum strains.
Spacers in CRISPR loci preserve the immunity record of invasive genomic fragments. In this study, B. pseudocatenulatum displayed CRISPR spacers targeting prophages not only within its own species but also in other Bifidobacterium species (Figure 5), which is in accordance with the results of previous report on B. longum (Hidalgo-Cantabrana et al., 2017). The spacers in B. pseudocatenulatum matching prophages in other bifidobacterial strains may suggest that those species share the same ecological niche in the human gut. However, the presence of diverse spacers in B. pseudocatenulatum supports the prevalence of phages in human gut (Shkoporov and Hill, 2019), especially for the temperate phages (Kim and Bae, 2018). In this respect, CRISPR-Cas systems provide this species with an evolutionary advantage, acting as a strong defense mechanism to avoid prophage predation or other foreign DNA fragments invasion.
Over 50% B. pseudocatenulatum strains ( Table 2) have prophages. The prevalence of lysogeny is in accordance with that in human gut microbiota (Kim and Bae, 2018) as well as that in aquatic bacteria (Castillo et al., 2018). The contribution of prophages to the bacterial genomes identified in our study was slightly lower than that in a previous bifidophage study (Lugli et al., 2016), probably because the previous study evaluated incomplete prophage fragments, whereas our study focused on complete prophages. Prophages within B. pseudocatenulatum are defective to a large extent. The well preserved integrase-encoding gene was found on only 58% of all B. pseudocatenulatum prophages, whereas it has been reported to be present in up to 90% of Bifidobacterium prophages (Lugli et al., 2016). The expression of the genes in the host lysis module is essential for the entry of the prophage into the lytic cycle (Hyman and Abedon, 2010), whereas the prophages in our study generally lacked genes encoding lysis-related proteins, indicating that their lytic cycle is unlikely to be induced. In addition, the retention of other important viral functional genes of prophages in B. pseudocatenulatum genomes is not as complete as that in other Bifidobacterium species. Notably, in the lysogeny module of 11 prophages, we observed the presence of genes encoding putative toxin-antitoxin family proteins, which may be crucial for the stable retention of the prophages in the host cells (Guo et al., 2014;Yao et al., 2018) and for the protection of the hosts against further phage infection (Samson et al., 2013). However, prophage degeneration is a common phenomenon under purifying selection (Canchaya et al., 2003;Asadulghani et al., 2009;Bobay et al., 2014). A study on E. coli and Salmonella enterica (Bobay et al., 2014) reported that gut bacteria generally have a domestication effect on prophages, characterized by rapid prophage inactivation followed by much slower degradation.
The prophages found in B. pseudocatenulatum showed abundant diversity and were divided into six groups by whole genome alignment, phylogenetic tree construction (Figure 8) and collinearity analysis (Supplementary Figure S2), each group representing the possible phage source for this species. Notably, DNA fragments of bacteriophage PMBT6 isolated from other Bifidobacterium species were found to be completely consistent with unique five spacer sequences present in 10 strains in our study, whereas no spacer sequence was perfectly matched to the DNA fragments of phage Bbif-1 isolated from B. bifidum. This phenomenon indicates that the same bifidophage may invade several host Bifidobacterium species/strains. Although phage selection is generally considered to be narrow for the host, increasing evidence suggests that phages have a broad host range in nature (Dekel-Bird et al., 2015;Kauffman et al., 2018). The bacteria that share the same ecological niche (Bono et al., 2013) or have the same outer membrane phage receptor binding proteins (Takeuchi et al., 2016;Dowah and Clokie, 2018) are likely to have the same phage predator. A recent study revealed the effect of phage receptor expression on bacterial susceptibility to phage infection (Castillo et al., 2019).
As a powerful genome editing tool, CRISPR has been receiving much attention. In addition to genome editing, CRISPR-Cas systems have also been proven useful in probiotic research applications. The conservation of CRISPR spacer sequences has enabled the traceability and evolutionary analysis of probiotics (Barrangou and Dudley, 2016), which has been used in Lactobacillus buchneri strains genotyping (Briner and Barrangou, 2014) and new species-level taxa identification (Zhou et al., 2020). Meanwhile, the role of prophage in antibiotic resistance genes (ARGs) transfer has been revealed (Haaber et al., 2016) and prophages carrying ARGs were also found in some Bifidobacterium strains (Mancino et al., 2019). This study is the first systemic analysis of CRISPR-Cas systems and prophages in B. pseudocatenulatum, which may provide insights for classification of this species as a probiotics.
It was also found that the prophages contributed to the genomic diversity in B. pseudocatenulatum, accounting for 2.2% in pan-genome and 6.7% in COG (Figure 6). Besides, primed spacer acquisition (Fineran et al., 2014) was also found within this species with several different spacer sequences in a certain CRISPR locus corresponding to the same temperate phage genome (Figure 5B), providing selective pressure for phage evolution and genomic diversity. Future studies are warranted to better understand the interaction between B. pseudocatenulatum prophage and its host by exploring the existence of some genetic elements in prophage driving bacterial evolution, or functional gene clusters helping host adaptation to harsh environment, as shown in the previous study on marine bacteria (Castillo et al., 2018). Besides, the isolation of difficult-to-culture phages from culturable bacteria by prophage induction could be used to improve our understanding of the bacteria-phage network (Mavrich et al., 2018).
However, this study still had some limitations. Protospacer adjacent motif (PAM) is a short, conserved sequence and essential for CRISPR target recognition (Hille et al., 2018). The identification of PAM is dependent on the analysis of protospacers, and the spacers sequence in the targeted DNA together with the upstream (5 -end) and downstream (3end) region (Gleditzsch et al., 2019). This study failed to determine PAM sequences of different CRISPR-Cas subtypes within B. pseudocatenulatum due to the limited number of sequenced Bifidobacterum phages genomes. In addition, the temperate phage integration site was not analyzed owing to the gaps presenting in the draft genomes. The main strength of this study is that the identification criterion for prophages was the presence of a complete match between the spacer sequences in CRISPR loci and the prophage, so the possibility of mismatch was extremely low. Furthermore, prophage grouping was subjected to evolutionary and collinear analyses to ensure high reliability. A careful analysis of prophages will help us select strains for prophage induction in the future study.

CONCLUSION
This study highlights the coevolution of B. pseudocatenulatum and bacteriophages, providing insights into the interaction between them. In this study, B. pseudocatenulatum showed the presence of a wide variety of CRISPR-Cas systems to protect itself against the invasion of foreign DNA fragments. The majority of phage DNA fragments (prophages) already inserted into the bacterial host genome were defective in genes associated with the disruption of host cells, which could be explained by purifying selection of temperate phage after its integration into bacterial chromosome (Bobay et al., 2014). Prophages within B. pseudocatenulatum tend to be inactive and unlike to enter lytic cycle spontaneously and release virions thereof. Notably, Acr protein and Aca protein encoding genes were found in the prophage from A13 strain presenting incomplete I-U system, which may represent a counter-defense strategy of temperate Bifidobacterium phage against CRISPR-Cas system. To further explore the defense-counter-defense strategy between B. pseudocatenulatum and its phages, future studies should perform prophage induction and obtain their genomic data.

DATA AVAILABILITY STATEMENT
The datasets generated for this study can be found in the BioProject, under accession number PRJNA577207.

AUTHOR CONTRIBUTIONS
GW, ZL, and WC conceptualized, reviewed and edited the manuscript. GW and QL contributed to data curation, investigation and writing the original draft. GW, QL, ZP, LW, and PT helped with the formal analysis. ZL and WC were responsible for the funding acquisition and project administration. GW, ZL, and HZ worked on the methodology. ZL, JZ, HZ, and WC contributed to resources. ZP, LW, and PT worked on the software. WC supervised. ZL and JZ validated the study.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmicb. 2020.01088/full#supplementary-material FIGURE S1 | Box and Whisker representation of the number of CRISPR repeats detected in the CRISPR loci of each CRISPR subtype.