Genetic characteristics and integration specificity of Salmonella enterica temperate phages

Introduction Temperate phages can engage in the horizontal transfer of functional genes to their bacterial hosts. Thus, their genetic material becomes an intimate part of bacterial genomes and plays essential roles in bacterial mutation and evolution. Specifically, temperate phages can naturally transmit genes by integrating their genomes into the bacterial host genomes via integrases. Our previous study showed that Salmonella enterica contains the largest number of temperate phages among all publicly available bacterial species. S. enterica is an important pathogen that can cause serious systemic infections and even fatalities. Methods Initially, we extracted all S. enterica temperate phages from the extensively developed temperate phage database established in our previous study. Subsequently, we conducted an in-depth analysis of the genetic characteristics and integration specificity exhibited by these S. enterica temperate phages. Results Here we identified 8,777 S. enterica temperate phages, all of which have integrases in their genomes. We found 491 non-redundant S. enterica temperate phage integrases (integrase entries). S. enterica temperate phage integrases were classified into three types: intA, intS, and phiRv2. Correlation analysis showed that the sequence lengths of S. enterica integrase and core regions of attB and attP were strongly correlated. Further phylogenetic analysis and taxonomic classification indicated that both the S. enterica temperate phage genomes and the integrase gene sequences were of high diversities. Discussion Our work provides insight into the essential integration specificity and genetic diversity of S. enterica temperate phages. This study paves the way for a better understanding of the interactions between phages and S. enterica. By analyzing a large number of S. enterica temperate phages and their integrases, we provide valuable insights into the genetic diversity and prevalence of these elements. This knowledge has important implications for developing targeted therapeutic interventions, such as phage therapy, to combat S. enterica infections. By harnessing the lytic capabilities of temperate phages, they can be engineered or utilized in phage cocktails to specifically target and eradicate S. enterica strains, offering an alternative or complementary approach to traditional antibiotic treatments. Our study has implications for public health and holds potential significance in combating clinical infections caused by S. enterica.


Introduction
Temperate phages are important mobile genetic elements with significant impacts on essential bacterial cellular processes (Golding et al., 2021).During the lysogenic cycle of the temperate phage, the phage can integrate its own genes into the bacterial host genome via integrase, thus becoming an intrinsic part of the bacterium (Chen et al., 2020).Such an integration process can naturally transmit the important and essential genes of the temperate phage to its host, potentially increasing bacterial virulence and promoting antibiotic resistance (Harrison and Brockhurst, 2017;Monteiro et al., 2019).
Both integrases and their attachment sites in the respective phage and bacterial genomes are necessary components for temperate phage integration.During the integration process, the integrases encoded by the temperate phages first identify the core sequence regions of the phage attachment sites (attP) and the bacterial attachment sites (attB) and then mediate unidirectional site-specific recombination between them (Groth and Calos, 2004).Characterization of phage integrases and their attachment site (att) core regions is a crucial step to understand the phage integration process.
In a previous study, we observed that the Salmonella enterica genome contains the largest number of temperate phages compared to other sequenced bacterial species (Zhang et al., 2022).Salmonella enterica is a zoonotic pathogen of substantial concern to global human and animal health (Knodler and Elfenbein, 2019).It is a leading cause of morbidity and mortality in human beings worldwide.Identification of the genetic characteristics of integration specificity of S. enterica temperate phages will provide a foundation to both better understand the mechanisms of phage-bacteria interactions and to potentially modify phages to address the public health issues caused by S. enterica infections.
With the goal of systematically analyzing S. enterica temperate phages and gaining insight into their potential roles to S. enterica pathogenesis and evolution, we conducted an extensive examination of our previously published temperate phage genome dataset.From this dataset, we extracted 8,777 S. enterica temperate phages, each containing an integrase gene within their genomes.We further explored the genome characteristics, integration specificity, and genetic diversity of the temperate phages and their integrases.
Our analyses indicated that S. enterica temperate phages were primarily sourced from the U.S. and U.K. though also widely distributed across continental Europe.In total, the S. enterica temperate phages contained 491 different integrases within their genomes, which could be further categorized into three integrase types: intA, intS, and phiRv2.Strong correlations existed between the sequence lengths of integrases and the core regions of attB and attP.Phylogenetic analysis indicated that both the S. enterica temperate phage genomes and the integrase gene sequences were highly diverse.Most temperate phages showed high sequence similarities with phages within the bacterial genomes in the genera Enterobacteriaceae, Pseudomonadaceae, and Vibrionaceae.
Our work shed lights on the mosaicity and diversification of S. enterica temperate phages and their integrases at the genomic level.The identification of the sequence factors important for integration of S. enterica temperate phages revealed by our study will facilitate further research on the interactions between temperate phages and their hosts and the roles that temperate phages play in the evolution of bacteria.However, it is important to acknowledge some limitations of our work.First, while we extracted a large number of phages and integrases, the functional characterization of these elements was beyond the scope of this study.Further investigations are needed to understand the specific mechanisms and implications of these integrases in S. enterica pathogenesis and evolution.Second, our study focused primarily on the genomic level, and future studies should consider integrating other omics approaches to gain a more comprehensive understanding of the complex dynamics between temperate phages and their hosts.

Identification of integrase sequences
This study identified a total of 8,777 temperate phages specifically associated with S. enterica from the comprehensive public and compromised temperate phage database generated in our previous study (Zhang et al., 2022).These temperate phages were detected from the bacterial next-generation sequencing data.In the previous study, we first used FastQC (Babraham Bioinformatics, 2010) and Trimmomatic (Bolger et al., 2014) for quality control and NGS reads filtering, followed by the steps of prophage detection and temperate phage detection (Zhang et al., 2022).
In this study, we identified the integrase sequences within the S. enterica temperate phage genome sequences by searching a customized integrase amino acid sequence database.To build such a customized database, we downloaded the amino acid sequences of all integrases listed on NCBI. 1 The database contains 14,045 amino acid sequences of integrase, and its size is 6.3 megabytes (MB).We then added the customized integrase amino acid sequence database to Prokka v1.13 (Seemann, 2014) as a built-in database.We then annotated all S. enterica temperate phages using Prokka and extracted the annotated integrase sequences.

Extraction of the host information
We downloaded the host information of the S. enterica temperate phages from NCBI (See footnote 1) using their SRA numbers.Then, we wrote an in-house python script to extract the host information,

Generation of temperate phage and integrase entries
We used CD-HIT v4.81 (Fu et al., 2012) to group the sequence of each temperate phage and its reverse complement into a single temperate phage entry.Each temperate phage entry contains the temperate phage genome sequences with 100% identity and 100% coverage.The same was done with the sequence of each integrase and its reverse complement.Strict CD-HIT parameters were used.In particular, the parameter c was set to 1, representing 100% similarity between any two sequences; the parameters AL and AS were both set to 0, indicating that the two sequences matched exactly without any non-aligned parts.

Phylogenetic analysis of temperate phage and integrase entries
We first used MAFFT v7.505 (Katoh, 2002) to generate multiple sequence alignments of temperate phage and integrase entries, respectively.In order to focus on obtaining sequence clusters rather than evolution rates, we employed the neighbor-joining (NJ) method for constructing the phylogenetic tree due to its efficiency.Specifically, we used TreeBest (Vilella et al., 2009) software to build the neighborjoining trees based on the temperate phage nucleotide sequence alignments and the integrase amino acid sequence alignments.Default parameters were used, except for setting the bootstrap value to 1,000.The phylogenetic trees were displayed using the online tool ChiPlot.2

Taxonomic classification of Salmonella enterica temperate phages
The putative open reading frames (ORFs) of all 3,857 S. enterica temperate phages were identified and translated using Prodigal v1.13 (Hyatt et al., 2010).Then, the putative temperate phage ORFs were input into vContact2 to perform taxonomic clustering using the ProkaryoticViralRefSeq94-Merged database with default parameters (Bin Jang et al., 2019).

Correlation analysis of Salmonella enterica temperate phages, integrases, and core regions of attachment sites attB and attP
We used Spearman correlation coefficients (rs) to assess the relationships of both sequence lengths and GC content among temperate phages, integrases, and core attachment site regions.

Analysis of Salmonella enterica serotypes infected by the Salmonella enterica temperate phages
To be precise, we consider temperate phages that have been reported to integrate into their host genomes.This indicates that a specific S. enterica serotype has been found to be infected by these temperate phages.Integration refers to the inclusion of temperate phage genome sequences within the host genomes, illustrating that they share sequence similarities.Particularly, we used BLASTn to align all the 3,857 S. enterica temperate phage entries (non-redundant phage genome sequences) against nucleotide database of NCBI.We then extracted all the temperate phage genome sequences aligned to the S. enterica serotypes genome sequences (with sequence identify > 80% and sequence coverage > 80%).

Genomic characteristics of Salmonella enterica temperate phages and their integrases
To begin our study, we obtained an extensive collection of S. enterica temperate phages from a publicly accessible and comprehensive temperate phage database that we had previously developed (Zhang et al., 2022).Within this database, we meticulously identified attachment site core sequences at both ends, facilitating the precise delimitation of prophage sequences.Leveraging the wealth of data provided by this database, we proceeded to conduct an exhaustive analysis encompassing a total of 8,777 S. enterica temperate phages, including the examination of their integrases.The S. enterica hosts were widely distributed across six continents, though the majority of samples were acquired from the U.S. and the U.K. (Figure 1A; Supplementary Table 2).The S. enterica studied here were primarily sampled from human hosts (Figure 1B; Supplementary Table 2).
Considering that sequence length and GC content are fundamental characteristics of genomes, our initial analysis focused on examining the distribution of lengths and GC contents among temperate phages of S. enterica.To be specific, the sequence lengths of the S. enterica temperate phages and integrases covered a relatively narrow range (Figure 2A; Supplementary Table 1), with most temperate phages between 35,000 and 40,000 bp in length and most integrases between 1,170 and 1,275 bp.Similarly, the GC content of these elements covered a narrow range, from 42% to 48% GC content in temperate phages and 48% to 49% in the integrases (Figure 2B; Supplementary Table 1).
Even though the number of extremely large (>70,000 bp) temperate phages is very small, their presence is noteworthy.These phages make up only 0.09% (8/8,777) of the temperate phage genome sequences identified in our previous study, where these temperate phages were extracted from bacterial high-throughput sequencing data based on their spontaneous induction (Zhang et al., 2022).When temperate phages are induced to a particular concentration from their host strains and their circular phage genome sequences are detected, we can confidently recognize these sequences as complete temperate phage genome sequences.In other studies, the reported genome sizes for phages induced from STEC (Shiga toxin-producing Escherichia coli) belonging to serogroups O26, O29, and O157 ranged from 47 to 70 kb (Willshaw et al., 1987;Rietra et al., 1989;Schmidt, 2001).Additionally, prophage genomes in Baltic Sea bacterioplankton isolates were found to vary in size from 8 to 87 kb, with 57% falling within the 35-45 kb range (Leitet et al., 2006).These findings underscore the diversity and range of temperate phage genome sizes, with some temperate phages exhibiting notably larger genomes.
It has been observed that temperate phages found in various hosts may possess identical genome sequences.Consequently, we consider these temperate phage genome sequences, exhibiting 100% identity and 100% coverage, as a single temperate phage entry (non-redundant phage genome sequence).In total, we identified unique 3,857 temperate phage entries (non-redundant phage genome sequences) and 491 unique integrase entries (non-redundant integrase gene sequences).

The correlation among Salmonella enterica temperate phages, integrases, and the core regions of the attachment sites attB and attP
As depicted in the upper panel of Figure 3A, the core regions of the attachment sites attB and attP consist of identical sequences shared by both sites.A temperate phage can use integrase to integrate into host genome.In our previous study, we provided the complete temperate phage genomes (Zhang et al., 2022).From these genomes, we further extracted the core region and integrase for each phage (Figure 3B).To accomplish this, we initially predicted the ORFs and then proceeded with gene annotation.Finally, we successfully identified the core region and integrase of each temperate phage genome.
We used Spearman correlation coefficients (rs) to measure correlations among the sequence lengths of all three aforementioned elements, namely temperate phages, integrases, and attachment site core regions.We evaluated the correlation between variables using both rs (correlation coefficient) and value of ps.To determine a correlation, we considered variables to be correlated if the rs value exceeded 0.3 and the corresponding p-value was less than 0.05.
In particular, we observed a strong positive association between the sequence lengths of integrases and the core regions of attB and attP (R = 0.88, p < 2.2e − 16, Figure 4B; Supplementary Table 1).This positive association suggests that longer integrase sequences are correlated with longer core regions of attachment sites.Conversely, the correlations between the sequence lengths of temperate phages and the other two elements were both modest and negative.Specifically, there was a negative correlation between temperate phages and integrases (R = −0.53,p < 2.2e − 16, Figure 4A; Supplementary Table 1), as well as between temperate phages and attachment site core regions (R = −0.44,p < 2.2e − 16, Figure 4C; Supplementary Table 1).This negative correlation indicates that longer temperate phage sequences are associated with shorter integrase sequences/core regions of attachment sites.
Additionally, we found a modest and negative correlation between the GC content of integrases and temperate phages (R = −0.31,p < 2.2e − 16, Figure 4D; Supplementary Table 1).This negative correlation suggests that as the GC content of integrases increases, the GC content of temperate phages tends to decrease, and vice versa.However, we did not observe monotonic associations between the GC content of core regions and either attachment site core regions (R = −0.062,p < 7.7e − 09, Figure 4E; Supplementary Table 1) or integrases (R = −0.067,p < 2.6e − 10, Figure 4F; Supplementary Table 1).These non-monotonic associations indicate that changes in the GC content of a temperate phage or its integrase would not significantly impact the GC content of the core regions of attB and attP.
Our findings indicate that integrases typically localize in close proximity to attachment site core regions (Figure 5; Supplementary Table 1), which aligns with previous studies (Karaolis et al., 1998;Novick and Subedi, 2007;Tanaka et al., 2009).To be more precise, the relative distance between the integrase and the core regions of the attachment site is commonly found within two distinct ranges.Approximately 58% of temperate phages have lengths ranging from 102 to 215 bp, whereas 34% of temperate phages have lengths ranging from 774 to 1,068 bp.The relative distances are concentrated in the ranges of 0.002-0.005and 0.027-0.028,respectively.This concentration of relative distances suggests that there are specific preferred positioning patterns for integrases in relation to the attachment site core regions.Our findings indicate that integrases tend to localize near the core regions of attachment sites.The concentration of relative distances within specific ranges suggests preferred positioning patterns, further highlighting the functional relationship between integrases and the attachment site core regions.

The sequence varieties and genetic characteristics of Salmonella enterica temperate phages and their integrases
Each temperate phage identified in this study contained a single integrase.That is, 8,777 S. enterica temperate phages contained 8,777 integrases.As mentioned in the previous section (3.1), we classify these temperate phage genome sequences, which display 100% identity and 100% coverage, as a phage genome sequence entry (non-redundant temperate phage sequences).In total, these 8,777 S. enterica temperate phages were grouped into 3,857 unique S. enterica temperate phage entries (non-redundant temperate phages sequences).The 8,777 S. enterica integrases were grouped into 491 unique S. enterica integrase entries (non-redundant integrase sequences).
By aligning the 491 unique S. enterica integrases to the custom integrase database (Methods and Materials 2.1), all 491 integrases were classified into three types: intA, intS, and phiRv2 (Supplementary Table 4).The intS is a phage P4-type symbiosis island integrase protein, which is required for integration and excision of the island (Ramsay et al., 2006).The intA is an Stx-like integrase protein (Hochhut et al., 2006).Shiga toxin (Stx) and Stx-like toxins are normally found in various lambdoid phages that can integrate into multiple sites within the Shigella species and select Escherichia coli pathovars (O'Brien et al., 1984(O'Brien et al., , 1989)).The phiRv2 has previously been detected in Mycobacterium tuberculosis H37Rv genome (Mahadevan et al., 1998), encoding a tyrosine recombinase (Bibb and Hatfull, 2002).The illustration of temperate phage induction, core region and integrase.(A) Temperate phage induction and integration processes.The letter B represent the sequence on host bacteria, the letter P shows the sequence of temperate phage, and the letter O means the core regions of the attB and attP sites.(B) The identification of the core region and integrase of a temperate phage.These temperate phage genome sequences are from the database we generated in a previous study (Zhang et al., 2022).Among the 3,857 non-redundant S. enterica temperate phage genome sequences, a total of 40 S. enterica serotypes are susceptible to be infected by temperate phages (Supplementary Table 8), suggesting a diverse range of phage-host interactions within the S. enterica species.The predominant serotype is S. enterica serotype (serovar) Enteritidis (S. enteritidis), being infected by 330 S. enterica temperate phages (Supplementary Table 8).Salmonella enteritidis is a globally recognized serotype with significant public health implications (Whistler et al., 2018).It is widely recognized as one of the most prevalent strains responsible for human infections worldwide second to S. typhimurium (Galanis et al., 2006).Salmonella typhimurium can be infected by 46 temperate phages in our study.
Regarding the sequence length and GC content, we observed a relatively narrow range for S. enteritidis temperate phages, integrases, and core regions (Figure 9; Supplementary Table 9).Most temperate phages had lengths between 39,267 bp and 39,279 bp, integrases had a length of 1,275 bp, and core regions had a length of 81 bp (Figure 9A; Supplementary Table 9).Similarly, their GC content exhibited a narrow range, with most temperate phages ranging from 41.32% to 41.33% GC content, integrases at 49.02%, and core regions at 49.38% (Figure 9B; Supplementary Table 9).

Discussion
Salmonella enterica is an important pathogen that can cause lifethreatening diseases in humans and animals.The temperate phages of S. enterica play essential roles in bacterial evolution and pathogenicity, especially in the horizontal transfer of virulence factors and antibiotic resistance genes.Our previous study also found S. enterica had the largest number of temperate phages of all sampled bacterial species (Davies et al., 2016).A gene-sharing network of 3,857 unique temperate phages.The genomes of the 3,857 unique phage entries were compared with the bacterial virus genomes retrieved from ViralRefSeq94-Merged database.The phage genomes in our study are colored by their integrase types, while the phage gene sequences from the database are colored according to their family level.Each node represents a phage gene sequence, while each edge displays the similarity between any two phage gene sequences.The larger the node is, the more similarity it has with other phages.This study focused on basic characteristics and integration specificity of S. enterica temperate phages.We analyzed 8,777 S. enterica temperate phage genomes and identified the integrases contained within each one.The hosts of the S. enterica temperate phages were widely distributed all over the world (Figure 1; Supplementary Table 2).Probably due to sampling bias, these temperate phages were mainly distributed in the U.S. and the U.K., and primarily sourced from humans.The presence of unique phage types or variants specific to other regions might be missed due to the limited geographical representation.This could result in an incomplete understanding of the taxonomic composition and functional characteristics of S. enterica temperate phages on a global scale.Future studies should aim to expand the geographical representation, including diverse ecological settings, to obtain a more comprehensive assessment of the global distribution and diversity of S. enterica temperate phages.From the basic sequence characteristics of the temperate phages, we found that the sequence lengths of phages, integrases, and the core regions of attB and attP had modestly negative (phages vs. integrases, and phages vs. core regions) correlations (Figure 4; Supplementary Table 1).The negative correlations in sequence lengths suggest that there might be a coordinated evolution between phages, integrases, and the core regions of attB and attP.This coordination could indicate functional or structural dependencies among these elements during phage integration.It is possible that maintaining specific length ratios between these sequences is important for the efficient and stable integration of the phage genome into the host bacterial chromosome.
We also discovered a strong positive correlation between the lengths of integrases and the core regions (Figure 4; Supplementary Table 1).It is reported that the efficiency of integrase reduces at shorter attB length (Ghosh et al., 2003).Our study found that the phage with shorter integrase normally has shorter attachment site core region.Thus, we hypothesized that the reduced integrase efficiency may due to its shorter length.
The GC content between phages and integrases was weakly correlated (Figure 4; Supplementary Table 1).This suggests that there might not be a strong relationship between the GC content of phages and integrases, indicating that factors other than GC content might be driving their co-occurrence or co-evolution.
Also our results showed that the integrases of S. enterica temperate phages were often localized around attachment sites (Figure 5; Supplementary Table 1), which is consistent with the previous studies (Huang et al., 2019;Yarnall et al., 2022).The localization of integrases near attachment sites suggests a potential functional advantage in terms of integration efficiency.By being in close proximity to the attachment sites, integrases can readily access and interact with the specific sequences required for recombination.This proximity facilitates the efficient recognition and binding of the attachment site core regions by the integrase, promoting successful integration of the phage DNA into the host genome.
The presence of intA integrases in Shiga and E. coli species, as well as phiRv2 integrases in M. tuberculosis, has been commonly observed.However, the identification of intA and phiRv2 integrases in temperate phages of S. enterica suggests the possibility of horizontal gene transfer and genetic diversification among different bacterial species.This horizontal gene transfer could contribute to the acquisition of new genetic elements, virulence factors, and the genetic diversification within bacterial populations, leading to the emergence of novel traits, pathogenicity, or phenotypes.On the other hand, the low representation of phiRv2 integrases suggests a involvement of the PhiRv2 prophage in the bacterial populations of S. enterica.This implies that the transfer of genetic material associated with the PhiRv2 prophage is less frequent or less successful in S. enterica compared to its native host, M. tuberculosis.
The presence of intA and phiRv2 integrases in S. enterica temperate phages highlights the potential for horizontal gene transfer and genetic exchange among different bacterial species.The lower representation of phiRv2 integrases suggests a less prominent role of the PhiRv prophage in S. enterica populations compared to its association with M. tuberculosis.
Phylogenetically, most integrases of type intS tend to be grouped together (Figure 6A, bottom; Supplementary Table 4).The integrase sequences of type phiRv2 (Figure 6A, in red; Supplementary Table 4) had low similarity with the sequences of types intA and intS.While most phages have the integrase type intS (63.7%) and intA (35.8%),only two phages with type phiRv2 (0.5%).
On the other hand, the temperate phages with integrases intA and intS were mixed in both (Figure 6B; Supplementary Table 3).Taxonomically, some S. enterica temperate phages with the same integrase type tended to fall within the same taxonomic groups (Figure 7, two clusters on the right, Supplementary Tables 6, 7), while some phages with different integrase types were mixed together (Figure 7, left and middle; Supplementary Tables 6, 7).The results suggested that besides the integrase type, other factors may also influence the evolution of S. enterica temperate phages, which need to be further explored.In terms of sequence similarity, the S. enterica temperate phage sequences were highly similar to the phage sequences of Enterobacteriaceae, Pseudomonadaceae, and Vibrionaceae.
Among the S. enterica serotypes susceptible to S. enterica temperate phages, S. enteritidis stands out as the dominant serotype.Our results provide the basic knowledge for bioengineering the temperate phages that infect S. enterica serotypes, holding the potential to develop precise strategies for the prevention, surveillance, and management of this significant public health concern.Additionally, gaining a comprehensive understanding of the various serotypes, their prevalence, and the clinical manifestations associated with S. enterica infection by temperate phages can aid in the development of targeted approaches to address the diverse public health concerns caused by different S. enterica serotypes.
Moreover, it is important to acknowledge the limitations of our study.Firstly, there may be potential biases in the data due to the majority of phages being acquired from the U.S. and the U.K., which could limit the generalizability of our findings to a global scale.Furthermore, our study focused on basic sequence characteristics, and further investigations are needed to elucidate the functional implications of these correlations and the underlying mechanisms involved.Future studies should incorporate a wider range of phage samples from diverse geographical locations and consider additional factors such as gene expression patterns and host interactions to gain a more comprehensive understanding of phage integration.In conclusion, our study provides insight S. enterica temperate phages and their integrases sheds light on their genetic characteristics and integration specificity.This work will facilitate future research into the interactions between phages and S. enterica and efforts to engineer phages to generate functional materials to better combat S. enterica infections.

FIGURE 1
FIGURE 1 Sample information of Salmonella enterica temperate phages.(A) Geographical distribution of S. enterica temperate phages.The size of the sampling point indicates the number of samples.(B) Sankey diagram of sample sources of S. enterica temperate phages.The bars from left to right indicate geographic location, host source, and sampling time, respectively.

FIGURE 2
FIGURE 2 Basic sequence characteristics of temperate phages, integrases, and attachment site core regions.(A) Sequence length distributions of temperate phages, integrases, and the core regions of attB and attP.The left panel depicts the number of sequences with the same length.The right panel illustrates the distributions of the sequence lengths.(B) GC content distributions of temperate phages, integrases, and the core regions of attB and attP.The left panel depicts the number of sequences with the same GC content.The right panel illustrates the distributions of GC content.
Most of the integrases (63.7%) were classified as intS, while 35.8% of the integrases were classified as intA and only two were classified as phiRv2.It is noted that intA is normally found in Shiga and E. coli species, and phiRv2 is normally found in M. tuberculosis.The finding of intA and phiRv2 integrases in S. enterica temperate phages implies a potential for horizontal gene transfer and genetic diversification

FIGURE 4
FIGURE 4 Correlations among the sequences of temperate phage, integrase, and attachment site core region sequences.R > 0.3 and p-value < 0.05 indicate the two elements were correlated.(A-C) Correlations among sequence lengths of temperate phages, integrases, and attB and attP core regions.(D-F) Correlations among GC content of temperate phages, integrases, and attB and attP core regions.

FIGURE 6
FIGURE 6 Phylogenetic diversity of unique integrase and temperate phage sequences.(A) Neighbor-joining (NJ) tree of 491 integrase entries.The inner string indicates the type of integrase.The outer string indicates the number of integrases in each entry.(B) Neighbor-joining (NJ) tree of all 3,857 temperate phage entries.The inner string indicates the type of integrase.The outer string indicates the number of temperate phages in each entry.

FIGURE 8 Aligned
FIGURE 8Aligned Weblogos depicting the core region sequences of various integrase types.The X-axis represents the nucleic acid base position, while the Y-axis quantifies the sequence conservation at each position (measured in bits).(A) The Weblogo analysis of the core region sequences of integrase intA.(B) The Weblogo analysis of the core region sequences of integrase intB.(C) The Weblogo analysis of the core region sequences of integrase phiRv2.

FIGURE 9
FIGURE 9Basic sequence characteristics of Salmonella enteritidis temperate phages, integrases, and attachment site core regions.(A) Sequence length distributions of S. enteritidis temperate phages, integrases, and the core regions of attB and attP.(B) GC content distributions of S. enteritidis temperate phages, integrases, and the core regions of attB and attP.