Comparative Methylome Analysis of Campylobacter jejuni Strain YH002 Reveals a Putative Novel Motif and Diverse Epigenetic Regulations of Virulence Genes

Campylobacter jejuni is a major cause of foodborne gastroenteritis worldwide inflicting palpable socioeconomic costs. The ability of this pathogen to successfully infect its hosts is determined not only by the presence of specific virulence genes but also by the pathogen’s capacity to appropriately regulate those virulence genes. Therefore, DNA methylation can play a critical role in both aspects of this process because it serves as both a means to protect the integrity of the cellular DNA from invasion and as a mechanism to control transcriptional regulation within the cell. In the present study we report the comparative methylome data of C. jejuni YH002, a multidrug resistant strain isolated from retail beef liver. Investigation into the methylome identified a putative novel motif (CGCGA) of a type II restriction-modification (RM) system. Comparison of methylomes of the strain to well-studied C. jejuni strains highlighted non-uniform methylation patterns among the strains though the existence of the typical type I and type IV RM systems were also observed. Additional investigations into the existence of DNA methylation sites within gene promoters, which may ultimately result in altered levels of transcription, revealed several virulence genes putatively regulated using this mode of action. Of those identified, a flagella gene (flhB), a RNA polymerase sigma factor (rpoN), a capsular polysaccharide export protein (kpsD), and a multidrug efflux pump were highly notable.


INTRODUCTION
Campylobacteriosis is one of the most frequently acquired bacterial infections associated with the consumption of contaminated foods. Symptoms of campylobacteriosis include diarrhea, abdominal pain, vomiting, cramps, and fever. Although the infection is often self-limiting, its prevalence is a cause of concern for the World Health Organization (World Health Organization, 2020). Most cases of campylobacteriosis are caused by Campylobacter jejuni, a bacterium that exists as a commensal organism in the gastrointestinal tract of many domestic animal populations that are utilized as food sources such as poultry, cattle, pigs, and sheep (Facciola et al., 2017). Factors that influence the virulence of Campylobacter in humans are currently under investigation in an effort to devise better methods to control and/or prevent its spread.
DNA sequencing has become a driver for many scientific advancements over the past few decades. Sequence-based methods are now commonly used to identify bacterial virulence factors and as bioinformatic approaches advance, the frequency of utilizing comparative genomic analyses to identify genes involved with virulence continues to rise. Such analyses have already led to the identification of the pVir plasmid in Campylobacter jejuni (Bacon et al., 2002). With the rapid progress in whole genome sequencing (WGS) technology over last decade, the study of bacterial foodborne pathogens has gained new dimensions, which has provided further insight into the biology of these organisms (Llarena et al., 2017). In addition to generating genome sequences, Single Molecule Real Time (SMRT) technology offers a tool for studying epigenetic mechanisms, including DNA methylation. In bacteria, DNA methylation is a base modification system that is catalyzed by methyltransferases and results in the formation of the following: C 5 -Methyl-cytosine (m5C), N 6 -methyl-adenine (m6A), and N 4methyl-cytosine (m4C) (Sanchez-Romero et al., 2015). These modified bases enable the cell to identify and eliminate foreign DNA via the bacterial restriction-modification (RM) system, which has been shown to cleave double stranded DNA fragments. The resulting fragments are then further degraded by other endonucleases within the cell, essentially destroying any foreign DNA that may have been introduced that threatens the integrity of the cellular DNA. RM systems are common amongst bacteria with all four classes of RM systems (types I-IV) having been identified in C. jejuni 1 .
Methylation sites that are embedded within promoter regions or other regulatory sites may also serve as a mechanism by which genes, including virulence genes, can be regulated. For example, the presence of two DNA methylase targets within the regulatory region of the well-studied pyelonephritis-associated pilus (pap) operon has been shown to establish fimbrial phase variation in Escherichia coli populations by modifying the binding sites of a transcriptional regulatory factor involved in the process (van der Woude et al., 1996). It has been suggested by Mou et al. (2015) that DNA methylation may also be involved in the pathogenicity of C. jejuni, leading to some of the variations in disease presentation that can be seen amongst strains sharing a high degree of genomic synteny and sequence homology. This idea is further supported by gene products like that of cj1461, a N 6 -adenine-specific DNA methyltransferase that has been shown to play a role in factors related to pathogenicity including motility, adherence, and invasion in C. jejuni (Kim et al., 2008).
Overall, the study of methylomes is still in its infancy, with only a few reports available for C. jejuni (Murray et al., 2012;Mou et al., 2015;O'Loughlin et al., 2015) and none for a C. jejuni food isolate. In this context, we undertook the methylation analysis of the C. jejuni YH002, a strain isolated from beef liver, and 1 http://tools.neb.com/genomes/index.php?page=C compared/contrasted it with two well-studied C. jejuni strains to provide a deeper understanding into the biology of this important foodborne pathogen.

MATERIALS AND METHODS
Detection and Analysis of Methylated Bases in C. jejuni YH002 The strain C. jejuni YH002 was isolated from retail beef liver using a passive filtration method (Speegle et al., 2009;Jokinen et al., 2012). The identity of the strain was confirmed by real-time PCR targeting the hipO gene (He et al., 2010). For genomic DNA preparation, the strain was grown in Brucella broth (Difco, Becton Dickinson, NJ) for approximately 40 h under microaerophilic conditions (5% O 2 , 10% CO 2, and 85% N 2 ) at 42 • C and pelleted by centrifugation. High quality genomic DNA for SMRT sequencing was extracted and purified using the Qiagen Genomic-tip 100/G kit (Qiagen, Valencia, CA). Library preparation and SMRT sequencing was provided by the University of Delaware Sequencing and Genotyping Center (Newark, DE). The complete genome sequence and annotation are available in GenBank of NCBI with the accession No. CP020776 for the 1,774,584 bp chromosome and No. CP020775 for the 45,904 bp plasmid (pCJP002) (Ghatak et al., 2020).
Detection of methylated bases in the genome of C. jejuni YH002 was performed by using SMRT Analysis Suite 2.3.0 2 with default parameters. Additional analyses were undertaken with various modified QV settings using the MotifMaker tool 3 .
The methylation locations in the C. jejuni YH002 genome were obtained by searching the 41 nucleotide (nt) context sequence of each motif in the raw PacBio data against the complete genome sequence. The average methylation frequency for each motif in the genome was calculated by dividing the number of occurrences of a particular methylation motif in either the chromosome or the plasmid by the total number of kilobase pairs in the chromosome (1,774.6 Kb) or the plasmid (45.9 Kb), respectively. All coding sequences (CDS) and gene products were predicted by RAST annotation 4 . The methylation frequency of the RAATTY motif in each CDS, rRNA, and tRNA was determined using a custom function added in Macros of Excel.

Comparative Analysis Amongst C. jejuni Strains
For the comparative assessment of the methylomes, published motif data of C. jejuni 81-176 and C. jejuni NCTC 11168 were downloaded from REBASE database (Roberts et al., 2015). Genomes of C. jejuni 81-176 and NCTC 11168 were searched for motifs with a P-value cut off < 0.001 as described previously (Grant et al., 2011). Methylated and un-methylated CDS were then determined using RAST annotation and methylation data (Quinlan and Hall, 2010).

Identification of Methylated Promoters
The locations of the transcriptional start sites (TSSs) in the C. jejuni YH002 chromosome were identified by sequence mapping the 50 nt region upstream of the TSSs in C. jejuni NCTC 11168, which were previously determined for C. jejuni NCTC 11168 using RNA-Seq data obtained from biological replicates during mid-log growth (Dugar et al., 2013). The C. jejuni YH002 chromosome is known to share high sequence similarity with C. jejuni NCTC 11168 (Ghatak et al., 2020). Similarly, the locations of the TSSs in the C. jejuni YH002 plasmid were identified by mapping the 50 nt sequences upstream of the TSSs in pTet in C. jejuni 81-176 since these two plasmids have 99% sequence similarity. In the present analysis, only those TSS deemed as either a primary or secondary TSS by Dugar et al. (2013) were utilized during the mapping process. All matched promoter sequences in C. jejuni YH002 were then checked for overlap with the methylated bases within the genome, ultimately deducing the promoter regions that were methylated (Quinlan and Hall, 2010). Plots of methylated and un-methylated CDS regions were generated using Circos (Krzywinski et al., 2009).

Methylation Detection, Motifs, and Distribution Patterns
Detection of base modifications during SMRT sequencing of the C. jejuni YH002 genome identified putative methylation motifs ( Table 1). Because generating motifs using the default settings in the modification analysis of the SMRT sequencing platform might not be ideal in all cases for motif identification (Methylome-Analysis-Technical-Note) 5 , we undertook a comparative analysis with various modification QV (modQV) cut-off values ranging from 30 to 200. Though the results (Table 1) indicated the presence of m6A and m4C types of modifications in the C. jejuni YH002 genome, low occurrences of m4C type modification, inconsistent detections, and their bizarre patterns were likely indications of spurious detection by the software (Motif maker) and therefore, we did not consider them as valid motifs in our results. Mean coverage of motif detection varied from 225X to 299X indicating satisfactory analysis. Overall, the results yielded 8-12 motifs depending on the modQV cut-off values. Among these, 7 motifs ( Table 1) appeared consistently in all the analyses. Of the seven motifs, four motifs formed two pairs of bipartite motifs (herby named ACNNNNNCTC/GAGNNNNNGT and TAAYNNNNNTGC/GCANNNNNRTTA), and another motif (RAATTY) was palindromic. Search of REBASE database for similar motifs in Campylobacter genomes confirmed our results.
Manual analysis of the remaining identified motifs (those not included in the consistently detected motifs described above) indicated that most were either a part of or an extension of those previously described motifs. However, a closer analysis revealed that among the set of motif strings which were not part of the motifs detected in all modQV runs was the putative 5 https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/ hidden motif, CGCGA. We believe this to be a novel motif for Campylobacter species as our search in the REBASE database did not yield a match. However, an exact match was noticed for Streptococcus mutans (GenBank Accession CP044495), which was isolated from the oral cavity of a human.
The consistently observed motifs were almost always (90.8-99.5%) methylated in the genome of C. jejuni YH002, with modification of the motif RAATTY being the predominant motif in both the chromosome and the pCJP002 plasmid (Figure 1). This is in agreement with a similar analysis performed on a recently emergent C. jejuni sheep abortion clone known as C. jejuni IA3902 (Mou et al., 2015). A comparison of the frequencies associated with the different methylation motifs in C. jejuni YH002 demonstrated that both the RAATTY and the ACNNNNNCTC/GAGNNNNNGT motifs showed a slightly higher frequency in the chromosome compared to the plasmid (Figure 2A). However, the frequency of the CCYGA, GKAAYG, TAAYNNNNNTGC/GCANNNNNRTTA, and the CGCGA motifs were higher in the plasmid by comparison. Although the distribution of the RAATTY motif appeared relatively uniform across both the chromosome and the plasmid of C. jejuni YH002 (Figure 1), an in-depth analysis of the frequency of the RAATTY (the predominant motif) demonstrated the existence of both hypomethylated and hypermethylated areas that were associated with specific genomic features/CDS (Supplementary Tables 1, 2). In this analysis, a value of 1 indicated that a single methylated site of the RAATTY motif in a CDS. Values greater than 40 indicated that CDS could be hypermethylated and those less than 1 indicated the possibility of hypomethylated CDSs (Mou et al., 2015). Based on this, CDS within the chromosome of C. jejuni YH002 contained an average of 11.1 RAATTY motifs per CDS ( Figure 2B). However, certain CDS contained ≥ 60 RAATTY motifs, with the number of RAATTY motifs highest for a putative type IIS restriction-modification enzyme (78), a possible restrictionmodification protein (64), the glutamate synthase [NADPH] large chain (60), and a DNA methylase (66) encoded for the plasmid (Supplementary Tables 1, 2). On the other extreme were the genes encoding tRNA, with a RAATTY frequency of 0.1 (Figure 2B), which indicated that many tRNA genes did not contain this methylation motif (Supplementary Table 1). Both integrated genetic elements (phage/CJIE1 and CJIE) and rRNAs had a considerably lower frequency of RAATTY compared to the chromosome and plasmid.
Comparative Analysis of Methylation in the Genomes of C. jejuni YH002, C. jejuni 81-176, and C. jejuni NCTC 11168 For the comparative analysis of the methylation patterns, previously reported methylation motifs from the REBASE database for the C. jejuni strains 81-176 and NCTC 11168 were obtained and the methylated CDS in all three genomes identified. Analysis of these CDSs revealed that a majority of the methylated CDSs (n = 1,178) were common among the three strains, though some uniquely methylated CDSs were also identified in each genome: 99 for C. jejuni YH002, 76 for C. jejuni 81-176, and  11 for C. jejuni NCTC 11168 ( Figure 3A and Table 2). Mapping of these uniquely methylated CDSs revealed discernible patterns ( Figure 3B) with the uniquely methylated genes more uniformly distributed in the genome of C. jejuni 81-176 compared to the other two genomes (YH002 and NCTC 11168). Both C. jejuni YH002 and C. jejuni 81-176 have a collection of plasmid-borne genes that are not present in C. jejuni NCTC 11168. Interestingly, these plasmid-borne genes are also methylated ( Figure 3B). The other two uniquely methylated regions in C. jejuni YH002 are both located in the integrated genetic elements (CJIEs). For C. jejuni YH002, this finding seems conflicted by the identification of proteins related to various components of putative RM systems, including Type I RM systems (Ghatak et al., 2020) and the presence of the genes encoding McrBC, which is responsible for specific restriction activity against methylated bases (usually 5 mC or 5 hmC).

Analysis of Methylation Locations in Gene Promoters
To enhance our ability to define genes that may be regulated by methylation, methylated sites located in gene promoter regions were identified by searching the known methylated sites within the −50 nt promoter sequences obtained by sequence mapping to the C. jejuni NCTC 11168 promoters (Dugar et al., 2013). Methylated bases were found in the promoter regions of 18 different hypothetical genes and 123 genes with predicted functions (Supplementary Table 3). Interestingly, several virulence genes were found to contain methylated nucleotides within their promoter region. This finding indicates a role for methylation in the transcription regulation of these genes, which includes flhB, rpoN, kpsD, multidrug resistance genes, and CRISPRs.
Further analysis of these promoter regions revealed the existence of all 6 different methylation motifs. Once again, RAATTY was the most dominant motif, having been identified 76 times within the predicted methylated promoters. The remaining motifs were found much less frequently, with CCYGA identified twice, GKAAYG identified three times, ACNNNNNCTC/GAGNNNNNGT identified once, TAAYNNNNNTGC/GCANNNNNRTTA identified seven time, and the CGCGA motif identified twice. The remaining methylated promoter regions did not appear to contain any of these defined methylation motifs, suggesting additional methylation motifs remain to be discovered and/or some of the six motifs extend beyond the −50 nt promoter sequences.

DISCUSSION
Methylation of the genome plays several important roles in prokaryotic biology including acting as a defense mechanism against the incorporation of intruding nucleic acids into the bacterial host DNA and the regulation of gene expression by influencing transcription, replication initiation and mismatch repair (Murray et al., 2012;Mou et al., 2015;Blow et al., 2016). At the heart of host defense mechanisms utilizing methylation of host DNA is the restriction-modification (RM) system comprised of a restriction endonuclease (REase) and methyltransferase (MTase) (Murray et al., 2012;Blow et al., 2016). Of the four types (I-IV) of RM systems that have been reported, a majority have been classified as Type II systems (Blow et al., 2016). This is believed to be the result of technological difficulties in detecting the other RM systems (type I, type III, type IV) (Murray et al., 2012). However, this bottleneck has been mostly overcome with SMRT sequencing (Rhoads and Au, 2015), which has provided an opportunity to study the methylome of bacteria and contribute to the rapidly growing resource pool of methylation data (Roberts et al., 2015). Nonetheless, to date there are only a few reports on the methylome of Campylobacter isolates (Murray et al., 2012;Mou et al., 2015;O'Loughlin et al., 2015;Zautner et al., 2015) and to our knowledge there is no report on the methylome of C. jejuni originating from a food source. In our analysis we observed 7 motifs, of which 4 motifs were part of two bipartite pairs. Similar findings have been documented previously (Murray et al., 2012;Mou et al., 2015;O'Loughlin et al., 2015). In addition to the previously reported motifs, our results also identified a novel putative motif CGCGA for Campylobacter spp. as we did not find a match for this motif within the Campylobacter genomes in the REBASE database. However, an exact match with Streptococcus mutans was found, possibly indicated shared RM systems even across distant classes of bacteria (Epsilonproteobacteria and Bacilli). During our analysis, we undertook a comparative modQV analysis to determine the effects of various cut-offs and also to derive a list of motifs with high confidence. Results suggested that comparative mod QV analysis is crucial for motif detection as various modQV cut-off values yielded varying results. Moreover, detection of base modifications by SMRT sequencing is known to produce secondary signals (usually 5 bases upstream m6A; 2 bases upstream 5 mC). Therefore, rigorous validation of software identified motifs is critical for correct motif identification as has been done in the present study.
Visualization of the methylated segments in the C. jejuni YH002 genome indicated widespread methylation covering almost 99% of the genome, which has been reported previously (Mou et al., 2015). Comparative analysis of the methylomes of three C. jejuni strains (YH002, 81-176 and NCTC 11168) highlighted that methylation patterns within a species are nonuniform and unlike that previously reported by O'Loughlin et al. (2015) we could not convincingly detect the m4C type FIGURE 2 | Methylation motif frequency in C. jejuni strain YH002. (A) The frequencies of six different methylation motifs in either the chromosome (black) or the plasmid (gray) were calculated using the number of a particular methylation motif divided by the total size of the chromosome (1,774.6 Kb) or the plasmid (45.9 Kb), respectively. (B) In-depth analysis of the frequency of the RAATTY motif within different genomic features was determined using the number of RAATTY motifs within a genomic feature divided by the total number of CDS in that genomic feature. Chromosome, C. jejuni YH002 chromosome; Plasmid, pCJP002; Phage/CJIE1, located between bp 1,409,078-1,444,627 in C. jejuni YH002;, and CJIE, Integrated mobile element located between bp 1,223,758-1,309,551 in C. jejuni YH002.
of modification. However, these results coincide with those of a previous study where only the m6A type modification was documented in C. jejuni (Mou et al., 2015).
Detection of the mcrBC gene in the C. jejuni YH002 genome is also interesting considering this type IV RM system is methylation dependent for its restriction activity, which can restrict the establishment of any foreign DNA from sources such as bacteriophage that lack the proper DNA methylation. It is believed that methylation-specific RM systems such as this have evolved in response to an evolutionary arms race between bacteriophage and bacteria. As bacteriophages evolved mechanisms to allow for the invasion of bacterial cells, host bacteria have armed themselves with methylationspecific RM systems, which then force the bacteriophage to alter their genome if they are to avoid cleavage (Bickle and Kruger, 1993;Labrie et al., 2010;Sukackaite et al., 2012). Therefore, we believe the genome of C. jejuni YH002 may be unusual for possessing a type IV RM system (mcrBC) while simultaneously harboring two methylated bacteriophage elements. However, taking into account that McrBC only weakly restricts methylcytosines other than hydroxymethylcytosine (Ishikawa et al., 2010) and the suggested absence of m4C type methylation in the present analysis, the co-existence of both the methylated bacteriophage elements and the mcrBC system appears justifiable.
Methyltransferases that are typically involved in the regulation of gene expression often do not contain functional restrictionmodification enzymes and are therefore referred to as orphan or solitary DNA methyltransferases. Methyltransferases of this type are important in cellular biology because of their ability to activate or silence genes without mutating the gene itself. Orphan methyltransferases can be found in C. jejuni YH002. For example, Accession # ARJ53876 (Ghatak et al., 2020) is a 227 amino acid protein found in C. jejuni YH002 whose sequence is identical to that of the N 6 -adenine-specific DNA methyltransferase cj1461 (Kim et al., 2008). The presence of orphan methyltransferases and the fact that multiple putative virulence genes were found in this study to contain methylation sites within their promoter regions (Supplementary Table 3) implies the ability of C. jejuni YH002 to employ this mode of regulation for the expression of certain virulence genes.
To uncover the genes that contained methylation sites within their promoter regions, the start of transcription was initially assessed. In this manuscript, TSSs were defined using data obtained from previously published RNA-seq experiments performed with biological replicates from specific C. jejuni strains during the mid-log phase of growth (Dugar et al., 2013). Although analysis in this fashion allowed the designation of a TSS to be based around experimental evidence instead of simply computational formulations, it is highly possible that some sites are missing from this analysis. For example, additional transcripts that were not assessed here may be produced by C. jejuni strains grown under alternative conditions. In addition, because the analysis relies upon those transcripts identified in C. jejuni NCTC 11,168 (which shares a high sequence similarity with C. jejuni YH002), transcripts unique to C. jejuni YH002 (e.g., CJIEs) would not be accounted for using this method. Regardless, TSSs regulated by σ 70 as well as σ 28 and σ 54 were established using this method. It is also important to note that transcripts corresponding to many of the areas designated here were found in other C. jejuni strains including C. jejuni 81-176, C. jejuni 81116, and C. jejuni RM1221 using RNA-seq (Dugar et al., 2013), which helps highlight the reliability of the TSS predictions used.   Of the virulence genes identified that contained a methylation site within its promoter region, one appeared to be directly related to flagellum production (flhB). Much is known about the biopolar flagella in C. jejuni. With approximately 25-30 proteins involved in the assembly and function of flagella (Lertsethtakarn et al., 2011), this system was initially classified as a virulence factor based upon its role in the motility and the invasion of epithelial cells in the intestine (Morooka et al., 1985;Black et al., 1988;Wassenaar et al., 1991;Takata et al., 1992;Nachamkin et al., 1993). Additional investigations have shown these genes are also important not only for the secretion of nonflagellar proteins involved in virulence but that changes to the glycan composition of the flagella can affect autoagglutination, a preliminary step in biofilm formation, and evasion of the host's immune response (Guerry, 2007). Insertional inactivation of flhB in C. jejuni resulted in mutants which were non-motile and had an altered cellular morphology (Matz et al., 2002). These mutants showed a significant reduction in the level of flaA transcript, demonstrating a role for FlhB in the regulation of flaA and thus indicating that the impact methylation has on the transcriptome may extend beyond those genes directly expressed from methylated promoters.
Another virulence gene in C. jejuni YH002 with widespread effects whose transcription is putatively regulated by DNA methylation is rpoN, which encodes for the σ 54 protein. In general, sigma factors are global regulators involved with the recruitment of RNA polymerase to the promoter regions in which they are bound. Like other C. jejuni strains, C. jejuni YH002 appears to encode three sigma factors (Parkhill et al., 2000;Ghatak et al., 2020): (1) rpoD, which is considered the "housekeeping" sigma factor; (2) fliA, which regulates the flagellar operon;, and (3) rpoN, which has long been associated with a range of physiological phenomena and has recently been designated as a "central controller of the bacterial exterior" (Francke et al., 2011). Previous studies in C. jejuni have shown that not only is rpoN required for the transcription of a number of genes involved in bacterial motility (Hendrixson et al., 2001) but that rpoN mutants also lacked the presence of secreted proteins and are attenuated in their ability to both adhere to and be internalized by HeLa cells (Fernando et al., 2007). Interestingly, a transcriptional activator associated with σ 54 was also found to contain a methylation site within its promoter; further establishing the importance of methylation for proper regulation of σ 54 in C. jejuni.
Identification of a methylation site within the promoter regions of genes encoding the capsular polysaccharide export protein (KpsD), clustered regularly interspaced short palindromic repeats (CRISPR) genes, and a putative multidrug efflux pump are also worth mentioning because of their contributions to the survival, adaptation, and antibiotic resistance of C. jejuni (Bacon et al., 2001;Luangtongkum et al., 2009;Guerry et al., 2012;Louwen et al., 2014). Considering the multifaceted roles of these genes in virulence, the information presented here regarding methylation as a possible mode of regulation will be highly important for combating diarrheal disease caused by C. jejuni since its ability to successfully infect a hosts lies not only in the presence of specific virulence genes but also in the pathogen's capacity to appropriately regulate those genes. Overall, studies such as this, which expand upon our knowledge of DNA methylation, serve to increase our understanding of how bacterial genomes are able to adapt to changes in the environment while simultaneously defending the integrity of the genetic material critical for their survival.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: GenBank of NCBI with the accession Nos. CP020776 and CP020775.