Complete Genome Sequence of Weissella hellenica 0916-4-2 and Its Comparative Genomic Analysis

The genus Weissella, of the family Leuconostocaceae, comprises Gram-positive lactic acid bacteria (LAB) that mostly reside in fermented foods; some have also been isolated from the environment and from vertebrates, including humans. Currently there are 23 recognized species, and 16 complete and 37 draft genome assemblies are available for this genus. Weissella hellenica has been found in various sources and is characterized by its probiotic and bacteriocinogenic properties. Despite its widespread importance, little attention has been paid to the genomic characterization of this species, and only two draft assemblies are available in the public database so far. In this manuscript, we identified W. hellenica 0916-4-2 from fermented kimchi and completed its genome sequence. Comparative genomic analysis identified 88 core genes with an interspecies mean amino acid identity of more than 65%. Whole genome phylogenetic analysis showed that the three W. hellenica strains clustered together and that strain 0916-4-2 was close to strain WiKim14. In silico analysis of secondary metabolite biosynthetic gene clusters showed that Weissella are predicted to produce far fewer secondary metabolites than other members of Leuconostocaceae. The availability of the complete genome of W. hellenica 0916-4-2 will facilitate further comparative genomic analysis of Weissella species, including studies of their biotechnological potential and of their use in improving the nutritional value of various food products.


INTRODUCTION
The non-spore-forming lactic acid bacteria (LAB) of the genus Weissella, within the family Leuconostocaceae, comprise 23 validly described species. In nature, Weissella spp. have been found in a wide range of habitats (Fusco et al., 2015), such as traditional fermented foods, milk, vegetables, feces, the environment, and vertebrates including humans. In kimchi, the traditional Korean fermented vegetable food, Weissella is the dominant genus at the late stage of fermentation, partly because of its high acid tolerance (Kim et al., 2016). Recently, interest in the probiotic, biotechnological, and bacteriocinogenic potential of Weissella has grown, although some strains are known to act as opportunistic pathogens (Kamboj et al., 2015). Given that most strains regarded as opportunistic pathogens were isolated from hosts with an underlying risk factor, such as an immunocompromised condition (Fusco et al., 2015; Kamboj et al., 2015), distinguishing probiotic from pathogenic Weissella has always been a challenging task. Interestingly, recent research has also identified Weissella as a producer of a botulinum-like toxin (Mansfield et al., 2015; Zornetta et al., 2016). Despite this widespread importance, little attention has been paid to the genomic characterization of this genus, as is evident from the number of genome assemblies available in the public database. So far, only 16 species of this genus have been sequenced, and a complete genome sequence is available for only seven species. Weissella hellenica 0916-4-2 (highlighted in the figures and tables in bold font) was isolated in our laboratory from kimchi, a Korean fermented pickle, and the strain was identified by 16S rRNA sequencing.
Although this species harbors a prominent bacteriocinogenic potential, with the production of various bacteriocins such as 7293A (Woraprayote et al., 2015), Weissellicin L (Leong et al., 2013), Weissellicin D (Chen et al., 2014), Weissellicin M (Masuda et al., 2012), and Weissellicin Y (Masuda et al., 2012), genomic studies of W. hellenica are largely lacking. In this manuscript, we completed the genome sequence of W. hellenica 0916-4-2 and performed a comparative genomic analysis with publicly available Weissella genomes. We found that W. hellenica 0916-4-2 clustered with W. hellenica WiKim14 based on whole genome phylogeny and harbored two putative gene clusters, one for bacteriocin biosynthesis and one encoding a non-ribosomal peptide synthetase.
[Table legend] Species in the first column and first row of the table are listed in the same numeric order. The numbers show the percentage similarity between the conserved regions of the genomes, with colors ranging from yellow (low similarity) to blue (high similarity).

Strain and DNA Extraction
The strain used in this study was W. hellenica 0916-4-2, isolated from Korean pickle kimchi using MRS medium. Genomic DNA was isolated as described previously (Panthee et al., 2017b).

Whole Genome Sequencing
Libraries for Oxford Nanopore MinION and Thermo Fisher Ion PGM sequencing were prepared as described previously (Panthee et al., 2017a,c, 2018).

Read Correction and Genome Assembly
Read correction and genome assembly were performed as described previously (Panthee et al., 2018). We obtained 2M reads (mean length 277 bp) from Ion PGM and 247K reads (mean length 7 kb) from MinION, accounting for approximately 273-fold and 900-fold genome coverage, respectively. High-quality MinION long reads were selected using Filtlong, and self-correction and trimming were performed using Canu 1.7 (Koren et al., 2017). The short reads from Ion PGM were corrected using SPAdes 3.11 (Bankevich et al., 2012). Hybrid error correction of the long reads was then performed using LoRDEC (Salmela and Rivals, 2014). The final genome assembly was generated from the corrected long reads using the Flye 2.3.3 assembler (Kolmogorov et al., 2018). The assembly was further polished by mapping the short reads to it and generating a consensus. The assembled genome was annotated using the NCBI Prokaryotic Genome Annotation Pipeline (PGAP), which identified 1778 protein-coding genes and 101 RNA genes. The genome was further submitted to PathogenFinder (Cosentino et al., 2013) to predict the absence of
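The fold-coverage figures quoted above follow from a simple ratio: total sequenced bases divided by genome size. The sketch below reproduces that arithmetic under two assumptions that are not stated in this section: the genome size is taken as the 1.93 Mb reported in the Summary, and the read counts are the rounded values given in the text (2M and 247K). Because the inputs are rounded, the results are close to, but not identical with, the reported 273-fold and 900-fold values.

```python
def fold_coverage(n_reads: int, mean_read_len_bp: float, genome_size_bp: float) -> float:
    """Approximate depth of coverage: total sequenced bases / genome size."""
    return n_reads * mean_read_len_bp / genome_size_bp


# Assumption: finished genome size of 1.93 Mb, as reported in the Summary.
GENOME_SIZE_BP = 1.93e6

# Rounded read counts and mean read lengths as quoted in the text.
ion_pgm = fold_coverage(2_000_000, 277, GENOME_SIZE_BP)
minion = fold_coverage(247_000, 7_000, GENOME_SIZE_BP)

print(f"Ion PGM: ~{ion_pgm:.0f}-fold, MinION: ~{minion:.0f}-fold")
```

The function name `fold_coverage` and the constant `GENOME_SIZE_BP` are illustrative only and do not correspond to any tool used in the study.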

Genomic Data of Other Weissella Species and Comparative Genomic Analysis
Genomic data for the additional Weissella species analyzed in this study were obtained from NCBI. The accession numbers and assembly status are given in Supplementary Table S1. To construct the phylogenetic tree, the core genes of these genomes were first computed, each core gene set was aligned using MUSCLE, and the alignments were concatenated into a single alignment. This alignment was used to generate the phylogenetic tree with the neighbor-joining algorithm in EDGAR (Blom et al., 2016). The core and pan-proteomes were predicted using EDGAR (Blom et al., 2016). COG analysis of the core proteome was performed using eggNOG-mapper (Huerta-Cepas et al., 2017). Secondary metabolite gene clusters were analyzed as described previously (Panthee et al., 2017b) using antiSMASH (Blin et al., 2017).
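The concatenation step described above can be illustrated with a minimal sketch. It is not EDGAR's implementation: it assumes the per-gene alignments are already available as mappings from strain name to aligned sequence, and it uses a simple p-distance (fraction of differing residues) as the input that a neighbor-joining step would consume. The strain names and sequences are made-up toy data.

```python
from itertools import combinations


def concatenate_alignments(alignments):
    """Join per-gene alignments (dict: strain -> aligned sequence) into one supermatrix."""
    strains = sorted(alignments[0])
    for aln in alignments:
        if sorted(aln) != strains:
            raise ValueError("every core-gene alignment must contain the same strains")
        if len({len(seq) for seq in aln.values()}) != 1:
            raise ValueError("sequences within one alignment must have equal length")
    return {s: "".join(aln[s] for aln in alignments) for s in strains}


def p_distance(a, b):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(supermatrix):
    """Pairwise p-distances, keyed by alphabetically ordered strain pairs."""
    return {(s, t): p_distance(supermatrix[s], supermatrix[t])
            for s, t in combinations(sorted(supermatrix), 2)}


# Toy data: hypothetical strains and invented aligned protein fragments.
gene1 = {"strainA": "MKT-LV", "strainB": "MKTALV", "strainC": "MRTALI"}
gene2 = {"strainA": "GGSA", "strainB": "GGSA", "strainC": "GASA"}

supermatrix = concatenate_alignments([gene1, gene2])
distances = distance_matrix(supermatrix)
for pair, d in distances.items():
    print(pair, round(d, 3))
```

A tree-building method such as neighbor joining would take a matrix like `distances` as input; the choice of p-distance here is an assumption made for brevity.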

Weissella Virulence Factors Analysis
For the bioinformatic analysis of virulence factors in Weissella genomes, we downloaded the amino acid sequences of the core virulence factors from the Virulence Factor Database (Chen et al., 2005; Liu et al., 2018) and performed a BLAST search against the Weissella proteomes with an e-value cutoff of 1e−30 and a minimum identity of >70%. To analyze the virulence potential of W. hellenica 0916-4-2 in a mouse model, the bacteria were cultivated overnight in MRS at 30°C. The culture was centrifuged and the cells were resuspended in phosphate-buffered saline (PBS). Six-week-old female ICR mice (n = 5) were injected intraperitoneally with bacteria equivalent to 200 µl of the overnight culture (1.5 × 10^9 CFU), and survival was observed for 5 days. All mouse experimental protocols were approved by the Animal Use Committee at the Genome Pharmaceuticals Institute.
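The two BLAST thresholds above can be applied to search results as a simple filter. The sketch below assumes the hits were written in BLAST's standard 12-column tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start and end, subject start and end, e-value, bit score); the source does not state which output format was used. The sequence identifiers and hit values are invented, and treating the e-value cutoff as inclusive is also an assumption.

```python
import csv
import io

MAX_EVALUE = 1e-30    # e-value cutoff from the text (assumed inclusive)
MIN_IDENTITY = 70.0   # identity must be strictly greater than 70%


def filter_hits(handle, max_evalue=MAX_EVALUE, min_identity=MIN_IDENTITY):
    """Yield (query, subject, identity, evalue) for hits passing both thresholds."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        pident, evalue = float(row[2]), float(row[10])
        if evalue <= max_evalue and pident > min_identity:
            yield row[0], row[1], pident, evalue


# Hypothetical hits: only the first passes both thresholds.
example = io.StringIO(
    "vf_hasC\tprot_0001\t82.5\t290\t50\t1\t1\t290\t3\t292\t1e-150\t430\n"
    "vf_cpsI\tprot_0002\t65.0\t350\t120\t3\t1\t350\t1\t348\t2e-90\t280\n"
    "vf_gnd\tprot_0003\t75.1\t100\t25\t0\t1\t100\t1\t100\t1e-12\t60\n"
)
kept = list(filter_hits(example))
for hit in kept:
    print(hit)
```

The second hit fails on identity and the third on e-value, which shows why both criteria are needed to avoid reporting weak or short matches as virulence factor homologs.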
The most abundant GO annotations were oxidation-reduction process, ATP binding, and integral component of membrane in the biological process, molecular function, and cellular component categories, respectively. InterProScan assigned a total of 3132 families to 1585 (89%) of the total proteins, of which the P-loop containing nucleoside triphosphate hydrolase family (IPR027417) was the most frequent, with 143 assignments (4.6%) (Figure 1D). Although some Weissella species harbor genes for drug resistance (Abriouel et al., 2015), none were detected in the genome of the 0916-4-2 strain.

General Features and Comparative Genomics of Weissella Species
As a first step toward the comparative analysis of the W. hellenica 0916-4-2 genome, we compared its gene set with those of the W. hellenica R-53116 and WiKim14 genomes using pairwise alignment (Figure 2A) and identified the strain-specific and commonly shared genes (Figure 2B). We found that 1377 genes were shared by all three strains; the 0916-4-2 genome shared a larger set of genes with the WiKim14 genome than with the R-53116 genome, and 133 genes were unique to 0916-4-2. Interestingly, nearly half of the unique genes encoded either hypothetical proteins or proteins harboring domains of unknown function. Next, to provide an overview of the genus Weissella and perform a comparative genomic analysis, sequence data for 53 Weissella strains, including 16 complete genome assemblies, were obtained from NCBI. Of the 23 species recognized as of December 2018, these genomes represent only 16, and a complete genome is available for only seven species, including W. ceti, W. cibaria, W. jogaejeotgali, W. koreensis, W. paramesenteroides, and W. soli. The assembly accession numbers and assembly status are shown in Supplementary Table S2. To determine the genomic variability among Weissella species, we performed a comparative genomic analysis of all the genome assemblies. The set of commonly shared genes (core genes) in the genus was determined using EDGAR (Blom et al., 2016), and a whole genome phylogenetic tree was constructed. The tree grouped the various strains of each species into a single cluster (Figure 3). W. hellenica 0916-4-2 clustered with W. hellenica WiKim14, suggesting that these two strains might have similar biological properties. Mean AAI analysis of the core proteome showed [...] Table S1).
Among the interspecies comparisons, a high degree of similarity was obtained for W. jogaejeotgali FOL01 and W. thailandensis KCTC3751, with a 99% mean AAI (Table 2). The members of the genus Weissella are heterofermentative (Collins et al., 1993), which is attributed to the lack of phosphofructokinase (Sun et al., 2015). We did not find homologs of phosphofructokinase or lactate dehydrogenase in the 0916-4-2 genome. This suggested that 0916-4-2 lacks the ability to produce L-lactate, consistent with the presence of multiple copies of the D-2-hydroxyacid dehydrogenase gene, which is responsible for the metabolism of pyruvate through the D-lactate pathway. An analysis of the Weissella genomes indicated that the majority had genes for D-lactate formation, with W. viridescens, W. minor, W. confusa, W. cibaria, and Weissella sp. DD23 harboring genes for the production of both the D and L configurations of lactic acid (Table 3). This might explain the presence of a significant amount of D-lactic acid (Yoon et al., 2005) and the increase in lactic acid content at the late stage of kimchi fermentation (You et al., 2017).
W. hellenica was first described in 1993, when Collins et al. (1993) reported the inability of this species to utilize various carbohydrates, including D-cellobiose, D-raffinose, and gentiobiose, for acid production. We analyzed the carbohydrate utilization pattern and found that W. hellenica 0916-4-2 can utilize D-cellobiose and D-raffinose very well and gentiobiose partially (Supplementary Figure S1). This indicates strain-specific variation in carbohydrate fermentation within W. hellenica. Moreover, the metabolism of D-cellobiose and D-raffinose was consistent with the presence in the genome of the phosphotransferase system together with β-glucosidase and α-galactosidase genes. Raffinose, present in various cruciferous vegetables including cabbage (Santarius and Milde, 1977), is not metabolized by humans and causes intestinal disorders such as flatulence (Rackis, 1981). Our study indicates that vegetable fermentation using the 0916-4-2 strain might enhance the nutritional value of products such as kimchi.

Core Genome Analysis of Weissella Species
Among the 53 Weissella strains analyzed, the core and pan genomes comprised 88 and 11,519 genes, respectively (Figure 4). Of the 88 core genes, three were categorized as hypothetical; a COG analysis of the remaining 85 showed that more than a third were involved in translation and nearly 13% had unknown function (Table 4). Further, with the addition of each species, the sizes of the core and pan genomes changed markedly, suggesting great genomic variation among species within the genus, which could result from genomic fluidity or from significant gene gain/loss during adaptation to natural niches.
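The core and pan genome definitions used here reduce to set operations once genes have been assigned to ortholog groups: the core genome is the intersection of all per-strain gene sets, the pan genome is their union, and strain-specific genes are those found in no other strain. The sketch below illustrates this with invented data. It assumes ortholog assignment has already been done (in this study by EDGAR), and the strain names and gene identifiers are hypothetical.

```python
def core_and_pan(gene_sets):
    """Return (core, pan): genes present in every strain, and genes present in any strain."""
    genomes = list(gene_sets.values())
    core = set.intersection(*genomes)
    pan = set.union(*genomes)
    return core, pan


def unique_genes(gene_sets, strain):
    """Genes found in `strain` and in no other strain."""
    others = set().union(*(g for s, g in gene_sets.items() if s != strain))
    return gene_sets[strain] - others


# Toy ortholog-group assignments for three hypothetical strains.
toy = {
    "strain1": {"rpsA", "rplB", "gyrB", "bacA"},
    "strain2": {"rpsA", "rplB", "gyrB", "nrpS"},
    "strain3": {"rpsA", "rplB", "lasA"},
}

core, pan = core_and_pan(toy)
print(len(core), len(pan), sorted(unique_genes(toy, "strain1")))
```

Adding a divergent genome can only shrink the intersection and grow the union, which is why the core genome contracts and the pan genome expands as more species are included.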

Virulence Factors in Weissella Genome and W. hellenica 0916-4-2
Based on the BLAST search against the core virulence factors from the Virulence Factor Database (Chen et al., 2005; Liu et al., 2018), we did not find homologs associated with toxin production systems, including the botulinum neurotoxin homolog from W. oryzae SG25 (Mansfield et al., 2015; Zornetta et al., 2016), in the analyzed Weissella genomes. We detected some genes involved in virulence in various bacteria: hasC (UDP-glucose pyrophosphorylase), cpsI (UDP-galactopyranose mutase), gnd (6-phosphogluconate dehydrogenase), cpsF, and clpE (ATP-dependent protease). Each of these genes constitutes part of a pathway, indicating that the existence of a single gene is not sufficient to exert virulence. This suggests the need for a detailed investigation of the role of these genes in the functional traits of Weissella. Interestingly, all strains except Weissella sp. DD23 harbored hasC, and no species-specific virulence gene was found. We further examined the pathogenicity of W. hellenica 0916-4-2 by intraperitoneal injection into five mice. No mice died within 5 days post-injection, suggesting that the strain is non-pathogenic (data not shown). Although some members of the genus Weissella are opportunistic pathogens, there was no clear demarcation between probiotic and pathogenic strains among those analyzed in this report, suggesting the need for a detailed investigation of Weissella pathogenicity.

Bacteriophages and Phage Defense
Phages are a critical part of bacterial genomes and confer various beneficial traits on bacteria, such as adaptation to new environmental niches and acquisition of resistance. We used PHASTER (Arndt et al., 2016) to search for putative prophage elements in the complete and draft Weissella genomes and identified a total of 127 phages, of which 44 were considered intact (Supplementary Table S1); the number of phages per strain ranged from 1 to 9. The large number of incomplete phages might be due to the draft nature of the assemblies, as phages generally contain repeated sequences. The intact phages from Weissella ranged from 14 to 74 kb in length, and the majority were 20-40 kb long (Figure 5A). To identify potential pathogenic genes in the intact phages, we searched for virulence factors and antibiotic resistance genes. We did not find any virulence factors in the phages, and a search of the CARD database (McArthur et al., 2013) using perfect and strict hits did not find any genes responsible for drug resistance. All the intact phages were further analyzed using VICTOR (Meier-Kolthoff and Göker, 2017), and a phylogenetic tree was created. Based on this analysis, we classified the phages into five groups. Of the two intact phages present in W. hellenica 0916-4-2, one clustered with the phage from W. jogaejeotgali FOL01 and the other fell into a relatively distinct cluster (Figure 5B). Interestingly, two separate strains of W. soli (CECT7031 and KACC11848) and of W. cibaria (AM27-22LB and AM27-24) shared a common phage, suggesting a very close intraspecies relationship.

Secondary Metabolic Potential
The genome assemblies were analyzed using antiSMASH (Blin et al., 2017) for the possible presence of gene clusters encoding secondary metabolites. We found a total of 10 predicted secondary metabolite gene clusters, classified as arylpolyene (1), bacteriocin (4), lassopeptide (2), NRPS (1), and thiopeptide (2) (Table 5). Given that only 9 of the 53 strains harbored gene clusters for secondary metabolites, the genus Weissella can be regarded as a low producer of secondary metabolites. Further, these data were compared with those of other families of the order Lactobacillales. The genome assemblies were downloaded from NCBI, and bioinformatic analysis for the presence of secondary metabolite biosynthetic gene clusters was performed. Bacteriocins constituted the major class of metabolites predicted to be encoded in the genomes. Furthermore, in contrast to Weissella, the other members of the family Leuconostocaceae were found to harbor about one secondary metabolite gene cluster per genome (Table 6). Among all the Lactobacillales, Lactococcus and Streptococcus were found to be relatively high producers of metabolites, with about three gene clusters per genome.
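To put the Weissella counts on the same scale as the per-genome figures quoted for the other genera, the cluster counts above can be converted to an average number of clusters per genome. The short calculation below uses only the numbers stated in this section; the variable names are illustrative.

```python
# Predicted cluster counts per class, as listed in the text.
weissella_clusters = {
    "arylpolyene": 1,
    "bacteriocin": 4,
    "lassopeptide": 2,
    "NRPS": 1,
    "thiopeptide": 2,
}
N_GENOMES = 53  # Weissella strains analyzed

total_clusters = sum(weissella_clusters.values())
clusters_per_genome = total_clusters / N_GENOMES

print(f"{total_clusters} clusters / {N_GENOMES} genomes = {clusters_per_genome:.2f} per genome")
```

The result, roughly 0.2 clusters per genome, is well below the approximate values of one per genome for other Leuconostocaceae and three per genome for Lactococcus and Streptococcus.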

SUMMARY
The finished genome of W. hellenica 0916-4-2 is 1.93 Mb and consists of a chromosome and two plasmids. We found that this strain can utilize D-cellobiose, a cellulose derivative, and harbors two putative secondary metabolite biosynthetic gene clusters. The ability of 0916-4-2 to utilize raffinose can be exploited to enhance the nutritional value of fermented vegetables. Comparative genomic analysis revealed a high degree of genomic variation among Weissella species. In the current post-genomic era, genome sequence data are becoming increasingly available for diverse bacterial species, including those of Weissella, a genus with both probiotic and opportunistic pathogenic potential; future research should focus on detailed investigation of these genomes and on associating specific gene(s) with functional traits.

DATA AVAILABILITY
The complete genome assembly of Weissella hellenica 0916-4-2 has been deposited at DDBJ/ENA/GenBank with accession numbers: CP033608, CP033609, and CP033610 for chromosome, pWHSP041, and pWHSP020, respectively. The BioProject accession number for this project is: PRJNA503947.

AUTHOR CONTRIBUTIONS
SP, HH, and KS designed the study. SP and AP performed the genome sequencing and annotation. SP, AP, and JB performed the comparative genomic analysis. SP and AP wrote the manuscript. HH and KS integrated the research and critically revised the manuscript for important intellectual content. KS approved the final version of the manuscript.