Molecular Evolutionary Analyses of the RNA-Dependent RNA Polymerase Region in Norovirus Genogroup II

Noroviruses are the leading cause of viral gastroenteritis in humans across the world. RNA-dependent RNA polymerase (RdRp) plays a critical role in the replication of the viral genome. Although there have been some reports on a limited number of genotypes with respect to the norovirus evolution of the RdRp region, no comprehensive molecular evolution examination of the norovirus GII genotype has yet been undertaken. Therefore, we conducted an evolutionary analysis of the 25 genotypes of the norovirus GII RdRp region (full-length), collected globally using different bioinformatics technologies. The time-scaled phylogenetic tree, generated using the Bayesian Markov Chain Monte Carlo (MCMC) method, indicated that the common ancestor of GII diverged from GIV around 1443 CE [95% highest posterior density (HPD), 1336–1542]. The GII RdRp region emerged around 1731 CE (95% HPD, 1703–1757), forming three lineages. The evolutionary rate of the RdRp region of the norovirus GII strains was estimated at over 10−3 substitutions/site/year. The evolutionary rates were significantly distinct in each genotype. The composition of the phylogenetic distances differed among the strains for each genotype. Furthermore, we mapped the negative selection sites on the RdRp protein and many of these were predicted in the GII.P4 RdRp proteins. The phylodynamics of GII.P4, GII.P12, GII.P16, and GII.Pe showed that their effective population sizes increased during the period from 2003 to 2014. Our results cumulatively suggest that the RdRp region of the norovirus GII rapidly and uniquely evolved with a high divergence similar to that of the norovirus VP1 gene.


INTRODUCTION
Noroviruses belonging to the family Caliciviridae, genus Norovirus, are a major causative agent of gastroenteritis (Green, 2013;Robilotti et al., 2015). Accumulating epidemiological evidence suggests that norovirus infections affect both children and adults and that, globally, this agent causes approximately 60% of all viral gastroenteritis cases and 200,000 deaths in children under 5 years old (Robilotti et al., 2015). Moreover, noroviruses cause large-scale food poisoning outbreaks throughout the world (Le Guyader et al., 2008;Räsänen et al., 2010;Hardstaff et al., 2018).
Noroviruses are classified into seven genogroups (GI-GVII) according to the norovirus VP1 gene sequences (Vinjé, 2015). Among them, GI, GII and GIV viruses are known to infect humans. Furthermore, GI and GII viruses are classified into 9 and 22 genotypes, respectively, by their VP1 gene sequences (Vinjé, 2015). The norovirus genome encodes three open reading frames (ORF). Of these, ORF1 encodes seven genes, including that of the RNA-dependent RNA polymerase (RdRp) region. The RdRp region plays a critical role in the norovirus genome replication.
Previous reports have suggested that the GII.P2 and GII.P16 RdRp regions exhibit a large genetic divergence as does the VP1 gene encoded in ORF2 (Nagasawa et al., 2018a). The RdRp genotypes of GII viruses are currently classified into nearly 30 genotypes on the basis of their RdRp region sequences (Vinjé, 2015;Chhabra et al., 2018). Furthermore, recombination of the norovirus genome frequently occurs at ORF1 and at two junctions, resulting in the formation of new chimeric viruses (Bull et al., 2005;Bull et al., 2007). For example, the ORF1 of the GII.2 virus primarily encodes two distinct types of RdRp proteins, which are produced by GII.P2 and GII.P16 (GII.P2-GII.2 and GII.P16-GII.2, a chimeric virus; Bidalot et al., 2017;Lu et al., 2017;Mizukoshi et al., 2017;Niendorf et al., 2017). Notably, a previous report revealed the VP1 gene of the GII.2 virus encoding two distinct RdRp genotypes reside in different clusters (Mizukoshi et al., 2017). Furthermore, the evolutionary rate of the VP1 gene in these viruses differs significantly (Mizukoshi et al., 2017). Thus, it is possible that the RdRp region modulates the evolution of the VP1 gene (Mizukoshi et al., 2017). These observations imply that it is important to study the evolution of the RdRp region of GII viruses because little information is presently available. Moreover, GII viruses representing multiple genotypes may represent the major causative agents of gastroenteritis.
Recently developed bioinformatics methods have enabled the analysis of the evolution of various viral genes, including those of the norovirus (Kimura et al., 2015;Kobayashi et al., 2015;Takahashi et al., 2018). Here, we conducted a detailed molecular evolutionary analysis of the RdRp region in the norovirus GII viruses on the basis of the globally collected strain sequences.

Strain Selection
Full-length nucleotide sequences of the norovirus GII RdRp region were collected from the GenBank 1 (accessed on May 10, 2018). The classification of the strains was conducted using a norovirus genotyping tool (Kroneman et al., 2011) and all the sequences of the GII genogroup were selected. Strains with an unknown collection year and ambiguous sequences with undetermined base sequences (e.g., N, Y, and V) were omitted from the dataset. At this point, the dataset was comprised 1 http://www.ncbi.nlm.nih.gov/genbank of approximately 1,500 strains of the RdRp region sequences. However, a selective pressure analysis could not be performed owing to software capacity limitations. The nucleotide identity among the 1,500 RdRp region sequences was calculated using Clustal Omega (Sievers et al., 2011). One sequence was randomly selected from a group of homologous sequences with ≥99.4% identities while the others were excluded from the dataset. In addition, the dataset was subjected to a protocol for the detection of recombination within the RdRp region using RDP4.95 software with seven primary exploratory recombination signal detection methods (RDP, GENECONV, BOOTSCAN/RESCAN, MAXCHI, CHIMAERA, SISCAN, and 3SEQ; Martin et al., 2015). The threshold of the p-value for significance was set to 0.001. The recombinant regions were defined as those that were detected by more than four of these methods, which were then removed from the dataset. Upon completing this refinement, 484 strains were finally used in this study (Supplementary Table S1). The sequences in the dataset were aligned with the MAFFT software (Katoh and Standley, 2013).

Construction of a Time-Scaled Phylogenetic Tree Using the Bayesian Markov Chain Monte Carlo (MCMC) Method
Time-scaled phylogenies were constructed by the Bayesian MCMC method in the BEAST software package v2.4.8 (Drummond and Rambaut, 2007;Bouckaert et al., 2014). In order to estimate the phylogenetic relationships in the RdRp region between distinct norovirus genogroups, we added the nucleotide sequences of the human norovirus GI, porcine GII (GII.P11 and GII.P18), bovine GIII and human GIV strains to the dataset (yielding a total of 489 strains). First, we determined the best substitution model (GTR+I+ ) using the jModelTest2 software (Guindon and Gascuel, 2003;Darriba et al., 2012). Next, the best of four clock models (i.e., strict clock, relaxed clock exponential, relaxed clock log normal and random local clock) and two tree prior models (i.e., coalescent constant population and coalescent exponential population) was selected by path sampling/stepping stone-sampling marginal-likelihood estimation (Baele et al., 2012). As a result, the dataset was analyzed using a strict clock and tree prior of coalescent constant population. The MCMC was run on chain lengths of 100,000,000 steps with sampling at every 2,000 steps. The analyzed data were then evaluated by the effective sample size using Tracer 2 software and values larger than 200 were accepted. The maximum clade credibility trees were created by discarding the first 10% of the trees (burn-in) using the TreeAnnotator v2.4.8 in the BEAST2 package. The time-scaled phylogenetic trees were visualized using the FigTree 3 v1.4.0 software. The reliability of branches was supported by the 95% HPD interval. Moreover, the evolutionary rates for the RdRp genotypes of the norovirus GII, including more than 10 strains (GII.P4, P7, P12, P16, P21, and Pe), were also estimated using the appropriate models determined from the datasets described above.

Calculation of Phylogenetic Distance
Phylogenetic trees were created from the datasets of the norovirus GII genotypes and each RdRp genotype, including more than 10 strains, based on the maximum likelihood (ML) method in the MEGA7 software package (Kumar et al., 2016). The best substitution models were determined using the jModelTest2. The phylogenetic distances between the norovirus GII strains were calculated from the pairwise ML distance of the ML tree using Patristic software (Fourment and Gibbs, 2006).  Sali, 2014, 2016). The crystal structures of the GII.P4 strain RdRp protein (PDB ID: 1SH0) were used as the template for homology modeling. The amino acid sequences of the template and target strains were aligned using the MAFFTash software (Standley et al., 2007;Katoh et al., 2009). The constructed structures were minimized using GROMOS96 (van Gunsteren et al., 1996) implemented in the Swiss PDB Viewer v4.1 (Guex and Peitsch, 1997) and structural reliability was evaluated using Ramachandran plots through the RAMPAGE server (Lovell et al., 2003), resulting in favored regions of 97.7 ± 0.44%, allowed regions of 1.9 ± 0.43% and outlier regions of 0.4 ± 0.09% [mean ± standard deviation (SD)] for all residues in each structure. Non-synonymous (dN) and synonymous (dS) substitution rates at each codon were calculated using the Datamonkey server to estimate the positive and negative selection sites in the norovirus GII RdRp regions and in each genotype containing more than three strains (Pond and Frost, 2005;Delport et al., 2010). Sites under positive and negative selection were assigned by three methods [singlelikelihood ancestor counting (SLAC), fixed effects likelihood (FEL), and internal fixed effects likelihood (IFEL)], using a significance level of p < 0.05. A two-tailed extended binomial distribution was used to calculate the p-value for SLAC. The FEL and IFEL were based on a single-degree-offreedom likelihood ratio test (using an asymptotic chi-squared distribution) for classifying a site as positively or negatively selected. The final structural models were modified and colored using the Chimera v1.13 software (Pettersen et al., 2004). The substitution sites of the other RdRp proteins, compared to a prototype GII.P8 strain (accession no. AB039780) and the negative selection at these sites were mapped onto the structure.

Bayesian Skyline Plot Analysis
The genealogical population size of the norovirus strains was estimated using the Bayesian skyline plot algorithm with BEAST v2.4.8. Appropriate substitution and clock models were selected as described above. The analyzed plots were visualized with 95% HPD using Tracer 2 .

Statistical Analyses
Statistical analyses were conducted using EZR statistical software on the Kruskal-Wallis test with multiple comparisons and the Holm test for evolutionary rates and phylogenetic distances (Kanda, 2013). Detailed statistical data are presented in Supplementary Tables S2, S3.

Phylogenetic Distances of the RdRp Region in the Norovirus GII Strains
We also analyzed the phylogenetic distances of the RdRp region in the norovirus GII strains, which averaged 0.549 ± 0.486 (mean ± SD, Figure 3A). The phylogenetic distances of GII.P4 and GII.P7 were 0.084 ± 0.044 and 0.124 ± 0.056, respectively. These histograms were distributions with a broad range of the distances (Figures 3B,C). Furthermore, the phylogenetic distance of GII.P12 was 0.058 ± 0.036 ( Figure 3D). The phylogenetic distance of GII.P16 was 0.064 ± 0.063, and the histogram was bimodal ( Figure 3E). Moreover, the phylogenetic distance of GII.P21 was 0.053 ± 0.025 ( Figure 3F). The phylogenetic distance of GII.Pe was 0.032 ± 0.023, and the histogram was bimodal (Figure 3G). The phylogenetic distances of each genotype differed significantly between the norovirus GII strains (p < 0.05 or p < 0.001), excluding those between GII.P12

Mapping and Structure of Amino Acid Substitutions and Negative Selection Sites on the Norovirus GII RdRp Protein
We mapped the substitution sites of the other RdRp proteins using GII.P8 (a prototype, accession no. AB039780) as a reference strain, because GII.P8 was the first to diverge from the common ancestor of the norovirus GII polymerase region (Supplementary Figure S1). Several amino acid substitutions were predicted on the exterior structures of RdRp proteins, but no substitution was found adjacent to the active site of the enzyme (Figure 4 and Supplementary Figure S2). Five common amino acid substitutions (Met160Val, Ile221Met, Leu227Ile, Gln383Glu, and Arg408Gln) were detected in all genotypes of the RdRp protein.
We first mapped the negative selection sites and amino acid substitution sites on the RdRp protein structure of the norovirus GII viruses. Of the 171 substitution sites per monomer, 158 were estimated to be negative selection sites between the prototype GII.P8 strain and other strains, according to the amino acid residues in the dataset of the norovirus GII strains ( Table 2). Furthermore, of the negative selection sites in each genotype, the GII.P4, GII.P7, and GII.P16 strains contained 49, 7 and 12 sites per monomer, respectively ( Table 2 and Supplementary  Table S4). There were five or fewer negative selection sites per monomer in other genotypes (GII.P1, P12, P21, P22, and Pe), while no negative selection sites were found in the GII.P2, P3, P6, P17, P23, Pc, and Pg strains ( Table 2 and Supplementary  Table S4). No positive selection sites were predicted in all the norovirus GII strains and each genotype strain (data not shown).

Phylodynamics of the RdRp Region in the Norovirus GII Strains Using the BSP Method
We analyzed the phylodynamics of the norovirus GII RdRp region using the BSP method; the detailed parameters are shown in Supplementary Table S5. The mean effective population size remained constant until approximately 1990 in the    Figure 5B). The mean effective population sizes of GII.P12, GII.P16, and GII.Pe increased approximately between 2003 and 2004, 2014 and 2015, and 2009 and 2011, respectively, while no size changes were observed in the other polymerase genotypes (Figures 5C-G). In addition, the effective population sizes of the norovirus GII strains were estimated to have been maintained at over 10 2 for approximately 250 years ( Figure 5A).

DISCUSSION
Molecular evolutionary analyses of the norovirus RdRp region have been reported in some previous studies (Siebenga et al., 2010;Matsushima et al., 2015;Nagasawa et al., 2018a). For example, Siebenga et al. (2010) mainly studied the GII.P4 RdRp region, while Matsushima et al. (2015) studied the RdRp region and the VP1 gene of GII.P17-GII.17. Furthermore, Nagasawa et al. (2018a) analyzed the RdRp region and the VP1 gene of GII.P2-GII.2 and GII.P16-GII.2. To the best of our knowledge, this is the first comprehensive molecular evolutionary study of a large number of polymerase genotypes (25) from globally collected strains. Our cumulative findings suggest that (i) the common ancestor of the RdRp region of the analyzed strains, including GI, GII, GIII, and GIV, diverged around 1150 years ago (867 CE) and that the GII RdRp region diverged from GIV around 570 years ago (1443 CE) to form three major lineages of the norovirus GII strains (Figure 1); (ii) the RdRp region evolved rapidly (>10 −3 substitutions/site/year) and the evolutionary rates of the different genotypes were significantly different ( Figure 2); (iii) several amino acid substitutions of the RdRp protein (39-107 sites) were detected in various genotypes, while negative selection sites were distinct in each genotype (Figure 4 and Table 2); (iv) the phylodynamics of the GII RdRp region differed between the genotypes. These results suggest that the norovirus GII RdRp region evolved rapidly with several amino acid substitutions. Furthermore, the evolutionary mechanism of each genotype may be different.
The time-scaled evolutionary tree of the GII RdRp region using the Bayesian MCMC method showed similar evolutionary trends to the one produced using the GII VP1 gene (Kobayashi et al., 2016). Specifically, Kobayashi et al. (2016) reported that the common ancestor of the GI, GII, GIII, and GIV VP1 gene diverged around 1160 years ago (854 CE), while the common ancestor of the GII and GIV diverged around 570 years ago (1445 CE) and formed three major lineages of the norovirus GII. However, the divergence years of the norovirus GII RdRp region and the VP1 gene were different (290 vs. 380 years ago). Although the datasets differed between the previous and present studies, timescaled evolution of the region and the gene might fall into step. Several evolutionary studies of the norovirus genes have been reported, although these mainly focused on the VP1 (capsid) gene (Bok et al., 2009;Boon et al., 2011;Qiao et al., 2016;Motoya et al., 2017;Hernandez et al., 2018;Sang and Yang, 2018). However, some studies focused on the evolution of the norovirus RdRp region (Siebenga et al., 2010;Mahar et al., 2013;Nagasawa et al., 2018a). For example, Siebenga et al. (2010) estimated that, based on partial sequences (247 nt), the evolutionary rate of the GII.4 RdRp region was 8.98 × 10 −3 substitutions/site/year. Another report showed that the rate for the GII.3 RdRp region (274 nt) was 2.79 × 10 −3 substitutions/site/year (Mahar et al., 2013). Furthermore, Nagasawa et al. (2018a) estimated that the rate for the full-length GII.P16 was 2.03 × 10 −3 substitutions/site/year. However, there have been no evolutionary studies on the various norovirus genotypes in the RdRp region; therefore, it was important to comprehensively study the evolution of the norovirus GII RdRp region from globally collected strains. In this study, we showed that the GII RdRp region evolved rapidly (>10 −3 substitutions/site/year) and that distinct evolutionary rates occurred in each genotype (Figure 2). These new findings may contribute to a more complete understanding of the genetic properties of the norovirus GII RdRp region.
We also analyzed the phylogenetic distances in the present strains (Figure 3) and found that the composition of the distances differed among the strains for each genotype (Figures 3B-G). The histograms of the distances of the GII.P4 and GII.P7 genotypes showed broad range distributions, suggesting a larger genetic divergence than that in other genotypes. The histograms of the distances of the other genotypes showed narrow-range distributions and suggested that they were a small genetic divergence. Thus, the GII RdRp region exhibits a varied genetic divergence, as does the GII VP1 gene (Kobayashi et al., 2016;Motoya et al., 2017).
We then estimated the negative selection sites in the RdRp protein in silico and mapped these sites. Several possible negative selection sites were identified in the RdRp protein of each genotype (Figure 4 and Supplementary Figure S2). Of these, many of the negative selection sites (49 sites per monomer) at the amino acid substitution sites, compared to the reference strain (GII.P8), were found in the GII.P4 RdRp protein. In general, negative selection sites may play a role in preventing the loss of viral protein function, because most mutations are deleterious (Domingo, 2006). In addition, other data suggest that amino acid substitutions (291Thr or 291Val) enhance the GII RdRp activity (Bull et al., 2010). The present data suggest that such substitutions occurred in the GII.P4 protein (291Thr) and that the GII.P4 may, therefore, be more efficient at replicating the viral genome, although we did not examine its activity in vitro in this study. Epidemiological data suggest that the GII.P4-GII.4 was responsible for the pandemics of the gastroenteritis between 2006 and 2014 (Kumazaki and Usuku, 2015). A high replication efficacy of the GII.P4 RdRp protein may have been associated with these pandemics, although further studies are required to test this hypothesis.
Finally, we also assessed the phylodynamics of various genotypes of the GII RdRp region (Figure 5). The phylodynamics of GII.P4, GII.P12, GII.P16, and GII.Pe fluctuated at certain times. For example, GII.P4 increased at around 2004-2008, while GII.P12, GII.P16 and GII.Pe increased at around 2003-2004, 2014-2015, and 2009-2011, respectively. We determined the year when the population size of the polymerase type increased to the year in which the VP1 genotypes of the strains were obtained. In the genotypes of GII.P4, GII.P12, and GII.Pe, the collection year of GII.4 was almost consistent with the year when the population size increased. However, in the GII.P16, the collection years of GII.2 and GII.4 agreed with the year of increase of the population size. The collection year of these detected genotypes was consistent with those of previous epidemiological reports on the GII.2 and GII.4 (Okada et al., 2006;Motomura et al., 2008;Chen et al., 2015;Nagasawa et al., 2018b). A similar phenomenon was previously reported in the phylodynamics of the GII VP1 gene (Kobayashi et al., 2016;Sang and Yang, 2018). Therefore, information about the phylodynamics of norovirus genes may contribute to understanding the past prevalence of each norovirus genotype.
In this study, a limited number of GII RdRp sequences from the GenBank 1 were used, which may have led to some selection bias. In addition, sequences with ≥99.4% identity were omitted. These biases may have affected the results of the bioinformatics analyses. In particular, the result of the phylogenetic distance may have been biased by analyzing the dataset of all sequences and each genotype excluding ≥99.4% identity sequences.

CONCLUSION
In conclusion, the common ancestor of the GII RdRp region diverged around 290 years ago (1731 CE) with a high evolutionary rate (over 10 −3 substitutions/site/year). As a result, the GII RdRp region and proteins exhibit high diversity. Moreover, the phylodynamics of some RdRp genotypes have changed drastically over the past 10 years. These results should contribute to a better understanding of the norovirus virology.

AUTHOR CONTRIBUTIONS
KO, KK, and HK designed the study. KO, YM, KN, TM, MK, and HK analyzed the data. KO, YM, AR, KK, and HK wrote and supervised the study. All authors read and approved the manuscript.

FUNDING
This work was partly supported by a commissioned project for Research on Emerging and Re-emerging Infectious Diseases from the Japan Agency for Medical Research and Development, AMED (Grant No. 18fk0108033h0002).