Molecular Evolution of the VP1 Gene in Human Norovirus GII.4 Variants in 1974–2015

Human norovirus (HuNoV) is a leading cause of viral gastroenteritis worldwide, of which GII.4 is the most predominant genotype. Unlike other genotypes, GII.4 has created various variants that escaped from previously acquired immunity of the host and caused repeated epidemics. However, the molecular evolutionary differences among all GII.4 variants, including recently discovered strains, have not been elucidated. Thus, we conducted a series of bioinformatic analyses using numerous, globally collected, full-length GII.4 major capsid (VP1) gene sequences (466 strains) to compare the evolutionary patterns among GII.4 variants. The time-scaled phylogenetic tree constructed using the Bayesian Markov chain Monte Carlo (MCMC) method showed that the common ancestor of the GII.4 VP1 gene diverged from GII.20 in 1840. The GII.4 genotype emerged in 1932, and then formed seven clusters including 14 known variants after 1980. The evolutionary rate of GII.4 strains was estimated to be 7.68 × 10−3 substitutions/site/year. The evolutionary rates probably differed among variants as well as domains [protruding 1 (P1), shell, and P2 domains]. The Osaka 2007 variant strains probably contained more nucleotide substitutions than any other variant. Few conformational epitopes were located in the shell and P1 domains, although most were contained in the P2 domain, which, as previously established, is associated with attachment to host factors and antigenicity. We found that positive selection sites for the whole GII.4 genotype existed in the shell and P1 domains, while Den Haag 2006b, New Orleans 2009, and Sydney 2012 variants were under positive selection in the P2 domain. Amino acid substitutions overlapped with putative epitopes or were located around the epitopes in the P2 domain. The effective population sizes of the present strains increased stepwise for Den Haag 2006b, New Orleans 2009, and Sydney 2012 variants. These results suggest that HuNoV GII.4 rapidly evolved in a few decades, created various variants, and altered its evolutionary rate and antigenicity.

The HuNoV major capsid (VP1) protein is considered to be closely associated with the infectivity and antigenicity of these strains . Many previous reports suggested that the HuNoV VP1 gene rapidly evolved, resulting in a large divergence of antigenicity (Allen et al., 2008;Bok et al., 2009;Yang et al., 2010;Debbink et al., 2012a;Zakikhany et al., 2012). Earlier findings also indicated that the rapid evolution of the VP1 gene in HuNoV is strongly associated with gastroenteritis pandemics caused by HuNoV (Robilotti et al., 2015).
Various bioinformatic techniques for investigating viral genes may contribute to a better understanding of the evolution of the viruses (Andernach et al., 2013;Añez et al., 2013;Sun et al., 2017;Wei and Li, 2017). Of these, phylogenetic analyses using the Bayesian Markov chain Monte Carlo (MCMC) method can characterize evolutionary processes by estimating the date of divergence and evolutionary rates using sequences with a known sample-collection date for various organisms (Drummond and Bouckaert, 2015). Additionally, the Bayesian skyline plot (BSP) allows prediction of the transition of the infectious population size for pathogens (Hall et al., 2016). Conformational epitope analyses using in silico approaches also enable us to identify and map binding sites of host antibodies on three-dimensional structures of proteins (Potocnakova et al., 2016). For HuNoV, a large number of studies on viral evolution have been performed using bioinformatics (Giammanco et al., 2014;Kimura et al., 2015;Kobayashi et al., 2015Kobayashi et al., , 2016Lu et al., 2016;Parra et al., 2017;Siqueira et al., 2017;Tohma et al., 2017). Previous reports described the evolutionary rates of some HuNoV genotypes using the Bayesian MCMC method and linear regression analyses, which suggested differences in such rates among the genotypes . In addition, the blockade epitopes on the VP1 protein in some HuNoV genotypes were identified by amino acid variation analyses and molecular biological techniques using the virus-like particle (VLP) (Lindesmith et al., 2014). Previous phylogenetic approaches also showed that GII.4 continuously created genetically different clusters, which resulted in the emergence of many variants (Fioretti et al., 2014;Qiao et al., 2016). However, the differences of evolutionary features among all GII.4 variants, including recently discovered strains, have not been comprehensively elucidated (Bok et al., 2009;Siebenga et al., 2010;Qiao et al., 2016). It may be important to understand why GII.4 has frequently produced escape from previously acquired immunity and repeatedly caused large outbreaks. Moreover, a recent study constructed a system for predicting the HuNoV genotype composition in the peak season using a unique algorithm, namely, NOROCAST (Suzuki et al., 2016). Thus, these bioinformatics technologies may contribute to our understanding of the evolution of the HuNoV GII.4 capsid gene, including the changes of antigenicity, for advancing the prevalent prediction systems for HuNoV. In the present study, using various bioinformatic techniques, we compared the molecular evolution of all GII.4 variants strains using globally collected full-length HuNoV GII.4 VP1 gene sequences.

Strains Used in this Study
We obtained the full-length nucleotide sequences of HuNoV VP1 gene from GenBank 1 in December 2015. The strains employed were classified using norovirus genotyping tool (Kroneman et al., 2011), and sequences of GII.4 genotype were selected. Of these, strains with ambiguous sequences and unknown year of collection were excluded from the dataset. At this point, complete GII.4 VP1 sequences of approximately 2,000 strains were included in the dataset. However, due to limitation in computing capacity, positive and negative selection analyses could not include sequences with greater than 99.0% identity, as it resulted in over 500 sequences. Thus, the VP1 sequences with >98.9% identity were omitted from the dataset. Moreover, the dataset was analyzed to detect recombination within the VP1 gene using RDP4.95 with seven primary exploratory recombination signal detection methods (RDP, GENECONV, BOOTSCAN/RECSCAN, MAXCHI, CHIMAERA, SISCAN, 3SEQ) (Martin et al., 2015). The p-value threshold for significance was set as 0.001. Recombinant regions was determined when it was detected by more than four of these methods. However, this dataset contained no recombinant sequences. Finally, 466 strains were included in this study ( Figure  S1, Table S1). The sequences of the dataset were aligned using MAFFT software (Katoh and Standley, 2013) and were then subjected to bioinformatic analysis as described below.
Time-Scaled Phylogenetic Tree Constructed Using the Bayesian Markov Chain Monte Carlo Method Molecular clock evolutionary dynamics was performed using the Bayesian MCMC method in the BEAST package v2.4.6 (Suchard et al., 2001;Drummond and Rambaut, 2007;Bouckaert et al., 2014). To construct the correct phylogenetic tree (Choudhuri, 2014), we added the nucleotide sequences of all GII genotypes, including porcine NoV GII (GII.11,GII.18,and GII.19) and other HuNoV GII genotypes (18 strains), and an outgroup strain, HuNoV GI genotype (GI.1) into the dataset mentioned above (a total of 488 strains). First, we selected the appropriate substitution model using the jModelTest2 program (Guindon and Gascuel, 2003;Darriba et al., 2012). Next, the best of four clock models (Strict clock, Exponential relaxed clock, Relaxed clock log normal, Random local clock) and two tree prior models (Coalescent constant population, Coalescent exponential population) were determined by the method of path-sampling/stepping stone-sampling marginallikelihood estimation. Finally, the dataset was analyzed using the strict clock and constant tree prior models. The MCMC chain lengths were 300,000,000 steps with sampling every 20,000 steps using high-performance computers [system: x3850X6, CPU: Intel(R) Xeon (R) CPU E7-8890 v3 @ 2.50GHz, memory: 1TB] with GPU [Tesla K20c; global memory: 5,061 MB, clock speed; 0.71 GHz, number of cores: 2,496]. The analyzed data were evaluated by the effective sample size using Tracer 2 , and values >200 were accepted. The maximum clade credibility tree was constructed after we discarded the first 10 or 15% ("burn-in") of the trees using the TreeAnnotator v2.4.6 in the BEAST 2 package. The phylogenetic tree was visualized by the FigTree v1.4.0 program. The reliability of branches was supported by 95% highest posterior densities (HPDs). Furthermore, the evolutionary rates of HuNoV GII.4, the variants including more than 10 strains (US95_96, Farmington Hills 2002, Asia 2003, Hunter 2004, Yerseke 2006a, Den Haag 2006b, Osaka 2007, Apeldoorn 2007, New Orleans 2009, and Sydney 2012 and the domains were also estimated using appropriate models selected for each dataset as described above.

Similarity Analysis
Similarities between the aligned nucleotide sequences and the Bristol 1993 variant were visualized using the SimPlot program (Lole et al., 1999). Two strains of Bristol 1993 variant (X76716 and FJ537137) were used as query sequences. The similarity was examined using a window size of 200 nucleotides in length (nt) and a step size of 20 nt in the full-length VP1 genes. 2 http://tree.bio.ed.ac.uk/software/tracer/

Calculation of the Phylogenetic Distance
Phylogenetic trees were generated from the datasets of all GII.4 strains and each GII.4 variant, including more than three strains, based on the maximum likelihood method in the MEGA6 software. The best substitution models were determined by the jModelTest2. The phylogenetic distances between GII.4 strains were calculated from the constructed phylogenetic trees by the Patristic program (Fourment and Gibbs, 2006).

Selective Pressure Analysis
Non-synonymous (dN) and synonymous (dS) substitutions rates at each codon were calculated by the Datamonkey server to estimate the positive and negative selection sites in capsid VP1 genes of HuNoV GII.4 and each GII.4 variant (Pond and Frost, 2005;Delport et al., 2010). Sites under positive (dN > dS) and negative (dN < dS) selection were determined by three methods (SLAC, FEL, and IFEL) based on a significance level of p < 0.05. For SLAC, the two-tail extended binomial distribution was used to culculate the p-value. The FEL and IFEL were based on the single degree of freedom likelihood ratio test (chi-squared asymptotic distribution was used) to classify a site as positively or negatively selected.  (Sweredoski and Baldi, 2008), EPCES (Liang et al., 2007), and EPSVR (Liang et al., 2010). Cut-off values of epitopes were set at −3.7 (DiscoTope 2.0), 1.3 (BEPro), and 70 (EPCES, EPSVR). Consensus sites in all of the four tools and regions with close residues over two of the sites on the VP1 dimer structures were determined as conformational epitopes.

Mapping of Positive Selection Sites, Predicted Epitopes, and Amino-Acid
Substitutions on the Three-Dimensional Structure of HuNoV GII.4 VP1 Proteins VP1 dimer structural models of each GII.4 variant were constructed using MODELLER v9.15 (Webb and Sali, 2014). The templates for homology modeling were based on the crystal structures of four strains (PDB ID: 1IHM, 4X07, 4OP7, and 4OPS). The capsid structure of GI (PDB ID: 1IHM) was used as a template for the construction of a shell domain in whole VP1 structures. Amino acid sequences of five templates and the targeted strains were aligned by MAFFTash (Katoh et al., 2002;Standley et al., 2007). The constructed models were minimized based on GROMOS96 (van Gunsteren et al., 1996), implemented by Swiss PDB Viewer v4.1 (Guex and Peitsch, 1997), and evaluated by Ramachandran plots through RAMPAGE server (Lovell et al., 2003). The final models were modified and colored by Chimera v1.11.2 (Pettersen et al., 2004). The positive selection sites, conformational epitopes of each variant, and substitution sites of the other variants to the Bristol 1993 strain were mapped on the structure.

Bayesian Skyline Plot Analysis
The genealogical population size of HuNoV GII.4 strains was estimated by BSP inference using BEAST v2.4.6. Appropriate substitution and clock models were selected as described above. Yerseke 2006a strains were analyzed in a short period for 1 year since the strains with >98.9% sequence identity were omitted. The analyzed plots were visualized with the 95% HPDs using Tracer.

Statistical Analyses
Statistical analyses were performed by EZR statistical software based on the Kruskal-Wallis test with multiple comparisons using the Holm test for the evolutionary rates and phylogenetic distance, and Fisher's exact test for the distribution of negative selection sites (Kanda, 2013). We obtained sample sizes of 18,000-135,000 for the statistical analyses of the evolutionary rates. The values obtained by dividing the numbers of chain lengths by that of the log parameters were used as sample sizes. For the phylogenetic distance analyses, we acquired 3-11,440 data that exhibited the number of all combinations between GII.4 strains included in the dataset (Mizukoshi et al., 2017). Detailed statistical data are shown in Tables S2-S4. Sample sizes of each group are also presented in Figures 2, 4.

Simplot Data Analyses of the Capsid VP1 Gene in the Present GII.4 Strains
We performed the SimPlot analysis based on the full-length capsid gene sequences in the present strains. As illustrated in Figure 3, similarities of 85-100% were found in the shell domain, whereas the P1 and P2 domains were characterized by lower similarities (75-95%). These results indicated that the genetic divergence of the VP1 gene in the GII.4 strains differed among domains.

Phylogenetic Distances of the Capsid VP1 Gene in the GII.4 Strains
We calculated the phylogenetic distance of the VP1 gene in the GII.4 strains studied here. The phylogenetic distance values of all GII.4 strains were 0.210 ± 0.105 (mean ± standard deviation) ( Figure 4A). The mean values of each variant strain were within the range 0.0163-0.0698, and the phylogenetic distances were significantly distinct between GII.4 variants. In particular, the Osaka 2007 variant strains probably included the highest number of nucleotide substitutions, whereas the branches of the strains of the Asia 2003 variant tended to be shorter than those of the other variants (Kruskal-Wallis test; p < 0.001) ( Figure 4B, Table  S4).

Mapping of Conformational Epitopes, Positive Selection Sites, and Amino-Acid Substitutions on the VP1 Structures of GII.4
We also comprehensively predicted the conformational epitope on the whole deduced GII.4 variants VP1 protein and mapped the epitopes, positive selection sites, and amino acid substitutions on the structures of the GII.4 strains to assess their relationships using in silico methods (Borley et al., 2013;Yao et al., 2013). Five regions were recognized as conformational epitopes, four of which were located in the exterior surface of the P2 domain and overlapped with the epitope regions identified by previous in vitro studies. Moreover, a number of amino-acid substitutions were found on/around the putative epitopes (Table 2, Figure 5, Figure S2). Positive selection sites for the dataset including all GII.4 strains were predicted to be amino acid (aa) 6 (Asn6Ser and Ser6Asn), aa9 (Asn9Ser, Thr, His, Lys, and Ser9Asn), and aa534 (Thr534Ala, Ser, and Ala534Val, Thr) positions, whereas no positive selection site was found in the P2 domain (Table 3). Moreover, a total of 369 residues were under negative selection, whose rates were significantly higher in the P1 domain than in the shell and P2 domains (p < 0.001) ( Table 5). The positive selection sites for the dataset separating each GII.4 variant was estimated to be aa393 (Ser393Gly, Asn, and Gly393Ser) and   Table 4). The Den Haag 2006b variant contained the most negative selection sites (59 sites), whereas seven GII.4 variants (Camberwell 1994, Kaiso 2003, Farmington Hills 2002, Lanzou 2002, Asia 2003, Hunter 2004, and Yerseke 2006a were not identified as negative selection sites ( Table 5).

Phylodynamics of the Capsid VP1 Gene in the Present GII.4 Strains
We examined the phylodynamics of human NoV GII.4 strains in the VP1 gene using the BSP analyses with the parameters shown in Table 1 respectively, whereas no changes of the sizes of the other variants were observed (Figures 6B-K). Moreover, the effective population sizes of all NoV GII.4 strains were estimated to be 10 2 for approximately 10 years ( Figure 6A). These results suggested that restricted sublineages of GII.4 increased the effective population sizes, and the GII.4 strains have adapted to humans for approximately 10 years.

DISCUSSION
In the present study, we show the molecular evolution of the capsid VP1 gene in globally collected HuNoV GII.4, including recent strains, during a period of approximately 40 years. Our new findings can be summarized as follows: (1) The evolutionary pattern, including evolutionary rates, phylogenetic distance, and infectious population size, may differ between GII.4 variants.
(2) The evolutionary rates and constraints on amino acid mutations are likely distinct between domains within GII.4 VP1.
(3) There may be few epitopes on the exterior surface of the VP1 structure other than those positions on the P2 domain, as previously identified. (4) Prevalent GII.4 variants specifically undergo selective pressure that leads to new variant that can escape the immune system of the host.
The Bayesian MCMC method is used to characterize evolutionary processes in terms of date of divergence and the rates of evolution for organism genes with high evolutionary rates, such as RNA viruses (Drummond and Bouckaert, 2015). This method has already been used to study molecular evolution not only for noroviruses but also for various other RNA viruses  Table S4. (Añez et al., 2013;Vijaykrishna et al., 2015). In particular, it is important to estimate nucleotide evolutionary rates for HuNoV since previous reports showed that the viruses could evolve with the distinct rates among the genotypes depending on difference of polymerase fidelity and immune pressure for each genotype (Bull et al., 2010). Estimation of evolutionary rates using the method may contribute to predict the frequency of emergence of variants that can escape from previously acquired host immunity in the HuNoV genotypes. However, the results of the method are affected by the sequences included in the dataset. In this study, the time-scaled evolutionary phylogenetic tree showed that the common ancestor of GII.4 and GII.20 dated back to about 175 years ago. Furthermore, during the past 40 years, the GII.4 strains evolved and formed seven clusters including 14 known variants (Figure 1). Next, we estimated the HuNoV GII.4 VP1 gene evolutionary rate as 7.68 × 10 −3 substitutions/site/year. Our Chains that are composed of the dimer structures are colored in gray (chain A) and dim gray (chain B). Predicted epitopes of each variant are colored in red and circled for regions. Positive selection sites are colored in blue for aa9, aa294, aa376, aa393, aa412, and aa534. Of note, aa6 could not be specified due to lack of structure modeling in N terminus. Amino-acid substitutions of the other variants to a GII.4 Bristol 1993 strain are colored in green.
observations are generally consistent with previous observations overall, with GII.4 having a higher evolutionary rate than other genotypes such as GII.7 (2.3 × 10 −3 substitutions/site/year), GII.3 (4.16 × 10 −3 substitutions/site/year), and GII.2 (1.31 × 10 −3 substitutions/site/year) (Bok et al., 2009;Bull et al., 2010;Siebenga et al., 2010;Boon et al., 2011;Fioretti et al., 2014; Qiao et al., 2016;Mizukoshi et al., 2017;Parra et al., 2017). In addition, we estimated the different evolutionary rates among GII.4 variants. The rates among domains (shell, P1, and P2 domains) in the VP1 gene also tended to be distinct, with that of the P2 domain being the highest (Figure 2). A previous report showed that GII.4 had replication rates of polymerase that differed among the variants (Bull et al., 2010). In addition, the prevalence of GII.4 differed among the variants (Vinjé, 2015), which may produce differences of herd immunity pressure for each GII.4 variant. We also performed similarity plot analyses in the GII.4 variant strains. Our data showed that the similarities of the shell domains were relatively high (85-100%), followed by a lower level in the P1 domains, and the lowest level in the P2 domains (75-95%), which is consistent with previous reports that showed similarities in the shell, P1, and P2 domains of around >85, >80, and >75%, respectively (Eden et al., 2013;Doerflinger et al., 2017 (Figure 4). These results suggest that GII.4 has uniquely evolved among these variants having different evolutionary rates. Earlier molecular biological and bioinformatic studies identified or estimated the blocking epitopes in the P2 domain within the GII.4 VP1 structure (Debbink et al., , 2013Lindesmith et al., 2012a,b;Parra et al., 2012;Giammanco et al., 2014). These studies were mainly focused on the P2 domain and based on the amino acid variations on the exterior surface in the domain that arose over time. However, other conformational epitopes may also be contained in the shell and P1 domains. Since amino acid sequences of the shell and P1 domains are conserved than those of the P2 domain, it may be possible to use the epitopes of the shell and P1 domains for the development of diagnostic tools for HuNoV, such as immunochromatography kits. Additionally, antibodies associated with antigenicity in positions different from previously identified epitopes may be produced by the B-cell response in the host. To the best of our knowledge, conformational epitope analyses of the whole VP1 structure, including the shell and P1 domains, of various GII.4 variants using mathematical methods have not yet been conducted. The epitopes predicted in our data were close to or overlapped with those positions identified by previous in vitro experiments using VLP and monoclonal antibodies . Furthermore, our prediction identified few epitopes, including in shell and P1 domains, other than the previously reported positions. This result may suggest difficulty in acquiring antibodies broadly responding to various GII.4 variants that can escape from previously acquired herd immunity. To the best of our knowledge, this is the first observation of this kind in this field. This study suggests the need for additional clarification of the immunology of the GII.4 VP1 protein. However, the present study was performed only in silico.
To obtain a better understanding of norovirus biology, further studies including in vitro epitope-mapping experiments for all antibodies produced by the host should be performed. Next, our analyses revealed that many common epitopes among the GII.4 variants strains were present in the P2 domain. Moreover, a number of substitutions of amino acids were detected in the conformational epitopes ( Table 2). Other in vitro studies using the HuNoV GII.4 VLP established that the antigenicity changes of the GII.4 were associated with the amino acid substitutions in the P2 domains of GII.4 VP1 proteins (White, 2014). Additionally, previous in vitro and bioinformatic studies predicted contact residues specific to antibodies in epitopes A (aa294-298, aa362, 368), B (aa333, aa382), C (aa340, aa376), D (aa393-395), and E (aa407, aa412-413). It has also been confirmed that epitope A is an antigenic region, whereas epitope D is associated with HBGA binding sites. Moreover, epitope C has been predicted to be an evolving epitope and was confirmed using a monoclonal antibody. Epitope E is associated with the potential blockade of Farmington Hills 2002 variant. However, there are no reports on epitope B based on in vitro experiments. Previous bioinformatic studies suggested that the five epitopes are associated with the production of antibodies against various GII.4 strains on the exterior surface in the P2 domain (Debbink et al., , 2013Lindesmith et al., 2012a,b;Parra et al., 2012;Giammanco et al., 2014;Garaicoechea et al., 2015). Thus, based on our results and those of other researchers, we propose that previously identified epitopes are robustly associated with the antigenicity of HuNoV GII.4.
We also estimated the selective pressures in the present GII.4 VP1 gene. The results obtained by the SLAC, FEL, and IFEL methods revealed positive selection for the whole GII.4 genotype at aa6, aa9, and aa534, which are located in the shell and P1 domains, whereas no positive selection sites were identified in the P2 domain (Table 3). However, separating for each GII.4 variant, positive selection sites in three prevalent GII.4 variants strains were predicted in the P2 domain (aa393 and aa412 in Den Haag 2006b, aa294, and aa376 in New Orleans 2009, and aa393 in Sydney 2012) ( Table 4). These sites mostly overlapped with the epitopes that were identified by our prediction and previous in vitro studies ( Table 2). Positive selection implies that non-synonymous substitutions facilitate the survival of organisms, such as escape from the immune response of the host (Kryazhimskiy and Plotkin, 2008). Moreover, a previous report predicted an association with positive selection of efficient RNA translation via secondary RNA structures (Siebenga et al., 2010). These results indicate that the prevalent GII.4 variants specifically undergoes strong selective pressure that may produce immune escape variants due to the numerous infectious populations with herd immunity.  Amino acids corresponding to epitope regions identified by Lindesmith et al. (2012a) were shown.
Frontiers in Microbiology | www.frontiersin.org  However, the importance of the positive selection in the shell and P1 domains requires further investigation by in vitro studies. Previous studies also estimated the positive selection sites in the GII.4 VP1 gene (Lindesmith et al., 2008;Bok et al., 2009;Siebenga et al., 2010;Vega et al., 2014b). For example, using the SLAC method, Bok et al. (2009) predicted six sites of positive selection (aa6, aa9, aa15, aa47, aa395, and aa534). However, the threshold for statistical significance in that study was higher than that in our study (p < 0.25). Siebenga et al. (2010) identified eight positive selection sites using FEL and IFEL (aa6, aa9, aa47, aa352, aa372, aa395, aa407, and aa534), which is partially consistent with our results. The partial discrepancy between our data and those in previous reports might have resulted from the difference in the datasets between studies. Moreover, our results revealed many sites under negative selection. In each domain, negative selection sites accounted for approximately 60-80%. The proportion of negative selection sites tended to be distinct between the domains in the VP1 gene (Table 5). However, low numbers of strains may result in no or few negative selection sites in some GII.4 variants. Negative selection exhibits non-synonymous substitutions reduce the survival of organisms. Namely, the sites should preserve synonymous substitutions to maintain the survival (Suzuki, 2004). These results suggest that a number of mutations in the VP1 gene may be under negative and neutral selection. Next, we mapped the conformational epitopes, positive selection, and amino acid substitutions on the VP1 structures in GII.4 variants. Most putative epitopes and substitutions were located on the exterior surface of the P2 domain in all variants. The epitopes overlapped with or were proximal to the substitution sites. Moreover, positive selection sites for shell and P1 domains existed at the exterior surface of the structure that was distal to the predicted epitopes. However, the positive selection sites for Den Haag 2006b, New Orleans 2009, and Sydney 2012 variants were located on the exterior surface of the epitopes (Figure 5). These results indicate that HuNoV GII.4 has evolved with frequent alteration of antigenicity. In addition, the circulation of GII.4 may also be associated with mechanisms other than immune escape.
In addition, we estimated the fluctuation in the size of infectious populations in HuNoV GII.4. The population size increased around 1994-1996and 2005. Moreover, Den Haag 2006b, New Orleans 2009, and Sydney 2012variants accumulated infectious populations in 2005-2007, 2004 (Figure 6). These results suggest that GII.4 has a capacity to repeatedly produce highly epidemic variants. A previous report showed evidence of higher affinity to a broad range of HBGA types in the Den Haag 2006b strains compared with that in other variants (de Rougemont et al., 2011). Thus, the prevalence of HuNoV GII.4 may be associated with multiple factors, such as the capacity to bind to host receptors and immune evasion.
Overall, this study utilized limited numbers of GII.4 VP1 sequences. For example, there were some biases among GII.4 variants in the amount of original data deposited in GenBank. Additionally, sequences with >98.9% identity were omitted. These biases may have affected the results of the bioinformatic analyses.
In conclusion, our findings show that the evolutionary pattern of GII.4 may be distinct among various variants. Moreover, our data may contribute to developing candidates of an efficient vaccine and prevalent prediction systems for HuNoV. In the near future, the accumulation of data from molecular epidemiological studies with continuous surveillance will be needed.