Predicting Directions of Changes in Genotype Proportions Between Norovirus Seasons in Japan

The norovirus forecasting system (NOROCAST) has been developed for predicting directions of changes in genotype proportions between human norovirus (HuNoV) seasons in Japan through modeling herd immunity to structural protein 1 (VP1). Here 404 nearly complete genomic sequences of HuNoV were analyzed to examine whether the performance of NOROCAST could be improved by modeling herd immunity to VP2 and non-structural proteins (NS) in addition to VP1. It was found that the applicability of NOROCAST may be extended by compensating for unavailable sequence data and observed genotype proportions of 0 in each season. Incorporation of herd immunity to VP2 and NS did not appear to improve the performance of NOROCAST, suggesting that VP1 may be a suitable target of vaccines.


INTRODUCTION
Norovirus (NoV) is an etiological agent of acute gastroenteritis in humans (Kapikian et al., 1972), causing 18% of all cases (Ahmed et al., 2014) and 200,000 deaths worldwide annually (Patel et al., 2008). The human NoV (HuNoV) is more prevalent in developed countries (20%) and low-mortality developing countries (19%) than in high-mortality developing countries (14%) (Ahmed et al., 2014). Therefore, improvement in sanitation and hygiene may not be sufficient for control and prevention of HuNoV, and development of vaccines is demanded.
Based on the similarity in the amino acid sequence of VP1, NoV is divided into genogroups GI-GVII (Zheng et al., 2006;Vinje, 2015). HuNoV consists of GI, GII, and GIV strains, which are further classified into genotypes GI.1-GI.9, GII.1-GII.22, and GIV.1 and GIV.2, respectively (Kroneman et al., 2013;Vinje, 2015). Multiple genotypes of HuNoV co-circulate every season changing genotype proportions (Suzuki et al., 2016;Thongprachum et al., 2016). Different genotypes of HuNoV appeared to be antigenically related to each other with various degrees (Hansman et al., 2006) and have been grouped into several immunotypes (Parra et al., 2017). In addition, the HuNoV infection appeared to be mediated via person-to-person, food-borne, and water-borne transmissions with some preferences depending on the genotypes (Mathijs et al., 2012). It may therefore be useful to predict changes in genotype proportions between HuNoV seasons for determining the genotypes to be included in vaccines and the human populations to be vaccinated in each season. The norovirus forecasting system (NOROCAST) has been developed for predicting changes in genotype proportions between HuNoV seasons in Japan (Suzuki et al., 2016). The herd immunity to VP1 was taken into account based on the fitness model that was originally proposed for describing changes in proportions of influenza A virus strains (Luksza and Lassig, 2014;Suzuki, 2015). The purpose of the present study was to examine whether the performance of NOROCAST could be improved by taking into account the herd immunity to VP2 and NS in addition to VP1.  Table S1), which were converted into the observed genotype proportions (Supplementary Table S2). Multiple genotypes of GI and GII strains but no GIV strains were observed in these seasons.

METHOD
Nearly complete genomic sequences of GI and GII strains containing the entire coding regions of VP1, VP2, and NS provided with the information on the isolation year and month, were retrieved from the International Nucleotide Sequence Database (INSD). The INSD accession numbers for the 404 sequences analyzed in the present study were listed in Supplementary Table S3. These sequences were classified according to the isolation season and the VP1 genotype, which was determined by the Norovirus Genotyping Tool (version 1.0) (Kroneman et al., 2011) (Supplementary Table S4). It appeared that the sequence data were biased toward more prevalent genotypes and were sometimes unavailable for less prevalent genotypes in each season. For each of VP1, VP2, and NS, multiple alignment of 404 amino acid sequences was made using the computer program MAFFT (version 7.215) (Katoh et al., 2002).
In the fitness model (Luksza and Lassig, 2014), the proportion of strain i in the target season t (p i(t) ) was predicted from that in the season immediately before the target season (pre-target season) t − 1 (p i(t−1) ) bŷ where f i is the fitness of strain i. In NOROCAST (Suzuki et al., 2016), f i was defined as where f 0 is a constant to ensure ip i(t) = 1. The term c i denotes the effect of herd immunity to VP1 of strain i elicited by strain j's that have circulated from season 2006/2007 to the pre-target season, namely Here k VP1 is the coefficient of herd immunity to VP1, which was assumed to decline by fraction l VP1 for each of amino acid differences in VP1 between strains i and j designated as d ij(VP1) (Lindesmith et al., 2011). The terms t i and t j denote the isolation seasons of strains i and j, respectively. The herd immunity was assumed to decline 20% each season, so that the expected duration of human immunity was roughly 5 years (Simmons et al., 2013).
In the present study, the herd immunity to VP2 and NS was also taken into account, so that c i was re-defined as Here k VP2 , l VP2 , and d ij(VP2) and k NS , l NS , and d ij(NS) are parameters for VP2 and NS, respectively, corresponding to k VP1 , l VP1 , and d ij(VP1) for VP1, respectively. For predicting genotype proportions in the target season, parameters were optimized using the observed genotype proportions from season 2006/2007 to the pre-target season. Optimization was performed with the genetic algorithm (Tomita et al., 2000) through minimizing the sum of squared deviations of the predicted genotype proportions from the observed genotype proportions over the seasons from season 2008/2009 to the pre-target season. Three random real numbers were assigned as the initial parameter values to confirm the convergence of the optimization (data not shown) (Suzuki, 2013(Suzuki, , 2015. The optimized parameter values were adopted for predicting directions of changes in genotype proportions from the pre-target season to the target season. The performance of NOROCAST was evaluated retrospectively by predicting directions of changes in genotype proportions setting the target seasons to be from season 2008/2009 to season 2016/2017. It should be noted that in NOROCAST (Suzuki et al., 2016), genotype proportions could not be predicted or were always predicted to be 0 when the sequence data were unavailable or the observed genotype proportions were 0 in the pre-target season, respectively. Therefore, the prediction was considered to be correct when both the observed and the predicted directions of changes in genotype proportions (increase or decrease) were the same, and incorrect when they were different. In the present study, unavailable sequence data in a particular season were compensated by the available sequence data in the closest season before or after that season or both within the same genotype (Supplementary Table S5). The observed genotype proportions of 0 were also replaced with an arbitrary value (5 × 10 −6 , 5 × 10 −5 , or 5 × 10 −4 ) that was smaller than 5.099 × 10 −4 (= 1 1961 ), which was the lowest positive value obtainable in Supplementary Table S2.

RESULTS
The accuracy of NOROCAST in predicting directions of changes in genotype proportions was examined setting the target seasons to be from season 2008/2009 to season 2016/2017 (Supplementary Tables S6-S14). When only the herd immunity to VP1 was taken into account without replacing the observed genotype proportions of 0, the numbers of correct and incorrect predictions were 66 and 62, respectively, yielding the accuracy of 0.516 (P = 0.724 by the χ 2 test) ( Table 1). However, by replacing the observed genotype proportions of 0 with 5 × 10 −6 , 5 × 10 −5 , or 5 × 10 −4 , directions were predicted correctly and incorrectly in 86 and 62 cases, respectively, so that the accuracy was 0.581 (P = 0.0485 by the χ 2 test). The accuracy was elevated mainly because changes in genotype proportions became predictable even when the observed genotype proportions were 0 in the pre-target season ( Table 1).
However, the accuracy of NOROCAST did not appear to be improved by taking into account the herd immunity to VP2 and NS in addition to VP1 (Supplementary Tables S6-S14). That is, without replacing the observed genotype proportions of 0, both the numbers of correct and incorrect predictions were 64, yielding the accuracy of 0.5 (P = 1 by the χ 2 test) ( Table 1). By replacing the observed genotype proportions of 0 with 5 × 10 −6 , 5 × 10 −5 , or 5 × 10 −4 , directions were predicted correctly and incorrectly in 85-87 and 61-63 cases, respectively, so that the accuracy was 0.574-0.588 (P = 0.0326-0.0705 by the χ 2 test) ( Table 1).
It should be noted that the direction of change in genotype proportion was always (9 out of 9 cases with VP1; P = 0.00391 by the binomial test) or almost always (8 out of 9 cases with VP1, VP2, and NS; P = 0.0352 by the binomial test) predicted correctly for GII.4, which was the most prevalent genotype in HuNoV (Supplementary Tables S6-S14). Interestingly, GII.4 has been classified as the evolving genotype, whereas other genotypes have been classified as the static genotypes (Parra et al., 2017). It is possible that GII.4 strains may be influenced by the herd immunity more strongly than the strains of other genotypes, which may have resulted in faster evolution and higher predictability of the direction of change in genotype proportion for GII.4 than other genotypes.

DISCUSSION
In the previous study (Suzuki et al., 2016), NOROCAST was developed for predicting directions of changes in genotype proportions between HuNoV seasons in Japan modeling the herd immunity to VP1. From the analysis of 1458 VP1 sequences setting the target seasons to be from season 2008/2009 to season 2014/2015, directions were predicted correctly and incorrectly in 45 and 30 cases, respectively, yielding the accuracy of 0.6 (P = 0.0833 by the χ 2 test). In the present study, the effect of modeling herd immunity to VP2 and NS in addition to VP1 was examined from the analysis of 404 nearly complete genomic sequences of HuNoV.
When only the herd immunity to VP1 was taken into account without replacing the observed genotype proportions of 0, directions were predicted correctly and incorrectly in 52 and 48 cases, respectively, for the same target seasons as above, so that the accuracy was 0.52 (P = 0.689 by the χ 2 test). Compared with the previous results, the number of predictions was raised because unavailable sequence data were compensated but the accuracy was lowered possibly because a smaller number of sequences were analyzed in the present study. However, the accuracy was recovered by replacing the observed genotype proportions of 0 with an arbitrary value that was smaller than the lowest positive value obtainable as the observed genotype proportion over all HuNoV seasons.
In contrast, the performance of NOROCAST did not appear to be improved by incorporation of herd immunity to VP2 and NS, suggesting that VP1-derived virus-like particle (VLP) and VP1-P domain-derived P-particle may be suitable as vaccines (Hansman et al., 2006;Kocher and Yuan, 2015). Nevertheless, VP2 and NS may still be incorporated into the fitness model for improving the performance of NOROCAST. It may be interesting to model the fitness effects of environmental conditions such as temperature, humidity, and pH, which may affect the functions of VP2 and NS as well as VP1 (de la Noue et al., 2014;Samandoulgou et al., 2015). In addition, the fitness effect of herd immunity to VP1 may depend not only on the number of amino acid differences between HuNoV strains but also on their impacts on the binding affinity to the HuNoV receptor, which has not been identified yet (Murakami et al., 2013;Haga et al., 2016;Orchard et al., 2016). It may be critical to identify the amino acid sites in VP1, VP2, and NS involved in these functions.
When the target season was set to be season 2017/2018, it was consistently predicted that the proportions of GII.2-GII.4 and GII.17 may decrease, whereas those of other genotypes may increase, regardless of whether the observed genotype proportions of 0 were replaced with 5 × 10 −6 , 5 × 10 −5 , or 5 × 10 −4 and whether the herd immunity to VP2 and NS was taken into account in addition to VP1 (Supplementary Table S15). The predicted genotype proportions can be seen on the NOROCAST website (URL: http://www.nsc.nagoya-cu.ac.jp/~yossuzuk/norocast.html). To validate these results and predict genotype proportions in the future HuNoV seasons, it may be critical to update the observed genotype frequencies and the sequence data of HuNoV.

AUTHOR CONTRIBUTIONS
YS, YHD, HK, HS, KS, and KK designed this study and wrote the manuscript. YS and YHD analyzed the data. HK, HS, KS, and KK supervised this study. All authors read and approved the manuscript.

FUNDING
This research was supported by AMED under Grant Numbers 18fk0108033h0002, 18fk0108034h0402, and 18fk0108033s0702.