Comprehensive Analysis of Clinically Significant Hepatitis B Virus Mutations in Relation to Genotype, Subgenotype and Geographic Region

Hepatitis B virus (HBV) is a highly variable DNA virus due to its unique life cycle, which involves an error-prone reverse transcriptase. The high substitution rate drives the evolution of HBV by generating genetic variants upon which selection operates. HBV mutants with clinical implications have been documented worldwide, indicating the potential for spreading and developing their own epidemiology. However, the prevalence of such mutants among the different HBV genotypes and subgenotypes has not been systematically analyzed. In the current study, we performed large-scale analysis of 6,479 full-length HBV genome sequences from genotypes A-H, with the aim of gaining comprehensive insights into the relationships of relevant mutations associated with immune escape, antiviral resistance and hepatocellular carcinoma (HCC) development with HBV (sub)genotypes and geographic regions. Immune escape mutations were detected in 10.7% of the sequences, the most common being I/T126S (1.8%), G145R (1.2%), M133T (1.2%), and Q129R (1.0%). HBV genotype B showed the highest rate of escape mutations (14.7%) while genotype H had no mutations (P < 0.001). HCC-associated mutations were detected in 33.7% of the sequences, with significantly higher frequency of C1653T, T1753V and A1762T/G1764A in genotype G than C (P < 0.001). The overall frequencies of lamivudine-, telbivudine-, adefovir-, and entecavir-resistant mutants were 7.3, 7.2, 0.5, and 0.2%, respectively, while only 0.05% showed reduced susceptibility to tenofovir. In particular, the highest frequency of lamivudine-resistant mutations was observed in genotype G and the lowest frequency in genotype E (32.5 and 0.3%; P < 0.001). The prevalence of HBV mutants was also biased by geographic location, with North America identified as one of the regions with the highest rates of immune escape, antiviral resistance, and HCC-associated mutants. The collective findings were discussed in light of natural selection and the known characteristics of HBV (sub)genotypes. Our data provide relevant information on the prevalence of clinically relevant HBV mutations, which may contribute to further improvement of diagnostic procedures, immunization programs, therapeutic protocols, and disease prognosis.


INTRODUCTION
Hepatitis B is one of the most prevalent viral infections in humans and a major global public health problem. Hepatitis B virus (HBV) has infected one-third of the global population, with 257 million chronic infections worldwide and more than 880,000 deaths per year, mostly from cirrhosis and hepatocellular carcinoma (HCC) (WHO, 2020).
Hepatitis B virus is the prototype member of the family Hepadnaviridae containing a partially double-stranded relaxed circular DNA genome (∼3.2 kb) with a compact coding organization containing four partly or completely overlapping open reading frames (Seeger et al., 2013). Based on >7.5% genomic sequence divergence, HBV has been phylogenetically classified into nine genotypes (A-I) and one putative genotype (J). The significant diversity within specific HBV genotypes has led to further classification into numerous subgenotypes (Kramvis, 2014). HBV genotypes and subgenotypes have distinct geographic distributions. The unique replication cycle of HBV includes the activity of an error-prone reverse transcriptase (RT) that generates numerous viral variants (quasispecies). Constant HBV evolution leads to continued selection of mutations through pressure exerted by endogenous (host immune system) and exogenous (antiviral therapy and vaccination) factors, which fuels a molecular arms race between virus and host, resulting in emergence of mutations involved in several clinical effects (Araujo et al., 2011;Rajoriya et al., 2017;Revill et al., 2020).
The HBV surface antigen (HBsAg) contains a highly conserved antibody-neutralizing epitope cluster termed "a" determinant, which spans amino acids 124-147. Mutations occurring within or around this region lead to conformational changes, which can affect binding of neutralizing antibodies produced during natural infection or following active or passive immunization (Zuckerman and Zuckerman, 2003). These immune escape mutations account for several possible consequences, such as false-negative results by commercial HBsAg assays (occult hepatitis B) and evasion of anti-HBV immunoglobulin therapy and vaccine-induced immunity. G145R was the first escape mutation described and the most frequently detected HBV variant with proven vaccine escape properties in humans (Carman et al., 1990). Over the years, a number of other mutations associated with immune escape have been documented worldwide (Cooreman et al., 2001;Lazarevic et al., 2019).
The influence of HBV diversity on severity of liver disease has been investigated mainly for genotypes A-D (McMahon, 2009;Kao et al., 2010;Levrero and Zucman-Rossi, 2016). Genotype C is considered more oncogenic than genotypes A, B, and D (Chan et al., 2004;Wong et al., 2013). Likewise, mutations in the basal core promoter (BCP) resulting in decreased expression of HBeAg but enhanced viral replication and deletions in the Pre-S region are associated with increased risk of HCC (Liu et al., 2009).
The widespread distribution of HBV mutations with clinical implications poses a considerable challenge for design of diagnostic assays and current treatment strategies and is a potential threat to the long-term success of mass vaccination programs (Lazarevic, 2014;Yan et al., 2017). However, little is known on the frequencies of these mutations in different HBV genotypes. Knowledge of HBV subgenotypes is even more limited. Additionally, the unique distribution patterns of HBV (sub)genotypes in different geographic areas has hampered simultaneous comparisons. In the current study, we performed large-scale analysis of 6,479 full-length HBV genome sequences from genotypes A-H retrieved from public database with the aim of gaining comprehensive insights into the relationships of the most relevant escape-, resistance-and HCC-associated mutations with HBV (sub)genotypes and geographic regions. Knowledge of the prevalence of clinically relevant mutations among the different HBV genotypes and subgenotypes and their worldwide distribution should facilitate improvement of diagnostic procedures, immunization programs, therapeutic protocols, and disease prognosis.

Statistical Analysis
Statistical analysis was performed using SPSS statistical software package version 23.0 (IBM SPSS Inc., Chicago, IL, United States). Frequencies were compared using the chi-squared test or Fisher's exact test. P-values < 0.05 were considered statistically significant.

Immune escape Mutations
Ten positions within HBsAg (P120, I/T126, Q129, G130, M133, K141, P142, S/T143, D144, and G145) incorporating 25 immune escape mutations were analyzed. The amino acid frequencies at each position in different HBV genotypes are shown in Table 1. Overall, immune escape mutations were detected in 691 of 6,479 sequences (10.7%). HBV genotype B showed the highest frequency of mutation (14.7%), followed by genotypes C (11.2%), G (10%), D (9.3%), A (7.5%), E (5.1%), and F (2.0%) (P < 0.001). No escape mutations were detected in genotype H sequences. The most common substitutions were I/T126S (1.8%), FIGURE 1 | Phylogenetic analysis of HBV complete genome sequences representative of genotypes A-J. Maximum likelihood phylogenetic tree of 6,538 complete genome sequences (6,434 from HBVdb database plus 104 reference sequences). Sequences whose subgenotype could not be determined (n = 45) were excluded from the analysis. The branches are colored according to the genotypes. (Sub)genotypes are indicated next to their respective clusters. The numbers in branches indicate the statistical support (aLRT value).

Geographic Distribution of HBV Mutants
The frequencies of HBV mutants according to geographic region are shown in Figure 3 and Table 7. Central Asia was the only region where no mutants were detected. The highest rates of HCC-associated mutants were observed in Southern Asia (41%), followed by North America (37%) and Eastern Asia (36.5%) and the lowest rates in Middle Africa (14.6%), Western Europe (17.5%), and Australia and New Zealand (19.6%) (P < 0.001). Immune escape mutants were detected more often in North America (16.3%), Eastern Asia (12.3%) and South-eastern Asia (12%) and less frequently in Middle Africa (0), Eastern Africa (1.4%) and the Caribbean (1.6%) (P < 0.001). Regarding antiviral therapy resistance, HBV sequences from Australia and New Zealand, Caribbean, Melanesia, Northern Africa, Northern Europe, Polynesia, Southern Africa, and Western Africa contained no NA-resistant mutations. Higher frequencies of LAM-and LdT-resistant mutants were found in North America (40.9% for both), Southern Europe (24% for both) and Western Europe (9.1 and 8.4%), and lower frequencies in Eastern Europe (0.7% for both), Middle Africa (1.0% for both) and Eastern Africa (1.4% for both) (P < 0.001). ADV-resistant mutants were specifically identified in Eastern Asia (1.1%), North America (0.2%) and Southern Asia (0.2%) and ETV-resistant mutants in Southern Asia (0.4%) and Eastern Asia (0.1%) (P > 0.05 for all). Mutants with reduced TDF and TAF susceptibility were solely detected in Eastern Asia (0.1%) (Figure 3 and Table 7).

DISCUSSION
Evolution is an important aspect of epidemiology of viral diseases. HBV evolution occurs through both mutation and recombination processes (Kay and Zoulim, 2007;Araujo, 2015). Since HBV replication involves an error-prone reverse transcription step, nucleotide substitution rates are higher than those for other DNA viruses. The estimated HBV mutation rates are reported to range from 10 −4 to 10 −6 substitutions/site/year (Zehender et al., 2014;Paraskevis et al., 2015;Spitz et al., 2019). This high substitution rate allows HBV to evolve rapidly and adapt to changes in the host environment, leading to the emergence of mutations with important implications for prevention, diagnosis, treatment, and prognosis of infection. HBV mutants have been documented worldwide (Zhang et al., 2016), indicating the potential to spread in a range of human populations and develop their own epidemiology.
HBV genotypes and subgenotypes have been associated with differences in disease progression, response to antiviral therapy, and clinical outcomes McMahon, 2009;Kim et al., 2011;Liu and Kao, 2013). One potential reason is that HBV mutations may be more common among some (sub)genotypes than among others. However, due to the unique distribution patterns of HBV genotypes in Asian and Western countries, epidemiological data on HBV mutants are based on comparisons between genotypes B and C or A and D, with little information on the other genotypes. In this study, mutations related to immune escape, antiviral resistance, and HCC development were analyzed in 6,479 HBV genome sequences classified into genotypes A-H. To our knowledge, the present study has conducted the most comprehensive analysis of the association of clinically relevant mutations of HBV with genotypes, subgenotypes, and geographic regions.
most prevalent among patients under 15 years of age after mass vaccination in China (Yan et al., 2017). Notably, significant differences in the frequencies of escape mutations among HBV genotypes were observed in this study, with genotypes A-D and G showing higher rates than E, F, and H. The lack of escape mutations observed in genotype H should be interpreted with caution due to the low number of genotype H sequences available for analysis. However, similar findings were reported by Ma and Wang (2012), who showed that HBV genotypes A-D harbor more escape mutations than the other genotypes. Likewise, Di Lello et al. (2019) observed a significantly higher rate of escape mutations in genotype D (33%) than genotype F (2.3%). The results collectively suggest low circulation of immune escape mutants among the Amerindian (and more divergent) HBV genotypes F and H, and highlight the necessity to monitor genotypes A (mainly A2), B, C (mainly C6-C15), D, and G.
Moreover, since the HBV vaccine used worldwide is produced with HBsAg subgenotype A2, stronger selection pressure of the humoral immune response against the "a" determinant of HBV-A2 isolates may occur. Consequently, higher rates of immune escape mutants are expected in subgenotype A2 than other (sub)genotypes. In fact, significant differences in the frequencies of immune escape mutants were observed within genotype A, with A2 showing the highest rates. However, other (sub)genotypes included higher rates of escape mutants than A2, suggesting that this potential selective pressure on subgenotype A2 is not yet a concern for virus vaccine design. Hepatocellular carcinoma development due to HBV infection is a multifactorial process associated with viral, host and environmental factors. Patients with chronic HBV infection have a 20-fold higher risk of HCC than non-infected individuals (Parkin, 2006). The ability to identify patients at risk of HCC is crucial, since progression to liver cancer in the   The P-value was listed as "-" when the specific mutation was found in no genotype (statistical analysis not conducted). LAM, lamivudine; LdT, telbivudine; ADV, adefovir; ETV, entecavir; TDF, tenofovir disoproxil fumarate; TAF, tenofovir alafenamide; QS, quasi-subgenotype. a Sequences with at least one immune escape mutation. b Sequences with at least one HCC-associated mutation. c Amino acid substitution profile for LAM resistance: rtM204V/I. d Amino acid substitution profile for LdT resistance: rtM204I or L180M + M204V. e Amino acid substitution profile for ADV resistance: rtA181T/V or rtN236T. f Amino acid substitution profile for ETV resistance: rtL180M + rtM204I/V ± rtT184S/G ± rtS202I/G ± rtM250V. g Amino acid substitution profile for TDF/TAF-reduced susceptibility: rtA181T/V + rtN236T.
absence of cirrhosis has been documented in chronic HBV infection cases (McMahon, 2009;Chayanupatkul et al., 2017;Rajoriya et al., 2017). A previous meta-analysis showed that Pre-S deletions, C1653T in enhancer II, and T1753V and A1762T/G1764A in BCP are associated with increased risk of HCC compared with HBV without mutations (Liu et al., 2009). Significant differences in these mutations were observed among HBV genotypes (P < 0.001). Genotype G showed by far the highest frequency (≥95%) of C1653T, T1753V, and A1762T/G1764A. Genotype G is associated with more severe liver disease in HIV-HBV co-infected patients (Lacombe et al., 2006;Dao et al., 2011;Malagnino et al., 2019) but its role in HCC development remains to be established. Genotype C displayed the highest frequency of Pre-S deletions (3.3%) and the second highest rates of C1653T, T1753V and A1762T/G1764A variations. These findings are consistent with several previous reports demonstrating the association of genotype C with increased risk of HCC relative to other major HBV genotypes (Chan et al., 2004;Liu et al., 2009;Wong et al., 2013).
Interestingly, a correlation between genotype F infection and development of HCC among native Alaskan people was demonstrated in an earlier study (Livingston et al., 2007). Furthermore, Kowalec et al. (2013) showed that development of HCC in the native Alaskan population is significantly associated with specific mutations within the core promoter region of genotype F isolates (Kowalec et al., 2013). In the current study, all genotype F sequences from Alaska were classified as subgenotype F1. On the other hand, subgenotype F2 has been associated with development of HCC in Venezuelan patients (Puche et al., 2016;Pujol et al., 2020). In fact, both subgenotypes showed similar frequencies of A1762T/G1764A (F1, 29.3%; F2, 27.6%), which were higher than those observed for subgenotypes F3 and F4 (15.4% and 10.7%, P = 0.027). Moreover, significantly higher frequency of T1753V was observed in subgenotype F2 (44.8%, P < 0.001), whereas F1 was the only subgenotype that contained Pre-S deletions. The collective results suggest that HBV subgenotypes F1 and F2 are more prone to harboring HCC-associated mutations than subgenotypes F3 and F4, resulting in stronger oncogenic potential.
Likewise, the frequency of Pre-S deletions in subgenotype A1 was significantly higher than that in other A subgenotypes (3.0%, P = 0.008) and slightly lower than that in genotype C (3.3%). Notably, 62.5% (5 of 8) of A1 sequences with Pre-S deletions were obtained from African patients (Supplementary Table S1).     The P-value was listed as "-" when the specific mutation was found in no genotype (statistical analysis not conducted). LAM, lamivudine; LdT, telbivudine; ADV, adefovir; ETV, entecavir; TDF, tenofovir disoproxil fumarate; TAF, tenofovir alafenamide; QS, quasi-subgenotype. a Sequences with at least one immune escape mutation. b Sequences with at least one HCC-associated mutation. c Amino acid substitution profile for LAM resistance: rtM204V/I. d Amino acid substitution profile for LdT resistance: rtM204I or L180M + M204V. e Amino acid substitution profile for ADV resistance: rtA181T/V or rtN236T. f Amino acid substitution profile for ETV resistance: rtL180M + rtM204I/V ± rtT184S/G ± rtS202I/G ± rtM250V. g Amino acid substitution profile for TDF/TAF-reduced susceptibility: rtA181T/V + rtN236T.
The high frequency of Pre-S deletions in A1 may contribute to the enhanced pathogenicity of this subgenotype and its association with high rates of HCC in sub-Saharan Africa Kramvis and Kew, 2007). In contrast, the predominance of genotype H among the Mexican population is associated with low prevalence of HCC (Roman and Panduro, 2013). Consistently, genotype H showed the lowest rates of T1753V and A1762T/G1764A and no Pre-S deletions, and was identified as the HBV genotype with fewer HCC-associated mutant sequences. Although NA therapies have been successful in sustained viral suppression, long-term use of treatments with a low barrier to resistance contributes to the emergence of resistant HBV mutants. Therefore, such agents (LAM, LdT and ADV) are no longer recommended as first-line therapy (Easl, 2017;Terrault et al., 2018). However, management of NA failure remains a crucial issue, especially in low-resource countries where ETV, TDF, and TAF are not available for naïve patients or treatmentexperienced patients. Consistently, the overall rates of mutations inducing resistance to LAM and LdT were higher than those for other NAs (LAM, 7.3%; LdT, 7.2%; ADV, 0.5%; ETV, 0.2%). In addition, very low rates of mutants with reduced susceptibility to TDF and TAF (0.05%, 3/6479) were observed.
The link between genotype and response to treatment with NA-based regimens is still unclear. Results from the current study showed significant differences in the rates of LAM-, LdT-, ADV-, and ETV-resistant mutants among HBV genotypes and subgenotypes. Genotype G showed the highest rates of LAMand LdT-resistant mutants, whereas genotype E had the lowest ones (32.5 and 0.3% for both NAs, respectively, P < 0.001). However, these differences may not be exclusively due to intrinsic characteristics of HBV genotypes. Notably, genotype G has been frequently found in HIV-infected men who have sex with men (MSM) (Sanchez et al., 2007;Osiowy et al., 2008;Araujo et al., 2013;Cornelissen et al., 2016), suggesting a strong association with this risk group. In addition, LAM has been widely used in the treatment of both HBV and HIV viruses. This may lead to a stronger selective pressure due to the more frequent use of LAM on genotype G isolates than other genotypes, ultimately resulting in higher frequencies of LAMresistant mutants (and LdT due to cross-resistance). Likewise, genotype E is highly endemic in most sub-Saharan Africa     The P-value was listed as "-" when the specific mutation was found in no genotype (statistical analysis not conducted). LAM, lamivudine; LdT, telbivudine; ADV, adefovir; ETV, entecavir; TDF, tenofovir disoproxil fumarate; TAF, tenofovir alafenamide; QS, quasi-subgenotype. a Sequences with at least one immune escape mutation. b Sequences with at least one HCC-associated mutation. c Amino acid substitution profile for LAM resistance: rtM204V/I. d Amino acid substitution profile for LdT resistance: rtM204I or L180M + M204V. e Amino acid substitution profile for ADV resistance: rtA181T/V or rtN236T. f Amino acid substitution profile for ETV resistance: rtL180M + rtM204I/V ± rtT184S/G ± rtS202I/G ± rtM250V. g Amino acid substitution profile for TDF/TAF-reduced susceptibility: rtA181T/V + rtN236T. (Andernach et al., 2013) where therapy is not always accessible to HBV chronic patients (Spearman et al., 2017). Consequently, weaker antiviral selection pressure on HBV isolates may occur in this region than in others where NA-based regimens are widely available, restricting the expansion of genotype E resistant mutants throughout populations. Interestingly, significant differences in mutation profiles of same (sub)genotype among geographic regions were observed. In this regard, the differences in the rates of HCC-associated mutants between genotype C isolates from Asia and Oceania, and genotype F isolates from North America and South America, may be due to the circulation of distinct subgenotypes in these regions. On the other hand, subgenotype A2 isolates from North America showed significantly higher rates of immune escape and HCC-associated mutants than those from Western Europe. Likewise, genotype E isolates from Western Africa had significantly more HCC mutations than those from Middle Africa. Moreover, the highest rates of LAMresistant mutants within (sub)genotypes A2, B, D, and G were observed in North America. Altogether, these findings suggest that the same (sub)genotype may have different potentials for immune escape, oncogenicity and treatment response, depending on the selective pressures (e.g., host genetic background, prophylactic and therapeutic regimens) applied in the geographic locations.
According to Carman's study (Carman, 1997), viral variants occur naturally and may have been selected over centuries, while viral mutants have been selected over a short period by human intervention (therapy or prophylaxis). It is not always clear that a sequence substitution is a functional mutation (mutant) or simply a characteristic (variant) of a specific (sub)genotype. Variants are expected to be more frequent than mutants within (sub)genotypes. Interestingly, the antiviral resistance-and immune escape mutations analyzed in our study showed much lower frequencies within (sub)genotypes than HCC-associated mutations. This might be due to the fact that therapy and prophylaxis are recent selective forces, while HCC-associated mutations have been selected during much longer time by host The P-value was listed as "-" when the specific mutation was found in no genotype (statistical analysis not conducted). LAM, lamivudine; LdT, telbivudine; ADV, adefovir; ETV, entecavir; TDF, tenofovir disoproxil fumarate; TAF, tenofovir alafenamide. a Sequences with at least one immune escape mutation. b Sequences with at least one HCC-associated mutation. c Amino acid substitution profile for LAM resistance: rtM204V/I. d Amino acid substitution profile for LdT resistance: rtM204I or L180M + M204V. e Amino acid substitution profile for ADV resistance: rtA181T/V or rtN236T. f Amino acid substitution profile for ETV resistance: rtL180M + rtM204I/V ± rtT184S/G ± rtS202I/G ± rtM250V. g Amino acid substitution profile for TDF/TAF-reduced susceptibility: rtA181T/V + rtN236T.
immune pressure. Here, the vast majority of the mutations showed low frequencies within (sub)genotypes, and therefore, do not seem to be variants of any specific group. However, we have demonstrated that some mutations were biased by (sub)genotype. It is possible that over the years of viral evolution, they will become variants within these groups. This have occurred with BCP mutations in genotype G (frequency rate ≥ 95%), which have been stably integrated into the genome. The analysis of such a large dataset provided in this study will be helpful for further studies in establishing the likelihood that a sequence substitution is actually a mutant or a variant of a specific HBV (sub)genotype. Finally, a number of limitations of this study need to be considered. Our analyses were based on HBV sequences from public databases, which may not be entirely representative of the globally circulating strains. Additionally, each sequence was considered a single HBV isolate, not taking into account the occurrence of serial sequences representing the same single isolate (e.g., clones), which may have generated some bias in our results. However, the large number of sequences analyzed may have overcome these limitations, at least in part. Moreover, information on whether the sequence was obtained before or after antiviral treatment or HCC development was not obtained here, which would have contributed to an accurate analysis for the relationship between mutation and clinical outcome. Nevertheless, since obtaining such information from all sequences was not possible, only mutations that have been strongly associated with immune escape, antiviral resistance, and HCC development in previous studies were selected for analysis. Lastly, due to the large variation in the number of sequences available for each (sub)genotype, over-or underrepresentation of some (sub)genotypes may have influenced the results. Therefore, although some results corroborated previous data, others should be further confirmed.

CONCLUSION
We have comprehensively investigated HBV mutations with clinical implications and their associations with genotype, Frontiers in Microbiology | www.frontiersin.org  The P-value was listed as "-" when the specific mutation was found in no genotype (statistical analysis not conducted). LAM, lamivudine; LdT, telbivudine; ADV, adefovir; ETV, entecavir; TDF, tenofovir disoproxil fumarate; TAF, tenofovir alafenamide. a Sequences with at least one immune escape mutation. b Sequences with at least one HCC-associated mutation. c Amino acid substitution profile for LAM resistance: rtM204V/I. d Amino acid substitution profile for LdT resistance: rtM204I or L180M + M204V. e Amino acid substitution profile for ADV resistance: rtA181T/V or rtN236T. f Amino acid substitution profile for ETV resistance: rtL180M + rtM204I/V ± rtT184S/G ± rtS202I/G ± rtM250V. g Amino acid substitution profile for TDF/TAF-reduced susceptibility: rtA181T/V + rtN236T.