Global prevalence and molecular characteristics of three clades within hepatitis B virus subgenotype C2: Predominance of the C2(3) clade in South Korea

Hepatitis B virus (HBV) genotypes reflect geographic, ethnic or clinical traits and are currently divided into 10 genotypes (A–J). Of these, genotype C is mainly distributed in Asia, is the largest group and comprises more than seven subgenotypes (C1–C7). Subgenotype C2 is divided into three phylogenetically distinct clades, C2(1), C2(2), and C2(3), and is responsible for most genotype C infections in three East Asian nations, China, Japan, and South Korea, which are major HBV endemic areas. However, despite the clinical and epidemiologic significance of subgenotype C2, its global distribution and molecular characteristics remain largely unknown. Here, we analyze the global prevalence and molecular characteristics of the three clades within subgenotype C2 using 1,315 full genome sequences of HBV genotype C retrieved from public databases. Our data show that almost all HBV strains from South Korean patients infected with genotype C belong to clade C2(3) within subgenotype C2 (96.3%) but that HBV strains from Chinese or Japanese patients belong to diverse subgenotypes or clades within genotype C, suggesting clonal expansion of a specific HBV type, C2(3), among the Korean population. Our genome sequence analysis identified a total of 21 signature sequences specific to the respective clades C2(1), C2(2), and C2(3). Of note, two of the four nonsynonymous C2(3) signature sequences, sV184A in HBsAg and xT36P in the X region, were detected in 78.9 and 82.9% of HBV C2(3) strains, respectively. In particular, compared with C2(1) and C2(2) strains, C2(3) strains show a higher frequency of reverse transcriptase mutations related to nucleos(t)ide analog (NA) resistance, including rtM204I and rtL180M, suggesting an increased possibility of C2(3) infection in those with NA treatment failure.
In conclusion, our data show that HBV subgenotype C2(3) is extremely prevalent in Korean patients with chronic HBV infection, which is distinct from two other East Asian nations, China and Japan, where diverse subgenotypes or clades within genotype C coexist. This epidemiologic trait might affect distinct virological and clinical traits in chronic HBV patients in Korea, where exclusively C2(3) infection is predominant.


Introduction
The discovery of hepatitis B virus (HBV) led to the first vaccine that was not prepared through tissue culture but was initially prepared directly from plasma, followed later by a recombinant vaccine produced in yeast (da Fonseca, 2010). Despite an effective vaccine, HBV infection remains a severe global health issue, with more than 240 million chronic carriers and approximately 786,000 deaths annually worldwide due to HBV-related diseases, including liver cirrhosis (LC) and hepatocellular carcinoma (HCC; Schweitzer et al., 2015; Guvenir and Arikan, 2020). HBV is endemic in South Korea (Korean National Health and Nutrition Survey of 2011); the prevalence of hepatitis B virus surface antigen (HBsAg) positivity is 3.4% in males and 2.6% in females (Yim and Kim, 2019).
HBV, which belongs to the Hepadnaviridae family, is an enveloped virus with a partially double-stranded DNA genome of approximately 3.2 kb in length that includes 4 overlapping open reading frames (ORFs) encoding the surface protein (S), core protein (C), polymerase (Pol), and HBx protein (X; Liang, 2009; Datta et al., 2012). HBV has been characterized into 10 genotypes, A to J (Sunbul, 2014), which are subdivided into more than 30 subgenotypes (Kurbanov et al., 2010; Shi et al., 2012). Each subgenotype has been reported to show distinct geographical patterns and clinical traits (Liu et al., 2021). Genotype C is the oldest and most common extant genotype and is endemic in the Asia-Pacific region (Velkov et al., 2018; Kyaw et al., 2020). Compared to genotype B, genotype C exhibits higher HBV replication capacity and tends to result in chronic infection, which may lead to the development of LC and/or HCC (Chan et al., 2003; Gao et al., 2007). In addition, incomplete response to IFN therapy and higher levels of mutations have been reported in genotype C HBV infections (Kao et al., 2000).
Genotype C is further divided into many subgenotypes, C1 to C11 (McNaughton et al., 2020). Subgenotype C2 is endemic in Far Eastern countries, including Korea, China and Japan, which are geographically close and share historical and socioeconomic similarities (Lin et al., 2019). In China and Japan, rather diverse genotypes are observed in addition to subgenotype C2, whereas an extremely high prevalence of subgenotype C2 has been reported in Korea (Kim et al., 2007; Cho et al., 2009). More recent studies have shown that subgenotype C2 can be further divided into three phylogenetically distinct clades, C2(1), C2(2), and C2(3) (Yin et al., 2019; McNaughton et al., 2020), but the correlation of these clades with geographic and clinical traits has not yet been analyzed.
Studies on the correlation between region-specific genotypes and epidemiologic traits may provide an explanation for specific clinical or virological issues in local populations (Lin et al., 2019). In Korea, there are higher rates of naturally occurring HBV variants related to clinical severity compared to other countries, and occult infection, potential antiviral drug resistance and a higher frequency of IFN therapy failure occur even with intensive medical care (Yim and Kim, 2019). In this study, we investigated the prevalence of three clades within subgenotype C2, i.e., C2(1), C2(2), and C2(3), in East Asian countries. A total of 683 whole-genome sequences of HBV subgenotype C2, drawn from 1,315 HBV genotype C sequences retrieved from public databases, were used to analyze overall mutation rates and signature mutations specific for each clade.

Materials and methods
Acquisition of HBV genome sequence data
A total of 1,315 HBV genome sequences of genotype C were used in this study. In addition to 1,096 sequences from a previously published dataset (McNaughton et al., 2020), Korean and Japanese sequences were downloaded from NCBI databases using the keywords "HBV," "complete genome," "Korea," or "Japan," and the final 219 additional sequences were genotyped using phylogenetic analysis. HBV genotype C and genotype A sequence (accession nos. AB116076, AB116080, AB452979) alignments were fitted to 3,215 and 3,221 bp, respectively. The genotyped datasets used in this manuscript were constructed in FASTA format (Supplementary Table S1). A flowchart of the database configuration in each analysis is shown (Figure 1).

Phylogenetic analysis and genotyping
Alignments were constructed using MAFFT with the FFT-NS-2 algorithm in Geneious Prime software (2022.1.1, Biomatters, Inc., New Zealand), and all indels in alignments were deleted. A total of 1,315 sequences were genotyped by phylogenetic analysis using the approximate maximum likelihood method with the FastTree program in Geneious Prime software (2022.1.1, Biomatters, Inc., New Zealand; Supplementary Figure S1). Then, 243 genotype C sequences, including 40, 34, and 36 well-branched sequences from clades C2(1), C2(2), and C2(3), respectively, were extracted. A phylogenetic tree with the 243 sequences was constructed by the MrBayes program (Huelsenbeck and Ronquist, 2001) using the MCMC method and GTR substitution model. The topology of the phylogenetic tree was confirmed by the maximum likelihood method with the Tamura-Nei model (Tamura and Nei, 1993) using the MEGA X program. An accession number list organized by subgenotype, including geographical information, was constructed in Excel format (Supplementary Table S2).
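The indel-removal step described above can be illustrated with a minimal Python sketch. It assumes that "deleting indels" means dropping every alignment column in which at least one sequence carries a gap character; the text does not state the exact rule applied in Geneious Prime, and the function name and toy sequences are hypothetical.

```python
def strip_gap_columns(aligned):
    """Drop every alignment column in which at least one sequence has a gap.

    `aligned` maps a sequence name to its aligned string ('-' marks a gap).
    Assumption: complete deletion of gapped columns, which may differ from
    the setting actually used in the study.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) > 1:
        raise ValueError("all aligned sequences must have the same length")
    names = list(aligned)
    columns = zip(*(aligned[name] for name in names))
    kept = [col for col in columns if "-" not in col]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


# Toy alignment (hypothetical data, not HBV sequences).
toy = {
    "strain_a": "ATG-CA",
    "strain_b": "ATGTCA",
    "strain_c": "A-GTCA",
}
stripped = strip_gap_columns(toy)
print(stripped)
```

In this toy case the second and fourth columns contain a gap, so only four columns survive for each strain.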

Genetic distance calculation
A total of 683 subgenotype C2 sequences, comprising 261, 270, and 152 sequences belonging to clades C2(1), C2(2), and C2(3), respectively, were used for genetic distance calculations. Consensus sequences were extracted from each clade using the majority rule and aligned using Geneious Prime software (2022.1.1, Biomatters, Inc., New Zealand). The pairwise distance between each sequence and the consensus of its clade was calculated using the MEGA X program.
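As a rough illustration of this calculation, the sketch below builds a majority-rule consensus and measures the distance of each sequence to it. It assumes an uncorrected p-distance (proportion of differing sites); the text does not say which distance model was selected in MEGA X, and the function names and toy sequences are hypothetical.

```python
from collections import Counter


def majority_consensus(seqs):
    """Return the most frequent base at each aligned position.

    Ties are resolved in favor of the base seen first in the column.
    """
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def p_distance(a, b):
    """Proportion of compared sites at which two aligned sequences differ.

    Sites where either sequence has a gap are skipped (an assumption).
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


# Toy clade of four aligned sequences (hypothetical data).
clade = ["ACGTAC", "ACGTAC", "ACATAC", "TCGTAC"]
consensus = majority_consensus(clade)
distances = [p_distance(seq, consensus) for seq in clade]
print(consensus, distances)
```

Averaging `distances` within each clade would give one simple summary of how tightly that clade clusters around its consensus.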

Mutation and signature sequence analysis
The number of preC/Core and BCP mutations and polymorphism rates for the rt269 region were analyzed for each clade (Tong et al., 2007; Jeong et al., 2021). The amino acid composition of each clade was analyzed for 42 potential nucleos(t)ide analog resistance (NAr)-related amino acid mutations identified in a previous study. The total mutation rate for all 42 NAr sites was calculated by dividing the number of mutations in each clade by the number of sequences in that clade multiplied by 42. By examining the source record of each accession number, we checked whether the patient from whom each sequence was obtained had received antiviral treatment before serum collection. The sequences were then classified into four categories: "treatment naïve," "antiviral treatment," "not mentioned," and "unpublished" (Supplementary Table S3). The frequency of the 42 potential NAr-related amino acid mutations was further analyzed in sequences from 183 "treatment-naïve" patients. A polymorphism that uniquely appeared with a frequency of ≥70% in a given clade was defined as a signature sequence. Signature sequences were found throughout the HBV genome, and the nucleotide composition ratio (A, T, G, C) at each signature sequence was compared with that of the other clades. Because the HBV genome contains overlapping reading frames, a single signature sequence can cause nonsynonymous or synonymous changes in two proteins simultaneously, and both were considered. Gaps in the alignment and strains that were not properly sequenced were excluded.
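The ≥70% signature criterion can be sketched in Python as follows. This is only one possible reading of "uniquely appeared": a residue counts as a signature of the target clade if it reaches the threshold there and stays below the threshold in every other clade. The function names and the toy alignment are hypothetical, not taken from the study.

```python
from collections import Counter

THRESHOLD = 0.70  # frequency cutoff stated in the text


def residue_frequencies(seqs, pos):
    """Frequency of each residue at one aligned position, ignoring gaps."""
    counts = Counter(s[pos] for s in seqs if s[pos] != "-")
    total = sum(counts.values())
    return {r: n / total for r, n in counts.items()} if total else {}


def signature_positions(clades, target):
    """List (position, residue) pairs that reach THRESHOLD only in `target`.

    Assumption: "unique" means the same residue is below THRESHOLD in all
    other clades; the study may have applied a stricter or looser rule.
    """
    length = len(clades[target][0])
    hits = []
    for pos in range(length):
        for residue, freq in residue_frequencies(clades[target], pos).items():
            if freq < THRESHOLD:
                continue
            others = (
                residue_frequencies(seqs, pos).get(residue, 0.0)
                for name, seqs in clades.items()
                if name != target
            )
            if all(f < THRESHOLD for f in others):
                hits.append((pos, residue))
    return hits


# Toy alignment with four sequences per clade (hypothetical data).
toy_clades = {
    "C2(1)": ["ACGT", "ACGT", "ACGT", "ACTT"],
    "C2(2)": ["ACGT", "ACGT", "ACGA", "ACGT"],
    "C2(3)": ["GCGT", "GCGT", "GCGT", "ACGT"],
}
c23_signatures = signature_positions(toy_clades, "C2(3)")
c21_signatures = signature_positions(toy_clades, "C2(1)")
print(c23_signatures, c21_signatures)
```

Positions are zero-based here; in the toy data only the first column qualifies for C2(3), because G reaches 75% there and is absent from the other two clades.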

Analysis of two signature mutations specific to C2(3), sV184A in HBsAg and xT36P in the X region, in Korean patients
Serum DNA samples from 127 Korean HBV patients who visited Konkuk University Hospital were extracted as previously described (Kim et al., 2017). To determine the frequency of xT36P in HBx, the HBx region of the 127 DNA samples was newly sequenced using the following primers (FW: CTC TGC CGA TCC ATA CTG CGG AA, RV: TTA ACC TAA TCT CCT CCC CCA; Sugauchi et al., 2001). The frequency of sV184A in HBsAg was analyzed using the overlapping HBsAg region of reverse transcriptase sequences from the 131 Korean patients analyzed in our previous study (accession nos. KX264864~KX264922, KX264792~KX264863; Kim et al., 2017). As shown in Figure 2A, 65 patient DNA samples overlapped between the 131 samples with reverse transcriptase sequences from the previous study and the 127 samples with HBx sequences from this study. The frequency of the sV184A and xT36P sequences in these 65 overlapping patients was calculated. The 127 HBx sequences are shown in Supplementary Table S4. As previously extracted virion DNA from isolates was used in this study, informed consent and waiver of informed consent were not required by the IRB of the hospital (IRB1012-131-346).

Phylogenetic analysis based on reverse transcriptase sequences from 131 Korean patients
A 1,032 bp fragment of HBV DNA reverse transcriptase from 131 Korean patients was aligned to reverse transcriptase sequences extracted from the 683 subgenotype C2 sequence dataset using the MAFFT method, as described above. A phylogenetic tree was generated using the maximum likelihood method with the Tamura-Nei model (Tamura and Nei, 1993).

FIGURE 1
Flowchart of dataset preparation and data composition for each analysis.

Statistical analysis
Statistical analysis was conducted using GraphPad Prism 8.0 for Windows (GraphPad Software, San Diego, California, United States; www.graphpad.com). The statistical significance of the number of amino acid differences was calculated by analysis of variance (ANOVA). For the analysis of mutation sites, the numbers of wild-type and mutated sequences were compared using the chi-square test, and the resulting significance levels are shown in the related figures and tables.
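For readers who want to reproduce this kind of comparison outside GraphPad Prism, the following sketch computes a Pearson chi-square test on a 2 × 2 table of mutated versus wild-type counts in two clades. It assumes no continuity correction, which may differ from the option used in the study, and the counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Rows are two clades; columns are mutated and wild-type counts.
    Returns the statistic and the p value for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a nonzero total")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: 30 mutated / 70 wild-type versus 10 mutated / 90 wild-type.
stat, p_value = chi_square_2x2(30, 70, 10, 90)
print(f"chi-square = {stat:.2f}, p = {p_value:.6f}")
```

With small expected counts in any cell, Fisher's exact test would usually be the safer choice; the sketch does not check for that.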

HBV strains belonging to subgenotype C2(3) vs. C2(1) and C2(2) are more prone to some reverse transcriptase mutations related to nucleos(t)ide analog resistance
As genotype C HBV strains have been reported to have a higher frequency of reverse transcriptase (RT) mutations related to nucleos(t)ide analog resistance (NAr) (Kim et al., 2017), the amino acid mutation frequency at 42 potential NAr positions in HBV RT sequences was compared between clades C2(1), C2(2) and C2(3) in a total of 683 subgenotype C2 strains. C2(3) showed a higher overall frequency of mutations across the 42 potential NAr positions than C2(1) or C2(2) (p < 0.05; Figure 5D). In particular, the

FIGURE 4
Global prevalence of three clades within hepatitis B virus subgenotype C2. (A) In China, C2(2) is the most prevalent clade, followed by C2(1). (B) In Japan, the three clades each account for approximately 30%. (C) In Korea, clade C2(3) is overwhelmingly dominant, and genotypic diversity is lower than in the other two countries.

Differentiation at the clade level of Korean patients with chronic HBV infection via sequence analysis of the HBV reverse transcriptase region
To confirm the dominance of C2(3) in Korean patients with chronic HBV infection, we evaluated the clade distribution using 131 HBV RT sequences previously reported by our laboratory (Kim et al., 2017; Figure 6). Phylogenetic analysis using the maximum likelihood method showed that most of the 131 patients were infected with C2(3) (117/131, 89.3%), with the remaining 12 infected with C2(1) and 2 with C2(2), further supporting our finding of the dominance of C2(3) in South Korea. Of the 117 Korean patients infected with C2(3), 77 (65.8%) carried the sV184A mutation in the overlapping HBsAg region, and of the 127 patients whose HBx region was sequenced, 101 (79.5%) carried the xT36P mutation (Figure 2). Of the 65 patients for whom sequence information for both the HBsAg and HBx regions was available, the largest proportion carried both sV184A and xT36P (40/65, 61.5%), followed by those with xT36P alone (17/65, 26.1%), sV184A alone (9.2%), and the s184V + x36T/S/A combination (3.1%).

Discussion
HBV subgenotype C2 is responsible for most genotype C infections in three nations of East Asia, namely, China, Japan and South Korea, which are major HBV endemic areas (Cho et al., 2009; Wang et al., 2010; Demarchi et al., 2022). Subgenotype C2 is reported to have unique virological and clinical traits distinct from other genotypes and subgenotypes, including higher virulence, higher BCP or preC/C mutation frequency, and lower response to IFN-I therapy (Lin and Kao, 2011; Ito et al., 2018; Tang et al., 2018). Phylogenetic analysis based on entire HBV genome sequences to date indicates the existence of three distinct clades, C2(1), C2(2), and C2(3), within subgenotype C2 (McNaughton et al., 2020). However, the global prevalence and molecular characteristics of these three clades remain largely unknown. In this study, we determined, for the first time, the global prevalence and molecular characteristics of the three clades, C2(1), C2(2) and C2(3), within subgenotype C2. First, we found differences between China, Japan and South Korea with regard to the clade distribution of subgenotype C2. C2(3) was found to be overwhelmingly predominant in South Korea, in contrast to China and Japan, where the three clades coexist, albeit in differing proportions (Figure 4). This epidemiologic finding may explain the distinct findings of several studies using Korean cohorts, including a higher frequency of NAr or BCP mutation (Kim, 2014), the presence of rarely encountered HBV mutation types (Kim et al., 2017), and a higher prevalence of HBeAg-negative HCC patients (Jang et al., 2022).

The frequency of HBV mutations between three clades.

Second, we identified 21 types of signature sequences specific for C2(1), C2(2) and C2(3), which may be used for differentiation of the 3 clades as genetic markers (Table 2). In particular, 8 types of nonsynonymous signature sequences might influence the distinct virological and clinical traits of each clade. Future studies should focus on nonsynonymous signature sequences to elucidate the underlying mechanism associated with the distinct virological and clinical traits of each clade.
Third, we found that compared to the other two clades, i.e., C2(1) and C2(2), C2(3) showed a higher frequency of NAr mutations, even in rt204 and rt180, which are related to primary and secondary drug resistance, respectively (Table 1). This suggests that C2(3) infection may lead to enhanced NA treatment failure compared to the other two clades. Indeed, our previous study using a cohort of treatment-naïve patients with chronic HBV infection revealed a higher prevalence of naturally occurring NAr mutations in Korea compared to other areas, including China (Kim et al., 2017). Our previous LNA-based RT-PCR assay also showed a higher prevalence of rt204I region YMDD variants, particularly in Korean HCC patients (Choe et al., 2019). Higher rates of relapse in Korean chronic HBV infection patients with HBeAg seroconversion after lamivudine treatment have also been found, supporting the above hypothesis (Song et al., 2000).
Fourth, despite no significant difference in the frequency of preC mutation, the BCP double mutation frequency was significantly higher in chronic HBV patients infected with C2(3) or C2(2) than in those infected with C2(1) (Figure 5). Given the relationships of BCP double mutation with liver disease progression and HBeAg-negative infection (Kim, 2014), the possibility that C2(3) or C2(2) vs. C2(1) may cause more advanced liver diseases, including HCC in HBeAg-negative chronic HBV patients, cannot be excluded (Zhang et al., 2010; Alexopoulou and Karayiannis, 2014). In fact, a recent study using a Korean cohort reported that HCC is more prevalent in HBeAg-negative patients without liver cirrhosis than in HBeAg-positive patients without liver cirrhosis, which is distinct from other studies showing that HBeAg is not an independent risk factor for HCC (Jang et al., 2022). This issue should be further assessed in future studies using Korean cohorts.
In conclusion, our data show that HBV subgenotype C2(3) is extremely prevalent in Korean patients with chronic HBV infection, which is distinct from two other East Asian countries, China and Japan, where diverse subgenotypes or clades within genotype C coexist. In addition, we found that HBV strain C2(3) shows a higher frequency of RT mutations related to NAr, including rtM204I and rtL180M, than strains C2(1) and C2(2).

Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary material.

Ethics statement
The studies involving human participants were reviewed and approved by the Institutional Review Board of Seoul National University Hospital (1012-131-346). Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

Author contributions
DK and B-JK designed the study. B-JK interpreted the research and wrote the first draft of the manuscript. DK performed the data analysis and revised the manuscript. Y-MC and JJ supported the data analysis and revised the manuscript. All authors contributed to the article and approved the submitted version.