Low host immune pressure may be associated with the development of hepatocellular carcinoma: a longitudinal analysis of complete genomes of the HBV 1762T, 1764A mutant

Background It has been reported that hepatitis B virus (HBV) double mutations (A1762T, G1764A) are an aetiological factor of hepatocellular carcinoma (HCC). However, it is unclear which of the individuals infected with the mutant are prone to develop HCC. Exploring HBV quasispecies, which are strongly influenced by host immune pressure, may provide more information about the association of viral factors and HCC. Materials and methods Nine HCC cases and 10 controls were selected from the Long An cohort. Serum samples were collected in 2004 and 2019 from subjects with HBV double mutations and the complete genome of HBV was amplified and sequenced using next-generation sequencing (NGS). Results The Shannon entropy values increased from 2004 to 2019 in most cases and controls. There was no significant difference in mean intrahost quasispecies genetic distances between cases and controls. The change in the values of mean intrahost quasispecies genetic distances of the controls between 2004 and 2019 was significantly higher than that of the cases (P<0.05). The viral loads did not differ significantly between cases and controls in 2004 (p=0.086) but differed at diagnosis in 2019 (p=0.009). Three mutations occurring with increasing frequency from 2004 to 2019 were identified in the HCC cases, including nt446 C→G, nt514 A→C and nt2857 T→C. Their frequency differed significantly between the cases and controls (P<0.05). Conclusions The change in the values of mean intrahost quasispecies genetic distances in HCC was smaller, suggesting that HBV in HCC cases may be subject to low host immune pressure. Increasing viral loads during long-term infection are associated with the development of HCC. The novel mutations may increase the risk for HCC.


Introduction
Primary liver cancer is the seventh most frequently occurring cancer worldwide and the second most common cause of cancer mortality (1). It has a wide geographic variation in incidence. More than 80% of cases occur in sub-Saharan Africa and Eastern Asia (2). In Asia, liver cancer is the fifth most common cancer and the second most common cause of cancer-related death (3). In most countries, 75-90% of liver cancers are hepatocellular carcinoma (HCC) (4).
The risk factors of HCC are complex and vary geographically. In high-rate HCC areas, hepatitis B virus (HBV) and aflatoxin B1 (AFB1) are the dominant factors, whereas hepatitis C virus (HCV) and alcohol are more important factors in low- to medium-rate areas (4). Chronic infection by HBV is by far the most important risk factor for HCC in high-risk areas, including China and Africa (5). Up to 60-80% of HCCs are seropositive for markers of HBV infection (6). Male carriers of hepatitis B surface antigen (HBsAg) have a greater than 300-fold higher risk of developing HCC than antigen-negative controls (7). Universal immunization against hepatitis B has already had a favorable impact on the annual incidence of HCC in children and young adults (8,9), confirming the causative role of the virus.
It has been proposed that some mutations in the genome of HBV may be involved in the development of HCC (10). G1896A, C1653T, T1753V, the basal core promoter (BCP) double mutations (A1762T, G1764A), and pre-S region deletions in the HBV genome have been reported to be associated with the development of HCC (11). The association of the A1762T, G1764A double mutations with HCC has been confirmed by prospective cohort studies (12). However, it is unclear which of the individuals infected with HBV carrying the basal core promoter double mutations (1762T, 1764A) are prone to develop HCC.
HBV exists as a quasispecies (QS) in infected individuals (13). Various mutations may occur naturally in the HBV quasispecies during long-term infection (14). HBV quasispecies diversity, which is strongly influenced by host immune pressure, may increase the progression of fibrosis in chronic hepatitis B patients under immune selection (15). Therefore, exploring HBV quasispecies may provide more information about the relationship between viral factors and the development of HCC. Compared to Sanger DNA sequencing for the detection of HBV genetic diversity, next-generation sequencing (NGS) can simultaneously sequence a large number of viral genomes with high sensitivity and specificity (16). However, longitudinal data from NGS of complete HBV genomes are lacking. In this study, NGS technology was used to explore the long-term evolution of the complete genome in the HBV quasispecies before the diagnosis of HCC, based on the Long An cohort (12).

Study subjects and ethics statement
To determine the long-term evolution of HBV in HCC cases and controls, a nested case-control study was carried out based on the Long An cohort (12), which was established in 2004. The cohort comprises 2,258 asymptomatic HBsAg carriers (ASC), then aged 30-55 years and living in the rural area of Long An county, Guangxi, China, including a group with BCP double mutations and a wild type BCP group. The study subjects were followed up for three years from 1st July, 2004. Each study subject completed a one-page questionnaire at their first visit, provided a serum sample every six months for the assessment of virological parameters and alpha-fetoprotein (AFP) concentrations, and was monitored for HCC by ultrasonography (US). HCC cases were then followed up every year and blood samples were collected where possible. The cohort was followed up again in 2019 and serum samples were collected again.
All cases of HCC were diagnosed at the Long An People's Hospital. The diagnosis was made by one or more of the following: (a) surgical biopsy; (b) elevated serum AFP (levels ≥400 ng/mL), excluding pregnancy, genital cancer, and other liver diseases including metastasis of tumors from other organs, plus clinical symptoms or one image (US or computed tomography, CT); (c) elevated serum AFP (levels <400 ng/mL), excluding pregnancy, genital cancer, and other liver diseases including metastasis of tumors from other organs, plus two images (US and CT) or one image (US or CT) and two positive HCC markers such as DCP, GGT II, AFU, CA19-9, etc.
The critical selection criterion was that both cases and controls were infected with HBV with BCP double mutations at baseline or later during follow-up, so that the confounding effect of BCP double mutations could be controlled and other mutations associated with HCC could be identified. All samples were tested for anti-HCV and those that were positive were excluded, to eliminate the confounding effect of HCV infection on the incidence of HCC. Informed consent in writing was obtained from each study subject. The study protocol conforms to the ethical guidelines of the 1975 Declaration of Helsinki and has been approved by the Guangxi Institutional Review Board (GXIRB2020-0021).
Serological testing, measurement of serum viral loads, PCR for HBV genomic DNA, library preparation and next-generation sequencing, NGS data preprocessing, haplotype construction and diversity analysis and estimation of the intra-host HBV evolutionary rate
These methods have been reported previously (17). In brief, total DNA was extracted from 200 µL of each patient's serum and full-length HBV genomic sequences (~3 kb) were amplified and enriched, using the primer pair P1 and P2, according to the protocol of Günther et al. (18). If the mass of amplicon was insufficient, a second round of PCR was carried out using nested primers. All final products were confirmed by electrophoresis through 1% agarose gels. Next-generation sequencing libraries were then prepared and sequenced on the NovaSeq sequencer with 150 bp paired-end (PE) reads at Delivectory Biosciences Inc. Company (Beijing, China). Quality control and preprocessing of each HBV sample's raw reads were performed with fastp v0.20.1 (19). The clean reads were then mapped to a common reference sequence (accession no. X02763). After deduplication, a consensus HBV genome sequence was generated for each sample using CliqueSNV v1.5.3. All HBV consensus sequences were multi-aligned with references from HBVdb and several additional sequences. A maximum-likelihood tree was constructed with MEGA 7 (20) and the consensus sequence from each sample was genotyped.

Shannon entropy and mutation analysis along the HBV genome
For each sample, the nucleotides at each position in the HBV genome were evaluated using the SAMtools mpileup algorithm (21). The Shannon entropy of each site was calculated from the nucleotide frequencies at that site, and the differences between 2019 and 2004 were calculated for the paired samples. HBV SNVs and indels were obtained from the mpileup result, and mutations related to HCC and ASC were investigated further. Mutations with a high frequency in both 2004 and 2019, or with an increasing frequency from 2004 to 2019 in the HCC group, or with a decreasing frequency from 2004 to 2019 in the ASC group, were considered to be associated with HCC. Mutations with a high frequency in both 2004 and 2019, or with an increasing frequency from 2004 to 2019 in the ASC group, or with a decreasing frequency from 2004 to 2019 in the HCC group, were considered to be protective. Only mutations which occurred in at least two cases in the same group were investigated further.
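The per-site entropy calculation described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names `site_entropy` and `entropy_change` are hypothetical, the input is assumed to be a per-position tally of nucleotide read counts (for example, parsed from a pileup), and the natural logarithm is an assumption because the log base is not stated in the text.

```python
import math

def site_entropy(counts):
    """Shannon entropy of one genome position (natural log; base is an assumption).

    counts: mapping of nucleotide -> read count observed at that position.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            entropy -= p * math.log(p)
    return entropy

def entropy_change(counts_2004, counts_2019):
    """Per-site entropy difference (2019 minus 2004) for one paired subject."""
    return [site_entropy(late) - site_entropy(early)
            for early, late in zip(counts_2004, counts_2019)]

# Hypothetical two-site example: site 1 stays fixed, site 2 becomes mixed.
early = [{"A": 100}, {"C": 100}]
late = [{"A": 100}, {"C": 50, "G": 50}]
print(entropy_change(early, late))
```

A positive difference at a site means the nucleotide distribution there became more mixed between the two sampling years; a value of zero means it did not change in complexity.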

HBV genotyping
HBV genotypes were determined using phylogenies reconstructed on the basis of the complete genome and preS/S regions of the viruses. The sequences were aligned to 45 HBV sequences of all known genotypes retrieved from GenBank using Clustal W and visually confirmed with the sequence editor BioEdit (22). The reference sequences are shown in Figure 1. Neighbor-Joining trees were reconstructed under the Kimura 2-parameter substitution model with the program MEGA (23). The reliability of clusters was evaluated using the interior branch test with 1000 replicates and internal nodes with over 75% support were considered reliable.
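For readers unfamiliar with the Kimura 2-parameter model, the pairwise distance it defines can be sketched as below. The study used MEGA for this step; the code is only an illustration of the standard formula d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. The function name `k2p_distance` is hypothetical, and skipping gaps and ambiguity codes is an assumption of this sketch.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: ignore gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the model (argument <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))

# Hypothetical 10-base sequences differing by a single transition.
print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

The correction makes the distance slightly larger than the raw proportion of differing sites, because it accounts for multiple substitutions at the same position.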

Statistical methods
Statistical comparisons of the prevalence of HBV mutations between the case and control groups were performed using Pearson's χ² test, McNemar's test and Fisher's exact test. The data of evolutionary rates are presented as median (range). Viral loads, genetic diversity, Sn value and evolutionary rates were compared between groups using the Mann-Whitney test. All P values were two-tailed and P<0.05 was considered to be significant. All statistical analyses were performed using the SPSS software (ver. 16.0; Chicago, IL, USA).
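With groups as small as those in this study, Fisher's exact test is the natural choice for comparing mutation prevalence. The sketch below shows how a two-sided P value is obtained from a 2x2 table by summing hypergeometric probabilities. The analyses in the study were run in SPSS; the function name `fisher_exact_two_sided` and the counts in the example are hypothetical and are not taken from the study's tables.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x "positives" in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Hypothetical counts: mutation present in 4 of 9 cases and 0 of 10 controls.
p_value = fisher_exact_two_sided(4, 5, 0, 10)
print(round(p_value, 3))
```

Because the test enumerates every table compatible with the margins, it remains valid when expected cell counts are too small for the χ² approximation.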

General characteristics and genotypes
The original sample size was 13 cases and their paired controls. However, amplification of the complete HBV genome failed for some cases and controls. Eight cases with paired controls, plus one HCC case and two asymptomatic controls that could not be paired, remained. All were included in the analysis, making nine HCC cases and 10 ASC controls. Almost all (16/19) were infected with genotype C HBV (Figure 1). The proportion of that genotype in cases and controls was 88.9% (8/9) and 80% (8/10), respectively. There was no significant difference in the prevalence of genotype C between cases and controls (p=1). No genotype shifting was found during the 15 years. There was no significant difference in HBeAg positivity between cases and controls (p=1). The number of quasispecies in the case group was 191 in 2004 and 205 in 2019; in the control group it was 195 and 254, respectively (Table 1).

Quasispecies Shannon entropy and intrahost genetic distance
The quasispecies Shannon entropies of each subject in the HCC and control groups are shown in Table 1 and Figure 2A. The Sn values of the HCC case group (1.9867 in 2004 and 2.4733 in 2019) and control group (1.726 in 2004 and 2.36 in 2019) did not differ significantly (p=0.368 in 2004 and p=1.000 in 2019). However, most of the values increased from 2004 to 2019 in both groups, indicating that the quasispecies structure increased in complexity over time. Most of the mean intrahost quasispecies genetic distances in the HCC cases and controls were less than 1% in 2004 and 2019 and there was no significant difference between the two groups (p=0.683 in 2004; p=0.253 in 2019), as shown in Table 1 and Figure 2B. However, most of the mean intrahost quasispecies genetic distances of the controls between 2004 and 2019 were greater than 1%. These values were significantly higher than those of the HCC cases (p<0.05), suggesting that the viruses in HCC cases may be subject to lower immune pressure from the host.
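To make the two kinds of distance in this section concrete, the sketch below computes a mean distance within one time point and a mean distance between two time points for a set of aligned haplotypes. The exact definition used in the study is given in the earlier methods report (17) and is not restated here, so this is an assumption-laden illustration: it uses the unweighted mean of uncorrected pairwise p-distances, and the function names and toy haplotypes are hypothetical.

```python
from itertools import combinations, product

def p_distance(s1, s2):
    """Proportion of differing sites between two aligned sequences."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)

def mean_within(haplotypes):
    """Mean pairwise distance among haplotypes from one time point."""
    pairs = list(combinations(haplotypes, 2))
    if not pairs:
        return 0.0
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)

def mean_between(early, late):
    """Mean distance over all early-late haplotype pairs of one subject."""
    pairs = list(product(early, late))
    if not pairs:
        raise ValueError("both time points need at least one haplotype")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)

# Hypothetical 10-base haplotypes for one subject.
hap_2004 = ["ACGTACGTAC", "ACGTACGTAT"]
hap_2019 = ["ACGTACGTGT", "ACGTACGAGT"]
print(mean_within(hap_2004), mean_within(hap_2019))
print(mean_between(hap_2004, hap_2019))
```

In this toy example the within-time-point diversity is the same in both years, while the between-time-point distance is larger, which is the pattern that indicates the viral population has moved over time rather than merely diversified.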

Shannon entropy change along the genome
The Shannon entropy values of each site along the HBV genome were calculated for each sample, and the difference between 2004 and 2019 was further calculated for each subject, if available, as shown in Figure 3. The plot suggested that the viral nucleotide distribution tends to change much more in the PreC/C and PreS1/PreS2/S regions in HCC patients over time, while no such outcome was seen in the controls.
The details of each subject's intrahost viral evolutionary rates are summarized in Table 1. The highest and lowest evolutionary rates in the HCC group were 1.01E-03 and 3E-04. The highest and lowest evolutionary rates in the control group were 4E-03 and 2E-04. The difference in the median value of the substitution rate between the case and control groups was not significant (p=0.102).

Changes in viral loads
The details of each subject's viral loads in 2004 and 2019 are summarized in Table 1. The viral loads of HCC cases in 2019 were not significantly different from those in 2004 (p=0.757). Similarly, the viral loads of the control group in 2019 did not differ significantly from those in 2004 (p=0.821). The viral loads of HCC cases in 2004 did not differ significantly from those of the control group in 2004 (p=0.086). However, the viral loads of HCC cases in 2019 were significantly different from those of the control group in 2019 (p=0.009), suggesting that increasing viral loads are associated with the development of HCC.
In this study, four novel mutations were identified with increasing frequency from 2004 to 2019 in the HCC group, while the frequency of these mutations did not increase or decrease from 2004 to 2019 in the control group. The frequency of these mutations differed significantly between the case group and the control group. These mutations, nt446 C→G (P=0.033) and nt514 A→C (P=0.033) in the S open reading frame (ORF), nt2170 T→C (P=0.033) in the C ORF and nt2857 T→C (P=0.033) in the PreS2 region, seem to be associated with the development of HCC (Table 3).
nt514 A→C is a synonymous mutation in the S ORF but is a missense mutation in the overlapping polymerase ORF, affecting codon 129 (methionine to leucine). nt2170 T→C is a synonymous mutation in the C ORF. nt446 C→G is a missense mutation, causing the change of leucine to valine at codon 98 of the S ORF and serine to cysteine at codon 106 of the overlapping polymerase ORF. nt2857 T→C is a missense mutation, resulting in the change of tryptophan to arginine at codon 4 of the PreS2 domain and leucine to serine at codon 184 of the overlapping polymerase ORF (Table 3).
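Because the HBV ORFs overlap, one nucleotide change can be silent in one reading frame and alter an amino acid in another. The sketch below illustrates this with the standard genetic code. The six-base sequence is a hypothetical toy, not HBV sequence, and the function name `substitution_effect` is likewise hypothetical; the example merely reproduces the kind of pattern described for nt514 A→C (synonymous in one frame, methionine to leucine in the overlapping frame).

```python
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def substitution_effect(seq, pos, alt, frame_start):
    """Classify a single-base change at 0-based index `pos`.

    frame_start: 0-based index of the first base of any codon in the
    reading frame of interest. Returns (ref_aa, alt_aa, kind).
    """
    codon_start = pos - (pos - frame_start) % 3
    ref_codon = seq[codon_start:codon_start + 3]
    if codon_start < 0 or len(ref_codon) < 3:
        raise ValueError("position is not covered by a complete codon")
    i = pos - codon_start
    alt_codon = ref_codon[:i] + alt + ref_codon[i + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return ref_aa, alt_aa, kind

toy = "GGATGC"  # hypothetical sequence, read in two overlapping frames
print(substitution_effect(toy, 2, "C", 0))  # ('G', 'G', 'synonymous')
print(substitution_effect(toy, 2, "C", 2))  # ('M', 'L', 'missense')
```

In the first frame the changed base is the third position of a glycine codon (GGA to GGC), while in the second frame it is the first position of ATG, which becomes CTG.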

Discussion
The major finding of this study is that the quasispecies structure of HBV with BCP double mutations increased in complexity over time. The change in the values of mean intrahost quasispecies genetic distances of the controls between 2004 and 2019 was significantly higher than that of the HCC cases, suggesting that HBV in the HCC cases may be subject to lower host immune pressure. Nucleotides in the PreC/C and PreS1/PreS2/S ORFs tended to change much more frequently over time in the HCC patients than in the asymptomatic carriers. Four novel mutations were identified that are associated with the development of HCC; three of them are missense mutations that affect codons in the PreS2/S and polymerase ORFs. Increasing viral loads during long-term infection are also associated with the development of HCC. The strength of this study is that the data were derived from the long-term evolution of HBV quasispecies before the diagnosis of HCC, which may provide information about the change of intrahost quasispecies genetic distances and the causative role of the mutations. The weakness of this study is that the sample sizes were insufficient for multivariable logistic regression analysis.
It has been reported that the characteristics of viral quasispecies are associated with the exacerbation of liver fibrosis progression and the development of liver cancer (15,24). NGS provides higher sensitivity and specificity for detecting quasispecies than Sanger DNA sequencing (16). Therefore, this technique has been used to search for quasispecies and the mutations associated with the development of HCC and has produced some interesting results (25-28). All of these studies investigated the PreS region only, except for Chang's group, who investigated the complete genome of HBV from HCC patients and non-HCC controls and found 41 novel HCC-associated SNVs and preS deletions that involved HBV ORFs and regulatory elements (26). Unfortunately, that was a cross-sectional study and could not provide information about the change of intrahost quasispecies and mutations before the diagnosis of HCC. Therefore, the results obtained in this study should be more reliable, because both HCC cases and controls were selected from a prospective cohort and serum was available prior to HCC diagnosis.

FIGURE 2
HBV intra-host viral quasispecies Shannon entropy differences between 2004 and 2019 (A) and genetic diversity distribution (B) of the 9 HCC and 10 CHB subjects.

Although HBV infection has long been established as a major cause of HCC, the mechanisms of oncogenesis remain obscure (29). Nonetheless, the quasispecies associated with tumor development have recently become a major focus for research. The characteristics of viral quasispecies in the PreS region have been reported to be associated with the development of HCC. HBV polymerase sequences may contain vital HBV quasispecies features which may be used to predict HCC (30). Quasispecies diversity was found to be strongly influenced by host immune pressures (14). The finding here that low host immune pressure may be associated with the development of hepatocellular carcinoma is important for understanding the mechanisms of oncogenesis of HBV. All individuals infected with the HBV 1762T, 1764A mutant should be screened regularly for HCC so as to detect the tumor at an early stage, because those exerting low immune pressure on the virus may be at particularly high risk of HCC.
It has been reported that there is a subtle relaxation of selection pressure on the HBV core gene in subjects with HBeAg-negative chronic hepatitis B. This may be attributable to impaired antiviral immunity, and could contribute to the high levels of viral replication (31). In this study, all HCC cases infected with HBV with basal core promoter double mutations (A1762T, G1764A) were negative for HBeAg and their viral loads were significantly higher than those of the controls. High HBV viral loads are a risk factor for HCC (32). High rates of replication of HBV may lead directly to increased numbers of chromosomal integration events and, thus, to HCC (33). Therefore, it is reasonable to postulate that the antiviral immunity of the HCC patients was impaired, resulting in low immune pressure on the replicating virus. If so, this may provide clues as to which of the individuals infected with HBV with basal core promoter double mutations (A1762T, G1764A) are prone to develop HCC.
HBV mutations in the preS or PreC and/or core promoter regions have been recognized to be significantly associated with HCC (34). This conclusion was further supported by this study. Some other studies have suggested that the core promoter mutations C1653T and T1753V are also associated with the occurrence of severe hepatitis B and the development of HCC (35). However, this suggestion was not supported by the current or previous results (12). Deletions in the PreS region have been reported to be markedly more common in HCC patients than in ASC patients (36,37). Surprisingly, the current results do not support the association of PreS deletions with the development of HCC, although both studies are based on the same cohort. This may be attributable to the small sample size. Although the four novel mutations may increase the risk of HCC, there was no significant difference in the prevalence of these mutations between HCC patients and controls. These mutations may not be factors leading to HCC, but only co-factors. However, these need to be investigated further.
A prospective study reported that HBV viremia, except perhaps at extremely low levels, is associated with an increased risk for HCC; those with high viral loads of 3.8 × 10⁴ virions/mL at entry are at increased risk of HCC (38). In this study, no significant difference was found in viral loads between the case and control groups at entry, although the difference became significant when HCC was diagnosed. Therefore, evaluation of the risk for HCC based on a single measurement of viral load may not be sufficient.
Reflecting virus-host interplay, increased HBV quasispecies complexity and diversity in the pre-S region was found to be associated with the development of HCC (25). These findings were not confirmed by this study. In contrast, the mean intrahost quasispecies genetic distances of controls between 2004 and 2019 were found to be significantly higher than those of the HCC cases. This may be attributable to these results having been derived from complete genome sequences.

Conclusions
The quasispecies structure of HBV with BCP double mutations may increase in complexity over time. The change in the values of mean intrahost quasispecies genetic distances of HCC cases was significantly lower than that of the controls, suggesting that HBV in HCC cases may be subject to lower host immune pressure, which may result in high replication of the virus during long-term infection. The findings may provide a novel clue for understanding the mechanisms of oncogenesis of the HBV 1762T, 1764A mutant.

FIGURE 1
The maximum likelihood tree for genotyping the consensus sequences of 19 subjects. The yellow colored clade contains sequences from 18 genotype C subjects, BH434, WX316, etc. The pink colored clade contains sequences from 3 genotype I subjects, YJ025, HCC03 and YL095. The blue marked clade contains sequences from 1 genotype B subject, TD287.

FIGURE 3
HBV intra-host viral quasispecies' 2019 and 2004 Shannon entropy differences, according to the genomic location.

TABLE 3
Newly identified mutations associated with higher risk for HCC.