Germline Variants and Genetic Interactions of Several EMT Regulatory Genes Increase the Risk of HBV-Related Hepatocellular Carcinoma

Epithelial-mesenchymal transition (EMT) plays an important role in the development of hepatitis B virus (HBV)-related hepatocellular carcinoma (HCC). We hypothesized that germline variants in the major EMT regulatory genes (SNAIL1, ZEB1, ZEB2, TWIST1) may influence the development of HBV-related HCC. We included 421 cases of HBsAg-positive patients with HCC, 1371 cases of HBsAg-positive subjects without HCC [patients with chronic hepatitis B (CHB) or liver cirrhosis (LC)] and 618 cases of healthy controls in the case-control study. Genotype, allele, and haplotype associations in the major EMT regulatory genes were tested. Environment-gene and gene-gene interactions were analysed using the non-parametric model-free multifactor dimensionality reduction (MDR) method. The SNAIL1rs4647958T>C was associated with a significantly increased risk of both HCC (CT+CC vs. TT: OR=1.559; 95% confidence interval [CI], 1.073-2.264; P=0.020) and CHB+LC (CT+CC vs. TT: OR=1.509; 95% CI, 1.145-1.988; P=0.003). Carriers of the TWIST1rs2285681G>C (genotypes CT+CC) had an increased risk of HCC (CG+CC vs. GG: OR=1.407; 95% CI, 1.065-1.858; P=0.016). The ZEB2rs3806475T>C was associated with significantly increased risk of both HCC (P recessive =0.001) and CHB+LC (P recessive<0.001). The CG haplotype of the rs4647958/rs1543442 haplotype block was associated with significant differences between healthy subjects and HCC patients (P=0.0347). Meanwhile, the CT haplotype of the rs2285681/rs2285682 haplotype block was associated with significant differences between CHB+LC and HCC patients (P=0.0123). In MDR analysis, the combination of TWIST1rs2285681, ZEB2rs3806475, SNAIL1rs4647958 exhibited the most significant association with CHB+LC and Health control in the three-locus model. Our results suggest significant single-gene associations and environment-gene/gene-gene interactions of EMT-related genes with HBV-related HCC.


INTRODUCTION
Hepatocellular carcinoma (HCC), a common malignant tumour of the digestive system, is the second leading cause of cancerrelated death in China. HCC is characterized by high malignant potential, concealed pathogenesis, rapid progress, poor prognosis and a high mortality rate. It is typically diagnosed during the middle and late disease stages, when surgery is no longer a viable option (1). Therefore, it is important to identify genetic loci that may be valuable predictors for the development of both HCC and chronic hepatitis B virus (HBV) infection in order to evaluate the risk of HCC in patients with HBV infection.
In recent years, the significance of epithelial-mesenchymal transition (EMT) in tumours has been extensively studied. There are many complex factors that may influence the process of tumour metastasis; however, the specific underlying mechanisms are not yet clear. A great many studies have revealed that EMT plays an important role in tumour invasion and metastasis. To date, three well-established transcriptional regulatory groups have been identified as important factors in regulating the expression of EMT molecular markers (2). Studies have shown that several EMT regulators are involved in the process of tumour metastasis and that the phenotypic changes associated with EMT play a key role in the development of invasive phenotypes in colon cancer, thyroid cancer and breast cancer (3). In addition, increasing evidence demonstrates that EMT is involved in promoting other aspects of tumour progression (4)(5)(6). A more comprehensive understanding of the role of EMT in regulating the growth and metastasis of tumours is critical for improving the diagnosis and treatment of these tumours.
Previous work has demonstrated that SNAIL and TWIST are the major regulators of EMT, which subsequently induces HCC (7). Overexpression of SNAIL and TWIST is associated with greater tumour volume, increased recurrence, and shorter disease-free and overall survival in HCC patients (7). In addition, SNAIL and TWIST expression is associated with decreased E-cadherin expression in HCC. In vitro experiments have confirmed that overexpression of SNAIL or TWIST promotes invasion and increases the interstitial phenotype of tumour cells. Overexpression of SNAIL or TWIST in Huh7 cells suppresses E-cadherin expression and induces EMT (3,8,9). In addition, previous studies have demonstrated that EMT leads to increased chemotherapeutic resistance in poorly differentiated HCC cell lines (4)(5)(6). Wu et al. constructed gemcitabine-resistant HCC cell lines and found that these cells develop an EMT-related phenotype (10). Furthermore, real-time PCR has been used to demonstrate the downregulation of E-cadherin expression and increased expression of TWIST1, further confirming the development of EMT (11).
Genetic variations in EMT-related regulatory genes may affect the process of EMT and thus influence the development of HCC or chronic HBV infection. However, there has been no published research on the association of these variants with HCC or chronic HBV infection risk. Moreover, although several genetic variants associated with these liver diseases have been revealed by GWAS, little research has been done on the link between these genes and disease progression. Therefore, it is of great value to identify which genetic loci of EMT-related genes are related with the development of HCC. Thus, we assessed whether Genetic variations in EMT-related regulatory genes are associated with the progress of HCC and chronic HBV infection.
A common analysis method for genotype data is to perform a single gene locus or haplotype analysis on a single gene, that is, to detect the association between each locus or gene and disease separately. However, when we want to explain the genetic changes in complex diseases, the usefulness of this analysis is limited (12). Because the risk of a particular disease may be explained by genetic mutations at other loci, discovering gene-togene interactions is more conducive to a comprehensive understanding of the factors that affect disease risk (13). In this study, we investigated possible genetic interactions between EMT-related genes (SNAIL1, ZEB1, ZEB2 and TWIST1) in HBV-related HCC in the Han population and their relevance as potential biomarkers for HBV and HCC. This approach may help develop new therapy or individualized treatments for HBVrelated HCC and chronic HBV infection.

Study Subjects
Case-control studies were conducted to investigate HBV-related HCC and chronic HBV infection in northern China. To evaluate HBV-associated mutations and their correlation with HCC risk, 421 HBsAg-positive patients with HCC, 1371 HBsAg-positive patients without HCC [691 cases of chronic hepatitis B (CHB) and 680 cases of liver cirrhosis (LC)] and 618 controls without HBV infection were enrolled. All subjects are independent of each other and are ethnically Han Chinese. All participants were recruited between January 2010 and March 2014 from the First, Second and Fourth Hospitals of Hebei Medical University and the Fifth Hospital of Shijiazhuang City. Each subject provided demographic characteristics as well as a one-time 2 mL blood sample. All subjects signed a written informed consent forms to study initiation. This study was approved by the institutional review board of Hebei Medical University (Ethics Committee of Hebei Medical University: No. 2017053).
Healthy individuals were defined as (i) HBsAg, antibodies against HBc (anti-HBc) and other HBV biomarkers were free; (ii) blood routine and biochemical indexes were normal; (iii) without a history of hepatitis B vaccination; (iv) without endocrine, cardiovascular, renal or other liver diseases. CHB patients were defined as (i) serum HBsAg was positive; (ii) HBeAg was positive; (iii) anti-HBe was negative; (iv)serum HBV-DNA >2000 IU/mL lasting for >6 months; (v) the value of alanine aminotransferase (ALT) was persistent or repeated rising; (vi) liver histology showed hepatitis. LC patients were defined by clinical manifestations of portal hypertension (e.g., varicose oesophageal or gastric fundus, ascites and splenomegaly) and imaging results of ultrasonography, computed tomography, and magnetic resonance imaging (14,15). HBV-related HCC patients were defined as pathologic diagnosis and/or blood alpha-fetoprotein (AFP) >400 ng/mL, at the same time combined with imaging examination results (16,17). Patients were excluded from this study if they with alcoholic liver disease, positive laboratory tests for HCV (identified by the presence of anti-HCV and/or HCV-RNA) and HIV or suspected autoimmune diseases with an antinuclear antibody titre greater than 1:160.
The personal information of the research subjects was obtained through questionnaires, which included the subjects' gender, age, smoking status, and drinking status. The definition of smoking and drinking here is: an individual who smokes every day and has smoked for more than 1 year is defined as a smoker, and an individual who drinks once or more a week for more than 6 months is defined as a drinker. We collected about 2 mL of anticoagulated venous blood by ethylenediamine tetra-acetic acid (EDTA) from each subject. Each subject signed an informed consent form. The study protocol adhered to the ethical guidelines set forth by the 1975 Declaration of Helsinki and was approved by the Hebei Medical University ethics committee.

Polymorphisms Selection and Genotyping
According to the dbSNP database (http://www.ncbi.nlm.nih.gov/), we selected 6 EMT gene loci located in the promoter, regulator coding region and 3'-UTR. All putative functional singlenucleotide polymorphisms (SNPs) of the genes encoding the aforementioned EMT regulators (SNAIL1 rs4647958T>C, SNAIL1 rs1543442G>A, ZEB1 rs7349C>T, ZEB2 rs3806475T>C, TWIST1 rs2285681G>C and TWIST1 rs2285682T>G) with a minor allele frequency greater than 5% in the Chinese population were selected. The location information in gene region for the selected SNPs was shown in Table 1  Harvard, United States) and MDR 3.0.2 software (https:// sourceforge.net/projects/mdr/) were used to perform statistical analyses. Categorical variables were described using frequencies, while continuous data with abnormal distribution were described using the median and interquartile range. The comparisons of continuous data sets were done using Kruskal-Wallis H test and evaluation of differences in categorical variables between groups was done using Pearson chi-square test. The bonfferny method was used for pairwise comparisons between groups when there was a significant difference in the overall distribution of each factor in the three groups. Calculation of odds ratios (OR) and 95% confidence intervals (95% CI) was done using unconditional logistic regression. Analysis of correlations between genetic variants and HCC stages was done by Spearman's rank correlation. Linkage disequilibrium (LD) and haplotype block analyses were used to investigate the LD of EMT SNPs using Haploview 4.2 software. Multifactor dimensionality reduction (MDR) method as a nonparametric alternative was used to analyse the environment-gene and gene-gene interactions. The MDR analyses were performed by MDR 3.0.2 software. This extensive search for genetic interactions was done for HCC. Up to four loci interactions were tested using 10-fold crossvalidation in a search considering all possible SNP combinations. The SNP combination with maximum cross-validation consistency (CVC) was considered to be the best model (see METHODS in the Supplements). All hypothesis tests were based on two-sided. When P values were less than 0.05, it is considered statistically significant.

Subject Characteristics
Baseline characteristics of the 1371 HbsAg-positive patients without HCC (CHB+LC), 421 HBsAg-positive patients with HCC and 618 healthy control subjects were shown in Supplementary Table 2. The age, gender, and tobacco and alcohol use distributions were significantly different among all studied groups (P<0.05). Smoking and drinking were significantly lower in healthy patients versus in HBsAg-positive patients with and without HCC. The proportion of males was higher in the HBsAg-positive patients versus the healthy subjects, while patients older than 45 years old were more frequent in the HBsAg-positive patients with HCC. We adjusted for these factors in the multivariate logistic regression models.

Genotypes of EMT Regulators and Their Association With Hepatocellular Carcinoma and Chronic HBV Infection Risk
The genotype distributions of the six EMT regulators and their associations with HCC and CHB+LC are presented in Tables 1 and 2. Based on the best genetic model (defined as the model with the smallest AIC value), the SNAIL1 exon variant rs4647958T>C was significantly associated with an increased risk of both HCC (P dominant =0.020) and CHB+LC (P dominant =0.003). The ZEB2 promoter variant rs3806475T>C was significantly associated with an increased risk of both HCC (P recessive =0.001) and CHB+LC (P recessive <0.001). Further, the TWIST1 promoter variant rs2285681G>C was significantly associated with an increased risk of HCC (P dominant =0.016). However, no significant association was  observed between any of the other loci and the risk of HbsAgpositive HBV with or without HCC. Therefore, we further analysed the SNAIL1 rs4647958T>C, ZEB2 rs3806475T>C and TWIST1 rs2285681G>C SNPs. As shown in  Figure 1A). Meanwhile, the rs3806475 genotype-associated risk of HCC development was more pronounced in non-drinking individuals (OR, 2.410; 95% CI, 1.577-3.683) versus those who did drink (OR, 1.117; 95% CI, 0.621-2.009; Breslow-day test, P = 0.036) under the recessive model (see Figure  1B). Last, the rs3806475 genotype-associated risk of CHB+LC development was more pronounced in non-drinking individuals (OR, 2.425; 95% CI, 1.732-3.395) compared with those who did drink (OR, 1.276; 95% CI, 0.770-2.114; Breslow-day test, P = 0.037) under the recessive model (see Figure 1C). None of the other SNPs observed were associated with any significant differences in disease characteristics.

SNAIL1, ZEB2 and TWIST1 Genotypes and Their Correlation With HbsAg-Positive HBV With and Without HCC Progression
EMT has been widely studied in the metastatic process of epithelial malignancies (18). We therefore analysed the correlation between SNPs and HCC clinical stages as shown in Supplementary Table 3. We found that the rs4647958 SNAIL1 genotypes were correlated with HCC progression with a lower correlation-coefficient in non-smoking patients (r s = 0.087, P<0.001). Additional correlations were identified in the following patient groups: age less than 45 years (rs4647958: r s = 0.113, P=0.001), female (rs4647958: r s = 0.079, P=0.026; rs2285681: r s = 0.074, P=0.038) and non-drinking (rs4647958: r s = 0.074, P=0.002; rs3806475: r s = 0.054, P=0.025).

LD and Haplotype Block Analysis
Haplotype block LD mapping demonstrated that the rs2285681 and rs2285682 SNPs are in tight LD in a 0-kb sequence, while the rs4647958 and rs1543442 SNPs are in tight LD in a 4-kb sequence (Supplementary Figure 1). As shown in Table 3, the CG haplotype of the rs4647958/rs1543442 haplotype block is associated with significant differences between healthy control subjects and HCC patients (P=0.0347). Meanwhile, the CT haplotype of the rs2285681/rs2285682 haplotype block is associated with significant differences between CHB+LC and HCC patients (P=0.0123). However, no significant correlations were identified between other observed SNPs.

MDR Models of Environment-Gene and Gene-Gene Interactions
We searched for possible genetic interactions of the four genes studied in the context of HCC. We evaluated up to three-locus interactions with 6 polymorphic sites and 3 environmental factors (gender, tobacco smoking and alcohol drinking). For HCC and health subjects as comparative groups, gender in onelocus models was the best, while the balanced accuracy (BA) for testing the dataset was 60.55% and the CVC was 10/10. For HCC and CHB+LC as comparative groups, the combination drinking, smoking in the two-locus model was the best, while the BA was 57.48% and the CVC was 9/10. For CHB+LC and health subjects as comparative groups, the combination TWIST1rs2285681, ZEB2rs3806475, SNAIL1rs4647958 of the three-locus model was the best model with a BA of 56.99% and CVC of 9/10. Table 4 summarizes the MDR results for the one-to three-locus models. Figures 2-4 show the detailed distribution of high-and low-risk genotypes in the best three-locus model for the HCC and CHB+LC. These results were all significant, with empirical p-values of <0.001 in 10000 permutation tests.

DISCUSSION
We hypothesized that EMT genes play an important role in HCC and chronic HBV infection, and that environment-gene and genegene interactions are important. We found significant genetic associations for single EMT genes with HCC and chronic HBV infection, as well as environment-gene and gene-gene interactions. The MDR results indicated that interactions of environment-gene and gene-gene contribute significantly to HCC and chronic HBV infection, even when individual EMT genes do not.
In this study, we found that EMT-related genes were important in HCC and chronic HBV infection. Our findings demonstrate that the SNAIL1 rs4647958T>C, ZEB2 rs3806475T>C and TWIST1 rs2285681G>C SNPs are associated with increased susceptibility to both HCC and chronic HBV infection. In addition, interactions among potentially related polymorphic sites were associated with the development of HCC through the MDR method. MDR is a suitable method to analyse environment-gene and gene-gene interactions by reducing multi-locus genotypes into high-risk and low-risk groups in case-control studies (19). This method marks the genotype in each cell as high or low risk based on whether the ratio of case to control cell is greater than or less than the threshold (20). Furthermore, the MDR method is further extended to model-based MDR, generalized MDR and surviving MDR, etc., to apply to different situations (21)(22)(23). Our study found SNAIL1 rs4647958 showed a significant environment-gene interaction for chronic HBV infection with or without HCC in the MDR results. TWIST1 rs2285681 showed a significant environment-gene interaction for the development of HCC.
For HCC and health subjects as comparative groups, the onelocus model was found to be optimal for the prediction of HCC in terms of BA (60.55%). However, the BA values between the two-and three-locus combination models in HCC didn't have any meaningful difference. For HCC and CHB+LC as comparative groups, the two-locus model was found to be the best in terms of BA (57.48%). Similarly, the BA values between the one-and three-locus combination models didn't have any meaningful difference. For CHB+LC and health subjects as comparative groups, the four-locus combination model (TWIST1 rs2285681, ZEB2 rs3806475 and SNAIL1 rs4647958) was the best for BA (56.99%) and the CVC was 9/10.
In the three-locus combination models for HCC and CHB+LC, SNAIL1 rs4647958 was common to both liver diseases, while the other two factors differed. Comparatively, while we compared HCC and CHB+LC, CHB+LC and health subjects, TWIST1 rs2285681 appeared in both three-locus combination models at the same time. Our finding of the EMT-related gene interaction seemed to support the clinical observation that SNAIL1 rs4647958 and TWIST1 rs2285681 had an impact on patients with HCC and chronic HBV.
In this study, we found significant associations between germline variants of six EMT regulators and the development of chronic HBV infection and HCC revealed that the SNAIL1 exon variant rs4647958T>C and the ZEB2 promoter exon variant rs3806475T>C are significantly associated with the risk of developing both diseases. Additionally, the TWIST1 promoter exon variant rs2285681G>C is associated with an increased risk of HBV-related HCC. Furthermore, the SNAIL1 rs4647958T>C genotype is associated with decreased probability of HBV-related HCC metastasis at diagnosis among smokers.
The stratified analysis showed that SNAIL1 genotypes (rs4647958) are associated with the development of a more aggressive form of HCC in non-smokers, ZEB2 genotypes (rs3806475) are associated with increased risk of HCC development in non-drinkers, and TWIST1 genotypes (rs3806475) are associated with increased risk of CHB and LC. Meanwhile, the SNAIL1 SNPs (rs4647958) are correlated with HCC stages in smokers, though not significantly. SNAIL1 is an important factor involved in inducing and promoting EMT. SNAIL1 is also involved in the pathogenesis of hepatitis B virus mutations in HCC patients. Our findings are remarkably consistent with previously published studies. Chen et al. (24) found that SNAIL is negatively correlated with E-cadherin expression and positively correlated with MMP-2 expression in HCC tissues. Further, these changes in E-cadherin and MMP-2 expression help to promote HCC invasion. Woo et al. (25) used immunohistochemistry to study HCC and found that SNAIL expression is correlated with low E-cadherin expression and poor differentiation in hepatocellular carcinoma. The occurrence and development of HCC are related to many signal pathways, and the expression of SNAIL can play a role in the process of HCC by affecting these signal pathways. Kim et al. (26) found that Notch1 and ROS synergistically upregulate the expression of SNAIL protein in hepatoma carcinoma cells through the PI3K/Akt signalling pathway, thereby increasing cancer cell invasion. Cheng et al. (27) demonstrated that increased expression of SNAIL1 can promote liver tumour initiation, progression, and metastasis. High SNAIL1 expression was also reported in liver tissues,  suggesting that it also contributes to HCC pathogenesis (28). These pieces of evidence all indicate that the SNAIL variant (rs4647958) is functional and contributes to increased risk of HCC and chronic HBV infection. The TWIST1 protein (also known as Twist) can regulate the expression of many specific genes and participates in many different biological processes required for normal growth and development (29). However, TWIST also plays an oncogenic role in tumour cells. Yang et al. (30) showed that TWIST plays a key role in the vascular invasion and lung metastasis of cancer cells. During the process of tumour metastasis, primary tumour cells undergo EMT and then metastasize to distant organs via the circulatory system. TWIST stimulates tumour metastasis by promoting the occurrence of EMT in tumour cells. In addition, TWIST can inhibit apoptosis and senescence pathways and immortalize cells (31).
The ZEB2 protein plays an important role as a transcription factor in the TGF signalling pathway. This signalling pathway is essential during early foetal development (32). ZEB1 and ZEB2 can bind to the CACCT (G) sequence in the promoter of the E-cadherin gene, causing epithelial cells to lose their epithelial-like characteristics and transform into mesenchymal cells, thus leading to EMT (33). Gene mutations can result in the production of non-functional ZEB2 proteins or can completely inactivate the gene. The absence of ZEB2 proteins influences the biological processes of many organs. ZEB2 mutations are the underlying cause of irregular development of the neural crest (34). Our study is the first to demonstrate that mutations in the ZEB2 gene are related to HCC.
As a case-control hospital-based study, some limitations in our study are inevitable. For example, selection and information biases are unavoidable. However, our identification of associations between gene variants and HBV-related HCC risk are unlikely to be solely due to chance, as these findings were confirmed by the results of functional assays.
Longo et al. (35) concluded that the liver microenvironment of HCC patients is more immunosuppressed, accompanied by an increase in the number of regulatory T cells (Tregs), tumorassociated macrophages (TAM) and myeloid-derived suppressor cells (MDSC), which is associated with tumor progression and poor prognosis. Here we described and analysed the single-gene associations and environment-gene/gene-gene interactions of EMT-related genes with HBV-related HCC. We found that EMT genes play a role in HBV-related HCC and genetic factors at multi-levels, from alleles and genotypes to haplotypes and environment-gene/gene-gene interactions. Our study suggests that these SNPs are not only candidate predictors for HCC and chronic HBV infection risk but may also be a genetic determinant for the development of HCC in the chronic HBV infection population. Future studies should be performed in a larger population encompassing multiple ethnic groups in order to confirm our findings.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Ethics Committee of Hebei Medical University. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin. Written informed consent was obtained from the individual(s), and minor(s)' legal guardian/next of kin, for the publication of any potentially identifiable images or data included in this article.