rs62139665 Polymorphism in the Promoter Region of EpCAM Is Associated With Hepatitis C Virus-Related Hepatocellular Carcinoma Risk in Egyptians

Hepatocellular carcinoma (HCC) is a global health problem that is particularly alarming in Egypt. The major risk factor for HCC is hepatitis C virus (HCV) infection, which is a major burden in Egypt. The epithelial cell adhesion molecule (EpCAM) is a stem cell marker involved in the tumorigenesis and progression of many malignancies, including HCC. We investigated the association of the -935 C/G single nucleotide polymorphism in the EpCAM promoter region (rs62139665) with HCC risk, EpCAM expression and overall survival in Egyptians. A total of 266 patients (128 HCV and 138 HCC cases) and 117 age- and sex-matched controls participated in this study. Genotyping, performed using allelic discrimination and confirmed by sequencing, revealed a significant association between EpCAM rs62139665 and HCC susceptibility, with higher GG genotype and G allele frequencies in HCC patients than in non-HCC subjects. No such association was detected in HCV patients compared to controls. Under a recessive model, EpCAM gene expression levels in blood, determined by RT-qPCR, and serum EpCAM protein levels, determined by ELISA, were significantly higher in GG than in GC+CC genotype carriers among HCV and HCC patients. ROC analysis of serum EpCAM protein levels revealed significant discriminatory power between HCC patients and non-HCC subjects, and combining α-fetoprotein with EpCAM improved diagnostic accuracy compared with α-fetoprotein alone. Altogether, the EpCAM rs62139665 polymorphism is significantly associated with HCC and with EpCAM gene and protein expression levels in the Egyptian population. Moreover, serum EpCAM levels may hold promise for HCC diagnosis and for improving the diagnostic accuracy of α-fetoprotein.


INTRODUCTION
Hepatitis C virus (HCV) infection is a major burden in Egypt, affecting nearly 14.7% of the population, and the prevalence of HCV in Egypt is considered the highest worldwide. Chronic HCV infection is the leading cause of liver-related death in Egypt (1).
Hepatocellular carcinoma (HCC) is the third leading cause of cancer-related mortality globally (2). This high mortality rate could be attributed to the late appearance of HCC symptoms and, hence, its late diagnosis. Such diagnostic inadequacy is particularly prominent in lower-resource settings with limited screening tools (3). Approximately 60-80% of HCV-infected patients develop chronic hepatitis, of whom 10-20% develop cirrhosis within 20-30 years, and about 1-5% of cirrhotic patients may develop HCC (4). Thus, cirrhosis increases the risk of HCC; however, some patients develop HCC in non-cirrhotic livers and in the absence of inflammation (5). Several studies have reported genetic and epigenetic defects that lead to the onset of HCC (6,7).
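To put the proportions in the preceding paragraph into perspective, the short Python sketch below multiplies them along the chronic hepatitis, cirrhosis and HCC pathway. It is illustrative arithmetic only. It assumes that the three steps are sequential and independent, and it applies the 1-5% HCC figure once rather than per year. The function name `cumulative_fraction` is hypothetical.

```python
def cumulative_fraction(p_chronic: float, p_cirrhosis: float, p_hcc: float) -> float:
    """Fraction of HCV-infected individuals expected to reach HCC along the
    chronic hepatitis -> cirrhosis -> HCC pathway.

    Simplifying assumption: the steps are sequential and independent, and
    each proportion is applied once (not as an annual rate).
    """
    return p_chronic * p_cirrhosis * p_hcc


# Lower and upper bounds taken from the ranges quoted in the text.
low = cumulative_fraction(0.60, 0.10, 0.01)
high = cumulative_fraction(0.80, 0.20, 0.05)

print(f"Roughly {low:.2%} to {high:.2%} of HCV-infected individuals")
print(f"i.e. about {low * 100_000:.0f} to {high * 100_000:.0f} per 100,000")
```

Under these assumptions, the pathway corresponds to roughly 0.06% to 0.8% of infected individuals. This shows that only a small minority of infected people is expected to progress to HCC through cirrhosis.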
A minor subpopulation of tumor cells, the cancer stem cells (CSCs), is responsible for tumor initiation, growth, metastasis and relapse after treatment (8,9). CSCs are also largely responsible for the high resistance of tumors to both radiation and chemotherapy (10,11). The epithelial cell adhesion molecule (EpCAM) is an approximately 40 kDa transmembrane glycoprotein encoded by a gene located on chromosome 2p21. It is highly expressed in most epithelial cancers except squamous, urothelial and renal cell carcinomas (12). EpCAM is considered an important marker of hepatic CSCs (13), and its expression is lost once cells differentiate into mature hepatocytes (14). High EpCAM expression has been associated with poor prognosis in carcinomas (15). The role of EpCAM is not restricted to cell-cell adhesion. It also plays an important role in the migration, proliferation, signaling, differentiation, metastasis and self-renewal of hepatic cells (16). EpCAM acts by activating Wnt signaling and increasing c-Myc expression in highly proliferating tumor cells (17,18).
Single nucleotide polymorphisms (SNPs) are the most common form of genetic variation scattered throughout the human genome. They are responsible for most of the variability between patients in genetic traits such as disease susceptibility, prognosis and response to therapy (19). SNPs in the promoter region of a gene can regulate its expression because transcription factors bind to specific nucleotide sequences within this region. Altering these sequences can therefore modulate transcription and predispose an individual to certain diseases, including cancer (20-22). For example, SNP rs1126497 in the EpCAM gene is significantly associated with an increased risk of breast cancer and cervical cancer. It is also associated with the overall survival (OS) of patients with non-small cell lung cancer and of HCC patients with portal vein tumor thrombus (19,23-25). These findings suggest that SNPs in the EpCAM gene may play a significant role in the initiation and progression of various types of cancer.
The objective of the present study was to investigate whether the -935 C/G SNP (rs62139665) in the EpCAM gene promoter region is associated with high EpCAM expression. We further examined whether it is associated, through this expression, with susceptibility to HCC and with OS in HCV-infected Egyptian patients.

Subjects
The present study was conducted on 266 Egyptian patients, comprising 128 HCV-infected patients and 138 patients with HCV-related HCC. Patients were recruited from the Endemic Medicine and Gastroenterology Department, Faculty of Medicine, Cairo University, from June 2014 until October 2017. They were followed up for 2 years or until death. HCV RNA was detected in all HCV patients, whereas HCV infection in HCC patients was established by positive anti-HCV antibody testing. The medical history of HCC patients is shown in Table S1. HCC was diagnosed based on pathology, cytology, ultrasound and computed tomography (CT) imaging, in addition to serum alpha-fetoprotein (AFP) levels. Tumor number, lesion size, macroscopic vascular invasion, TNM stage, portal vein thrombosis and portal hypertension were also evaluated. Brain, chest and total-body bone CT scans were performed to rule out extrahepatic metastases. Model for end-stage liver disease (MELD) and Child-Pugh scores were also determined. HCC patients receiving radiotherapy or chemotherapy, or with other types of cancer, were excluded from the investigation.
One hundred and seventeen apparently healthy volunteers, age- and gender-matched to the patients, joined the study as controls (Table 1). All had normal liver function profiles, normal hepatobiliary ultrasound findings and negative serological results for viral and autoimmune hepatic diseases, with no previous history of liver disease. Liver ultrasound findings in the studied groups are displayed in Table S2.
An informed consent form was signed by all the study participants. The study protocol was approved by the Research Ethics Committee, Faculty of Pharmacy, Cairo University (Permit number: BC 1813) and conformed to the ethical guidelines of the 1975 Helsinki Declaration.
Exclusion criteria for all study participants were hepatitis B virus (HBV) or human immunodeficiency virus (HIV) antibodies, diabetes, fatty liver, active schistosomiasis, and the presence of alcohol or heavy metals in the blood.

Sample Processing and Laboratory Investigations
Ten-milliliter venous blood specimens were obtained from all enrolled subjects by trained laboratory technicians. The collected samples were aliquoted and processed as previously described (22). Briefly, one aliquot was used for RNA and DNA extraction and subsequent gene expression analysis, genotyping and sequencing. A second aliquot was used for plasma separation, and the plasma was assayed for albumin and the prothrombin time-international normalized ratio (PT-INR). A third aliquot was used for serum separation. The serum was then used to assess HCV RNA and antibody titres, AFP and EpCAM levels, alanine aminotransferase (ALT), aspartate aminotransferase (AST) and alkaline phosphatase (ALP) activities, as well as total and direct bilirubin. The

Designing Primers and Probes
The EpCAM sequence was obtained from NCBI. Ensembl genome browser 90 was used to display all known variants so that the primers could be designed not to overlap SNPs. Allele-specific primers and probes were then designed using Primer3Plus, and their specificity was checked with BLAST and MFEprimer-2.0. We selected rs62139665, a SNP in the 5′ promoter region with a minor allele frequency (MAF) exceeding 20%. This SNP is predicted to alter the binding affinity of the promoter for various transcription factors and thereby modify EpCAM gene expression.
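The two selection rules in the preceding paragraph can be expressed as simple filters: primers should not overlap known variants, and the candidate SNP should have a MAF above 20%. The Python sketch below shows both. It is not the authors' workflow, which used Ensembl, Primer3Plus, BLAST and MFEprimer-2.0. The `Variant` class, the function names and all coordinates and frequencies in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str      # variant identifier (hypothetical in the example below)
    position: int  # 1-based coordinate on the same reference as the primer
    maf: float     # minor allele frequency, between 0 and 0.5


def primer_overlaps(primer_start: int, primer_length: int,
                    variants: list[Variant]) -> list[Variant]:
    """Return the variants whose position lies within the primer footprint."""
    primer_end = primer_start + primer_length - 1  # inclusive end coordinate
    return [v for v in variants if primer_start <= v.position <= primer_end]


def common_snps(variants: list[Variant], min_maf: float = 0.20) -> list[Variant]:
    """Keep variants whose MAF exceeds the threshold (20% in the text)."""
    return [v for v in variants if v.maf > min_maf]


if __name__ == "__main__":
    # Hypothetical coordinates and frequencies, for illustration only.
    known = [Variant("varA", 110, 0.05), Variant("varB", 135, 0.30)]
    # A 21-nt primer starting at position 100 spans positions 100-120.
    print([v.name for v in primer_overlaps(100, 21, known)])
    print([v.name for v in common_snps(known)])
```

Any primer for which `primer_overlaps` returns a non-empty list would be redesigned, because a mismatch under the primer could bias amplification toward one allele.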

EpCAM mRNA Expression Analysis
RNA extraction, reverse transcription and qPCR were performed as previously described (22). Briefly, total RNA was extracted from blood samples using a total RNA purification kit (Jena Bioscience, Munich, Germany), and the isolated RNA was stored at −80°C until analysis. Reverse transcription was performed using a cDNA archive kit (Applied Biosystems, Foster City, California, USA). Quantitative real-time PCR (qRT-PCR) was performed using GoTaq PCR master mix (Promega Co., Madison, USA). For each reaction, 1 µL of cDNA was added to 25 µL of master mix, 0.25 µL of CXR Reference Dye and 1 µL of forward and reverse primers, and the volume was made up to 50 µL. The cycling protocol was run on a 7500 Real-Time PCR system (Applied Biosystems, Foster City, California, USA). It comprised an initial denaturation step at 95°C for 10 minutes, followed by 40 cycles of denaturation at 95°C for 15 seconds and annealing/extension at 60°C for 1 minute, then 60°C for 30 seconds. The EpCAM primer sequences were 5′-AGTGTAATGGCACGATCTCTG-3′ (forward) and 5′-GGATCACCTGAGGTTTGAAGT-3′ (reverse), with β-actin as an internal control.
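The article reports EpCAM gene expression as ΔCt, defined as ΔCt = Ct(EpCAM) − Ct(β-actin). A smaller ΔCt therefore means higher expression. The Python sketch below computes ΔCt and summarizes it as mean ± SD. The fold-change helper uses the common 2^-ΔΔCt convention, which assumes roughly 100% amplification efficiency for both genes. The article does not state that this convention was used, so treat that function as an assumption. All Ct values in the example are hypothetical.

```python
from statistics import mean, stdev


def delta_ct(ct_epcam: float, ct_beta_actin: float) -> float:
    """dCt = Ct(EpCAM) - Ct(beta-actin); a smaller dCt means higher EpCAM expression."""
    return ct_epcam - ct_beta_actin


def relative_expression(delta_ct_sample: float, delta_ct_calibrator: float) -> float:
    """Fold change by the 2^-ddCt convention (assumes ~100% efficiency for
    both genes; an illustrative assumption, not necessarily the authors' method)."""
    return 2 ** -(delta_ct_sample - delta_ct_calibrator)


if __name__ == "__main__":
    # Hypothetical (EpCAM Ct, beta-actin Ct) pairs for three samples in one group.
    pairs = [(28.4, 18.1), (27.9, 17.8), (29.0, 18.6)]
    dcts = [delta_ct(target, reference) for target, reference in pairs]
    print(f"dCt = {mean(dcts):.2f} +/- {stdev(dcts):.2f} (mean +/- SD)")
    print(f"Fold change vs calibrator dCt 12.0: {relative_expression(mean(dcts), 12.0):.2f}")
```

A sample whose ΔCt is 2 cycles lower than the calibrator corresponds to about a 4-fold higher expression under this convention.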

Determination of Serum EpCAM Levels
Serum EpCAM protein concentration was determined using an ELISA kit supplied by Boster Biological Technology (Catalogue no. EK0755, Pleasanton, CA, USA) according to the manufacturer's instructions.

Statistical Analysis
Data are presented as mean ± SD, number (percentage) or median (interquartile range). Differences between two groups were analyzed using Student's t-test for numerical variables and the chi-square test for categorical variables. Differences among the three groups were assessed using one-way ANOVA followed by Tukey's multiple comparisons post-hoc test. Receiver operating characteristic (ROC) analysis was performed to calculate the sensitivity and specificity of EpCAM and AFP, individually and in combination. The correlation between EpCAM and AFP levels was tested by Spearman's correlation analysis. Four genetic models (dominant, recessive, overdominant and multiplicative) were used to assess the association between genotype and HCC risk. Logistic regression was conducted to estimate the odds ratios (ORs) and 95% confidence intervals (CIs) for the association between EpCAM SNP rs62139665 and HCC risk. Two-tailed P values lower than 0.05 were considered statistically significant. OS was estimated using the Kaplan-Meier method and compared with the log-rank test. Statistical analyses were performed using GraphPad Prism 6 (GraphPad Software, CA, USA) and SPSS version 20.0 (SPSS Inc., Chicago, IL, USA). Hardy-Weinberg equilibrium (HWE) was tested online (http://www.oege.org/software/hwe-mr-calc.shtml).
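The ROC analysis described above can be illustrated with a minimal, dependency-free Python sketch. It uses the fact that the area under the ROC curve equals the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen non-case. The sketch also picks a cut-off by Youden's index. The article does not state how its cut-offs or its combined AFP + EpCAM score were derived, so Youden's index is an illustrative choice. One common way to combine markers, not shown here, is to use the predicted probability from a logistic model. Function names and example scores are hypothetical.

```python
def roc_auc(case_scores, noncase_scores):
    """AUC as the Mann-Whitney probability that a randomly chosen case scores
    higher than a randomly chosen non-case (ties count one half)."""
    wins = 0.0
    for x in case_scores:
        for y in noncase_scores:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))


def youden_cutoff(case_scores, noncase_scores):
    """Return (J, cut-off, sensitivity, specificity) maximizing Youden's
    J = sensitivity + specificity - 1, calling 'positive' when score >= cut-off."""
    best = None
    for cut in sorted(set(case_scores) | set(noncase_scores)):
        sens = sum(x >= cut for x in case_scores) / len(case_scores)
        spec = sum(y < cut for y in noncase_scores) / len(noncase_scores)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cut, sens, spec)
    return best


if __name__ == "__main__":
    # Hypothetical serum marker values, not the study's data.
    hcc = [3.0, 4.0, 5.0]
    non_hcc = [1.0, 2.0, 3.0]
    print(f"AUC = {roc_auc(hcc, non_hcc):.3f}")
    print(youden_cutoff(hcc, non_hcc))
```

The pairwise loop is O(n·m), which is adequate for a few hundred subjects. Library routines compute the same quantity via ranks, for example scikit-learn's `roc_auc_score`.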

Demographic, Laboratory and Clinical Characteristics of the Study Participants
The demographic features as well as the laboratory and clinical data of the study participants are depicted in Tables 1, S1 and S2. Neither age nor gender varied significantly between the studied groups.

Genotype Distribution and Allele Frequencies of EpCAM rs62139665 Polymorphism in the Studied Groups, and Compliance With Hardy-Weinberg Equilibrium
The genotype frequencies in the control, HCV and HCC groups were consistent with Hardy-Weinberg equilibrium (P > 0.05), as displayed in Table 2. Table 3 and Figure 1 illustrate the genotype and allele frequencies of the EpCAM rs62139665 polymorphism. The GG genotype and G allele frequencies were significantly higher in the HCC group than in HCV patients (P = 0.0005 and P < 0.0001, respectively) and control subjects (P = 0.04 and P = 0.001, respectively). No significant difference was found between the HCV and control groups. According to the genetic model selection strategy (27), the recessive model (GG versus GC+CC) was chosen because it best fitted the association between rs62139665 and HCC risk. Table 4 depicts the association of rs62139665 with HCV and HCC. Under this model, the GG genotype was markedly more frequent in HCC cases than in the control (OR = 2.86, P = 0.003) and HCV (OR = 2.66, P = 0.004) groups. Furthermore, HCC cases exhibited an appreciably higher G allele frequency than the control (OR = 1.76, P = 0.002) and HCV (OR = 2.02, P = 0.0001) groups.
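The recessive-model and allele-level odds ratios above can be sketched from a 2×2 table. For a single binary predictor with no covariates, the logistic-regression OR equals the cross-product OR (ad/bc), and its Wald 95% CI equals the Woolf interval. The article does not report whether its models were adjusted for covariates, so this sketch may not reproduce Table 4 exactly. The allele-level comparison counts two alleles per person. The function names and genotype counts below are hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple[float, float, float]:
    """OR with Woolf 95% CI for the 2x2 table
                 exposed  unexposed
        cases       a         b
        controls    c         d
    Requires all four cells to be non-zero."""
    or_ = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return or_, exp(log(or_) - z * se), exp(log(or_) + z * se)


def recessive_counts(cases: dict[str, int], controls: dict[str, int]) -> tuple[int, int, int, int]:
    """Recessive model: GG (exposed) versus GC + CC (unexposed)."""
    return (cases["GG"], cases["CC"] + cases["CG"],
            controls["GG"], controls["CC"] + controls["CG"])


def allele_counts(cases: dict[str, int], controls: dict[str, int]) -> tuple[int, int, int, int]:
    """Allele model: G (exposed) versus C alleles; each person carries two alleles."""
    def g_alleles(t: dict[str, int]) -> int:
        return 2 * t["GG"] + t["CG"]

    def c_alleles(t: dict[str, int]) -> int:
        return 2 * t["CC"] + t["CG"]

    return g_alleles(cases), c_alleles(cases), g_alleles(controls), c_alleles(controls)


if __name__ == "__main__":
    # Hypothetical genotype counts, not the study's data.
    hcc = {"CC": 10, "CG": 20, "GG": 20}
    ctrl = {"CC": 20, "CG": 20, "GG": 10}
    print("Recessive OR (95%% CI): %.2f (%.2f-%.2f)" % odds_ratio(*recessive_counts(hcc, ctrl)))
    print("Allele OR (95%% CI): %.2f (%.2f-%.2f)" % odds_ratio(*allele_counts(hcc, ctrl)))
```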
These results indicate a notable association between the EpCAM rs62139665 SNP and HCC susceptibility in Egyptians. In contrast, EpCAM rs62139665 was not significantly associated with HCV infection in the tested sample.

Association of EpCAM rs62139665 Polymorphism With the Clinicopathological Features of HCC Patients
Regarding clinicopathological variables, the GG genotype was significantly associated with higher ALP activity compared with the GC + CC genotypes (P = 0.01, Table 5). In addition, Kaplan-Meier analysis with the log-rank test showed non-significantly lower overall survival and shorter survival time in EpCAM rs62139665 GG genotype carriers than in GC + CC genotype carriers in the HCC group. Moreover, GC + CC genotype carriers exhibited a non-significantly lower MELD score than GG genotype carriers (Table 5 and Figure 2).
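The survival comparison above relies on the Kaplan-Meier product-limit estimator and the two-group log-rank (Mantel-Cox) test. The authors used GraphPad Prism and SPSS. The dependency-free Python sketch below only illustrates how these quantities are computed and is not a reproduction of those packages. Input lists and function names are hypothetical. An event of 1 denotes an observed death, and 0 denotes a subject censored at the end of follow-up.

```python
from math import erfc, sqrt


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up times; events: 1 = death observed, 0 = censored.
    Returns [(time, S(time))] at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group tied times; subjects censored at t are still counted at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi2, P) with 1 degree of freedom."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    death_times = sorted({t for t, e, _ in pooled if e})
    o_minus_e = variance = 0.0
    for t in death_times:
        n = sum(1 for s, _, _ in pooled if s >= t)               # at risk, both groups
        n_a = sum(1 for s, _, g in pooled if s >= t and g == 0)  # at risk, group A
        d = sum(1 for s, e, _ in pooled if s == t and e)         # deaths at t
        d_a = sum(1 for s, e, g in pooled if s == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n                           # observed - expected in A
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    return chi2, erfc(sqrt(chi2 / 2))  # chi-square (1 df) upper-tail probability
```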

EpCAM Gene and Protein Expression Levels in the Studied Groups
As illustrated in Table 6, the expression of EpCAM at both the gene and protein levels was notably higher in HCV (P < 0.0001 and P < 0.05, respectively) and HCC (P < 0.0001) patients compared to the control group. Moreover, EpCAM protein expression level was appreciably higher in HCC than in HCV patients (P < 0.0001), whereas the gene expression level did not vary significantly between the two groups. Comparing the expression of EpCAM between rs62139665 genotypes, we observed a significant link between EpCAM rs62139665 polymorphism and its expression. The GG carriers exhibited significantly higher EpCAM gene and protein expression levels compared to the GC + CC carriers in each of the three studied groups (P < 0.05 and P < 0.0001 regarding gene and protein expression, respectively, within the control group; P < 0.0001 for both gene and protein expression within each of the HCV and HCC groups).

Serum AFP Level, Correlation With EpCAM Level, and ROC Analysis of Discriminatory Performance of AFP and EpCAM Individually and in Combination
Comparison of serum AFP levels among the three studied groups revealed markedly higher levels in the HCC group than in each of the control and HCV groups (P < 0.0001, Table 1). As illustrated in Figure S1, a strong positive correlation was observed between AFP and EpCAM protein levels (r = 0.99, P < 0.0001). The ROC curve of serum EpCAM protein level showed significant discriminatory power between HCC patients and non-HCC subjects (AUC = 0.92, CI = 0.87-0.97, P < 0.0001), as shown in Figure S2, suggesting comparable diagnostic

Corroboration of EpCAM rs62139665 Genotyping Findings by Sanger Sequencing
The EpCAM rs62139665 genotypes obtained by allelic discrimination were compared with those obtained by Sanger sequencing, and the results were fully concordant for all genotypes in the tested samples (Figure 3).

DISCUSSION
Egypt has the highest prevalence of HCV infection worldwide, and HCV infection is considered a main predisposing factor for the development of HCC. This investigation was undertaken to explore the association of the -935 C/G (rs62139665) SNP in the EpCAM promoter region with the risk of HCV-related HCC, EpCAM expression levels and OS in the Egyptian population. The present investigation demonstrates, for the first time, the association of EpCAM rs62139665 with HCC risk. rs62139665 is located 935 bp upstream, within the promoter region. The transcriptional activity of the 1.1 kb region upstream of the EpCAM gene has been reported to be closely associated with EpCAM levels (28). Mutations in the EpCAM gene have been reported in patients with Lynch syndrome, through deletions in the 3′UTR (29). They have also been reported in congenital tufting enteropathy, in which they decrease EpCAM protein levels (28).
In the current study, we used the allelic discrimination method to genotype the -935 C/G polymorphism (rs62139665) in the EpCAM promoter region and then verified the results by direct DNA sequencing. The genotype distribution in all three studied groups agreed with HWE.
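The HWE check mentioned above was performed with an online calculator. The Python sketch below shows the standard Pearson chi-square goodness-of-fit test for a biallelic SNP, with one degree of freedom. We assume, without having verified it, that the calculator uses this form of the test. Genotype counts in the example are hypothetical. A P value above 0.05, as reported for all three groups, means the observed genotype counts do not deviate significantly from the p², 2pq, q² proportions expected from the allele frequencies.

```python
from math import erfc, sqrt


def hwe_chi_square(n_cc: int, n_cg: int, n_gg: int) -> tuple[float, float]:
    """Pearson chi-square test of Hardy-Weinberg equilibrium for a biallelic SNP.

    Returns (chi2, P) with 1 degree of freedom
    (3 genotype classes - 1 - 1 estimated allele frequency).
    """
    n = n_cc + n_cg + n_gg
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_cc + n_cg) / (2 * n)  # C allele frequency
    if p in (0.0, 1.0):
        raise ValueError("monomorphic SNP: HWE test is undefined")
    q = 1 - p                        # G allele frequency
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_cc, n_cg, n_gg)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-square distribution with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


if __name__ == "__main__":
    # Hypothetical genotype counts (CC, CG, GG), not the study's data.
    chi2, p_value = hwe_chi_square(36, 48, 16)
    print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}")
```

For small groups, an exact HWE test is sometimes preferred over the chi-square approximation.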
Notably, rs62139665 allele frequencies vary with ethnicity, as reported in the NCBI map database for diverse populations. The reported frequencies are C = 0.16 and G = 0.83 in African Caribbeans, C = 0.17 and G = 0.83 in Han Chinese, and C = 0.64 and G = 0.36 in Toscani in Italia. The EpCAM rs62139665 C and G allele frequencies observed here in the control group were 0.64 and 0.36, respectively. These values closely match those reported in the Toscani Italian population, in which C is the major allele.
The rs62139665 distribution differed significantly between HCC patients and non-HCC subjects at both the genotype and allele levels. The EpCAM GG genotype and G allele were significantly more frequent in HCC patients than in HCV patients and controls, suggesting that the C allele may have a protective effect. Using logistic regression analyses, we found a significant association of HCC risk with the rs62139665 G allele and with the GG genotype compared with GC + CC under a recessive model. This association may be explained by the significant up-regulatory effect of the GG genotype on EpCAM gene and protein expression levels observed in the present study. Numerous studies have shown that promoter-region SNPs are related to increased gene expression (22,29,30). The HCV group displayed significantly higher EpCAM gene and protein expression than the control group, despite the lack of a significant difference in rs62139665 genotype and allele frequencies between the two groups. This higher EpCAM expression in the HCV group could be linked to the reported HCV-induced elevation of plasminogen activator inhibitor-1 (PAI-1). Increased PAI-1 expression following HCV infection promoted a cancer stem-like cell (CSC) state in HCV-infected hepatocytes. This effect was mediated through activation of protein kinase B, a key mediator of cell proliferation, together with increased expression of the CSC marker EpCAM (31).
Several studies have reported roles of the EPCAM gene in HCC pathogenesis and recurrence, and as a prognostic marker (32-35). However, analysis of TCGA and GEO data, compared using GEO2R, showed only a non-significant increase in EPCAM expression in HCC samples compared with their normal controls (GSE49515). Likewise, the increase was non-significant in HCC patients with venous metastasis compared with those without (GSE5093).
Previous studies reported a significant association of EpCAM expression in HCC with high tumor grade and AFP level (36,37). In our current study, AFP and EpCAM levels were significantly higher in the HCC group than in the HCV and control groups, and a positive correlation was found between AFP and EpCAM protein levels. These results are in line with other published data reporting EpCAM overexpression in cirrhotic and liver tumor tissues compared with normal tissues (38). In addition, AFP level can predict the expression of hepatic progenitor cell markers such as EpCAM in HCC (39) and is correlated with tumor metastasis (40). ROC analysis was conducted in the present study to compare the diagnostic performance of AFP and EpCAM. It showed that a combined model of AFP and EpCAM had a higher specificity than either marker alone at all levels of sensitivity. Previously, AFP+/EpCAM+ HCC was characterized by a poorer prognosis than AFP−/EpCAM− HCC (35). We examined the influence of rs62139665 on disease outcomes by correlating its genotypes with several parameters, such as liver function tests. The only significant association detected was between the GG genotype, and hence high EpCAM expression, and ALP activity.
Such an association may be attributed to ALP being an embryonic stem cell marker that has been observed during cell reprogramming (41). Furthermore, inhibition of EpCAM expression has been reported to diminish ALP activity (42).
Note to Table 6: EpCAM gene expression is expressed as ΔCt mean ± SD, where ΔCt = Ct value of EpCAM − Ct value of β-actin; a smaller ΔCt value corresponds to a higher gene expression level. EpCAM protein expression is expressed as mean ± SD. The data were analyzed using Student's t-test for comparing 2 groups and one-way ANOVA followed by Tukey's multiple comparisons test for comparing the three studied groups. † Significant difference from the control group. ‡ Significant difference from the HCV group. § Significant difference from GG within the same group. *Significant at P < 0.05. **Significant at P < 0.0001.
Both OS and survival time were shorter in rs62139665 GG genotype carriers than in GC + CC genotype carriers among HCC patients, although the differences did not reach statistical significance. EpCAM was overexpressed in GG compared with GC + CC genotype carriers. Given this, the apparently shorter OS in GG carriers is consistent with the report by Noh and co-workers that, after surgical resection for HCC, patients with positive immunohistochemical EpCAM expression had reduced OS compared with EpCAM-negative patients (43). Moreover, the detection of EpCAM-positive circulating tumor cells is strongly correlated with clinical outcome and OS in patients with HCC (39).
In the current study, the MELD score showed no significant difference between GC + CC and GG genotype carriers. Our observation seems to be in agreement with Sancho-Bru and coworkers' report of lack of correlation between EpCAM gene expression and MELD score in patients with alcoholic hepatitis (44). On the other hand, patients with liver cirrhosis who were infused with EpCAM-positive stem cells showed a significant decrease in MELD score and a marked clinical improvement.
In conclusion, the present study highlights the association of the EpCAM rs62139665 SNP, specifically the G allele, with HCC risk. No such association was found with HCV infection in the tested sample of Egyptians. In addition, the present findings show an association of the rs62139665 GG genotype with increased EpCAM expression at both the gene and protein levels. Further research on larger datasets, other ethnicities and other cancer types is warranted to comprehensively elucidate the associations of the EpCAM -935 C/G polymorphism with cancer risk.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The name of the repository and accession numbers can be found below: National Center for Biotechnology Information (NCBI) GenBank, https://www.ncbi.nlm.nih.gov/genbank/, MZ826468, MZ826469, MZ826470 and MZ826471.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by The Research Ethics Committee, Faculty of Pharmacy, Cairo University (Permit number: BC 1813). The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
TM and NAHS conceived, designed and coordinated the study and revised the manuscript. DS collected the samples and participated in the experimental design and lab work. SF carried out the lab work and the statistical analyses and drafted the manuscript. NNS participated in the experimental design, data analysis and curation, and revised and edited the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
The present work has been partially funded by the Faculty of Pharmacy, Cairo University (Cairo, Egypt).