Prognostic Value of a Novel Signature With Nine Hepatitis C Virus-Induced Genes in Hepatic Cancer by Mining GEO and TCGA Databases

Background Hepatitis C virus-induced genes (HCVIGs) play a critical role in regulating tumor development in hepatic cancer. The role of HCVIGs in hepatic cancer remains unknown. This study aimed to construct a prognostic signature and assess the value of the risk model for predicting the prognosis of hepatic cancer. Methods Differentially expressed HCVIGs were identified in hepatic cancer data from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases using the library (“limma”) package of R software. The protein–protein interaction (PPI) network was constructed using the Cytoscape software. Functional enrichment analysis was performed using the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Univariate and multivariate Cox proportional hazard regression analyses were applied to screen for prognostic HCVIGs. The signature of HCVIGs was constructed. Gene Set Enrichment Analysis (GSEA) compared the low-risk and high-risk groups. Finally, the International Cancer Genome Consortium (ICGC) database was used to validate this prognostic signature. Polymerase chain reaction (PCR) was performed to validate the expression of nine HCVIGs in the hepatic cancer cell lines. Results A total of 143 differentially expressed HCVIGs were identified in TCGA hepatic cancer dataset. Functional enrichment analysis showed that DNA replication was associated with the development of hepatic cancer. The risk score signature was constructed based on the expression of ZIC2, SLC7A11, PSRC1, TMEM106C, TRAIP, DTYMK, FAM72D, TRIP13, and CENPM. In this study, the risk score was an independent prognostic factor in the multivariate Cox regression analysis [hazard ratio (HR) = 1.433, 95% CI = 1.280–1.605, P < 0.001]. The overall survival curve revealed that the high-risk group had a poor prognosis. The Kaplan–Meier Plotter online database showed that the survival time of hepatic cancer patients with overexpression of HCVIGs in this signature was significantly shorter. The prognostic signature-associated GO and KEGG pathways were significantly enriched in the risk group. This prognostic signature was validated using external data from the ICGC databases. The expression of nine prognostic genes was validated in HepG2 and LO-2. Conclusion This study evaluates a potential prognostic signature and provides a way to explore the mechanism of HCVIGs in hepatic cancer.

Methods: Differentially expressed HCVIGs were identified in hepatic cancer data from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases using the library ("limma") package of R software. The protein-protein interaction (PPI) network was constructed using the Cytoscape software. Functional enrichment analysis was performed using the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Univariate and multivariate Cox proportional hazard regression analyses were applied to screen for prognostic HCVIGs. The signature of HCVIGs was constructed. Gene Set Enrichment Analysis (GSEA) compared the low-risk and high-risk groups. Finally, the International Cancer Genome Consortium (ICGC) database was used to validate this prognostic signature. Polymerase chain reaction (PCR) was performed to validate the expression of nine HCVIGs in the hepatic cancer cell lines.
Results: A total of 143 differentially expressed HCVIGs were identified in TCGA hepatic cancer dataset. Functional enrichment analysis showed that DNA replication was associated with the development of hepatic cancer. The risk score signature was constructed based on the expression of ZIC2, SLC7A11, PSRC1, TMEM106C, TRAIP, DTYMK, FAM72D, TRIP13, and CENPM. In this study, the risk score was an independent prognostic factor in the multivariate Cox regression analysis [hazard ratio (HR) = 1.433, 95% CI = 1.280-1.605, P < 0.001]. The overall survival curve revealed that the high-risk group had a poor prognosis. The Kaplan-Meier Plotter online database showed that the survival time of hepatic cancer patients with overexpression of HCVIGs in this signature was significantly shorter. The prognostic signature-associated GO and INTRODUCTION Hepatocellular carcinoma (HCC) is the sixth most common cancer and was the third leading cause of cancer-related deaths worldwide in 2020 (Sung et al., 2021). The incidence of HCC has increased sharply over the recent decades (Zhang et al., 2012;McGlynn et al., 2021). The 5−year overall survival (OS) rate of HCC is still unsatisfactory, and its pathogenesis remains controversial.
The development of liver cancer is a complex process, including chronic inflammation and epigenetic modifications (Plissonnier et al., 2018). Hepatitis C virus (HCV) infection is a prominent risk factor for the development of HCC (Ali et al., 2011). Each year, 4-5% of patients with chronic hepatitis C develop HCC (Bartosch et al., 2009). The combination of host, environmental, and viral factors results in HCV-related carcinogenesis (Vescovo et al., 2016). HCV enhances the invasiveness of HCC via epidermal growth factor receptor (EGFR)-mediated invadopodia formation and activation (Ninio et al., 2019). Many non-coding RNAs (ncRNAs) play important roles in biological processes, such as differentiation, proliferation, and cell death, in HCC (Yang et al., 2014;Refai et al., 2019). However, the molecular mechanism of HCV-induced genes (HCVIGs) in hepatocarcinogenesis is not well-understood.
To explore HCVIGs in hepatic cancer, we analyzed HCV-induced expression profiling of hepatic and non-hepatic cancer cell lines from GSE70781 and identified a total of 1,582 differentially expressed genes (DEGs). Then, a total of 1,286 DEGs were common genes in the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases. We extracted 1,284 DEGs from TCGA hepatic cancer gene expression, including 407 hepatic cancer tissues and 58 adjacent normal tissues. Gene Ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were performed to analyze the molecular processes. We used univariate Cox proportional risk regression analysis to identify 21 prognostic HCVIGs from TCGA hepatic cancer dataset. To predict the prognosis of hepatic cancer, we constructed nine prognostic HCVIGs signatures using ZIC2, SLC7A11, PSRC1, TMEM106C, TRAIP, DTYMK, FAM72D, TRIP13, and CENPM. The risk score was identified as an independent factor. OS analysis showed that patients in the low-risk group had a better prognosis than those in the highrisk group according to the median risk score. Finally, the International Cancer Genome Consortium (ICGC) databases were analyzed to validate the prognostic signature. Polymerase chain reaction (PCR) revealed that ZIC2, SLC7A11, PSRC1, TMEM106C, TRAIP, DTYMK, FAM72D, TRIP13, and CENPM were upregulated in the hepatic cancer cell line HepG2.

The Workflow of the Model Construction Process
The detailed workflow is shown in Figure 1.

Data Acquisition and Processing
Hepatitis C virus-induced genes (HCVIGs) were downloaded from the GEO 1 including hepatic and non-hepatic cancer cell lines in GSE70781. The original gene expression profiles and clinical data of hepatic cancer were obtained from TCGA data portal 2 . Differentially expressed HCVIGs from the GEO database were extracted using TCGA data. The JP Project from the International Cancer Genome Consortium (ICGC-LIRI-JP) was downloaded from the Database of Hepatocellular Carcinoma Expression Atlas (HCCDB) for external validation. Frontiers in Cell and Developmental Biology | www.frontiersin.org FIGURE 3 | Gene Ontology and KEGG pathway enrichment analyses. (A) GO analysis of 143 differentially expressed hepatitis C virus-induced genes. "BP" represents "biological process," "CC" represents "cellular component," and "MF" represents "molecular function." (B) Kyoto Encyclopedia of Genes and Genomes analysis of differentially expressed hepatitis C virus-induced genes.

Function Annotation and Gene Set Enrichment Analysis
Differentially expressed HCVIGs were identified using the limma package. Fold change > 2.0 and adjusted <0.05 were selected as the significantly differentially expressed HCVIGs. GO is a community-based bioinformatics resource that includes biological processes (BP), cell components (CC), and molecular functions (MF) (Peng et al., 2017). The KEGG is a knowledge base for systematic analysis of gene functions, linking genomic information with higher-order functional information (Kanehisa and Goto, 2000). GO and KEGG enrichment results were generated by R package "ggplot2, " "enrichplot, " "clusterProfiler, " and "GOplot" for the purpose of analysis. Gene Set Enrichment Analysis (GSEA) comparing the low-risk and high-risk groups was performed by GSEA software (version 4.0.3). Functional annotation with a P-value < 0.05 was considered statistically significant.

PPI Network and Hub Genes Analysis
STRING database (Szklarczyk et al., 2017) 3 was used to analyze the functional interactions between expressed proteins. 3 https://string-db.org/ The STRING network was then input to Cytoscape software (version 3.6.0)  to construct a proteinprotein interaction (PPI) network of 143 differentially expressed HCVIGs. The top 10 genes were selected using the CytoHubba application (Zhou et al., 2019).

Construction of the Nine Prognostic HCVIGs Signature
Differentially expressed HCVIGs were selected to construct a multivariate Cox regression model. OS curves were plotted using the Kaplan-Meier (K-M) method. The relationships were tested using the log-rank test. The nine prognostic HCVIGs-associated signature was constructed using coefficients in multivariate Cox regression analysis (Qiu et al., 2020). Nine prognostic values of HCVIGs were validated using the Kaplan-Meier Plotter (Menyhárt et al., 2018).

Real-Time RT-qPCR Assay
Hepatic cancer cell line HepG2 and normal hepatic cell line LO-2 were used to test the expression of nine prognostic values of HCVIGs. Real-time RT-qPCR experiments were performed as previously described. The primer sequences used are shown in Supplementary Table 1. Relative mRNA levels after correction for GAPDH control mRNA were expressed.

Statistical Analysis
R software (version 3.5.2) was used to perform all statistical analyses in this study. Statistical significance was set at P < 0.05. Univariate Cox regression analyses were performed to assess the relationship between expression profiles and prognosis. Multivariate Cox regression was used to construct a risk model according to the clinical factors correlated with survival. Receiver operating characteristic (ROC) curve and the corresponding area under the ROC curve (AUC) were analyzed by the package of "survivalROC" in R.

Identification of Differentially Expressed HCVIGs
A total of 1,582 differentially expressed HCVIGs were identified using the R language software in GSE70781. Among the differentially expressed HCVIGs, 744 were downregulated, whereas a total of 838 were upregulated. These results are presented as a heat map (Figure 2A) and a volcano plot ( Figure 2C). We obtained the gene expression and clinical data for all 465 patients, including 407 hepatic cancer tissues and 58 adjacent normal tissues from TCGA database. A total of 1,286 differentially expressed HCVIGs were common genes between GSE70781 and TCGA. A total of 1,284 differentially expressed HCVIGs were extracted from TCGA database. Finally, a total of 143 differentially expressed HCVIGs were selected from TCGA, including 2 low-expression genes and 141 overexpression genes, as shown in the heat map ( Figure 2B) and the volcano plot ( Figure 2D).

GO Enrichment and KEGG Pathway Analysis
A total of 143 differentially expressed HCVIGs were subjected to GO and KEGG functional annotation. A total of three GO terms of BP and one GO term of CC were statistically significant (P < 0.05). The four GO terms included "regionalization, " "negative regulation of cell morphogenesis involved in differentiation, " "DNA-dependent DNA replication, " and "nuclear replication fork." The results are shown in Figure 3A.
In addition, in Figure 3B, a total of 148 KEGG pathways were analyzed; the top three KEGG pathways were DNA replication, endocrine and other factor-regulated calcium reabsorption, and mismatch repair. These results are closely associated with HCV infection.

PPI and Hub Genes Analysis
A total of 141 nodes and 120 edges were downloaded from the STRING database, and then Cytoscape software was used to visualize the regulatory network of the genes ( Figure 4A). The CytoHubba application identified the top 10 hub genes ( Figure 4B). These genes included FEN1, WDHD1, RAD54L, GINS2, CDT1, NCAPG2, CDCA7, TRIP13, POLA2, and PKMYT1.

Nine Prognostic HCVIGs Risk Score Model Construction
To identify HCVIGs associated with hepatic cancer, we screened out 21 prognostic HCVIGs using univariate analysis ( Figure 5A). Finally, nine prognostic genes (ZIC2, SLC7A11, PSRC1, TMEM106C, TRAIP, DTYMK, FAM72D, TRIP13, and CENPM) of hepatic cancer were identified. The coefficients of the nine prognostic genes are shown in Table 1. Nine prognostic genes were determined to construct the risk model. The study formula for the risk score was as follows (  Heat map was drawn to investigate genes expression profiles in the high-risk and low-risk HCC groups. Patients were divided into the high-risk and low-risk groups based on the median risk score. The risk score distribution, follow-up time, and gene heat map are shown in Figures 5B-D.

GSEA of the Prognostic Signature in the Low-Risk and High-Risk Groups
GSEAs were performed to compare the low-risk group with the high-risk group of the HCVIG signature. The results showed that these prognostic signature-associated GO and KEGG pathways are significantly enriched in the high-risk (Figures 8A,B) and low-risk groups (Figures 8C,D). The top five KEGG pathways in the high-risk group were RNA degradation, pyrimidine metabolism, ubiquitin mediated proteolysis, cell cycle, and oocyte meiosis. However, the top five KEGG pathways in the low-risk group were complement and coagulation cascades, drug metabolism cytochrome P450, tryptophan metabolism, linoleic acid metabolism, and retinol metabolism.

External Verification of the Prognostic HCVIG Signature
The HCVIG signature was tested in a hepatic cancer cohort from the ICGC database for validation using the same risk score formula and the same cut-off value.
The gene expression pattern, risk distribution, and survival status are shown in Figures 9A-C. The Kaplan-Meier plot showed that patients in the high-risk group had a significantly poorer OS than those in the low-risk group in the validation cohort ( Figure 9D). The HCVIG signature had a good accuracy to predict OS in the validation cohort, with an AUC of 0.763 ( Figure 9E). Multivariate Cox regression analyses showed that gender, stage, prior malignancy, and risk score were found to be independent risk factors for prognosis in patients with hepatic cancer ( Table 1). The relations between risk score and clinicopathological factors are shown in Figure 10.

DISCUSSION
HCV is an RNA virus that integrates its genetic material into the host genome (Hoshida et al., 2014). Persistent HCV infection is a critical factor in fibrosis/cirrhosis and HCC. HCV can infect and replicate in hepatocytes, impair normal functions of other liver cells, and promote fibrosis/cirrhosis (Sasaki et al., 2017). However, the mechanism of HCV infection in the development of HCC remains unclear.
Many reports have shown that ncRNAs play important roles in many cancers (Hansen et al., 2013;Riquelme et al., 2016;Chen et al., 2020). Aberrant expression of ncRNAs is a key player in HCV-related HCC metastasis, invasion, dissemination, and recurrence (Bandiera et al., 2016;Li et al., 2019;Shehab-Eldeen et al., 2019). However, the functional involvement of HCV-induced genes in liver pathogenesis remains to be explored. In the present study, we analyzed a total of 143 differentially expressed HCVIGs in TCGA hepatic cancer dataset using the mRNA expression of 1,582 differentially expressed HCVIGs in GSE70781. With univariate and multivariate Cox regressions, a total of 21 prognostic genes were identified. Finally, the signature of nine prognostic HCVIGs was constructed. The Kaplan-Meier Plotter was used in this study to validate the prognostic value of the nine HCVIGs. To the best of our knowledge, this study was the first to establish a nine prognostic HCVIG signature predicting prognosis in hepatic cancer.
The zinc finger of the cerebellum (ZIC) gene family is composed of five members, Zic1-Zic5 . The transcription factor zinc family member 2 (ZIC2) has been reported to promote proliferation, invasion, and progression in HCC (Zhu et al., 2015;Lu et al., 2017); we found that the overexpression of ZIC2 had a poor prognosis. The cystine/glutamate antiporter solute carrier family 7 member 11 (SLC7A11) is a novel prognostic biomarker for hepatic carcinoma (Zhang et al., 2018). Proline and serine-rich coiledcoil 1 (PSRC1), which is encoded by PSRC1 and is also known as DDA3, has been shown to be involved in HCC (Yang et al., 2011). Transmembrane protein 106C (TMEM106C) is differentially expressed and promotes the development of HCC (Luo et al., 2020). TRAF-interacting protein (TRAIP) is a master regulator of DNA interstrand crosslink repair (Wu et al., 2019). Guo et al. (2020) showed that TRAIP promotes malignant behaviors and is correlated with poor prognosis in liver cancer. Deoxythymidylate kinase (DTYMK) is a novel gene associated with mitochondrial DNA depletion syndrome (MDDS) (Lam et al., 2019). Wang et al. (2020) reported that DTYMK is a crucial gene associated with immune cell infiltration in HCC. Family with sequence similarity 72-member D (FAM72D) is a prognostic gene signature for kidney renal cell carcinoma ; however, the mechanism of FAM72D in HCC remains unclear. Thyroid hormone receptor interactor 13 (TRIP13) is an AAA+ ATPase that plays an important role in mitotic checkpoint . Zhu et al. (2019) showed that elevated TRIP13 drives the AKT/mTOR pathway to induce the progression of HCC by interacting with ACTN4. Zheng et al. (2020) reported that the upregulation of Centromere protein M (CENPM) facilitates tumor metastasis via the mTOR/p70S6K signaling pathway in pancreatic cancer. CENPM plays an important role in HCC (Xiao et al., 2019;Wu and Yang, 2020).
Bioinformatics enrichment analysis showed that 143 differentially expressed HCVIGs were mainly related to DNA replication. DNA replication is a crucial biological process that is tightly regulated to ensure accurate and complete transmission of genetic information to daughter cells (Boyer et al., 2016). DNA replication is fundamental for cellular proliferation and genome stability. Interestingly, many studies have found that DNA replication plays a vital role in liver cancer development (Cervello et al., 2012;Kitao et al., 2018). According to this result, HCVIGs may be associated with host cell DNA replication.
This study has two limitations. First, we were unable to collect sufficient hepatic cancer cases to validate the predictive power of this signature. Second, the mechanism of HCVIGs contained in gene signatures requires further study.

AUTHOR CONTRIBUTIONS
JW analyzed hepatitis C virus-induced gene expression data of hepatic cancer from TCGA database. BW and XG contributed to the writing and editing of the manuscript. All authors read and approved the final manuscript.

FUNDING
This study was supported by the General Program of the National Natural Science Foundation of China (81770537) and the Tianjin Health Commission Science and Technology Personnel Cultivation Project (KJ20103).