Exploring the molecular mechanism of hepatitis virus inducing hepatocellular carcinoma by microarray data and immune infiltrates analysis

The number of new cases of hepatocellular carcinoma (HCC) worldwide reached 910,000, ranking the sixth, 80% HCC is associated with viruses, so exploring the molecular mechanism of viral carcinogenicity is imperative. The study showed that both HBV and HCV associated HCC and non-viral HCC have the same molecular phenotype (low gene expression and inhibition of immune pathways), but in the tumor immune micro-environment, there is excessive M2-type macrophage polarization in virus-associated hepatocellular carcinoma. To address this phenomenon, the data sets were analyzed and identified five hub genes (POLR2A, POLR2B, RPL5, RPS6, RPL23A) involved in viral gene expression and associated with PI3K-Akt-mTOR pathway activation by six algorithms. In addition, numerous studies have reported that M2-type macrophages participate in the hepatic fibro-pathological process of the development of HCC and are regulated by the PI3K-Akt-mTOR pathway. On this basis, the study showed that hepatitis virus causes abnormal expression of hub genes, leading to the activation of the pathway, which in turn promote the differentiation of M2-type macrophages and eventually promote the formation of liver fibrosis, leading to the occurrence of HCC. In addition, these hub genes are regulated by transcription factors and m6A enzyme, and have good prognosis and diagnostic value. With regard to drug reuse, the results suggest that patients with virus-related HCC for whom Cytidine triphosphate disodium salt and Guanosine-5’-Triphosphate are used as supplementary therapy, and may have a better prognosis. In conclusion, the study has identified novel molecules that are carcinogenic to hepatitis viruses and are expected to serve as molecular markers and targets for diagnosis and treatment.


Introduction
Hepatic malignant tumors can be divided into two major categories: primary and secondary. Primary hepatocellular carcinoma is one of the common malignant tumors in China with a high mortality rate, ranking third in the sequence of deaths from malignant tumors after stomach and esophagus (1). Hepatocellular carcinoma (HCC) is the major subtype of primary hepatocellular carcinoma. In 2020, there were more than 910,000 new HCC cases and 830,000 deaths, which has become a serious public health problem (2). The latest research results have shown that the occurrence of HCC is mainly related to hepatitis B virus (HBV) and hepatitis C virus (HCV) infection (3). At the same time, excessive drinking and smoking are also related to the occurrence of HCC (4).
Systematic understanding of the virus involvement in the occurrence, development and metastasis of HCC is conducive to the early diagnosis and accurate treatment of patients. In theory, the persistent inflammation of hepatocytes caused by viral infection promotes the formation of hepatic fibrosis, which is the basis for the development of HCC (5). Chronic infection caused by HBV to the human body, and chronic hepatitis B, compensated cirrhosis, and decompensated cirrhosis to the onset of HCC are the main pathways for the development of HBV-associated HCC (HBV-HCC) (6). It has been found that the HBx protein carried by HBV can regulate the PI3K-Akt pathway of host hepatocytes, and then activate and release of excessive TGF-b to participate in the occurrence of liver fibrosis (7). TGF-b is a key cytokine involved in fibrogenesis and can be specifically activated by PI3K/Akt (8), which may be an important factor in the induction of HCC by HBV and HCV (9,10). In addition, studies have found that abnormal PI3K/ AKT/mTOR signaling pathway is closely related to HCC resistance (11). Some studies have demonstrated that N6methyladenylate (m 6 A) modification is involved in the progression of hepatitis B virus-related liver fibrosis by regulating the infiltration of immune cells (12), and at the same time, HBx carried by HBV can interact with the methylase METTL3, which is closely related to the development of HCC (13).
Despite the deepening understanding of the etiology of HCC, the available diagnosis and treatment plan has little effect. Microarray sequencing technology has been applied to genome detection for in-depth exploration of the viral carcinogenicity and tumor development mechanism. However, results from single microarray or low sample size analyses are difficult to gain more recognition. In this study, a prospective method was used that differed from conventional experiments, the expression profiles of HCC (GSE87630), HBV-HCC (GSE55092), and HCV-HCC (GSE19665) from the GEO database were integrated. Differentially expressed genes (DEGs) were identified in HCC, HBV-HCC, and HCV-HCC, compared to normal liver tissue. DEG was analyzed for gene set enrichment using "h.all. v7.4" and in addition, three types of HCC were analyzed for tumor immune invasion using CIBERSORT. To further analyze the molecular mechanisms of the involvement of the two hepatitis viruses in the development of HCC, an ExpressAnalyst was used to combine the three data sets. Compare with HCC, and in contrast to HCC, the DEG of HBV-HCC and HCV-HCC were identified, and two cohomology genes were crossed to explore the molecular basis of viral involvement in HCC. Six sub-networks of key molecular interactions were obtained using the MCODE method, and gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG), analyses were performed using the DAVID database. In addition, the hub genes were obtained by analyzing the subnetworks using six algorithms, and the correlation was explored between those genes and immune-infiltrating cells using Pearson correlation analysis. A separate cohort (GSE121248 and GSE69715) was integrated with HCC (GSE87630) to verify the stability of hub gene expression. Meanwhile, the protein levels of those genes were verified in the human protein map (HPA). The prognostic value of hub genes was verified on the UALCAN data analysis platform, and the receiver operating characteristic (ROC) curve was used to analyze the diagnostic value of hub genes in distinguishing HCC tissues (HBV-HCC and HCV-HCC) from normal liver tissues. Transcription factor and m 6 A methylation predictions were performed using hTFtarget and m6A2Target and validated in microarrays for hub gene. Therapeutic drugs related to genes were explored and developed by STITCH, and effect intensity analysis was performed on computer simulation software (Schrodinger). The aim of this study is to provide new insights into the pathogenesis of viral involvement in HCC and to identify new prognostic markers and precise drug targets through comprehensive analysis.

Data extraction, processing and consolidation
All data sets (Table 1) were from GEO (https://www.ncbi. nlm.nih.gov/geo) database, were log transformed and normalized. The differentially expressed genes (DEGs) between hepatocellular carcinoma (HCC) tissues and normal liver tissues were screened out by using "limma" function. The screening criteria for DEGs were adjusted P value < 0.05 and LogFC≥1. Volcanic map of gene distribution by using that ggplot2 function. Batch effects were removed from the dataset using "combat" function by the online platform ExpressAnalyst (https://dev.expressanalyst.ca/ExpressAnalyst/).

GSEA and immune infiltration analysis
To clarify the gene effects caused by DEGs of HCC, the R package "GSEA" was utilized to obtain the GSEA enrichment scores of hallmark pathways (h.all.v7.4.entrez) (14). HCC immune infiltration analysis was performed on the dataset using the "CIBERSORT" method on the Sangerbox3.0 platform (http://vip.sangerbox.com). Due to the small sample size of GSE19665 for HCV-HCC and the potential for large deviations in analysis results, an external independent dataset, GSE69715, was selected for immune invasion analysis.

Analysis of protein-protein interaction network and functional enrichment
Interaction between intersecting gene were analyzed using a STRING database (15). The MCODE functional module was used to cluster the genes and construct a gene sub-network in cytoscape (version 3.8.3). The DAVID 2021 was used for analyzing GO and KEGG pathway of module gene (16). P<0.05 is considered to be statistically significant.

Gene screening and co-expression network construction
In order to identify the hub genes in the common genes, the six algorithms (Closeness, Stress, EPC, Degree, MNC, and Radiality) of the "cytohubba" module were used for gene sequencing, and the intersection analysis of the top 30 hub genes was performed using the R software "Upset" package. Gene co-expression analysis and functional enrichment analysis of the common hub genes were performed using the GeneMania platform (17).

Correlation analysis of gene expression
To explore the association between hub genes and M2 macrophage infiltration, Pearson's method was used for gene association analysis on HBV-HCC and HCV-HCC data sets.
Due to the small sample size of GSE19665 for HCV-HCC and the potential for large deviations in analysis results, an external independent dataset, GSE69715, was selected for gene correlation analysis.

Analysis of mRNA and protein expression
Expression data of hub genes in HCC with different stages and HCC with TP53 mutation were obtained from UALCAN (18), which is a platform for processing data from TCGA. Extraction of protein levels of viral carcinogenic hub genes in HCC Tissue and Normal Liver Tissue from the Human Protein Atlas (HPA) (19). The staining intensity was divided into strong, medium, weak or negative. Expression levels included high, medium, low, and none detected. The proportion of stained cells was three-tiered (> 75%, 25-75%, or < 25%).

Survival analysis and ROC curve drawing
The survival analysis was analyzed by using the UALCAN. Survival analysis was performed by Kaplan-Meier and log-rank test. The expression levels of hub genes were applied for ROC analysis to estimate their diagnostic significance to distinguish between HCC and normal in two independent external sets (GSE121248 and GSE69715) and internal sets (GSE55092 and GSE19665). The area under curve (AUC) > 0.5 was considered to have diagnostic value.

Transcriptional factor and m 6 A enzyme prediction analysis
In order to find the molecules that regulate the hub genes, hTFtarget database was used to predict the TFs of the hub genes and validated the expression levels in the data set (20). Because of the extensive presence of m 6 A enzyme modification after RNA transcription, m 6 A enzyme prediction (21) and validation on the hub genes were performed, and constructed a network diagram based on the interaction relationship.

Drug screening based on hub genes
The sequence of hub gene was obtained from NCBI and homology modeling was performed using Swiss Mode (22). Schrodinger Glide and IFD modules were used to molecular dock the hub gene structure with the best score with drug molecules. The lower binding energy of drug to target (hub gene) indicates more stable binding. Docking mode diagrams are drawn using LigPlus (23).

DGE identification and GSEA in HCC
We compared the gene expression in Hepatocellular carcinoma (HCC) with that in the normal tissue, and obtained 1158 DEGs, among which 819 genes were down-regulated, accounting for more than 70% ( Figure 1A). In HBV-HCC and HCV-HCC, the down-regulated genes account for 63% and 83% of DEG, respectively ( Figures 1B, C). Downstream analysis of DGE using GSEA revealed that virus associated HCC was similar to HCC, mainly manifested are the activation of DNA damage and G2M checkpoint, and inhibition of hypoxia, apoptosis and immune response (Figures 1D-F).

Immune infiltration analysis in HCC
As both the virus-induced HCC and the HCC developed immunologic derangement, we performed immune infiltration analysis on them to distinguish the immune cell levels. The results showed that CD4 T cells and Treg cells were the common manifestations of them (Figures 2A-C). Interestingly, M2-type macrophages and plasma cells proliferate extensively in virusinduced HCC and cause infiltration of mast cells resting in HBV-HCC, but are not observed in HCC.
Differential gene analysis and GSEA analysis in hepatocellular carcinoma (HCC). Gene expression and GSEA analysis of HCC (A, D), HBV-HCC (B, E) and HCV-HCC (C, F) by using R 3.6.3.

Genetic differences between virusinduced HCC and HCC
In order to further clarify the molecular mechanism of virus participation in HCC, the above three data sets were combined in the study. See Figure S1 for quality control. Differential gene analyses of HBV-HCC and HCC, HCV-HCC and HCC, revealed that compared with HCC, HBV-HCC had 3,141 down-regulated genes and 2,975 up-regulated genes ( Figure 3A), and HCV-HCC had 2,619 down-regulated genes and 2,555 up-regulated genes ( Figure 3B). In order to find the possible molecular phenotype of HCC caused by the combination of the two viruses, Venn mapping of all up-regulated and down-regulated genes was conducted, and was found that 1859 down-regulated genes and 1815 up-regulated genes participated in the development of HCC ( Figure 3C). The difference between HCC and virus-associated HCC is that the two expression patterns differ significantly, and this expression pattern may be related to viral carcinogenicity.

Protein-protein interaction analysis and enrichment analyses
Further analysis of the molecular patterns of two hepatitis viruses participating in HCC. We performed protein interaction analysis and MCODE analysis on the intersecting genes. Due to the limitation of the database on the number of genes, a interaction analysis was conducted on the up-regulated and down-regulated genes respectively and the first three molecular networks of down-regulated genes with the scores of 35.925, 14.615 and 12.414 ( Figure S2A), and the first three networks of up-regulated genes with the scores of 13.762, 12.557 and 10.618 ( Figure S2B) were obtained. Considering the molecular cascade effect, combining the genes of the above six sub-networks and conducting GO and KEGG analysis, it is clear that the modular gene is related to the methylation activation of mRNA splicing, mRNA processing and tRNA methylation ( Figure 4A), and in addition, it is also involved in the PI3K- Akt pathway, viral carcinogenesis, mTOR signaling pathway, hepatitis B, and hepatitis C ( Figure 4B).

Hub genes screening and co-expression network construction
To obtain biomarkers or drug targets involving carcinogenesis of hepatitis virus, the study identified 16 potential hub genes using six algorithms ( Figure 5A), The related functions and co-expression network of those genes were analyzed on the basis of the database. It shows the complex network with the physical interactions of 34.30%, coexpression of 32.68%, predicted of 24.59%, and co-localization of 2.73% ( Figure 5B). It is noteworthy that, ribosomal protein L23A (RPL23A), ribosomal protein S6 (RPS6), ribosomal protein L5 (RPL5), RNA polymerase II subunit A (POLR2A) and RNA polymerase II subunit B (POLR2B) participated in viral gene expression. In addition, POLR2A, POLR2B and heterogeneous nuclear ribonucleoprotein A2/B1 (HNRNPA2B1) are related to post-transcriptional gene silencing by RNA.

Correlation analysis between hub genes and immune pathway
Since a large number of M2-type macrophages infiltrate HCC caused by HBV and HCV, and the activation of PI3K-Akt-mTOR pathway is related to the M2 polarization of macrophages (11,24), Through literature review, it was found that 13 differential genes (25)(26)(27)(28)(29)(30)(31)(32)(33)(34)(35)(36), including mTOR, PIK3CA, AKT2, and PIK3CB, involved in the M2-type polarization of macrophages, according to the genes related to M2-type polarization. We first analyzed the expression of these 13 genes in HBV-HCC and HCV-HCC, and found that mTOR was down-regulated while other genes were up-regulated ( Figure  S3), which was consistent with the molecular characteristics of macrophage polarization. Furthermore, the analysis of the correlation between hub genes and these genes was carried out, and the results emphasized that in the HBV-HCC data set, RPL5 had a positive correlation with the expression of six genes including PIK3CB, KRAS, and MAPK1 ( Figure 6A). However, the expressions of POLR2A and POLR2B were negatively correlated with mTOR expression, but positively correlated with KRAS expression. In HCV-HCC, RPL23A is positively correlated with RAC1, JAK1, PIK3CD, etc. ( Figure  6B). Similarly, PIK3CB is positively correlated with the expression of POL2A, but negatively correlated with PLOR2B. The regulation of PIK3CB by hub genes may be an important reason for the polarization of M2-type macrophages.

mRNA and protein expression verification of hub genes
In order to validate the hub gene-expression stability, independent queues that contained HBV-HCC and HCV-HCC were selected for the analysis of these hub genes expression levels. The results demonstrated that while comparing with HCC, the POLR2A was significantly downregulated, however POLR2B, RPL5, RPS6 and RPL23A were significantly up-regulated in HBV-HCC and HCV-HCC ( Figure 7). Moreover, the relevance between hub genes and clinical characteristics (cancer stage and TP53 mutation) in HCC patients was analyzed using the UALCAN. Higher mRNA expression of gene is associated with staging in A B FIGURE 5 Hub genes screening and gene co-expression analysis. patients with HCC ( Figure S4). Similarly, the mRNA expression levels of hub genes were higher in HCC with TP53 mutation. Moreover, the protein expression levels of POLR2B, RPS6 and RPL23A in HCC tissue were higher than those in normal liver tissue by observing the immunohistochemistry results ( Figure 8).

Prognostic analysis and diagnostic value of hub genes
The survival analysis of hub gene was carried out by TCGA data set (HCC) in UALCAN. Although POLR2A expression was not associated with patient survival rate ( Figure 9A), the results reveled that high expression of POLR2B (P <0.01), RPL5 (P < 0.001), RPS6 (P < 0.05), and RPL23A (P < 0.01) were concerned with shorter survival rates ( Figures 9B-E). In summary, POLR2B, RPL5, RPS6, and RPL23A may be used as biomarkers to estimate the prognosis of HCC patients. To determine the diagnostic significance of these genes in virusinduced HCC and normal, ROC analysis was performed using data from internal sets and AUC values for all five genes were greater than 0.5 ( Figures 10A, B). In general, the diagnostic value of viral oncogenes in HBV-HCC is higher than that in HCV-HCC. The same results appeared in separate external data sets ( Figures 10C, D). Therefore, these genes have value for further development as diagnostic biomarkers in virus-related HCC.

Prediction and validation of transcription factor and m 6 A enzymes
As the high expression of hub genes is involved in the development of HCC caused by virus, we further explored the molecular mechanism for regulating the hub genes. Transcription factor (TF) prediction has revealed that a variety of TF are involved in the regulation of hub genes expression ( Figure 11A), and it was identified in the internal data set that high expression of 9 TFs and low expression of 5 TFs may be important causes of abnormal changes in hub genes ( Figure S5). Because the module gene is closely related to RNA methylation ( Figure 4A), the hub genes was predicted by m 6 A methylation and verified by internal data sets. The results showed that the hub genes were regulated by m 6 A writer and reader proteins ( Figures 11B, S6). Analyzing the prognosis of m 6 A, identified that the high expression of mRNA-splicing regulator WTAP (WTAP), Putative RNA-binding protein 15B (RBM15B) and N6-adenosine-methyltransferase catalytic subunit (METTL3) were related to the survival time of HCC patients ( Figure S7).

Drug screening and computer simulation analysis of hub genes
As the high mRNA and protein expressions of POLR2B, RPS6, and RPL23A in HCC tissues and participate in the expression of viral genes, which is closely related to the prognosis and diagnosis of patients, compound-protein interaction analysis was conducted ( Figure 12). Cytidine triphosphate disodium salt (ara-CTP) and Guanosine-5'-Triphosphate (guanosine trip) have the ability to inhibit hepatitis virus replication (37,38). A semi-flexible (Glide) and induced-fit docking (IFD) methods were used in the study, and it was found that the two drugs had low binding energy to the hub genes and more hydrogen bond interactions (Table 2, Figure S8).

Discussion
The carcinogenic effect of virus is obvious, especially virusrelated hepatocellular carcinoma (HCC). However, its carcinogenic mechanism still needs to be further clarified. We started from three independent cohorts of HCC and found their differences. Both DNA damage and G2M checkpoint inhibition are caused by the over-expression of HBx protein carried by HBV (39,40), which is involved in the development of HCC. Therefore, targeted regulation of these two processes may be an important strategy for the treatment of virus-related HCC. The immune response that lymphocytes are widely suppressed is related to the development of HCC (41,42). Hence the infiltration of two T cell subtypes in the lesion site decrease, was observed. In addition, high-expression M2-type macrophages are highly correlated with tissue fibrosis and tumor immune escape (43). Unlike HCC, virus-related HCC shows a large number of M2-type macrophages infiltration, which may be closely related to the viral carcinogenicity. Further comparison analysis was performed on the difference in molecular expression between virus-associated HCC and HCC, and it was unfolded that the key network genes of virusassociated HCC were mainly involved in the chemical modification of RNA, which is closely related to the poor prognosis of tumor patients (44). These genes are inevitably involved in the activation of immune pathways and the development of hepatitis. It is significant that PI3K-Akt and mTOR pathways are involved in the polarization of M2-type macrophages (24,45).
To further screen for molecular markers or targets, five hub genes (POLR2A, POLR2B, RPL5, RPS6 and RPL23A) were identified which are involved in viral gene expression using six algorithms. The existing evidence found that these five genes participated in viral gene expression (46-52), while the Overall survival analysis of hub genes. Survival curve for POLR2A (A), POLR2B (B), RPL5 (C), RPS6 (D) and RPL23A (E) in HCC patients from TCGA database. The expression data were analyzed using log-rank test.
A B FIGURE 11 Prediction and validation of TF and m 6 A enzymes. In the TFs regulatory network (A), the inside is the central gene, and the outside is the transcription factor. Red represents up-regulation, and green represents down-regulation. In the m 6 A regulatory nerwork (B), the red lines represented proteinprotein interactions, the green lines represented protein-RNA interactions, and the blue lines represented protein DNA interactions.
A B D C FIGURE 10 ROCcurve analysis to assess the diagnostic value of hub genes in differentiating HBV-HCC or HCV-HCC from liver tissues. (A, B) were HBV-HCC and HCV-HCC, respectively, in the internal data set, (C, D) were HBV-HCC and HCV-HCC, respectively, in the external verification set.
Zhang et al. 10.3389/fimmu.2022.1032819 mechanism involved in liver cancer was not reported. For this reason, the relationship between the hub genes and PI3K-Akt/ mTOR pathway was explored. The results showed that the hub genes were positively correlated with the expression of multiple targets in the pathway. A large number of studies have reported that M2-type macrophage infiltration promotes the pathological process of hepatic fibrosis in HCC (53-55). It was found significant that these hub genes caused the M2-type polarization of macrophages by activating the PI3K-Akt-mTOR pathway, and then led to hepatic fibrosis and participated in the development of HCC. In view of the carcinogenic effect of the hub gene, was further verified by independent cohort from GEO and the results identified that the hub gene POLR2A was low expressed in virus-related HCC, and POL2B, RPL5, RPS6, and RPL23A were high expressed, which were consistent with the expression in the internal data sets. To expand whether the hub gene expression pattern was the same as HCC, high expression of the hub genes mRNA in tumor tissues was found during validation of the HCC data set in the TCGA database. The differential expression of POLR2A may be due to its association with only virus-associated HCC and not with HCC. The HPA database shows that the proteins of POL2B, RPS6 and RPL23A are highly expressed in HCC tissues, and are expected to be molecular targets for antitumor. Furthermore, it was unveiled that high expression of hub genes, except POLR2A, was associated with a poor prognosis in FIGURE 12 Drug-hub gene interacvtion. Green rhombus is medicine, and ellipse is gene including POLR2B, RPS6 and RPL23A. Note: Glide docking is based on geometry and is used to search for possible binding sites of drugs. IFD is based on the "lock-key theory" and is used as a method to evaluate drug affinity.
patients with HCC. For target-based drug reuse, the ara-CTP and guanosine trip can form multi-hydrogen-bond complexes with POL2B, RPS6 and RPL23A, with low binding energy, suggesting that they may have a better prognosis when used as adjuvant therapy for patients with virus-associated HCC, which is clear from the results. For the regulation of expression of hub genes, it was speculated that abnormal expression of 14 transcription factors associated with high expression of hub genes through database and internal validation, and the m 6 A enzyme may play an important role in post-transcriptional modification. In conclusion, this study identified the novel molecules involved in virus-associated HCC, which can provide a reference for the diagnosis and treatment of patient.

Data availability statement
Publicly available datasets were analyzed in this study. This data can be found here: https://www.ncbi.nlm.nih.gov/ geo/browse.

Author contributions
Y-ZZ and L-FC performed the study and wrote the manuscript. AZ performed drug screening based on hub genes. All authors contributed to the article and approved the submitted version.