ORIGINAL RESEARCH article

Front. Oncol., 28 May 2020

Sec. Cancer Genetics

Volume 10 - 2020 | https://doi.org/10.3389/fonc.2020.00847

Big Data-Based Identification of Multi-Gene Prognostic Signatures in Liver Cancer

  • ML

    Meiliang Liu 1

  • XL

    Xia Liu 2

  • SL

    Shun Liu 1

  • FX

    Feifei Xiao 3

  • EG

    Erna Guo 1,4

  • XQ

    Xiaoling Qin 1

  • LW

    Liuyu Wu 1

  • QL

    Qiuli Liang 1

  • ZL

    Zerui Liang 1

  • KL

    Kehua Li 1

  • DZ

    Di Zhang 1

  • YY

    Yu Yang 1

  • XL

    Xingxi Luo 1

  • LL

    Lei Lei 1

  • JH

    Jennifer Hui Juan Tan 5

  • FY

    Fuqiang Yin 6,7*

  • XZ

    Xiaoyun Zeng 1,7*

  • 1. School of Public Health, Guangxi Medical University, Nanning, China

  • 2. Key Laboratory of Longevity and Ageing-Related Disease of Chinese Ministry of Education, Centre for Translational Medicine and School of Preclinical Medicine, Guangxi Medical University, Nanning, China

  • 3. Department of Epidemiology and Biostatistics, University of South Carolina, Columbia, SC, United States

  • 4. School of International Education, Guangxi Medical University, Nanning, China

  • 5. School of Life Sciences and Chemical Technology, Ngee Ann Polytechnic, Singapore, Singapore

  • 6. Life Sciences Institute, Guangxi Medical University, Nanning, China

  • 7. Key Laboratory of High-Incidence-Tumor Prevention and Treatment, Guangxi Medical University, Ministry of Education, Nanning, China

Abstract

Simultaneous identification of multiple single genes and multi-gene prognostic signatures with higher efficacy in liver cancer has rarely been reported. Here, 1,173 genes potentially related to the liver cancer prognosis were mined with Coremine, and the gene expression and survival data in 370 samples for overall survival (OS) and 319 samples for disease-free survival (DFS) were retrieved from The Cancer Genome Atlas. Numerous survival analyses results revealed that 39 genes and 28 genes significantly associated with DFS and OS in liver cancer, including 18 and 12 novel genes that have not been systematically reported in relation to the liver cancer prognosis, respectively. Next, totally 9,139 three-gene combinations (including 816 constructed by 18 novel genes) for predicting DFS and 3,276 three-gene combinations (including 220 constructed by 12 novel genes) for predicting OS were constructed based on the above genes, and the top 15 of these four parts three-gene combinations were selected and shown. Moreover, a huge difference between high and low expression group of these three-gene combination was detected, with median survival difference of DFS up to 65.01 months, and of OS up to 83.57 months. The high or low expression group of these three-gene combinations can predict the longest prognosis of DFS and OS is 71.91 months and 102.66 months, and the shortest is 6.24 months and 13.96 months. Quantitative real-time polymerase chain reaction and immunohistochemistry reconfirmed that three genes F2, GOT2, and TRPV1 contained in one of the above combinations, are significantly dysregulated in liver cancer tissues, low expression of F2, GOT2, and TRPV1 is associated with poor prognosis in liver cancer. Overall, we discovered a few novel single genes and multi-gene combinations biomarkers that are closely related to the long-term prognosis of liver cancer, and they can be potential therapeutic targets for liver cancer.

Introduction

Liver cancer is the sixth most common cancer and the fourth leading cause of cancer-related deaths (1). Specifically, hepatocellular carcinoma (HCC) accounts for more than 90% of liver cancer cases from a histopathological perspective. According to the GLOBOCAN 2018 database, there are about 841,000 new HCC cases and 782,000 related deaths worldwide each year, with China accounting for nearly half of the total number of global HCC cases and deaths (2, 3). In China, the Guangxi province has higher morbidity and mortality rates than the national average (4). The high mortality and poor prognosis of HCC poses a global challenge. Despite the slight increase in the 5-year survival rate of liver cancer in China from 10.1 to 12.1% over the periods of 2003–2015, it still remains at a low level (5). A survival analysis of 2, 887 liver cancer patients in 14 years showed that the 1-year, 3-year, and 5-year survival rates were 49.3, 26.6, and 19.5%, respectively (6).

Although there are many existing therapies for HCC including surgical resection, transplantation, ablation, and transcatheter chemoembolization, etc., the long-term survival of HCC patients remains poor due to their limited indications and different effects on prognosis (710). A 20-year prospective cohort analysis reported that the 5-year survival rates of TNM stage I, II, IIIA, and IVA patients after hepatectomy were 81.7, 77.2, 44, and 28.2%, respectively (11). Therefore, it is of crucial importance to explore new prognostic biomarkers and investigate treatment strategies to improve the overall prognosis of HCC patients.

Currently, the research on prognostic molecular markers of HCC is still ongoing, and many single-gene or multi-gene combination molecular markers related to HCC invasion, metastasis and prognosis are being gradually discovered. For example, the expression of HMGA1 in HCC is associated with poor prognosis and is found to promote tumor growth and migration in vitro (12). The overexpression of SYPL1 is associated with epithelial-mesenchymal transition (EMT) of HCC cells and can predict the prognosis of HCC (13). RBM8A and SIRT5 promote the migration and invasion of HCC cells by activating the EMT signaling pathway and targeting E2F1 (14, 15), respectively (16, 17). The EpCAM (18), a liver X receptor (LXR) (19), SPAG5 (20), and KOR (21) have been shown to be strongly correlated with HCC metastasis, invasion, or prognosis. Arginase-1, FTCD, and MOC-31 have a good performance in the diagnosis of HCC (22). TMEM88, CCL14, and CLEC3B can serve as potential prognostic markers of HCC (23). At the same time, some multi-gene combined prognostic studies on HCC have also been reported. For example, three genes (UPB1, SOCS2, RTN3) combination markers (24) and four genes (CENPA, SPP1, MAGEB6, HOXD9) combination models can predict the overall survival in patients with HCC prognosis (25).

However, due to the sample size limitation and the heterogeneity of the samples in different studies, the efficiency of the identified prognostic markers for liver cancer still has ample space to improve. In addition, because of the myriad of gene interaction capabilities and the possibility of synergistic promotion of disease progression, it is of great significance to find some multi-gene combinations that may have better prognostic efficacy than single genes for prognostic targets of liver cancer. Therefore, the leverage of the large sample sizes of the public data platforms, integrating new and effective mining and screening methods, as well as reliable experimental verification is a very promising direction for the discovery of multiple effective single genes and multi-gene combination prognostic markers of liver cancer.

High-throughput profiling technologies and bioinformatics methods are now being applied to all fields of biomedical research. A mass of cancer data, such as the mRNA expression, copy number variation, single nucleotide polymorphism (SNP), and microRNA expression generated by those tools are collected in public archives such as The Cancer Genome Atlas (TCGA) (http://cancergenome.nih.gov/), Coremine (http://www.coremine.com/medical/), Oncomine (https://www.oncomine.org/resource/login.html), Gene Expression Omnibus database (GEO, https://www.ncbi.nlm.nih.gov/geo/), etc. Making full use of the public data from these databases is meaningful for exploring and discovering effective HCC prognostic biomarkers. For instance, Li et al. (24) developed a three-gene prognostic signature composing of three genes UPB1, SOCS2, and RTN3, which was revealed to have prognostic value for HCC patients based on TCGA data. Our previous study used data retrieved from the Coremine, TCGA, and GEO database and discovered that high-expressed E2F transcription factor 3 is associated with poor prognosis of HCC (26).

In this study, we used text mining approach to find the medial related candidate gene list for liver cancer prognosis, and a total of 1,173 genes that might be related to the prognosis of liver cancer were finally obtained. The association of the 1,173 genes with overall survival (OS) and disease-free survival (DFS) was accessed in a large sample of TCGA cohort, in which the subgroups of 319 patients with DFS and 370 with OS were available. The survival analyses are carried out for each of these genes to identify single prognostic markers. Moreover, we performed survival analyses of the gene combinations and performed multiple screening for these HCC prognostic molecular markers, revealing the association between the expression of numerous genes or gene combinations and the survival in HCC patients. We then compared the ability of single genes and multiple gene combinations to predict the prognosis of HCC. Moreover, a huge difference between high and low expression group of these three-gene combinations was detected, with median survival difference of DFS up to 65.01 months, and of OS up to 83.57 months. The high or low expression group of these three-gene combinations can predict the longest prognosis of DFS and OS is 71.91 months and 102.66 months, and the shortest is 6.24 months and 13.96 months. Among the above genes that may be strongly correlated with the prognosis of HCC identified in large sample data, it was found that the combination of the three genes F2, GOT2, and TRPV1 that have not been systematically reported has a strong ability to predict the prognosis of HCC. We further verified F2, GOT2, and TRPV1 by three independent expression profile microarray data for liver cancer acquired from the Oncomine database, and conducted the quantitative real-time polymerase chain reaction (qRT-PCR) in 20 pairs of HCC and adjacent tissues, and immunohistochemistry (IHC) staining in 90 pairs of HCC and its precancerous tissues. These results validated that the low expression of F2, GOT2, and TRPV1 in liver cancer was associated with the poor prognosis of liver cancer.

Materials and Methods

Data Sources

We combined 3 corresponding concepts of the key word “liver cancer” with 2 concepts of the key word “prognosis” and 10 concepts of the key word “outcome,” respectively, (Supplementary Table S1), and searched for their corresponding genes or proteins in the Coremine database (http://www.coremine.com/medical/). After deleting duplicates, we selected 1,173 gene entries with p-values < 0.05 that might be associated with the prognosis of liver cancer for further analyses (Supplementary Table S2).

The above genes mined in the Coremine database include some genes obtained from other gene-mining reports; however, the number of samples and data standards in each report is different. Therefore, we selected the cohort of The Cancer Genome Atlas (TCGA) (http://cancergenome.nih.gov/), a database with consistent sample size and data standards, to conduct unified batch verification of these genes and conduct three-gene combinations survival analyses.

We studied the relationship between each of the selected 1,173 genes and the prognosis of liver cancer patients in TCGA cohort which downloaded from cBioPortal for Cancer Genomics (https://www.cbioportal.org/) in September 2018 (27, 28), and a subgroup of 319 liver cancer samples with HCC DFS corresponding follow-up data and a subgroup of 370 liver cancer samples with HCC OS corresponding follow-up data were chosen.

Survival Analysis and Gene Selection

Kaplan-Meier estimation of survival functions and Log-rank tests were used to evaluate effect of genes on DFS and OS. The Cox proportional hazard model was performed for multivariate analyses of HCC prognosis. Survival analyses were performed using the R survival package in R (version 3.3.1). The Kaplan-Meier survival curves and Cox proportional hazards regression model for DFS and OS were generated by IBM SPSS (version 23.0). The median expression level of a gene was used as a cutoff value for the classification of patients into high and low expression groups (29).

Human Tissue Samples

For the validation studies, we used 20 patients who underwent primary and curative hepatectomy from Apr 2016 to Apr 2018 at the First Affiliated Hospital of Guangxi Medical University. Those patients who have distinctive pathologic diagnosis of HCC without preoperative anticancer treatment were eligible for inclusion in this study. The paraffin-embedded pathologic specimens were collected during surgery and stored in a liquid nitrogen tank until the step of mRNA isolation. All patients received an explanation for the purpose of the study and signed informed consent. The Ethics Committee of Guangxi Medical University granted approval for this study. For IHC, a commercial biological tissue microarray containing 90 pairs of HCC and adjacent normal liver tissues was constructed by the Biological sample library of Shanghai Outdo Biotech Company, and the survival information of each case was usable. (Microarray: HLivH180Su14).

Quantitative Real-Time Polymerase Chain Reaction (qRT-PCR)

QRT-PCR was performed to evaluate the mRNA expression of selected genes in 20 HCC and their matched precancerous tissues. Total RNA was isolated using Trizol reagent (Life Technologies, Inc., NY, USA) according to the manufacturer's instructions. The concentration and purity of the total RNA were detected using Microplate reader (Bioteck Instruments, Inc., VT, USA). RNA reverse transcription was then performed with the PrimeScript™ RT reagent Kit (Takara Biomedical Technology (Beijing) Co., Ltd.) with gDNA Eraser (Perfect Real Time), and qRT-PCR was performed using the TB GreenTM Premix Ex TaqTM II (Tli RNaseH Plus) kit (Takara Biomedical Technology (Beijing) Co., Ltd.) protocol in a StepOnePlus system (Applied Biosystems. Life Technologies Holdings Pte Ltd, Singapore).

The sequences of the primers are as follows: F2: forward primer, 5′-CTGAGGGTCTGGGTACGAACT-3′, reverse primer, 5′-TGGGTAGCGACTCCTCCATAG-3′; GOT2: forward primer, 5′-AAGAGTGGCCGGTTTGTCAC-3′, reverse primer, 5′-AGAAAGACATCTCGGCTGAACT-3′; TRPV1: forward primer, 5′-TGCACGACGGACAGAACAC-3′, reverse primer, 5′-GCGTTGACAAGCTCCTTCAG-3′. The cycle conditions are as follows: after an initial incubation at 95°C for 30 s, the samples were cycled 40 times at 95°C for 5 s and 60°C for 30 s. The relative expression level of each gene in the individual samples was calculated using the 2ΔΔCt method and normalized using GAPDH as an endogenous control.

Immunohistochemistry (IHC)

EnVision™ FLEX+, Mouse, High pH, (Link) (K8002, Dako) was used for the immunohistochemistry. After the tissue chips were baked and placed in LEICAST5010 (LEICA), PT Link (Dako North America, Inc.) was used for antigen retrieval. Primary antibodies were diluted (F2, 1:3000; GOT2, 1:80000; TRPV1, 1:1500) and incubated overnight at 4°C. The secondary antibody reactions were carried out using the Autostainer Link 48 (Dako North America, Inc.), the sections were subjected to color development with the DAB chromogenic kit, and finally counterstained with Hematoxylin (SLBT4555, Sigma Aldrich). The following antibodies were used: F2, 1: Anti-Thrombin (ab83981; Abcam), GOT2, 1: Anti-FABP-1 (ab171739; Abcam), TRPV1, 1: Anti-VR1 (ab3487; Abcam). All slides were evaluated by two independent pathologists who were blind about the clinicopathologic data.

The expression levels were scored as the staining intensity (0, negative; 1+, weak; 2+, moderate; 3+, strong) multiplied by the proportion of immunopositive staining area (0, < 25%; 1+, 25–50%; 2+, 50–75%; 3+, >75%) intensity of staining. Expression scores <5 were considered as “low expression,” and scores ≥5 were considered as “high expression.”

Statistics

Statistical analyses were conducted using R 3.3.1 (Auckland, NZ) and IBM SPSS 23.0 (Chicago, USA). McNemar test was used to test the paired 4-fold table experimental data of IHC. The paired t-test was used to analyze the qRT-PCR experimental data. Except for single-gene survival analyses and three-gene prognosis survival analyses with p-value < 0.01 as statistically significant, other statistical analyses were considered statistically significant with two-sided p-value < 0.05.

Results

Selection of Genes Related to Liver Cancer Prognosis and Liver Cancer Samples

We combined 3 corresponding concepts of the key word “liver cancer” [Liver neoplasms (alias Liver Cancer) (disease) (60,666 connections); Liver carcinoma (alias liver cell cancer) (disease) (55,739 connections); Carcinoma, Hepatocellular (alias Adult Liver Cancer) (mesh) (57,034 connections)] with 2 corresponding concepts of the key word “prognosis” [Prognosis (mesh) (77,312 connections); Prognostic Marker (alias Prognosis Marker) (chemical) (22,056 connections)] and 10 corresponding concepts of the key word “outcome” [Fatal Outcome (mesh) (34,016 connections); Outcome Assessment (Health Care) (alias Outcome Study) (mesh) (48,296 connections); Outcome studies (procedure) (9,545 connections); Treatment Outcome (mesh) (77,246 connections); Outcomes research (procedure) (5,540 connections); Outcome monitoring (procedure) (2,030 connections); Patient-focused outcomes (procedure) (3,830 connections); Treatment outcome in HSR (procedure) (998 connections); Patient Reported Outcome Measures (alias Patient Reported Outcome) (mesh) (2,301 connections); Patient Outcome Assessment (mesh) (9,066 connections)], respectively, (Supplementary Table S1), and searched for their corresponding genes or proteins in the Coremine database (http://www.coremine.com/medical/). With p-values < 0.05 as the criteria, a total of 1,173 genes that might be related to the prognosis of liver cancer were finally obtained after screening and elimination of duplicates. As the samples of liver cancer in the Coremine database were not uniform enough, we selected 319 samples for DFS and 370 samples for OS of liver cancer from the TCGA database and obtained the corresponding survival data as well as the expression information of the above 1,173 genes in these samples. This was necessary to carry out the subsequent survival analyses of these genes for liver cancer.

The Single Genes Prognostic Analyses

To clearly describe our process of screening genes, a flowchart of the analysis procedure was developed (Figure 1). First, we performed the Kaplan-Meier analysis of each of the 1,173 genes. It was found that the mRNA expression of 276 genes and 283 genes was significantly associated with DFS in 319 patients (p < 0.05) and OS in 370 patients (p < 0.05), respectively. Additionally, the mRNA expression of 166 of these genes was significantly associated with both DFS and OS (p < 0.05).

Figure 1

To further investigate the value of the genes in the prognosis of liver cancer, we chose 135 genes and 149 genes with p-values < 0.01 for DFS and OS, respectively. Next, we used the Cox proportional hazards regression model to employ multivariate analyses on the above genes, respectively to determine the DFS and OS prediction potential of these genes.

The DFS-related multivariate analysis results showed that the expression of 39 genes (ALDOB, APOB, AURKB, C5, CCNF, CD4, CENPJ, CETP, COL18A1, CPT2, DAND5, DNASE1, EBPL, F7, FLT3, G6PD, GNMT, ITGB2, KLRK1, KNG1, LMOD1, NEK2, PCLAF, PER1, PKM, POU2F1, PPAT, PPIA, PRF1, PTPN6, RUNX3, SELP, SLCO1B1, SPPL2A, STAT5A, TCF21, TRPV1, TUSC1, and TYMS) was significantly associated with DFS in HCC patients (p < 0.05, Table 1). The highly significant results of both the DFS-related single-gene survival analyses for each of these 39 genes and multivariate analysis confirmed that the above 39 genes have a strong association with the DFS of liver cancer, especially the 5-year disease free survival rate of liver cancer.

Table 1

ItemsGenesBSEWaldSig.Exp (B)95.0% CI
LowerUpper
DFS associatedALDOB−0.5800.1869.7500.0020.5600.3890.806
APOB−0.4360.2174.0230.0450.6470.4230.990
AURKB0.5270.2116.2080.0131.6941.1192.564
C5*−0.4200.1706.0930.0140.6570.4710.917
CCNF0.6940.3344.3100.0382.0021.0403.857
CD4*−0.7740.3166.0070.0140.4610.2480.856
CENPJ1.0530.24318.7940.0002.8671.7814.615
CETP*0.8290.4233.8510.0502.2911.0015.245
COL18A1*0.4170.2074.0640.0441.5181.0122.278
CPT20.5580.2475.1140.0241.7471.0772.834
DAND5*−0.4270.1835.4660.0190.6520.4560.933
DNASE1*0.3820.1367.9270.0051.4651.1231.910
EBPL*−0.7660.2807.4630.0060.4650.2680.805
F7*−0.4960.1758.0340.0050.6090.4320.858
FLT3*−0.7000.2408.5120.0040.4970.3100.795
G6PD0.4770.1886.4380.0111.6111.1152.328
GNMT0.4270.1607.1180.0081.5331.1202.097
ITGB2*1.1120.30113.6620.0003.0421.6865.486
KLRK10.9320.3845.8830.0152.5391.1965.390
KNG1*0.6450.2775.4120.0201.9061.1073.282
LMOD1*−0.8730.4104.5240.0330.4180.1870.934
NEK2−0.5460.2634.2990.0380.5790.3460.971
PCLAF0.5260.2434.7000.0301.6931.0522.724
PER1−0.6700.2219.1690.0020.5120.3320.790
PKM−0.6450.2825.2100.0220.5250.3020.913
POU2F10.4550.14210.2360.0011.5771.1932.084
PPAT*0.9660.21021.1210.0002.6281.7413.969
PPIA*0.6260.18311.6610.0011.8701.3062.679
PRF1*−1.6760.37020.5050.0000.1870.0910.386
PTPN6−0.6100.2277.2030.0070.5430.3480.848
RUNX30.9670.3756.6590.0102.6291.2625.479
SELP*0.7900.2708.5870.0032.2031.2993.736
SLCO1B1−0.5240.2136.0290.0140.5920.3900.900
SPPL2A*−0.6690.2179.5280.0020.5120.3350.783
STAT5A−1.7040.48912.1490.0000.1820.0700.474
TCF21−0.9790.4015.9610.0150.3760.1710.824
TRPV1*−0.5200.1897.6040.0060.5950.4110.860
TUSC10.4230.1885.0440.0251.5261.0552.207
TYMS0.5230.2454.5580.0331.6871.0442.727
OS associatedABCC11.0970.3698.8410.0032.9941.4536.168
ANXA7*−0.5540.2017.6180.0060.5750.3880.852
APOB−0.7910.3116.4610.0110.4530.2460.834
ATG70.6130.3123.8760.0491.8471.0033.400
BAK1−0.4900.2314.4970.0340.6130.3900.964
CA90.7610.3634.3990.0362.1401.0514.356
CCNA20.5020.2036.0940.0141.6521.1092.461
CHD1L0.4910.1817.3770.0071.6341.1472.330
CYP3A40.9990.3647.5390.0062.7171.3315.544
E2F10.3600.1724.3710.0371.4331.0232.008
EZH20.9850.3996.1030.0132.6781.2265.852
F2*0.7110.3135.1740.0232.0361.1033.757
G6PC−0.6770.3413.9370.0470.5080.2600.992
GMPS0.7330.2916.3450.0122.0811.1773.681
GOT2*−1.5090.4849.7230.0020.2210.0860.571
HDAC20.8130.3166.6280.0102.2551.2144.187
HPX*0.9300.3845.8820.0152.5351.1955.378
KPNA20.8350.2848.6640.0032.3051.3224.018
LAPTM4B−0.4920.1688.6160.0030.6110.4400.849
MAGEB3*0.3930.1794.8240.0281.4821.0432.105
MAPT*0.6600.2437.3490.0071.9341.2013.117
MPV17*1.1410.4885.4680.0193.1291.2038.141
NTF3*1.0890.3579.3180.0022.9731.4775.983
PPAT*0.7520.2866.8970.0092.1221.2103.719
SLC2A1*−0.9210.4404.3830.0360.3980.1680.943
SLC38A1*−0.7680.2897.0630.0080.4640.2630.817
SPP10.6040.2645.2190.0221.8301.0903.073
TRPV1*0.4530.2015.0440.0251.5721.0592.334

Multivariate analyses of prognosis of DFS of 319 HCC patients and OS of 370 HCC patients in a TCGA cohort.

*

The gene has not been systematically reported to be associated with HCC prognosis.

Cox proportional hazard model was used to analyze the impact of 135 genes on DFS and the impact of 149 genes on OS, respectively, P < 0.05 were considered to be significant.

39 genes and 28 genes were significantly associated with liver cancer DFS and OS, respectively.

The OS-related multivariate analysis results showed that the expression of 28 genes (ABCC1, ANXA7, APOB, ATG7, BAK1, CA9, CCNA2, CHD1L, CYP3A4, E2F1, EZH2, F2, G6PC, GMPS, GOT2, HDAC2, HPX, KPNA2, LAPTM4B, MAGEB3, MAPT, MPV17, NTF3, PPAT, SLC2A1, SLC38A1, SPP1, and TRPV1) was significantly associated with OS in HCC patients. (p < 0.05, Table 1). The strongly significant results of both the OS-related single-gene survival analyses and multivariate analysis confirmed that these 28 genes are significantly associated with the OS of liver cancer, especially the 5-year survival rate of liver cancer.

Additionally, among the above-mentioned genes selected after single-gene survival analyses and multivariate analyses, 3 genes (APOB, PPAT, and TRPV1) were significantly associated with both DFS and OS in HCC patients.

Heat maps of the expression of the above 39 DFS-related genes and 28 OS-related genes in 1173 TCGA liver cancer samples, respectively, which grouped by prognosis status, were shown in Supplementary Figure S1.

Three-Gene-Combination Prognostic Model

To reflect the association of the expression of the combined genes with the prognosis of HCC, three-gene-combinations of the above 39 and 28 single genes that are significantly associated with DFS and OS, respectively, were formed, resulting in 9,139 and 3,276 three-gene-combinations for DFS and OS, respectively. In each combination, simultaneous high expression of the three genes in the same case was defined as the co-high expression group. Similarly, simultaneous low expression of the three genes in the same case was considered to be the co-low expression group. In order to ensure the comparability between the high and the low expression group, we deleted combinations which had < 25 cases in the co-high or co-low expression group.

Three-Gene-Combination of Prediction for DFS in Liver Cancer

K-M survival analysis of each of the above 9,139 combinations constituted by 39 DFS-related single genes was first performed. Then, we selected a total of 2,758 combinations with p-values < 0.01, excluding the combinations with no more than 25 cases in the co-high expression or co-low expression groups. Apparently, these selected 2,758 combinations have significant prognostic implications for DFS in liver cancer.

In addition, 18 of the above 39 single genes have not yet been systematically reported to be associated with HCC prognosis, and these 18 genes can combine into 816 three-gene-combinations. The results of the K-M survival analyses showed that 317 combinations had significant association with DFS of liver cancer (p < 0.01).

The top 15 combinations of the above 2,758 and 317 combinations with the smallest p-values were chosen. The DFS-related survival analyses diagrams and tables of these combinations and the single genes they contain are as follows (Figures 2, 3; Tables 2, 3).

Figure 2

Figure 3

Table 2

DFS (Median) of combinations of 39 genes with HCC prognosisDFS (Median) of combinations of 18 genes have unknown association with HCC prognosis
EstimateStd. Error95% confidence intervalPMedian survival time difference (H-L)EstimateStd. Error95% confidence intervalPMedian survival time difference (H-L)
Lower boundaryUpper boundaryLower boundaryUpper boundary
DNASE1-PPIA-TUSC1H9.4901.5976.36012.6200.000−62.420CD4-F7-TRPV1H71.91020.61931.498112.3220.00065.010
L71.91024.36524.154119.666L6.9001.6573.65210.148
Overall21.6204.84812.11931.121Overall15.7405.3095.33426.146
CD4-F7-TRPV1H71.91020.61931.498112.3220.00065.010CD4-F7-LMOD1H70.0700.00063.830
L6.9001.6573.65210.148L6.2401.4083.4809.000
Overall15.7405.3095.33426.146Overall17.6403.83310.12725.153
CD4-F7-GNMTH71.91022.30328.196115.6240.00063.370CD4-COL18A1-F7H67.58021.11026.205108.9550.00059.660
L8.5401.2416.10810.972L7.9201.6584.67011.170
Overall21.1604.03913.24429.076Overall19.1903.61612.10426.276
CD4-F7-LMOD1H70.0700.00063.830CD4-FLT3-SPPL2AH70.07018.00534.779105.3610.00062.220
L6.2401.4083.4809.000L7.8501.4864.93710.763
Overall17.6403.83310.12725.153Overall19.6507.2755.39133.909
CD4-COL18A1-F7H67.58021.11026.205108.9550.00059.660C5-CD4-F7H67.58015.37437.44797.7130.00059.660
L7.9201.6584.67011.170L7.9201.4145.14910.691
Overall19.1903.61612.10426.276Overall21.1605.7049.98132.339
APOB-CD4-SLCO1B1H66.62013.23940.67292.5680.00057.130CD4-F7-SELPH71.9103.18465.66978.1510.00063.200
L9.4900.9187.69111.289L8.7100.7837.17610.244
Overall19.6504.9769.89729.403Overall21.5508.4964.89838.202
CD4-CPT2-F7H71.91021.20630.347113.4730.00064.060CD4-F7-PRF1H70.07015.89938.908101.2320.00061.500
L7.8502.0243.88311.817L8.5701.0556.50210.638
Overall15.7002.77610.25921.141Overall21.1603.45514.38927.931
CD4-F7-PER1H70.0702.85564.47575.6650.00061.500CD4-SELP-SPPL2AH70.0703.84962.52577.6150.00061.500
L8.5701.2256.17010.970L8.5700.8196.96410.176
Overall25.3008.2279.17541.425Overall18.5905.8377.14930.031
APOB-CD4-SPPL2AH70.07023.92823.171116.9690.00060.940CD4-F7-FLT3H70.0703.04864.09776.0430.00061.360
L9.1300.8557.45510.805L8.7101.3116.14011.280
Overall19.1904.68910.00028.380Overall35.58012.14211.78159.379
CD4-FLT3-SPPL2AH70.07018.00534.779105.3610.00062.220CD4-F7-SPPL2AH0.000
L7.8501.4864.93710.763L7.9201.8644.26611.574
Overall19.6507.2755.39133.909Overall24.77019.2760.00062.551
DNASE1-PPAT-TUSC1H7.4201.1155.2359.6050.000−39.620DAND5-PPAT-PPIAH8.5400.7976.97810.1020.000−33.480
L47.04017.35013.03581.045L42.02015.01412.59271.448
Overall21.1605.10111.16231.158Overall19.2502.76313.83424.666
APOB-CD4-F7H67.58013.50041.12094.0400.00058.840CD4-CETP-KNG1H50.03014.49821.61478.4460.00041.550
L8.7400.8847.00710.473L8.4800.7696.9729.988
Overall24.7709.0577.01842.522Overall18.3301.89414.61722.043
CD4-SLCO1B1-SPPL2AH71.9100.00062.420C5-F7-ITGB2H67.58014.02840.08495.0760.00059.010
L9.4901.1717.19411.786L8.5701.3165.99111.149
Overall19.6506.5196.87332.427Overall35.5809.18517.57753.583
C5-CD4-F7H67.58015.37437.44797.7130.00059.660CD4-F7-KNG1H0.000
L7.9201.4145.14910.691L8.7401.2066.37611.104
Overall21.1605.7049.98132.339Overall21.5508.2935.29537.805
CD4-GNMT-LMOD1H50.03016.34817.98782.0730.00041.550CD4-CETP-SELPH66.62014.88337.45095.7900.00056.370
L8.4801.4305.67711.283L10.2501.3157.67212.828
Overall18.3302.73412.97123.689Overall18.3301.46915.45221.208

The associations of three-gene combinations with disease-free survival (DFS) of HCC patients in a TCGA cohort, analyzed by Kaplan-Meier method.

Table 3

DFS (Median) of single genes of the combinations with HCC prognosisOS (Median) of single genes of the combinations with HCC prognosis
EstimateStd. Error95% confidence intervalPMedian survival time difference (H-L)EstimateStd. Error95% Confidence IntervalPMedian survival time difference (H-L)
Lower boundaryUpper boundaryLower boundaryUpper boundary
APOBH29.3006.37616.80241.7980.00814.450ANXA7H83.18015.49652.807113.5530.00636.430
L14.8502.04910.83418.866L46.7507.28032.48161.019
Overall20.9302.31816.38725.473Overall55.6507.92540.11671.184
C5H29.9606.76216.70643.2140.00116.330ATG7H45.0708.03129.33060.8100.009−35.610
L13.6302.8708.00619.254L80.68010.53360.036101.324
Overall20.9302.31816.38725.473Overall55.6507.92540.11671.184
CD4H36.7007.69321.62251.7780.00023.070CA9H37.2908.31720.98953.5910.000−32.720
L13.6302.0899.53617.724L70.01010.21049.99990.021
Overall20.9302.31816.38725.473Overall55.6507.92540.11671.184
CETPH35.5805.89624.02347.1370.00221.450CCNA2H45.07010.29824.88565.2550.001−24.940
L14.1301.79910.60517.655L70.01011.73047.01993.001
Overall20.9302.31816.38725.473Overall55.6507.92540.11671.184
COL18A1H27.2004.88517.62536.7750.00511.600CHD1LH39.7506.94026.14853.3520.006−40.930
L15.6003.1149.49721.703L80.6806.58767.77093.590
Overall20.9302.31816.38725.473Overall55.6507.92540.11671.184
CPT2H29.3004.76719.95638.6440.00514.350EZH2H37.29010.18117.33557.2450.000−43.390
L14.9501.83611.35218.548L80.68010.81659.480101.880
Overall20.9302.31816.38725.473Overall55.6507.92540.11671.184
DAND5H13.6302.5618.61018.6500.001−16.330F2H69.51011.84246.30092.7200.00523.620
L29.9605.45519.26940.651L45.8907.02032.13259.648
Overall20.9302.31816.38725.473Overall55.6507.92540.11671.184
DNASE1H13.1401.9979.22617.0540.001−16.160GMPSH45.0709.66726.12364.0170.003−24.440
L29.3004.25620.95837.642L69.51010.30849.30689.714
Overall20.9302.31816.38725.473Overall55.6507.92540.11671.184
F7H33.9008.19117.84649.9540.00018.490GOT2H70.01012.02546.44193.5790.00032.260
L15.4101.48512.50018.320L37.7509.38319.36056.140
Overall20.9302.31816.38725.473Overall55.6507.92540.11671.184
FLT3H35.5803.64028.44642.7140.00022.440HPXH69.51010.51848.89490.1260.00223.620
L13.1401.8339.54716.733L45.89010.11226.07065.710
Overall20.9302.31816.38725.473Overall55.6507.92540.11671.184
GNMTH29.3009.16711.33447.2660.00213.370HDAC2H45.0708.36528.67561.4650.002−35.610
L15.9301.82112.36019.500L80.68012.79655.599105.761
Overall20.9302.31816.38725.473Overall55.6507.92540.11671.184
ITGB2H35.5804.23227.28543.8750.00219.840KPNA2H33.0208.16517.01749.0230.000−47.660
L15.7402.67110.50420.976L80.6806.90867.13994.221
Overall20.9302.31816.38725.473Overall55.6507.92540.11671.184
KNG1H25.3006.47812.60337.9970.0079.600LAPTM4BH45.07010.51124.46865.6720.000−35.610
L15.7002.45810.88220.518L80.68012.59855.988105.372
Overall20.9302.31816.38725.473Overall55.6507.92540.11671.184
LMOD1H29.6605.12019.62539.6950.00413.960MAPTH41.7506.88828.24955.2510.006−28.260
L15.7002.65510.49720.903L70.0109.84450.71689.304
Overall20.9302.31816.38725.473Overall55.6507.92540.11671.184
PER1H25.4906.52912.69438.2860.00310.080MPV17H37.2906.64424.26850.3120.000−43.390
L15.4103.4858.57922.241L80.6806.50467.93393.427
Overall20.9302.31816.38725.473Overall55.6507.92540.11671.184
PPATH14.1302.6568.92419.3360.000−19.770NTF3H70.01012.70445.11094.9100.00229.640
L33.9005.40123.31444.486L40.3708.14324.40956.331
Overall20.9302.31816.38725.473Overall55.6507.92540.11671.184
PPIAH15.6001.47512.70918.4910.000−13.280PPATH58.84014.92829.58088.1000.009−10.670
L28.8807.57514.03343.727L69.51011.35447.25691.764
Overall20.9302.31816.38725.473Overall55.6507.92540.11671.184
PRF1H29.9604.35821.41838.5020.00017.350SLC2A1H45.8906.18733.76358.0170.000−37.290
L12.6102.0048.68116.539L83.18017.11349.638116.722
Overall20.9302.31816.38725.473Overall55.6507.92540.11671.184
SELPH29.9606.29417.62442.2960.00114.260SLC38A1H45.0703.91937.38952.7510.001−35.610
L15.7002.46510.86820.532L80.6807.14166.68494.676
Overall20.9302.31816.38725.473Overall55.6507.92540.11671.184
SLCO1B1H35.84010.36815.51856.1620.00020.890SPP1H40.3705.28830.00550.7350.000−29.640
L14.9501.35912.28617.614L70.01013.01644.49895.522
Overall20.9302.31816.38725.473Overall55.6507.92540.11671.184
SPPL2AH27.2005.00017.39937.0010.00511.790TRPV1H80.6807.67265.64295.7180.00235.610
L15.4102.33110.84219.978L45.0706.03033.25056.890
Overall20.9302.31816.38725.473Overall55.6507.92540.11671.184
TRPV1H29.6606.12717.65241.6680.00513.530
L16.1301.96212.28419.976
Overall20.9302.31816.38725.473
TUSC1H15.7402.00311.81419.6660.001−18.160
L33.9008.19317.84149.959
Overall20.9302.31816.38725.473

The associations of single genes contained in the multi-gene combinations with disease-free survival (DFS) and overall survival (OS) of HCC patients in a TCGA cohort, analyzed by Kaplan-Meier method.

Three-Gene-Combination of Prediction for OS in Liver Cancer

Similarly, three-gene-combinations of the 28 single genes significantly associated with OS confirmed by the single gene survival analyses and the multivariate analysis were formed, resulting in 3,276 three-gene-combinations. 930 of these 3,276 combinations were screened out on the conditions that the number of cases in both the co-high and co-low expression groups was > 25, and the p-values were < 0.01 according to the OS-related K-M analyses results.

Furthermore, 12 of the above 28 single genes that were noted to have an unknown association with liver cancer prognosis formed 220 three-gene-combinations. Out of the 220 combinations, there were 31 combinations in which the number of cases in both the co-high and co-low expression groups was > 25 and the OS-related survival analyses results showed p < 0.01.

We found 930 of above 3,276 combinations and 31 of above 220 unreported-gene combinations were significant association with OS related survival of liver cancer patients. Among the 930 combinations and 31 combinations mentioned above, the diagrams and tables of the OS-related survival analyses of the top 15 combinations with the smallest p-values and the single genes they contain are as follows (Figures 4, 5; Tables 3, 4) Among the 12 genes that have an unknown association with HCC prognosis, F2, GOT2, TRPV1, and their combination F2-GOT2-TRPV1 were all significantly associated with OS in 370 liver cancer samples from the TCGA data (F2: p = 0.005; GOT2: p < 0.001; TRPV1: p = 0.002; F2-GOT2-TRPV1: p < 0.001). The overall survival rate in HCC patients with low expression of F2, GOT2, TRPV1, and the three-gene-combination F2-GOT2-TRPV1 were all significantly lower than that in liver cancer patients with high expression. In addition, the median survival time difference between the high expression group and the low expression group of F2, GOT2, TRPV1, and the three-gene combination F2-GOT2-TRPV1 was 23.62, 32.26, 35.61, and 55.68 months, respectively. The median survival time difference of this combination was greater than that of a single gene, which was one of the main reasons why we selected these three genes for qRT-PCR and immunohistochemically validation.

Figure 4

Figure 5

Table 4

OS (Median) of combinations of 28 genes with HCC prognosisOS (Median) of combinations of 12 genes have unknown association with HCC prognosis
EstimateStd. Error95% confidence intervalPMedian survival time difference (H-L)EstimateStd. Error95% confidence intervalPMedian survival time difference (H-L)
Lower boundaryUpper boundaryLower boundaryUpper boundary
EZH2-KPNA2-MPV17H21.3206.1439.28033.3600.000−59.360GOT2-NTF3-TRPV1H0.000
L80.6807.06166.84194.519L25.2303.76417.85232.608
Overall55.35013.44329.00181.699Overall60.84015.62230.22091.460
EZH2-LAPTM4B-MPV17H18.2305.7356.98829.4720.000−62.450MPV17-PPAT-SLC2A1H18.3304.9168.69527.9650.000−64.850
L80.6807.99065.02096.340L83.18015.79452.224114.136
Overall48.95010.01429.32368.577Overall53.35015.42223.12383.577
CA9-KPNA2-SPP1H23.7805.36813.25934.3010.000−59.400MPV17-SLC2A1-SLC38A1H25.1309.2726.95743.3030.000−58.050
L83.18016.29251.248115.112L83.1807.32268.82997.531
Overall51.25011.66828.38174.119Overall46.7506.57133.87059.630
CA9-KPNA2-LAPTM4BH19.7403.69912.49026.9900.000−63.440GOT2-HPX-NTF3H70.01010.63149.17490.8460.00050.430
L83.18020.66942.669123.691L19.5806.2437.34331.817
Overall46.7506.14134.71558.785Overall55.3506.78342.05568.645
KPNA2-SLC38A1-SPP1H19.0906.8765.61432.5660.000−83.570MAPT-SLC2A1-SLC38A1H25.1309.1347.22743.0330.000−58.050
L102.66021.95859.622145.698L83.18012.08559.493106.867
Overall69.51010.95148.04790.973Overall45.8907.00232.16759.613
HDAC2-KPNA2-SPP1H23.7805.61312.77834.7820.000−78.880MAPT-MPV17-SLC38A1H25.1304.04717.19733.0630.000−58.050
L102.66014.18974.850130.470L83.1809.72064.128102.232
Overall83.18021.67240.704125.656Overall45.0706.28432.75457.386
CHD1L-EZH2-SPP1H15.4104.0127.54723.2730.000−67.770GOT2-HPX-TRPV1H83.18011.77060.111106.2490.00050.160
L83.1809.86663.842102.518L33.0206.97119.35646.684
Overall46.75012.30522.63370.867Overall70.01014.67341.25198.769
EZH2-KPNA2-LAPTM4BH21.6805.44511.00832.3520.000−59.000ANXA7-F2-NTF3H83.51015.70252.734114.2860.00058.640
L80.6807.01166.93994.421L24.87010.5614.17045.570
Overall46.75010.38926.38867.112Overall53.35014.23025.45981.241
KPNA2-MPV17-SLC38A1H17.5805.8206.17228.9880.000−63.100ANXA7-GOT2-NTF3H83.18023.27137.569128.7910.00062.580
L80.6807.99265.01596.345L20.6005.4179.98331.217
Overall53.35011.88830.04976.651Overall48.9507.67033.91663.984
ATG7-KPNA2-PPATH21.1206.0879.19033.0500.000−59.560ANXA7-GOT2-HPXH83.18013.67756.373109.9870.00058.310
L80.68010.95359.212102.148L24.8708.5058.20041.540
Overall45.53011.83922.32568.735Overall53.29013.89026.06680.514
GMPS-LAPTM4B-SLC2A1H17.9706.6804.87631.0640.000−65.210MAPT-PPAT-SLC2A1H20.1106.4337.50132.7190.000−63.070
L83.18016.01851.784114.576L83.18014.58054.602111.758
Overall53.35013.67026.55780.143Overall70.01018.75133.258106.762
CHD1L-LAPTM4B-MPV17H24.8704.88215.30234.4380.000−55.810F2-GOT2-TRPV1H83.18011.97659.707106.6530.00055.680
L80.6807.91265.17296.188L27.5006.80514.16240.838
Overall55.65010.70934.66076.640Overall81.67020.41941.649121.691
ATG7-GMPS-PPATH13.9604.4515.23622.6840.000−66.720F2-GOT2-HPXH83.1806.65070.14696.2140.00045.890
L80.68017.66546.057115.303L37.2907.22523.12951.451
Overall37.6808.51021.00154.359Overall69.51012.17045.65793.363
CCNA2-LAPTM4B-MPV17H18.3303.55911.35425.3060.000−51.680MPV17-PPAT-SLC38A1H20.6005.9308.97732.2230.000−60.080
L70.0106.19057.87882.142L80.6809.36562.32499.036
Overall48.9507.27234.69763.203Overall51.25013.88824.03078.470
KPNA2-LAPTM4B-MPV17H21.3205.08211.35931.2810.000−59.360ANXA7-HPX-NTF3H83.18026.57331.096135.2640.00058.310
L80.6807.90065.19696.164L24.8707.24410.67239.068
Overall51.25014.89822.05080.450Overall48.9505.91937.35060.550

The associations of three-gene combinations with overall survival (OS) of HCC patients in a TCGA cohort, analyzed by Kaplan-Meier method.

Low Expression of F2, GOT2, and TRPV1 Predicts Poor Prognosis

Based on the above results of the OS-related survival analyses and multivariate analyses on 28 genes, as well as the results of survival analyses on their three-gene-combinations, we selected three genes F2, GOT2, and TRPV1 with strong liver cancer prognostic potential for subsequent validation.

F2, GOT2, and TRPV1 Were Downregulated in HCC Tissues

The gene expression in HCC was determined based on three independent microarrays which are all collected in Oncomine database (https://www.oncomine.org/resource/login.html). As shown in Roessler Liver 2 Statistics (225 HCC tissues vs. 220 liver tissues), the expression of F2, GOT2, and TRPV1 in HCC tissues were all significantly down-regulated compared with that in normal liver tissues. (p < 0.001; Figure 6) In addition, based on the Mas Liver Statistics (38 HCC tissue vs. 19 liver tissue), both F2 and TRPV1 were significantly down-regulated in HCC tissues. Based on the Chen Liver Statistics (104 HCC tissues vs. 76 liver tissues), both F2 and GOT2 were significantly down-regulated in HCC tissues.

Figure 6

The qRT-PCR results of F2, GOT2 and TRPV1 showed that 20/20, 19/20, and 16/19 of the HCC tissues exhibited significantly lower expression of F2 (p < 0.001; Figure 7A), GOT2 (p < 0.001; Figure 7B), and TRPV1 (p = 0.006; Figure 7C), respectively, when compared with their corresponding non-tumorous tissues.

Figure 7

The protein expression of F2, GOT2, and TRPV1 in HCC tissues was evaluated using IHC. Positive staining of F2, GOT2, and TRPV1 was mainly localized in the cytoplasm of HCC cells. The representative staining of F2, GOT2, and TRPV1 negative and positive protein expression in HCC are shown in Figure 8A.

Figure 8

Among 90 HCC tissues and adjacent non-malignant liver tissues, IHC was employed to measure the protein expression of F2, GOT2, and TRPV1, respectively. Low F2 expression was observed in 62/89 (69.66%) of the HCC tissues, compared to 33/89 (37.08%) in adjacent normal liver tissues (p < 0.001); low GOT2 expression was noted in 72/89 (80.90%) of the HCC tissues, compared to 32/89 (35.96%) in adjacent normal liver tissues (p < 0.001); low TRPV1 expression was also observed in 59/89 (66.29%) of the HCC tissues, compared to 38/89 (42.70%) in adjacent normal liver tissues (p = 0.002).

Expression of F2, GOT2, and TRPV1 and Their Combination F2-GOT2-TRPV1 With OS

Based on the above results of single-genes and three-gene combinations survival analyses of TCGA HCC samples, the low expression of F2, GOT2, TRPV1 and their combination F2-GOT2-TRPV1 was significantly associated with poor OS in HCC. (F2: p = 0.005; GOT2: p < 0.001; TRPV1: p = 0.002; F2-GOT2-TRPV1: p < 0.001). In addition, the median survival time difference between the high expression group and the low expression group of F2-GOT2-TRPV1 was greater than that of any of the three single genes.

The results of IHC for 90 liver cancer cases showed that the low protein expression of F2, GOT2, and TRPV1 was significantly associated with lower 5-year survival in HCC patients (F2: p = 0.033, GOT2: p = 0.035, TRPV1: p = 0.046; K-M survival analyses). However, due to the insufficient number of events in the co-high expression group of the combination F2-GOT2-TRPV1, there was marginally significant difference found in the overall survival rate of HCC patients between the co-high expression group and the co-low expression group of the protein combination F2-GOT2-TRPV1 (p = 0.051) (Figure 8B).

Discussion

Liver cancer is characterized by inconspicuous early symptoms, a high degree of malignancy, recurrence and spread, and unsatisfactory prognosis. With limited treatment options, it is one of the common malignancies that plague the world. Therefore, identification of effective prognostic biomarkers for liver cancer is the key to improving the efficacy of targeted therapy for HCC and reducing the adverse prognostic effects of liver cancer.

In our study, by combining and searching 15 corresponding concepts of the key words “liver cancer,” “prognosis,” and “outcome,” and according to p-values < 0.05, 1,173 genes that may be related to the prognosis of liver cancer were mined from the Coremine platform after merging and removing duplicates. However, due to the insufficient sample size and data related to the prognosis of liver cancer in the Coremine platform as well as the large heterogeneity among the samples, we also selected gene expression data and prognosis data of 319 samples for DFS and 370 samples for OS from the TCGA platform. We then separately conducted DFS-related and OS-related K-M survival analysis for each gene, followed by multivariate analyses, respectively. The large-scale genes mining and a large number of homogenous samples gave us a reliable analytical foundation. By far, this is the first large-scale survival analyses for hundreds of genes for subsequent screening.

In addition, the genes selected by K-M survival analyses with a low p-value (p < 0.01) were further screened by multivariate analyses using the Cox proportional hazards regression model. We found that 39 genes and 28 genes were reliably and significantly associated with DFS and OS, respectively, in liver cancer. Many of the above genes have been confirmed to be associated with the prognosis of HCC by previous reports. For example, of the 39 DFS-related genes, ALDOB inhibits metastasis in HCC and can be a valuable novel prognosis predicting marker (30); APOB was found to be a prognostic biomarker for patients with radical resection of HCC (31, 32); CCNF is downregulated in HCC and is a promising prognostic marker (33). In addition, CPT2 (34), G6PD (35), GNMT (36), NEK2 (37), etc. have also been reported to be prognostic markers of HCC by affecting the occurrence or invasion of HCC. The above findings are consistent with what we identified. Other genes, such as C5, CD4, CETP, COL18A1, DAND5, DNASE1, EBPL, F7, FLT3, ITGB2, KNG1, LMOD1, PPAT, PPIA, PRF1, SELP, SPPL2A, and TRPV1 that have not been systematically reported in relation to the prognosis of liver cancer, are our newly discovered prognostic markers for DFS in liver cancer. Similarly, of the 28 OS-related genes, CA9 regulates the epithelial-mesenchymal transition and is a novel prognostic marker in HCC (38), E2F1 expression has an impact on tumor aggressiveness and affects the prognosis of HCC (14, 15), CYP3A4 (39), HDAC2 (40), and KPNA2 (41) have also been identified as prognostic markers of HCC and are reflected in our findings. The other genes, such as ANXA7, F2, GOT2, HPX, MAGEB3, MAPT, MPV17, NTF3, PPAT, SLC2A1, SLC38A1, and TRPV1 are all novel prognostic markers associated with liver cancer OS found by our reliable and large-scale screening studies. Three genes (APOB, PPAT, and TRPV1) were associated with both DFS and OS of HCC, suggesting that APOB, PPAT, and TRPV1 may be significant and effective in predicting both the progress and the adverse outcomes of HCC.

Moreover, there may be connections among the above selected genes and they can work together to influence the development and prognosis of liver cancer to some extent. Although there are some genes that had been reported as prognostic molecular markers of liver cancer, most reports focused on the impact of a single gene on the prognosis of liver cancer, few studies performed such a large-scale survival analysis. Studies of multiple gene combinations are more effective than the analysis of single genes in predicting the prognosis of liver cancer.

In our study, we performed three-gene combinations of the 39 DFS-related genes and 28 OS-related genes screened from the above survival analyses. In order to further study the predictive effect of the combinations constituted by the selected genes on the prognosis of liver cancer, and to compare the predictive power of single genes and corresponding gene combinations, we carried out thousands of K-M survival analyses on these combinations. To ensure the comparability and credibility, we removed the combinations of which the co-high or co-low expression group cases were fewer than 26, and screened 2,758 DFS-related combinations and 930 OS-related combinations with p-values < 0.01. Moreover, we also performed three-gene-combination models and K-M survival analyses on the 18 DFS-related genes and 12 OS-related genes we found but have not been systematically reported to be related to the prognosis of HCC. 317 unreported-gene combinations and 31 unreported-gene combinations significantly associated with DFS and OS, respectively, were screened out.

For the above four types of three-gene-combinations (the overall genes combinations associated with DFS, the unreported genes combinations associated with DFS, the overall genes combinations associated with OS, and the unreported genes combinations associated with OS), the top 15 combinations with the lowest p-values of the survival analyses and the genes they contained were, respectively, selected for comparison (Tables 2, 3, 4).

For example, for the overall gene combinations associated with OS, KPNA2-SLC38A1-SPP1, the median survival time difference between the co-high and the co-low expression group was 83.57 months. In contrast, that of the single genes KPNA2, SLC38A1, and SPP1, was 47.66, 35.61, and 29.64 months, respectively. After combining KPNA2, SLC38A1, and SPP1, the median survival time difference between the high and low expression groups was larger than that of any of the three single genes by at least 36 months. This shows that these three genes KPNA2, SLC38A1, and SPP1, after combination, may be better predictive values for liver cancer prognosis and may be more clinically useful for future treatment target selection.

We also selected genes that have not been previously reported for liver cancer prognosis and compared their prognostic efficacy with the corresponding three-gene combinations (the chart only shows the top 15 groups with the lowest p-values of the three-gene combinations prognostic models). The expression of one of the combinations F2-GOT2-TRPV1 had a greater effect on the median survival time of OS than any of the three individual genes (The median survival time difference: F2-GOT2-TRPV1: 55.68 months; F2: 23.62 months; GOT2: 32.26 months; TRPV1: 35.61 months).

Coagulation factor II (F2) plays a major role in proteolysis to form thrombin in the first step of the coagulation cascade and eventually generates hemostasis. An enrichment analysis of genetic changes during the development of HCC identified several hub genes, including F2, which interacts in several groups of conditional specific PPI networks (42). Additionally, it was reported that F2 is associated with invasion in neuroendocrine prostate cancer (43). Glutamic-oxaloacetic transaminase 2 (GOT2) plays an important role in amino acid metabolism and the tricarboxylic acid cycle, and it affects the malate-aspartic acid shuttle activity and glycolysis in the liver under the stimulation of liver inflammation. (44, 45) TRPV1 is a regulator of cell homeostasis, previous studies have revealed that the expression of TRPV1 is significantly decreased in renal cell carcinoma, colorectal cancer, and melanoma. In addition, TRPV1 can affect P53 and TRPV1-dependent pathways to inhibit the growth of colorectal cancer and melanoma (4648), and can cause apoptosis in human osteosarcoma MG63 cells (49).

At present, there are few studies on the above three genes F2, GOT2, TRPV1 and particular their combinations in the prognosis of HCC. In our study, the results of the 20 pairs of HCC and paracancerous tissues for qRT-PCR, as well as 90 pairs HCC biochips for IHC confirmed that all of the F2, GOT2, and TRPV1 genes are significantly and consistently down-expressed in HCC tissues, and this is reconfirmed by three independent microarrays. Moreover, the low expression of F2, GOT2, and TRPV1 were all significantly associated with poor prognosis of HCC. However, due to the number of death events in the F2-GOT2-TRPV1 high expression group of in the HCC biochips being 0, the survival analysis of the F2-GOT2-TRPV1 high and the expression group was marginally significant (p = 0.051), but this is still consistent with our above-mentioned big data-based multi-gene combination survival analysis results.

As there may be certain relationships between the genes we screened that are significantly associated with the prognosis of liver cancer, they can work together in the form of multi-gene combinations in the development of liver cancer. However, the predictive potency of different gene combinations varies. Some combinations are better predictors than individual genes, and therefore these combinations may be more valuable than individual genes in determining the target site for liver cancer prognosis. Due to limitations in human and material resources, it still remains unclear how these genes and gene combinations specifically affect the HCC survival. Further investigation and experimentations are needed to elucidate the biological mechanisms of the selected genes, particularly for the significant multi-gene combinations, in the development and progression of HCC.

Our findings cover a large gene level, and we have also explored the predictive efficacy of a number of gene combinations for the prognosis of liver cancer. We believe that these highly significant prognostic-related genes and gene combinations derived from the above multiple screenings are promising, reliable molecular markers for the prognosis of liver cancer, and our screening methods can be extended to other tumor types.

In conclusion, based on a large sample size of public data platform, novel and effective data mining and multiple screening methods, large-scale survival analyses, as well as supplemental reliable experimental verification, we identified a series of novel genes and multi-gene combinations that are significantly associated with DFS or OS in liver cancer. Moreover, a huge difference between high and low expression group of these three-gene combination was detected. Some of the three-gene combinations can predict much longer or shorter survival time for liver cancer patients than the single genes. QRT-PCR, immunohistochemistry, and three independent microarray results confirmed our findings that three of the selected novel genes F2, GOT2, and TRPV1, as well as the corresponding combination F2-GOT2-TRPV1, showed significantly lower expression in HCC and are associated with OS in HCC. Some gene combinations may be more predictors of prognosis than single genes and can be used as potential effective therapeutic targets for liver cancer.

Statements

Data availability statement

The datasets generated for this study are available on request to the corresponding author.

Ethics statement

The studies involving human participants were reviewed and approved by The Ethics Committee of Guangxi Medical University. The patients/participants provided their written informed consent to participate in this study.

Author contributions

ML and XLi performed most analysis. ML led the writing of the manuscript. SL provided the clinical samples and participated in revising the manuscript. FX and JT participated in drafting and reviewing the manuscript. EG conducted a search for genes and preliminary screening work by keyword. XQ obtained and matched the TCGA samples data. ML, LW, and QL performed the single-gene and multi-gene-combination survival analyses. ZL and LL conducted an inquiry about the relevant information of the selected genes. XLu performed validation of the selected genes in three microarrays. KL and DZ performed the mRNA isolation and qRT-qPCR, and collected and analyzed experimental data. YY and XLi were subjected to immunohistochemistry and experimental data processing. FY and XZ participated in designing and reviewing the study. All the authors reviewed the manuscript and all the authors read and approved the final manuscript.

Funding

This study was supported by the National Natural Science Foundation of China (Grant No. 81760611) and the Key Laboratory of High-Incidence-Tumor Prevention and Treatment (Guangxi Medical University), Ministry of Education (No. GKE 2019–01).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fonc.2020.00847/full#supplementary-material

References

  • 1.

    IARC. Fact sheets by Population-Globocan-IARC. (2019). Retrieved from: http://gco.iarc.fr/today/fact-sheets-cancers (accessed September 14, 2019).

  • 2.

    BrayFFerlayJSoerjomataramISiegelRLTorreLAJemalA. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. (2018) 68:394424. 10.3322/caac.21492

  • 3.

    QiuWQShiJFGuoLWMaoAYHuangHYHuGYet al. Medical expenditure for liver cancer in urban China: A 10-year multicenter retrospective survey (2002-2011). J Cancer Res Ther. (2018) 14:16370. 10.4103/jcrt.JCRT_709_16

  • 4.

    DengWLongLLiJLZhengDYuJHZhangCYet al. Mortality of major cancers in Guangxi, China: sex, age and geographical differences from 1971 and 2005. Asian Pac J Cancer Prev. (2014) 15:156774. 10.7314/APJCP.2014.15.4.1567

  • 5.

    ZengHChenWZhengRZhangSJiJSZouXet al. Changing cancer survival in China during 2003-15: a pooled analysis of 17 population-based cancer registries. Lancet Glob Health. (2018) 6:e55567. 10.1016/S2214-109X(18)30127-X

  • 6.

    WangCYLiS. Clinical characteristics and prognosis of 2887 patients with hepatocellular carcinoma: A single center 14 years experience from China. Medicine. (2019) 98:e14070. 10.1097/MD.0000000000014070

  • 7.

    ZhuJYinTXuYLuXJ. Therapeutics for advanced hepatocellular carcinoma: Recent advances, current dilemma, and future directions. J Cell Physiol. (2019) 234:1212232. 10.1002/jcp.28048

  • 8.

    FinnRSZhuAXFarahWAlmasriJZaiemFProkopLJet al. Therapies for advanced stage hepatocellular carcinoma with macrovascular invasion or metastatic disease: a systematic review and meta-analysis. Hepatology. (2018) 67:42235. 10.1002/hep.29486

  • 9.

    CaiWWangZWeiCWuMZhengWZhangHet al. Prognostic evaluation of NANOG and OCT4 expression for posttransplantation hepatocellular carcinoma recurrence. J Cell Biochem. (2019) 120:841929. 10.1002/jcb.28128

  • 10.

    FornerAReigMBruixJ. Hepatocellular carcinoma. Lancet. (2018) 391:130114. 10.1016/S0140-6736(18)30010-2

  • 11.

    FanSTMau LoCPoonRTYeungCLeung LiuCYuenWKet al. Continuous improvement of survival outcomes of resection of hepatocellular carcinoma: a 20-year experience. Ann Surg. (2011) 253:74558. 10.1097/SLA.0b013e3182111195

  • 12.

    AndreozziMQuintavalleCBenzDQuagliataLMatterMCalabreseDet al. HMGA1 expression in human hepatocellular carcinoma correlates with poor prognosis and promotes tumor growth and migration in in vitro models. Neoplasia. (2016) 18:72431. 10.1016/j.neo.2016.10.002

  • 13.

    ChenDHWuQWLiXDWangSJZhangZM. SYPL1 overexpression predicts poor prognosis of hepatocellular carcinoma and associates with epithelial-mesenchymal transition. Oncol Rep. (2017) 38:153342. 10.3892/or.2017.5843

  • 14.

    HuangYLNingGChenLBLianYFGuYRWangJLet al. Promising diagnostic and prognostic value of E2Fs in human hepatocellular carcinoma. Cancer Manag Res. (2019) 11:172540. 10.2147/CMAR.S182001

  • 15.

    HanRChenXLiYZhangSLiRLuL. MicroRNA-34a suppresses aggressiveness of hepatocellular carcinoma by modulating E2F1, E2F3, and Caspase-3. Cancer Manag Res. (2019) 11:296376. 10.2147/CMAR.S202664

  • 16.

    LiangRLinYYeJZYanXXLiuZHLiYQet al. High expression of RBM8A predicts poor patient prognosis and promotes tumor progression in hepatocellular carcinoma. Oncol Rep. (2017) 37:216776. 10.3892/or.2017.5457

  • 17.

    ChangLXiLLiuYLiuRWuZJianZ. SIRT5 promotes cell proliferation and invasion in hepatocellular carcinoma by targeting E2F1. Mol Med Rep. (2018) 17:3429. 10.3892/mmr.2017.7875

  • 18.

    ZhouLZhuY. The EpCAM overexpression is associated with clinicopathological significance and prognosis in hepatocellular carcinoma patients: A systematic review and meta-analysis. Int J Surg. (2018) 56:27480. 10.1016/j.ijsu.2018.06.025

  • 19.

    LongHGuoXQiaoSHuangQ. Tumor LXR expression is a prognostic marker for patients with hepatocellular carcinoma. Pathol Oncol Res. (2018) 24:33944. 10.1007/s12253-017-0249-8

  • 20.

    ZhouHWangSCMaJMYuLQJingJS. Sperm-Associated Antigen 5 expression is increased in hepatocellular carcinoma and indicates poor prognosis. Med Sci Monit. (2018) 24:60218. 10.12659/MSM.911434

  • 21.

    ChenDChenYYanYPanJXingWLiQet al. Down-regulation of the tumour suppressor kappa-opioid receptor predicts poor prognosis in hepatocellular carcinoma patients. BMC Cancer. (2017) 17:553. 10.1186/s12885-017-3541-9

  • 22.

    LabibOHHarbOAKhalilOHBaiomyTAGertallahLMAhmedRZ. The Diagnostic Value of Arginase-1, FTCD, and MOC-31 expression in early detection of hepatocellular carcinoma (HCC) and in differentiation between HCC and metastatic adenocarcinoma to the liver. J Gastrointest Cancer. (2020) 51:88101. 10.1007/s12029-019-00211-2

  • 23.

    ZhangXWanJXKeZPWangFChaiHXLiuJQ. TMEM88, CCL14 and CLEC3B as prognostic biomarkers for prognosis and palindromia of human hepatocellular carcinoma. Tumour Biol. (2017) 39:1010428317708900. 10.1177/1010428317708900

  • 24.

    LiBFengWLuoOXuTCaoYWuHet al. Development and validation of a three-gene prognostic signature for patients with hepatocellular carcinoma. Sci Rep. (2017) 7:5517. 10.1038/s41598-017-04811-5

  • 25.

    LongJZhangLWanXLinJBaiYXuWet al. A four-gene-based prognostic model predicts overall survival in patients with hepatocellular carcinoma. J Cell Mol Med. (2018) 22:592838. 10.1111/jcmm.13863

  • 26.

    ZengXYinFLiuXXuJXuYHuangJet al. Upregulation of E2F transcription factor 3 is associated with poor prognosis in hepatocellular carcinoma. Oncol Rep. (2014) 31:113946. 10.3892/or.2014.2968

  • 27.

    CeramiEGaoJDogrusozUGrossBESumerSOAksoyBAet al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. (2012) 2:4014. 10.1158/2159-8290.CD-12-0095

  • 28.

    GaoJAksoyBADogrusozUDresdnerGGrossBSumerSOet al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci Signal. (2013) 6:pl1. 10.1126/scisignal.2004088

  • 29.

    HedditchELGaoBRussellAJLuYEmmanuelCBeesleyJet al. ABCA transporter gene expression and poor outcome in epithelial ovarian cancer. J Natl Cancer Inst. (2014) 106:dju149. 10.1093/jnci/dju149

  • 30.

    TaoQFYuanSXYangFYangSYangYYuanJHet al. Aldolase B inhibits metastasis through Ten-Eleven Translocation 1 and serves as a prognostic biomarker in hepatocellular carcinoma. Mol Cancer. (2015) 14:170. 10.1186/s12943-015-0437-7

  • 31.

    YanXYaoMWenXZhuYZhaoEQianXet al. Elevated apolipoprotein B predicts poor postsurgery prognosis in patients with hepatocellular carcinoma. Onco Targets Ther. (2019) 12:195764. 10.2147/OTT.S192631

  • 32.

    LeeGJeongYSKimDWKwakMJKohJJooEWet al. Clinical significance of APOB inactivation in hepatocellular carcinoma. Exp Mol Med. (2018) 50:147. 10.1038/s12276-018-0174-2

  • 33.

    FuJQiuHCaiMPanYCaoYLiuLet al. Low cyclin F expression in hepatocellular carcinoma associates with poor differentiation and unfavorable prognosis. Cancer Sci. (2013) 104:50815. 10.1111/cas.12100

  • 34.

    FujiwaraNNakagawaHEnookuKKudoYHayataYNakatsukaTet al. CPT2 downregulation adapts HCC to lipid-rich environment and promotes carcinogenesis via acylcarnitine accumulation in obesity. Gut. (2018) 67:1493504. 10.1136/gutjnl-2017-315193

  • 35.

    LuMLuLDongQYuGChenJQinLet al. Elevated G6PD expression contributes to migration and invasion of hepatocellular carcinoma cells by inducing epithelial-mesenchymal transition. Acta Biochim Biophys Sin. (2018) 50:37080. 10.1093/abbs/gmy009

  • 36.

    HuangYCChenMShyrYMSuCHChenCKLiAFet al. Glycine N-methyltransferase is a favorable prognostic marker for human cholangiocarcinoma. J Gastroenterol Hepatol. (2008) 23:13849. 10.1111/j.1440-1746.2008.05488.x

  • 37.

    ChangYYYenCJChanSHChouYWLeeYPBaoCYet al. NEK2 Promotes hepatoma metastasis and serves as biomarker for high recurrence risk after hepatic resection. Ann Hepatol. (2018) 17:84356. 10.5604/01.3001.0012.3146

  • 38.

    HyugaSWadaHEguchiHOtsuruTIwgamiYYamadaDet al. Expression of carbonic anhydrase IX is associated with poor prognosis through regulation of the epithelialmesenchymal transition in hepatocellular carcinoma. Int J Oncol. (2017) 51:117990. 10.3892/ijo.2017.4098

  • 39.

    AshidaROkamuraYOhshimaKKakudaYUesakaKSugiuraTet al. CYP3A4 Gene is a novel biomarker for predicting a poor prognosis in hepatocellular carcinoma. Cancer Genomics Proteomics. (2017) 14:44553. 10.21873/cgp.20054

  • 40.

    QuintKAgaimyADi FazioPMontalbanoRSteindorfCJungRet al. Clinical significance of histone deacetylases 1, 2, 3, and 7: HDAC2 is an independent predictor of survival in HCC. Virchows Arch. (2011) 459:12939. 10.1007/s00428-011-1103-0

  • 41.

    JiangPTangYHeLTangHLiangMMaiCet al. Aberrant expression of nuclear KPNA2 is correlated with early recurrence and poor prognosis in patients with small hepatocellular carcinoma after hepatectomy. Med Oncol. (2014) 31:131. 10.1007/s12032-014-0131-4

  • 42.

    XueHLuoLYaoYTWeiLLDengSPHuangXL. Integrated analysis of the RNA-Seq data of liver hepatocellular carcinoma. Neoplasma. (2018) 65:97103. 10.4149/neo_2018_170212N98

  • 43.

    ChoeHSbonerABeltranHNanusDTagawaST. PO-43 - Differential coagulation factor expression in neuroendocrine prostate cancer (PC), metastatic castrate-resistant PC, and localized prostatic adenocarcinoma. Thromb Res. (2016) 140(Suppl. 1):S192. 10.1016/S0049-3848(16)30176-1

  • 44.

    WangTYaoWLiJHeQShaoYHuangF. Acetyl-CoA from inflammation-induced fatty acids oxidation promotes hepatic malate-aspartate shuttle activity and glycolysis. Am J Physiol Endocrinol Metab. (2018) 315:E496510. 10.1152/ajpendo.00061.2018

  • 45.

    YangHZhouLShiQZhaoYLinHZhangMet al. SIRT3-dependent GOT2 acetylation status affects the malate-aspartate NADH shuttle activity and pancreatic tumor growth. Embo J. (2015) 34:111025. 10.15252/embj.201591041

  • 46.

    WuYYLiuXYZhuoDXHuangHBZhangFBLiaoSF. Decreased expression of TRPV1 in renal cell carcinoma: association with tumor Fuhrman grades and histopathological subtypes. Cancer Manag Res. (2018) 10:164755. 10.2147/CMAR.S166390

  • 47.

    HouNHeXYangYFuJZhangWGuoZet al. TRPV1 Induced apoptosis of colorectal cancer cells by activating calcineurin-NFAT2-p53 signaling pathway. Biomed Res Int. (2019) 2019:6712536. 10.1155/2019/6712536

  • 48.

    YangYGuoWMaJXuPZhangWGuoSet al. Downregulated TRPV1 expression contributes to melanoma growth via the calcineurin-ATF3-p53 pathway. J Invest Dermatol. (2018) 138:220515. 10.1016/j.jid.2018.03.1510

  • 49.

    BaoZDaiXWangPTaoYChaiD. Capsaicin induces cytotoxicity in human osteosarcoma MG63 cells through TRPV1-dependent and -independent pathways. Cell Cycle. (2019) 18:137992. 10.1080/15384101.2019.1618119

Summary

Keywords

liver cancer, gene combinations, data mining, disease-free survival (DFS), overall survival (OS)

Citation

Liu M, Liu X, Liu S, Xiao F, Guo E, Qin X, Wu L, Liang Q, Liang Z, Li K, Zhang D, Yang Y, Luo X, Lei L, Tan JHJ, Yin F and Zeng X (2020) Big Data-Based Identification of Multi-Gene Prognostic Signatures in Liver Cancer. Front. Oncol. 10:847. doi: 10.3389/fonc.2020.00847

Received

02 January 2020

Accepted

29 April 2020

Published

28 May 2020

Volume

10 - 2020

Edited by

Chiara Romualdi, University of Padova, Italy

Reviewed by

Yanqiang Li, Houston Methodist Research Institute, United States; Fang Wang, University of Texas MD Anderson Cancer Center, United States

Updates

Copyright

*Correspondence: Fuqiang Yin Xiaoyun Zeng

This article was submitted to Cancer Genetics, a section of the journal Frontiers in Oncology

†These authors have contributed equally to this work

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Outline

Figures

Cite article

Copy to clipboard


Export citation file


Share article

Article metrics