Risk Scores Based on Six Survival-Related RNAs in a Competing Endogenous Network Composed of Differentially Expressed RNAs Between Clear Cell Renal Cell Carcinoma Patients Carrying Wild-Type or Mutant Von Hippel–Lindau Serve Well to Predict Malignancy and Prognosis

Clear cell renal cell carcinoma (ccRCC) carrying wild-type Von Hippel–Lindau (VHL) tumor suppressor are more invasive and of high morbidity. Concurrently, competing endogenous RNA (ceRNA) network has been suggested to play an important role in ccRCC malignancy. In order to understand why the patients carrying wild-type VHL gene have high degrees of invasion and morbidity, we applied bioinformatics approaches to identify 861 differentially expressed RNAs (DE-RNAs) between patients carrying wild-type and patients carrying mutant VHL from The Cancer Genome Atlas (TCGA) database, established a ceRNA network including 122 RNAs, and elected six survival-related DE-RNAs including Linc00942, Linc00858, RP13_392I16.1, hsa-miR-182-5p, hsa-miR-183-5p, and PAX3. Examining clinical samples from our hospital revealed that patients carrying wild-type VHL had significantly higher levels of all six RNAs than those carrying mutant VHL. Patients carrying wild-type VHL had significantly higher risk scores, which were calculated based on expression levels of all six RNAs, than those carrying mutant VHL. Patients with higher risk scores had significantly shorter survival times than those with lower risk scores. Therefore, the risk scores serve well to predict malignancy and prognosis.


INTRODUCTION
Kidney cancer is one of the top 10 most common tumors, the second-ranked malignant tumor in urologic system, and contributes 2% to 3% of the human malignant tumors worldwide (1,2). There are 400,000 new cases of kidney malignant tumors of which 175,000 lead to death per year (3). Among patients with renal cancer at early stage and even some of renal cancer patients who do not have any specific symptom regardless of the stage of the disease, only about 30% of patients diagnosed with renal cancer was found to be at initial stage of malignancy, but the other 70% were found to be already at late stage of malignancy (4,5). About 90% of the cases with kidney cancers are those with renal cell carcinomas, of which 70% are with clear cell renal cell carcinomas (ccRCCs), which are either hereditary or sporadic (6). Among many genetic factors that were found to be related to ccRCC, the inactivation of Von Hippel-Lindau (VHL) tumor suppressor gene occurs in 80% of patients and is the most common event (7)(8)(9), leading to the interruption of both vascular endothelial growth factor (VEGF) and mammalian target of rapamycin (mTOR) signaling pathways, which serve as the targets of most of the drugs currently used in clinical practice (10,11). Although the inactivation of VHL gene alone is not sufficient to cause tumors (12,13), a study of ccRCC patients from Germany suggested that low expression of VHL was identified as a risk factor for worse overall survival (OS) (14). However, it was reported that there is no difference in survival between patients carrying wild-type or mutant VHL or between patients with different levels of either VHL protein or messenger RNA (mRNA), and patients with nonsense mutations in exon 1 have a worse prognosis based on a study of ccRCC patients from Brazil (15), suggesting that loss of VHL function is directly related to the prognosis of ccRCC patients. A recent study indicated that about 40%-60% of patients with sporadic ccRCC carry wild-type VHL, and such tumors are more invasive and lead to a dramatic reduction of survival rates as compared with those carrying mutant VHL (16,17). Therefore, it is more urgent to identify other factors that are closely associated with patients' survival in addition to VHL mutation.
Long non-coding RNA (lncRNA) is a type of transcript with more than 200 nucleotides that does not encode a protein (18). It was initially considered to be "noise" generated by genome transcription without biological functions. More and more evidences show that lncRNA is involved not only in various physiological processes, such as cell proliferation, differentiation, and apoptosis, but also in pathological processes of various diseases (18). The long intergenic non-coding RNA (lincRNA) is the most common form of lncRNA. LncRNA is reported to be related to a variety of tumor diseases such as liver cancer (19), gastric cancer (20), oral squamous cell carcinoma (21), bladder cancer (22), prostate cancer (23), and kidney cancer (24). MicroRNAs (miRNAs) are small single-stranded non-coding molecules of about 22 nucleotides long and target the 3′UTR or CDS of mRNAs to form RNA-Induced Silencing Complexes (RISCs) to induce the degradation of the correspondent mRNA (25). Circular RNA (circRNA) is homologous to its target gene sequence and may act as a molecular sponge. Since lncRNA and mRNA share sequence homology, it is also possible to combine miRNA with lncRNA or circRNA to change expression levels of their downstream target genes. Therefore, miRNAs, lncRNAs, mRNAs, and other non-coding RNAs form a large-scale network of competing endogenous RNAs (ceRNAs), which are currently considered to be involved in the regulation of a variety of tumor diseases and play an important role in different stages of tumorigenesis (26).
Here, we constructed a ceRNA network after the sets of differentially expressed lncRNAs (DE-lncRNAs), miRNAs (DE-miRNAs), and mRNAs (DE-mRNAs) between patients carrying wild-type and those carrying mutant VHL, which were common in multiple database, were identified. Further identification of six survival-related RNAs including Linc00942, Linc00858, RP13_392I16.1, hsa-miR-182-5p, hsa-miR-183-5p, and PAX3, in the ceRNA network led to the building of a model in which risk scores were successfully applied to predict malignancy and prognosis of ccRCC.

Selection of Patient Data in The Cancer Genome Atlas Database
The selection of patient data in The Cancer Genome Atlas (TCGA) database was performed following the workflow as shown ( Figure 1A). The expression data (read count) of ccRCC patients in TCGA were downloaded and integrated using R's GDCRNATools package. The sample size for mRNA and lncRNA was 530, and miRNA was 516. The information about mutation of those samples was from cBioPortal (http:// www.cbioportal.org/). The information about OS of those samples was from xenabrowser (https://xenabrowser.net/). Since RNA sequencing data were obtained directly from TCGA, approval by any ethics committee was not required.

Identification of Differentially Expressed
Long Non-Coding RNAs, Differentially Expressed MicroRNAs, and Differentially Expressed Messenger RNAs Between Clear Cell Renal Cell Carcinoma Patients Carrying Wild-Type or Mutant Von Hippel-Lindau php), and TargetScan (http://www.targetscan.org/vert_72/) databases. In order to improve the reliability of the results, we selected the intersection of DE-miRNA-DE-mRNA interaction pairs found in the three databases as candidate genes to construct the ceRNA network. Finally, we used Cytoscape software to visualize the ceRNA network.

Enrichment Analyses of the Competing Endogenous RNAs
The Gene Ontology (GO) molecular function enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment of DE-mRNA in the ceRNA network were analyzed with R's ClusterProfiler package. p < 0.05 was considered statistically significant.

Survival Analysis of Differential RNAs in the Competing Endogenous RNA Regulation Network
The expression level of each DE-lncRNA, DE-miRNA, and DE-mRNA in the ceRNA network in each patient was classified into either high-or low-expression group with their median as a threshold. The survival curve for each group was plotted through the Kaplan-Meier (K-M) survival analysis. p < 0.05 was considered statistically significant.

Univariate and Multivariate Cox Regression Analyses
Univariate and multivariate Cox regression analyses were used to screen the survival-related differentially expressed RNAs (DE-

Establishment of a Prognostic Risk Score System and Clinical Correlation Analysis for Differentially Expressed RNAs in the Competing Endogenous RNA Regulation Network
A risk score system was established using the DE-RNAs selected by multivariate Cox regression analysis. To predict the patient's prognosis, risk scores were calculated with the formula "Risk score = ∑(bi * Xi)" in which "i" was the number of characteristic genes, "b" the correlation coefficient of mRNA in the multivariate Cox regression analysis, and X the levels of gene expression after log2 conversion. Pearson's chi-squared test and Fisher's exact test were applied to analyze the correlation between patient's risk score and clinicopathological characteristics in TCGA data set. p < 0.05 was considered to be statistically significant. The t-test was used to compare the difference of risk scores between clinical features that were significantly related to OS. p < 0.05 was considered to be statistically significant.
Gene Set Enrichment Analysis (GSEA) was applied to analyze the signaling pathways associated with patients carrying wild-type VHL and high risk score. The gene expression data from patients carrying high risk score and low risk score in the c2.cp.kegg.v7.0.symbols.gmt database were subjected to estimate the normalized enrichment scores (NESs), p-values, and q-values after false discovery rates (FDRs) were adjusted for each signal pathway with the GSEA software.

A Competing Endogenous RNA Network Is Established
We searched the miRcode database, identified 210 miRNAs that were targeted by DE-lncRNAs, and found interactions between 10 DE-miRNAs and 98 DE-lncRNAs. We searched the miRDB, miRTarBase, and TargetScan databases to identify interactions between miRNA-mRNA interaction and found 10 DE-miRNA and 14 DE-mRNA interactions ( Figure 2A). Eventually, a ceRNA regulation network including 290 edges and 122 nodes (98 lncRNAs, 10 miRNAs, and 14 mRNAs) was established between patients carrying wild-type or mutant VHL as shown in Figure 2B.

The Signaling Pathways Affected by Genes in Competing Endogenous RNA Network Are Predicted
We performed GO enrichment and pathway enrichment analysis on 14 differential genes in the ceRNA network and displayed the top 10 candidate signaling pathways based on the rich factors (the number of differential genes in the GO term divided by the total number of genes in the GO term). The biological processes were mainly related to some kinase activity, growth factor receptor binding, receptor regulatory activity, and so on; cell components were mainly involved in the formation of cell adhesion, synapses and axon membrane structures, and mast cell granules; and molecular functions were mostly related to the kidney functions such as renal vesicle development, epithelial cell differentiation during kidney development, kidney capsule morphogenesis, and renal vesicle morphogenesis ( Figure 3A).
KEGG enrichment search identified not only some kidneyrelated pathways such as collecting duct acid secretion and vasopressin-regulated water absorption but also some cancerrelated pathways such as PI3K-AKT signaling pathway and MAPK signal pathway as shown in the bubble chart ( Figure 3B).

Survival-Related RNAs in Competing Endogenous RNA Network Are Identified Through Univariate and Multivariate Cox Regression Analyses
We estimated the significance of 98 lncRNAs, 10 miRNAs, and 14 mRNAs in the ceRNA network with OS of patients and identified 18 lncRNAs, four miRNAs, and three mRNAs, which significantly impacted OS either positively or negatively ( Table 1). Univariate regression analysis found that these RNAs were significantly related to the aggressiveness of disease. Multivariate Cox regression analysis found that LINC00942, LINC00858, RP13_392I16.1, hsa-miR-183-5p, hsa-miR-182-5p, and PAX3 were considered as independent factors that impacted the prognosis of patients (Table 1). Therefore, these six RNAs were selected to build the risk score system.

The Survival-Related RNAs Detected in Databases Are Verified With External Clinical Samples
A total of 21 ccRCC tissue samples were collected in our hospital and subjected to gene sequencing and immunohistochemistry analysis. Using three sets of primers covering the three exons of VHL gene (Figures 4A, B), we identified 18 samples carrying wild-type VHL and three samples carrying mutant VHL as shown in Figure 4C. The identification of high percentage of patients carrying wild-type VHL gene was a surprise to us and triggered us to investigate if the wild-type gene has been suppressed due to other epigenetic modifications. Patients carrying wild-type VHL  gene expressed higher levels of VHL protein than those carrying mutant VHL ( Figures 4D-G), suggesting that the expression of wild-type gene was not suppressed in general by other epigenetic events such as DNA methylation. The qRT-PCR analyses using primer sets listed ( Figure 5A) demonstrated that levels of LINC00942, LINC00858, RP13_392I16.1, hsa-miR-183-5p, hsa-miR-182-5p, and PAX3 were significantly higher in patients carrying wild-type VHL than in patients carrying mutant VHL ( Figures 5B-G). Therefore, the results obtained from databases were verified with external clinical samples.

Risk Scores Are Significantly Different Between Patients with Different Clinical Characteristics
Based on a ratio of 3 to 1, we divided 217 ccRCC patients carrying wild-type VHL into a training set (163 cases) and a test    (Figures 6D-F). Patients carrying wild-type VHL had significantly higher risk scores than those carrying mutant VHL ( Figure 6G). There was no significant difference in risk scores between different age groups, but patients at higher and advanced ccRCC classification stages exhibited significantly higher risk scores ( Table 2).

Analysis of Signaling Pathway in Clear Cell Renal Cell Carcinoma Patients Carrying Wild-Type Von Hippel-Lindau and High Risk Scores
We conducted GSEA and identified 24 signaling pathways such as the "OLFACTORY_TRANSDUCTION" and "INTESTINAL _IMMUNE_NETWORK_FOR_IGA_PRODUCTION," which were enriched in the group of patients with high risk scores ( Table 3).

DISCUSSION
A ceRNA network has been proposed to play important roles in ccRCC tumorigenesis and aggressiveness. Previous studies have revealed that the LINC01094 enhances the expression of miR-184 and inhibits the expression of SLC2A3, which suppresses the development of ccRCC (27). LINC01426 interacts with insulinlike growth factor 2 mRNA binding protein 1 (IGF2BP1) to enhance expression of CTBP1 in cytoplasm and promote the binding of CTBP1 to the promoter of miR-423-5p in the nucleus to recruit HDAC2 to synergistically inhibit the expression levels of miR-423-5p, leading to an elevation of expression levels of FOXM1 and a promotion of proliferation and migration of ccRCC cells (28). The ceRNA network constructed in this study yielded 98 DE-lncRNAs, 10 DE-miRNAs, and 14 DE-mRNAs. Analyses of the 14 DE-mRNAs revealed their involvement in functions related to kidney. Further analysis of the relationship of DE-RNAs in the ceRNA network with patients' survival rates led to the identification of LINC00942, LINC00858, RP13_392I16.1, hsa-miR-182-5p, hsa-miR-183-5p, and PAX3, which were used to calculate risk scores for each individual patient as independent factors to successfully predict the malignancy and prognosis of patients. Three lncRNAs was identified to be survival-related ceRNAs. Currently, there is no report related to any role of RP13_392I16.1 yet, so this lncRNA needs to be further studied. LINC00942 was reported to promote METT14-mediated M6A methylation and regulate the expression and stability of its target gene CXCR4 and CYP1B1 during the initiation and progression of breast cancer (29). In another report, LINC00942 was identified as one of seven lncRNAs in an lncRNA signature related to immune cell infiltration and immune checkpoint blockade of immunotherapyrelated molecules and may serve as a prognostic biomarker of hepatocellular carcinoma (30). Similarly, LINC00942 was identified as one of four immune-related lncRNAs to predict prognosis and immunotherapy efficiency in bladder cancer patients (31). It also reported that LINC00942 was identified as one of the two components in a ceRNA network associated with gene number copy variation, which can be used to predict tumor response to drug treatment in patients with lung adenocarcinomas (32). Initially, LINC00858 was identified as a ceRNA impacting miR-422a to control the expression of kallikrein-related peptidase 4, and its high expression was found to be closely correlated to tumor progression of non-small cell lung carcinomas (33). It was reported to be one of the four lncRNAs delivered from database search that were further validated with clinical samples (34) and able to promote colorectal cancer by sponging miR-4766-5p (35) or miR-22-3P (36). Literatures published recently revealed that LINC00858 was found  (41), and osteosarcomas (42). Both hsa-miR-182-5p and hsa-miR-183-5p belong to the miR-183/96/182 family because of their sequence homology and function similarity (43). They were found to be among the 10 miRNA signatures that significantly distinguish cancer tissues from adjacent normal tissues from patients with ovarian cancer (44), the five miRNAs associated with tumorigenesis in lung adenocarcinomas (45), the top three miRNAs that target the largest number of genes in atypical endometrial hyperplasia (46), and the five miRNAs that show optimal diagnostic biomarkers for hepatocellular carcinomas (47). The levels of hsa-miR-182-5p are elevated in tumor tissues from patients with ovarian cancer and may predict poor prognosis (48). hsa-miR-182-5p was reported to be associated with resistance of breast cancer cell lines to anti-tumor drug veliparib (49). The significance of hsa-miR-183-5p with additional 10 genes in survival of ccRCC patients has been reported, and high expression of hsa-miR-183-5p predicts a reduced OS rate and poor prognosis (50). It was reported that hsa-miR-182-5p is reduced in renal cancer tissues and cell lines and regulates the expression of DLL4 gene, which causes change of tumor microenvironment and tumor inhibition (51).
PAX3 gene (paired box gene 3) encodes a member of PAX family of transcription factors whose target genes impact proliferation, survival, differentiation, and motility (52). Rhabdomyosarcoma is one of the typical tumors in children and adolescence with a poor prognosis and an OS rate of only 20%-40% (53). PAX3-FOXO1 or PAX3-FKHR, a specific fusion gene that resulted from chromosomal translocation, is an important factor in the occurrence and development of rhabdomyosarcoma (54,55). Such fusion gene was also reported to be associated with other types of cancers such as melanoma (56) and biphenotypic sinonasal sarcoma, a low-grade spindle cell sarcoma that affects middle-aged adults (57). PAX3 is highly expressed in prostate cancer tissues and cell lines and promotes the progression of prostate cancer by inhibiting the TGF-b/SMAD signal axis (58). Although there is no report about PAX3 in ccRCC, PAX3 gene is differentially hyper-methylated in chromophobe renal cell carcinomas compared with renal oncocytomas, a benign kidney neoplasm (59).
As discussed above, the relation between VHL mutation and patient survival is different in different reports. We identified LINC00942, LINC00858, RP13_392I16.1, hsa-miR-182-5p, hsa-miR-183-5p, and PAX3 as survival-related DE-RNAs between patients carrying wild-type or mutant VHL and were further used to calculate risk scores for each individual patient. The identified RNAs may not directly impact the VHL function. Because each of these DE-RNAs has already been shown to play significant roles in the tumorigenesis and aggressiveness of other types of cancers as discussed above, it is predicted that the risk scores may serve well as factors independent to VHL gene status to predict the malignancy and prognosis of ccRCC patients in the future.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by The Ethics Committee of The Fifth Affiliated Hospital of Guangzhou Medical University. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
RZ, XL, ZC, LL, and GX have full access to all of the data in the study, take responsibility for the integrity of the data and the accuracy of the data analysis, and wrote the manuscript. SL, YY, YX, DL, HZ, WY, and JB contributed the reagents. All authors contributed to the article and approved the submitted version.