Construction of Competitive Endogenous RNA Network and Verification of 3-Key LncRNA Signature Associated With Distant Metastasis and Poor Prognosis in Patients With Clear Cell Renal Cell Carcinoma

Clear cell renal cell carcinoma (ccRCC) is a common malignancy with high distant metastasis rate. Long non-coding RNAs (LncRNAs) are reported to be upregulated or downregulated in multiple cancers and play a crucial role in the metastasis of tumors or prognosis. Therefore, the purpose of our study is to construct a prognostic signature for ccRCC based on distant metastasis-related lncRNAs and explore the involved potential competitive endogenous RNA (ceRNA) network. The differentially expressed genes (DEGs) screened from the database of the cancer genome atlas (TCGA) were used to construct a co-expression network and identify the distant metastasis-related module by weighted gene co-expression network analysis (WGCNA). Key genes with metastatic and prognostic significance were identified through rigorous screening, including survival analysis, correlation analysis, and expression analyses in stage, grade, and distant metastasis, and were verified in the data set of gene expression omnibus (GEO) and the database from gene expression profiling interactive analysis (GEPIA). The potential upstream miRNAs and lncRNAs were predicted via five online databases and LncBase. Here, we constructed a ceRNA network of key genes that are significantly associated with the distant metastasis and prognosis of patients with ccRCC. The distant metastasis-related lncRNAs were used to construct a risk score model through the univariate, least absolute shrinkage selection operator (LASSO), and multivariate Cox regression analyses, and the patients were divided into high- and low-risk groups according to the median of the risk score. The Kaplan–Meier survival analysis demonstrated that mortality was significantly higher in the high-risk group than in the low-risk group. Considering the other clinical phenotype, the Cox regression analyses indicated that the lncRNAs model could function as an independent prognostic factor. Quantitative real-time (qRT)-PCR in the tissues and cells of ccRCC verified the high-expression level of three lncRNAs. Gene set enrichment analysis (GSEA) revealed that the lncRNA prognostic signature was mainly enriched in autophagy- and immune-related pathways, indicating that the autophagy and immune functions may play an important role in the distant metastasis of ccRCC. In summary, the constructed distant metastasis-related lncRNA signature could independently predict prognosis in patients with ccRCC, and the related ceRNA network provided a new sight on the potential mechanism of distant metastasis and a promising therapeutic target for ccRCC.

Clear cell renal cell carcinoma (ccRCC) is a common malignancy with high distant metastasis rate. Long non-coding RNAs (LncRNAs) are reported to be upregulated or downregulated in multiple cancers and play a crucial role in the metastasis of tumors or prognosis. Therefore, the purpose of our study is to construct a prognostic signature for ccRCC based on distant metastasis-related lncRNAs and explore the involved potential competitive endogenous RNA (ceRNA) network. The differentially expressed genes (DEGs) screened from the database of the cancer genome atlas (TCGA) were used to construct a co-expression network and identify the distant metastasis-related module by weighted gene co-expression network analysis (WGCNA). Key genes with metastatic and prognostic significance were identified through rigorous screening, including survival analysis, correlation analysis, and expression analyses in stage, grade, and distant metastasis, and were verified in the data set of gene expression omnibus (GEO) and the database from gene expression profiling interactive analysis (GEPIA). The potential upstream miRNAs and lncRNAs were predicted via five online databases and LncBase. Here, we constructed a ceRNA network of key genes that are significantly associated with the distant metastasis and prognosis of patients with ccRCC. The distant metastasis-related lncRNAs were used to construct a risk score model through the univariate, least absolute shrinkage selection operator (LASSO), and multivariate Cox regression analyses, and the patients were divided into high-and low-risk groups according to the median of the risk score. The Kaplan-Meier survival analysis demonstrated that mortality was significantly higher in the high-risk group than in the low-risk group. Considering the other clinical phenotype, the Cox regression analyses indicated that the lncRNAs model could function as an independent prognostic factor. Quantitative real-time (qRT)-PCR in the tissues and cells of ccRCC verified the high-expression

INTRODUCTION
Renal cell carcinoma (RCC), the third most common malignancy of the urinary system, also belongs to the ninth most common cancer, worldwide (1). It is estimated that RCC accounts for 73,750 new cases and 14,830 death cases in the United States in 2020 (2). Clear cell renal cell carcinoma (ccRCC), as the most common pathological type of RCC, has more malignant features and a worse prognosis compared to other types of RCC, accounting for the majority of RCC-related deaths (3). Therapeutically, the ccRCC is insensitive to radiotherapy and chemotherapy. The nephrectomy (radical or partial) and ablation techniques may be the effective treatment for localized RCC; however, recurrence or metastasis still occurs in about 20-40% of patients after surgery (4). Patients with ccRCC and with distant metastasis almost lose the indication of surgery, and mainly rely on conservative treatment; their 5year survival rate is <10% despite the use of one or more combination therapies, including cytoreductive nephrectomy or metastasectomy, VEGFR inhibitors, mTOR inhibitors, and immunotherapy (5). Therefore, in order to improve the prognosis of patients with ccRCC and distant metastasis, it is very urgent to explore the mechanism of the development of ccRCC and identify the potential therapy target associated with tumorigenesis, distant metastasis, and prognosis.
Long non-coding RNAs (lncRNAs) are a class of noncoding RNA transcripts that have more than 200 nucleotides in length (6). Currently, lncRNAs are considered to play an important role in the biological behavior of cancers and have been extensively studied in the expression profiles of various cancers (7). MicroRNAs (miRNAs), a range of small endogenous noncoding RNAs of ∼22 nucleotides in length, have emerged as one of the most important players in the initiation and progression of cancers in recent decades (8). Salmena et al. (9) proposed the hypothesis of competing endogenous RNA (ceRNA), which suggests that miRNAs and lncRNAs as well as other noncoding RNAs are able to cross-talk by competitively binding to shared miRNAs through microRNA response elements (MREs), thereby exerting their biological effects. Nowadays, increasing studies have proven that the ceRNA regulation theory plays a critical role in the development of cancers (10). Furthermore, weighted gene co-expression network analysis (WGCNA), a newly invented hierarchical clustering method, focused on the whole genome information to summarize the signature of gene networks, which can avoid bias and subject judgment, has been used to describe the connectivity of gene clusters inside the comprehensive network and assess the correlations of gene modules with different clinical features (11).
Although previous studies have identified that certain lncRNAs could be used as biomarkers to predict the progression or prognosis in patients with ccRCC (12,13), the distant metastasis-related lncRNA signature and its potential regulatory network have not been fully explored. Considering this situation, we accurately constructed a ceRNA network that was closely associated with distant metastasis and prognosis of patients with ccRCC via a series of strict screening and verification. Furthermore, a distant metastasis-related lncRNA signature that can independently predict the poor prognosis of patients with ccRCC was extracted from the ceRNA network and validated in the tissues and cell lines of ccRCC. The prognostic model will help to improve the clinical assessment of patients with advanced ccRCC. In addition, we explored the potential pathways of the lncRNA signature and found that the prognostic signature might promote distant metastasis of ccRCC by regulating autophagy and immunity.

Data Collection and Pre-processing
A flow chart of the study design is shown in Figure 1. The data of 539 human ccRCC samples and 72 normal samples downloaded from the database of the cancer genome atlas (TCGA) (https://portal.gdc.cancer.gov), including the data of the RNA-sequencing (Seq) and corresponding clinical information ( Table 1), was used to screen the differentially expressed genes (DEGs). Meanwhile, the microarray data set GSE53757 of 72 paired ccRCC/normal tissue samples was downloaded from the database of the gene expression omnibus (GEO) (http://www. ncbi.nlm.nih.gov/geo/) in Affymetrix Human Genome U133 Plus 2.0 Array platform, providing validation for screened hub genes. All the raw expression data were normalized with R package "limma" (14) and the value of gene expression in the data set from TCGA was transformed as log 2 (x + 1) counts.

Identification of DEGs
R packages, such as "edgeR" (15) and "DESeq2" (16) were used to separately screen DEGs between 539 ccRCC tissues and 72 normal tissues in the TCGA data set. The threshold  was set to |logFC| >1 and the adjusted p < 0.05. A web tool for creating Venn diagrams (http://bioinformatics.psb.ugent.be/ webtools/Venn/) was used to find the intersection of DEGs that commonly upregulated or downregulated in the two R packages mentioned above. Volcano plots were generated based on the "Enhanced Volcano" package, and a heatmap was generated using the "heatmap" in the R package.

Identify Significant Relevant Module
The relationships between co-expression modules and clinical traits were calculated by Pearson's correlation coefficient and plotted using a heatmap. In the current study, distant metastasis was selected as the interested clinical phenotype for subsequent analysis. Furthermore, in the linear regression between gene expression and clinical phenotype, gene significance (GS) was calculated as the absolute value of the correlation between the gene expression profile and each external trait. Module membership (MM) represents the correlation between a single gene and the module in which it is located, indicating whether it is consistent with the trend of the module.

Functional Enrichment Analysis
To investigate the biological function of the pink module that is significantly associated with the distant metastasis of ccRCC, the R package, "clusterProfiler" (18) was used for gene ontology (GO) function analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. The "enrichplot" package was used for the visualization of KEGG function analysis, and the "GOplot" package was used to generate chord diagrams of GO function analysis. The cut-off criterion denoted that the p < 0.05.

Verify the Expression Patterns and Prognostic Value of Hub mRNAs
To validate the expression level of hub mRNAs in different stages or in the normal and ccRCC samples, the GSE53757 was chosen as the external validation data set. Subsequently, we used the database from gene expression profiling interactive analysis (GEPIA) (http://gepia.cancer-pku.cn/) to verify the prognostic value of hub genes in the overall survival (OS) and disease-free survival (DFS).
The starBase (21) database is an open-source platform for identifying the RNA-RNA interaction networks and providing platforms to perform the survival and differential expression analysis of lncRNAs, miRNAs, and mRNAs. Therefore, we also used the starBase to analyze the expression correlation between miRNAs and mRNAs, and R < −0.1 and p < 0.05 were set as the cut-off criteria. The starBase database was also introduced to conduct survival analysis and expression analysis between ccRCC tissues and adjacent tissues for predicted miRNAs. Additionally, the expression information of targeted miRNAs in different stages, distant metastasis tissues, and non-distant metastasis tissues were downloaded from the LinkedOmics database (http:// www.linkedomics.org).

Identification of Distant Metastasis-Related lncRNAs
The LncBase (www.microrna.gr/LncBase) was introduced to predict the upstream lncRNAs potentially targeting the key miRNAs. The expression correlation between lncRNAs and miRNAs was analyzed using the starBase database. The GEPIA database was utilized to evaluate the prognostic value of upstream lncRNAs in OS and DFS. Furthermore, the expression analysis of lncRNAs in different stages, different grades, ccRCC samples and adjacent samples, distant metastasis tissues, and in non-distant metastasis tissues was conducted in the database from TCGA.

Construction of the 3-lncRNA Prognostic Signature
The prognostic significance of identified three distant metastasisrelated lncRNAs was firstly validated by univariate Cox regression analysis. Then, least absolute shrinkage and selection operator (LASSO) regression analysis was performed to enhance the prediction accuracy of the prognostic model using the "glmnet" (22) R package. Subsequently, a risk score model was constructed by multivariate Cox regression analysis. Based on the expression level of each lncRNA and the regression coefficient obtained from multivariate Cox regression analysis, the risk score was calculated as follows: where, Coef is the regression coefficient of each lncRNA, and Exp represents the expression level of each lncRNA. Patients were divided into high-and low-risk groups according to the median of risk score, and Kaplan-Meier survival curve was used to assess the survival differences between high-and low-risk groups using the R package, "survminer." Furthermore, the predictive power of the lncRNA signature was assessed by calculating the area under the curve (AUC) of 3 years dependent receiver operating characteristic (ROC) curve using the "survivalROC" package. In addition, the clinical variables that were used to interact with the risk score in univariate and multivariate Cox regression analyses include age, gender, race, grade, stage, N stage, and M stage.

Gene Set Enrichment Analysis
To identify the significant gene sets enriched in high-and low-risk groups, GSEA (http://software.broadinstitute.org/gsea/ index.jsp) was conducted in the data set from TCGA. The annotated gene set list from c2.cp.kegg.v7.2.symbols.gmt was selected as the reference gene set. A false discovery rate (FDR) < 0.25 and a nominal p < 0.05 were considered as the cut-off criteria of enriched gene sets after performing 1,000 permutations. The results of GSEA were visualized using the EnrichmentMap (23) in Cytoscape 3.7.1.

Collection of Specimens From Clinical Tissue
A total of 30 matched tumor/normal tissue samples were collected from patients with ccRCC in the operating room of the First Affiliated Hospital of Anhui Medical University (Hefei, China). The inclusion criteria of tissue specimens included as follows: (1) pathological diagnosis for ccRCC; (2) except ccRCC with no other malignancy; (3) consistent with radical nephrectomy or partial nephrectomy indications; (4) no radiotherapy and chemotherapy before surgery. All patients signed informed consent forms before surgery, and pathological diagnosis of ccRCC was confirmed by two senior pathologists from our institution, after surgery. The pathological stages of 30 patients with ccRCC were as follows: 19 for Stage I, 8 for Stage II, 2 for Stage III, and 1 for Stage IV. Specimens were immediately frozen in liquid nitrogen as soon as they were removed and stored at −80 • C until RNA extraction. The study was approved by the Ethical Committee of Human Research of the hospital.

Acquisition and Culture of Cell Lines
The human ccRCC cell lines, such as 786-O, A498, ACHN, Caki-1, and the normal renal tubular epithelial cell line, such as HK-2 were acquired from the Cell Bank of Wuhan University (Wuhan, China). Among them, the 786-O, A498, ACHN, and Caki-1 were cultured in RPMI 1640 medium (Gibco Laboratories, MD, USA) with 10% fetal bovine serum (FBS) (ScienCell Research Laboratories, CA, USA) and 1% penicillin-streptomycin (P/S), while the HK-2 cell line was cultured in Dulbecco's Modified Eagle Medium (DMEM) (Gibco Laboratories, MD, USA) with 10% FBS and 1% P/S. All cell lines were cultured at 37 • C with 5% CO 2 and the culture medium was replaced every 3 days.

Quantitative Real-Time Polymerase Chain Reaction
Total RNA from the tissue samples of ccRCC and cell lines was extracted using Trizol reagent (Invitrogen, Carlsbad, CA, USA). The reverse transcription reactions were performed using a PrimeScriptTM RT reagent kit (Takara, Gunma, Japan) according to the instructions of the manufacturer, and quantitative real-time (qRT)-PCR was prepared at a final volume of 20 µl using an SYBR Green Mix (Takara, Gunma, Japan). The reactions were measured on an AB7500 platform (Thermo Fisher Scientific, MA, USA). The 2 − CT method was used to determine the relative gene expression level, and Glyceraldehyde 3-phosphate dehydrogenase (GAPDH) was used as an internal control to normalize the data. The primers used for these reactions are listed in Table 2.

Statistical Analysis
Statistical analyses were performed with R software (Version 3.6.1) and GraphPad 7.0. Survival analysis in the data set from TCGA was performed with the R package, "survminer." Oneway ANOVA was used to examine the significance of expression differences among pathological stages or histologic grades. Pearson's correlation test was used to analyze the expression correlation between miRNAs and lncRNAs or mRNAs. Student's t-test was used to compare paired data, such as data from ccRCC tissues and adjacent tissues or data from samples of distant metastasis and non-distant metastasis. The value, p < 0.05 was considered statistically significant.

Screening of DEGs
The R packages, such as "DEseq2" and "edgeR" were used to screen for DEGs between ccRCC and normal tissues in TCGA, where a total of 11,721 (7,532 upregulated and 4,189 downregulated) and 12,019 (8,253 upregulated and 3,766 downregulated) DEGs were obtained separately. Next, we acquired the upregulated and downregulated DEGs that coexist in the two analysis methods, where 11,034 DEGs (7,327 upregulated and 3,707 downregulated) were eventually obtained. Volcano plots were utilized to illustrate the distribution of each gene (Figures 2A,B). Venn diagrams were plotted to find the intersection of upregulated and downregulated DEGs separately (Figures 2C,D). Finally, the expression levels of DEGs between ccRCC and normal tissues were displayed using the heatmap in Figure 2E.

Division of Co-expression Modules
To further identify distant metastasis-related modules, we first performed a hierarchical clustering analysis. Samples with a distance >300 were screened as outliers, and the TCGA-CJ-4642 sample was ultimately excluded (Supplementary Figure 1A). Then, the R package, "WGCNA" was used to construct the WGCNA. In this study, the power of β = 4 (scale-free R 2 = 0.88) was chosen as the soft-thresholding value to ensure a scalefree network (Supplementary Figures 1B,C), and we got 23 coexpression modules by the average linkage hierarchical clustering (Supplementary Figure 1D, Supplementary Table 1). The nonco-expressed genes were grouped into gray module and removed from further analysis. The eigengene adjacency heatmap showed that the expression pattern of each module was independent of each other ( Figure 3A). The heatmap of 5,000 randomly selected DEGs showed high degree of topological overlap of co-expression genes in each module (Supplementary Figure 1E).

Identification of the Key Pink Module
To identify genes associated with distant metastasis of ccRCC, we analyzed the connection between co-expression modules and clinical phenotype using Pearson's correlation. As a result, distant metastasis (r = 0.3, p = 3e-12) was most significantly associated with pink module. Furthermore, pink module was also most significantly related to histologic grade (r = 0.41, p = 1e-22) and pathological stage (r = 0.37, p = 2e-18, Figure 3B). In a module related to a trait of interest, genes with high module membership (MM) tended to have high gene significance (GS), implying that hub genes of the co-expression module tend to be highly correlated with the selected clinical characteristics. The scatterplots of MM vs. GS consistently revealed that MM in the pink module was highly correlated with distant metastasis (cor = 0.71, p = 2.3e-63), histologic grade (cor = 0.41, p = 7.5e-18) and pathological stage (cor = 0.59, p = 2.4e-39, Figures 3C-E), while was not highly correlated with age (cor = 0.14, p = 0.0048), gender (cor = −0.012, p = 0.81), and race (cor = −0.18, p = 0.00027, Supplementary Figures 1F-H). Taking together, we chose the pink module for the next analysis.

Functional Annotation of Pink Module
In order to study the function of the distant metastasisrelated pink module, we performed a functional annotation analysis of all genes in the pink module. The top 10 enriched KEGG pathways were presented in Figure 4A, including Human T-cell leukemia virus 1 infection, cell cycle, p53 signaling pathway, transcriptional misregulation in cancer, and homologous recombination. The top 10 enriched GO terms were shown in Figures 4B-D, containing nuclear division, organelle fission, and chromosome segregation in the biological process category, spindle, chromosomal region, chromosome centromeric region in the cell component category, DNA replication origin binding, DNA-dependent ATPase activity, and tubulin binding in the molecular function category. The detailed information of functional annotation is listed in Supplementary Table 2, and the threshold is at p < 0.05.

Symbol
Forward primer Reverse primer

Identification and Verification of Hub mRNAs
To further screen the most significant hub genes, we conducted a series of analyses in the pink module. First, we visualized the WGCNA network of pink modules using Cytoscape 3.6.1, and 54 WGCNA hub genes with the node degree >200 were screened for next analyses ( Figure 3F). Secondly, we uploaded these 54 genes to the STRING database for PPI network analysis. Under the threshold of a minimum interaction score >0.99, 33 PPI hub mRNAs were identified ( Figure 3G, Supplementary Table 3). Thirdly, survival analysis was performed to these 33 PPI hub mRNAs in TCGA set and 30 mRNAs were significantly positively associated with poor prognosis in patients with ccRCC (Supplementary Figure 2). Fourthly, we conducted expression analysis to these 30 mRNAs separately in stage, grade, distant metastasis, ccRCC, and normal tissues and eventually obtained six hub mRNAs (BUB1B, CCNB2, KIF18B, PLK1, PTTG1, and TOP2A) that were not only upregulated in the samples of distant metastasis but were also positively significantly associated with higher grades in patients with ccRCC ( Figure 5). Furthermore, the six hub mRNAs were also positively significantly associated with higher stages in patients with ccRCC (Supplementary Figures 3A-F) and highly expressed in tumor tissues (Supplementary Figures 3G-L).
The detailed screening process of hub mRNAs is shown in Supplementary Table 4. To verify the prognostic value of six hub mRNAs, we performed a survival analysis in the GEPIA database. The results showed that all six hub genes were significantly predictive of OS and DFS in patients with ccRCC (Supplementary Figures 3M-X). Meanwhile, the external validation data set, GSE53757 was used to confirm the expression patterns of six hub mRNAs. Consistent with our previous analyses in TCGA set, all six hub mRNAs were significantly positively associated with the pathological stage of patients with ccRCC (Supplementary Figures 4A-F) and were upregulated in ccRCC tissues (Supplementary Figures 4G-L).

Screening of 3-Key miRNAs Linked to Distant Metastasis of ccRCC
We predicted the upstream miRNAs of six hub mRNAs through five online databases as mentioned above and selected genes that co-occurred in two or more databases for further analysis (Figures 6A-F). In view of miRNA functional mechanism and oncogenic roles of six hub mRNAs, upstream miRNAs of the six hub mRNAs could be tumor suppressive genes. Therefore, all miRNAs-mRNAs pairs were selected for expression correlation analysis, and 45 miRNAs were significantly negatively correlated with targeting hub mRNAs. To evaluate the prognostic value of these miRNAs, we performed survival analysis for all of 45 miRNAs, and 18 miRNAs were significantly associated with good prognosis in patients with ccRCC (Supplementary Figure 5). Subsequently, we compared the expression level between ccRCC samples and normal samples for these 18 prognosis-related miRNAs, and 11 tumor suppressor miRNAs significantly downregulated in ccRCC tissues were identified (Supplementary Figure 6). Finally, we conducted expression analyses in stage and distant metastasis in these 11 miRNAs. The results indicated that the three key miRNAs (hsa-miR-10b-3p, hsa-miR-23b-3p, and hsa-miR-139-3p) were not only significantly negatively related to the expression of four targeting hub mRNAs (CCNB2, KIF18B, PLK1, and TOP2A, Figures 6G-K) and higher pathological stages in patients with ccRCC (Figures 6L-N) but also significantly low-expressed in distant metastasis tissues (Figures 6O-Q). The detailed screening process of key miRNAs is displayed in Supplementary Table 5.

Upstream Distant Metastasis-Related LncRNAs of 3-Key miRNAs
We utilized LncBase database to predict the upstream potential of lncRNAs that may be combined with the above 11 tumor suppressor miRNAs. To obtain distant metastasis-associated lncRNAs, an intersection was taken between the predicted lncRNAs and genes in the distant metastasis-associated pink module, and a total of 26 lncRNAs were finally obtained. To fully screen the distant metastasis-related lncRNAs, we chose the upstream 10 lncRNAs of three key miRNAs for subsequent analysis (Supplementary Figures 7A-C). Based on ceRNA mechanism, these lncRNAs may function as oncogenes in ccRCC and could negatively regulate downstream key miRNAs. Therefore, we conducted expression correlation analysis for these 10 lncRNAs, and found that nine lncRNAs were significantly negatively correlated with the downstream miRNAs (Supplementary Figures 7D-M). Subsequently, the expression differences of nine lncRNAs among different clinical traits including stage, grade, distant metastasis, and normal and tumor tissues were explored in the TCGA set. As a result, three key lncRNAs (LINC01234, LINC02577, and LINC02609) were identified, which not only significantly expressed higher in advanced stages and grades than in the early stages and grades (Figures 7A-F) but significantly upregulated in the tissues of tumor and distant metastasis than in the normal and non-distant metastasis control (Figures 7G-L). Finally, survival analysis indicated that the three lncRNAs were significantly associated with the poor prognosis of both OS and DFS in patients with ccRCC (Figures 8A-F). The detailed screening process of key lncRNAs is shown in Supplementary Table 6.

Construction of ceRNA Regulatory Network in ccRCC
Eleven tumor suppressor miRNAs and their interactions in the pink module were used to construct the lncRNA-miRNA-mRNA network (Figure 8G). To further explore the potential mechanism of distant metastasis in ccRCC, all of the interactional key genes including the three key lncRNAs (LINC01234, LINC02577, and LINC02609), three key miRNAs (hsa-miR-10b-3p, hsa-miR-23b-3p, and hsa-miR-139-3p), and four hub mRNAs (CCNB2, KIF18B, PLK1, and TOP2A) in the network were extracted to form a core ceRNA network, and visualized by the Sankey diagram, in which all genes were significantly associated with distant metastasis and prognosis in patients with ccRCC ( Figure 8H).

The Distant Metastasis-Related LncRNA Signature as a Prognostic Indicator Independent of Other Clinical Characteristics
To further verify the prognostic potential of these three distant metastasis-associated lncRNAs, a univariate Cox proportional hazards regression analysis was performed. As a result, all three lncRNAs were significantly associated with poor prognosis in patients with ccRCC (p < 0.0001, Table 3). Then, the key lncRNAs were further evaluated by LASSO regression analysis, and we repeated the process 1,000 times to avoid over-fitting in the prognostic signature. The result displayed that all of the three key lncRNAs could be included for further analysis (Figures 9A,B). Subsequently, through multivariate Cox regression analysis ( Table 3), we built the three lncRNA prognostic signature to predict OS as follows: Risk score = (0.0977 × expression level of LINC01234) + (0.1350 × expression level of LINC02609) + (0.0464 × expression level of LINC02577).
Of note, LINC01234 is all the most closely associated with prognostic signature (p = 0.00022). The OS rate of high-risk patients was significantly lower than that of lowrisk patients (Figure 9C). Furthermore, the area under ROC curve was 0.718 (Figure 9D), showing a high predictive ability of the prognostic signature. Finally, all 529 patients in the TCGA set were divided into high-risk (n = 264) and lowrisk groups (n = 265) according to the median value of the risk score. Risk curve and scatterplot separately displayed the risk score and survival status of each patient with ccRCC (Figures 9E,F). Patients in the low-risk group had lower mortality than patients in the high-risk group. The heatmap showed the expression level of the prognostic signature in patients with ccRCC, indicating that the three key lncRNAs were obviously highly expressed in high-risk patients with ccRCC ( Figure 9G). Additionally, the univariate and multivariate Cox regression analyses of clinical characteristics displayed that age, grade, M stage, and the risk score were the independent prognostic factors (Figures 9H,I). The risk score of patients with distant metastasis was significantly higher than that of patients with non-distant metastasis, which indicated that the patients with higher risk score were more likely to develop distant metastasis compared to the patients with lower risk score ( Figure 9J).

GSEA of the Distant Metastasis-Related LncRNA Signature
In order to explore the potential pathways of the 3-lncRNA prognostic signature, we performed GSEA to the signature in the TCGA set. The result indicated that the patients with high-risk score were mainly associated with immunoregulatory pathways, such as primary immunodeficiency, intestinal immune The green circle represents non-key lncRNA, the red circle represents key lncRNA, the blue hexagon stands for prognostic miRNA, the red hexagon stands for prognostic and distant metastatic miRNAs, the yellow triangle means hub mRNAs that are related to key miRNAs, and the green triangle means hub mRNAs that are only associated with prognostic miRNAs. (H) The core ceRNA network associated with distant metastasis and prognosis of patients with ccRCC was presented by Sankey diagram. The value of p < 0.05 was considered statistically significant. lncRNAs, Long non-coding RNAs; ceRNA, competitive endogenous RNA; OS, overall survival; DFS, disease free survival. network for lgA production, complement and coagulation cascades, cytokine-cytokine receptor interaction, cell cycle, and p53 signaling pathway (Figures 10A-F). Autophagy-related and cancer-related pathways were mainly enriched in low-risk patients, including regulation of autophagy, mTOR signaling pathway, renal cell carcinoma, pathways in cancer, and MAPK signaling pathway (Figure 10G), indicating that the autophagy was mainly involved in patients with low-risk ccRCC. The top eight signaling pathways in high-and low-risk groups based on GSEA are shown in Table 4. These results suggested that a highrisk score in patients with ccRCC may be linked to impaired immune function, while a low risk score in patients with ccRCC might be correlated with enhanced autophagy.

Validation of Distant Metastasis-Related LncRNA Signature in Tissues and Cell Lines
To verify the expression patterns of the 3-lncRNA prognostic signature, the expression level of the three lncRNAs was further verified by qRT-PCR, not only in 30 paired ccRCC/normal tissue specimens that were collected at our institution but in one normal renal tubular epithelial cell line (HK-2) and 4 ccRCC cell lines (A498, 786-O, Caki-1, and ACHN). The results indicated that the three key lncRNAs in the prognostic signature were not only significantly expressed high in tumor tissues than in normal tissues (Figures 11A-C) but significantly upregulated in the cell lines of ccRCC compared to the cell line of normal renal tubular epithelial cells (Figures 11D-F), which is in agreement with the result of the database analysis from our study.

DISCUSSION
Clear cell renal cell carcinoma, one of the most common malignancies of the urinary system, is prone to recur and result in metastasis. Approximately 25-30% of patients with ccRCC already exhibited distant metastasis at the initial diagnosis (24). Although great progress has been made in the diagnosis and treatment of ccRCC, the 5-year survival rate of patients with ccRCC and distant metastasis remain very low (5). Therefore, the exploration of prognostic biomarkers with high sensitivity and specificity for distant metastasis in patients with ccRCC has become increasingly urgent. The dysregulation and mutation of lncRNAs could promote or inhibit the occurrence and metastasis of tumors, which makes them to be the novel prognostic biomarkers and promising therapeutic targets in cancers (7). At present, some lncRNAs have been found to play an important role in the prognosis of cancers, including CYTOR (25), CASC19 (26), IGFBP4-1 (27), and CDC6 (28). However, numerous studies indicated that the combined prognostic model of lncRNAs has a prominent advantage over the single lncRNA prognostic biomarker in terms of statistical stability. Although some combined prognostic models of lncRNAs have been identified in multiple types of cancers (29,30), studies on combined prognostic signatures of lncRNAs in ccRCC are still very limited. Additionally, different from the other lncRNA signatures that were only associated with ccRCC prognosis (31,32), the lncRNA signature we constructed was closely associated with distant metastasis in ccRCC and prognosis simultaneously. More importantly, a potential ceRNA regulatory network of the lncRNA signature was constructed, which all genes in the network could be regarded as the promising biomarkers that accurately predicted distant metastasis and prognosis of patients with ccRCC, achieving the precise identification of distant metastasis-associated ceRNA regulatory network in ccRCC for the first time.
In the present study, we first downloaded the transcriptome sequencing data of ccRCC and the corresponding clinical information from patients with ccRCC obtained from the database of TCGA. Unlike other similar studies that only used one package to identify DEGs (33,34), we simultaneously utilized the R packages, such as "DESeq2" and "edgeR, " and finally obtained 11,034 DEGs after intersecting the results of the analysis. Subsequently, 23 gene co-expression modules were constructed using WGCNA. Among them, we identified the key pink module that was all the most closely associated with distant metastasis of ccRCC using the Pearson's correlation analysis. Interestingly, it was found that the pink module was also all the most closely associated with the pathological stage and histological grade of ccRCC, indicating that the pink module has an important clinical significance. After conducting a series of rigorous analyses, such as PPI analysis, survival analysis, expression analyses in stage, grade, and distant metastasis to genes in the pink module, a total of six hub mRNAs (BUB1B, CCNB2, KIF18B, PLK1, PTTG1, and TOP2A) that could predict distant metastasis and poor prognosis in patients with ccRCC were successfully identified. Subsequently, the GEO data set and GEPIA database were utilized for verification, which further confirmed that the six hub mRNAs were upregulated in the tissues of ccRCC, and were highly correlated with the poor  prognosis and the pathological stages of patients with ccRCC. Moreover, some studies have also indicated that the six hub mRNAs could be used as promising biomarkers related to the metastasis or prognosis of cancer. For example, over-expression of BUB1B is associated with poor prognosis in patients with prostate cancer, and could promote proliferation and migration of prostate cancer cells (35); Wang et al. (36) proposed that miR-335-5p could regulate metastasis of lung adenocarcinoma by targeting CCNB2. Yan et al. (37) showed that KIF18B could act as a promoting factor for proliferation and metastasis of cutaneous melanoma. Fu et al. (38) found that the over-expression of PLK1 could induce oncogenic transformation of prostate epithelial cells, thereby driving the metastasis of prostate cancer. Song et al. (39) suggested that miR-144-3p could inhibit proliferation and metastasis of glioma cells by targeting TOP2A. Huang et al. (40) found that upregulation of PTTG1 was able to promote the growth and metastasis of hepatocellular carcinoma cells. These reports together with the results of our analyses further confirmed that the six hub mRNAs may function as oncogenes to play a crucial role in distant metastasis and poor prognosis in patients with ccRCC. MiRNAs can negatively regulate the expression of genes at the post-transcriptional level by interacting with MREs, thereby promoting or inhibiting the progression of cancer (41). Therefore, it is of great significance to explore the miRNAs targeting the above six hub mRNAs. First, five online databases were utilized to predict the potential miRNAs. According to the action mechanism of miRNAs in the ceRNA hypothesis, we speculated that the above miRNAs may function as tumor suppressor genes in ccRCC. Based on this inference, we screened three key miRNAs (hsa-miR-10b-3p, hsa-miR-23b-3p, and hsa-miR-139-3p) targeting four hub mRNAs (CCNB2, KIF18B, PLK1, and TOP2A) using the correlation analysis, survival analysis, expression analyses in stage, distant metastasis, tumor, and normal tissues. It was reported that the three key miRNAs played a key regulatory role in the oncogenesis and development of human cancers. For instance, Yang et al. (42) constructed a prediction model of five-miRNAs, including hsa-miR-10b-3p that functions as the independent prognostic factor in colorectal cancer; Zhou et al. (43) found that patients with higher hsa-miR-23b-3p expression had better outcomes than those with lower expression of hsa-miR-23b-3p in colorectal cancer. Xu et al. (44) demonstrated that over-expression of hsa-miR-139-3p could significantly inhibit the proliferation, migration, and invasion of HeLa cells in cervical cancer, in vitro and could significantly suppress the growth of cervical tumor in vivo. All of these reports further supported the accuracy of the results of the current study.
In addition to miRNAs, lncRNAs can function as ceRNAs to interact with mRNAs by competitively binding to specific miRNAs and thus regulate various human diseases including cancer (45). Therefore, we utilized LncBase database to predict lncRNAs that potentially bind to the 3-key miRNAs. To explore the potential molecular mechanism of distant metastasis in ccRCC, we took the intersection of the predicted lncRNAs with genes in the distant metastasis-associated pink module and obtained 10 lncRNAs. According to the ceRNA hypothesis (9), lncRNAs could function as ceRNAs to negatively regulate miRNAs, which means that the expression trend of lncRNAs may be opposite to that of miRNAs. Therefore, we performed a correlation analysis to the resulting 10 lncRNAs based on the above relationship. Subsequently, we conducted a series of strict analyses, such as survival analysis, expression analyses in stage, grade, distant metastasis, normal, and tumor tissues to the screened lncRNAs, and finally obtained three key lncRNAs with significant distant metastasis and prognosis potential (LINC01234, LINC02577, and LINC02609). Among them, LINC01234 has been reported to be involved in the biological processes of some cancers. For instance, Chen et al. (46) identified that LINC01234 was an oncogenic lncRNA that could promote proliferation and inhibit apoptosis of gastric cancer cells through miR-204-5p-CBFB axis. Lin et al. (47) found that LINC01234 was significantly upregulated in colon cancer, and promoted the proliferation of colon cancer cells by competitively combining miR-642a-5p. Chen et al. (48) demonstrated that LINC01234 was upregulated in human non-small cell lung cancer (NSCLC) and was closely associated with the metastasis and poor prognosis of NSCLC. However, there are very few reports on LINC01234 in ccRCC. Only Zhang et al. (49) constructed a risk score model of five lncRNAs including LINC01234 in ccRCC. More importantly, there is no relevant report on LINC01234 in the distant metastasis of ccRCC. Of note, LINC02577 and LINC02609 have not yet been reported in human cancers. Therefore, it is needed to conduct more studies on the three lncRNAs to explore their potential action mechanism in the development of tumors (especially ccRCC). Subsequently, we performed LASSO and Cox regression analyses to the above three lncRNAs, and the results further displayed that they were closely associated with poor prognosis in patients with ccRCC. Therefore, we constructed a risk score model in ccRCC using the above three lncRNAs to assess the prognosis of each patient. According to the median value of the risk score, we divided these patients into high-and low-risk groups. The results showed that the patients in the high-risk group have higher mortality and worse prognosis compared to the patients in the low-risk group. Finally, we evaluated the risk score model of lncRNAs using ROC curve in the entire TCGA data set, and the results showed that this model owned good predictive value. Furthermore, compared to the other clinical traits (age, gender, race, grade, stage, N stage, and M stage) of patients with ccRCC, the 3-lncRNA signature was an independent factor predicting the overall survival of patients with ccRCC. Additionally, GSEA results showed that the immune-related pathways were mainly enriched in the highrisk group, whereas the autophagy-and cancer-related pathways were mainly enriched in the low-risk group. These results revealed that the disorder of immune may be associated with the poor prognosis of patients with ccRCC. More importantly, the results also suggested that the three lncRNAs could promote distant metastasis of ccRCC by regulating autophagy, which is consistent with our previous findings (50). Finally, we took the four hub mRNAs (CCNB2, KIF18B, PLK1, and TOP2A), three key miRNAs (hsa-miR-10b-3p, hsa-miR-23b-3p, and hsa-miR-139-3p) and three key lncRNAs (LINC01234, LINC02577, and LINC02609) screened above together to construct a ceRNA network that could regulate the distant metastasis and prognosis in ccRCC. Compared to other similar reports in which only some genes play crucial molecular functions (51,52), all genes in the ceRNA network are closely related to the distant metastasis and prognosis of patients with ccRCC. Moreover, the results of qRT-PCR at the histological and cellular levels showed that three key lncRNAs in the ceRNA network were significantly upregulated in the tissues and cell lines of ccRCC compared to that in the normal tissues and renal tubular epithelial cell, which further confirmed the accuracy of our previous analysis on bioinformatics on the clinical and experimental levels.
In conclusion, based on the three upregulated lncRNAs in ccRCC, we identified a distant metastasis-related signature that could independently predict the overall survival of patients with ccRCC. Furthermore, the potential ceRNA regulatory network of the three key lncRNAs was accurately constructed, and all genes in the network could be the predicted factors for the distant metastasis and prognosis in ccRCC, providing a new sight on the mechanism underlying the distant metastasis of ccRCC.
The distant metastasis-related lncRNAs in the network could be used as reliable prognosis biomarkers and promising therapeutic targets for ccRCC, but in the future, further studies are still needed to verify the stability of this prognostic signature.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author/s.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Ethics Committee of the First Affiliated Hospital of Anhui Medical University. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
YS and TZ conceived, designed the study, and writing of the manuscript. TZ acquired and analyzed the data. CL supervised and analyzed the results. JT and SF collected the tissue samples and cultured cells. YS performed experiments. LZ and JZ contributed analysis tools. All authors reviewed and approved the submitted manuscript.

ACKNOWLEDGMENTS
The authors would like to thank the TCGA, GEO, LinkedOmic, and starBase databases for the availability of the data. The heatmap describes the TOM among genes based on divided co-expression modules. About 5,000 randomly selected genes were grouped into different modules, which were shown in different color codes below the cluster dendrogram. Light yellow displays a high degree of topological overlap, while dark red indicates a low degree of topological overlap. Scatter plots of GS for age (F), gender (G), and race (H) vs. MM in pink module. The value of p < 0.05 was considered statistically significant. WGCNA, weighted gene co-expression network analysis; ccRCC, Clear cell renal cell carcinoma; TOM, topological overlap matrix; DEGs, differentially expressed genes; GS, gene significance; MM, module membership.