Regulation of Long Non-coding RNA KCNQ1OT1 Network in Colorectal Cancer Immunity

Over the past few decades, researchers have become aware of the importance of non-coding RNA, which makes up the vast majority of the transcriptome. Long non-coding RNAs (lncRNAs) in turn constitute the largest fraction of non-coding transcripts. Increasing evidence has been found for the crucial roles of lncRNAs in both tissue homeostasis and development, and for their functional contributions to and regulation of the development and progression of various human diseases such as cancers. However, so far, only few findings with regards to functional lncRNAs in cancers have been translated into clinical applications. Based on multiple factors such as binding affinity of miRNAs to their lncRNA sponges, we analyzed the competitive endogenous RNA (ceRNA) network for the colorectal cancer RNA-seq datasets from The Cancer Genome Atlas (TCGA). After performing the ceRNA network construction and survival analysis, the lncRNA KCNQ1OT1 was found to be significantly upregulated in colorectal cancer tissues and associated with the survival of patients. A KCNQ1OT1-related lncRNA-miRNA-mRNA ceRNA network was constructed. A gene set variation analysis (GSVA) indicated that the expression of the KCNQ1OT1 ceRNA network in colorectal cancer tissues and normal tissues were significantly different, not only in the TCGA-COAD dataset but also in three other GEO datasets used as validation. By predicting comprehensive immune cell subsets from gene expression data, in samples grouped by differential expression levels of the KCNQ1OT1 ceRNA network in a cohort of patients, we found that CD4+, CD8+, and cytotoxic T cells and 14 other immune cell subsets were at different levels in the high- and low-KCNQ1OT1 ceRNA network score groups. These results indicated that the KCNQ1OT1 ceRNA network could be involved in the regulation of the tumor microenvironment, which would provide the rationale to further exploit KCNQ1OT1 as a possible functional contributor to and therapeutic target for colorectal cancer.


INTRODUCTION
Colorectal cancer (CRC) is a common malignant cancer and is the second-highest contributor to the worldwide incidence of cancer-related deaths (Miller et al., 2019). CRC develops sporadically from some inflammatory bowel diseases or hereditary cancer syndromes. The development of colorectal cancer is based on the adenoma-carcinoma sequence. So far, the molecular mechanism of the adenoma-carcinoma sequence has been only partly identified. CRC prognosis depends on factors related to the patient and treatment. The expertise of the treatment team is one of the most important determinants of the outcome. Early detection of CRC cells and cancer precursor cells significantly reduce morbidity and improve patient prognosis (Sung et al., 2021).
The mortality of colorectal cancer can be effectively reduced by screening for the cancer. The most common screening procedures include flexible sigmoidoscopy, double-contrast barium enema, fecal occult blood tests, and colonoscopy (Bibbins-Domingo et al., 2016). There is no consensus regarding which screening method is the best, and it appears that no one test is better than the other. Risk, cost, and effectiveness are the main factors to be considered when discussing different options (Issa and Noureddine, 2017). Undoubtedly, a complete colonoscopy has the advantages of allowing the entire colon to be assessed, material for a biopsy to be collected, and a polypectomy to be carried out all within the same examination time; however, it also has disadvantages of higher costs as well as risks, discomfort and inconvenience for the patient being examined. Therefore, it is for all practical purposes necessary to develop effective biomarkers of CRC for applications in screening, diagnosis and prognosis.
Most of the RNA transcribed from the human genome does not encode for proteins. Some of these non-coding RNAs (ncRNAs) have been found to dysregulate the normal expression of genes, including tumor suppressor genes and oncogenes. Therefore, ncRNAs are considered to be new promising targets for studying tumorigenesis. ncRNAs include long non-coding RNAs (lncRNAs), microRNAs (miRNAs), circular RNAs, and small interfering RNAs (siRNAs). An miRNA is a highly conserved non-coding RNA approximately 21-24 nucleotides in length, and interacts with target mRNAs to regulate gene expression. Some miRNAs have been reported to be involved in the occurrence and development of cancer (Hayes et al., 2014), and miRNA-based therapeutics have in fact reached the stage of clinical development (Toden et al., 2021). In recent years, the importance of lncRNA has gradually become recognized. An lncRNA is essentially an ncRNA with more than 200 base pairs. So far, some lncRNAs have been shown to play a major regulatory role in genetic regulation. Recent studies have shown multiple roles for lncRNAs in tumorigenesis (de Oliveira et al., 2019). The competitive endogenous RNA (ceRNA) hypothesis is related to lncRNAs and miRNAs, proposed by Salmena and others (Salmena et al., 2011), has been described as the "Rosetta Stele", used to decode the RNA language in order to regulate RNA crosstalk and regulate biological functions. Many studies have shown that miRNAmediated ceRNA regulation plays a crucial role in the occurrence and development of cancer (de Oliveira et al., 2019). Long non-coding RNA KCNQ1 opposite strand/ antisense transcript one gene (KCNQ1OT1) were markedly upregulated in gastric cancer tissues and cells (Zhong et al., 2021). High expression of lncRNA KCNQ1OT1 was significantly related to poor survival in patients with CRC in a pan-cancer meta-analysis. The upregulation of KCNQ1OT1 in CRC tissues and cell lines was also confirm the important role of KCNQ1OT1 in CRC (Lin et al., 2021). In our current work, we investigated the KCNQ1OT1 by mining the CRC RNA-Seq dataset in The Cancer Genome Atlas (TCGA), tested its prognostic potential in a CRC cohort and constructed the KCNQ1OT1-related lncRNA-miRNA-mRNA ceRNA network. We aimed to provide the rationales to the further exploitation of KCNQ1OT1 as a possible functional contributor to and therapeutic target for CRC.

MATERIALS AND METHODS
Differentially Expressed Genes From TCGA RNA-Seq Data RNA-seq data of a colorectal cancer cohort (TCGA-COAD) was downloaded with gdc-client (version 1.3.0) from the data portal of TCGA (the dbGaP accession: phs000178. v11. p8. Release date: December 18, 2019). The RNA sequencing read counts of sample were obtained from TCGA. All these samples were sequencing by Illumina Genome Analyzer IIX. According to the bioinformatics pipeline for mRNA analysis in TCGA (https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/ Expression_mRNA_Pipeline/), quality assessment was performed with FASTQC (version 0.11.8), the alignment was performed using a two-pass method with STAR (version 2.4.2a). The reads were aligned to the GRCh38 reference genome and then were quantified with HTSeq (version 0.6.1). CPM normalization was performed to correct library size differences between samples. As an initial filter, we retained only genes with log 2 CPM >1 in more than half of the samples. To compare expression levels between samples, for these genes alone, we then re-normalised raw count data by TMM (implemented in edgeR version 3.32.1) and transformed these by voom in limma (version 3.46.0) (Robinson and Oshlack, 2010;McCarthy et al., 2012;Ritchie et al., 2015). After carrying out this normalization, mRNAs, microRNA and lncRNAs differentially expressed in tumor group and solid normal tissues were identified. Thresholds for differential expression were set each as the adjusted p value less than 0.01 and |log 2 fold change|>1.
Construction of a lncRNA-miRNA-mRNA ceRNA Network All of the differentially expressed genes were considered as candidates in the lncRNA-miRNA-mRNA ceRNA network construction, which was based on multiple factors such as binding affinity of miRNAs to their lncRNA sponges, RNA secondary structures and RNA-binding proteins, and the abundance and subcellular localization of ceRNA components (Qi et al., 2015). To construct the ceRNA network, the R package GDCRNATools (version 1.10.1) was used . with five databases on miRNA-mRNA interactions including STarMir (version 2.2) (Kanoria et al., 2016), StarBase (version 2.0) (Li et al., 2014), miRcode (version 11) (Jeggari et al., 2012), spongeScan (version 1.0) (Furió-Tarí et al., 2016), and mirTarBase (version 7.0) (Chou et al., 2018) incorporated for the interactions analysis. The criteria for identifying competing lncRNA-mRNA interactions are: a) the strength of positive association between expression of lncRNA and its target mRNAs, b) the hypergeometric probability of shared miRNAs on the lncRNA-mRNA pair, c) the strength of regulation similarity of all shared miRNAs on the lncRNA-mRNA pair. Based on above criteria, Pearson's correlation was used to measure the association between expression of lncRNA and mRNA. Hypergeometric distribution was measured by Fisher's exact test. Regulation similarity was calculated based on the total number of the lncRNA-mRNA shared miRNAs, the Pearson's correlation between the miRNA with lncRNA, as well as miRNA with mRNA . After the construction, the ceRNA network was plotted using Cytoscape (version 3.7.0) software (Smoot et al., 2011).

Survival Analysis
We investigated the prognostic values of the main differentially expressed lncRNAs in the ceRNA network for CRC patients. Kaplan-Meier curves for survival analysis were depicted to present the survival-related lncRNAs. A log-rank test was performed to compare the survival distributions of samples grouped by lncRNA differential expression levels of the patient cohort.

Functional Annotation
To identify the biological function of the ceRNA network possibly contributing to tumor development, we performed a functional annotation analysis with multiple pathway databases, including Gene Ontology (GO) (Ashburner et al., 2000), Reactome (Haw et al., 2011), and Speed2 (Signalling Pathway Enrichment using Experimental Datasets 2) (Rydenfelt et al., 2020). A p value of less than 0.05 was considered to indicate statistically significant enrichment. The R package clusterProfiler V3.11 (Yu et al., 2012) was used for the GO and Reactome pathway enrichment analysis and visualization of significant modules. The R package SPEED2 was used for checking the upstream pathway activity from the genes in the ceRNA network, and a Bates test was used to calculate the test statistics for pathway enrichment.

Optimization of the ceRNA Network
There were more than 100 genes in the ceRNA network ( Figure 2), so for practical applicability, we confirmed the target structural accessibility and selected the critical lncRNAs associated with the survival of CRC patients as the hub genes. Only the mRNAs both correlated with the hub gene and in the original ceRNA network were used to reconstruct the optimized network. Target structural accessibility for miRNA target recognition was calculated and visualized using STarMirDB and Sfold (version 2.2) (Kanoria et al., 2016;Rennie et al., 2019). The optimized network was visualized using Cytoscape.

Network Signature Analysis
Gene set variation analysis (GSVA) was used to determine the network expression level of each single sample, analogously to a competitive gene set test (Hänzelmann et al., 2013). By performing GSVA scoring, we were able to estimate the variation in the gene enrichments of the networks of the samples, and to do so independently for tumor and normal tissues. GSVA scores are designed to range from −1 to 1, with negative scores indicating relative decreases in network expression while positive scores indicated elevations. For practical applicability, only lncRNAs and mRNAs in networks were listed as the gene set signature for the GSVA assessment. The R package GSVA (v3.11) was used to calculate the GSVA score of networks over the samples in the CRC transcriptome dataset.

Estimation of Immune Cell Infiltration
Immune cells infiltrated in the tumor microenvironment play crucial roles in tumor invasion and metastasis. To estimate immune cell infiltration in samples with different network expression levels, the web-based tool ImmuCellAI was applied to calculate the abundance of 24 immune cell subsets via their gene expression profiles (Miao et al., 2020). The immune cell subsets estimated in this study included 18 T cells subsets:

CRC Tumor Samples and Normal Samples Were Significantly Distinguished on the Basis of Differentially Expressed lncRNAs
We obtained the aligned read counts (GRCh38 (hg38) version) of tissue samples from 478 primary tumor, one metastatic tumor, one recurrent tumor and 41 normal solid tissue from TCGA. Samples from primary tumor, metastatic tumor and recurrent tumor were combined as tumor group. The age and sex of patients in normal group was matched to those in tumor group (Supplementary Table 1). There were total number of 60483 genes in this dataset. In prefiltering step, 14768 genes were kept for the downstream analyses. After TMM normalization and voom transformation, we explored the DGEs (differentially expressed genes) based on GLM (generalized linear model) likelihood ratio test, 2935 CRC tissue-specific mRNAs and 213 lncRNAs via the differential expression analysis of the TCGA-COAD dataset (Supplementary Figures S1A,B, Supplementary Table S2). The heatmaps of the differential expression of lncRNAs between CRC tissues and solid normal tissues are shown in Supplementary Figure S1C. The expression of 213 differentially expressed lncRNAs separated the tumor group and normal group clearly.

Construction of the lncRNA-miRNA-mRNA ceRNA Network
In the current work, certain lncRNAs and mRNAs were shown to co-express in the ceRNA networks. We built the ceRNA network based on the co-expression patterns of the lncRNAs, miRNAs and mRNAs. In total, 133 nodes and 288 edges constituted the ceRNA network. The preliminary network is shown in Figure 2.

Knowledge-Driven Pathway Analysis of ceRNA Network
According to the Reactome analysis, biological processes of the genes in the ceRNA network from tumor tissue were mainly related to MET pathways ( Figure 3A). MET is a receptor tyrosine kinase (RTK), and like other related RTKs such as EGFR, MET can be activated by binding to its ligand, namely hepatocyte growth factor/scatter factor (HGF/SF), resulting in MET dimerization and trans-autophosphorylation. A total of 106 GO terms were extracted using the GO analysis (Supplementary Table S3). Of the 106 GO terms, 97 were Biological Process terms, eight were Cellular Component terms and one was a Molecular Function term. The BP GO terms have all been confirmed to be related to the regulation of neurogenesis and mesenchymal cell proliferation ( Figure 3B) with, for instance, GO:0050768 (negative regulation of neurogenesis), GO:0010975 (regulation of neuron projection development) SPEED2 analyses allow one to infer upstream pathways. In the current work, the VEGF pathway was indicated from this analysis to be the signaling pathway that likely caused the genes in the ceRNA network to be deregulated ( Figure 3C, Supplementary Table S4). We used another web-based tool LnCeVar-Cluster which provides cluster profiles between differential expressed lncRNAs and ceRNAs, especially the similarity profiles between differential expressed lncRNAs . The differential expressed lncRNAs clustering profile of TACG-COAD indicated that NEAT1, MALAT1, KCNQ1OT1 and LINC00969 had the similar expression pattern in CRC ( Figure 3D). Functional annotations showed the ceRNA network genes to be closely related to the tumor pathogenesis in general.
Of the lncRNAs present in the preliminary ceRNA network, only one gene, namely KCNQ1OT1, was indicated to be associated with survival in the CRC cohort. Thus, a correlation analysis was performed based on this critical KCNQ1OT1 hub gene. Based on a hypergeometric test and correlation analysis ( Table 2), the mRNAs and lncRNAs not significantly correlated with KCNQ1OT1 or without a direct link with KCNQ1OT1 were removed from the network. The KCNQ1OT1-miRNA-mRNA subnetwork consisted of one centroid lncRNA node, nine miRNA nodes and nine mRNA nodes. This KCNQ1OT1 ceRNA network was reconstructed and visualized using Cytoscape ( Figure 5). The expression level of the lncRNA KCNQ1OT1 was positively correlated with the expression levels of the nine mRNAs ( Figure 6). This result was consistent with the ceRNA theory that the lncRNA regulated other RNA transcripts by competing for shared microRNAs. We checked the expression profile of KCNQ1OT1 and other hub network genes in tumor group and normal group in TCGA data set (Supplementary Table S2, Supplementary Figure S2). An additional searching in The Genotype-Tissue Expression (GTEx) portal (dbGaP accession number phs000424. vN.pN) which is a comprehensive public dataset for tissue-specific gene expression and regulation in nondiseased cohort for verification of the expression of KCNQ1OT1 in colon was conducted (Supplementary Figure S3). The results showed that the gene expression of KCNQ1OT1 in both transverse colon and sigmoid colon was higher than in blood. And the gene expression of KCNQ1OT1 across all tissues showed that, although KCNQ1OT1 did not represent the highest expression in colon, its expression level was not low either, consistent with what we found in TCGA-COAD: the KCNQ1OT1 expression in normal tissue was low (CPM: median: 2.41, mean: 3.46). But its expression level in tumor tissue in TCGA-COAD was higher (CPM: median: 4.73, mean: 10.67). All this data showed that KCNQ1OT1 had enough expression to physiologically function as a miRNA sponge. Target structural accessibility for miRNA target recognition to lncRNA KCNQ1OT1 or mRNAs was calculated and visualized using Sfold and STarMirDB (Supplementary Figures S4, S5).

The KCNQ1OT1 ceRNA Network Signature Was Highly Expressed in CRC Transcriptomic Profiles
A GSVA assessment of network signatures in tumor and normal samples and using the TCGA-COAD dataset showed higher network signature GSVA scores for the tumor tissues than for the normal samples. This result was expected because we constructed this network from DEGs in the TCGA-COAD dataset. For validation, we applied the GSVA assessment of network signatures in three GEO CRC transcriptome datasets (GSE21510, GSE113513 and GSE32323). In these GEO datasets, KCNQ1OT1 ceRNA network signature GSVA scores were significantly higher for tumor samples than for normal samples, which indicated the disease specificity of this KCNQ1OT1 ceRNA network (Figure 7).

Different Immune Cell Infiltration Levels for High-and Low-GSVA-Score Tumor Samples
The immune cell infiltration levels for 24 immune cell subsets in tumor samples from the TCGA-COAD dataset are shown in Figure 8. Overall, the profiles of immune infiltration varied significantly between the tumor samples with high GSVA scores and those with low scores ( Figure 8A). Also, lower CD4 + /CD8 + T-cell ratios were found in the tumor samples with high network GSVA scores than in the tumor samples with low scores. In the high-score group, along with the increase in the levels of CD8 + T cells in general was observed an increase in those of naïve CD8 + T cells. Notably, in contrast, lower levels of cytotoxic T cells were found for the high-score group than for the low-score group. Also, along with the relatively low levels of CD4 + T cells in the tumor samples with high scores, were relatively low levels of the T helper subsets Th2 and Th17. Other T cell subsets, namely Treg (nTreg and iTreg), Tex, Tem and MAIT cells were also downregulated. However, the B cells were upregulated. Regarding another lymphoid cell line, natural killer cell levels showed relatively low levels for the high-score group. Further, regarding the myeloid cell line, lower levels of monocytes, macrophages and neutrophils were also found for the high-score group.
Patient sex, tumor stage and age were also investigated. However, the network GSVA score was neither associated with sex (p 0.8) nor with age (p 0.77) nor with tumor stage (p 0.78), as shown in Supplementary Figures S6A-C

DISCUSSION
There has been increasing evidence for a link between dysregulation of lncRNAs and cancers (Gutschner and Diederichs, 2012). For instance, studies have shown the lncRNA MALAT1 to be associated with the development and metastasis of cancer cells (Gutschner et al., 2013;Tripathi et al., 2013). HOTAIR to be implicated in cancer metastasis regulation by targeting the chromatin repressor polycomb protein (Gupta et al., 2010), and linc00673 to activate WNT/β-catenin signaling and aggravate lung adenocarcinoma by binding between casein kinase 1ε (CK1ε) and DEAD box RNA helicase DDX3 (Guan et al., 2019).
Recent studies have also demonstrated important roles played by lncRNAs in the tumorigenesis of CRC. Colorectal cancer associated transcript 1 (CCAT1) was reported to be a specific biomarker for CRC (Xiang et al., 2014), and to be expressed at high levels not only in pre-malignant conditions but also throughout the various disease stages of CRC (Ozawa et al., 2017). Further, CCAT2, a lncRNA derived from the human chromosome MYC-335 region, has been shown to enhance metastasis and invasion through the MYC pathway-regulating miRNAs miR-20a29 and miR-17-5p. These two CRC-specific lncRNAs were shown to be transcribed from the 8q24 region where previous studies have found common genetic variants related to the risk of CRC (Yang et al., 2019).
In the current work, we showed that the lncRNA KCNQ1OT1, an antisense lncRNA transcribed from the human chromosome 11p15.5 KCNQ1 locus, is related to the pathogenesis and progression of CRC. It has been shown in previous work to function as a cis-silencer of the imprinted KCNQ1 cluster and to be involved in the metastasis and proliferation of various tumors, such as hepatocellular carcinoma, cholangiocarcinoma, ovarian and breast cancer tumors (Feng et al., 2018;Luo and Jin, 2019).
In our research, there were 213 lncRNAs differentially expressed between tumor and normal tissue. Among these 213 lncRNAs, we identified 22 lncRNAs involved in ceRNA networks. We analyzed the survival association with these 22 lncRNAs. Only KCNQ1OT1 significantly associated with the clinically obtained survival data. We compared 228 patients expressing KCNQ1OT1 at high levels with 228 patients expressing it at low levels. According to our Kaplan-Meier survival analysis, the patients with overexpressed KCNQ1OT1 did not survive on average for as long as did those displaying low KCNQ1OT1 expression. This result was consistent with the latest research (Lin et al., 2021). KCNQ1OT1 was reported to be upregulated in CRC tissue according to a study by Li and others (Li F. et al., 2019). KCNQ1OT1 knockdown in HCT116 and SW480 CRC cells downregulated the expression of Atg4B, which has been shown to cleave LC3 and promote the formation of autophagosomes (Hemelaar et al., 2003). The viability of these cells decreased after being treated with oxaliplatin, which implicated KCNQ1OT1 in inducing autophagy protection and chemo-resistance. Moreover, the relationship between the upregulation of KCNQ1OT1 and poor prognosis of CRC patients also suggested that higher KCNQ1OT1 levels in patients make them resistant to chemotherapy or other anticancer treatments (Li Y. et al., 2019). These results suggested that KCNQ1OT1 might become a promising therapeutic target for use in CRC patients.
As the GSVA method can be used to score a gene set signature and depends neither on the composition nor size of a dataset, we applied this method to measure the optimal network signature across different datasets. Samples with a high network score across the independent datasets were found, based on the results, to be particularly enriched in the CRC tumor group.
The identified subnetwork was reported previously to be significantly more reproducible between different disease cohorts then were individual marker genes (Chuang et al., 2007), and in our study, the results of all these datasets indicated dramatically higher network scores for the tumor samples than for the normal samples (Figure 7). These results suggested that the KCNQ1OT1 ceRNA network might play a role in the mechanism of CRC pathogenesis. This network signature could be a potentially way to distinguish CRC tissue form normal tissue.
To gain more knowledge about the function of the KCNQ1OT1 ceRNA network related to immune infiltration, we used ImmuCellAI to estimate the abundance of immune cells from individual samples. The estimation implied that the overall profiles of immune infiltration differed significantly between the tumor samples with high GSVA scores and those with low scores ( Figure 8A, Supplementary Table S7, S8). Lower CD4 + /CD8 + T-cell ratio s were indicated for the higherscore groups. Both the reduction of the CD4 + T cell population and increase of the CD8 + T cell population contributed to the lower ratio of CD4 + /CD8 + T-cell. The CD4 + /CD8 + T-cell ratio was considered as an immunostimulatory marker in the general population (Gojak et al., 2019). A low CD4 + /CD8 + T-cell ratio has been shown to be an immune risk phenotype related to chronic inflammation, persistent immune dysfunction and immune senescence (Wikby et al., 2005). In some investigations, the CD4 + /CD8 + T-cell ratio was described as a significant marker for prognostic prediction in HIV/AIDS patients (Castilho et al., 2019;Gojak et al., 2019;Li F. et al., 2019). Notably, in our study, the trend found for CD8 + T cells was opposite that found for cytotoxic T cells (cytokine-produced CD8 + T cells) in the network score comparisons. The generation of a functional cytotoxic T-cell response in general depends on the activation of Th1 cells. However, in the current work, the proportions of Th1 cells in the high-and low-GSVA-score groups showed no significant difference. Another possibility was that a recruitment of CD8 + T cells in the tumor tissue accompanied the reduction of cytotoxic T cells, and that this recruitment reflected a suppression of the cytotoxic machinery of the infiltrates, suggesting that the dysfunctional status of the effector cells was due to the microenvironment in the samples with high network scores.
Beside the immune infiltration, the network score differences in sex, tumor stage and age were also investigated. Neither any sex bias nor age association with the network scoring was found according to the statistical analysis (Supplementary Figure S6).
Our findings investigated a network signature for the lncRNA KCNQ1OT1, which was shown in the current work to be a representative of the ceRNA network transcribed in CRC tissues. This ceRNA network could be a potential regulator in colorectal cancer immunity. These findings indicated an oncogenic role for KCNQ1OT1 and its downstream target mRNAs in the pathogenesis and progression of CRC.

CONCLUSION
The results of our study indicated that the KCNQ1OT1 ceRNA network could be involved in the regulation of the CRC tumor microenvironment, providing a rationale for the further exploitation of KCNQ1OT1 as a possible functional contributor to and therapeutic target for CRC.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.

AUTHOR CONTRIBUTIONS
JL wrote the manuscript. JD contributed the design of the study. WL and SL organized the references and databases. JD and JL explored the RNAseq data. JL and WL revised manuscript. All authors contributed to write and approve the final manuscript.

ACKNOWLEDGMENTS
Appreciation to National Cancer Institute (NCI), the National Human Genome Research Institute (NHGRI) due to the joint efforts on The Cancer Genome Atlas (TCGA). We would like to express our gratitude to the contributors of GEO transcriptome datasets (GSE21510, GSE113513 and GSE32323). The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS.