A Five-MicroRNA Signature Predicts the Prognosis in Nasopharyngeal Carcinoma

Background There is no effective prognostic signature that could predict the prognosis of nasopharyngeal carcinoma (NPC). Methods We constructed a prognostic signature based on five microRNAs using random forest and Least Absolute Shrinkage And Selection Operator (LASSO) algorithm on the GSE32960 cohort (N = 213). We verified its prognostic value using three independent external validation cohorts (GSE36682, N = 62; GSE70970, N = 246; and TCGA-HNSC, N = 523). Through principal component analysis, receiver operating characteristic curve analysis, and C-index calculation, we confirmed the predictive accuracy of this prognostic signature. Results We calculated the risk score based on the LASSO algorithm and divided the patients into high- and low-risk groups according to the calculated optimal cutoff value. The patients in the high-risk group tended to have a worse prognosis outcome and chemotherapy response. The time-dependent receiver operating characteristic curve showed that the 1-year overall survival rate of the five-microRNA signature had an area under the curve of more than 0.83. A functional annotation analysis of the five-microRNA signature showed that the patients in the high-risk group were usually accompanied by activation of DNA repair and MYC-target pathways, while the patients in the low-risk group had higher immune-related pathway signals. Conclusions We constructed a five-microRNA prognostic signature, which could accurately predict the prognosis of nasopharyngeal carcinoma, and constructed a nomogram that could conveniently predict the overall survival of patients.


INTRODUCTION
Nasopharyngeal cancer (NPC) is a rare tumor, with the highest incidence in southeast Asia, such as southeast China, India, and Thailand (1). The main cause of treatment failure in NPC patients was the occurrence of local metastatic events, in which more than 70% of patients presented local progression (2). An accurate prognosis prediction can guide the selection of treatment options for patients with NPC. Currently, the tumor lymph node metastasis (TNM) system is still a widely recognized system for predicting prognosis (3). However, previous studies have found that, among NPC patients with the same tumor stage, even after receiving similar treatment regimens, the treatment outcomes are quite different (4). The current TNM staging system only focuses on the anatomical features of the tumor and ignores the biological heterogeneity of the tumor so that it cannot accurately predict the prognosis of individual patients (5). Therefore, it is urgent to find a new biomarker that could predict the prognosis independently or in combination with the TNM staging system so as to improve the accuracy of the prediction.
MicroRNA plays a significant role in the development of cancer and its therapeutic response. Recent studies have demonstrated the value of some microRNA in predicting the prognosis of NPC, and some microRNA signatures have been constructed to predict the prognosis of NPC (6)(7)(8). Studies have reported that Epstein-Barr virus (EBV)-related microRNA have potentially important effects on NPC. The in-depth study of microRNA encoded by EBV has potential significance and importance for the discovery of prognostic markers of NPC (9).
At present, Least Absolute Shrinkage and Selection Operator (LASSO) has been widely used to construct the prognostic model (10)(11)(12). Here we developed and validated a prognostic signature based on five microRNAs in four large cohorts (GSE32960, GSE36682, GSE70970, and TCGA-HNSC)-a total of 1,143 patients with NPC. The random forest and LASSO Cox regression algorithm were used to select the characteristics and quantify the Cox regression coefficient, to build a scoring model, which could not only accurately predict the prognosis and progress of NPC. An accurate nomogram was provided for clinicians to predict the prognosis of NPC patients.

Cohort Information and Data Preprocessing
We selected four large microRNA microarray/sequencing cohorts (GSE32960: N = 312, GSE36682: N = 62, GSE70976 (13): N = 246, and TCGA-HNSC: N = 523) from the Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/) and The Cancer Genome Atlas (TCGA; https://cancergenome. nih.gov/) database, all of which contained detailed prognostic follow-up data. The cohort (GSE32960) with the largest number of patients in the GEO database was selected as the training cohort, and the other three cohorts (GSE36682, GSE70976, and TCGA-HNSC) were selected as independent external validation cohorts. The GSE32960 cohort contained 312 non-metastatic NPC tissues and 18 non-cancerous nasopharyngitis tissues. The GSE36682 cohort contained 62 cases of NPC and six cases of normal tissues. The GSE70976 cohort contained 246 cases of NPC and 17 cases of normal tissues. We retained only NPC samples with prognostic follow-up data. The level 3 microRNA sequencing data of head and neck squamous cell carcinomas were downloaded from the UCSC Xena website (https://gdc. xenahubs.net/download/TCGA-HNSC.mirna.tsv.gz) based on the TCGA database. The cohort contained 523 head and neck squamous cell carcinoma samples (including NPC). The data normalization method adopted reads per million mapped reads. All data were log2(x + 1) transformed. The baseline information for all cohorts used in this study is listed in Table 1, and the flow chart of the experiment is shown in Figure 1.

Construct and Validate the Five-MicroRNA Signature
We first used the random forest algorithm to feature select all microRNA and found the microRNA which had the confirmed ability to distinguish the overall survival outcome (dead vs. alive). Next, we performed the LASSO Cox regression analysis using the R package "glmnet" (14) on training cohort GSE32960. The LASSO algorithm can reduce the dimensionality of high-latitude data and explain the characteristics of the data with a model with fewer variables (15). Tenfold cross-validation was used to avoid overfitting the model built with the training cohort. Finally, we constructed a scoring system based on the regression coefficients calculated by the LASSO cox regression analysis. We divided all NPC patients into high-and low-risk groups according to the cutoff values given by R package "survminer". The Kaplan-Meier survival curve was used to explore the prognostic value of the five-microRNA signature, the time-dependent receiver operating characteristic (ROC) curve was used to measure the prediction accuracy, and principal component analysis (PCA) was used to observe the ability of the model to distinguish the overall survival outcome events. The prognostic value of the five microRNAs was validated by three independent external validation cohorts (GSE36682, GSE70970, and TCGA-HNSC).

Functional Annotation of the Five-MicroRNA Signature
To carry out functional annotation with the five-microRNA signature, we downloaded RNA-seq data from the TCGA database. By calculating the correlation coefficient between risk score and all genes, the top 500 genes with the highest correlation were extracted for Gene Ontology, Kyoto Encyclopedia of Genes and Genomes, and gene set enrichment analysis (GSEA). Enrichment analysis and visualization were performed by the R package "ClusterProfiler" (16).

Statistical Analysis
For the statistical test of the two groups of data, if the data characteristics meet the normal distribution, the t-test was used; if the data characteristics do not meet the normal distribution, the non-parametric test (Wilcoxon test) was chosen. The R package "rms" was used to construct a nomogram. The R package "rmda" was used to perform decision curve analysis (DCA) to evaluate the net benefits of the prediction signature (17). P-value < 0.05 was considered statistically significant.

Feature Selection and Prognostic Signature Construction
We used random forest algorithm to select the features of all microRNAs in the training cohort GSE32960. Random forest is an algorithm that can effectively reduce the dimension and noise of high-latitude data (18). After selection, only five microRNAs (has-miR-93, has-let-7e, has-miR29c, has-miR-26a, and has-miR-30e) were selected as candidate features. Subsequently, we used the LASSO Cox regression algorithm to construct a prognostic signature based on the five microRNAs ( Figure 1). The calculation method of the scoring model is as follows: risk score = 1.088 * hsa-miR-93 + 0.673 * hsa-let-7e + (-0.299) * hsa-miR-29c + (-0.395) * hsa-miR-26a + (-0.927) * hsa-miR-30e. The gene symbols of microRNA in the formula represent the transcriptional expression of the microRNA.

The Prognostic Value of the Five-MicroRNA Signature Was Validated in Multiple Cohorts
According to the cutoff value given by R package "survminer"  (Figures 2A-D). In the external validation cohort GSE36682, we found that the five-microRNA signature has a high predictive value for the OS (HR = 2.87, 95% CI: 1.24-6.63, p = 0.0299) of NPC (Supplementary Figure S1A).
In the external validation cohort GSE70970, we found that the five-microRNA signature has a high predictive value for OS (HR = 2.32, 95%CI:   Figures S2A, B).
The time-dependent ROC curves in the GSE32960, GSE36682, GSE70970, and TCGA-HNSC cohorts showed that the five-microRNA signature had a high prediction accuracy. In particular, the area under the curve of prediction of 1-year OS was greater than 0.83 in three cohorts (GSE32960: 0.856, Figure 2E; GSE70970: 0.838, Figure 3F; and GSE36682: 0.898, Figure S1B). The PCA showed that the dominant features of the five microRNAs could significantly distinguish the prognostic outcome of OS ( Figure 2F). The heat map of the expression of the five microRNAs indicated that, with the increase of risk score, each microRNA has a significant trend of change, and the higher the risk score is, the more survival outcome events will occur ( Figures 2G, 3G, Supplementary Figures S1C, and S2).
We calculated the C-index of the prognostic model in all the cohorts and found that the C-index in all of the three cohorts of GEO was greater than 0.77, indicating the high consistency of the prediction model (Supplementary Table S3).

The Nomogram Was Constructed by Cox Regression Analysis
We used univariate and multivariate Cox regression analysis to analyze some prognostic factors, including risk score, in the   Figures 4A, B). Based on this result, we constructed a nomogram that conveniently predicted the OS of each NPC patient for 1, 3, and 5 years ( Figure 4C). We found through DCA and clinical impact curve that the risk score combined with N stage could improve the clinical net benefits (Figures 4D-F).

Subgroup Survival Analysis and Functional Annotation of the Five-MicroRNA Signature
We found that, in NPC patients with different tumor stages (III/ IV vs. I/II), T stages (III/IV vs. I/II), gender (male vs, female), age (>=65 vs. <65), whether or not to undergo radiotherapy and whether or not to undergo chemotherapy, the five-microRNA signature showed a remarkable ability to predict OS ( Figure 5).
In the TCGA head and neck squamous cell carcinoma cohort, a subgroup analysis based on anatomical site subdivision showed a little difference in the expression of the risk score in each site (Supplementary Figure S3).
To understand the signaling pathways and biological functions associated with the five-microRNA signature, we conducted an enrichment analysis of the 500 genes most associated with the risk score, and the results indicated that the five-microRNA signature was associated with T cell activation, B cell activation, regulation of T cell activation, regulation of lymphocyte activation, immune response, and other biological functions ( Figure 6D and Supplementary Table S1) as well as associated with MAPK, ether lipid metabolism, arachidonic acid metabolism, and ABC transporter, and other signaling pathways ( Figure 6D and Supplementary Table S2). The GSEA analysis results indicated that patients with a high-risk score were usually activated by DNA repair, MYC targets, E2F targets, mTORC1, oxidative phosphorylation, and other signaling pathways ( Figure 6E, Table 2). The NPC patients in the low-risk group were usually accompanied by a higher activation of INF-g, INF-a, KRAS, inflammatory response, apoptosis, and other related signaling pathways ( Figure 6F and Table 2). The risk score was significantly correlated with the signal values of these signaling pathways ( Figure 6C).
Next, the R package "miRNAtap" was used for functional enrichment analysis of each microRNA separately (has-miR-93, has-let-7e, has-miR29c, has-miR-26a, and has-miR-30e). It was found that each microRNA participated in different major biological functions, such as hsa-let-7e-which was mainly involved in cell cycle-related signaling pathways (cell cycle process, DNA replication, regulation of cell cycle process, cell cycle phase transition, etc.), hsa-miR-29cwhich was mainly involved in the regulation of gene expression-related signaling pathways, and hsa-miR-30ewhich was mainly involved in the regulation of metabolicrelated biological events (carboxylic acid metabolic process, coenzyme metabolic process, ribonucleotide metabolic process, etc.; Supplementary Table S4).

Five-MicroRNA Signature Is Associated With Chemotherapy Response and Immune Checkpoint
An analysis of the clinical efficacy of the TCGA-HNSC cohort e x h i b i t e d t h a t , a m o n g N P C p a t i e n t s u n d e r g o i n g chemotherapy, the risk score of nonresponse patients was significantly higher than that of patients with response (p = 0.0021), indicating that the five-microRNA signature is a potential prediction target of chemotherapy ( Figure 6A). At the same time, we found that the risk score had no predictive value for the radiotherapy response (p = 0.7472, Figure 6B). By calculating correlations between risk scores and major immune checkpoints, we found that the risk scores were significantly negatively correlated with PD-L1 and PD-1, but not with TMB (Supplementary Figure S4). Moreover, the risk score was also correlated with tumor stage and T stage, while N stage was not (Supplementary Figure S4).

DISCUSSION
NPC is a kind of malignant head and neck cancer (20,21). At present, the development of comprehensive treatment methods of radiotherapy and chemotherapy has significantly improved the prognosis and outcome of NPC patients. However, NPC is still considered a malignant and invasive tumor due to its high aggressiveness, delayed diagnosis, and relatively poor prognosis (22). MicroRNAs are endogenous non-coding RNA families of approximately 22 nucleotides in length (23). miRNAs are believed to have important effects on the development, progression, and response to treatment of tumors (24). Currently, researchers have found that microRNA plays a crucial role in the development of NPC. In recent years, studies on the pathogenesis and development of microRNA in NPC and its prognosis have been developing rapidly (9,(25)(26)(27)(28). A metaanalysis identified 65 microRNAs as potential prognostic markers for NPC, providing new targets for future studies (29).
Currently, there are some prognostic signatures of microRNA in NPC (6,7,30,31). Two miRNAs in the five-miRNA signature, has-miR29c and has-miR-30e, have been studied in nasopharyngeal carcinoma (32)(33)(34). It is possible that these two miRNAs have a particularly significant predictive role in the prognosis of nasopharyngeal carcinoma, so they are also used in other signatures. However, most of these studies ignore the analysis of subgroups and cannot prove that the prognostic signature still has a predictive value in certain subgroups of NPC patients. Our study conducted a subgroup analysis on the prognostic signature of five microRNAs and found that the five-microRNA signature showed a good prognostic value in each subgroup (different stages, grades, whether chemotherapy or radiotherapy, etc.), indicating that the five-microRNA signature could accurately predict the prognosis of various groups of NPC patients. At the same time, our study added a large number of cohorts to verify the results so as to avoid the overfitting effect and ensure that the model can exert its efficiency stably.
Resistance to chemotherapy is currently a major concern in the treatment of NPC. Recently, many studies have found that microRNAs are key molecules in the production of resistance to chemotherapy (35,36). Our results indicated that the risk score based on the five-microRNA signature was negatively correlated with the efficacy of chemotherapy. We could predict the chemotherapy sensitivity of NPC patients according to the risk score so as to propose specific treatment plans for each NPC patient. However, due to the small number of cohorts with documented chemotherapy responses, our results were not validated in more cohorts. We will collect our chemotherapy response data in the future to verify our conclusions.
Through the analysis of the function annotation, we found that the higher risk score tends to the activation of cell cyclerelated signal pathways, and the low-risk group of patients with significant immune response related to activating signaling pathways; it also provides a new way of thinking for the treatment of NPC patients, namely, the high-risk group of patients can try drugs with a targeted cell cycle, and the low patriarch of patients can try immune checkpoint inhibitors. This part of the conjecture needs further verification, which is our future research direction.
In addition, according to the univariate and multivariate Cox regression analyses, we found that the risk score and N stage could be independent of other prognostic factors to predict the prognosis of patients with NPC. On this basis, we constructed a nomogram, which could be convenient to predict the 1, 3, and 5year overall survival in patients with NPC. The DCA analysis found that the joint risk score and N stage could have a higher clinical net benefit.
In summary, through four big miRNA data cohorts, with a total of more than 1,143 NPC patients, we develop and validate a five-microRNA prognostic signature. It could accurately predict the prognosis of patients with NPC and progress by constructing a nomogram, combined with risk score and N stage, to accurately predict the prognosis of patients with NPC. Besides this, based on the risk score, a potential layered treatment scheme for NPC was proposed.

AUTHOR CONTRIBUTIONS
SW and SL conceived and designed the study. SH involved in most of the revision of the manuscript. CZ and SW performed the analysis procedures. SW, SH, CZ, JX, and SL analyzed the results. SW, SH, JX, and SL contributed the analysis tools. SW, SH, CZ, JX, and SL contributed to the writing of the manuscript.
All authors contributed to the article and approved the submitted version.

FUNDING
This work was supported by the National Natural Science Foundation of China (81902781).

ACKNOWLEDGMENTS
Thanks to the open resources of the TCGA and GEO databases. These have provided a valuable resource for our analysis.