Development of a Novel Six-miRNA-Based Model to Predict Overall Survival Among Colon Adenocarcinoma Patients

Introduction: Colon carcinoma is a common malignant tumor worldwide. Accurately predicting prognosis of colon adenocarcinoma (CA) patients may facilitate clinical individual decision-making. Many studies have reported that microRNAs (miRNAs) were associated with prognosis for patients with colon carcinoma. This study aimed to identify the prognosis-related miRNAs for predicting the overall survival (OS) of CA patients. Methods: Firstly, we analyzed the CA datasets from the Cancer Genome Atlas (TCGA), and looked for the prognosis-related miRNAs. Then, we developed a novel prediction model based on these miRNAs and the clinical characteristics. Time-dependent receiver operating characteristics (ROC) curves and calibration plots were used to evaluate the discrimination and accuracy of the signature and model. Finally, cell function assays and bioinformatics analyses were performed to evaluate the role of these selected miRNAs in modulating biological process in CA. Results: Six prognosis-related miRNAs were included in the miRNA-based signature, and it could effectively distinguish low-risk patients and high-risk patients. Furthermore, we established a prognostic model incorporating the six-miRNA-based signature and clinical characteristics. Areas under curves (AUCs) indicated that the six-miRNA-based model has a better predictive ability than TNM stage (AUC: 0.805 vs. 0.694). The calibration plots suggested close agreement between model predictions and actual observations. GO analysis showed that the target genes of these miRNAs are mainly involved in enrichment in protein binding and regulation of transcript and cytosol. KEGG pathway enrichment analysis indicated that these genes were mainly enriched in PI3K-Akt signaling pathway. Finally, we found that the five miRNAs except miR-152 were upregulated in tumor tissues and CA cells. The functional experiments revealed that miR-1245a, miR-3682, miR-33b, and miR-5683 promoted the migratory abilities and proliferation of CA cell, whereas miR-152 showed opposite effects. However, miR-4444-2 did not influence the migratory ability and proliferation of CA cell. Conclusions: In conclusion, we developed a novel six-miRNA-based model to predict 5-year survival probabilities for CA patients. This model has the potential to facilitate individualized treatment decisions.

Introduction: Colon carcinoma is a common malignant tumor worldwide. Accurately predicting prognosis of colon adenocarcinoma (CA) patients may facilitate clinical individual decision-making. Many studies have reported that microRNAs (miRNAs) were associated with prognosis for patients with colon carcinoma. This study aimed to identify the prognosis-related miRNAs for predicting the overall survival (OS) of CA patients.
Methods: Firstly, we analyzed the CA datasets from the Cancer Genome Atlas (TCGA), and looked for the prognosis-related miRNAs. Then, we developed a novel prediction model based on these miRNAs and the clinical characteristics. Time-dependent receiver operating characteristics (ROC) curves and calibration plots were used to evaluate the discrimination and accuracy of the signature and model. Finally, cell function assays and bioinformatics analyses were performed to evaluate the role of these selected miRNAs in modulating biological process in CA.
Results: Six prognosis-related miRNAs were included in the miRNA-based signature, and it could effectively distinguish low-risk patients and high-risk patients. Furthermore, we established a prognostic model incorporating the six-miRNA-based signature and clinical characteristics. Areas under curves (AUCs) indicated that the six-miRNA-based model has a better predictive ability than TNM stage (AUC: 0.805 vs. 0.694). The calibration plots suggested close agreement between model predictions and actual observations. GO analysis showed that the target genes of these miRNAs are mainly involved in enrichment in protein binding and regulation of transcript and cytosol. KEGG pathway enrichment analysis indicated that these genes were mainly enriched in PI3K-Akt signaling pathway. Finally, we found that the five miRNAs except miR-152 were upregulated in tumor tissues and CA cells. The functional experiments revealed that miR-1245a, miR-3682, miR-33b, and miR-5683 promoted the migratory abilities and proliferation of CA cell, whereas miR-152 showed opposite effects. However, miR-4444-2 did not influence the migratory ability and proliferation of CA cell.

INTRODUCTION
Colon carcinoma is a common malignant tumor worldwide, which is the fourth leading cause of cancer-related deaths (1). Among all histological types, colon adenocarcinoma (CA) is the most common one with a poor prognosis (2). Although the therapeutic strategies for CA have been greatly improved, the 5-year overall survival (OS) rate remained poor. The biomarkers for CA, such as CA199 and CEA, lack good specificity and sensitivity for recognizing all primary and recurrent lesions for all patients. In addition, TNM stage, as a common tool for the prognostic assessment mainly on the basis of clinical characteristics, could not display the biological heterogeneity of CA. In clinical practice, accurately predicting OS for CA patients may facilitate clinical individual decisionmaking. Therefore, accurate prediction tools integrating the clinical features and pathological and molecular findings are much desired.
MicroRNA (miRNA), a small non-coding RNA with 18-25 nucleotides, can regulate gene expression in various carcinomas (3). According to current reports, several of these miRNAs are important to CA patients because of their usefulness in making diagnoses, evaluation of treatment responses, as well as the prediction of prognosis (4-7). However, different miRNA platforms were explored in these previous studies with limited patient samples, suggesting that they lack normalized standard of miRNAs. A series of novel prediction models based on miRNAs were developed in several cancers, including breast cancer, esophageal squamous cell cancer, and acute myeloid leukemia (8)(9)(10). These models indicated that miRNAs play important and valid roles in predicting the prognosis of cancer.
In this study, we mined the CA datasets from the Cancer Genome Atlas (TCGA) and screened the prognosis-related miRNAs. Then, we developed and validated a novel prediction model based on these prognosis-related miRNAs and clinical characteristics. Finally, bioinformatics analyses and cell function assays were performed to evaluate the role of these selected miRNAs in modulating biological process in CA.

Patients and Design
In the present study, we acquired the counts of CA miRNA expression profiles from TCGA data portal in May 2019, and available miRNAs were 1,881. The eligibility criteria used to screen the included patients were as follows: (1) patients with certain TNM stage; (2) histologically confirmed CA; and (3) OS time was more than 3 months. Finally, CA patients with clinical characteristics including age, sex, T stage, N stage, and TNM stage were analyzed in the present study and defined as training cohort. Additionally, half of the patients were assigned as the validation cohort based on a computer-generated allocation sequence randomly.

Risk Score Formula Generation and Development of miRNA-Based Prognostic Model
We normalized the miRNA expression profiles through R/Bioconductor package of edger. All miRNAs with FDR < 0.05 and |log2FC| ≥ 2 were defined as differently expressed miRNAs (DEMs). We used Cox proportional hazards regression analysis to screen for miRNAs. Significant miRNAs were included in the multivariate Cox proportional hazards regression model. The coefficients of significant miRNAs from multivariate Cox analysis was used to build the risk score formula: Risk score = sum of coefficients × miRNA expression level.
Next, the CA patients were classified into low-risk group and high-risk-group using the ROC curve with the optimal cutoff risk score. Furthermore, we developed a prognostic nomogram that incorporated clinical characteristics and miRNA-based risk score with the COX regression model.

Evaluation of miRNA-Based Signature and Novel Prediction Model
To test whether miRNA-based signature was related to OS independent of TNM stage, stratified analysis was performed. Additionally, we used time-dependent receiver operating characteristic (ROC) curves and calibration plots to assess the prognostic models' discriminative ability and accuracy, respectively. Calibration plots were used to measure the agreement between the actual and predicted probabilities. The predictive ability of miRNA-based prognostic model was compared with six-miRNA-based signature and TNM stage using ROC curves.

Enrichment Analysis of miRNAs' Target Genes
Candidate target genes of prognostic miRNAs were obtained from miRDB, TargetScan, and miTarBase. Overlapping genes in the three online resources are selected as miRNAs' target genes. The interaction network of interactions between miRNAs and target genes was visualized using Cytoscape (11). At last, GO (Gene Ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway enrichment analyses were carried out using DAVID 6.8 (Database for Annotation, Visualization and Integrated Discovery).

Total RNA Extraction and Quantitative Real-Time PCR
We extracted RNA sample with RNAiso Plus (TaKaRa, Japan) according to the manufacturer's instructions, and the measure of concentration and purity was completed by NanoDrop 2000 (Thermo Scientific, Wilmington, DE, USA). All cDNA was generated using PrimeScript RT reagent Kit (TaKaRa, Japan). Quantitative real-time PCR (qRT-PCR) for miRNAs was performed on a LightCycler R 96 System (Roche, Switzerland).

Cell Culture
CA cell lines (DLD-1 and SW480) and normal colon mucosal epithelial cell line (NCM460) were cultured in DMEM medium with 10% fetal bovine plasma at 37 • C in a humidified atmosphere with 5% CO 2 , and they were transfected with miRNA inhibitors for 48 h before cell function experiments.

Transwell Migration Assays
The transwell assays were used to evaluate the migration of DLD-1 and SW480 cells using transwell chambers. In test, 2 × 10 5 cells in serum-free medium were added into the upper chambers. DMEM medium with 10% fetal bovine serum was added to the lower chambers. After incubation for 24 h, the CA cells migrated into the lower chambers were fixed in 4% paraformaldehyde and strained with a crystal violet staining solution. Random fields were digitally imaged and counted.

Colony Formation Assays
We used the colony-forming assays to evaluate clonogenic ability of transfected DLD-1 and SW480 cells. Cells were seeded into 6-well plates (1,000/well) and incubated for about 15 days. The visible colonies were counted after staining with crystal violet.

Statistics
Descriptive analyses of baseline clinicopathological features were conducted. Continuous variables were reported using the mean and standard deviation (SD), and categorical variables were reported as percentages. In developing the nomogram for predicting the 5-year OS probability, we used univariate and multivariate Cox proportional hazards regression model to screen for predictors in the training cohort. OS was defined as the interval from surgery to the date of death. Then, the nomogram was developed based on the significant predictors. The nomogram's discriminative ability and accuracy were assessed by time-dependent receiver operating curve (ROC) analysis and calibration plots, respectively. Calibration plots were used to measure the agreement between the actual and predicted probabilities. P < 0.05 was considered statistically significant. We draw the volcano plot using the "ggplot2" package of R software. Data analyses were performed using Stata version 13.1 (StataCorp, College Station, TX), and the nomogram was developed using R (version 3.2.4; R Foundation for Statistical Computing, Vienna, Austria).

RESULTS
By the eligibility criteria, 321 CA patients from TCGA were included in training cohort. For further analyses, 161 patients were divided randomly into the validation cohort. In Table 1, there were the baseline features of patients in the training cohort and the validation cohort. The 5-year OS rate of the total patients was 74.3%.

Screen the Prognostic miRNAs Related to OS
We obtained miRNA sequencing data of samples from TCGA database, including CA tissue samples and normal tissue samples. Next, through the R/Bioconductor package of edgeR, 190 DEMs with false discovery rate (FDR) < 0.05 and |log2fold change (log2FC)| ≥ 2 were suggested to be significant for further analyses. The volcano plot of all miRNAs was presented using the "ggplot2" package of R software (Figure 1). To screen the prognostic miRNAs, 190 DEMs were performed in univariate COX analysis. Then, 42 miRNAs that were significant in the univariate COX analysis were entered into multivariate COX analysis. As a result, six DEMs (hsa-miR-1245a, hsa-miR-3682, hsa-miR-4444-2, hsa-miR-5683, hsa-miR-33b, and hsa-miR-152) were confirmed as independent prognosis factors of CA patients in the training cohort ( Table 2).

Construction of Risk Score Signature and Six-miRNA-Based Prognostic Model
The coefficients of significant miRNAs from multivariate Cox analysis were used to build the risk score formula: Risk score = sum of coefficients × miRNAs expression level. Therefore, the risk score signature was calculated using the formula below:  calculated. Then, we used ROC curve to determine the optimal cutoff risk score for stratifying the patients into risk groups. When the Youden index was highest, the risk score is 1.58 with a sensitivity and specificity of 56.25% and 80.22%, respectively. Therefore, the patients were divided into a high-risk group (n = 81) and a low-risk group (n = 240) in the training cohort.
With the cutoff value, the patients in the validation cohort were also divided into a high-risk group (n = 35) and a low-risk group (n = 126). The predictive value of six-miRNA-based signature in OS was detectable through Kaplan-Meier curve of two cohorts as shown in Figure 2. Patients with high risk have a poorer survival than the low-risk group in training cohort (P < 0.001) and validation cohort (P = 0.0157  Table 3). As a result, a novel six-miRNA-based prognostic model to predict the 5year OS rate was developed based on the above three variables (Figure 3). In the nomogram, it showed that six-miRNA-based signature and TNM stage were the largest contribution to 5-year OS of CA patients. To use this nomogram, we use the "point" scale to estimate the points for each variable by drawing a vertical line. Then, the "Total points" scale was used to estimate the corresponding 5-year OS of this patient.

Assessment of the Six-miRNA-Based Signature and Novel Six-miRNA-Based Prognostic Model
Risk stratification in patients was performed to assess whether the six-miRNA-based signature could predict OS regardless of TNM stage. The patients with high risk have significantly poorer survival than patients with low risk in TNM stage II (P = 0.0027) and TNM stage III (P = 0.0130) (Figure 4). Then, to evaluate the discrimination of the six-miRNA-based signature and prognostic nomogram, time-dependent receiver operating  characteristic (ROC) curves were used to compare the respective area under the curves (AUC). As a result, AUCs of six-miRNAbased signature were 0.724 and 0.716 in the training cohort and the validation cohort, respectively (Figures 5A,B). Furthermore, we found that there was no significant difference between the T stage, N stage, and risk groups ( Table 4). In addition, the AUCs of six-miRNA-based prognostic model were 0.805 and 0.763 in the training cohort and the validation cohort, respectively (Figures 5C,D). Furthermore, the calibration plots showed good agreement between the predicted and actual OS rates in two cohorts when using the new model (Figure 6). These results indicated that the novel six-miRNA-based prognostic model had a favorable discrimination and accuracy prediction for CA patients. Importantly, the AUCs indicated that the six-miRNAbased prognostic model (AUC: 0.805) had better discrimination performance for CA than six-miRNA-based signature (AUC: 0.724) and TNM stage (AUC: 0.694) (Figure 7).

MicroRNA-Target Genes Co-expression Network
Potential connections between microRNAs and target genes were explored by using Cytoscape. As shown in Figure 8, hsa-miR-152 and hsa-miR-33b were the two largest nodes in the network.

GO and KEGG Pathway Analyses of Predicted Target Genes
Using miRDB, TargetScan, and miRTarBase, 164 target genes of these six miRNAs were predicted. The GO molecular function (MF) enrichment analysis showed that these genes   are dominated by functions of protein binding, poly(A) RNA binding, and enzyme binding ( Figure 9A). The GO biological process (BP) enrichment indicated that these genes are mainly involved in regulation of transcript, regulation of apoptotic process, and regulation of gene expression (Figure 9B). The enriched GO cell complement (CC) term of these genes included cytosol, nucleus, and nucleoplasm ( Figure 9C). KEGG pathway enrichment analysis indicated that these genes were mainly enriched in PI3K-Akt signaling pathway and FoxO signaling pathway ( Figure 9D).

miRNAs in Risk Score Modulate Biological Processes in CA Cell Line
First, we evaluated the expressions of the six miRNAs in 20 pairs in tissues and found that miR-1245a, miR-3682, miR-444-2, miR-5683, and miR-3b were higher in tumor tissues   than normal tissues, while miR-152 was low in tumor tissues ( Figure 10A). Additionally, in DLD-1 and SW480 cells, miR-1245a, miR-3682, miR-444-2, miR-5683, and miR-3b except miR-152 were significantly high in two CA cell lines normalized to NCM-460 cells (Figure 10B). The six miRNAs were evaluated by cell function assessment in CA cell lines (DLD-1 and SW480). After transfection with the miRNA inhibitors, transwell assays and colony assays were performed. Transwell assays showed that knockdown of miR-1245a, miR-3682, miR-33b, and miR-5683 inhibited the migratory abilities of DLD-1 cells and SW480 cells, whereas knockdown of miR-152 had opposite effects. However, the results showed that miR-4444-2 did not influence the cell's migratory ability (Figures 11A,C). In colony assays, we found that, compared to the negative control, down-regulation of miR-1245a, miR-3682, miR-33b, and miR-5683 inhibited the CA cell proliferation, while the colony was promoted by the knockdown of miR-152. Interestingly, miR-4444-2 still had no effects in cell proliferation (Figures 11B,D).

DISCUSSION
For the therapy of colon carcinoma, there were few reliable and effective prognostic biomarkers, let alone accurate model for predicting clinical outcome. The prediction of OS in these patients may facilitate individualization of the clinical decisionmaking process, such as the precise selection of patient pollution for treatment of neo-adjuvant chemotherapy. In this study, we aimed to explore the biological value of miRNAs in CA and tried to develop a prediction model using miRNA expression. In brief, we screened 190 DEMs through comparing tumor tissue with normal tissue in TCGA dataset. Finally, six out of them (miR-1245a, miR-3682, miR-4444-2, miR-5683, miR-33b, and miR-152) were screened as prognosis-related miRNAs and were used to develop a risk score signature. Multivariate COX analysis indicated that this risk score signature was an independent prognosis factor. Then, we built a novel model based on six-miRNA-based signature and other clinical characteristics. Currently, TNM staging system could provide prognostic information and treatment option. The present study indicated that TNM staging system was a strong independent prognostic factor in the multivariable Cox analysis, which was consistent with previous studies (12). Furthermore, in stratification analysis, the prognostic value of six-miRNA-based signature was revealed to be independent of TNM stage for CA. In addition, TNM staging system cannot provide individualized estimations. Therefore, the six-miRNA-based prognostics model was designed as an easy-to-use nomogram for patients and physicians. Compared to the TNM staging system, our new model has better discrimination and accuracy, suggesting that this model might serve as a potential predictive tool for CA patients. With the more accurate prediction of OS, physicians may be able to provide more accurate recommendations to CA patients.
In the previous studies, although several prognostic models have been developed (13-16), a few models associated with miRNAs were reported. Among these models, a previous study has reported a prognostic miRNA model for predicting OS for CA (16). However, compared with this model, our model has several advantages. Firstly, we selected candidate miRNAs in the DEMs that were analyzed by the LIMMA analysis (FDR < 0.05 and |log2FC| ≥ 2) and then six of these miRNAs were screened from the univariate and multivariate COX analysis, rather than screening directly the candidate variables from a total of 1,881 miRNAs though statistics analysis. Secondly, our study used AUCs and calibration plots to evaluate the discrimination FIGURE 10 | miR-1245a, miR-3682, miR-444-2, miR-5683, and miR-3b were higher in tumor tissues than normal tissues, while miR-152 was low in tumor tissues (A). miR-1245a, miR-3682, miR-444-2, miR-5683, and miR-3b, except miR-152 were significantly high in DLD-1 and SW480 cells normalized to NCM-460 cell (B). *P < 0.05, **P < 0.01, ***P < 0.001 (Student' t-test). and accuracy of the novel model in the training cohort and the validation cohort. Briefly, in our study, the results indicated that our new model has a good predictive ability in two cohorts. Although the patients in the validation cohort came from primary patients, they were assigned randomly as the validation cohort based on a computer-generated allocation sequence as previously described (10), suggesting that the validation cohort was valid in this study. Thirdly, we further used experiment to explore the role of these six miRNAs in the biological processes in CA.
Among these six prognosis-related miRNAs, miR-1245a, miR-33B, and miR-152 were reported in previous studies for multiple cancers. For miR-1245, Yang et al. reported that it could promote proliferation and invasion of lung cancer cells though targeting BRCA2 (17). In addition, several studies have shown that miR-152 could suppress various cancer types (18)(19)(20). For example, Feng et al. suggested that miR-152 could inhibit cell growth in prostate cancer progression (21). Interestingly, several studies found that miR-33b was a tumor suppressor (22,23). However, miR-33b showed its tumorigenesis and was associated with poor survival in our studies. To date, there were no studies focused on miR-33b in CA. Therefore, further studies were needed to explore the biological process of miR-33b in CA. The other three miRNAs have not been reported in previous studies. Our cell assays indicated that both miR-3682 and miR-5683 could promote proliferation and migration of CA cells, implying that these two prognosis-related miRNAs may employ their biological processes in CA. Moreover, our study showed that miR-4444-2 has no effect on CA cell growth. However, STC2, its predicted target gene, has been revealed in published studies about its tumorigenesis in colorectal carcinoma (24)(25)(26), which implied that miR-4444-2 may regulate STC2 though a certain mechanism in CA. Although the precancerous conditions of colon carcinoma, such as ulcerative colitis and Crohn's disease, are the risk factors of colon carcinoma, there were no studies reporting that these six miRNAs correlate with precancerous conditions (27,28).
Several limitations exist in our studies. Firstly, like the previous studies, detail systematic therapy data, such as surgery, chemotherapy, and radiotherapy, were not accessible, and whether incorporating these factors into our model would improve its performance is unknown. Secondly, the sample size in this study was medium. Therefore, larger multicenter validation should be performed to verify this novel model before application in routine clinical practice.

CONCLUSIONS
In conclusion, we developed a novel model based on six-miRNA score signature and clinical features to predict 5-year survival probabilities for CA patients. This model had higher prognostic value than TNM stage in CA patients. Therefore, this model has the potential to facilitate individualized treatment decisions for CA patients.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study, these can be found in The Cancer Genome Atlas (https://portal.gdc.cancer. gov/).