A Novel TCGA-Validated, MiRNA-Based Signature for Prediction of Breast Cancer Prognosis and Survival

Breast cancer (BC) is the most common cancer affecting women and the leading cause of cancer-related deaths worldwide. Compelling evidence indicates that microRNAs (miRNAs) are inextricably involved in the development of cancer. Here, we constructed a novel model, based on miRNA-seq and clinical data downloaded from The Cancer Genome Atlas (TCGA). Data from a total of 962 patients were included in this study, and the relationships among their clinicopathological features, survival, and miRNA-seq expression levels were analyzed. Hsa-miR-186 and hsa-miR-361 were identified as internal reference miRNAs and used to normalize miRNA expression data. A five-miRNA signature, constructed using univariate and multivariate Cox regression, was significantly associated with disease-specific survival (DSS) of patients with BC. Kaplan–Meier (KM) and receiver operating characteristic (ROC) analyses were conducted to confirm the clinical significance of the five-miRNA signature. Finally, a nomogram was constructed based on the five-miRNA signature to evaluate its clinical value. Cox regression analysis revealed that a five-miRNA signature was significantly associated with DSS of patients with BC. KM analysis demonstrated that the signature could efficiently distinguish high- and low-risk patients. Moreover, ROC analysis showed that the five-miRNA signature exhibited high sensitivity and specificity in predicting the prognosis of patients with BC. Patients in the high-risk subgroup who received adjuvant chemotherapy had a significantly lower incidence of mortality than those who did not. A nomogram constructed based on the five-miRNA signature was effective in predicting 5-year DSS. This study presents a novel five-miRNA signature as a reliable prognostic tool to predict DSS and provide theoretical reference significance for individualized clinical decisions for patients with BC.


INTRODUCTION
Globally, breast cancer (BC) is the leading cause of cancer mortality among women. It is estimated that 2.1 million new BC cases were diagnosed worldwide in 2018, accounting for almost one in four cancer cases (Bray et al., 2018). BC is clinically categorized into three types based on tumor hormonal status: luminal-like, human epidermal growth factor receptor 2 (HER2)positive, and triple-negative BC. Luminal-like and HER2-positive BC are driven by the hormone receptors and HER2, while triplenegative BC does not express receptors for either hormones or human epidermal growth factor. The classification of BC by hormonal status has important therapeutic implications, as luminal-like cases may be treated using hormonal therapy, which is ineffective against HER2-positive and triple-negative breast cancer (Slamon et al., 1987;Fisher et al., 1989;Perou, 2011). Although multiple prognostic tools, including Oncotype Dx, MammaPrint, Theros, MapQuant Dx, and PAM50, have been developed for routine prognostic application in luminal-like BC, none is capable of predicting prognosis in all types of BC (Krop et al., 2017). While BC hormonal receptor status, HER2 status, tumor size, and the extent of lymph node metastases are independent prognostic indicators, these factors alone are inadequate guides for personalized BC treatment.
MicroRNAs (miRNAs) are small non-coding RNA molecules that regulate gene expression (Liu et al., 2014). Multiple studies have shown that miRNA function has a critical role in tumorigenesis, progression, and therapy (Rupaimoole and Slack, 2017). For instance, high expression levels of hsa-miR-21 are associated with specific BC clinical features, including advanced tumor stage, lymph node metastasis, and shortened patient survival (Yan et al., 2008). In addition, hsa-miR-210 and hsa-miR-221 have been linked with BC invasion and poorer prognosis of patients with BC (Volinia et al., 2012). Other miRNAs associated with prognosis in BC include hsa-miR-155, hsa-miR-206, hsa-miR-133b, hsa-miR-10b, let-7b, hsa-miR-30a, and hsa-miR-505 Cheng et al., 2012;Ma et al., 2014;Parrella et al., 2014;Quan et al., 2018;Wang et al., 2019;Yao et al., 2019). However, many previous studies have been limited by small numbers of participants or a lack of extensive study scope, which has limited the translation of their findings into clinical application; for instance, the research results of the hsa-miR-30a only apply to triple-negative breast cancer (Cheng et al., 2012), and the study of the hsa-miR-10b has scarce representation in the population of cases classified as low risk by the Nottingham Prognostic Index (Parrella et al., 2014).
Recent research shows that multiple-miRNA signature (model) could predict the prognosis of some cancer; for instance, four-miRNA classifier is a reliable prognostic prediction tool for overall survival (OS) in lymph node-positive locoregional esophageal squamous cell carcinoma (Wen et al., 2021), a six-miRNA-based classifier is a reliable prognostic and predictive tool for disease recurrence in patients with stage II colon cancer (Zhang et al., 2013), and a robust six-miRNA prognostic signature (Zhao and Cui, 2020) and a novel seven-miRNA prognostic model are reliable to predict OS (Lu et al., 2019) for head and neck squamous cell carcinoma. It is worth mentioning that there were three multi-miRNA prognostic models for breast cancer. Lai et al. (2019) had constructed a novel six-microRNAbased model to predict OS, Tang et al. (2019) constructed a 17-miRNA-based model for OS and a 13-miRNA-based model for recurrence-free survival, and Lu et al. (2020) constructed a three-miRNA-based model for OS. However, all of the three multi-miRNA prognostic models for breast cancer lack a comprehensive analysis and do not identify the risk subgroup patients to tailor adjuvant chemotherapy.
The Cancer Genome Atlas (TCGA) 1 project has applied extensive genomic sequencing and bioinformatics analyses to catalog human cancer-causing mutations in large cohorts, and the resulting datasets are publicly available (Tomczak et al., 2015). In this study, we took advantage of miRNA-seq and clinical datasets from TCGA for use in constructing a novel prognosis prediction model for disease-specific survival (DSS) in patients with BC.

MiRNA-Seq and Clinical Datasets
Level 3 miRNA-seq data and accompanying clinical datasets were obtained through TCGA Genomic Data Commons portal (GDC, see text footnote 1). Samples from 962 female patients with BC cataloged by TCGA, and for which clinical features and miRNAseq data were available, were selected for further analyses. The BC dataset from TCGA used was last updated on June 13, 2018. The results reported here are completely based on data generated by TCGA Research Network.

Expression Data Normalization
Perl Critic software was used to merge miRNA-seq files and the Normalize Quantiles (edgeR, R package) was used to normalize miRNA expression levels. A total of 1,601 miRNAs were filtered out. Reference miRNA candidates were selected if they met all of the following inclusion criteria (Zhan et al., 2014): (1) Mean (normal) > 100 and mean (tumor) > 100 (2) Standard deviation (normal) Mean (normal) < 0.5 and Standard deviation (tumor) Ultimately, hsa-miR-186 and hsa-miR-361 were chosen as internal reference molecules ( Table 1) and used to normalize miRNA expression levels, based on the following formula: The logarithm of the data is convenient for statistical analysis (Olivier et al., 2008;Sorrentino, 2010), which can reduce the absolute value of the data and facilitate calculation. Logarithm will not change the nature and correlation of the data but only  for control (x-axis) and experimental (y-axis) groups. Red and green points correspond to twofold-change up/down, respectively (P < 0.05). (B) The pheatmap R package was used to plot expression levels of the differentially expressed miRNAs in breast cancer (BC) and adjacent tissue; red and green indicate that miRNA was expressed at high or low levels, respectively.

Risk Prognostic Model Construction and Evaluation
The dataset (N = 962) was used to develop a clinical prediction model, and half of the dataset (N = 481) were randomly selected as a validation dataset, which was used for verification of the accuracy of the miRNA-based prognostic model as a predictor of DSS of patients with BC. Univariate and multivariate Cox proportional hazard models were constructed as risk prognostic models and area under the curve (AUC) analysis conducted to evaluate their accuracy.

Statistical Analyses
Statistical analyses were conducted using SPSS statistical suite version 25.0, ActivePerl version 5.26.3, and R version 3.6.1. Statistical significance was defined as a two-sided P-value or adjusted P-value < 0.05, except in univariate Cox proportional hazard analysis, where P < 0.1 was considered significant. The main outcome in this study was DSS, and the event time of DSS was defined according to the guidelines for time-to-event end point definitions in BC trials (Gourgou-Bourgade et al., 2015). The DSS event times for the individual patients enrolled in this retrospective study were manually retrieved from TCGA clinical records and a previous study (Thorsson et al., 2018).
MiRNAs with prognostic value identified using univariate Cox proportional hazards models (P < 0.1) were further analyzed by multivariate Cox regression (default settings: backward, conditional, and entry 0.05; removal 0.10), and miRNAs identified by this analysis were used to construct a formula for calculation of prognostic risk scores. The results of univariate and multivariate Cox regression are presented as hazard ratio (HR) with 95% confidence interval (CI). Receiver operating characteristic (ROC) curve and distance on curve [DOC, equaling the square root of (1 − Sensitivity) 2 + (1 − Specificity) 2 ] were performed to calculate cut-off points. Next, BC cases were categorized as "low risk" or "high risk" based on risk score higher or lower than the cut-off point. Chi-square analysis was used to assess the correlation between BC clinicopathological features and risk subgroups. Kaplan-Meier (KM) analyses were applied to generate survival curves, and log rank test was used to establish the significance of differences between curves. A prognostic nomogram to predict individual survival based on an miRNA signature and clinical risk factors was constructed by Cox regression. The accuracy of the risk prognostic model was tested using AUC (95% CI) values.

Identification of MiRNAs Associated With DSS in Patients With BC
A total of 1,601 miRNAs were filtered out, and 75 miRNAs were detected as differentially expressed using the edgeR package, including 14 up-and 61 down-regulated miRNAs ( Figure 1A), and the expression levels of the 75 miRNAs were shown using the pheatmap package ( Figure 1B). Further, univariate Cox proportional hazard analysis identified 23 miRNAs with prognostic value, five of which (hsa-miR-574, hsa-miR-30b, hsa-miR-224, hsa-miR-210, and hsa-miR-130a) were determined by multivariate Cox regression to be the optimum prognostic model for predicting DSS risk in patients with BC ( Table 2). Risk scores were calculated using the formula: indicating that hsa-miRNA-574, hsa-miRNA-224, and hsa-miRNA-210 were associated with higher risk scores, whereas hsa-miRNA-30b and hsa-miRNA-130a were linked with lower risk scores. Expression levels of the five miRNAs are shown separately in Figure 2. The risk score, −0.197, was calculated as the cut-off point via ROC curve and DOC (DOC = 0.436) ( Table 3). ROC curves for various risk score cut-off points are shown in Table 3. Using this model, patients were grouped into high-risk (N = 338) and low-risk (N = 624) groups, according to the cut-off point.

Clinicopathological Features
A total of 962 female cases with BC recorded in TCGA were extracted for analyses in this study. The median patient age was 58 years (range, 26-90 years), while the median DSS was 825 days. The 5-years DSS rate for all cases analyzed was 89.4%. BC tumor size, lymph node status, and metastasis status [tumor, node, metastasis (TNM) stage] was defined as outlined by the Eighth Edition American Joint Committee on Cancer (AJCC) Staging Manual (Amin et al., 2017), and molecular subtype (PAM50) was derived from a previous report by Thorsson et al. (2018). The proportion of HER2 subgroup patients in the low-risk group was significantly lower than that in the high-risk group (total dataset, χ 2 = 92.295, P < 0.0001; validation dataset, χ 2 = 45.400, P < 0.0001). Further, a larger tumor size was associated with a higher proportion of patients in the high-risk group for the total dataset (χ 2 = 18.447, P < 0.0001), but not in the validation dataset (χ 2 = 2.484, P = 0.478) ( Table 4).

A Five-MiRNA Signature Associated With DSS of Patients With BC
Univariate and multivariate Cox proportional hazard regression analyses indicated that a higher five-miRNA risk score was correlated with higher incidences of clinical events (univariate analysis, HR 5.686, 95% CI: 3.065-10.550, P < 0.0001; multivariate analysis, HR 4.376, 95% CI 2.288-8.369, P < 0.0001) ( Table 5). Moreover, KM survival curves showed that the high-risk group had worse prognosis in both the total (log rank P < 0.0001) and validation (log rank P < 0.0001) datasets (Figure 3).

Evaluation of the Predictive Power of the Five-MiRNA Signature
To determine the sensitivity and specificity of the five-miRNA signature for predicting survival, we conducted ROC analyses of the total and validation datasets. The AUC value for the five-miRNA signature was 0.714 (95% CI 0.645-0.783, P < 0.0001) in the total dataset and 0.679 (95% CI 0.574-0.783, P = 0.004) in the validation dataset, respectively, suggesting  that the five-miRNA signature was highly sensitive and specific (Figure 3).

Relevance of the MiRNA Signature in Clinical Decision-Making
We found that patients in the high-risk group who underwent adjuvant chemotherapy had significantly better prognosis than those who did not (P = 0.004), while no such difference was detected in the low-risk group (P = 0.466) (Figure 8).
These results suggest that patients in the high-risk group could benefit from adjuvant chemotherapy, while those in the lowrisk group may not.

Nomogram Development
To apply the five-miRNA signature in clinical settings, we combined it with conventional clinical predictors of prognosis (age, tumor size, lymph node, metastasis, and molecular subtype) to create a nomogram (Figure 9). Each risk factor corresponds to a designated point determined by drawing a line perpendicular to the points axis. The corresponding sum of risk factor points located on the total points represents the probability of 5-year DSS by reading straight down to the 5-year DSS axis.

DISCUSSION
Although, numerous molecular predictors of prognosis have been developed to guide clinical decisions in the management of patients with BC, the scope for application of predictors has been somewhat limited; for example, Oncotype Dx was developed for prediction of prognosis only in cases with ERpositive, HER2-negative, node-negative, and early stage BC (Paik et al., 2004). Numerous studies have demonstrated that miRNAs have potentially important functions in breast tumorigenesis, and multiple miRNAs have been identified as candidate biomarkers for prediction of prognosis in patients with BC (Yan et al., 2008;Cheng et al., 2012;Zhou et al., 2014). The results of multigene analyses demonstrate that combinations of multiple biomarkers can achieve higher sensitivity and specificity, relative to single-gene biomarkers (Paik et al., 2004;Cardoso et al., 2016). In the present study, a five-miRNA signature was identified as significantly associated with DSS. Further analyses demonstrated the utility of this five-miRNA signature as a powerful predictor of prognosis in patients with BC. A nomogram constructed by combining the five-miRNA signature and conventional prognostic factors had high value for prediction of 5-year DSS of patients with BC. In the high-risk group, patients who underwent adjuvant chemotherapy had better prognosis than those who did not, but this was not the case in the low-risk group, indicating that patients in the high-risk group could benefit from adjuvant chemotherapy more than those in the low-risk group.
The biological functions of the five miRNAs have been elucidated in numerous experiments, and they are associated with BC prognosis or pathological features. Hsa-miR-30b has been validated as a tumor suppressor that inhibits bone metastasis in BC (Croset et al., 2018). Up-regulation of hsa-miR-210 promotes BC stem cell metastasis, proliferation, and self-renewal by targeting E-cadherin (Tang et al., 2018). Further, hsa-miR-224 promotes tumorigenesis through down-regulation of caspase-9 in triple-negative BC , hsa-miR-574 enhances doxorubicin resistance by down-regulating SMAD4 in BC cells , and hsa-miR-130a suppresses BC cell migration and invasion (Chen et al., 2018;Kong et al., 2018) and reduces drug resistance in BC (Huang et al., 2019).
Recent research shows that multiple-miRNA signature (model) could predict the prognosis of some cancer, for instance, four-miRNA classifier for esophageal squamous cell carcinoma (Wen et al., 2021), a six-miRNA-based classifier for stage II colon cancer (Zhang et al., 2013), a robust six-miRNA prognostic signature (Zhao and Cui, 2020), and a novel seven-miRNA prognostic model (Lu et al., 2019) for head and neck squamous cell carcinoma. In breast cancer, three multi-miRNA prognostic models have been developed to date (Lai et al., 2019;Tang et al., 2019;Lu et al., 2020). However, our model has several advantages as a predictor of prognosis in patients with BC. First, our study included 962 female cases with BC, while excluding males and cases with missing clinical information, which avoided the possibility of sex-specific effects and ensured more credible results. Second, DSS was chosen as a clinical outcome, rather than OS, as OS is less sensitive for BC-specific mortality. Third, we identified two internal reference miRNAs (hsa-miR-186 and   hsa-miR-361), which were essential to normalize expression data for other miRNAs. Quantification methods for miRNA expression levels include relative and absolute expression quantification. Although absolute quantification could detect exact miRNA expression, small changes in an experiment may cause huge relative quantification differences. The purpose of relative quantification is the degree of change in the expression level of the target miRNA relative to the expression level of the reference miRNA, and its function is to correct aspects of operation differences to ensure the accuracy of the experimental results. Also, hsa-miR-186 and hsa-miR-361 were screened by following strict criteria that have been reported by Zhan et al. (2014), they had certain and stable expression levels in tumor and normal tissue, and there was no statistical difference between tumor and normal tissue. So, relative quantification appears to be more robust than absolute quantification approaches. Fourth,    patients with high-risk scores using our model were shown to benefit from adjuvant chemotherapy, which could inform clinical decision-making regarding appropriate treatment strategies for patients with BC. Notwithstanding these strengths, the study had a number of shortcomings and limitations, which should be acknowledged. First, a real-world validation dataset was lacking. Our breast center has already started to establish a validation dataset of BC to verify the findings in this research. However, limited by the follow-up time and the number of cases, it may take a long time. If the validation dataset is consistent with the results of this study, the finding will be applied to a prospective clinical study. Second, the biological functions of the five miRNAs remain to be fully elucidated. Third, KM curve or ROC subgroup analyses did not reveal any significant difference in the subgroups: T4 (N = 28), LumB-like (N = 164), HER2 (N = 67), age ≤ 40 years (N = 87), and M1 (N = 14); therefore, further study to identify more accurate molecular models for these patient subgroups is warranted. Fourth, while our analyses show that patients with high-risk scores could benefit from adjuvant chemotherapy, which indicates that the five-miRNA signature has theoretical reference significance for individualized clinical decision-making, more clinical studies are necessary to confirm this observation. Fifth, our study lacks an independent validation dataset. In the early stage of study design, we had considered that the full dataset be randomly divided into training and validation datasets by the proportions of 1:1, 3:2, 7:3, and 4:1, but the larger the sample size of the dataset, the higher the credibility of the model established, absolutely selecting full dataset for training and modeling. Although the verification dataset was randomly selected from the total dataset with the overlap of the sample points, it can also verify the reliability of the model.

CONCLUSION
In conclusion, we identified a novel five-miRNA prognostic model significantly associated with DSS in patients with BC and developed a nomogram based on the five-miRNA signature with high prognostic prediction value. Moreover, our analyses indicated that patients with high-risk scores using our model could benefit from adjuvant chemotherapy, indicating that the five-miRNA signature has theoretical reference significance for individualized clinical decision-making.

DATA AVAILABILITY STATEMENT
All datasets generated and analyzed in this study are available from the corresponding author on reasonable request. TCGA datasets analyzed in this study are available at https:// cancergenome.nih.gov/ (TCGA repository).

ETHICS STATEMENT
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
BT and MH collected, analyzed, interpreted TCGA data, and participated in writing the manuscript. KZ critically revised the manuscript for intellectual content. XQ, YD, and YG controlled the quality of tables and figures. JW and XY designed this study and reviewed the manuscript. All authors contributed to the article and approved the submitted version.