Associations of mRNA:microRNA for the Shared Downstream Molecules of EGFR and Alternative Tyrosine Kinase Receptors in Non-small Cell Lung Cancer

Lung cancer is the top cancer killer worldwide with high mortality rate. Majority belong to non-small cell lung cancers (NSCLCs). The epidermal growth factor receptor (EGFR) has been broadly explored as a drug target for therapy. However, the drug responses are not durable due to the acquired resistance. MicroRNAs (miRNAs) are small non-coding and endogenous molecules that can inhibit mRNA translation initiation and degrade mRNAs. We wonder if some downstream molecules shared by EGFR and the other tyrosine kinase receptors (TKRs) further transduce the signals alternatively, and some miRNAs play the key roles in affecting the expression of these downstream molecules. In this study, we investigated the mRNA:miRNA associations for the direct EGFR downstream molecules in the EGFR signaling pathway shared with the other TKRs, including c-MET (hepatocyte growth factor receptor), Ron (a protein tyrosine kinase related to c-MET), PDGFR (platelet-derived growth factor receptor), and IGF-1R (insulin-like growth factor receptor-1). The multiple linear regression and support vector regression (SVR) models were used to discover the statistically significant and the best weighted miRNAs regulating the mRNAs of these downstream molecules. These two models revealed the similar mRNA:miRNA associations. It was found that the miRNAs significantly affecting the mRNA expressions in the multiple regression model were also those with the largest weights in the SVR model. To conclude, we effectively identified a list of meaningful mRNA:miRNA associations: phospholipase C, gamma 1 (PLCG1) with miR-34a, phosphoinositide-3-kinase, regulatory subunit 2 (PIK3R2) with miR-30a-5p, growth factor receptor-bound protein 2 (GRB2) with miR-27a, and Janus kinase 1 (JAK1) with miR-302b and miR-520e. These associations could make great contributions to explore new mechanism in NSCLCs. These candidate miRNAs may be regarded as the potential drug targets for treating NSCLCs with acquired drug resistance.


INTRODUCTION
Lung cancer is the top cancer killer worldwide with high mortality rate . Over 80% of cases belong to non-small cell lung cancer (NSCLC) with the continuously rising incidence rate every year (Al-Saleh et al., 2012). The 5year overall survival rate of NSCLC is very low, due to lack of the effective diagnosis and treatment methods (Indovina et al., 2011). Hence, novel and effective targets are needed urgently for cancer diagnosis and therapy. The epidermal growth factor receptor (EGFR) has been broadly explored as a drug target for treatment, since it is overexpressed in various tumors, such as breast and lung carcinomas (Kim et al., 2001;Camp et al., 2005). The receptor tyrosine kinase inhibitors (TKIs), including Gefitinib and Erlotinib, are small molecules that inhibit EGFR phosphorylation, receptor activation and the subsequent signal transduction (Camp et al., 2005). However, the drug responses are not durable due to the acquired resistance. The secondary mutation in EGFR exon 20 (T790M) was found in approximately 50% of resistant tumors (Zhang et al., 2012;Kumarakulasinghe et al., 2015). Besides the mutation, activation of alternative tyrosine kinase receptors (TKRs) sharing similar downstream pathways with EGFR has been proved to be one of the multiple resistance mechanisms, including c-MET (hepatocyte growth factor receptor), Ron (a protein tyrosine kinase related to c-MET), PDGFR (platelet-derived growth factor receptor), and IGF-1R (insulin-like growth factor receptor-1) (Camp et al., 2005).
MicroRNAs (miRNAs) are small non-coding and endogenous molecules with 19-22 nucleotides, which inhibit mRNA translation initiation and degrade mRNAs (Zandberga et al., 2013). MiRNAs play a vital role in cancer development and progression by regulating the signaling circuits and the dysregulation in a cell (Zandberga et al., 2013). MiRNAs are regarded as the important biomarkers for cancer, since their expression profiles are mostly tissue-specific and able to signify human cancers (Lu et al., 2005). It has been found that nearly all the cancers have different miRNA expression profiles compared with the adjacent normal tissues, especially in lung and breast cancers (Zhang and Farwell, 2008). MiRNAs were also reported to be related to tumor invasion and metastasis (Ma et al., 2007). Let-7 is a notable miRNA in lung cancer, which is involved in the pathogenesis and significantly decreased in lung cancer tissues (Zhang and Farwell, 2008). MiR-128b directly targets EGFR, and its expression is positively correlated with clinical response and survival for Gefitinib treatment in lung cancer (Weiss et al., 2008;Lin et al., 2010). Multiple genes in the EGFR signaling pathway are targeted by miR-7, such as EGFR and protein kinase B (Akt) (Webster et al., 2009). MiRNAs are vital in the EGFR signaling pathway, and remarkable for potential drug discovery.
Due to the critical regulatory role of miRNAs in the cellular processes and disease pathology, many prediction algorithms have been made publicly online to predict the putative mRNA:miRNA pairs based on the sequence analysis, such as TargetScan and PicTar (Krek et al., 2005;Agarwal et al., 2015). However, the identification of the regulatory miRNA and mRNA pairs is still a major challenge, because a single miRNA may target several mRNAs, and multiple miRNAs may co-regulate the same mRNA (Jayaswal et al., 2011). Moreover, the prediction databases do not take into account miRNA/mRNA expression levels. The regulatory effects of miRNAs may vary in different situations. In other words, different cancer cells may have different regulatory miRNAs. Hence, Microarray profiling of miRNAs and mRNAs has been widely used to analyze the whole genome to further explore their regulatory effects. Both mRNA and miRNA expression levels can be quantified using microarray technology. We introduced multiple linear regression and support vector regression models to identify the regulatory effects of multiple miRNAs on the same mRNAs by analyzing the microarray data.
The aim of this study was to identify the direct downstream proteins of EGFR in the EGFR signaling pathway shared with the other TKRs (c-MET, Ron, PDGFR, and IGF-1R), and to further explore the candidate miRNAs affecting the mRNA expressions of these proteins. We hypothesize that there are some important downstream proteins shared by EGFR and the other TKRs, possibly leading to the resistance of EGFR TKIs, and some key miRNAs targeting these proteins are dysregulated in NSCLCs. In order to support our hypotheses, we proposed a novel mRNA:miRNA association identification approach that consists of two steps. The first step involved the identification of putative miRNA and mRNA pairs from the prediction databases. The second step is to identify the statistically significant and the best weighted miRNAs for the mRNAs of the shared downstream molecules using multiple linear regression and support vector regression models based on the microarray expression profiles. Our approach could reduce the intricate interactions between miRNAs and mRNAs to a few meaningful mRNA:miRNA associations, which facilitates the subsequent biological analysis and more efficient experimental validation. These candidate miRNAs may be served as potential drug targets for the treatment of NSCLCs with drug resistance.

Microarray Expression Data
In this study, the microarray datasets GSE51852 and GSE51853 for mRNAs and miRNAs were collected from Gene Expression Omnibus (GEO) repository database (Arima et al., 2014). Both collected mRNA and miRNA data have been normalized. The subjects were recruited in Japan. The researchers analyzed the global expression profiles of miRNAs and the target mRNAs based on a systems biology-based approach using a Bayesian network and non-parametric regression to identify the functional gene regulatory circuitry involved in the lung adenocarcinoma (Arima et al., 2014). We chose 46 subtype NSCLC specimens for the analysis of miRNA and mRNA expression profiles obtained from the same patients (Table S1). In the microarray data, each mRNA is usually interrogated by more than one probes. We took the average of the multiple probes for the same mRNA (Breslin et al., 2005;Kapp et al., 2006).

MiRNA Profiling for the Shared Downstream Molecules
The directly interacting downstream molecules of EGFR in the EGFR signaling pathway were selected using MetaCore R from GeneGo Inc. MetaCore R incorporates the online validated interactions from researchers to the pathways. The signaling pathways of the other TKRs (c-MET, Ron, PDGFR, and IGF-1R) were also considered to select the shared downstream molecules. The miRNAs regulating the mRNAs of the shared downstream molecules were selected using the representative resources: MicroCosm-Targets (Griffiths-Jones et al., 2008), TargetScan (Agarwal et al., 2015), miRanda-mirSVR (Betel et al., 2010), miRDB (Wong and Wang, 2015), and PicTar (Krek et al., 2005). We searched for the miRNAs targeting each mRNA using these databases, and selected the miRNAs with the support of at least two out of the five databases. We further obtained the expression levels of these mRNAs and miRNAs from the microarray data. This set of mRNAs and miRNAs, as well as their expression profiles were considered for multiple regression analysis.

Identification of Statistically Significant miRNAs from Multiple Linear Regression Model
After obtaining the expression of the selected mRNAs and miRNAs, we performed the multiple linear regression analysis to explore the relationship of each mRNA with the corresponding miRNAs (Wang et al., 2014a,b). Since multiple miRNAs may simultaneously target one mRNA during the post-transcriptional process, and the effect of each miRNA may be different, multiple linear regression analysis was applied to explore their relationship in formula (1): where y was the expression level of an mRNA; x 1 , x 2 , ... and x n were the expression profiles of the corresponding miRNAs. The parameters, a 0 , a 1 , a 2 , ... and a n , represented the regression constant and coefficients, respectively.

Support Vector Regression (SVR) Model
Furthermore, support vector regression (SVR) machine (Smola and Vapnik, 1997;Gunn, 1998) was used to model the relationship of the selected mRNAs and multiple miRNAs. This problem was considered as approximating the set of data, with a linear function where f was an output variable, which denoted the expression level of mRNA, x was an input feature vector, which represented the expression profiles of multiple miRNAs, and w and b were the linear regression coefficients. To obtain the model, the error was measured based on the ǫ−insensitivity loss function defined by The optimal regression function can be obtained by minimizing under constraints In formula (4), the first term measured the confidence interval, the second term was used to measure the empirical risk, and C was applied to control the magnitude of approximation errors.
In the simulation, C was equal to the difference between the maximum and the minimum of mRNA expression profiles.

Selection of Direct Downstream Molecules of EGFR Shared with the Other TKRs and the Corresponding miRNAs
The EGFR signaling pathway was downloaded from MetaCore R , as shown in Figure 1. There were 11 key directly interacting downstream molecules of EGFR: phospholipase C, gamma 1 (PLCG1), phosphoinositide-3-kinase, regulatory subunit 1 & 2 (PIK3R1/2), Cbl proto-oncogene, E3 ubiquitin protein ligase (CBL), growth factor receptor-bound protein 2 (GRB2), Src homology 2 domain containing transforming protein 1 (SHC1), v-src sarcoma (Schmidt-Ruppin A-2) viral oncogene homolog (avian) (SRC), Janus kinase 1 & 2 (JAK1/2), NCK adaptor protein 1 (NCK1), and protein tyrosine kinase 2 (PTK2). In total, there were 8 out of 11 downstream molecules shared by EGFR and the other TKRs according to their signaling pathways from MetaCore R (Table 1, Figure 1). MiRNAs regulating these mRNAs were obtained from five miRNA prediction databases. The candidate miRNAs were selected, as shown in Table S2. The expression profiles of the mRNAs and the corresponding miRNAs were further extracted from the microarray datasets for regression analysis (Table S1).

Multiple Linear Regression Model to Identify the Statistically Significant miRNAs Regulating the Shared Downstream Molecules
The associations of each mRNA transcript of the shared downstream molecules with the corresponding miRNAs were investigated by multiple linear regression model. This analysis was performed using IBM SPSS and MATLAB. The results   demonstrated the significant miRNAs (p < 0.05): miR-34a was negatively significantly associated with PLCG1, miR-30a-5p was negatively significantly associated with PIK3R2, miR-27a was negatively significantly associated with GRB2, miR-302b was positively significantly associated with JAK1 and miR-520e was negatively significantly associated with JAK1, as well as miR-155 was positively significantly associated with JAK2 ( Table 2). Two directions were found for the mRNA:miRNA relationship: negative and positive. The results revealed that some mRNAs had only one significant miRNA, while, some mRNAs had more than one significant miRNAs.

SVR Model to Determine the Best Weighted miRNA(s)
Through the SVR model manipulation, the linear regression coefficient of each miRNA was obtained, which represented the weight of each miRNA making the contribution to the expression levels of the target mRNAs. The results depicted (Figure 2): miR-34a had the largest negative weight in the PLCG1 model, miR-30a-5p had the largest negative weight in the PIK3R2 model, miR-27a had the largest negative weight in the GRB2 model, as well as miR-302b had the largest positive weight and miR-520e had the largest negative weight in the JAK1 model. The results were consistent with the findings of the multiple linear regression model that these best weighted miRNAs had the significant pvalues, and the signs of the weights were also the same, indicating the same direction. A little bit difference was found in JAK2 model that miR-155 had the second largest positive weight. The weight of miR-621 was the largest. In the multiple linear regression model, the p-value for miR-621 was 0.057, nearly significant. Possibly, enlarging the sample size could polish the results. In order to prove the prediction efficiency of the SVR model, we plotted the scatter diagrams for the expression levels of the original mRNAs from the microarray data and the predicted mRNAs based on the constructed SVR model (Figure 3). From the results, we could demonstrate that the data points were concentrated around the red line Y = X, which means that the SVR method was capable of well modeling the relationship between mRNAs and the multiple miRNAs.

DISCUSSION
In this study, we investigated the direct EGFR downstream molecules in the EGFR signaling pathway shared with the other TKRs that may affect the drug resistance (Table 1, Figure 1). The statistically significant and the best weighted miRNAs regulating the mRNAs of these shared downstream molecules were further explored using the multiple linear regression and support vector regression models (Table 2, Figure 2). Multiple linear regression can obtain the linear regression equation to represent the relationship between the mRNA and the multiple miRNAs by minimizing the mean least square error. It can also indicate the significance of each miRNA coefficient from the point of statistical view. By contrast, SVR can model the associations between mRNAs and miRNAs through maximizing regression margin and minimizing error. The two introduced methods found the similar mRNA:miRNA associations. The potential miRNAs significantly affecting the expression of mRNAs from multiple regression analysis were further identified through the weights in the SVR model. Since one mRNA could be regulated by more than one miRNAs, multiple regression analysis is more suitable to simulate the in vivo environment and identity the significant miRNAs. Furthermore, SVR method could well model the associations between the mRNAs and the multiple miRNAs from the results (Figure 3). Our approaches using in silico analysis could demonstrate the important findings and infer the biological meaning before performing the experimental validation.
Our method could enable the identification of mRNAs that were co-regulated by multiple significant miRNAs and those that were targeted by only one significant miRNA, as well as the direction of the relationship. Both positive and negative relationships were found in our results (Table 2, Figure 2), such as the positive association between JAK1 and miR-302b, and the negative association between PLCG1 and miR-34a. Some molecules had more than one significant miRNAs, e.g., JAK1. Different interaction types may exist to affect the associations between mRNAs and miRNAs. The direct binding of miRNAs to the seed regions of mRNAs can influence the stability of mRNAs and the translation process from mRNAs to proteins (Androsavich et al., 2012). Since miRNA binding could degrade the target mRNAs, negative associations will be possibly found. Researchers have reported that some mRNAs may not be degraded when miRNAs bind to the seed regions. The bindings could lead to the accumulations of the inhibited mRNAs (Franco-Zorrilla et al., 2007;Ambs et al., 2008). This mechanism may result in the positive relationship between mRNAs and miRNAs. Moreover, the indirect interaction through the intermediators between them could also generate the positive associations.
The presence of other activated TKRs can mimic the function of EGFR by activating the shared signal transduction pathways to promote cell survival (Camp et al., 2005). Downstream molecule, phosphoinositide-3-kinase (PI3K), shared by both EGFR and one of these TKRs IGF-1R has been reported to affect the resistance to EGFR-TKIs (Camp et al., 2005). MiRNAs play important roles in the EGFR signaling pathway, therefore, they are remarkable for drug discovery in cancer therapy. To investigate the relationship between miRNAs and drug resistance makes the valuable contribution to improve the therapeutic efficacy of cancer. Since not every protein is suitable for drug target, specific manipulation of the undruggable proteins can be realized through the corresponding miRNAs targeting the mRNAs: miRNA inhibitors can upregulate one protein and miRNA mimics can silence one gene (Schmidt, 2014). People have tested a liposomal nanoparticle loaded with synthetic miRNA-34a mimics (MRX34) that against cancer cells when combined with the EGFR-TKI Erlotinib treatment (Bader, 2012;Zhao et al., 2014). The additive or synergistic effects of miRNAs together with another treatment provide a novel platform for treating resistant cancer cells (Wu, 2010). MiR-15b and miR-16 in combination with chemotherapy can induce the sensitivity of gastric cancer cells by inhibition of cyclin D1 (BCL1) (Zhao et al., 2008). Consequently, inducing the cancer cell sensitivity by miRNAs provides a novel approach for cancer treatment, at least in vitro (Wu, 2010).
The candidate miRNAs discovered in our study have been investigated in cancer research. MiR-34a has been identified as a FIGURE 3 | Scatter diagram for the expression levels of the original mRNAs from microarray data and the predicted mRNAs from SVR models. The red line referred to Y = X to indicate the location of the data points.
novel prognostic marker in NSCLCs, making great contributions for evaluating the recurrence risk of patients and providing useful information for cancer treatment (Gallardo et al., 2009). Other researchers found that miR-34a can significantly reduce the tumor growth in an in vivo murine model and activate some major proteins involved in the apoptosis process, which may act as a suppressor of tumorigenesis (Welch et al., 2007;Tivnan et al., 2011). It has been reported that miR-30a might function as a metastasis suppressor by downregulating phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit delta (PIK3CD) in colorectal carcinoma to suppresses cell migration and invasion (Zhong et al., 2013). MiR-27a plays an important role in the regulation of drug resistance, indicating its potential property for therapeutic strategy in cancer cell lines (Zhu et al., 2008). MiR-155 has been explored as the well characterized signatures for breast, colon and lung cancers (Volinia et al., 2006). To our best knowledge, some of the identified miRNAs in our study have limited information for NSCLC research in the literatures, which may be the novel and important findings needed to be further exploration.
In summary, our study introduced the mRNA:miRNA multiple linear regression and SVR models to identify the significant and the best weighted miRNAs regulating the shared downstream molecules of EGFR and other TKRs that may affect the drug resistance. These candidate miRNAs can be regarded as the potential drug targets for NSCLC treatment, or combined with Gefitinib and Erlotinib treatment. Several methods have been established to identify the associations between miRNAs and mRNAs. One of the widely used methods is the application of gene set test (GST). In this method, the first step is to select the putative miRNA-mRNA pairs using the prediction databases. And then, the GST is performed to get the significant mRNAs for a given miRNA based on the microarray data (Subramanian et al., 2005;Efron and Tibshirani, 2007). Another method is the application of odds-ratio (OR) statistic (Jayaswal et al., 2009). If the significant associations are found between miRNA and its computationally predicted target mRNAs according to microarray expression levels, the miRNA is considered to have the regulatory function. These two mentioned methods can only identify the individual regulatory miRNAs, and ignore the behind biological meaning that multiple miRNAs may co-regulate a target mRNA simultaneously. Jayaswal et al. proposed a method to identify the mRNA:miRNA modules (Jayaswal et al., 2011). Firstly, to identify the miRNA and mRNA clusters using computationally predicted approach and expression data. Secondly, to obtain the statistically significant associations between miRNA and mRNA clusters based on the expression levels. This method only considers the significant cluster pairs, and do not take into account the weight of the multiple miRNAs on the same mRNAs. Our developed method incorporated these factors, and could identify the statistically significant and the best weighted miRNA(s) for a particular mRNA.
In the future, we will collect more public datasets to perform the similar analysis to polish our findings. Laboratory experimental validation will be done to explore the true effects of the identified miRNAs on drug resistance in NSCLCs, including Polymerase chain reaction (PCR) and wester-blot. Furthermore, knock-down of the other TKRs and the shared downstream molecules, as well as miRNA mimics/inhibitors experiments will be performed to support the hypothesis and the current findings.

AUTHOR CONTRIBUTIONS
FW and LC initiated the project and participated in its design. FW, FM, and LC performed the multiple linear regression analyses and participated in the interpretation of data. FW and LW constructed the support vector regression (SVR) model. FM, LW, SW, and WC participated in the design and coordination of the study. FW was responsible for writing the manuscript. All the authors participated in the discussion and editing of the manuscript.