Identification of a Potentially Functional microRNA–mRNA Regulatory Network in Lung Adenocarcinoma Using a Bioinformatics Analysis

Background Lung adenocarcinoma (LUAD) is a common lung cancer with a high mortality, for which microRNAs (miRNAs) play a vital role in its regulation. Multiple messenger RNAs (mRNAs) may be regulated by miRNAs, involved in LUAD tumorigenesis and progression. However, the miRNA–mRNA regulatory network involved in LUAD has not been fully elucidated. Methods Differentially expressed miRNAs and mRNA were derived from the Cancer Genome Atlas (TCGA) dataset in tissue samples and from our microarray data in plasma (GSE151963). Then, common differentially expressed (Co-DE) miRNAs were obtained through intersected analyses between the above two datasets. An overlap was applied to confirm the Co-DEmRNAs identified both in targeted mRNAs and DEmRNAs in TCGA. A miRNA–mRNA regulatory network was constructed using Cytoscape. The top five miRNA were identified as hub miRNA by degrees in the network. The functions and signaling pathways associated with the hub miRNA-targeted genes were revealed through Gene Ontology (GO) analysis and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway. The key mRNAs in the protein–protein interaction (PPI) network were identified using the STRING database and CytoHubba. Survival analyses were performed using Gene Expression Profiling Interactive Analysis (GEPIA). Results The miRNA–mRNA regulatory network consists of 19 Co-DEmiRNAs and 760 Co-DEmRNAs. The five miRNAs (miR-539-5p, miR-656-3p, miR-2110, let-7b-5p, and miR-92b-3p) in the network were identified as hub miRNAs by degrees (>100). The 677 Co-DEmRNAs were targeted mRNAs from the five hub miRNAs, showing the roles in the functional analyses of the GO analysis and KEGG pathways (inclusion criteria: 836 and 48, respectively). The PPI network and Cytoscape analyses revealed that the top ten key mRNAs were NOTCH1, MMP2, IGF1, KDR, SPP1, FLT1, HGF, TEK, ANGPT1, and PDGFB. SPP1 and HGF emerged as hub genes through survival analysis. A high SPP1 expression indicated a poor survival, whereas HGF positively associated with survival outcomes in LUAD. Conclusion This study investigated a miRNA–mRNA regulatory network associated with LUAD, exploring the hub miRNAs and potential functions of mRNA in the network. These findings contribute to identify new prognostic markers and therapeutic targets for LUAD patients in clinical settings.


INTRODUCTION
Lung adenocarcinoma (LUAD) is the most common histological subtype of lung cancer, accounting for approximately 40% of all cases of lung cancer in China and other countries (Bray et al., 2018;Cao and Chen, 2019;Siegel et al., 2020). LUAD is characterized by rapid progression and early development of metastases and a high recurrence rate, with the highest morbidity and mortality in both genders (Bray et al., 2018;Cao and Chen, 2019;Siegel et al., 2020). The overall 5-year survival rates for LUAD only reach approximately 20% (Bray et al., 2018;Miller et al., 2019) given the lack of early detection and limited effective therapies at earlier stages of disease (Gan et al., 2017). LUAD patients have an 80% chance of surviving 5 years if diagnosed at an early stage (Henschke et al., 2006). Thus, there is a growing need to identify and characterize the molecular pathogenesis of LUAD in order to better understand the underlying disease mechanisms and to improve treatment outcomes of this malignancy.
MicroRNAs (miRNAs) are a family of small non-coding RNAs that downregulate gene expression by repressing or degrading messenger RNA (mRNA) targets, thereby controlling the genes involved in cellular processes (Di Leva et al., 2014;Du et al., 2018). miRNAs regulate the expression of approximately 30% of all human genes, playing important regulatory roles in many human diseases (Lewis et al., 2005;Di Leva et al., 2014;Ramassone et al., 2018;Wu K.L. et al., 2019;Wu S.G. et al., 2019;Condrat et al., 2020). miRNAs could be readily detected in circulation and carry information regarding the origin of a neoplasm, thus serving as diagnostic and prognostic biomarkers in the development of tumors (Hummel et al., 2010;Blondal et al., 2013;Qi et al., 2013;Zhang et al., 2016;Świtlik et al., 2019;Cojocneanu et al., 2020;Wang et al., 2020). An increasing number of studies have reported various expression levels of miRNAs in LUAD (Cazzoli et al., 2013;Zhong et al., 2018), although findings remain inconsistent. Extensive genomic studies have verified that abnormal expressions of multiple mRNAs of genes involved in LUAD tumorigenesis and progression play vital roles (Herbst et al., 2018). The miRNA-mRNA regulatory network is characterized in such a way that individual miRNA could regulate a wealth of different mRNAs of genes, and the individual mRNA of a target gene could be correspondingly suppressed by multiple different miRNAs (Cui et al., 2020;Li et al., 2020;Liu et al., 2020). Thus, it is necessary to examine the miRNA-mRNA regulatory network in LUAD to advance our understanding of its molecular mechanisms. Microarray analysis has been widely used in the evaluation of miRNA in cancers to understand the complexity and heterogeneity of malignant disease (Jurj et al., 2017;Cojocneanu et al., 2020), but has not been fully elucidated in LUAD. A public dataset from the Cancer Genome Atlas (TCGA) is widely used in LUAD analyses, although cases primarily originate from a non-Asian population, yet include primary tissue samples. Hence, the TCGA dataset and actual clinical patient data from multiple sample types contribute to determining the biomarkers of LUAD and exploring the underlying mechanisms (Rheinbay et al., 2020).
Here, we constructed a miRNA-mRNA regulatory network of LUAD using TCGA data and our microarray analysis data, including multiple sample types and ethnicities, to subsequently identify hub miRNAs in the network. The functional enrichment analysis of the targeted mRNAs of hub miRNAs were investigated using the Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses. In addition, we analyzed the protein-protein interaction (PPI) network to understand the potential mechanism of mRNAs in LUAD occurrence and development. In doing so, we identified the top ten key mRNAs using the PPI network and CytoHubba. The Gene Expression Profiling Interactive Analysis (GEPIA) allowed us to evaluate these key mRNAs and identify the hub genes in LUAD using survival analysis. Figure 1 illustrates the workflow. This study received ethical approval from the Ethics Committee of the Gansu Provincial Hospital (14 April 2020. Informed consent was obtained from all participants in the microarray experiment, and the research adhered to the principles of the Declaration of Helsinki.

MATERIALS AND METHODS
The TCGA dataset from tissue samples and the microarray data from plasma in LUAD were included in this study. Supplementary Tables 1,2 provide the clinical characteristics in LUAD from the TCGA and microarray data. The TCGA data included miRNA (400 LUAD and 15 normal lung tissue) and mRNA (515 LUAD and 20 normal lung tissue) samples, all downloaded using Firehose 1 on 16 April 2020 (Izadi et al., 2017). The mRNA samples were obtained by filtering the samples with low-quality reads as expressions <0.8. The miRNA data were obtained from intersecting the mRNA-sequence data and filtering the samples with low-quality reads as expressions <0.5. The microarray data (n = 10) were collected between October 2018 and March 2019 at Gansu Provincial Hospital (China), and included initially diagnosed LUAD patients (n = 6) and controls (n = 4). We excluded patients who (a) previously received chemotherapy, radiotherapy, molecular-targeted therapy, immunotherapy or surgery before blood samples were collected; (b) had other combined cancers; (c) were pregnant or lactating; or (d) presented with cardiopulmonary insufficiency, serious cardiovascular disease and a serious infection as well as severe malnutrition (Wang et al., 2014;Li et al., 2019).
RNA degradation and contamination, especially DNA contamination, was monitored on 1.5% agarose gels. RNA FIGURE 1 | The study workflow. LUAD, lung adenocarcinoma; DE, differential expression; Co-DE, common differentially expressed; Co-mRNAs, common mRNAs; TCGA, the Cancer Genome Atlas; GO, gene ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; PPI, protein-protein interaction. concentration and purity were measured using the NanoDrop 2000 Spectrophotometer (Thermo Fisher Scientific, Wilmington, DE). RNA integrity was assessed using the RNANano 6000 Assay Kit of the Agilent Bioanalyzer 2100 System (Agilent Technologies, CA, United States). A total amount of 2.5 ng RNA per sample was used as the input material for the RNA sample preparations. Sequencing libraries were generated using NEBNext R Ultra TM small RNA Sample Library Prep Kit for Illumina R (NEB, United States) following the manufacturer's recommendations and index codes were added to attribute sequences to each sample. Raw data (raw reads) of fastq format were first processed through in-house perl scripts. The clean data (clean reads) were obtained by removing reads containing adapter, reads containing ploy-N and low-quality reads from raw data. The reads were trimmed and cleaned by removing sequences <15 nt or >35 nt. At the same time, Q20, Q30, and GC-content of the clean data were calculated. All downstream analyses were based on clean data with a high quality. Then, using the Bowtie soft tools, clean reads were compared to the Silva database, GtRNAdb database, Rfam database, and Repbase database, respectively, to identify the sequence alignment, and filtered for ribosomal RNA (rRNA), transfer RNA (tRNA), small nuclear RNA (snRNA), small nucleolar RNA (snoRNA), and other ncRNA and repeats. The remaining reads were used to detect known miRNA and novel miRNA predicted through comparison with known miRNAs from miRBase. Detailed information on the microarray appears elsewhere (Zhong et al., 2018). The sequencing reads were uploaded to the gene expression omnibus (GEO) dataset (access number GSE151963).

Construction of the miRNA-mRNA Regulatory Network
The differentially expressed miRNAs (DEmiRNAs) and mRNA (DEmRNAs) were analyzed comparing the LUAD and normal samples using the R package limma. The DEmiRNAs consistent with the expression of TCGA were further analyzed. The isoform expressions of the DEmiRNAs were identified using the R package miRNAmeConverter. The common consistently and differentially expressed (Co-DE) miRNAs were obtained using intersected analyses comparing TCGA and microarray datasets (Cojocneanu et al., 2020). The targeted mRNAs of the Co-DEmiRNAs were predicted using three online analytical software tools including miRDB 2 , TargetScanHuman (version 7.2) 3 , and miRWalk (Agarwal et al., 2015;Dweep and Gretz, 2015;Chen and Wang, 2020) 4 . An overlap was applied to confirm the Co-DEmRNAs identified both in targeted mRNAs and DEmRNAs using FunRich tools (Pathan et al., 2015). Adjusted p-values (adj. p) were used to correct false-positive results (Patten et al., 2016). DEmiRNAs and DEmRNAs were identified applying p < 0.05 and | Log 2 fold-change (log2FC) | >1. The volcano plots were drawn using R package ggplot2. Based on the results of the Co-DEmiRNAs and Co-DEmRNAs, we constructed the miRNA-mRNA regulatory network using the Cytoscape software version 3.7.0 (Shannon et al., 2003). The top five miRNAs (degree >100) were selected as hub miRNAs in the network.

Functional Enrichment Analysis and Confirmation of Hub Genes
The biological functions of the targeted mRNAs of these hub miRNAs were evaluated using the GO and KEGG pathway enrichment analyses with the R package ClusterProfiler (Yu et al., 2012). We considered p < 0.05 statistically significant. Using the STRING database, 5 we downloaded data from the PPI network for targeted mRNAs for the hub miRNAs applying the following criteria: (a) homo sapiens; and (b) medium confidence 0.400 (Zhu et al., 2020). Then, the PPI network was further adjusted using the Cytoscape software version 3.7.0, and we identified the top ten key mRNAs of the PPI network using cytoHubba's Maximal Clique Centrality (MCC) ranking . The hub mRNAs were further verified using survival analysis using the Gene Expression Profiling 5 http://string-db.org Interactive Analysis (GEPIA) tools 6 . Significant survival was identified through survival analysis (log-rank p < 0.05). The key mRNAs with significant survival outcomes were identified as hub genes in LUAD.

GO and KEGG Analyses of Targeted mRNAs for Hub miRNAs
In total, we used 677 Co-DEmRNAs to identify the potential functions of hub miRNAs using GO and KEGG pathway enrichment analyses. We obtained 836 results in the GO analysis, in which Figure 4A illustrates the top 10 results from the biological process, cellular component and molecular function. Some results associated with the occurrence and progression of a tumor, including the regulation of the cellular response to the growth factor stimulus, the collagen-containing extracellular matrix and the basement membrane among others. In addition, 48 signaling pathways emerged from the KEGG pathways analysis ( Table 2), with the top ten results shown in Figure 4B.
Most of these pathways associated with the occurrence and progression of a tumor, including focal adhesion, proteoglycans in cancer, extracellular matrix-receptor interaction and the Wnt signaling pathway ( Table 2). The focal adhesion pathway contained the largest number of mRNAs and the smallest adj. p-value.

Identification of Hub Genes
The PPI networks identified 597 mRNAs from 677 Co-DEmRNAs with 2,246 edges related to each other ( Figure 5).

DISCUSSION
LAUD is a high-risk disease with a high mortality, and its potential occurrence and development mechanisms have not been fully identified. Previously, several studies reported the role of miRNAs in LAUD, although those findings lacked agreement (Cazzoli et al., 2013;Zhong et al., 2018). Here, we identified several hub miRNAs and hub mRNAs in LUAD using the miRNA-mRNA regulatory network using multiple sample types and ethnicities. In total, five hub miRNAs-namely, miR-539-5p, miR-656-3p, let-7b-5p, miR-2110, and miR-92b-3pplayed important roles in LUAD tumorigenesis and progression. Two hub mRNAs, namely, SPP1,and HGF, associated with LUAD patient survival. These findings suggest that the specific miRNA and mRNA expression patterns and functional analyses contribute to identifying new predictive markers and therapeutic targets for LUAD patients in clinical settings. Five hub miRNAs in the network played the most important roles in LUAD tumorigenesis and progression. Specifically, miR-539-5p emerged with the highest node degrees compared to   other nodes in the network. Several studies found that miR-539-5p expression levels associated with a poor prognosis and negatively correlated with hypoxia and the stem index among LUAD patients (González-Vallinas et al., 2018;). Yet, further experiments are needed to verify these results given the limited evidence documenting the direct mechanisms of miR-539-5p in LUAD. For example, Chen et al. showed that miR-656-3p could reduce AKT serine/threonine kinase 1 (AKT1) expression and suppress the occurrence and development of nonsmall cell lung cancer (NSCLC) . Additionally, upregulating miR-656-3p might improve the chemotherapeutic efficacy in NSCLC by target-regulating sex-determining region Y-related high-mobility group box 4 (SOX4) . The expression of let-7b-5p was upregulated and might play a vital role in NSCLC based on the GEO dataset using bioinformatics analysis (Zhou et al., 2020), and participate in lymph node micro-metastases of LUAD stage IA (Zhu et al., 2019). miR-2110 and miR-92b-3p express at different levels in different tumors (Long et al., 2017;Gong et al., 2018;Neerincx et al., 2018;Zhao et al., 2018;Ma et al., 2019). To the best of our knowledge, very few studies have attempted to clarify the effect of these two miRNAs in LUAD. In our study, we found that miR-2110 and miR-92b-3p were upregulated in LUAD. Thus, these studies confirmed that miR-539-5p, miR-656-3p, and let-7b-5p are indeed involved in the occurrence and progression of LUAD. Furthermore, miR-92b-3p and miR-2110 may represent new potential therapeutic targets in LUAD. SPP1 and HGF were identified as hub mRNAs in PPI and significantly predicted survival outcomes in LUAD. These two mRNAs appear enriched in the focal adhesion signaling pathway, containing the largest number of mRNAs with the smallest significant adj. p-value. The focal adhesion signaling pathway could involve the tumor microenvironment (TME), resulting in tumor progression and indicative of a poor outcome in LUAD (Yue et al., 2019;Wei et al., 2020). Accumulating evidence has revealed that the TME could influence malignant behavior and the progression of tumors. Previous studies found that SPP1 played a role in lung cancer escape and mediating macrophage polarization, while inhibiting SPP1 expression might overcome resistance to second-generation epidermal growth factor receptor gene (EGFR)-tyrosine kinase inhibitors (TKIs) . HGF is a pleiotropic cytokine composed of an α-chain and a β-chain, which mediates malignant biological behaviors in LUAD, including growth, invasion, metastasis and the epithelial-to-mesenchymal transition (EMT), as well as increases resistance to EGFR-TKIs in LUAD (Tarhini et al., 2017;FIGURE 5 | PPI network diagram of targeted mRNAs from the hub miRNAs. PPI, protein-protein interaction; miRNA, microRNA; mRNA, messenger RNA. Suzuki et al., 2019). Importantly, it is worth noting that SPP1 presented with the targets of miR-539-5p, correlated with a poor survival in LUAD and appeared involved in the regulation of focal adhesion and the extracellular matrix-receptor interaction pathways. Thus, we found a significant miR-539-5p-SPP1 axis, which played important roles in LUAD occurrence and progression by regulating the TME through the aforementioned pathways in our regulatory network. Although the other eight key mRNA played some important roles in LUAD or lung cancer (Ding et al., 2008;Roybal et al., 2011;Licciulli et al., 2013;Neri et al., 2017;Tian et al., 2018;Yao et al., 2019), these were not the significant predictors of survival in LUAD.
Aside from the strengths of our study, we should point out several limitations. Firstly, the small sample size in our cohort mirrors the limitations of similar previous studies (Patnaik et al., 2012;Xu et al., 2020). Yet, all of these miRNA expression profiling studies offer opportunities to develop new therapeutic targets. Although increasing numbers of miRNAs have been identified, a meta-analysis approach and a larger cohort are needed in future in order to minimize the drawbacks resulting from small sample sizes and different technological platforms. Secondly, given our findings related to the target genes from the Co-DEmiRNAs and hub miRNAs identified primarily based on bioinformatics analyses, further mechanistic studies from cells as well as animal models and clinical validation studies may be necessary in the future.
In this study, we constructed a miRNA-mRNA regulatory network for LUAD, identifying five hub miRNAs (namely, miR-539-5p, miR-656-3p, let-7b-5p, miR-2110, and miR-92b-3p) and two hub mRNAs (SPP1and HGF) through our microarray data and TCGA dataset. We also performed a FIGURE 6 | The network of key mRNAs. (A) The network diagram of the top ten hub miRNAs and (B) the rank and score of the top ten hub mRNAs in the PPI network. PPI, protein-protein interaction; miRNA, microRNA; mRNA, messenger RNA; NOTCH1, neurogenic locus notch homolog protein 1; MMP2, matrix metalloproteinase 2; IGF1, insulin-like growth factor 1; KDR, kinase-insert domain-containing receptor; SPP1, secreted phosphoprotein 1; FLT-1, vascular endothelial growth factor receptor 1; HGF, hepatocyte growth factor; TEK, angiopoietin-1 receptor; ANGPT1,angiopoietin-1 gene; PDGFB, platelet-derived growth factor-beta. functional enrichment analysis on these final target genes to understand the potential functional mechanisms in LUAD. These findings provide new molecular markers for the prediction, prognosis and therapeutic targets in clinical settings, as well as emphasize new mechanistic insights into LUAD.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Committee of the Gansu Provincial Hospital, China. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
QY, X-JW, and JG contributed to the design of the study and revised the manuscript. X-JW and ZW performed the sample collection, analysis and downloaded the data. X-JW and JG contributed to the data analysis and writing the manuscript. All authors approved the final version of the manuscript.

FUNDING
This study was supported by the Science-Technology Foundation for Young Scientist of the Gansu Province of China (Grant No. 18JR3RA059) and the Young Scientists Fund of the Gansu Provincial Hospital of China (Grant No. 17GSS7-1). JG was supported by the Sigrid Jusélius Foundation, the Finnish Anti-Tuberculosis Association Foundation and the Jalmari and Rauha Ahokas Foundation.

ACKNOWLEDGMENTS
We extend our deepest gratitude to all subjects who volunteered to participate in our study. In addition, we also thank the Biomarker Technologies Corporation (Beijing, China) for the sequence technology and support. We are also grateful to Vanessa L. Fuller, from Language Services at the University of Helsinki, Finland, for assistance in the English-language revision of this manuscript.