Combined Analysis of RNA Sequence and Microarray Data Reveals a Competing Endogenous RNA Network as Novel Prognostic Markers in Malignant Pleural Mesothelioma

Malignant pleural mesothelioma (MPM) is a highly aggressive cancer with short survival time. Unbalanced competing endogenous RNAs (ceRNAs) have been shown to participate in the tumor pathogenesis and served as biomarkers for the clinical prognosis. However, the comprehensive analyses of the ceRNA network in the prognosis of MPM are still rarely reported. In this study, we obtained the transcriptome data of the MPM and the normal samples from TCGA, EGA, and GEO databases and identified the differentially expressed (DE) mRNAs, lncRNAs, and miRNAs. The functions of the prognostic genes and the overlapped DEmRNAs were further annotated by the multiple enrichment analyses. Then, the targeting relationships among lncRNA–miRNA and miRNA–mRNA were predicted and calculated, and a prognostic ceRNA regulatory network was established. We included the prognostic 73 mRNAs and 13 miRNAs and 26 lncRNAs into the ceRNA network. Moreover, 33 mRNAs, three miRNAs, and seven lncRNAs were finally associated with prognosis, and a model including seven mRNAs, two lincRNAs, and some clinical factors was finally established and validated by two independent cohorts, where CDK6 and SGMS1-AS1 were significant to be independent prognostic factors. In addition, the identified co-expressed modules associated with the prognosis were overrepresented in the ceRNA network. Multiple enrichment analyses showed the important roles of the extracellular matrix components and cell division dysfunction in the invasion of MPM potentially. In summary, the prognostic ceRNA network of MPM was established and analyzed for the first time and these findings shed light on the function of ceRNAs and revealed the potential prognostic and therapeutic biomarkers of MPM.


INTRODUCTION
Malignant pleural mesothelioma (MPM), which is mainly associated with the asbestos exposure and derived from the pleural or peritoneal mesothelial cell surfaces, is an aggressive tumor with very poor prognosis (median survival time < 12 months after diagnosis) (1). According to epidemiological investigations, the number of patients has been increasing in recent years, especially in developing countries, which leads to ca. 40,000 deaths per year worldwide (2). Currently, it is difficult to differentially diagnose MPM, since inclusion of more biomarkers based on the latest research would likely increase the accuracy and efficiency of prognosis. It is therefore essential and urgent to identify new prognostic biomarkers and therapeutic targets and to understand its molecular mechanisms.
The competing endogenous RNA (ceRNA) theory illustrates a novel regulatory mechanism for gene regulation, by which one transcript (e.g., an lncRNA) can control and modulate the expression of another transcript (e.g., an mRNA) by competitive interactions with an miRNA (3). It has been demonstrated that ceRNAs are widely involved in the occurrence and progression of various cancers (4)(5)(6). Moreover, some ceRNAs have been shown to be valuable targets for the treatment and prognosis of multiple cancers (7,8). With the development of highthroughput sequencing technologies, more and more ceRNA networks have been constructed and analyzed in multiple cancers, such as gastric cancer and breast cancer (9,10). However, there are few studies focused on the function of ceRNA networks in MPM. Some recent studies implicated that noncoding RNA (ncRNA) dysfunction is closely associated with the properties of MPM cells (11)(12)(13). Therefore, it is highly likely that certain ceRNA networks may also be involved in the pathogenesis of MPM.
In this study, we analyzed the differential expression of mRNAs, lncRNAs, and miRNAs (DEmRNAs, DElncRNAs, and DEmiRNAs) based on the RNA-Seq and miRNA-Seq data in different subgroups of the MPM patients categorized according to the overall survival of the MPM patients. The DEGs related to the overall survival were further compared with the DEGs between the MPM and the normal tissues in the microarray data. Then, we constructed a ceRNA network of the overlapped DERNAs based on their interactions obtained from multiple databases. Furthermore, Kaplan-Meier survival analyses and univariate, lasso, and multivariate Cox regression analyses were conducted to explore the mRNA, miRNA, and lncRNA biomarkers in the ceRNA network. Accordingly, the risk assessment model combined with the multiple clinical factors and the screened RNAs based on regression coefficients were established, evaluated, and validated based on the two independent MPM datasets. Weighted gene co-expression network analysis (WGCNA) was used to further explore the reliability of the ceRNA network as a prognostic marker as well as its potential mechanism. This study provides new insights on the biological functions related to the lncRNA in MPM patients and the additional biomarkers for the prognosis of MPM.

TCGA MPM Dataset
Data for patients with MPM collected from the TCGA database was regarded as a training group. The criteria of exclusion were set as follows: (1) patients without lncRNA and mRNA information; (2) survival time of patients was unavailable or survival time of alive patients was <30 months. Overall, 80 MPM patients were included in our study. Next, the 80 patients were further screened to be included into two groups according to the survival time: high-risk group and low-risk group. The criteria of grouping were set as follows: patients with overall survival <12 months were included into high-risk group, and patients with overall survival >30 months were included in the lowrisk group. Finally, the expression profiles of the patients in the two groups were used for the differential expression gene (DEG) analyses. In addition, the clinical information and the expression profiles of all 80 patients were used for the univariate, the lasso and multivariate Cox regression analyses, and WGCNA. Similarly, another study from the EGA database including 211 MPM patients was used to validate the RNA expression results. The GSE12345 (four normal tissues and nine MPM tissues) and GSE42977 (nine normal tissues and 39 MPM tissues) datasets were downloaded from the GEO database to explore the DEGs between the normal tissues and the MPM tissues for cross comparison of DEG associated with overall survival.

Data Processing
The normalized read-count data and the Fragments Per Kilobase of transcript per Million fragments of RNA-Seq by RNASeqV2 were obtained from the TCGA database as well as the STAD level 3 microRNA sequencing (miRNA-seq) data. The cleaned fastq files were downloaded from the EGA database. According to the protocol in the TCGA database, the STAR software was used to align the fastq files to the human GRCh38 genome file, and the Htseq-Count software was used to count the reads. Next, the Edge R package was applied to analyze the gene expression profiles to identify the DEmRNAs, DElincRNAs, and DEmiRNAs (log 2 -fold change > 1.0, false discovery rate (FDR) < 0.05). The TPM (Transcripts Per Million) were calculated for the subsequent analyses and validation. The GSE12345 dataset obtained from the Affymetrix Human Genome U133 Plus 2.0 Array was annotated by GPL570. The GSE42977 dataset based on the Illumina HumanRef-6 v2.0 expression beadchip was annotated by GPL6790. The two microarray datasets were normalized by the robust multi-array average algorithm. When multiple probes were mapped to the same gene, the mean of the probe intensities will be used. After that, the differential expression analyses of the microarray data were performed using limma package [fold change > 2.0, false discovery rate (FDR) < 0.05].

Functional Enrichment Analysis
The functional enrichment analyses of the DEGs, including GO function analyses and KEGG pathway analyses, were carried out by the clusterProfiler package. For the GO analyses, cellular component (CC), biological process (BP), and molecular function (MF) terms were analyzed. FDR < 0.05 was used as the statistically significant cutoff in these analyses. Besides, we also performed the gene set enrichment analysis (GSEA) for all the mRNAs using the clusterProfiler package with the c2.all.v6.2.entrez.gmt and c5.all.v6.2.entrez.gmt gene set collections. Gene sets with FDR < 0.05 were considered to be significantly enriched.   Construction of an lncRNA-Related ceRNA Network The overlapped DEGs between the RNA-Seq and the microarray were used for the construction of the prognostic ceRNA network. We used starBase v3.0 (http://starbase.sysu.edu.cn/) to retrieve the lncRNA-miRNA interactions and then used miRBase targets (http://mirdb.org/miRDB/), miRTarBase (http://mirtarbase.mbc. nctu.edu.tw/), and TargetScan (http://www.targetscan.org/) to predict lmiRNA-mRNA interactions (14)(15)(16)(17). To increase the reliability of the results, only the miRNA-mRNA interactions intersecting in all three databases were included into the ceRNA network. Moreover, the Pearson correlations of the pairs between miRNAs and mRNAs were calculated and the absolute values of the pairs more than 0.4 were regarded as interacted pairs. Finally, a ceRNA network was constructed and visualized by Cytoscape v3.7.1.

Survival Prediction Model Construction and Evaluation
The mRNAs, miRNAs, and lncRNAs from the ceRNA network were used for the univariable Cox regression analyses to screen for the RNAs associated with overall survival, and only the RNAs with P < 0.05 were enrolled into the followed Kaplan-Meier survival analyses based on the log-rank test according to the grouping by the median value of the expression quantities. After Kaplan-Meier survival analyses, the mRNAs and lncRNAs with P < 0.05 were included into the lasso Cox regression analyses to further screen the variables. Accordingly, the survival prediction model was constructed through the regression coefficients of the multivariable Cox regression analysis, as described in the following equation: risk score = R1G1 + R2G2 + R3G3 + . . . . . . RnGn (R is the regression coefficient obtained from the final multivariable Cox analysis and L is the expression value of each gene or the type of different clinical factor). Then, the patients were divided into two groups (high-risk-score group and lowrisk-score group) according to the cutoff value obtained from the OptimalCutpoints package, and the Kaplan-Meier survival analyses were performed between the two groups. Moreover, the receiver-operating characteristic (ROC) curve analyses were performed to estimate the efficiencies of the model under the same grouping; only AUC values > 70% were regarded as efficient models.

Identification and Annotation of Gene Modules Related to the Risk Model According to WGCNA
The WGCNA algorithm was used to construct the gene coexpression network (18). We first conducted the WGCNA analysis to identify the gene modules closely related to the overall survival and the risk scores. Then the significant modules were further selected for further analyses, and the mRNAs of the ceRNA network co-expressed in the significant modules were further input in the String database (string.org) to predict the protein-protein interaction (PPI) relationship. The PPI network was then visualized by Cytoscape v3.7.1, and the hub genes were identified using 12 algorithms of the CytoHubba plugin. Then, the results were cross-compared to generate a final list of hub genes. Besides, the genes enrolled into these significant modules were used for GO-BP, GO-MF, and KEGG pathway enrichment analyses by clusterProfiler package.

Evaluation of the Tumor Infiltrating Immune Cells With the Risk Model
To infer the infiltrating immune cells of MPM associated with the ceRNA axes, we used the mRNAs in the ceRNA network to estimate the proportions of the 22 types of infiltrating immune cells by deconvolution of the CIBERSORT algorithm (19). According to the grouping, we obtained the significant changes of the immune cell types between the high-and low-risk-score groups using Student's t-test.

Statistical Analysis
The differential expression gene analyses of the RNA-seq data were performed by the R package "edge R." The differential expression gene analyses of the microarray data were conducted by the R package "limma." The enrichment analyses were performed by R package "clusterProfiling." All of these analyses involved in the multiple-hypothesis test were finally corrected by the FDR method. The univariate and the multivariate Cox regression analyses were performed by R package "survival." The lasso regression analyses were performed by the R package "glmet." Moreover, the Kaplan-Meier analyses were carried out by R package "survival, " whose statistical significances were assessed by the two-sided log-rank test. The statistical analyses between the clinical traits and the risk model were performed by the chi-square test or Fisher exact test according to the distribution of the data. The R software version 3.6.3 was used for the study. All statistical tests were two-sided.

Screening of Patients and Identification of DEGs
A total of 80 patients with MPM were downloaded from the TCGA database. Thereinto, the RNA expression profiles of 21 low-risk (overall survival > 30 months) samples and nine high-risk (overall survival < 12 months) samples were used to explore the DEmRNAs, DElncRNAs, and DEmiRNAs. The clinical and pathological characteristics of MPM patients are shown in Table 1. According to the results of the DEGs, the patients were clearly clustered into two groups, indicating that the overall gene expression patterns are significantly different in the two groups (Figures 1A,B). A total of 1,499 DEmRNAs, 358 DElncRNAs, and 29 DEmiRNAs were obtained according to the criteria (P < 0.05 and fold change > 2, Figures 1C-E).
To explore the functional features of these DEGs related to overall survival, we performed GO and KEGG enrichment analyses (FDR-corrected P < 0.05). The top 10 significant GO terms are mainly involved in the processes and the components related to the extracellular matrix, the cell division, and the receptor regulator activity (Figures 2A-C). Four pathways were enriched in the KEGG analysis ( Figure 2D). These results all suggested that abnormal extracellular matrix components, cell cycle, and immune-response components are potentially important causes in the progression of MPM, which have been closely associated with growth and metastasis of other cancer cells (20).
In addition, to explore the changes of the overall gene expression levels of MPM, 17746 mRNAs were input to implement GSEA based on the fold change order using the gene sets of c2.  (Figure 2E), supporting the role of the extracellular matrix in the progression of MPM and indicating the convergence between the invasion of MPM and the morphogenesis and differentiation of the epithelium. Some specific targets and pathways were mainly enriched in the c2.all.v6.2.entrez.gmt gene set (Figure 2F), suggesting that progression of MPM may partly be involved in the invasive pathways of breast cancer and pathways related to TCF21 and SMARCA2 targets. We next performed the differential expression analyses to explore the DEGs between the MPM tissues and the normal tissues by the limma algorithm based on the microarray data. A total of 1351 DEGs (725 upregulated genes and 626 downregulated genes) were obtained from the GSE12345 dataset, and 1211 DEGs (556 upregulated genes and 655 downregulated genes) were obtained from the GSE42977 dataset (Figures 3A,B). To screen the DEGs associated with the development of the MPM, we further compared the DEGs related to the overall survival from the RNA-Seq dataset with the DEGs related to MPM obtained from the two microarray datasets, which showed the 32 overlapped upregulated DEGs and the 77 downregulated DEGs (Figures 3A,B). Similar to the above results, the enrichment analyses of the overlapped downregulated  DEGs also revealed the abnormal roles of the cell division in the development of the MPM ( Figure 3C). Moreover, the overlapped upregulated DEGs were mainly involved in the regulation of the metabolic processes ( Figure 3D).

Construction of an lncRNA-miRNA-mRNA ceRNA Network
To construct the credible ceRNA network among the lncRNAs-miRNAs-mRNAs, we input the 29 DEmiRNAs and the 358 DElncRNAs to get the interaction data between miRNAs and lncRNAs from the starBase v3.0 database. We next performed target prediction for miRNAs-mRNAs among the 29 miRNAs and the 109 overlapped mRNAs in the three databases (miRTarBase, miRDB, and TargetScan) and cross-compared the results to retain the miRNA-mRNA interactions that are consistent in all three databases. As a result, 26 lncRNAs, 13 miRNAs, and 73 mRNAs were predicted to have close interaction ( Figure 4A, Table 2). To further explore the reliability of the ceRNA network, the Pearson correlations among the miRNAs-mRNAs, lncRNAs, and mRNAs were calculated based on the 80 expression profiles. The interacted miRNAs-mRNAs almost have relative high correlations with |r| > 0.4 ( Figure 4B). Similar to the scattered distributions of the lncRNAs in the ceRNA network, the correlations among the lncRNAs were relatively weak, which suggested the accuracy of our ceRNA network ( Figure 4C). Moreover, the majority of the mRNAs in the ceRNA network have positive correlations; the same with that many of the mRNAs in the ceRNA network are downregulated genes ( Figure 4D). These results suggested the validity of the prognostic ceRNA network.

Construction and Evaluation of Survival Prediction Model
To further determine the association between the expression levels of the 73 mRNAs, 13 miRNAs, and 26 DElncRNAs in the ceRNA network and the overall survival of MPM patients, we performed the univariate Cox regression analyses in the 80 patients from TCGA (Table 3). Next, 52 mRNAs, six miRNAs, and 12 lncRNA were analyzed by the Kaplan-Meier survival analyses according to the grouping by the median value of expression quantities. Thirty-three mRNAs, three miRNAs, and seven lncRNAs were significant in the Kaplan-Meier survival analyses, and the significant mRNAs and lncRNAs were further screened to avoid overfitting by lasso Cox regression analyses ( Figure 5A). As a result, CDC25A, CDK6, CHAF1B, CKS2, CORO1C, WEE1, KIF23, SGMS1-AS1, and LINC00689 were enrolled into the survival model ( Figure 5B, Table 3).
Interestingly, WEE1 and KIF23 were also identified as hub genes in the co-expressed PPI network, highlighting the importance of them related to the prognosis and the progression of MPM and the reliability of the model. Besides, the lncRNAs, WEE1, CKS2, and KIF23 included in the model were interacted with the miRNAs associated with survival time, suggesting that the model possessed high efficiency by integrating the information of mRNAs, lncRNAs, and miRNAs. Next, several clinical factors related to prognosis were also enrolled into the model, including age (≤60 or > 60), gender (female, male), T pathological stage (T1 + T2 or T3 + T4), N pathological stage (N0 + N1 or N2 + N3), and tumor type (epithelioid mesothelioma or non-epithelioid mesothelioma). According to the result of multivariate analysis, a risk model was constructed: Risk score = −0.472 * age + 0.400 * gender + 0.418 * (T pathological stage)−0.180 * (N pathological stage) + 0.102 * (tumor type) + 0.036 * CDC25A(exp) + 0.017 * CDK6(exp) + 0.054 * CHAF1B(exp) + 0.003 * CKS2(exp) + 0.002 * CORO1C(exp) + 0.0237 * WEE1(exp)−0.004 * KIF23(exp)−0.0597 * SGMS1-AS1(exp)−0.056 * LINC00689(exp). The cutoff value was determined by the sensitivity equal to the specificity method from the OptimalCutpoints package, which showed the best sensitivity and specificity for prognostic prediction (AUC values = 86.9%, Figure 5D). The Kaplan-Meier survival curves showed that the overall survival of patients with high-risk scores (n = 32) were significantly lower than those with the low-risk scores in the three models (p < 0.001, Figure 5C). The results showed that the model is of value for prognostic prediction (Figures 5C,D). In this risk model, CDK6 and SGMS1-AS1 were significant as potential independent prognostic factors ( Figure 5E). Interestingly, the histological types of MPM were associated with the risk score after the statistical analyses between the clinical traits and the risk score, similar to the previous report about the correlation between the histological types and the overall survival (Table 4). Furthermore, another dataset of MPM patients was used to validate our results. According to the TPM values of 128 patients in the EGA database, the expression of the nine DEGs in our risk model were further confirmed to be significant based on Student's t-test with or without Welch correction (low risk: 40 patients, high risk: 88 patients, Figure 6). Moreover, the Kaplan-Meier survival analyses of the nine DEGs in our risk model based on the data of the 211 patients from EGA also showed that all of them are closely related to the survival time of the MPM patients (Figure 7). Likewise, the Kaplan-Meier survival analyses and the ROC curves showed that the risk model has great values to estimate the prognosis of MPM (P < 0.001, AUC values = 77.0%, Figures 5F,G). Also, the multivariate Cox analysis was performed to verify the identified potential independent prognostic factors according to the variates from the survival model in the EGA data, which showed that CDK6 had marginal significance and KIF23 and CORO1C were significant in the model (Figure 5H). In summary, all of these results using the EGA data validated the effectiveness of the risk model.

Identification of Gene Modules and Functional Analyses According to WGCNA
To investigate the potential biological function of the ceRNA network, we used the WGCNA method to cluster the DEmRNAs based on the 80 expression profiles obtained from the TCGA database. Except the non-co-expression module (gray), we identified nine co-expression modules ( Figure 8A) and associated the clinical traits with the identified modules, including the tumor stage, survival time, and risk score ( Figure 8B). We found that the turquoise module and the blue module were the top two most significant modules associated with the risk scores and the survival time ( Figure 8B). We then performed the Pearson analyses based on the genes in the two modules and further revealed that the turquoise module was positively correlated with the risk scores and the blue module was negatively correlated with the risk scores (turquoise module-risk score: cor = 0.9, P = 1e-200; blue modulerisk score: cor = 0.81, P = 6e-59; Figures 8C,D). Notably, 7 of 73 mRNAs in the ceRNA network co-expressed in the blue module and 42 of 73 mRNAs in the ceRNA network co-expressed in the turquoise module (turquoise module: 513 genes, blue module: 195 genes), which showed that the genes in the turquoise co-expression modules associated with survival were overrepresented in the ceRNA network and indicated the reliability of the ceRNA network (P = 2.458e-05). We then input the genes in the turquoise module into the String database to predict the PPI relationship ( Figure 8E). We further analyzed the topical structure of the PPI network by 12 methods to obtain the hub genes. According to the degrees among the genes, WEE1 and KIF23 were finally selected as the hub genes in the ceRNA PPI network, suggesting that they play important roles in the MPM. The top 10 results of GO-BP, GO-MF, and KEGG pathway enrichment analyses using the genes included into the blue module and the turquoise module are shown in Figure 9. In the blue module, GO terms are mainly involved in the constituent of extracellular matrix (Figures 9A,B). As for pathways, they may be associated with the terminal symptoms of cancer patients (Figure 9C).
In the turquoise module, multiple processes related to the cell division were enriched (Figures 9D,E). Moreover, the markedly enriched pathways are associated with the cell cycle, the homologous recombination, the microRNAs in cancer, the p53 signaling pathway, etc. Figure 9F), which have been reported to participate in multiple levels of the tumorigenesis and the invasion. Overall, the results of the enrichment analyses demonstrated that the genes in the turquoise module are mainly involved in the abnormal process of cell division.

The Immune Cell Infiltration Analysis in the ceRNA Network
The tumor microenvironment involves the extracellular matrix components, the immune cells, and the other cellular components, which have partly been highlighted in multiple analyses above. To further investigate the biological function of the ceRNA network regarding the tumor microenvironment, we utilized the CIBERSORT algorithm to study the changes of tumor-infiltrating immune cells between the high-and low-risk-score groups according to the DEmRNAs in the ceRNA network (73 mRNA). We calculated the variations of the 22 types of immune cells among the 80 patients ( Figure 10A) and found that the components of the nine immune cell subtypes exhibited obvious variations between the high-and low-risk groups. Compared to the low-risk group, the high-risk group possesses a higher proportion for the neutrophils, the NK cells activated, and the T cell gamma delta, while the dendritic cell resting, the dendritic cells activated, the naïve B cells, the macrophages M0, the monocytes, and the CD4-naive T cells decreased (Figures 10B-J). These results suggested that the ceRNA network may be involved in the regulation of the tumorrelated microenvironment partly by changing the immune infiltration, which is potentially important for the subsequent targeted intervention.

DISCUSSION
MPM is a highly aggressive cancer. In the past years, great progress has been made in the knowledge of the occurrence and the development of MPM, and some target molecules and pathways were identified, including BAP1 and YAP/TAZ/TEAD oncogenic axis (21,22). However, the pathomechanism of MPM is still largely unknown, and the clinical outcomes are highly heterogeneous. Thus, the pathogenesis of MPM requires further study, and more effective biomarkers and predictive models need to be established to satisfy the clinical setting.
Up to now, the studies aimed at screening the potential diagnostic and therapeutic targets of MPM mainly focused on the protein-coding genes (23). However, the biological processes of MPM are very complicated, which are involved in the complex interactions of the multiple types of components rather than regulation of individual molecules. Thus, it is necessary to understand the mechanism of tumor progression based on the regulatory networks. In this study, we comprehensively analyzed the expression and function of aberrant RNAs and established a ceRNA network of MPM to screen the prognostic biomarkers and construct a survival prediction model.
We firstly identified the DEGs closely related to overall survival and then annotated the abnormal function. We found that extracellular matrix organization, chromosome structure, and cell cycle were mainly involved in MPM progression. It has been well-established that the chromosome structure and the cell-cycle regulation are closely related to the cancer. Interestingly, recent studies also suggest that the tumor microenvironment is mainly formed by extracellular matrix organization, which plays an important role in the invasion and metastasis of tumor (24)(25)(26). As tumor cells proliferate, the surrounding extracellular matrix dynamically interacts with resident cells to change the architecture of the microenvironment (27). Our findings supported the importance of the extracellular matrix in MPM. In addition, the GSEA analyses based on the expressional changes of all genes also suggested anomalous morphogenesis and differentiation of epithelium and the invasive process of breast cancer partly shared the processed with the MPM development. The pathways related to oncogenes TCF21 and SMARCA2 were also enriched in the GSEA analyses. TCF21 and SMARCA2 have been reported to participate in the multiple steps of the invasion of several cancers, such as proliferation, chemoresistance, and migration (28)(29)(30), suggesting the convergence of targets among the different cancers. Besides, the overlapped DEmRNAs between the RNA-seq data and the microarray data were also identified. The function of the overlapped DEmRNAs further suggested the dysfunction of the cell division in the development of MPM.
Next, multiple databases were simultaneously used to predict the interaction among the lncRNA-miRNA-overlapped mRNAs. Finally, 26 lncRNAs, 13 miRNAs, and 73 target mRNAs were included to construct the ceRNA network. Thirty-three mRNAs, three miRNAs, and seven lncRNAs are finally identified as the genes associated with overall survival by univariate Cox regression analyses and Kaplan-Meier survival analyses. Seven mRNAs and two lncRNAs were finally selected and further used to establish survival predictive models (mRNAs: CDC25A, CDK6, CHAF1B, CKS2, CORO1C, WEE1, and KIF23; lncRNAs: SGMS1-AS1 and LINC00689). Besides, after reviews of these genes in the model, we found that all of the seven genes have been associated with multiple processes of tumor, where CDC25A, CDK6, and KIF23 have been reported to directly participate in the pathophysiological process of MPM (31)(32)(33). The expression of CDC25A has been reported to correlate with lymph-node spread of MPM (33). CDK6, WEE1, and KIF23 are thought as potential therapeutic targets, which are closely related to the prognosis of the MPM patients (31,34). Similarly, it is noteworthy that CDK6 is also significant or marginal significant in the multivariate Cox analysis of the model according to both TCGA and EGA data, which indicated that CDK6 may be an important target and an independent prognostic factor. Although the other four genes have not been directly linked to MPM, they are widely involved in the development of cancers. CHAF1B plays a considerable role in leukemia pathogenesis and proliferation of the lung cancer and the prognosis of some cancers (32,33). CKS2 and CORO1C participate in the growth, invasion, and prognosis of several types of cancers (35)(36)(37)(38). Therefore, it is possible that they are also involved in the progression of MPM. LINC00689 and SGMS1-AS1 have not been extensively studied. However, there are some hints that they may be genuine tumor factors. It has been reported that LINC00689 can promote the growth, metastasis, and glycolysis of the glioma cells by completing with miR-338-3p (39). The abnormal expression of SGMS1 has been reported to regulate the epithelial-to-mesenchymal transition in several tumors (40)(41)(42). Moreover, SGMS1-AS1 may be an independent prognostic factor, which needs more data to verify. Thus, the two lncRNAs are likely to participate in the invasive process of MPM. The functions of all these genes are worth further studying, especially the concrete validation of the lncRNA-miRNA-mRNA axis. The genes included in the model exhibited high efficiency by integrating the information of mRNAs, lncRNAs, and miRNAs. The other genes may also have high clinical significance. Interestingly, some of these genes out of the models were also directly associated with the MPM, which were mainly regarded as potential therapeutic targets, including CHEK1, DNMT3B, AXL, PAPPA, and miR-302b. Previous studies suggested that these genes are involved in the multiple antineoplastic processes of MPM, such as relief of chemo-and radioresistance, anti-proliferation, inhibition of growth and migration, and induction of apoptosis (31,(43)(44)(45)(46)(47)(48). Only two small-molecule drugs aiming to target the identified genes have been preliminarily evaluated, including palbociclib targeting CDK6 and AZD1775 targeting WEE1. Therefore, more preclinical experiments should be conducted to screen the drugs targeting the corresponding genes and assess the antitumor effect of these identified targets before the clinical studies, especially for these identified lncRNAs, in consideration of their central roles in the ceRNA network.
We next clustered the nine co-expression modules from the 1500 DEmRNAs based on the method of WGCNA and found that the blue module and the turquoise module were significantly associated with the overall survival and the risk model. Importantly, the two hub genes WEE1 and KIF23 in the turquoise co-expression PPI network have been previously reported to play important roles in the aggressive property of MPM, whose inhibition was previously regarded as the promising therapeutic targets (49,50). Interestingly, WEE1 and KIF23 are also included into the survival model, which indicated the validity of the model and the ceRNA network. Besides, KIF23 was identified as an independent survival predictor in EGA data, but not in the TCGA data, possibly due to insufficient data. As for the biological functions of KIF23 and WEE1 in MPM, they remain largely unknown. The ceRNA mechanisms of the two genes in MPM also have not been assessed, which may be important for the growth and metastasis of the MPM cells. Similarly, the gene enrichment analyses further suggested that the blue module mainly regulates the function of the extracellular matrix, indicating that the ceRNA network could affect the tumor microenvironment and consequently contribute to MPM progression. Moreover, the functions of the turquoise module were mainly involved in the process of cell division and in the structure of cells, whose dysfunction is widely regarded as the initiation and the invasive factor of cancer. In other words, we deduce that the ceRNA network mainly regulates the invasion of MPM by regulating the extracellular matrix and the cell division.
The tumor-related microenvironment also includes the immune cells, the fibroblasts, and the endothelial cells, which could inhibit the development of tumor. However, with the progression of tumor, the inhibitory signals and the immune cells could be circumvented by changing the tumor microenvironment (51). Therefore, there is a complex relationship between the tumor cells and the immune cells, which plays important roles in either elimination of tumor cells or invasion of rumors. Thus, we calculated the proportions of the 22 types of immune cells by the deconvolution of the DEmRNAs from the ceRNA network, whose result suggested that the ceRNA network may also be involved in the infiltration of multiple immune cells and the regulation of the microenvironment of MPM.
To conclude, we have identified a number of potential prognostic genes and analyzed their functions related to the progression of the MPM. A ceRNA regulatory network was accordingly constructed, where 33 mRNAs, three miRNAs, and seven lncRNAs are finally identified as the genes associated with the overall survival. Then an efficient risk score assessment system based on two lncRNAs, seven mRNAs, and five clinical factors were accordingly established and validated to predict the overall survival of MPM patients by two independent MPM casecontrol studies. Besides, the influence of the ceRNA network on the tumor microenvironment has also been assessed from several perspectives, providing a further understanding of the mechanisms of MPM progression. The follow-up clinical and experimental researches are needed to further optimize the models and concretely clarify the role that these genes play in the MPM.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary materials, further inquiries can be directed to the corresponding author/s.

ETHICS STATEMENT
Ethical approval was not provided for this study on human participants because the human datasets were obtained from the public database including TCGA, EGA and GEO, which have been approved by the ethical reviews. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
WD, BX, and PH: conceptualization. WD and KW: methodology, software, and writing-original draft preparation. WD, YD, and XYC: validation. WD and XFC: formal analysis. BX and PH: writing-review and editing, supervision, and project administration. WD: visualization. BX: funding acquisition. All authors have read and agreed to the published version of the manuscript.