Integrated Analysis of a Risk Score System Predicting Prognosis and a ceRNA Network for Differentially Expressed lncRNAs in Multiple Myeloma

Long non-coding RNAs (lncRNAs) are non-protein-coding RNAs longer than 200 nucleotides. Accumulating evidence demonstrates that lncRNA is a potential biomarker for cancer diagnosis and prognosis. However, there are no prognostic biomarkers and lncRNA models for multiple myeloma (MM). Hence, it is necessary to screen novel lncRNA that can potentially participate in the initiation and progression of MM and consequently construct a risk score system for the disease. Raw microarray datasets were obtained from the Gene Expression Omnibus website. Weighted gene co-expression network analysis and principal component analysis identified 12 lncRNAs of interest. Then, univariate, least absolute shrinkage and selection operator Cox regression and multivariate Cox hazard regression analysis identified two lncRNAs (LINC00996 and LINC00525) that were formulated to construct a risk score system to predict survival. Receiver operating characteristic analysis certificated the superior performance in predicting 3-year overall survival (area under the curve = 0.829). The similar prognostic values of the two-lncRNA signature were also observed in the tested The Cancer Genome Atlas dataset. Furthermore, two other lncRNAs (LINC00324 and LINC01128) were differentially expressed between CD138+ plasma cells from normal donors and MM patients and were verified to be associated with cancer stage in the Gene Expression Omnibus dataset. A lncRNA-mediated competing endogenous RNA network, including 2 lncRNAs, 12 mitochondrial RNAs, and 103 target messenger RNAs, was constructed. In conclusion, we developed a two-lncRNA expression signature to predict the prognosis of MM and constructed a key lncRNA-based competing endogenous RNA network in MM. These lncRNAs were associated with survival and are probably involved in the occurrence and progression of MM.


INTRODUCTION
Multiple myeloma (MM) is the second most common hematological malignancy. It is caused by the clonal proliferation of malignant plasma cells in the bone marrow (BM) (Laubach et al., 2011). MM is characterized by renal impairment, lytic bony lesions, anemia, and bone pain. The survival of MM patients ranges from a few weeks to more than 10 years (Decaux et al., 2008;Chen W. C. et al., 2017;Cowan et al., 2018).
As a newly discovered type of non-coding RNA, long noncoding RNAs (lncRNAs) function as imperative regulators involved in tumorigenesis, tumor suppression (Poliseno et al., 2010;Hung and Chang, 2010), and many biological processes (Geisler and Coller, 2013;Fatica and Bozzoni, 2014). Many lncRNAs involved in the initiation and progression of MM have been identified. Furthermore, lncRNAs can also regulate gene expression by interacting with mitochondrial RNA (miRNA) at miRNA-binding sites (MREs). For example, MALAT1 is an lncRNA that inhibits the proliferation and adhesion of myeloma cells by upregulating the expression of miR-181a-5p (Sun et al., 2019a). The aberrant expression of urothelial cancer associated 1 lncRNA affords it the ability to promote proliferation and inhibits apoptosis by regulating miR-1271-5p and hepatocyte growth factor in MM cells (Yang and Chen, 2019). Abnormally expressed lncRNA NR_046683 in patients of different MM subtypes and stages indicated that it could be used as a new indicator for potential drug target and prognosis (Dong et al., 2019). Although several lncRNA prognostic models have been identified in uterine corpus endometrial carcinoma (Ouyang et al., 2019), hepatocellular carcinoma (Sun et al., 2019b), cervical cancer (Wu et al., 2019), and lung adenocarcinoma , the clinical implication of most lncRNAs in MM remains unclear.
Weighted gene co-expression network analysis (WGCNA) is an algorithm that is frequently used to cluster highly synergistically altered gene sets into separate modules. This can establish connections with clinical traits and thus screen out candidate indicator genes or therapeutic targets (Langfelder and Horvath, 2008;Shi et al., 2010). Principal component analysis (PCA) is another mathematical algorithm. It is a powerful technique that is widely applied in bioinformatics and other fields. It can reduce the dimensionality of the data while retaining most of the variations that are uncorrelated in the data set. These unrelated variables are called principal components (PCs) (Ringner, 2008). After identifying new variables, the PCs, with a sample-like pattern and a weight for each gene, further exploration can be done by building a link with clinical data, and candidate genes can be obtained by comparing component loadings. In the present study, the Gene Expression Omnibus (GEO) public integrated database provided an application platform of genomic sequencing data along with the clinical information of each MM patient. WGCNA and PCA were performed to explore public sequencing data and clinical information of MM patients.
A few key gene modules associated with tumor stage and PCs correlated with risk score and proliferation index were identified, and 12 lncRNAs in the intersection were identified. We found a two-lncRNA signature that might act as an independent prognostic factor to identify MM patients that are at higher risk of poor clinical outcome. Furthermore, using other datasets, we recognized database of essential genes (DEG) and constructed a competing endogenous RNA (ceRNA) network in MM based on two of the 12 abnormally expressed lncRNAs. These two lncRNAs may participate in tumorigenesis or serve as clinical indicators of the progression of MM.

Weighted Gene Co-expression Network Analysis Identification of Clinically Significant Modules
A total of 32 MM samples with a known stage of cancer were utilized to conduct the hierarchical clustering analysis using the WGCNA package. The sample dendrogram and clinical trait heatmap of GSE16791 is displayed in Figure 1A.
No obvious outlier was evident in the sample clustering. The information of two clinical traits of 32 MM samples, including age and cancer stage, is presented in Figure 1A. Selecting the best soft-thresholding powers is imperative to obtain relatively balanced scale independence and mean connectivity. As presented in Supplementary Figure S2A, we selected β = 8 (scale-free R 2 = 0.81) as a soft-threshold to construct a scale-free network, and a total of 21 modules were detected ( Figure 1B). As the overall gene expression level of the corresponding module, the module eigengenes were calculated to assess the relationship between modules and clinical information by Pearson's correlation analysis. The results indicated that the stage was negatively associated with blue and green modules ( Figure 1C). Scatterplots of gene significance of stage vs. module membership in the blue and green modules revealed that they were highly correlated (Supplementary Figure S2B). Also, we calculated eigengenes of all modules and clustered them on the base of their correlations. A module eigengenes dendrogram indicated that the blue and green modules were clustered together, and the eigengene network heatmap revealed similar results (cor = 0.65, P = 5e-05; Supplementary Figure S2C). Therefore, we chose blue and green modules for further analysis.

Principal Component Analysis Determination of Interesting Principal Components Associated With Clinical Traits
Principal component analysis was performed on the 52 samples in GSE17306. In this dataset, the gene expression profiling (GEP)risk score and proliferation index of each sample were calculated according to the GEP (Zhou et al., 2010). Initially, PCA created 52 composite variables (PCs) by reducing the dimensionality of numerous genes. The first 33 components, which explained 80% of the variability among the 52 samples, were retained to correlate clinical traits (Figure 2A). These 33 composite variables are enough to explain the sample differences to the greatest extent. Next, to ascertain the capability of PCs to differentiate risk score level and proliferation index level, the pairs plot was conducted to compare PC1 with PC8 on a pairwise basis ( Figure 2B). Additionally, a bi-plot of PC1 versus PC6 indicated that PC6 could roughly distinguish the high-risk group from the lowrisk group ( Figure 2D). Next, we correlated the PCs back to the clinical data, including the GEP-risk score and proliferation index, to identify interesting PCs. PC6 and PC8 were negatively associated with risk score and proliferation index in all the 33 PCs retained ( Figure 2C). PC11 and PC12 were positively correlated with the proliferation index. For each PC of interest, "plotloadings" determined the genes ranked in the top 20 of the loadings range and then created a final consensus list of these ( Figure 2E).

Construction a Risk Assessment Model
To construct a lncRNA scoring system that is predictive of survival in the MM patients, we extracted lncRNAs from the blue and green modules and PC6 and PC8 based on the Genecode annotation 1 . Finally, a total of 12 lncRNAs were obtained from the intersection of the interesting modules and PCs ( Figure 3A). The expression levels of 12 lncRNAs were extracted from GSE57317 to conduct the univariate Cox regression analysis. The results of the univariate Cox analysis of 12 prognostic lncRNAs from the discovery cohort are shown in Table 1. After this, six significant lncRNAs (P < 0.05) were identified and were included in the least absolute shrinkage and selection operator (LASSO) model; cross-validation was adopted to select the penalty parameters ( Figures 3B,C). Two lncRNAs were identified based on lambda.1se values (Supplementary Table S1). The quantitative real-time polymerase chain reaction (qRT-PCR) results showed that the expression of LINC00525 was significantly downregulated in Roswell Park Memorial Institute (RPMI)-8226 and KM3 cell lines, whereas LINC00996 was significantly upregulated in KM3 cell line compared with normal plasma cells (Supplementary Figures S3A,B). We further included expression levels of the two lncRNAs in a multivariate Cox model. The risk score = (−0.3647) × (expression value of LINC00996) + (−0.4266) × (expression value of LINC00525). The details of the two lncRNAs are depicted in Figure 4B. We used the median of the risk score as the cutoff to define the groups of MM patients with high and low scores ( Figure 4A). The survival time and overall survival (OS) status in the training dataset are presented in the middle panel of Figure 4A. Compared with those in the low-risk score group, patients in the high-risk score group displayed an obviously worse OS ( Figure 4C). The 3-year survival receiver operating characteristic (ROC) curve was also plotted. The area under the curve of the  risk score reached 0.829 ( Figure 4D), revealing that the risk score based on the two lncRNAs is a good indicator of prognosis. The results of univariate and multivariate Cox regression analyses indicated that the risk score (P < 0.001 and P = 0.006) was an independent prognostic indicator (Supplementary Table S2). To further examine the accuracy of the lncRNA risk score model developed in the training dataset, the performance of the risk score was also evaluated in The Cancer Genome Atlas (TCGA) dataset. The result of multivariate Cox regression analysis for the expression level of two lncRNAs in the TCGA dataset is presented in Supplementary Figure S4B. The risk survival status, score distribution, and expression pattern of the two lncRNAs in the 787 MM patients in the TCGA dataset are displayed in Supplementary Figure S4A. Also, corresponding to our previous conclusion, the OS was significantly shorter in the high-risk group compared with that in the low-risk group (Supplementary Figure S4C), and the AUC of the risk score reached 0.584 (Supplementary Figure S4D). Univariate Cox regression analyses were conducted to detect various factors correlated with prognosis. The results revealed that age (P = 0.009), tumor stage (P < 0.001), and risk score (P = 0.002) were significantly associated with the OS of the MM patients. A subsequent multivariate Cox regression analysis indicated that the tumor stage (P < 0.001) and risk score (P = 0.001) were independent prognostic indicators (Supplementary Table S3).  Data are presented as the mean ± standard deviation. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001.
Frontiers in Genetics | www.frontiersin.org FIGURE 6 | A co-expression network of six lncRNAs and DEmRNAs in the blue module. For simplicity, only DEmRNAs with a connection with the interesting lncRNAs in the blue module were retained to construct the co-expression subnetwork. lncRNAs are depicted by diamonds, whereas DEmRNAs are indicated by the rounded rectangles (upregulation) or vs. (downregulation). Size of genes is related to the intra-modular connectivity, and the color is related to the weighted score of the interactions.

Detection and Validation of Differentially Expressed Long Non-coding RNAs
CD138+ plasma cells obtained from healthy donors in GSE16558 and GSE47552 were analyzed. Based on the cutoff criteria of P < 0.05, 20 DELs were detected ( Figure 5A). Surprisingly, among the 12 prognostic lncRNAs we identified earlier, LINC00324 and LINC01128 are abnormally expressed (Figures 5B,C). The relationship between the two lncRNAs and cancer stages in GSE16791 is displayed in Figure 5D. Expression levels of the two lncRNAs among patients with different stages were compared, and statistical differences were calculated using Student's t-test. Corresponding to our previous WGCNA and PCA results, patients with poorly differentiated stage III cancer displayed significantly lower LINC00324 expression levels compared with patients with moderately differentiated cancer of less advanced stage I. Furthermore, increased expression of LINC01128 was correlated with advanced MM stage. Also, to determine the prognostic value of these two lncRNAs in MM, the survival data of MM patients were obtained from the TCGA database and GSE57317. As presented in Figure 5E, patients with high LINC01128 expression exhibited a significantly poorer OS rate compared with patients with high LINC01128 expression. On the contrary, we observed that patients with higher LINC00324 expression had better OS than those with lower LINC00324 expression. These results indicate that LINC00324 may be a tumor suppressor gene, whereas LINC01128 may be a cancer gene. The qRT-PCR results also showed that the expression pattern of the two lncRNAs in MM cells and normal plasma cells was similar to the microarray results ( Figure 5F). LINC01128 was upregulated, whereas LINC00324 was significantly downregulated in the three MM cell lines.
FIGURE 7 | Functional annotation analysis of DEmRNAs co-expressed with LINC01128. (A) Gene ontology (GO) terms enrichment analysis was performed using the Database for Annotation, Visualization, and Integrated Discovery and visualized by GOplot. Significance of a term is indicated by the adjusted P-value (adj_p-val). logFC of selected genes is taken from GSE16558. z-score provided a hint if the biological process (/molecular function/cellular components) was more likely to be decreased (negative value) or increased (positive value). z-score is assigned to the x-axis and the negative logarithm of the adjusted P-value to the y-axis. Area of the displayed circles is proportional to the number of genes assigned to the term, and the color corresponds to the category. A threshold for the labeling is set as log(adj_p-value) > 2.8. (B) Plot of the enriched KEGG pathway. Outer circle shows a scatter plot for each term of the logFC of the assigned genes. Red circles display upregulation and blue ones downregulation by default. There were no upregulated DEmRNAs co-expressed with LINC01128 enriched.

Co-expression Network of Key Long Non-coding RNAs and Differentially Expressed Messenger RNAs in the Blue Module
Based on the previous results, we recognized six lncRNAs (LINC00525, LINC00996, LINC01128, LINC00324, LINC101929759, and LINC339803) as potential biomarkers or prognostic indicators. These lncRNAs were all in the blue module. To further dissect the role of six lncRNAs in MM, we created a gene co-expression subnetwork for the genes in the blue module according to their topology overlap matrix similarity; messenger RNAs (mRNAs) connected to six lncRNAs are too much to display perfectly; thus, we selected only differentially expressed mRNAs (DEmRNAs) to construct a network. Our lncRNAs may potentially regulate these co-expressed DEmRNAs through the ceRNA mechanism. DEmRNAs were obtained from GSE16558 and GSE47552 based on the cutoff criteria of a P-value < 0.05; | log (FC)| > 1.680 DEmRNAs that overlapped in GSE16558 and GSE47552 were identified (Supplementary Table S4). Finally, the connections between the six lncRNAs and DEmRNAs are displayed in

Functional Annotation
The preceding findings indicated that the LINC00324 and LINC01128 were potentially involved in the occurrence and progression of MM. To more precisely understand the biological relevance and function of these two lncRNAs, we uploaded DEmRNAs, which were co-expressed with key lncRNAs in the blue module into the Database for Annotation, Visualization, and Integrated Discovery to conduct Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses. The results were visualized using the GOplot R package. The results of the differential analysis were used to calculate a z-score for presenting enriched KEGG pathways (Supplementary Table S4). Regarding enriched GO terms, DEmRNAs coexpressed with LINC01128 were mainly enriched in the endoplasmic reticulum to Golgi vesicle-mediated transport, protein transport, mitotic nuclear division, cytosol, Golgi membrane, nucleoplasm, and protein binding ( Figure 7A). Regarding the enriched KEGG pathways, there were no upregulated DEmRNAs co-expressed with LINC00324 enriched, and other downregulated DEmRNAs were significantly enriched in the cell cycle, propanoate metabolism, B-cell receptor signaling pathway, protein processing in the endoplasmic reticulum, chronic myeloid leukemia, human T-cell lymphotropic virus type 1 infection, and beta-alanine metabolism ( Figure 7B). There were no significant results for LINC00324 because too few mRNAs are connected with it.

Gene Set Enrichment Analysis and Gene Set Variation Analysis Reveal a Close Relationship Between Key Long Non-coding RNAs, Multiple Cancer-Related Pathways, and Metabolic Pathways
To further investigate the potential functions of LINC01128 and LINC00324, we performed gene set enrichment analysis (GSEA) and gene set variation analysis (GSVA) on the GSE16791 dataset. We divided these samples into two groups based on the expression levels of these two lncRNAs. As shown in Figures 8A-G, samples in GSE16791 with high expression of LINC01128 were enriched in multiple cancer-related pathways, including the P53 signaling pathway, cell cycle, mismatch repair, nucleotide excision repair, and several metabolic pathways, including cysteine and methionine metabolism, peroxisome, and beta-alanine metabolism. Also, our previous finding that the DEmRNAs co-expressed with LINC01128 were enriched in beta-alanine metabolism was, surprisingly, verified by GSEA and GSVA results (Figures 8G,M). The expression level of LINC00324 was also extracted for enrichment analysis. Genes in the high expression groups of LINC00324 were mainly involved in multiple metabolic pathways, including propanoate metabolism, selenoamino acid metabolism, aminoacyl-tRNA biosynthesis, tyrosine metabolism, and lysine degradation (Figures 8H-L).

Long Non-coding RNA-Mediated Competing Endogenous RNA Network Revealed Potential Mechanisms of LINC01128 and LINC00324
To investigate the interaction between the lncRNA and mRNAs, the lncRNA-miRNA-mRNA network was constructed according to the ceRNA hypothesis by integrating expression profile data and their regulatory relationships. We obtained DEmRNAs, DEmiRNAs based on the criteria mentioned in section "Materials and Methods." The interaction between the two lncRNAs and miRNAs were first predicted through Starbase3.0 and the RNA22 tool. We then predicted that the potential DEmiRNAs can target LINC01128 and LINC00324 co-expressed DEmRNAs in the blue module using DIANA TOOLS (Supplementary Table S5). Finally, a total of 12 miRNAs overlapped in our prediction results; 2 lncRNAs and 103 mRNAs were included in the ceRNA network (Supplementary Table S6), and their regulatory relationships were visualized by Cytoscape (Figure 9). In this network, different shapes represent different RNA types, with pink and blue denoting up-and downregulation, respectively.

DISCUSSION
Multiple myeloma is the most common primary bone cancer among 70-year-old and older American adults (Reisenbuckler, 2014). Although genetic and epigenetic events contributing to the occurrence and progression of MM have been increasingly identified, the diagnosis, treatment, and clinical outcome of MM remain mostly unclear (Prideaux et al., 2014). More recently, aberrant lncRNA expression in MM was observed and further validated to be involved in epigenetic, transcriptional, and posttranscriptional regulation (Meng et al., 2018). Several lncRNA prognostic models have been identified in multiple cancers, including hepatocellular carcinoma , bladder cancer , non-small cell lung cancer (Zhou et al., 2015a;Sun et al., 2020), breast cancer (Shen et al., 2020), glioma , glioblastoma , and diffuse large B-cell lymphoma (Zhou et al., 2017). These studies had highlighted the diagnostic and prognostic roles of lncRNAs, and the lncRNA signatures they constructed had an imperative value for survival predicting for different cancer patients. Therefore, identifying new and effective prognostic biomarkers and establishing a reliable prognostic model based on lncRNA expression signature are critical for patients with MM.
WGCNA is a powerful algorithm that has not yet been utilized to analyze the expression profile of MM samples. Presently, a total of 32,216 genes, which were not all DEGs, were selected to conduct WGCNA analysis in case of missing significant information. Furthermore, we applied PCA for the first time to correlate PCs with clinical traits to find key lncRNAs. Then, 12 key lncRNAs that were associated with cancer stage, risk score, and proliferation index were identified in the intersection of key modules and PCs. Univariate Cox regression analysis retained six significant lncRNAs (P < 0.05) for further analysis. A co-expression network of six lncRNAs and co-expressed DEmRNAs in the blue module was constructed to present the co-expression pattern and the relationship between key lncRNAs and DEmRNAs. This network can provide insights for identifying possible targets of key lncRNAs. After the LASSO and Cox proportional hazard regression analysis, we detected a prognostic formula for predicting survival based on the two lncRNAs, including LINC00525 and LINC00996, and verified it in the testing set. The patients were ultimately divided into high-or lowrisk patients according to the median risk value. Kaplan-Meier analysis showed that the patients in the high-risk score group displayed obviously worse OS compared with those in the lowrisk score group. Furthermore, ROC curve analysis revealed the stability and accuracy of the two-lncRNA signature in predicting patient prognosis. Further analysis showed that the two-lncRNA risk score signature is an independent predictor of MM patient prognosis. Indeed, prior studies had established several lncRNA prognostic signatures that can provide a comprehensive clinical assessment of MM prognosis (Zhou et al., 2015b;Samur et al., 2018). Significantly, instead of simply utilizing survival associated lncRNAs to construct lncRNA prognostic signatures, it Is our first time to combine WGCNA and PCA to select prognostic lncRNAs that could be further used to establish a survival model. Subsequently, we performed a series of rigorous analyses, including univariate, LASSO Cox regression, and multivariate Cox hazard regression analysis to realize exact survival prediction. Additionally, in contrast to the earlier lncRNA model in MM (Zhu et al., 2020), we utilized an external dataset to examine the accuracy of our lncRNA signature.
Many recent studies have indicated that lncRNAs can regulate gene expression by interacting with the miRNA via MREs in MM (Sun et al., 2019a;Yang and Chen, 2019). Thus, it is imperative to recognize MM-specific lncRNAs as biomarkers and determine their potential mechanisms. These lncRNAs may be essential in the initiation and development of MM. Firstly, we identified 12 interesting lncRNAs, which may participate in the development of MM. To further select MM-specific lncRNAs, we screened DElncRNAs that overlapped in GSE16558 and GSE47552. Surprisingly, our PCR and microarray results indicated that 2 of the 12 lncRNAs (LINC01128 and LINC00324) were differentially expressed. LINC00324 can promote proliferation and metastasis but can inhibit cell apoptosis of lung adenocarcinoma cells by sponging miR-615-5p to promote AKT1 expression . Similar results were also found where LINC00324 can promote gastric cancer cell proliferation by binding with HuR and stabilizing FAM83B expression (Zou et al., 2018). It can also be used to predict the prognosis in patients with thymoma (Gong et al., 2018). There are no references for LINC01128. Its potential function remains to be determined. Next, GO analysis revealed that those LINC01128 co-expressed DEmRNAs were associated with protein transport and protein binding processes. KEGG pathway analysis demonstrated that they were enriched in cancer-related pathways, including cell cycle, chronic myeloid leukemia, small cell lung cancer, and metabolism-related pathways, including propanoate metabolism and beta-alanine metabolism. To further explore the underlying mechanism of LINC00324 and LINC01128, we formulated a ceRNA network based on predicted interactions between DEmiRNAs and DEmRNAs. Based on our network and the ceRNA mechanism, we speculated that LINC01128 might act as a tumor suppressor in MM through multiple mechanisms, including miR-142-5p/PARP9 or FAM133A axis, and the miR-299-3p/estrogen-related receptor gamma axis. The cancer-testis antigen FAM133A is a downstream target of miR-155 and is a negative regulator of glioma invasion and migration . Estrogen-related receptor gamma is a tumor suppressor as well as an activator of multiple cancers, including gastric cancer (Kang et al., 2018), breast cancer (Kumari et al., 2018), laryngeal squamous cell carcinoma (Shen et al., 2019), and liver cancer (Kim et al., 2016). LINC00324 may exert tumor-promoting functions in MM through targeting the miR-512-3p/ZNF566 axis. However, this remains to be verified. Finally, GSEA revealed that samples with high expression of LINC01128 were in multiple cancer-related pathways, including the P53 signaling pathway, cell cycle, mismatch repair, nucleotide excision repair, and several metabolic pathways, including cysteine and methionine metabolism, peroxisome, and betaalanine metabolism. Several studies have reported that the cell cycle, P53 signaling, and DNA repair-related pathways are important tumor biological mechanisms (Balint and Vousden, 2001;Jackson and Bartek, 2009). Also, high beta-alanine concentrations are linked with cancer (Pine et al., 1982;Nishimura et al., 2012). Our findings suggested that the high expression of LINC01128 may be crucial in tumorigenesis and progression of MM, probably by regulating the cell cycle, DNA damage, or amino acid metabolism. Corresponding with our predicted mechanism of LINC01128, the mutation of the NAD+ binding site in PARP9 has been reported to increase the DNA repair activity of the heterodimer (Yang et al., 2017). On the other hand, genes in high expression groups of LINC00324 were mainly involved in multiple metabolic pathways, including propanoate metabolism, selenoamino acid metabolism, aminoacyl-tRNA biosynthesis, tyrosine metabolism, and lysine degradation. These observations can be explained by the hypothesis that LINC00324 suppresses tumorigenesis of MM by interfering with carbohydrate metabolism, amino acid metabolism, and protein translation.
In conclusion, WGCNA and PCA were performed to correlate the gene expression profile of patients with MM to the corresponding clinical traits. We identified lncRNAs that may potentially be involved in the initiation and development of MM. Finally, a two-lncRNA risk score model was formulated, and its precise prediction value was demonstrated. We also identified two lncRNAs as biomarkers and predicted their possible function as ceRNAs. These findings provide fundamental insights for further basic studies.

Gene Expression Profile Data and Clinical Characteristics
The overall design and workflow of this study are presented in Supplementary Figure S1. The RNA expression profiles of MM patients and normal donors were identified from the GEO database 2 ( Table 2). GSE16791 was utilized to conduct a WGCNA analysis for this study. This series of microarray experiments include 16,325 mRNA and 1,137 lncRNA expression profiles of purified plasma cells (PCs) obtained from 32 newly diagnosed MM. GSE17306 is a microarray analysis that contains 16,401 mRNA, 556 miRNA, and 1,146 lncRNA expression profiles of MM patients with corresponding clinical information, including mRNA-based GEP-risk score and proliferation index (Shaughnessy et al., 2007). It was used here to implement the PCA algorithm to correlate clinical traits with gene expression patterns. Corresponding clinical information, including survival time and vital status, was obtained from the GSE57317, including 16,325 mRNA and 1,137 lncRNA expression profiles of 55 MM patients, and TCGA RNA-Seq dataset contains 56,753 mRNA, 1,881 miRNA, and 14,142 lncRNA expression profiles of 765 MM patients to construct lncRNA risk score system. GSE16558, including 18,966 mRNA, 382 miRNA, and 431 lncRNA expression profiles of 60 MM patients and 5 healthy donors, GSE47552, including 18,966 mRNA and 431 lncRNA expression profiles of 41 MM patients and 5 healthy donors, and GSE17498, including 722 miRNA expression profiles of 40 MM patients and 3 healthy donors, were used to screen DEGs including DElncRNAs, DEmiRNAs, and DEmRNAs. Microarray annotation information was utilized to match probes with corresponding genes, and  Male  Female  I  II  III  Alive  Dead   GSE16791  32  0  40-65  --3  7  22  --GSE17306  52  2  --------GSE57317  55  0  ------43  12   GSE16558  60  5  --------GSE17498  40  3  39-85  23  17   GSE47552  41  5  --------TCGA  765  0  27-88  449  316  266  276  223  609  156 lncRNA expression was obtained based on the annotation of Genecode (see footnote 1).

Weighted Co-expression Network Analysis
A total of 32,216 genes identified in each sample of GSE16791 were utilized to construct a gene co-expression network using the WGCNA R package (Langfelder and Horvath, 2008;. Sample clustering of all genes was applied to check if they were good genes and good samples. A scale-free co-expression network was achieved when the soft-threshold power was set as 8 (scale-free R 2 = 0.81), cut height as 0.25, and minimal module size as 30. Then, to evaluate co-expression levels between genes, Pearson correlations were performed and then weighted by raising their absolute value to a power. Hierarchical clustering dendrograms visualized gene modules in different colors. Modules with the highest correlation with cancer stage were selected for further analysis.

Principal Component Analysis
Principal component analysis compresses all the original variables into a smaller subset of composite variables (PCs) instead of ignoring or discarding variables. PCA tools, a useful R package that provides functions for data exploration, were applied to analyze GSE17306 dataset 3 . At first, PCA helped us to determine PCs, accounting for 80% of the explained variation. Secondly, we correlated the PCs back to the clinical data, including mRNA-based GEP-risk score and proliferation index, to gain interesting PCs. Finally, the plotLoadings function could contribute to determining the variables ranked top 5% of the loadings range.

Identification and Evaluation of a Risk Assessment Model
The prognostic value of 12 lncRNAs in the intersection of blue and green modules, and PC6 and PC8, were evaluated by a univariate Cox model with a statistical level of significance set at P < 00.05. Critical prognostic lncRNAs were further identified by the LASSO regression method (Gao et al., 2010). LASSO 3 https://github.com/kevinblighe/PCAtools regression is a penalized regression method that is often used in machine learning to select the subset of variables. The R glmnet software package was adopted to carry out the LASSO Cox analysis (Tibshirani, 1997). Also, lncRNAs obtained in these steps were then enrolled into a multivariate Cox regression model using a survival R package, and prognosis-associated lncRNAs were selected. The risk score of each patient was calculated based on the summation of each lncRNA and its coefficient, and we distinguished high-from low-risk patients according to the median risk score. The Kaplan-Meier method was applied to analyze the difference of OS between two groups, and a ROC analysis was adopted to estimate the predictive power of this lncRNA risk score system. The TCGA dataset served as a testing set for further validation.

Construction of Co-expression Network of Key Long Non-coding RNAs and Differentially Expressed Messenger RNAs in the Blue Module
The multivariate Cox regression analysis identified six lncRNAs with P < 0.05, which were considered as key lncRNAs. We created a gene co-expression subnetwork for the genes in the blue module according to their topology overlap matrix similarity; DEmRNAs connected to key lncRNAs were selected to construct a co-expression network using Cytoscape. DEmRNAs that overlapped in GSE16558 and GSE47552 (n = 680) were identified based on the cutoff criteria of P < 0.05 and | log (FC)| > 1. The size of the nodes reflected the strength of connectivity, and the color was related to the weighted score of the interactions.

Screening of Database of Essential Genes and Survival Analysis
The Limma package in R (Ritchie et al., 2015) was used to identify the DEGs from GSE16558 and GSE47552. We identified DElncRNAs and DEmiRNAs according to the criterion that adjusted P < 0.05. Abnormally expressed miRNAs in GSE17306, GSE16558, and GSE17498 were all selected for constructing the ceRNA network. The two DElncRNAs were utilized to perform Kaplan-Meier analysis and log-rank test to identify whether they were correlated with OS using the GSE57317 and TCGA datasets. Log-rank test with P < 0.05 was set as statically significant.

Construction of Interesting Differentially Expressed Long Non-coding RNA-Based Competing Endogenous RNA Network
The ceRNA hypothesis posits that lncRNAs can regulate gene expression by interacting with miRNA at miRNA-binding sites (MREs). It is vital to match the DEmRNAs, miRNAs, and lncRNAs to figure out a novel molecular mechanism involved in the development of MM. The MIRanda database 4 , Starbase3.0 5 , and RNA22 tool 6 were used to predict the interactions between DElncRNAs and miRNAs. The miRNAs that potentially target DEmRNAs were predicted by DIANA Tools 7 . DElncRNAs,

Functional Annotation of Long Non-coding RNA Target Genes
The GO and KEGG enrichment analyses for DEmRNAs, which were co-expressed with LINC00324 and LINC01128, were analyzed using the Database for Annotation, Visualization, and Integrated Discovery database  and visualized by the GOplot R package (Walter et al., 2015). The z-score is a value that can be easily calculated and reveals whether the biological process (molecular function/cellular components) is more likely to be decreased (negative value) or increased (positive value). It is calculated as z-score = (up-down)/ √ count. Up or down represents the number of upregulated or downregulated genes, respectively. The count represents the number of genes that belong to each term. A threshold for the labeling is set as log (adjust p-value) > 2.8.

Gene Set Enrichment Analysis and Gene Set Variation Analysis
The GSE16791 dataset was used to conduct GSEA according to expression levels of two lncRNAs (high expression vs. low expression) (Subramanian et al., 2005). Annotated gene sets c2.cp.kegg. v 7.0.symbols.gmt was chosen as the reference gene sets 8 . The nominal P-value estimates the statistical significance of the enrichment score, and a nominal P-value ≤ 0.05 was set as the cutoff criterion.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the ethics committee of the First Affiliated Hospital of Zhengzhou University (2019-KY-357). The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
SZ and JF contributed equally to this work, they were responsible for study design, data collection, and data analyzing. YS was involved in manuscript preparation and literature searching. HL took part in study design and manuscript revision.
All the authors contributed to the article and approved the submitted version.

ACKNOWLEDGMENTS
The authors would like to thank the TCGA and GEO databases developed by the National Institutes of Health.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene. 2020.00934/full#supplementary-material FIGURE S1 | Overall design and workflow of this study. FIGURE S3 | Relative quantification of Linc00525 and Linc00996 expression by qRt-Pcr. The expression of Linc00525 (A) and Linc00996 (B) in human multiple myeloma cell lines (Rpmi-8226, Sko-007, Km3) as well as normal plasma cells. Data are presented as the mean ± standard deviation. The ns represents not significant, * represents P < 0.05, * * represents P < 0.01, * * * represents P < 0.001 and * * * * represents P < 0.0001.