Development and Validation of a 9-Gene Prognostic Signature in Patients With Multiple Myeloma

Background: Multiple myeloma (MM) is one of the most common types of hematological malignance, and the prognosis of MM patients remains poor. Objective: To identify and validate a genetic prognostic signature in patients with MM. Methods: Co-expression network was constructed to identify hub genes related with International Staging System (ISS) stage of MM. Functional analysis of hub genes was conducted. Univariate Cox proportional hazard regression analysis was conducted to identify genes correlated with the overall survival (OS) of MM patients. Least absolute shrinkage and selection operator (LASSO) penalized Cox proportional hazards regression model was used to minimize overfitting and construct a prognostic signature. The prognostic value of the signature was validated in the test set and an independent validation cohort. Results: A total of 758 hub genes correlated with ISS stage of MM patients were identified, and these hub genes were mainly enriched in several GO terms and KEGG pathways involved in cell proliferation and immune response. Nine hub genes (HLA-DPB1, TOP2A, FABP5, CYP1B1, IGHM, FANCI, LYZ, HMGN5, and BEND6) with non-zero coefficients in the LASSO Cox regression model were used to build a 9-gene prognostic signature. Relapsed MM and ISS stage III MM was associated with high risk score calculated based on the signature. Patients in the 9-gene signature low risk group was significantly associated with better clinical outcome than those in the 9-gene signature high risk group in the training set, test, and validation set. Conclusions: We developed a 9-gene prognostic signature that might be an independent prognostic factor in patients with MM.


INTRODUCTION
Multiple myeloma (MM), originated from malignant plasma cells secreting monoclonal M protein, represents the second most common malignancy in hematological malignancies (1)(2)(3). MM is differentiated from monoclonal gammopathy of undetermined significance (MGUS) and smoldering multiple myeloma (SMM) by the presence of end-organ damage (4,5). The clinical symptoms of MM range from asymptomatic forms to manifestations of anemia, bone pain, and eventually spontaneous of fractures, renal failure, and frequent infections (6)(7)(8). Thanks to the introduction of novel agents (proteasome inhibitors, immunomodulatory drugs, and monoclonal antibodies), the management strategies for MM have been improved considerably in the past decade (9)(10)(11)(12). Accordingly, the clinical outcomes of patients with MM have been significantly improved, however, MM remains an incurable disease and the prognosis of patients with MM remains poor (with a median survival of approximately 3-4 years) (13).
The International Staging System (ISS) stage of MM, based on β 2 -microglobulin (β 2 M) and albumin (ALB), divides MM patients into three different stages with significant dissimilar clinical outcomes. The ISS was a collaborative effort by investigators from 17 institutions worldwide and from data on 11,171 patients (4,14). Patients with stage 1, 2, 3 diseases have median survivals of 62, 44, 29 months, respectively (4,15).
Weighted gene co-expression network analysis (WGCNA), a systems biology algorithm that can be applied to describing the correlation patterns among genes across microarray samples, finding and summarizing modules of high related genes, and relating modules to certain clinical phenotype (16,17), is widely used to facilitate the screening or identification of candidate biomarkers or therapeutic targets (18). Therefore, in the present study, we used WGCNA to screen potentially relevant molecular biomarkers correlated with the ISS stages of patients with MM. Moreover, we developed and validated the associated signature in patients with MM.

MM Microarray Data
MM gene expression data and clinical information were downloaded from gene expression omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/) and ArrayExpress (http:// www.ebi.ac.uk/arrayexpress/). Affymetrix gene expression profiles were performed using Affymetrix Human Genome U133 Plus 2.0 (HG-U133 Plus_2.0) [GSE19784 (19) and GSE24080 (20)], Affymetrix Human Genome U133A Array(GSE6477) (21,22), and Affymetrix GeneChip Human Gene 1.0 ST Array[E-MTAB-4032 (23)]. GSE19784, including 328 samples from patients with newly diagnosed MM, was used to construct the co-expression network. GSE24080, including 559 untreated MM samples, was randomly assigned patients in a 3:2 ratio to a training set and test set to develop and validate a prognostic signature. GSE6477, including 15 samples of normal donors, 73 samples of newly diagnosed MM, and 28 samples of relapsed MM, was used to evaluate the risk score calculated based on the prognostic signature among normal donor, newly diagnosed MM, and relapsed MM. E-MTAB-4032 (including 151 untreated MM) was used as an independent validation cohort to evaluate the prognostic role of the signature. Raw data of GSE19784 and E-MTAB-4032 were preprocessed using the R/Bioconductor "affy" package (24) and oligo (25) package, respectively. Robust Multi-array Average (RMA) (26) normalized data of these two studies at gene level were analyzed. For GSE6477 and GSE24080, the raw data had been normalized using MAS5 method and the expression levels of genes were transformed using the logarithm function. Purified CD138+ plasma cells (including myeloma cells and normal plasma cells) in GSE19784, GSE6477, E-MTAB-4032, and GSE20480 were separated using positive magnetic cell sorting selection with CD138 magnetic microbeads and subjected to gene expression profiling (GEP) as mentioned previously (19)(20)(21)(22)(23).

Construction of Co-expression Network and Identification of Hub Genes
The R package "WGCNA" (27) was used to construct a coexpression network for genes with highest variances (top 10000) in GSE19784. Prior to constructing the co-expression network, we applied sample networks method which was introduced by Oldham et al to detect outliers (28). A sample was considered as outlying, if the associated Z.K value was <-2.5. The soft threshold power β was selected according to the scale-free topology criterion as introduced previously (16,17). Subsequently, Pearson's correlations between each gene pair was calculated to generate a matrix of adjacencies, and then the adjacencies were transformed into topological overlap matrix (TOM) (29). Next, we conducted average linkage hierarchical clustering based on the TOM-based dissimilarity. The minimum module size and medium sensitivity was 30 and 2, respectively, and other parameters were default. After relating modules to the ISS stage of MM patients and calculating the associated Gene Significance (the correlation between the genes and the trait) and Module Membership (the correlation of the first principal component of the expression matrix of the corresponding module and the gene expression profile), we screened hub genes using a networkScreening function based on Gene Significance and Module Membership and genes with q.Weighetd value (qvalue (local FDR) calculated from weighted p-value of association with the ISS stage of MM) <0.01 were finally treated as hub genes (30).

Functional Enrichment Analysis of Hub Genes
To understand the biological function of the hub genes, we performed gene ontology (GO), and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis using the DAVID (31) online tool. GO and KEGG terms at P < 0.05 and false discovery rate (FDR)<0.05 were considered significantly enriched and the significantly enriched GO and KEGG terms were visualized using R package "ggplot2" (32).

Development of the Prognostic Signature Based on the Hub Genes
To investigate the associations between the hub genes and the survival of MM patients, we performed univariate Cox proportional hazards regression model in GSE24080. Genes significantly correlating with the overall survival (OS) of MM patients were included in a Least absolute shrinkage and selection operator (LASSO) penalized Cox proportional hazards regression model to minimize overfitting, and a 10-fold cross validation was also conducted using the R package glmnet (33,34). Then, we calculated the risk score for each patient based on this penalized Cox proportion model in the training set.

Validation of the Predictive Value of the Prognostic Signature in MM Patients
To validate the predictive value of the prognostic signature, Kaplan-Meier survival analysis, and univariate and multivariable Cox proportional hazards regression model were performed in the training set and test set in terms of OS, and event-free survival (EFS). Prior to multivariable Cox proportional hazards regression analysis on the OS, and EFS, we performed a variable selection based on the LASSO penalized Cox proportional hazards regression model. The definitions of OS and EFS was introduced previously (21-23). Meanwhile, we also validated the performance of the signature in the independent cohort E-MTAB-4032. The above survival analyses were conducted using the R packages "survival" (35) and "survminer" (version 0.4.3). MM patients in GSE24080 and E-MTAB-4032 were classified into the prognostic low risk group and the 9-signature high risk group based on the cutoff calculated through time dependent receiver operating characteristic (ROC) analysis using the R package "survivalROC" (36). The risk score of the signature in patients with ISS I, II, and III disease were evaluated using E-MTAB-4032. Meanwhile, the risk score of the signature in normal plasma cells, untreated MM, and relapsed MM were evaluated using GSE6477. The risk scores of the signature in E-MTAB-4032 and GSE6477 were presented as mean ± the standard error of the mean (SEM). Grouped data was analyzed using unpaired T-test, and P < 0.05 was considered statistically significant.

Results of Data Preprocessing, Co-expression Network Construction and Hub Genes Identification
No sample was demonstrated to be an outlier after all samples were clustered based on their Euclidean distances. Meanwhile, β = 12, the lowest power for which the scale-free topology fit index reaches 0.9, was used for the subsequent adjacency calculation. After TOM based clustering, 14 gene modules were obtained. After co-expression network construction, a total of 780 probes were identified based on our screening criteria, 758 of which annotated to gene symbol were treated as hub genes (Supplementary Table 1). The major process of co-expression network construction and hub gene identification was shown in Supplementary Figure 1.

GO and KEGG Pathway Enrichment Analysis of Hub Genes
In order to have a preliminary understanding of the biological significance, we conducted GO and KEGG enrichment analysis. As shown in Figure 1, the hub genes were mostly enriched in GO terms related to cell proliferation ("cell division, " "cell proliferation, " "mitotic nuclear division, " "DNA replication, " "DNA unwinding involved in DNA replication, " "sister chromatid cohesion, " "mitotic cytokinesis, " "DNA replication-dependent nucleosome assembly, " "DNA replication initiation, " "G1/S transition of mitotic cell cycle, " "chromosome segregation, " and "mitotic spindle organization") and immune response ("complement activation, " "antigen processing and presentation of exogenous peptide antigen via MHC class II, " " B cell receptor signaling pathway, " " immune response, " "phagocytosis, recognition, " " positive regulation of B cell activation, " "receptor-mediated endocytosis, " "interferon-gamma-mediated signaling pathway, " "innate immune response, " "Fc-gamma receptor signaling pathway involved in phagocytosis, "and "immunoglobulin production involved in immunoglobulin mediated immune response") ( Figure 1A). Furthermore, the results of KEGG pathway enrichment analysis of the hub genes suggested that these genes were mainly enriched in infection or immune related pathways ("Staphylococcus aureus infection, " "antigen processing and presentation, " "leishmaniasis, " "asthma, " "intestinal immune network for IgA production, " "graft-vs.host disease, " "HTLV-I infection, "and "Rheumatoid arthritis"), and cell proliferation ("cell cycle, "and "DNA replication") ( Figure 1B).

Development of a 9-Gene Signature in Patients With MM
To investigate the prognostic value of the hub genes, we conducted univariated Cox proportional hazards regression analysis, and the results suggested that the expression of 325 hub genes were significantly correlated with the OS of MM patients in the training set of GSE24080. To avoid overfitting as much as possible, we conducted LASSO penalized Cox proportional hazards regression model in the training set in GSE24080, and the results identified 9 of the 325 hub genes (HLA-DPB1(major histocompatibility complex, class II, DP beta 1), TOP2A (topoisomerase 2A), FABP5 (Fatty Acid-Binding Protein 5), CYP1B1(cytochrome P450 family 1 subfamily B member 1), IGHM (immunoglobulin heavy constant mu), FANCI (FA complementation group I), LYZ (lysozyme), HMGN5 (high mobility group protein N5 subtype), and BEND6 (BEN domain containing 6) with non-zero coefficient. Thus, we built the 9-gene signature on the basis of the coefficients of these 9 hub genes in the LASSO penalized model (Supplementary Table 2).

Associations Between the 9-Gene Signature and the ISS Stage of MM and the Disease Status
Firstly, we calculated the risk score of each MM patients for the 9-gene signature in GSE6477 and E-MTAB-4032 based on the coefficients of the 9 hub genes (Supplementary Table 2). As shown in Figure 2A, the risk score of patients with ISS stage III disease was significantly higher than that of patients with ISS stageI (t = −0.362, P = 0.001) and II (t = −0.218, P = 0.031). Meanwhile, the risk score of patients with relapsed MM was significantly higher compared with that of normal donor (t = 5.782, P < 0.001) and patients with untreated MM (t = 2.977, P = 0.004), and the risk score were higher in patients with untreated MM compared with that in normal donor (t = −4.13, P < 0.001, Figure 2B). The predictive value of the 9-gene signature was evaluated in the training set, test set and an independent set E-MTAB-4032. Based on the optimal cutoff (1.939) calculated using time dependent ROC analysis (Supplementary Figure 1). The Kaplan-Meier survival analysis suggested that patients in the 9-gene signature low risk group had better OS compared with those in the 9-gene signature high risk group in the training set (HR = 0.2664, 95% CI: 0.1772-0.4007, log-rank P < 0.001, Figure 3A, Supplementary Table 3) and the test set (HR = 0.5115, 95% CI: 0.3137-0.8339, log-rank P = 0.0062, Figure 3B, Supplementary Table 4). Meanwhile, patients in the 9-gene signature low risk group had with better EFS compared with those in the 9-gene signature high risk group in the training set (HR = 0.3321, 95% CI:0.2395-0.4606, P < 0.0001, Figure 3C, Supplementary Table 5) and test set (HR = 0.5174, 95% CI: 0.3447-0.7765, P = 0.0015, Figure 3D, Supplementary Table 6). Based on the results of variable selection (Supplementary Table 7), age, B2M (β2-microglobin), CRP (C reaction protein), LDH (lactate dehydrogenase), BMPC (bone marrow plasma cell), MRI (magnetic resonance imaging), and the 9-gene signature was included multivariable Cox proportional hazards regression analysis which indicated that the 9-gene signature was an independent prognostic factor in terms of OS and EFS in the training set and test (Supplementary Tables 3-6). Moreover, MM patients in the 9-gene low risk group also had better OS compared with those in the 9-gene signature high risk group in the independent validation cohort E-MTAB-4032 (HR = 10.6091, 95% CI: 3.2120-35.0409, log-rank P = 0.0061, Figure 4). Meanwhile, the results of multivariable  Cox proportional hazards suggested that the 9-gene signature was also an independent prognostic factor in the validation cohort (HR = 14.8092, 95% CI:1.2282-178.5591, P = 0.0339, Supplementary Table 7).

DISCUSSION
As stated above, although several novel agents for patients with MM have been introduced into clinical practice, the disease remained incurable, and the clinical outcome of patients with MM is still poor (2,3). Thus, it is of vital importance to develop such molecular biomarkers or signature that are significantly correlated with the clinicopathological features and clinical outcome of patients with MM. In the present study, a total of 758 hub genes associated with the ISS stage of MM patients were identified through WGCNA. Thus, we performed univariate Cox proportional hazards regression analysis to analyzed the relations between the expression of hub genes and the OS of patients with MM, and the results suggested that 325 hub genes were associated with the OS of MM patients. LASSO (34) was introduced in order to improve the prediction accuracy and interpretability of regression models by forcing the sum of the absolute value of the regression coefficients to be less than a fixed value, which forces certain coefficients to be set to zero, effectively choosing a simpler model that does not include those coefficients. Based on this, we included the above 325 hub genes into a LASSO penalized Cox proportional hazards regression model, as a result, 9 hub genes with none-zero coefficients in this model was identified. Thus, we calculated the risk score of each MM patients, and built the 9 hub genes [HLA-DPB1(major histocompatibility complex, class II, DP beta 1), TOP2A (topoisomerase 2A), FABP5 (Fatty Acid-Binding Protein 5), CYP1B1(cytochrome P450 family 1 subfamily B member 1), IGHM (immunoglobulin heavy constant mu), FANCI (FA complementation group I), LYZ (lysozyme), HMGN5 (high mobility group protein N5 subtype), and BEND6 (BEN domain containing 6)] based signature.
Results of functional enrichment analysis of hub genes suggested that hub genes were mainly enriched in cell proliferation and immune response related GO terms and pathways, this was in accordance with the prognostic value of the 9-gene signature developed based on the hub genes. MM patients in the 9-gene low risk group was associated with better OS and EFS compared with those in the 9gene high risk group, and the 9-gene signature was shown to be an independent prognostic signature in patients with MM.
Actually, most of these hub genes in the signature had been demonstrated be associated with the proliferation or invasion of several human cancers. HLA-DPB1, also known as major histocompatibility complex, class II, DP beta 1,  (41). Mutation patterns of IGHM was associated different progression pathways in follicular lymphoma (42). Chen et al demonstrated that PRC1 could promote early recurrence of patients with hepatocellular carcinoma by regulating the expression of FANCI (43). Mariano FV et al demonstrated that the expression of LYZ could be used to differentiate mammary analog secretory carcinoma from acinic cell carcinoma of salivary glands (44). Wu et al demonstrated that the expression of HMGN5 was increased in bladder cancer cells and high expression of HMGN5 was associated with poor prognosis of patients with bladder cancer (45). The results of the above literature review provided strong support for the 9gene signature in the clinical out prediction in patients with MM.
Limits of our study are as follows. First, our study is a retrospective analysis based on previously published MM gene expression studies, although its conclusions have been confirmed in the test set and independent validation set, we recommend that the conclusions of this study should be verified by molecular biology experiments in subsequent studies. Second, the prognostic performance of the 9-gene signature should be evaluated through prospective clinical trials.
Taken together, we developed a 9-gene prognostic signature based on the hub genes obtained through a co-expression network, patients in the 9-gene signature low risk group were associated with better clinical outcomes compared with those in the 9-gene signature high risk group.
AUTHOR CONTRIBUTIONS X-PL collected and analyzed the data, write the manuscript. Xia-HY analyzed the data, and review the manuscript. X-YM and Xin-HY participated in revising the manuscript. FW revised the manuscript, analyzed the data, and participated in preparation of the figures. LH designed the study and participated in data analysis.  Table 1 | Hub genes calculated using weighted co-expression network analysis and their impact on the overall survival of patients with multiple myeloma.
Supplementary Table 2 | The 9 hub genes with non-zero coefficients in the LASSO Cox proportional hazards regression model. Table 3 | Cox proportional hazards regression analysis the association between the clinical characteristics and overall survival of multiple myeloma in the training set.

Supplementary
Supplementary Table 4 | Cox proportional hazards regression analysis on the association between the clinical characteristics and overall survival of multiple myeloma patients in the test. set.
Supplementary Table 5 | Cox proportional hazards regression analysis on the association between the clinical characteristics and event-free survival of multiple myeloma patients in the training set.
Supplementary Table 6 | Cox proportional hazards regression analysis on the association between the clinical characteristics and event-free survival of multiple myeloma patients in the test set.

Supplementary Table 7 |
The prognostic role of the 9-gene signature in the independent validation cohort.