A Liquid-Liquid Phase Separation-Related Gene Signature as Prognostic Biomarker for Epithelial Ovarian Cancer

Objective The aim of the present study was to construct and test a liquid-liquid phase separation (LLPS)-related gene signature as a prognostic tool for epithelial ovarian cancer (EOC). Materials and Methods The data set GSE26712 was used to screen for differentially expressed LLPS-related genes. Functional enrichment analysis was performed to reveal their potential biological functions. GSE17260 and GSE32062 were combined as the discovery set to construct an LLPS-related gene signature through a three-step analysis (univariate Cox, least absolute shrinkage and selection operator, and multivariate Cox analyses). The EOC data set from The Cancer Genome Atlas was used as the test set to validate the LLPS-related gene signature. Results The differentially expressed LLPS-related genes were involved in several cancer-related pathways, such as the MAPK signaling pathway, cell cycle, and DNA replication. Eleven genes were selected to construct the LLPS-related gene signature risk index as a prognostic biomarker for EOC. The risk index successfully divided patients with EOC into high- and low-risk groups, and patients in the high-risk group had significantly shorter overall survival than those in the low-risk group. The LLPS-related gene signature was validated in the test set and may be an independent prognostic factor compared with routine clinical features. Conclusion We constructed and validated an LLPS-related gene signature as a prognostic tool in EOC through integrated analysis of multiple data sets.


INTRODUCTION
Although rapid progress has been made in recent decades in identifying the genetic causes of cancers, our mechanistic understanding of these diseases remains incomplete, which limits our ability to provide effective treatments. Novel concepts may be required to reveal the complex mechanisms underlying these diseases. Evidence is mounting that liquid-liquid phase separation (LLPS) (1) underlies the formation of various subcellular structures, such as membraneless bodies, heterochromatin (2), and the transport channel of the nuclear pore complex (3). LLPS has emerged as a new concept for explaining the organization of living cells (4). Hundreds of genes (5) are thought to be involved in the dynamic process of LLPS through their protein or RNA products (6). Emerging evidence indicates that aberrant forms of LLPS are associated with many human diseases, including cancer (7). For instance, proteins of the FET family are involved in phase transitions at sites of RNA storage (8,9) and assemble into higher-order structures in a process stimulated by RNA (10,11). Notably, these functions are often impaired in human diseases such as cancer and neurodegenerative diseases (12,13).
Epithelial ovarian cancer (EOC) is the most lethal gynecological cancer, with a 5-year survival rate of 46% after diagnosis (14). A risk score system can help identify high-risk patients and guide treatment decisions. Given the association between aberrant LLPS and cancer, we hypothesized that LLPS-related genes may serve as a prognostic signature in EOC. To test this hypothesis, an LLPS-related gene signature was constructed in a discovery data set and tested in an independent data set.

Data Processing
The LLPS-related genes were obtained from PhaSepDB (http://db.phasep.pro/) (5). Three processed EOC-related gene expression data sets were downloaded from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) using the "GEOquery" package (15). The data set GSE26712 (16), based on the GPL96 platform, contains the gene expression profiles of 185 EOC samples and 10 normal ovarian surface epithelium samples and was used to screen for differentially expressed genes (DEGs) in EOC compared with normal ovarian surface epithelium. The data set GSE17260 (17), based on the GPL6480 platform, contains the gene expression profiles of 110 EOC samples and the prognostic information of the corresponding patients. The gene expression profiles of 260 EOC samples from GSE32062 (18), also based on GPL6480, were downloaded from GEO as well. GSE17260 and GSE32062 were combined as the discovery set, and batch effects were removed using the ComBat function in the "sva" package (19). Principal component analysis (PCA) was performed to visualize the effect of batch-effect removal. The discovery set was used to generate an LLPS-related gene signature in EOC. Another EOC data set from The Cancer Genome Atlas (TCGA, https://portal.gdc.cancer.gov/) (20), comprising gene expression profiles generated on the Affymetrix Human Genome U133 Plus 2.0 Array platform (Affymetrix; Thermo Fisher Scientific Inc., Waltham, MA, USA) and the corresponding clinical data, was downloaded from UCSC Xena (http://xena.ucsc.edu/) and used as the test set to validate the LLPS-related gene signature. In all of the above data sets, when a gene matched multiple probes, the average value of those probes was taken as the expression of that gene. The workflow of the present study is shown in Figure 1.
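The probe-to-gene collapsing rule (average all probes that map to one gene) can be sketched in plain Python as follows. This is an illustration only, not the authors' R code; the function name `collapse_probes`, the dictionary layout, and the probe IDs in the example are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(probe_values, probe_to_gene):
    """Average the expression of all probes mapped to the same gene.

    probe_values: dict probe_id -> list of expression values, one per sample
    probe_to_gene: dict probe_id -> gene symbol (unmapped probes are skipped)
    Returns dict gene -> list of per-sample averaged values.
    """
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene:  # skip probes without a gene annotation
            grouped[gene].append(values)
    # zip(*rows) walks sample by sample across all probes of one gene
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in grouped.items()}


# Hypothetical probes: p1 and p2 both map to EIF6, p4 is unannotated
print(collapse_probes(
    {"p1": [2.0, 4.0], "p2": [4.0, 8.0], "p3": [1.0, 1.0], "p4": [9.0, 9.0]},
    {"p1": "EIF6", "p2": "EIF6", "p3": "BYSL"},
))
```

Averaging is only one of several common collapsing rules (others keep the probe with the highest mean or variance); the text above specifies averaging.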
Fold changes (FCs) of individual genes were calculated, and genes with an FC > 1.5 in either direction and a false discovery rate-adjusted P value < 0.05 were considered significant DEGs.
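The two DEG criteria can be sketched as a Benjamini-Hochberg (false discovery rate) adjustment followed by a fold-change filter. This assumes log2-scale expression, so FC > 1.5 becomes |log2 FC| > log2(1.5); the raw P values would come from the limma tests, which are not reproduced here. The names `bh_adjust` and `select_degs` are hypothetical.

```python
import math


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, log2fc, pvalues, fc_cutoff=1.5, fdr_cutoff=0.05):
    """Return (up, down) gene lists with |FC| > fc_cutoff and adjusted P < fdr_cutoff.

    Assumes log2fc holds log2 fold changes (tumour vs normal).
    """
    padj = bh_adjust(pvalues)
    threshold = math.log2(fc_cutoff)
    up, down = [], []
    for gene, lfc, q in zip(genes, log2fc, padj):
        if q < fdr_cutoff and abs(lfc) > threshold:
            (up if lfc > 0 else down).append(gene)
    return up, down
```

For example, `select_degs(["A", "B", "C", "D"], [1.0, -0.8, 0.2, 2.0], [0.001, 0.002, 0.0001, 0.3])` keeps A as up-regulated and B as down-regulated; C fails the fold-change cutoff and D fails the FDR cutoff.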

Functional Enrichment Analysis
Functional enrichment analysis, including gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses, was performed with the clusterProfiler package (22) to reveal the potential biological functions of the DEGs. Terms with a Benjamini-Hochberg adjusted P < 0.01 and a q value < 0.05 were considered significant.

Construction of an LLPS-Related Gene Signature
In the present study, a three-step analysis was carried out to construct a robust LLPS-related gene signature for predicting prognosis in the discovery set. First, univariate Cox regression analysis was performed to identify overall survival (OS)-related DEGs; a DEG with P < 0.05 was considered an OS-related gene. Second, the expression profiles of the OS-related DEGs were subjected to least absolute shrinkage and selection operator (LASSO) Cox regression analysis using the "glmnet" package (https://CRAN.R-project.org/package=glmnet). In this analysis, the OS-related DEGs with non-zero regression coefficients were identified by 10-fold cross-validation, with the parameters set to family = "cox", maxit = 1000, and nfolds = 10. Third, the expression profiles of the OS-related DEGs with non-zero coefficients were entered into a multivariate Cox regression analysis. The LLPS-related gene signature was then constructed as

$$\text{Risk index} = \sum_{i=1}^{n} \text{Expr}_{i} \times \text{Coef}_{i}$$

where $\text{Expr}_{i}$ is the expression value of gene $i$, one of the $n$ genes with P < 0.05 in the multivariate Cox regression analysis, and $\text{Coef}_{i}$ is the corresponding regression coefficient. Each individual was assigned an LLPS-related gene signature index, the patients were divided into high- and low-risk groups, and OS was compared between the two groups.

Validation of the LLPS-Related Gene Signature in the Test Set
As in the discovery set, each individual in the test set was assigned an LLPS-related gene signature index according to the above formula. Moreover, the prognostic value of the LLPS-related gene signature was compared with that of routine clinical features using multivariate Cox regression analysis.

Gene Set Enrichment Analysis
To reveal the biological functions of the candidate genes in EOC, we performed gene set enrichment analysis (GSEA) (23,24). Using the median expression value of each candidate gene as a threshold, we divided the TCGA EOC samples into high- and low-expression groups. The canonical KEGG pathway gene sets from the Molecular Signatures Database (25) were used as the reference gene sets, and a Benjamini-Hochberg adjusted P value < 0.05 was set as the cut-off criterion. GSEA was performed using the clusterProfiler package and visualized using the enrichplot package (https://github.com/YuLab-SMU/enrichplot).

Statistical Analysis
All analyses in the present study were performed in R (version 4.0.2) (https://www.r-project.org). DEGs were screened using unpaired t-tests provided by the "limma" package. OS was estimated using Kaplan-Meier curves and compared using the log-rank test. The predictive value of the LLPS-related gene signature was evaluated by time-dependent receiver operating characteristic (tROC) curve analysis using the timeROC package (https://CRAN.R-project.org/package=timeROC). All tests were two-sided, and P < 0.05 was considered statistically significant unless otherwise stated.
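The Kaplan-Meier estimator and the two-group log-rank test used to compare OS can be sketched in stdlib Python as follows. This is an illustration of the standard formulas, not the R functions used in the study; events are coded 1 for death and 0 for censoring, groups are coded 0 and 1, and the function names are hypothetical.

```python
import math


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (events: 1 = death, 0 = censored)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censorings both leave the risk set
    return curve


def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square statistic, P value with 1 df)."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    obs_minus_exp, var = 0.0, 0.0
    for t in event_times:
        n = sum(1 for x in times if x >= t)
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == 1)
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d1 = sum(1 for x, e, g in zip(times, events, groups) if x == t and e and g == 1)
        obs_minus_exp += d1 - d * n1 / n  # observed minus expected deaths in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("no information to compare the groups")
    chi2 = obs_minus_exp ** 2 / var
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

In the toy example the survival curve drops at times 1, 2, and 3; the censored observation at time 2 still counts as at risk for the death at time 2, following the usual convention.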

Multiple LLPS-Related Genes Aberrantly Expressed in EOC
The PhaSepDB database includes 2957 eligible genes, of which 1767 LLPS-related genes were found in GSE26712. Among them, 252 genes were down-regulated and 248 were up-regulated in EOC compared with normal ovarian surface epithelium (Figure S1A). These DEGs showed clearly different expression patterns in EOC and normal ovarian surface epithelium (Figure S1B), suggesting that an abnormal state of liquid-liquid phase separation may contribute to EOC.

Biological Functions Involved in Differentially Expressed LLPS-Related Genes
The GO enrichment analysis covered three categories: cellular component (CC), biological process (BP), and molecular function (MF). The 15 most significant GO terms (ranked by P value) are shown in Figure S2. In CC (Figure S2A), the DEGs were mainly involved in the composition of macromolecular complexes and organelles, such as the spliceosomal complex, ribonucleoprotein granule, and preribosome. In BP (Figure S2B), the DEGs were significantly involved in RNA processing, such as RNA splicing, regulation of mRNA metabolic process, and RNA catabolic process. In MF (Figure S2C), the DEGs were involved in the activity of multiple enzymes, such as helicase activity, phosphatase activity, and DNA-dependent ATPase activity. The DEGs were also involved in several cancer-related pathways (Figure S2D), including the MAPK signaling pathway, cell cycle, and DNA replication.

The LLPS-Related Gene Signature in the Discovery Set
PCA showed obvious batch effects between GSE17260 and GSE32062 (Figure 2A, left), which were removed using the "sva" package before subsequent analysis (Figure 2A, right). Sixty differentially expressed LLPS-related genes were identified as OS-related genes by univariate Cox analysis (Table 1), and 32 of these genes retained non-zero regression coefficients in the LASSO analysis (Table 1). Finally, 11 LLPS-related genes (EIF3J, BYSL, NRGN, SAP18, PACSIN2, DUSP10, EIF6, HMBOX1, UTP3, HOMER2, and KIAA0355) remained significantly associated with OS in the multivariate Cox analysis (Table 1) and were selected to construct the LLPS-related gene signature risk index (RI) (Figure 2B). The RI was significantly associated with poor prognosis (hazard ratio [HR] = 2.771, 95% confidence interval [CI] 2.272-3.379, P < 2.2e-16). The tROC curve analysis showed that the RI had good predictive value, with areas under the ROC curve (AUCs) between 0.7 and 0.8 (Figure 2C); the AUC of the 5-year tROC curve was 0.793 (Figure 2D). Patients with a high RI had significantly shorter OS than those with a low RI (Figure 2E).
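A reported HR and its 95% CI can be converted back to the Cox coefficient (log HR), its standard error, and a Wald test, assuming the interval was computed symmetrically on the log scale (the usual Wald interval). The sketch below is a reader's consistency check, not part of the original analysis; the function name is hypothetical.

```python
import math


def cox_summary_from_hr(hr, ci_low, ci_high, z_crit=1.959964):
    """Back out (log HR, standard error, Wald z, two-sided P) from an HR and 95% CI,
    assuming a symmetric Wald interval on the log scale."""
    beta = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return beta, se, z, p


print(cox_summary_from_hr(2.771, 2.272, 3.379))  # discovery set
print(cox_summary_from_hr(1.211, 1.070, 1.372))  # test set
```

For the discovery set this gives a log HR of about 1.02 with a standard error of about 0.10 (z of roughly 10), consistent with the reported P < 2.2e-16; for the test set it gives a two-sided P of about 0.0025, consistent with the reported P = 0.003.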

The LLPS-Related Gene Signature Was Validated in the Test Set
After TCGA-OV patients without survival information were removed, 566 patients remained in the test set (Table S1). Each individual in the test set was also assigned an RI according to the formula (Figure 3A). The RI remained associated with poor prognosis (HR = 1.211, 95% CI 1.070-1.372, P = 0.003) and successfully divided patients into high- and low-risk groups, with patients in the high-RI group having significantly shorter OS than those in the low-RI group (Figure 3B). Moreover, the LLPS-related gene signature RI remained an independent prognostic factor after adjustment for clinical features (Figure 3C).

Pathways Involved in These 11 Candidate Genes
According to the GSEA results, the 11 candidate genes may be involved in various pathways (Figure 4). For instance, high expression of BYSL may be associated with activation of the DNA replication, mismatch repair, and proteasome pathways. However, because LLPS is a concept borrowed from physics and chemistry to explain biological phenomena, the specific links between these pathways and LLPS remain to be elucidated.

DISCUSSION
LLPS provides a new framework for understanding and interpreting cancer, and potentially a new avenue for treatment. Mutations in LLPS-related genes may lead to aberrant forms of LLPS (26)(27)(28), which in turn contribute to abnormal activity in cancer-related pathways (29). In the present study, we found that the differentially expressed LLPS-related genes in EOC were involved in multiple cancer-related pathways, such as the MAPK signaling pathway, cell cycle, and DNA replication, indicating that the role of LLPS in EOC is complex. We also found that the expression patterns of LLPS-related genes were associated with prognosis in EOC and proposed an LLPS-related gene signature for predicting prognosis. This signature was constructed and validated in two independent data sets based on different platforms, which suggests that it may be robust across populations and platforms. Some previous studies (30,31) built prognostic gene signatures using univariate and multivariate Cox regression analyses without LASSO analysis; these signatures may be prone to overfitting and were not validated in independent data sets. LASSO selects an optimal subset of features from high-dimensional data, favoring features with robust predictive value and low mutual correlation, which helps prevent overfitting (32). Accordingly, we built our LLPS-related gene signature with LASSO and validated it in an independent data set. Moreover, the LLPS-related gene signature was an independent prognostic factor relative to clinical features, including age, pathological stage, and grade. Under this risk score system, high-risk patients may be followed up more frequently and managed more actively than low-risk patients. The present LLPS-related gene signature consists of 11 LLPS-related genes.
Unsurprisingly, some of the 11 LLPS-related genes have been reported to be associated with EOC. For example, a previous study proposed that altered EIF6 expression is associated with clinicopathological features in EOC (33), and low expression of HMBOX1 in EOC may accelerate cell proliferation by inhibiting apoptosis (34). BYSL may act as an oncogene in various cancers, including hepatocellular carcinoma (35), glioblastoma (36), and diffuse large B-cell lymphoma (37). NRGN has been reported to be a tumor suppressor in glioma cells (38), and we found that it is also associated with good prognosis in EOC. SAP18 was reportedly associated with the promotion of cell invasion and angiogenesis in viral oncogenesis (39). A PACSIN2 polymorphism was associated with thiopurine metabolism in children with acute lymphoblastic leukemia (40). The role of DUSP10 in cancer may depend on the cancer type: some studies indicated that it acts as an oncogene, whereas others indicated that it acts as a tumor suppressor (41). Although further molecular experiments are required, our GSEA results may help reveal the biological functions of these 11 candidate genes in EOC.
Although the present study may provide new insight into risk stratification in EOC, several limitations should be noted. First, the LLPS-related gene signature was derived from retrospective data, and prospective studies are needed before it can be used in clinical practice. Second, functional molecular experiments were lacking in the present study; thus, it is not clear whether these genes play causal roles in EOC or are merely prognostic markers.
In conclusion, we found significantly different expression patterns of LLPS-related genes in EOC compared with normal ovarian surface epithelium, and we constructed and validated an LLPS-related gene signature as a prognostic tool in EOC through integrated analysis of multiple data sets.

DATA AVAILABILITY STATEMENT
Publicly available data sets were analyzed in this study. These data can be found here: https://www.ncbi.nlm.nih.gov/geo/, https://portal.gdc.cancer.gov/.

AUTHOR CONTRIBUTIONS
YQ designed the study. YQ analyzed the data and wrote the manuscript. MP and XC participated in analysis and interpretation of the data and reviewed the article. All authors contributed to the article and approved the submitted version.