Four Prognosis-Associated lncRNAs Serve as Biomarkers in Ovarian Cancer

Long non-coding RNAs (lncRNAs) play crucial roles in ovarian cancer (OC) development. However, prognosis-associated lncRNAs (PALs) for OC have not been completely elucidated. Our study aimed to identify the PAL signature of OC. A total of 663 differentially expressed lncRNAs were identified in the databases. According to the weighted gene coexpression analysis, the highly correlated genes were clustered into seven modules related to the clinical phenotype of OC. A total of 25 lncRNAs that were significantly related to overall survival were screened based on univariate Cox regression analysis. The prognostic risk model constructed contained seven PALs based on the parameter λmin, which could stratify OC patients into two risk groups. The results showed that the risk groups had different overall survival rates in both The Cancer Genome Atlas (TCGA) and two verified Gene Expression Omnibus (GEO) databases. Univariate and multivariate Cox regression analyses confirmed that the risk model was an independent risk factor for OC. Gene enrichment analysis revealed that the identified genes were involved in some pathways of malignancy. The competitive endogenous RNA (ceRNA) network included five PALs, of which four were selected for cell function assays. The four PALs were downregulated in 33 collected OC tissues and 3 OC cell lines relative to the control. They were shown to regulate the proliferative, migratory, and invasive potential of OC cells via Cell Counting Kit-8 (CCK-8) and transwell assays. Our study fills the gaps of the four PALs in OC, which are worthy of further study.


INTRODUCTION
Ovarian cancer (OC) is a gynecological malignancy with the highest morbidity and mortality rates worldwide (Stewart et al., 2019). Although the prognosis of patients in early cancer stages is better, most patients are already in the late stages of OC during the first diagnosis (Kaldawy et al., 2016;Eisenhauer, 2017). Thus, there is a need to identify novel biomarkers for predicting tumorigenesis and clinical diagnosis in earlier stages and to develop new therapeutic strategies and targets for OC.
Long non-coding RNAs (lncRNAs) are endogenous RNA transcripts more than 200 nucleotides in length that are not translated into polypeptides (Kopp and Mendell, 2018). Previous studies have found that lncRNAs could serve as strong prognostic biomarkers and play an important role in different cell processes such as cell migration, growth, invasion, apoptosis, and differentiation in OC . For example, SNHG9 was shown to serve as an anticancer biomarker by regulating miR-214-5p . LINC01969 was confirmed to be a cancer-promoting biological marker via the miR-144-5p/LARP1 axis .
The pathogenesis of most cancers, including OC, are caused by various genes rather than a single gene (Van Cott, 2020). The risk model can estimate a set of risk genes in any given cancer (Cintolo-Gonzalez et al., 2017;Tammemägi et al., 2019). Weighted correlation network analysis (WGCNA) is a combined method for analyzing clinical information and gene expression data (Sun et al., 2017). Through the analysis of gene modules with high correlation with clinical information, a series of key genes with high connectivity in the modules were obtained, which are potentially important for the occurrence and development of tumors (Tian et al., 2017). Previous studies have identified lncRNA-based signatures via WGCNA or by developing a risk model for OC (Li and Zhan, 2019;Zhao and Fan, 2019). However, these studies are still limited in their lack of cell function assays and constantly updated databases.
Thereby, the purpose of our research was to identify a prognosis-associated lncRNA (PAL) signature serving as a noteworthy prognostic biomarker in OC by using WGCNA and other comprehensive analyses. Two additional datasets from the GEO were used to check the accuracy of the model. A competitive endogenous RNA (ceRNA) network was established to explore the mechanisms of candidate PALs. Finally, four candidate PALs were selected for the in vitro assays.

Screening Modules Related to Clinical Phenotype
WGCNA considers not only the coexpression patterns between two genes but also the overlap of neighboring Abbreviations: GEO, Gene Expression Omnibus; HR, hazard ratio; K-M, Kaplan-Meier; lncRNA, long non-coding RNA; OC, ovarian cancer; OS, overall survival; PAL, prognosis-associated lncRNA; ROC, receiver operating characteristic; TCGA, The Cancer Genome Atlas; WGCNA, weighted gene coexpression analysis.
genes (Langfelder and Horvath, 2008). A coexpression network between differentially expressed mRNAs and lncRNAs was established using WGCNA from the R package to identify modular genes closely related to the clinical phenotype. Clinical phenotypes in our study included age at initial pathological diagnosis, clinical stage (stage I, II, III, or IV), lymphatic invasion (no or yes), neoplasm histologic grade (grade I, II, III, or IV), tumor residual disease (no macroscopic disease, 1-10 mm, 11-20 mm, or >20 mm), venous invasion (no or yes), and vital status (dead or alive).
The WCGNA analysis should be subject to scale-free networks. Therefore, the applicable weight parameter β (SoftPower) of the gene coexpression matrix was supposed to conform to the scale-free distribution to the maximum extent. The correlation coefficients (R) of connectivity k and p(k) under each β were calculated, and then, β was selected when R 2 reached 0.85 for the first time. The highly correlated genes were clustered into modules based on clustering and dynamic pruning methods (minModuleSize = 30; MEDissThres = 0.3). Finally, the gene assembly modules closely related to the phenotype were identified via the correlation between the module and clinical phenotype.

Construction and Validation of Risk Model
The lncRNAs in the aforementioned modules were analyzed by univariate Cox regression analysis based on their expression values and overall survival (OS) of each OC sample (P < 0.05). Kaplan-Meier (K-M) analysis and log-rank test were performed using the R package to select the PALs (P < 0.05) for further analysis. Least absolute shrinkage and selection operator (Lasso) regression of the glmnet package (version 2.0-18) (Engebretsen and Bohlin, 2019) was carried out for further dimensionality reduction to screen the more significant PALs for risk model construction. According to multivariate Cox regression analysis, a prognostic risk model was generated based on the following formula: In the risk score (RS) formula, β lncRNA represents the regression coefficient for PALs, and Exp lncRNA means the expression level of homologous PALs. OC patients in the study were divided into low-or high-risk groups according to the optimal cutoff point of RSs gained from Survminer (version 0.4.3) from the R package, and K-M survival analysis was performed between the two risk groups using the log-rank test. In addition, GSE32063 (40 OC samples) and GSE17260 (110 OC samples) were downloaded from National Center for Biotechnology Information (NCBI) GEO (Barrett et al., 2005) and used to develop a prognostic risk model using the same method. Furthermore, the RSs of different clinical indicators, including age (age ≤60 years or >60 years), grade (grade II or III), and stage (stage III or IV) were compared. Several clinical indicators, such as age in GEO, grade I, grade IV, stage I, and stage II, were excluded due to insufficient sample size. Tumor mutational burden (TMB) scores were calculated for each OC patient from TCGA, and the relationship between the risk model and TMB was also assessed.

Construction of ceRNA Network
The microRNAs (miRNAs) targeted by the corresponding PALs were speculated by the DIANA-LncBase v2 (Paraskevopoulou et al., 2016). The target mRNAs by the corresponding miRNAs were speculated using miRTarBase (Hsu et al., 2011). Subsequently, the ceRNA network based on the same miRNAs of PAL-miRNA and miRNA-mRNA was constructed and visualized using Cytoscape (Kohl et al., 2011).

Biofunctional Analysis
Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis was performed on potential target mRNAs of clinical modules or PALs based on clusterProfiler of R package (version:3.8.1, pAdjustMethod = BH, pvalueCutoff = 0.05) (Yu et al., 2012).

In vitro Assays
With approval from the Ethics Committee, a total of 33 OC and 20 adjacent normal ovarian tissues were collected from surgery OC patients between May 2019 and January 2021 at the Affiliated Hangzhou Hospital of Nanjing Medical University (Hangzhou, China). The clinical features are shown in Supplementary Table 1. The OC cell line SKOV-3 was supplied by the Cell Bank of China Academic of Science (Shanghai, China). The IOSE-80, HO-8910, and A2780 cells were purchased from iCell Bioscience Inc. (Shanghai, China). SKOV-3 cells were cultured in McCoy's 5A medium supplemented with 10% fetal bovine serum (FBS) and 1% penicillin-streptomycin. IOSE-80, HO-8910, and A2780 cells were cultured in 90% Roswell Park Memorial Institute (RPMI) 1640 medium with 10% FBS and 1% penicillin-streptomycin. All cells were incubated in a 5% CO 2 incubator at 37 • C.
After RNA extraction and reverse transcription, real-time quantitative PCR (qPCR) analysis was performed using an ABI 7500 instrument to evaluate the expression value of alternative PALs in cells and tissues based on the kit of Takara (Shiga, Japan). Primer sequences are listed in Table 1.
The plasmids used for the experiment were constructed by TSINGKE Biological Technology (Hangzhou, China) based

Symbol
Sequences on the known sequences of TCL6, VLDLR-AS1, RP11-356I2.4, and LINC00893 from NCBI. SKOV-3 and HO-8910 cells were transfected with homologous plasmids (pcDNA-NC, pcDNA-TCL6, pcDNA-VLDLR-AS1, pcDNA-RP11-356I2.4, and pcDNA-LINC00893) using the jetPRIME transfection reagent (Polyplus Transfection, Shanghai, China), according to the manufacturer's instructions. Subsequently, transfected SKOV-3 and HO-8910 cells were made into cell suspensions and then transferred to a 96-well plate or upper chamber (with or without Matrigel) of 24-well transwell inserts (8 µm pore size). For the Cell Counting Kit-8 (CCK-8) assays, the old culture media were removed, and 10 µl cell counting kit-8 solution (MedChemExpress, China) with 90 µl media was added to each well for an additional 2 h on days 1-4. At the wavelength of 450 nm, the OD value of each well was detected by a spectrophotometer (Thermo Scientific, Massachusetts, America). For the transwell assays, the lower chambers were added with 500 µl medium containing 30% FBS. The bottoms of the upper chamber were fixed with 4% paraformaldehyde and stained with crystal violet for 10 min on day 1. The number of cells that invaded through the membrane to the lower surface was counted using Image J software after photographing using a microscope.
The experiments were conducted in triplicate, and each experiment was repeated three times. Statistical analysis was performed using GraphPad Prism version 8.0.1. The data were analyzed by Student's t-test or one-way analysis of variance (ANOVA). P < 0.05 was considered statistically significant.

RESULTS
To facilitate the understanding of our entire study, we created a flowchart, which is shown in Figure 1.

Screening Modules Related to Clinical Phenotype
After the difference analysis, a total of 1,467 upregulated mRNAs and 1,431 downregulated mRNAs were extracted (Figure 2A), and 307 lncRNAs expressed at high levels and 356 lncRNAs expressed at low levels were obtained ( Figure 2B).
WGCNA analysis was further conducted based on 3,560 differentially expressed genes to screen modules related to clinical phenotypes. We assigned the β value to 7 when R 2 was first ∼0.85, which ensured that the network connection was close to the scale-free distribution and was the minimum threshold for smoothing the curve (Figures 2C,D). The modules with correlation coefficients >0.7 (the divergence coefficient was <0.3) were consolidated after clustering ( Figure 2E). A total of seven modules (M1-yellow, M2-black, M3-green, M4-brown, M5-blue, M6-turquoise, and M7-gray) were integrated, and the gray module could not be gathered into other modules; hence, the gray module would not be considered in subsequent analysis ( Figure 2F).
Two further methods were used to mine the modules associated with clinical phenotypes. First, the correlation between each module feature vector gene and the clinical phenotype   Frontiers in Genetics | www.frontiersin.org was calculated. The feature vector gene was the first principal component gene E of a specific module, which represents the overall level of gene expression in the module. Second, the absolute value of the correlation between gene expression in each module and the clinical phenotype was taken as the correlation between the module and the clinical phenotype (Supplementary Figure 1).
Furthermore, the mRNA in each module was subjected to KEGG pathway enrichment analysis, which showed that the black, blue, brown, turquoise, and yellow modules were significantly enriched in 13, 21, 11, 5, and 48 KEGG pathways. Nonetheless, no significant pathways were enriched in the green module. The KEGG pathway of each module was ranked according to the P-value, and the top five were selected for display ( Figure 3A). Some of these modules are associated with important biological processes in tumor genesis and development, such as Ras signaling pathway and mitogenactivated protein kinase (MAPK) signaling pathway of the blue module, p53 signaling pathway of the brown module, and inflammatory mediator regulation of TRP channels in the turquoise module.

Construction and Validation of Risk Model
A total of 25 PALs that were significantly associated with OS were screened based on univariate Cox regression analysis, including one upregulated lncRNA (HR > 1) and 24 downregulated lncRNAs (HR < 1) (Table 2; Supplementary Figure 2). Among the 25 PALs, KEGG pathways were enriched to 19 lncRNAs, and the top five pathways were selected for display ( Figure 3B). Some PALs were enriched in several classic signaling pathways of tumors, such as Ras signaling pathway, MAPK signaling pathway, Hedgehog signaling pathway, and so on. Interestingly, several lncRNAs were enriched in ovarian steroidogenesis or the GnRH signaling pathway (VLDLR-AS1, RP11-356I2.4, LINC00893, and so on) in OC.
Lasso regression was performed on the above 25 PALs to determine the optimal modeling parameter (λ) (Figure 4A). The two dashed lines indicate two special λ values: λ min on the left and λ 1se on the right. The λ values between these two values was considered to be appropriate. The model constructed by λ 1se was the simplest, that is, it used a small number of genes, while λ min had a higher accuracy rate and used a larger number of genes. Hence, λ min was selected to build the model for accuracy in our study.
The final model contained seven PALs, namely, CTD-2540B15.13, LINC00893, MIR503HG, RP11-356I2.4, RP11-386G11.10, TCL6, and VLDLR-AS1. A total of 418 TCGA samples were divided into two risk groups, of which 201 were high risk (≥the optimal cut point) and 217 were low risk (<the optimal cutoff point). The results revealed that OS in the high-risk group was markedly lower than that in the low-risk    Frontiers in Genetics | www.frontiersin.org group ( Figure 4B). The heat map and RS distribution map of seven lncRNA expression values in each sample were drawn as shown in Figure 4C, which showed that the lower the expression level of the seven PALs, the higher the RS and the shorter survival time. Two GEO datasets were utilized to verify the risk model according to the same method described above, and the K-M curve proved the validity of the model constructed by the seven PALs in survival prediction (Figure 4D). Univariate and multivariate Cox regression analyses of risk model, grade, Figo stage, and age for TCGA and GEO datasets demonstrated that the risk model was an independent risk factor for OC patients ( Table 3). The 1-, 3-, and 5-year survival ROC curves predicted by the risk model were drawn (Figures 4E,F). To better predict prognosis at 1-, 3-, and 5-year OS of OC patients, we constructed a nomogram of variables such as the risk score, grade, and Figo stage (Figures 4G,H).
We also found that OC patients with grade III and stage IV OC had higher RSs, while RS was not related to age (Figures 5A-E). As for TMB, OC patients in the low-risk group had lower TMB scores, which indicated that they may be more likely to respond to immunotherapy (Figure 5F).

Construction of a PAL-Associated ceRNA Network
We predicted 19,630 miRNA-mRNA pairs and 129 PAL-miRNA relationship pairs. PALs and mRNAs that were regulated by the same miRNA were screened, and the positively coexpressed PAL-mRNA pairs were combined. Finally, 347 PAL-miRNA-mRNA relationship pairs were obtained, including 5 PALs (TCL6, VLDLR-AS1, RP11-356I2.4, LINC00893, and MIR503HG), 70 miRNAs, and 199 mRNAs (Figure 6). There were 71 PAL-miRNA relationship pairs, 341 miRNA-mRNA relationship pairs, and 242 PAL-mRNA coexpression relationships in the ceRNA network, which could be used to explore the molecular mechanisms involved in the development of OC.

In vitro Assays
Among the five PALs in the ceRNA network we constructed, the functions of lncRNA MIR503HG involved in OC have been investigated (Zhu et al., 2020); hence, the remaining four PALs were preliminarily selected as candidate molecules to perform cell function assays in vitro. Analysis of TCGA dataset (418 OC samples), GSE32063 (40 OC samples), GSE17260 (110 OC samples), and our cohort (33 OC samples) showed that the four PALs were evidently downregulated in OC tissues when compared with normal controls (Figures 7A-P). The expression of the four PALs in three OC cell lines (SKOV-3, HO8910, and A2780) and the normal ovarian epithelial cell line IOSE-80 was detected. As shown in Figures 8A-D, the expression of the four PALs was significantly lower in OC cells than in IOSE-80 cells (p < 0.05).  The K-M survival curves confirmed that higher expression of the four PALs was associated with better OS, which indicated that they may serve as tumor suppressor genes for OC (Supplementary Figure 2). Subsequently, we confirmed that the expression levels of the four PALs increased following plasmid transfection in SKOV-3 and HO-8910 cells (Figures 8E-H).

DISCUSSION
Recent studies have shown that lncRNAs play an important role in regulating the growth, division, metastasis, invasion, proliferation, and drug resistance of OC cells (Braga et al., 2020). Abnormal expression of some lncRNAs in OC may provide important reference information for the diagnosis, treatment, and prognosis of patients (Salamini-Montemurri et al., 2020). However, compared with miRNA, the study of lncRNAs for OC is still in its infancy (Razavi et al., 2021). Therefore, it is necessary to further study lncRNAs in OC. Previous studies have inspired us to explore potential prognosis-associated lncRNAs in OC.
WGCNA is the most representative systems biology algorithm based on transcriptome data to construct gene coexpression networks (Zhang and Horvath, 2005;Langfelder and Horvath, 2008). Using WGCNA, information on gene expression in biological systems can be analyzed quantitatively and at different levels. Although previous studies have applied WGCNA to established an lncRNA-associated signature in malignant tumors (Gong and Ning, 2020;Han P. et al., 2020;Li et al., 2020;Tian et al., 2021;Yuan et al., 2021), research on its application in OC is sparse. In our study, seven stable modules with highly correlated 3,560 genes and correlations with specific clinical factors were clustered. The prognostic risk model based on seven PALs could divide OC patients into two risk groups according to optimal cutoff point, which was validated using two datasets from GEO. After construction of the ceRNA network, four PALs (TCL6, VLDLR-AS1, and LINC00893) in the network were selected for further cell function assays. In the WGCNA analysis, TCL6 was gathered into the brown module, and VLDLR-AS1, RP11-356I2.4, and LINC00893 were gathered into the turquoise module. Therefore, TCL6 may be related to the clinical stage and histological grade of OC. Meanwhile, VLDLR-AS1, RP11-356I2.4, and LINC00893 may be associated with the clinical stage. The KEGG pathway of coexpression analysis showed that three PALs were enriched in several biological functions of OC, such as glutamatergic synapse, inflammatory mediator regulation of TRP channels, and ovarian steroidogenesis, which indicated their potential therapeutic targets.
A previous study revealed that lncRNA miR503HG was downregulated in OC, and downregulation of miR503HG predicted poor survival of OC patients (Zhu et al., 2020), which coincided with miR503HG in our prognostic risk model signature. Coincidentally, the expression of MIR503HG was decreased in colon cancer (Chuo et al., 2019;, triple-negative breast cancer (Fu et al., 2019;Tuluhong et al., 2020), non-small cell lung cancer (Lin et al., 2019;Dao et al., 2020;Xu et al., 2020), cervical squamous cell carcinoma , bladder cancer (Qiu et al., 2019), and hepatocellular carcinoma (Wang et al., 2018). MIR503HG was also proved to serve as a tumor suppressor in in vivo experiment. TCL6 had been demonstrated to be a potential tumor suppressor in breast cancer , hepatocellular carcinoma (Luo et al., 2020), renal cell carcinoma (Yang et al., 2018;Kulkarni et al., 2021), and B-cell acute lymphoblastic leukemia (Cuadros et al., 2019). Overexpression of TCL6 in corresponding cancer cell lines impairs their oncogenic functions, such as cell proliferation and migration/invasion. RP11-356I2.4 (also known as WAKMAR2, lnc-TNFAIP3, or LOC100130476) has been shown to act as a tumor suppressor gene in esophageal cancer and gastric cardia adenocarcinoma. In addition, upregulation of RP11-356I2.4 led to the inhibition of proliferation and invasiveness of cancer cells (Guo et al., 2016a,b). LINC00893 was lowly expressed in thyroid carcinoma tissues and papillary thyroid cancer (PTC) cells. Furthermore, LINC00893 overexpression abrogated the proliferation and migration abilities of PTC cells . In our prognostic risk model signature, lncRNA TCL6, RP11-356I2.4, and LINC00893 were all downregulated in OC cells and tissues. In addition, downregulation of these genes indicated poor OS in patients with OC. Through gain-offunction assays, we determined that the overexpression of the three PALs restrained the proliferation and migration abilities of OC cells, which would fill the gap in their study in OC. Hence, these three molecules are also tumor suppressor genes for OC, suggesting their potential as biomarkers and therapeutic targets. Previous studies and our cell function assays further illustrate the accuracy of our risk score model. However, more in-depth molecular mechanisms are yet to be studied by subsequent researchers.
Although lncRNA VLDLR-AS1 was downregulated and downregulation of VLDLR-AS1 indicated poor survival in OC, VLDLR-AS1 seemed to be an oncogene in terms of OC cellular function. Previous studies showed that VLDLR-AS1 is highly expressed in esophageal squamous cell carcinoma  and hepatocellular carcinoma (HCC) (Yang et al., 2017). Knocking down the expression of VLDLR-AS1 inhibited the proliferation of HCC cells, which was consistent with our cell function assays. The differences in VLDLR-AS1 expression were not consistent with the cell function experiment in OC, which is worthy of further study by subsequent researchers.
There are some limitations to our study. First, there were only 33 OC patients without OS in our cohort; hence, more time and more samples are needed for follow-up. Second, the cell function assays of four candidate lncRNAs were preliminary, which requires further investigation to provide a better understanding.

CONCLUSIONS
In summary, we performed weighted gene coexpression analysis on differentially expressed genes obtained from datasets to screen for modules related to clinical phenotypes and established a seven-PAL-based signature with a prognostic value for OC, which could stratify OC patients into two risk groups with significant differences in prognosis. Two additional datasets were used to verify the accuracy of the model. Meanwhile, four candidate PALs were selected to perform cell function assays, which need further studies of subsequent researchers.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Affiliated Hangzhou Hospital of Nanjing Medical University. The patients/participants provided their written informed consent to participate in this study.