Gene Expression Profiles Identified Novel Urine Biomarkers for Diagnosis and Prognosis of High-Grade Bladder Urothelial Carcinoma

Bladder urothelial carcinoma (BC) has been identified as one of the most common malignant neoplasm worldwide. High-grade bladder urothelial carcinoma (HGBC) is aggressive with a high risk of recurrence, progression, metastasis, and poor prognosis. Therefore, HGBC clinical management is still a challenge. We performed the present study to seek new urine biomarkers for HGBC and investigate how they promote HGBC progression and thus affect the prognosis based on large-scale sequencing data. We identified the overlapped differentially expressed genes (DEGs) by combining GSE68020 and The Cancer Genome Atlas (TCGA) datasets. Subsequent receiver operating characteristic (ROC) curves, Kaplan-Meier (KM) curves, and Cox regression were conducted to test the diagnostic and prognostic role of the hub genes. Chi-square test and logistic regression were carried out to analyze the associations between clinicopathologic characteristics and the hub genes. Ultimately, we performed gene set enrichment analysis (GSEA), protein-protein interaction (PPI) networks, and Bayesian networks (BNs) to explore the underlying mechanisms by which ECM1, CRYAB, CGNL1, and GPX3 are involved in tumor progression. Immunohistochemistry based on The Human Protein Atlas and quantitative real-time polymerase chain reaction based on urine samples confirmed the downregulation and diagnostic values of the hub genes in HGBC. In conclusion, our study indicated that CRYAB, CGNL1, ECM1, and GPX3 are potential urine biomarkers of HGBC. These four novel urine biomarkers will have attractive applications to provide new diagnostic methods, prognostic predictors and treatment targets for HGBC, which could improve the prognosis of HGBC patients, if validated by further experiments and larger prospective clinical trials.


INTRODUCTION
Bladder urothelial carcinoma (BC) has been identified as the ninth most common malignant neoplasm all over the world (1,2). More than 199,000 people died of it and over 549,000 cases were newly diagnosed in 2018 (1,2). Although the age standardized incidence and number of deaths are decreasing in the past 20 years, the number of BC incident cases is growing globally and the BC burden may ascend in the future as a result of aged tendency of population and polluted environment (3,4). BC is classified as high-grade bladder urothelial carcinoma (HGBC) and low-grade bladder urothelial carcinoma (LGBC) based on how cancer cells histologically differ from normal bladder cells (5). HGBC is aggressive and has a high risk of recurrence, progression, metastasis and poor prognosis, while LGBC is a kind of tumor with low malignancy and comparatively good prognosis (5). In addition, treatments for HGBC and LGBC are quite different. HGBC patients should receive radical cystectomy with or without postoperative chemotherapy; LGBC patients are most commonly treated with transurethral resection of bladder tumor (6,7). Hence, an early and accurate diagnosis of BC, particularly differential diagnosis between HGBC and LGBC, is a critical factor for clinical management of BC.
At present, cystoscopy and urine cytology are commonly acknowledged as the gold standard methods for BC diagnosis (8). However, cystoscopy may sometimes miss HGBC, particularly carcinoma in situ (CIS). Besides, as an invasive method, cystoscopy may cause damage to surrounding organs and even lead to tumor metastasis caused by improper human operation (9,10). Although urine cytology is a non-invasive examination, it is costly with poor sensitivity and specificity. What's more, urine cytology is subjective and varies from different pathologists' experience. The above factors contribute to the challenges and high cost associated with BC clinical management.
Recently, many urine-based tests have been carried out to explore potential biomarkers for HGBC. However, most of these urine biomarkers lack of enough sensitivity and specificity and should be used alongside cystoscopy (11). Besides, very few of these biomarkers could be utilized to predict tumor progression, metastasis and prognosis or served as potential therapeutic targets. Therefore, powerful urine biomarkers are still required to improve the diagnosis and prognosis of HGBC.
As a consequence, we conducted a series of analyses based on gene expression profile of high-throughput sequencing data obtained from Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) in order to seek potential urine biomarkers for HGBC. In the present study, we first identified the key differentially expressed genes (DEGs) by combining GEO and TCGA datasets. Then we found that ECM1 (extracellular matrix protein 1), CRYAB (alpha B-crystallin), CGNL1 (cingulin-like 1), and GPX3 (glutathione peroxidase 3) are correlated with diagnosis, progression, metastasis and prognosis of HGBC. Ultimately, we performed gene set enrichment analysis (GSEA), protein-protein interaction (PPI) networks and Bayesian networks (BNs) to explore the underlying mechanisms by which the four hub genes are involved in tumor progression. Immunohistochemistry based on The Human Protein Atlas (THPA) and quantitative real-time polymerase chain reaction (qRT-PCR) based on urine samples were utilized to validate the hub genes and their diagnostic values. In summary, this study indicated that ECM1, CRYAB, CGNL1, and GPX3 could be served as new diagnostic and prognostic urine biomarkers for HGBC.

RNA Data Processing and Identification of Differentially Expressed Genes
We mainly used R software (v.3.5.3 and v.3.4.4: http://www.rproject.org) to analyze and deal with RNA data. To identify DEGs in GSE68020 and TCGA BLCA datasets between BC patients and non-tumor healthy controls, we utilized limma R package (12). The cut-off criteria of adjusted P-value (adj. P-value) was set as 0.05 and the criterion of Fold change was set as |logFC| ≥ 1. For the identified DEGs from GSE68020 and TCGA BLCA datasets, we generated volcano plots using limma R package. For DEGs from GSE68020, we generated a heat map using pheatmap R package.

Receiver Operating Characteristic Curves for Diagnostic Value
To measure the diagnostic values of the 5 hub genes for HGBC, receiver operating characteristic (ROC) curves were plotted and area under the curve (AUC) values were also calculated. Statistical analysis was performed with GraphPad Prism 7.0. P < 0.05 was considered as statistically significant difference.

Survival and Statistical Analysis
Based on TCGA BLCA dataset, univariate and multivariate Cox regression, Kaplan-Meier (KM) method and log-rank test were used to compare the influence of expression levels of the 5 hub genes on overall survival (OS) along with other clinical characteristics. Clinical characteristics included Union for International Cancer Control (UICC) stage, histological grade, pathological T (pT) stage, pathological N (pN) stage, pathological M (pM) stage, age and gender. We utilized survival and survminer R packages to perform these analyses. What' more, we also used Gene Expression Profiling Interactive Analysis (GEPIA) (http://gepia.cancer-pku.cn/) for further calculating disease free survival (DFS) with the 5 hub genes on the basis of TCGA BLCA dataset (13). The correlations between clinicopathologic characteristics and expression of hub genes were analyzed with the chi-square test and logistic regression. The cut-off values of the 5 hub genes expression were determined by their median values. P < 0.05 was considered as statistically significant difference.

Gene Set Enrichment Analysis
GSEA is a computational method that assesses whether a priori defined a set of genes shows statistically significant, concordant differences between two biological states (14). In the present study, GSEA firstly generated an ordered list of all genes according to their correlation with expression of hub genes, GSEA was carried out to elucidate the significant survival difference observed between high expression and low expression groups. Gene set permutations were performed 1,000 times for each analysis. The expression level of hub genes was used as a phenotype label. To illustrate the roles of ECM1, CRYAB, CGNL1, and GPX3, we carried out GSEA to analyze the enrichment of HGBC patients in TCGA BLCA dataset. False discovery rate (FDR) < 25% and nominal P < 0.05 were regarded as the cut-off criteria of sorting Gene Ontology (GO) functional enrichment and KEGG pathway enrichment.

Protein-Protein Interactions Network and Module Analysis
To better understand the metabolism and molecular mechanisms of carcinoma, the functional interactions between proteins become necessary. String online server (version 11.0: http:// string-db.org/) was designed and adopted to collect and integrate the information by consolidating known and predicted proteinprotein association data for a large number of organisms (15). ECM1, CRYAB, CGNL1 and GPX3 were, respectively, put into the tool to construct and visualize the PPI networks about each protein. Interaction score of 0.400 was set as the threshold. Besides, Cytoscape software (Cytoscape_v.3.6.1) was applied to plot the PPI networks.

Construction of Bayesian Networks
In order to further clarify the roles of the CRYAB, ECM1, GPX3, and CGNL1 with HGBC, we constructed BNs to dissect the complex regulatory relationships among the four hub genes. BN is a graphical model that encodes probabilistic relationships among variables of interest (16). In the present study, we allowed these items as the nodes fed into the BNs: histological grade, UICC stage, pN stage and expression of the four hub genes based on TCGA BLCA dataset. Hence, we constructed three BNs and the nodes were described as follows: (1) BN1: BC histological grade+ CRYAB + ECM1 + GPX3 + CGNL1; (2) BN2: BC UICC stage + CRYAB + ECM1 + GPX3 + CGNL1; and (3) BN3: BC pN + CRYAB + ECM1 + GPX3 + CGNL1.
The conditional likelihood of the variables given their parents is represented in a BN by using Gaussian conditional densities. Under the assumption of parameter independence, an initial BN structure is learned from the training data. From this initial network, greedy search algorithm with random restarts is performed to get the highest score posterior network to avoid local maxima. Finally, an optimized BN that maximizes the Bayesian factor is obtained using heuristic search of the network space in a specified domain. The three BNs were carried out using deal R package.

Immunohistochemistry From the Human Protein Atlas
Immunohistochemistry was obtained from The Human Protein Atlas (THPA) (http://www.proteinatlas.org/) (17). THPA is a Swedish-based program initiated in 2003 with the aim to map all the human proteins in cells, tissues and organs using integration of various omics technologies, including antibody-based imaging, mass spectrometry-based proteomics, transcriptomics and systems biology. All the data in the knowledge resource is open access to allow researchers to freely access the data for exploration of the human proteome. The Tissue Atlas and Pathology Atlas showed the distribution of the proteins across all major tissues, organs and tumors in the human body.
Differences of immunohistochemistry between normal bladder tissues and HGBC tissues were compared with Mann-Whitney U test and Fisher's Exact test. P < 0.05 was considered as statistically significant difference. Detailed characteristics of immunohistochemistry data are in Supplementary Table 1.

Urine Samples in Tianjin Validation Cohort, RNA Extraction and Quantitative Real-Time Polymerase Chain Reaction (qRT-PCR)
The study was approved by the Ethics Committee of Tianjin Medical University General Hospital. All recruited participants volunteered to participate and signed informed consent before being enrolled in our study.
A total of 30 patients who were pathologically and clinically diagnosed with BC were enrolled from Tianjin Medical University General Hospital. None of the BC patients had received any surgery, chemotherapy or radiotherapy before collecting urine samples. Clinical and pathological data of patients including age, gender, tumor UICC stage and histological grade were recorded. We also enrolled 30 healthy 1 | Primer sequences used to amplify target genes by quantitative real-time polymerase chain reaction (qRT-PCR).

Gene name
Primer sequences controls matched by age and sex. Urine was collected from the healthy individuals to be used as the healthy control specimens. All the 30 BC patients and 30 controls (Tianjin validation cohort) are Asians. A single and naturally voided midstream urine sample was obtained from all recruited participants. Approximately 50 ml of urine was collected and put on ice immediately, then the samples were centrifuged as soon as possible (not later than 1 h later) at 3,000 rpm for 7 min at 4 • C. Total RNA from urine samples were extracted using RNeasy kit (Qiagen, Valencia, CA). The first chain of cDNA was synthesized by reverse transcription with TaqMan R Reverse Transcription Reagents (Applied Biosystems, Grand Island, NY). GAPDH was used as internal control. The sequences of the primers were displayed in Table 1. qRT-PCR was performed using the CFX96 Touch PCR system (Bio-Rad). The relative mRNA expressions of CRYAB, ECM1, GPX3, CGNL1, and CRNN were calculated by 2 − Ct method. In addition, ROC curves were plotted and AUC values were calculated based on the qRT-PCR results by GraphPad Prism 7.0. P < 0.05 was considered as statistically significant difference. Figure 1A.

Identification of Differentially Expressed Genes
The GSE68020 dataset was processed with limma R package. According to the criteria mentioned above, a total of 17 DEGs including 5 upregulated and 12 downregulated genes were selected for further analyses as shown in the volcano plot and heat map (Figures 1B,D).
The TCGA BLCA dataset was also analyzed with limma R package. After differential expression analysis, 1617 DEGs were screened out to meet the requirements, among which 536 were upregulated and 1,081 were downregulated ( Figure 1C).
To validate the reliability of DEGs, we adopted Venn diagram to obtain overlapped DEGs of the two datasets. Ultimately, 5 DEGs including CRYAB, ECM1, CGNL1, GPX3, and CRNN were confirmed to be appeared in both datasets as shown in Venn diagram ( Figure 1E). All of the 5 hub genes were downregulated genes.

Survival Analysis of Hub Genes in TCGA BLCA Dataset
To explore whether the 5 hub genes are associated with BC and HGBC survival time, we utilized log-rank test and drew KM curves. As shown in Figures 3A,B, we identified that higher expression levels of CRYAB and ECM1 are associated with worse OS time of BC and HGBC (P < 0.05), while GPX3, CGNL1 and CRNN can't influence the OS time. Furthermore, higher CRYAB expression can also lead to a poor DFS time of BC (P < 0.05) ( Figure 3C).

Correlations Between Expression Levels of the 5 Hub Genes and Clinical Outcomes in TCGA BLCA Dataset
To ensure whether expression levels of the 5 hub genes may influence the clinical outcomes of BC, we performed chi-square test and logistic regression. As shown in Figure 4 and Table 2, higher expression levels of CRYAB and ECM1 are observed in HGBC and advanced UICC stage (stage III and stage IV) BC (P < 0.05). In addition, higher expression levels of CRYAB, ECM1 and CGNL1 may predict lymph node metastasis of BC (P < 0.05). However, higher GPX3 expression level is an indicator for early UICC stage (stage I) (P < 0.05).
In order to further confirm the prognostic value of the 5 hub genes, we performed univariate and multivariate Cox regression to calculate hazard ratios (HRs). Among 411 BC samples from TCGA BLCA dataset, 165 BC samples were enrolled in Cox regression analyses since they contained a record of complete information of UICC stage, histological grade, pT stage, pN stage, pM stage, age and gender.   Table 3).

GSEA Revealed Biological Function of Hub Genes in BC (GO and KEGG Pathway Analysis)
To explore the underlying mechanisms by which CRYAB, ECM1, GPX3, and CGNL1 are involved in BC progression, GSEA was carried out between high expression and low expression groups on the basis of TCGA BLCA dataset. Both KEGG pathway analysis and GO functional enrichment were performed.
We identified pathways that are differentially activated in HGBC. Upregulation of CRYAB, ECM1, GPX3, and CGNL1 were enriched in pathways which are vital in tumorigenesis and progression, such as vascular endothelial growth factor (VEGF), TGF-β (transforming growth factor-β), Wnt and MAPK signaling pathways. Downregulation of the four genes can have effects on spliceosome (Figures 5A,C,E,G).
GO functional enrichment was also conducted and we found that the CRYAB, ECM1, GPX3, and CGNL1 are enriched in biological process (BP) including extracellular matrix structural constituent, glycosaminoglycan binding and cytokine binding. With regard to molecular function (MF), they are enriched in collagen containing extracellular matrix and collagen trimer. As for cell component (CC) analysis, they are located in extracellular structure organization and regulation of vasculature development. Based on the 3 genes, the top five significant GO terms for BP, CC, and MF are shown in Figures 5B,D,F,H.

PPI Network Analyses and Module Functional Enrichment
To further detect the interaction between the proteins encoded by CRYAB, ECM1, GPX3, and CGNL1, the four genes were put into String separately. Based on the String database, we constructed four PPI network modules in Figure 6. Besides, functional enrichments of the four modules were shown in Table 4.

Construction of Bayesian Networks
We constructed three BNs as described in Materials and methods section. Since chi-square test and logistic regression show that CRYAB, ECM1, CGNL1, and GPX3 are significantly associated with histological grade, UICC stage and pN stage (P < 0.05), we combined the four hub genes with histological grade, UICC stage  and pN stage, respectively, to construct the three BNs in Figure 7. In each of BNs, an edge specified as node1→ node2 means that node2 is a direct cause of node1.
From Figure 7, we identified that CRYAB, ECM1, CGNL1, and GPX3 are all direct factors of BC histological grade, UICC stage and pN stage. This discovery is consistent with

Immunohistochemistry From the Human Protein Atlas
To further validate our above findings, we evaluated expression levels of CRYAB, ECM1, GPX3, CGNL1 and CRNN between normal bladder tissues and HGBC tissues based on immunohistochemistry from THPA. As shown in Figure 8A, Mann-Whitney U test suggested that normal bladder tissues have higher staining scores of GPX3 (P = 0.0222) and ECM1 (P = 0.0021) than HGBC tissues. However, there is no difference for the staining scores of CRYAB, CGNL1 and CRNN between the two groups (P > 0.05).
In addition, we divided the expression into four levels: negative (-), weakly positive (+), positive (++) and strongly positive (+ + +). Distributions of the four expression levels for each gene were demonstrated in Figure 8B. Fisher's Exact test showed that normal bladder tissues have higher expression level of GPX3 (P = 0.016) and ECM1 (P = 0.001) than HGBC tissues, while no differences were found for expression levels of CRYAB, CGNL1 and CRNN (P > 0.05) ( Table 5).
The results of immunohistochemistry confirmed that GPX3 and ECM1 are differentially expressed between HGBC tissues and normal bladder tissues, which is consistent with the results of TCGA BLCA dataset. Figure 8C showed the expression levels of the 5 genes in TCGA BLCA dataset.

Expression of GPX3, ECM1, CRYAB, CGNL1, and CRNN in Tianjin Validation Cohort
We recruited 30 BC patients and 30 controls from Tianjin Medical University General Hospital for further validation. Clinical characteristics of enrolled BC patients and controls in Tianjin validation cohort are displayed in Table 6. The chi-square test revealed that the patients and controls are matched for age (P = 0.602) and gender (P = 0.438). Among the 30 BC patients, 13 were LGBC patients and 17 were HGBC.
To investigate and confirm whether the five genes were detectable and altered in urine samples of BC patients compared with healthy controls, we performed qRT-PCR to detect the expression levels of GPX3, ECM1, CRYAB, CGNL1 and CRNN at mRNA level, respectively. As shown in Figure 9A, the relative expressions of GPX3, ECM1, CRYAB, and CGNL1 are significantly lower in urine of HGBC patients than in controls (P < 0.05), while no difference were revealed in CRNN expression (P > 0.05). Figure 9B displayed the ROC curves performed to investigate the diagnostic value of the five genes for HGBC. The results suggested that expressions of GPX3 (AUC = 0.8794, P = 0.0001), ECM1 (AUC = 0.9794, P < 0.0001), CRYAB (AUC = 0.9216, P < 0.0001), and CGNL1 (AUC = 0.9765, P < 0.0001) have good predictive power for diagnosis of HGBC, indicating that they may be used as an urine biomarker for HGBC. Figure 9C showed the ROC curves conducted to evaluate the predictive value of differential diagnosis between HGBC and LGBC. We identified that GPX3 (AUC = 0.7692, P = 0.0128), ECM1 (AUC = 0.7330, P = 0.0312), CRYAB (AUC = 0.9457, P < 0.0001), and CGNL1 (AUC = 0.8190, P = 0.0032) can be applied in differential diagnosis between HGBC and LGBC, while CRNN (AUC = 0.5136, P = 0.9001) can't.
The results of qRT-PCR confirmed that GPX3, ECM1, CRYAB and CGNL1 are lower in urine of HGBC patients than controls, which is consistent with the results of GSE68020 dataset. Figure 9D showed the expression levels of the 5 genes in GSE68020 dataset.

DISCUSSION
Accumulating evidence suggests that bioinformatics analysis would be an effective method to find novel molecular biomarkers in early diagnosis, therapeutic process monitoring and prognostic evaluation of cancer (18). Although previous investigations have identified various biomarkers for BC, most of biomarkers have not been applied in clinical practice for their inconsistent performance in terms of specificity and/or sensitivity (19). Besides, very few of the studies have focused on biomarkers for HGBC. In the present study, TCGA BLCA dataset, a large-scale prospective cohort research, and GSE68020 dataset from GEO were exploited in order to explore potential urine biomarkers for HGBC.
Our findings indicated that CRYAB, ECM1, CGNL1, and GPX3 are effective urine biomarkers for HGBC diagnosis, of which CRYAB, ECM1 and GPX3 are also urine biomarkers  for differential diagnosis between HGBC and LGBC. Besides, CRYAB, ECM1, and GPX3 are potential urine prognostic factors for HGBC; ECM1 and GPX3 might be considered as independent prognostic indicators for HGBC. According to clinicopathologic characteristics, we identified that CRYAB, ECM1, GPX3, and CGNL1 may predict histological grade, UICC stage and lymph node metastasis. In order to further validate these findings, we extracted immunohistochemistry of normal bladder tissues and HGBC tissues for these hub genes from THPA. In addition, we also performed qRT-PCR of these hub genes based on the urine samples from 30 BC patients and 30 controls in Tianjin validation cohort. The results confirmed the different expression levels of CRYAB, ECM1, CGNL1, and GPX3 between HGBC patients and controls, and their diagnostic values were also proved. The above findings could provide new diagnostic methods, prognostic predictor and treatment targets for HGBC, which could improve the prognosis of HGBC patients.
Till now, the role of CRYAB in BC has not been reported. It is the first time that our study found CRYAB plays a vital role in diagnosis, metastasis and prognosis of HGBC patients. Both OS and DFS are worse in cases with lower CRYAB expression. CRYAB could enhance tumorigenesis by regulating the VEGF and conferring anti-VEGF resistance in breast cancer (20,21). In addition, CRYAB participates in anti-apoptosis through activating the Akt signaling pathway, enhancing PI3K activity and inhibiting calcium-activated Raf/MEK/ERK signaling pathway (22)(23)(24). As a consequence, we hypothesized that CRYAB promotes tumorigenesis and resist cell apoptosis of HGBC via these signaling pathways. Subsequent GSEA analysis identified that CRYAB is associated with B cell receptor, T cell receptor, VEGF, MAPK, Wnt, and TGF-β signaling pathways, which supports our hypothesis and previous studies.
Previous investigation identified that upregulated ECM1 is associated with BC growth, migration, apoptosis and postoperative recurrence, which was in agreement with our results (25). However, the biological function of ECM1 in different tumors remains controversial. Wang' s study indicated that ECM1 might enhance cell proliferation and invasiveness by regulating the expression of glucose transporter 1, lactate dehydrogenase and hypoxia-inducible factor 1α (25). There is also evidence that ECM1 potentiates the phosphorylation of epidermal growth factor receptor (EGFR) and extracellular signal-regulated kinase through physical interaction with EGFR and activation of EGFR signaling in breast cancer development (26). Besides, studies based on pancreatic ductal adenocarcinoma suggested that increased ECM1 expression abrogated the antitumor effect exerted by miR-23a-5p (27). The present study indicated that ECM1 is as an independent prognostic indicator for HGBC and high ECM1 expression can also predict lymph node metastasis. Both GSEA and functional enrichments of  ECM1 module showed that ECM1 expression is related to cell adhesion, extracellular matrix structural constituent and extracellular structure organization, which may be used to explain the metastasis of HGBC. GPX3 is a member of a family of selenoproteins with vital antioxidant roles (28). It is reported that GPX3 is related to many malignancies including including head and neck, ovarian, and colon tumors (29,30). Hypermethylation of the GPX3 promoter reduces GPX3 expression (31,32). Furthermore, decreased GPX3 expression could inhibit clonogenicity and anchorageindependent cell survival in ovarian cancer progression (33). In addition, interactions between GPX3 and the p53-inducible gene 3 (PIG3) protein leads to activation of the apoptosis in prostate cancer cells (34). A retrospective study based on 40 BC patients reported that high GPX3 expression level in plasma might be predictive indicator for BC diagnosis and recurrence after transurethral resection (35). However, based on 405 BC samples, our results demonstrated that higher GPX3 expression level may predict an early UICC stage and better prognosis. Thus, the exact biological role of GPX3 and its potential mechanism for the progression and recurrence of BC are still unclear. Although our study contained a sufficient capacity, studies based on cells and a larger sample size are required to explore the relationship between GPX3 and BC. Functional enrichments showed that GPX3 plays a role in cellular detoxification and glutathione binding and metabolism. GSEA revealed that upregulated expression of GPX3 may act on extracellular matrix structure and positive regulation of vasculature development. We assumed that GPX3 is involved in the progression and recurrence of HGBC by participating in toxic metabolic process.
CGNL1, an endothelial junction complex protein, promotes GTPase mediated angiogenesis by strengthening adherens junctions via Rac1 activation, which further makes new blood vessels stable and extendable (36). What's more, CGNL1 is also involved in cell-cell junction assembly through regulating the activity of GTPases and Rac (37). Previous studies demonstrated that CGNL1 gene expression is associated with endometrial cancer survival (38). Our results showed that CGNL1 is a diagnostic factor for HGBC and can predict lymph node metastasis. GSEA and functional enrichments showed that CGNL1 may participate in HGBC progression by regulating cell-cell junction, tight junction, cytoskeletal protein binding, tropomyosin binding and growth factor binding, which confirms the findings of previous studies. However, further in vitro and in vivo studies are warranted to validate these mechanisms in BC.
Based on the networks of Bayesian analysis, we observed that the interaction of CRYAB and CGNL1 plays a key role in histological grade, UICC stage and pN stage of BC. Hence, we put CRYAB and CGNL1 into String online server to find the underlying mechanism of interaction. Functional enrichments showed that the two genes may have a combined effect on actin cytoskeleton (GO: 0015629, P = 0.039). In addition, GSEA of KEGG pathway analysis revealed that both genes are enriched in VEGF, MAPK, Wnt, and TGF-β signaling pathways.
Lymph node metastasis is a key indicator to predict poor prognosis of BC (39). In the present study, multivariate Cox regression showed that lymph node metastasis is an independent poor prognostic indicator of OS, which is consistent with previous research.
In the meantime, several limitations remained in our research. Firstly, although immunohistochemistry based on THPA and qRT-PCR based on urine samples confirmed the downregulation of CRYAB, ECM1, CGNL1, and GPX3 in HGBC and their diagnostic values, the exact molecular mechanisms of the these hub genes have not been investigated in the present  other racial types or regions are still required to verify these hypotheses and to make these results more convincible in the future.

CONCLUSIONS
In general, our findings indicated that CRYAB, CGNL1, ECM1, and GPX3 are potential urine biomarkers of HGBC. All the four genes have the capability to be diagnostic indicators for HGBC. Furthermore, CRYAB, ECM1, and GPX3 are potential urine prognostic factors for HGBC, among which ECM1 and GPX3 might be considered as independent prognostic indicators for HGBC and new treatment targets as well. The four genes can also predict histological grade, UICC stage and lymph node metastasis. Immunohistochemistry and qRT-PCR were used to confirm the downregulation of the hub genes and their diagnostic values in HGBC. Among the four hub genes, CRYAB and CGNL1 have not been reported the relationship with HGBC before and the results of Bayesian analysis suggested that the interaction of CRYAB and CGNL1 plays a key role in HGBC.
In addition, we used bioinformatics methods to explore the underlying mechanisms. These four novel urine biomarkers will have attractive applications to provide new diagnostic methods, prognostic predictors and treatment targets for HGBC, which could improve the prognosis of HGBC patients, if validated by further experiments and larger prospective clinical trials.